rwasa

UPDATE August 2016: rwasa is now running the flatassembler board! As a result of the migration and some interesting configuration requirements there, several previously undiscovered bugs were found and fixed. Check out the latest ChangeLog for details.

rwasa is our full-featured, high performance, scalable web server designed to compete with the likes of nginx. It has been built from the ground up entirely in x86_64 assembly language with no external library dependencies, and is the result of many years' experience with high volume web environments. In addition to all of the common things you'd expect a modern web server to do, we also include ready-made assembly language function hooks to facilitate Rapid Web Application Server (in Assembler) development.

We appreciate that there is already a plethora of web server software available. In our opinion, you should only care about rwasa if:

  • You run https/TLS servers of any size. Regardless of volume, you should care about the latency your users experience when they interact with your secure sites.
  • You run very high volume web traffic. As shown in our performance tests below, rwasa is capable of much higher requests per second per CPU than most other web servers.

Feature Highlights

  • Open source/GPLv3
  • Entirely hand-written in x86_64 assembly language
  • Faster than nginx for most environments
  • Commercial support available
  • TLS auto-blacklisting for anti-tampering
  • OCSP Stapling by default
  • Randomized Diffie-Hellman safe prime pool
  • Multi-process lockless TLS session resumption cache
  • TLS session cache is encrypted by default
  • Faster dynamic content compression
  • Large-scale FastCGI safely via unix sockets (without hitting EAGAIN)
  • Large-scale backpath (aka upstream) safely via unix sockets
  • Simple command-line arguments cover all common configurations
  • Small footprint, no external dependencies
  • Server-side BREACH mitigation (randomized headers, see notes)
  • HSTS enabled by default

Download

A standalone executable binary of rwasa can be downloaded from our Products page. Please note that you will have to chmod +x it after download.

As with all our products, rwasa is bundled with the HeavyThing library itself. The download link for our library is in the top right of every page on our site, along with the SHA256 sum of the download itself. If you have downloaded the same version and your SHA256 does not match, it has been modified by parties other than ourselves.
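
The comparison described above can be done with any SHA256 tool. A minimal Python sketch follows; the function names are hypothetical, the download path is whatever you saved the file as, and the published sum must be copied from the site by hand. Nothing here is part of rwasa or the HeavyThing library.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_sum(path, published_hex):
    """True if the file's digest equals the published sum (case-insensitive)."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

If matches_published_sum returns False for the same version of the download, treat the file as modified and do not run it.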

NOTE: Compiling from sources is only necessary if you wish to modify rwasa itself or make configuration changes to the HeavyThing library. We have included two binaries for rwasa, one that supports Perfect Forward Secrecy and a separate binary for TLS minimalists titled rwasa_tlsmin.

Table of Contents

Please forgive us for the giant wall of content presented here, but we prefer one scrollable, navigable and connected page rather than many disjointed ones.

Performance Tests

When we did our initial release, we were surprised by the amount of feedback we received about not performing localhost testing to demonstrate "raw" performance of nginx, lighttpd, and rwasa. As a result, the tests we now include are specific to localhost, single-CPU performance of each of the three web servers.

We feel it is important to point out that performance benchmarking can be highly contrived. There are a great many ways to conduct tests, and often the results are highly subjective. To this end, we provide specific details of our test machine, in addition to the specific configurations used for each such that these tests can easily be reproduced outside of our own environment.

Test setup

We configured rwasa, nginx and lighttpd to all use the same TLS parameters as well as Diffie-Hellman parameters. This means the same exact RSA keys and certificates, and the same cipher lists as specified below. Regarding Diffie-Hellman parameters, since the OpenSSL implementations only allow us to specify a single dhparam file, we created both a 4kbit and a 2kbit dhparam and used them for both nginx and lighttpd. By default, our HeavyThing library uses 2048 bit DH, and randomly selects from its pool during normal operations. For the 4kbit test, we specifically recompiled rwasa and modified the dh_bits and dh_privatekey_size settings of our HeavyThing library so that it too would make use of a 4096 bit DHE exchange rather than its default of 2048.

Lighttpd does not support OCSP Stapling. While this does not present significant overhead for our tests, both rwasa and nginx are configured to do so. Further, lighttpd DOES support TLS session resumption, despite there being no documentation that we can find on the subject.

Equipment: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz with 24GB RAM, running OpenSUSE 13.1.

Both rwasa and nginx are configured to use only 1 worker process. Although both are capable of scaling beyond this, the single-worker results still provide meaningful insight into how each would scale across multiple processes, where more grunt is available than lighttpd can provide with its single process model.

nginx 1.7.9 configuration, noting that for the 4096 bit tests, the ssl_dhparam and ssl_certificate were modified accordingly:


worker_processes  1;

http {
    include       mime.types;
    default_type  application/octet-stream;

    access_log off;
    sendfile        on;

    keepalive_timeout  65;

    gzip  on;
    gzip_comp_level 6;
    gzip_types text/plain text/xml text/css application/x-javascript text/html application/javascript image/svg+xml;

server {
        listen       1080;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }
}

server {
        listen       1081;
        server_name  localhost;

        ssl                  on;
        # Our key is 2048 bits
        ssl_certificate      rsa_selfsigned_2048.pem;
        ssl_certificate_key  rsa_selfsigned_2048.pem;
        # For the 4096 bit tests, above were commented and these were used:
        # ssl_certificate      rsa_selfsigned_4096.pem;
        # ssl_certificate_key  rsa_selfsigned_4096.pem;

        ssl_session_timeout  10m;
        ssl_session_cache shared:SSL:60m;

        # Our DH parameters are also 2048 bits
        ssl_dhparam dhparam_2k.pem;
        # For the 4096 bit tests, above were commented and this was used:
        # ssl_dhparam dhparam_4k.pem;

        ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers  DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-DSS-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK;
        # For the non-PFS (no DHE) tests, the below ciphers were used:
        # ssl_ciphers  AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK;
        ssl_prefer_server_ciphers   on;

        add_header Strict-Transport-Security 'max-age=31536000; includeSubDomains';

        ssl_stapling on;
        ssl_stapling_verify off;

        resolver 10.0.0.1;

        location / {
            root   html;
            index  index.html index.htm;
        }
}

}

Lighttpd 1.4.35 configuration (only the relevant mods are listed):


var.server_root = "/usr/local/nginx"
server.port = 2080
server.username = "nobody"
server.groupname = "nobody"
server.document-root = server_root + "/html"
## for ssl tests, the below was enabled and configured with the same DH + ciphers as nginx above
ssl.engine = "enable"
ssl.pemfile = "/usr/local/nginx/rsa_selfsigned_2048.pem"
## ssl.pemfile = "/usr/local/nginx/rsa_selfsigned_4096.pem"
ssl.dh-file = "/usr/local/nginx/dhparam_2k.pem"
## ssl.dh-file = "/usr/local/nginx/dhparam_4k.pem"
ssl.cipher-list = "DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-DSS-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK"
## ssl.cipher-list = "AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK"
ssl.honor-cipher-order = "enable"

rwasa configuration; note that for the non-PFS (no DHE) tests, rwasa_tlsmin was used:

# # for non-TLS tests:
# ./rwasa -cpu 1 -runas nobody -bind 3080 -sandbox /usr/local/nginx/html -foreground
# # for 2048 TLS tests with no PFS (no DHE):
# ./rwasa_tlsmin -cpu 1 -runas nobody -tls /usr/local/nginx/rsa_selfsigned_2048.pem -bind 3081 -sandbox /usr/local/nginx/html -foreground
# # for 2048 TLS tests with PFS (DHE):
# ./rwasa -cpu 1 -runas nobody -tls /usr/local/nginx/rsa_selfsigned_2048.pem -bind 3081 -sandbox /usr/local/nginx/html -foreground
# # for 4096 TLS tests with no PFS (no DHE):
# ./rwasa_tlsmin -cpu 1 -runas nobody -tls /usr/local/nginx/rsa_selfsigned_4096.pem -bind 3081 -sandbox /usr/local/nginx/html -foreground
# # for 4096 TLS tests with PFS (DHE):
# ./rwasa -cpu 1 -runas nobody -tls /usr/local/nginx/rsa_selfsigned_4096.pem -bind 3081 -sandbox /usr/local/nginx/html -foreground

No TLS, localhost, flat out

To illustrate the raw per-CPU requests per second handling of each web server, we used wrk to produce the four charts in the image below. These tests ran each single-core web server at 100% CPU, and since they are localhost, they highlight very effectively what each has to offer in terms of raw request rate handling. We set /proc/sys/net/ipv4/tcp_tw_reuse and /proc/sys/net/ipv4/tcp_tw_recycle to 1 due to the high number of localhost connections per second in the no keep-alive test. The specific commands we issued to produce these results were:

# nginx, no TLS, keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:1080/hello_world.txt
# lighttpd, no TLS, keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:2080/hello_world.txt
# rwasa, no TLS, keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:3080/hello_world.txt

# nginx, no TLS, NO keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 4 -H 'Connection: close' http://127.0.0.1:1080/hello_world.txt
# lighttpd, no TLS, NO keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 4 -H 'Connection: close' http://127.0.0.1:2080/hello_world.txt
# rwasa, no TLS, NO keep-alive, c=256, 1M@12b
# wrk -r 1M -c 256 -t 4 -H 'Connection: close' http://127.0.0.1:3080/hello_world.txt

# nginx, no TLS, keep-alive, c=256, 1M@128kb
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:1080/rand_128kb.bin
# lighttpd, no TLS, keep-alive, c=256, 1M@128kb
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:2080/rand_128kb.bin
# rwasa, no TLS, keep-alive, c=256, 1M@128kb
# wrk -r 1M -c 256 -t 2 http://127.0.0.1:3080/rand_128kb.bin

# nginx, no TLS, keep-alive, c=1024, 1M@12b
# wrk -r 1M -c 1024 -t 2 http://127.0.0.1:1080/hello_world.txt
# lighttpd, no TLS, keep-alive, c=1024, 1M@12b
# wrk -r 1M -c 1024 -t 2 http://127.0.0.1:2080/hello_world.txt
# rwasa, no TLS, keep-alive, c=1024, 1M@12b
# wrk -r 1M -c 1024 -t 2 http://127.0.0.1:3080/hello_world.txt

[Image: four charts, "No TLS, localhost, flat out"]

TLS, RSA 2048 bit keys

Since the vast majority of secure web sites now use 2048 bit keys, these tests highlight the various TLS modes of operation for each of the three test web servers. In a typical TLS session, a web browser first makes an initial full handshake connection to the webserver; if the server supports TLS Session Resumption, all connections after that are resumed using the same encryption keys as the first. Perfect Forward Secrecy (PFS, provided here by DHE cipher suites) requires a separate and more involved full handshake, and as such we tested both types of handshakes with and without TLS Session Resumption. nginx and lighttpd are so similar in the TLS tests because they both make use of OpenSSL for all TLS operations.

Because no other web performance testing tool will perform TLS Session Resumption testing, webslap was used to produce the four charts in the image below. The specific commands we issued to produce these results were:

# nginx, TLS RSA 2048, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 2048, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 2048, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 2048, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 2048, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 2048, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 2048, PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 2048, PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 2048, PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 2048, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 2048, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 2048, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:3081/hello_world.txt

[Image: four charts, "TLS, RSA 2048 bit keys"]

TLS, RSA 4096 bit keys

These are the same tests as in the previous section, only this time with 4096 bit RSA keys and 4096 bit PFS/DHE. For the charts contained in the below image, the specific commands we issued to produce these results were:

# nginx, TLS RSA 4096, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 4096, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 4096, no PFS, no keep-alive, no resume, c=64, 250K@12b
# ./webslap -cpu 2 -n 250000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 4096, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 4096, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 4096, no PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 4096, PFS, no keep-alive, no resume, c=64, 25K@12b
# ./webslap -cpu 2 -n 25000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 4096, PFS, no keep-alive, no resume, c=64, 25K@12b
# ./webslap -cpu 2 -n 25000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 4096, PFS, no keep-alive, no resume, c=64, 25K@12b
# ./webslap -cpu 2 -n 25000 -c 64 -nogz -noetag -nolastmodified -noui -notlsresume -nokeepalive https://127.0.0.1:3081/hello_world.txt

# nginx, TLS RSA 4096, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:1081/hello_world.txt
# lighttpd, TLS RSA 4096, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:2080/hello_world.txt
# rwasa, TLS RSA 4096, PFS, no keep-alive, RESUME, c=512, 1M@12b
# ./webslap -cpu 2 -n 1000000 -c 512 -nogz -noetag -nolastmodified -noui -nokeepalive https://127.0.0.1:3081/hello_world.txt

[Image: four charts, "TLS, RSA 4096 bit keys"]

Test Conclusions

There are many more rwasa features that we could have highlighted here where rwasa would have similarly stood out. For non-TLS testing, rwasa significantly outperforms the others for small static file serving (and thus dynamic content as well), while the others outperform rwasa for large static file serving thanks to the Linux sendfile feature. This reflects our choice to focus on dynamic content (compressed or otherwise) and TLS, neither of which can make use of sendfile.

For the TLS tests, the non-PFS (no DHE) results show that our HeavyThing library's modular exponentiation is about 15% faster than OpenSSL's. For PFS/DHE, however, rwasa is considerably faster. This is due in part to the modular exponentiation speed benefit, but mostly to the fact that we have very tight control over the safe primes and their Sophie Germain prime counterparts used for the Diffie-Hellman key exchange. Our choices for the dh_privatekey_size setting here are stronger than NIST recommends, and OpenSSL's choice to remain very conservative about this is because they do not have the same assurances regarding the prime numbers used for DH. In our opinion, the choice of DH private key size should be made available in the other common web server environments so that DHE can be actively used in stronger encryption environments, as we have done with rwasa. No doubt, web server administrators can also tightly control how their dhparam files are generated and ensure they only contain safe primes, just as we have done with our dh$pool. Even with specific modifications to OpenSSL, however, rwasa still provides faster initial TLS handshaking for PFS.

Hardness and Stability

Every reasonable effort has been undertaken to ensure rwasa is bug-free. We have been running it on the wild-wild interwebs for a decent while in various configurations. Despite our extensive background in these things, we know first-hand that every operation and environment is different. In this way, while rwasa works very well for us, you may encounter problematic situations that we simply have not thought to test. If you do encounter such situations that make rwasa (or you) cry, please contact us and/or the community and let us know so we can fix it up.

TLS Information

This section provides specific details about our HeavyThing library's TLS implementation as it pertains to HTTPS webserving.

BREACH/TIME/etc

Both the BREACH and TIME attacks rely on measuring the size of compressed response bodies. Since rwasa supports dynamic content compression by default, the HeavyThing library's default setting for webserver_breach_mitigation is enabled and set to 48 bytes. For each rwasa response when TLS and gzip are both active, this setting adds an X-NB header to the response headers containing 0..48 random bytes, hex-encoded. While this doesn't render response sizing attacks completely useless, it makes a would-be attacker's job much more difficult due to the highly variable response lengths.
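
To make the idea concrete, here is a minimal Python sketch of a randomized-length padding header. It is an illustration only, not the HeavyThing implementation (which is written in assembly): the function name is hypothetical, and the uniform choice of length is an assumption. Only the header name, the 48 byte default and the hex encoding come from the description above.

```python
import secrets

MAX_PAD_BYTES = 48  # mirrors the documented webserver_breach_mitigation default

def breach_padding_header(max_bytes=MAX_PAD_BYTES):
    """Return an X-NB header line carrying 0..max_bytes random bytes, hex-encoded."""
    # Assumption: the length is drawn uniformly from 0..max_bytes inclusive.
    n = secrets.randbelow(max_bytes + 1)
    return "X-NB: " + secrets.token_bytes(n).hex()
```

Because each byte becomes two hex characters, the header value varies between 0 and 96 characters from one response to the next, which is what blurs the compressed-size signal an attacker is trying to measure.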

TLS Blacklist

To prevent padding oracle and other timing side-channel attacks, the HeavyThing library employs a TLS blacklist feature. When a decryption error occurs, by default the offending source IP address is added to a blacklist such that no further connections will be accepted from it for a full 24 hours. Please note that the blacklist is not shared or synchronized between multiple rwasa processes, so what this really means is that an offending address gets at most one such connection per worker process (the -cpu count) in any 24 hour period. Even if this setting is disabled, special care is taken such that no timing information is leaked during MAC failures. During the course of normal operations, decrypt errors do not (and should not) occur, so if everyone plays nice with rwasa this setting goes largely unnoticed. It is our opinion that this strategy of automatically blacklisting clients who are tampering is an effective server-side mitigation technique.
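
The behaviour described above can be modelled as a per-process table of expiry times. The Python sketch below is a hypothetical model for illustration; the class and method names are invented here, and the real feature lives inside the HeavyThing library's assembly code. Only the 24 hour period and the per-process (unshared) scope are taken from the text.

```python
import time

BLACKLIST_SECONDS = 24 * 60 * 60  # the documented 24 hour period

class TlsBlacklist:
    """One instance per worker process; nothing is shared between processes."""

    def __init__(self, ttl=BLACKLIST_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._expires = {}  # source IP -> time at which the block lapses

    def record_decrypt_error(self, ip):
        """Block ip from now until ttl seconds have passed."""
        self._expires[ip] = self._clock() + self._ttl

    def is_blocked(self, ip):
        """True while ip is blacklisted; lapsed entries are dropped lazily."""
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[ip]
            return False
        return True
```

A worker would call is_blocked when accepting a connection and record_decrypt_error when decryption fails. Since each worker holds its own table, an offender can trip the blacklist once in every worker before being refused everywhere.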

Nessus/Comodo Scans

Thanks entirely to our TLS blacklisting feature, if you run a Nessus scan against a TLS-enabled rwasa installation, the scanner will end up blacklisted when it performs its various TLS vulnerability checks (one or more times). Since Hackerguardian (Comodo's PCI Compliance scanner) uses Nessus, normal PCI compliance scans can produce unpredictable results. This is because the Nessus scanner does not seem to appreciate the way our blacklist hangs up on it. The only way to get reliable results from these scans is to disable the HeavyThing tls_blacklist feature entirely (at least for the duration of your scanning periods). It is unfortunate that, in order to pass these scans predictably and without issue, we have to disable our anti-tampering blacklist.

NOTE: This also applies to the SSL LABS tests, in that if our TLS blacklist is enabled, the results that come back are not accurate.

SSL LABS A- rating

Obviously, our own 2 Ton Digital website is running rwasa. As we understand it, our SSL Labs rating of A- is a direct result of Internet Explorer versions not being able to negotiate Perfect Forward Secrecy with rwasa. Internet Explorer supports DHE, but only with DSA keys. So, to increase our rating, we'd have to replace our RSA key with a DSA key, or go ahead with ECDHE, which we specifically opted out of (for the time being). The decision by Microsoft to support DHE-DSS but not DHE-RSA seems quite strange. They went ahead with ECDHE-RSA, but skipped DHE-RSA entirely despite the code requirements being basically the same for both DHE methods.

When the HeavyThing library was built, and up to the time of this writing, the Wikipedia article on EC states "In the wake of the exposure of Dual_EC_DRBG as 'an NSA undercover operation', cryptography experts have also expressed concern over the security of the NIST recommended elliptic curves, suggesting a return to encryption based on non-elliptic-curve groups." This was taken from comments made by the much-respected Bruce Schneier, and it does not appear to us that he (or anyone) has since retracted them. This was sufficient grounds for us to specifically exclude elliptic curve methods in our current TLS implementation. If there is sufficient community interest, we may come back around on this position. In any case, we don't feel that PFS with Internet Explorer is worth contravening our position on elliptic curve cryptography to bump our rating up from an A-.

NSS/Firefox and DH >2236 bits

By default, rwasa is configured to use 2048 bit Diffie-Hellman parameters. If you choose to increase the size of dh_bits and dh_privatekey_size and recompile rwasa, you may encounter difficulties if you exceed 2236 bits. Old versions of NSS (prior to mid 2012), and therefore of Firefox, which uses NSS, have a known issue whereby the maximum supported DH size is 2236 bits.

Usage and options

# ./rwasa
Usage: rwasa [options...]
Options are:
    -cpu count                  How many processes to start, defaults to 1
    -runas username             Run as username (defaults to nobody, parses /etc/passwd)
    -foreground                 Run in foreground (defaults to background)
    -new                        Start a new webserver configuration object
    -tls pemfile                Specify TLS PEM for next bind option
    -bind [addr:]port           Add a listener on [addr:]port
    -cachecontrol secs          Set static file cache control (default: 300)
    -filestattime secs          Set static file stat time (default: 120)
    -logpath directory          Specify full pathname where to put logs
    -errlog filename            Specify full filename for error logs
    -errsyslog                  Send errors to syslog
    -fastcgi endswith address   Add fastcgi handler (addr:host or /unixpath)
    -fastcgi_starts with addr   Add fastcgi handler (addr:host or /unixpath)
    -backpath address           Add backpath/upstream (addr:host or /unixpath)
    -vhost directory            Add virtual hosting directory (full path)
    -sandbox directory          Add global sandbox directory (full path)
    -hostsandbox host directory Add hostname sandbox directory (full path)
    -indexfiles list            Index files list (comma separated)
    -redirect url               Redirect all requests to url
    -funcmatch endswith         Function map ends with match (default: .asmcall)

Option: -cpu count

Specifies the number of "worker" processes to fire up, defaults to 1. It is tempting to think that more is better here, but that is not necessarily so. The number you choose should depend on the kind of loads you are running, and not necessarily just a simple CPU core count of your webserver machine. This is due to two main factors: 1) Static content compression is not shared between processes, so each independent process maintains its own separate cache of gzipped goods. 2) The TLS session cache is broadcast to all child processes with our design. So, just because you have a 64 core piece of webserver hardware does not necessarily mean you should configure 64 CPUs for rwasa (though you certainly CAN, and in some rare cases it may even be prudent to do that or more still).

Option: -runas username

rwasa is meant to be started as root, and by default switches to the user nobody. Specifying a different user here overrides this behaviour. Note that rwasa parses /etc/passwd to determine the UID and GID of whatever user is specified.

Option: -foreground

By default, rwasa will be very quiet and detach from its controlling terminal without a word. Specifying this option will cause rwasa to display its banner and remain attached to your terminal session.

Option: -new

This option is a configuration "separator", if you will. For single-configuration startups it is unnecessary; specifying it lets you start over, as it were, with a separate and additional rwasa configuration on the same command line.

Option: -tls pemfile

This option expects as its argument a PEM file that must contain a private key, public key (that is, the certificate), and any intermediate certificates, in that order. NOTE: this option MUST appear before the -bind option.
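
If your key, certificate and intermediates are stored as separate PEM files, they need to be joined in the order given above. The following Python sketch shows one way to do that; the function name and all file paths are hypothetical, and it performs no validation of the contents (rwasa does not validate them either, as noted under "Simple with TLS").

```python
import os

def build_combined_pem(key_path, cert_path, intermediate_paths, out_path):
    """Write key, certificate, then intermediates (in that order) to out_path."""
    blocks = []
    for path in [key_path, cert_path, *intermediate_paths]:
        with open(path, "r") as f:
            blocks.append(f.read().strip())
    # The output holds a private key, so create it readable by the owner only.
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as out:
        out.write("\n".join(blocks) + "\n")
```

For example, build_combined_pem("example.key", "example.crt", ["intermediate.crt"], "/root/example.pem") would produce a file suitable for -tls /root/example.pem, assuming those input files exist.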

Option: -bind [addr:]port

Simple as it sounds, bind the current configuration to a port with an optional IP address specified. If the bind fails, rwasa will complain and refuse to start.

Option: -cachecontrol secs

For file-based (sandbox/vhost) serving, rwasa automatically adds Cache-Control, Last-Modified, and ETag headers. This setting determines the max-age value. rwasa sets s-maxage to three times this value. If this value is set to zero, then rwasa will only send Cache-Control: no-cache.
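
The rule above is simple arithmetic, sketched here in Python. The function name is hypothetical, and the exact formatting and ordering of the directives that rwasa emits is an assumption; only the relationship between the values comes from the description.

```python
def cache_control_value(secs):
    """Cache-Control value for a given -cachecontrol setting."""
    if secs == 0:
        return "no-cache"
    # s-maxage is three times the configured max-age.
    return f"max-age={secs}, s-maxage={secs * 3}"
```

With the default of 300 seconds this gives max-age=300 and s-maxage=900, so shared caches may hold a static file for up to 15 minutes while browsers revalidate after 5.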

Option: -filestattime secs

Also for file-based (sandbox/vhost) serving, rwasa does not stat underlying static files for each and every request, and instead does so periodically. This setting determines how frequently rwasa checks for underlying file modifications. For production systems that don't change a lot, higher is better. The upper limit is 900 seconds.

Option: -logpath directory

If specified, this is the directory location where rwasa will dump "normal" webserver access logs. Special care must be taken such that the run-as user has write permissions to this path. rwasa will create files in this directory named access.log.YYYYMMDD. NOTE: log writes are on a 1.5 second interval, so if you are tailing them and it seems "chunky" this is quite by design.

Option: -errlog filename

Similarly, this option specifies a full filename for the error log. Unlike the access logs, only a single error log can be specified. If you are employing FastCGI, stderr from there will also land in this file.

Option: -errsyslog

This option sends error logs to syslog. It can be used instead of, or together with, the prior -errlog option.

Option: -fastcgi endswith address

This option configures rwasa for FastCGI. The endswith argument is precisely that, e.g. .php would redirect all requests that end in .php to be forwarded to the specified FastCGI handler. Multiples of this option are fine. For the address argument, if this begins with a forward slash, it is assumed that the FastCGI handler is a unix socket, otherwise it will assume it is an IP:port combination. See the section below on Unix FastCGI for details as to why you should be using unix sockets if your FastCGI handler is on the same machine as rwasa.
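
The matching and address rules above can be sketched in a few lines of Python. This is an illustration of the documented behaviour rather than rwasa's code: the function names are hypothetical, "first match wins" is an assumption about how multiple -fastcgi options are ordered, and the request path is assumed to have had any query string removed already.

```python
def parse_handler_address(address):
    """Classify a handler address as a unix socket path or an IP:port pair."""
    if address.startswith("/"):
        return ("unix", address)
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected /unixpath or IP:port, got {address!r}")
    return ("tcp", host, int(port))

def pick_fastcgi_handler(request_path, handlers):
    """handlers is a list of (endswith, address) pairs in command-line order."""
    for endswith, address in handlers:
        if request_path.endswith(endswith):
            return parse_handler_address(address)
    return None  # no FastCGI match; normal file serving applies
```

For instance, with handlers [(".php", "/dev/shm/php.sock")], a request for /index.php is routed to the unix socket, while /logo.png is not handled by FastCGI at all.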

Option: -fastcgi_starts with addr

Similar to the -fastcgi option, this enables matching against starts_with instead of endswith. This is useful for environments where PATH_INFO and PATH_TRANSLATED are needed. See the above commentary on -fastcgi as to the argument requirements.

Option: -backpath address

This option configures rwasa for backpath (aka upstream) handling (think: HAProxy). The address supplied must either be an IPv4 address:port, or a full pathname for a unix socket. Note that the same benefits of FastCGI via unix socket exist for backpaths. Also note that if this option is combined with a sandbox, the sandbox will be tested first, and the backpath will only receive requests for files that do not exist in the locally configured sandbox.

Option: -vhost directory

To provide virtual hosting (many domains on one address of course), this option specifies the directory whereby hostnames exist. For example, if /tmp was passed for this option, and then a request arrives for example.com, rwasa will construct the document root to be /tmp/example.com and proceed with normal processing from there.

Option: -sandbox directory

As an alternative to virtual host based serving, rwasa can also ignore the Host header entirely and use the specified sandbox directory as the document root for requests that arrive on the current configuration.

Option: -hostsandbox host directory

Alternatively, you can specify individual hosts, which is perhaps a security enhancement over the blanket directory-based -vhost option approach, but accomplishes similar things.

Option: -indexfiles list

Just as it sounds, a comma-separated list of index filenames, e.g. index.php,index.html.

Option: -redirect url

A brutish option that overrides all other configuration directives (if they are specified), and does a 302 Redirect for any and all requests that arrive on the current configuration. Must obviously be a fully qualified URL.

Option: -funcmatch endswith

Similar to the -fastcgi directive, this option directs all incoming requests that match the endswith argument to the default assembly language function hook included in rwasa.

Quick start hints

Simple

In its simplest form, all rwasa needs is a bind and a sandbox, and this will start rwasa as user nobody:

# ./rwasa -bind 80 -sandbox /var/www/html

Assuming you had started a PHP FastCGI server like: PHP_FCGI_CHILDREN=20 php-cgi -e -b /dev/shm/php.sock, then you could add to the first:

# ./rwasa -bind 80 -sandbox /var/www/html -fastcgi .php /dev/shm/php.sock

Simple with logs

Assuming you created a log directory like: mkdir /var/log/rwasa && chown nobody:nobody /var/log/rwasa, then you could add to that:

# ./rwasa -bind 80 -sandbox /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa

Simple with TLS

NOTE: rwasa is not a TLS validation tool, far from it. It is assumed that you already know beforehand your private key, certificate and intermediates are good. So, to replicate our last test, but toss TLS into the mix (noting that rwasa reads the certificates before it changes privilege level):

# ./rwasa -tls /root/example.pem -bind 443 -sandbox /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa

Virtual hosts

If we abandon our global sandbox and want to do virtual hosting, say: for i in {1..5}; do mkdir -v /var/www/html/example$i.com; echo "Heya" > /var/www/html/example$i.com/index.html; done, then we could say:

# ./rwasa -bind 80 -vhost /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa

Virtual hosts with a "catch-all" directory

In addition to the above Virtual hosts example, -vhost can be combined with the -sandbox option to provide fallthrough behaviour. The command below uses the per-host document roots created above for valid Host headers, and serves IP-based requests and requests with a non-matching Host header from the catchall subdirectory:

# ./rwasa -bind 80 -vhost /var/www/html -sandbox /var/www/html/catchall -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa
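
To predict which directory a given Host header should land in before testing against a live server, the fallthrough rule described above can be sketched in a few lines of shell. This is only our reading of the documented behaviour, not rwasa's own code; the helper name resolve_docroot is hypothetical, and it assumes the Host value is passed without a port suffix.

```shell
# Hypothetical helper (not part of rwasa): mirrors the documented
# -vhost / -sandbox fallthrough so a Host header can be checked by hand.
resolve_docroot() {
    vhost_dir=$1   # directory given to -vhost
    catchall=$2    # directory given to -sandbox
    host=$3        # Host header value, may be empty for IP-based requests
    if [ -n "$host" ] && [ -d "$vhost_dir/$host" ]; then
        # A subdirectory named after the host exists: use it.
        printf '%s\n' "$vhost_dir/$host"
    else
        # Missing or non-matching Host header: fall through to the sandbox.
        printf '%s\n' "$catchall"
    fi
}

# Example call, using the paths from the command above:
#   resolve_docroot /var/www/html /var/www/html/catchall example3.com
```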

Multiple binds

To combine our port 80 example with our TLS example:

# ./rwasa -bind 80 -sandbox /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa -new -tls /root/example.pem -bind 443 -sandbox /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa
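
Since both listeners repeat the same sandbox, FastCGI and log options, it can be less error-prone to assemble the command line from a shared fragment. The following is a minimal sketch that only prints the resulting command for review; the variable names are ours, and the unquoted expansion relies on none of the paths containing spaces or glob characters.

```shell
# Options shared by both listeners, copied from the examples above.
common='-sandbox /var/www/html -fastcgi .php /dev/shm/php.sock -logpath /var/log/rwasa'

# $common is intentionally unquoted so that it splits into separate
# arguments; this is only safe because the paths contain no spaces or globs.
set -- -bind 80 $common -new -tls /root/example.pem -bind 443 $common

# Print the command instead of running it, so it can be checked first.
rwasa_cmd="./rwasa $*"
printf '%s\n' "$rwasa_cmd"
```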

Troubleshooting

It is assumed that only highly experienced web/system administrators will be using rwasa. By design, rwasa is not verbose about any administrative error reporting. This section aims to provide simple ways to troubleshoot various common situations where rwasa misbehaves as a result of configuration issues.

Let's Encrypt Keys/Certificates

Let's Encrypt keys and certificates don't come in the normal (old) RSA format, and as a result some special handling is needed until we modify our private key parsing to accept them. In the meantime:

# openssl rsa -in privkey.pem -out privkey_rsa.pem
# # and then cat the privkey_rsa.pem and the fullchain.pem together for an rwasa-compatible .pem file
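
The concatenation step can be wrapped in a small helper so that the combined file is not left world-readable. This is a sketch under our own assumptions: the function name combine_pem and the output filename are hypothetical, and placing the key before the chain simply follows the order in which the two files are named above. As far as we know, OpenSSL 3.x writes PKCS#8 from openssl rsa unless -traditional is added, so the conversion command above may need that flag on newer systems.

```shell
# Hypothetical helper: concatenate the converted RSA key and the
# certificate chain into a single PEM file, key first.
combine_pem() {
    key=$1; chain=$2; out=$3
    # Refuse to write anything if either input is missing or unreadable.
    [ -r "$key" ] && [ -r "$chain" ] || { echo "missing input file" >&2; return 1; }
    # A restrictive umask in a subshell makes a newly created output
    # file readable by its owner only.
    ( umask 077 && cat "$key" "$chain" > "$out" )
}

# Example call, using the filenames from the commands above:
#   combine_pem privkey_rsa.pem fullchain.pem example.pem
```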

TLS issues

TLS issues are usually related to PEM file issues. Thanks to lighttpd making use of PEM files in the same manner as rwasa, you may find it helpful to first get lighttpd to be happy with your TLS configuration, and then move to rwasa. This is because of the level of verbosity provided by OpenSSL that we didn't include with rwasa.

OCSP Stapling issues

OCSP Stapling can be difficult to get right. By default, rwasa logs its OCSP handling via syslog, so that it is easy to determine what precisely is going on. It also makes use of /etc/resolv.conf in order to perform DNS queries to OCSP servers. It is assumed that /etc/ssl/certs contains a valid (and current) CA store, noting that not all linux distributions appear to treat this path the same. If you have verified that your certificate chains are in order and that DNS resolution works, but you still don't receive syslog messages regarding OCSP, see the "All other issues" section below for identifying the culprit.

NOTE: Until RFC 6961 gets adopted everywhere, even though rwasa will go ahead and acquire OCSP responses for multiple certificate chains, it will only send the first one. Insofar as the purpose and intent behind OCSP Stapling, this seems to work well for us. Once multiple certificate stapling is adopted, we'll move to adding both options. The simple short-term solution, as noted by others, is to acquire a certificate that doesn't require intermediates, or deal with only the first one.

All other issues

Due to the lack of verbosity with rwasa for configuration issues, often the easiest way to locate errors is with strace. In an isolated environment, if you start rwasa with only a single cpu and force it to remain in the foreground, then you can use strace -f ./rwasa [your config options] and usually locate configuration problems without much difficulty. When in doubt, make sure your basic configuration/environment works well with other more verbose webservers first if rwasa gives you grief.
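
Because strace output is voluminous, it usually helps to save it to a file and look only at the calls that failed. The helper below is a sketch of that filtering step; the function name failed_syscalls and the trace path are hypothetical. It hides EAGAIN and EINPROGRESS results, which are routine for nonblocking sockets and rarely point at a configuration problem.

```shell
# Hypothetical helper: show syscalls that failed in a saved strace log,
# leaving out EAGAIN/EINPROGRESS, which nonblocking I/O produces routinely.
failed_syscalls() {
    grep -E ' = -1 E[A-Z]+' "$1" | grep -Ev ' = -1 (EAGAIN|EINPROGRESS) '
}

# Typical use (the trace path is only an example):
#   strace -f -o /tmp/rwasa.trace ./rwasa [your config options]
#   failed_syscalls /tmp/rwasa.trace
```

Results such as ENOENT or EACCES on a path you passed on the command line are usually the quickest pointer to a wrong directory, socket or PEM location.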

Unix socket FastCGI

Due to the fact that most FastCGI/PHP server environments are tightly coupled with their supporting webserver software, they normally operate on the same physical machine. Often, administrators configure their FastCGI listeners on localhost via IPv4 as opposed to AF_UNIX. This is because under high load and/or peak demand scenarios, a nonblocking connect() to an AF_UNIX socket will return EAGAIN (rather than the EINPROGRESS that nonblocking webservers expect from a TCP connect). Typically, this EAGAIN condition results in the webserver returning a 502 Bad Gateway response to the end-users. If the same server is configured to use IPv4 and localhost however, that same load does not cause an error condition for the webserver (and the connection is queued normally). NOTE: Unless of course you are on a very powerful single-system-image multicore machine, and you run out of localhost ports. If an administrator chose localhost over AF_UNIX for one service, likely the same choice was made for services OTHER than FastCGI.

A deeper investigation as to why AF_UNIX sockets are avoided in most configurations revealed that the maximum number of connect attempts to an AF_UNIX socket is limited by two factors: 1) /proc/sys/net/core/somaxconn, and 2) the listening process' listen() backlog parameter. It is unclear from the documentation whether the call to listen() has any effect on AF_UNIX sockets, but it is definitely so for the /proc/sys/net/core/somaxconn system setting.
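
To see where a given machine stands, the current somaxconn value can be read directly and compared against the backlog a FastCGI process requests. The sketch below assumes the rule from the listen(2) man page, that a backlog larger than somaxconn is silently capped to it, also holds for AF_UNIX listeners, which the paragraph above leaves open. The function name effective_backlog and the sample backlog of 1024 are ours.

```shell
# Effective queue limit under the listen(2) rule: the requested backlog,
# capped at the system-wide somaxconn value.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -gt "$somaxconn" ]; then
        printf '%s\n' "$somaxconn"
    else
        printf '%s\n' "$requested"
    fi
}

# Report the live limit where the proc file is available (linux only).
if [ -r /proc/sys/net/core/somaxconn ]; then
    somaxconn_now=$(cat /proc/sys/net/core/somaxconn)
    printf 'somaxconn=%s, effective backlog for a requested 1024: %s\n' \
        "$somaxconn_now" "$(effective_backlog 1024 "$somaxconn_now")"
fi
```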

In a normal high availability webserver environment, any condition that raises a 502 Bad Gateway should really be a bad gateway, and not a full backlog for local connects. While many administrators set /proc/sys/net/core/somaxconn to a very high (improbably so) number and then modify their FastCGI process' arguments to listen(), we do not feel this is the proper solution to the problem.

When rwasa hits this backlog ceiling under extreme load, the linux kernel returns an EAGAIN condition to us (as is the case with any nonblocking webserver). We carefully considered the language of the EAGAIN return, and decided to include the HeavyThing library setting epoll_unixconnect_forgiving, which is enabled by default. This has the pleasant side-effect that FastCGI calls from rwasa to AF_UNIX sockets will not return 502 Bad Gateway even if the backlog is full. Instead, the HeavyThing library manages its own pending connect queue and waits as it should (and thus does what the documentation suggests for receiving EAGAIN from connect()).

We are not suggesting that /proc/sys/net/core/somaxconn does not play a role; certainly it does, and it should be set according to your load and operating environment. What we are suggesting is that it need not be set to some insane value, and that rwasa will manage peak demands without returning errors to your user base. Of course, if your FastCGI handler actually does get "stuck", then this behaviour may not be desirable, but high availability FastCGI webserver environments are commonplace.

Assembly language function hook

In addition to being a full-featured webserver and a showcase piece for our HeavyThing library, rwasa has been designed as a template for quickly building web application servers in x86_64 assembly language. To this end, rwasa by default includes a function hook whereby all requests that arrive that end with .asmcall get directed to this function. This can be seen on our own rwasa webserver here. The function itself is named asmcall and lives in rwasa.asm, and returns a simple dynamic response containing the original request URL. See the HeavyThing page for details on recompiling rwasa. For production environments using rwasa as-is, it is recommended that the command line option -funcmatch be utilised to change the default away from .asmcall, though the function provided in rwasa is harmless.

The HeavyThing library's webserver architecture, which is not specific to rwasa, provides a webserver object that in itself is an epoll listener. For every inbound connection, a new webserver object is created. Multiple requests may occur for any given webserver object. This section here is intentionally oversimplified, and you are encouraged to peruse the code itself for a deeper understanding of how the functionality all comes together. We'll start with the function hook code itself, and follow that with more descriptive information after:

	; this is our main function call hook, as defined by _start.hookthemall
	; it is called by the webserver layer with:
	; rdi == webserver object, rsi == request url, rdx == mimelike request object
	; per the webserver layer requirements, we must return one of:
	; null: webserver will respond with a 404 automatically.
	; -1 == webserver will sit there and do absolutely nothing
	; or anything else is a properly formed mimelike response object (including
	; preface line)
	;
	; for our demonstration purposes, we'll construct a simple text/plain return
falign
asmcall:
	prolog	asmcall
	push	rbx r12
	; build a dynamic text reply first up
	mov	rbx, rsi
	call	buffer$new
	mov	rdi, rax
	mov	rsi, .stringpreface
	mov	r12, rax
	call	buffer$append_string
	mov	rdi, rbx
	call	url$tostring
	mov	rbx, rax
	mov	rdi, r12
	mov	rsi, rax
	call	buffer$append_string
	mov	rdi, rbx
	call	heap$free
	mov	rdi, r12
	mov	rsi, .stringreply
	call	buffer$append_string

	; construct our return object
	call	mimelike$new
	; set the http preface
	mov	rbx, rax
	mov	rdi, rax
	mov	rsi, .httppreface
	call	mimelike$setpreface
	; set our content type
	mov	rdi, rbx
	mov	rsi, mimelike$contenttype
	mov	rdx, mimelike$textplain
	call	mimelike$setheader
	; set our body to the UTF8 of our string
	mov	rdi, rbx
	mov	rsi, [r12+buffer_itself_ofs]
	mov	rdx, [r12+buffer_length_ofs]
	call	mimelike$setbody
	; free our working buffer
	mov	rdi, r12
	call	buffer$destroy
	; return our mimelike response
	mov	rax, rbx
	pop	r12 rbx
	epilog
cleartext .stringpreface, 'Welcome to rwasa!',13,10,'URL: '
cleartext .stringreply, 13,10,'This is a native assembler function call hook.',13,10,13,10,'See https://2ton.com.au/rwasa for more information/documentation.',13,10
cleartext .httppreface, 'HTTP/1.1 200 rwasa reporting for duty'

The first thing to point out is that for such a simple example, the HeavyThing library tools we have used are a bit overkill. For demonstration purposes, however, they serve well. The comments at the top describe what arguments the function receives and what its possible return values are. Passed in rdi is the client connection webserver object, in rsi the url object of the request itself, and in rdx the mimelike object of the request, which includes headers, the POST body if present, etc.

The function hook needn't worry about the communications layer, or any of the other required and/or standard HTTP headers, only the basics such that the webserver layer can take over from there. The mimelike object, which provides both MIME and HTTP parsing and composition capabilities, serves both our request and return values throughout the webserver layer. Depending on the request itself, the Content-Type that the function hook returns, and the body length, the webserver layer will automatically gzip the outbound contents without any action inside our function call hook.

While this rwasa page isn't intended to be a programming guide or reference to the HeavyThing library itself, we hope it provides a decent introduction to both the method and difficulty level of writing assembly language applications using our HeavyThing library. Perusing the code from the function hook backward through rwasa and the library itself is made much easier with a starting reference point.