Recently I got the idea of not only securing connections from my servers to the outside world, but also securing all connections between them on the internal network. It is unlikely that somebody could get access to it - one would need physical access, or access to a server in the internal network, because the servers only communicate with each other via 192.168.x.x IPs. Yet it could happen, and if that network traffic were intercepted, all the data could be in jeopardy.

These are the three connections on my internal network:

  • Nginx calls PHP-FPM, via FastCGI
  • PHP-FPM calls Elasticsearch, via HTTP
  • PHP-FPM calls MariaDB, via TCP

Four applications are involved, with three different protocols. It took me two days to figure all of this out.

Certificates

All of this encryption entails certificates, and if you want to secure connections between servers you need quite a lot of them. The easiest and best way to handle this is to create your own CA (Certificate Authority) as a basis of trust, and to create new certificates which are then signed by the CA certificate.

All the servers involved then know to trust only certificates signed by this custom CA, which nobody except you has access to.

This is the procedure to create your root CA key and certificate:

  1. You will need openssl installed. All of this was tested on Debian Linux, using openssl in a terminal.
  2. Go to a directory where you want to store the certificates.
  3. Type the following to create the key and certificate (enter any information you want when executing the second command):

    openssl genrsa 4096 > ca.key
    openssl req -new -x509 -days 15000 -key ca.key -out ca.crt
    

Each time you need a new key and certificate pair to use in the examples below, create them like this:

openssl req -new -nodes -sha512 -keyout server.php.key -out server.php.csr -newkey rsa:4096
openssl x509 -req -extfile <(printf "subjectAltName=DNS:php_cluster_http") -days 15000 -CA ca.crt -CAkey ca.key -CAcreateserial -in server.php.csr > server.php.crt

This example creates the PHP server certificate. For both the “Common Name” and the “subjectAltName” you should use the internally used domain name (if HTTP is the protocol, otherwise it does not matter).
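
To double-check that a freshly signed certificate is actually accepted by the CA (and that the subjectAltName made it in), something like this should work:

openssl verify -CAfile ca.crt server.php.crt
openssl x509 -in server.php.crt -noout -text | grep -A1 "Subject Alternative Name"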

PHP / FastCGI

PHP-FPM speaks only FastCGI - it does not know any other protocol. When calling PHP-FPM from Nginx I had an upstream group with multiple server IPs. Nginx would connect to FastCGI on one of the web servers, and if one went down, another would take over.

FastCGI does not support encryption - from a 2018 perspective it is a weird protocol anyway, almost like a relic of the past. Yet as long as PHP-FPM supports nothing else, we are stuck with it.

In blocks where I called PHP via FastCGI, my Nginx configuration looked something like this:

location @php {
    include        /etc/nginx/fastcgi_params.conf;
    fastcgi_pass   php_cluster;
    fastcgi_param  HTTPS on;
    fastcgi_param  SCRIPT_NAME  $document_root/app.php;
    fastcgi_param  SCRIPT_FILENAME  $document_root/app.php;
    fastcgi_param  SYMFONY_ENV example_com;
    fastcgi_send_timeout 1800;
    fastcgi_read_timeout 1800;
}
 
upstream php_cluster {
    server 192.168.5.2:9000;
    server 192.168.5.3:9000 backup;
}

PHP-FPM ran on port 9000, and the backup server (192.168.5.3) was only used if 192.168.5.2 was down.

Because I could not add encryption to FastCGI itself, I decided to instead proxy the requests between the different Nginx instances via HTTPS (letting Nginx handle all between-server connections) and to restrict all FastCGI connections to 127.0.0.1 (so FastCGI traffic never goes onto the network). To avoid too much SSL overhead the connections are reused. The configuration on the first Nginx server then looks like this:

location @php {
    # Set FastCGI variables for receiving PHP proxy
    proxy_set_header X-Var-FCGI-Pass php_local;
    proxy_set_header X-Var-FCGI-Documentroot $document_root;
    proxy_set_header X-Var-FCGI-Symfony-Env example_com;
    proxy_set_header X-Var-FCGI-IP $remote_addr;
    
    ### The following can be put in a separate 
    ### file and included in each PHP location block
   
    # Pass request to our PHP http cluster
    proxy_pass https://php_cluster_http;
    
    # SSL verification
    proxy_ssl_protocols           TLSv1.2;
    proxy_ssl_certificate         /etc/nginx/php.client.crt;
    proxy_ssl_certificate_key     /etc/nginx/php.client.key;
    proxy_ssl_ciphers             HIGH:!aNULL:!MD5;
    proxy_ssl_trusted_certificate /etc/nginx/ca.crt;
    proxy_ssl_verify       on;
    proxy_ssl_verify_depth 2;
    proxy_ssl_session_reuse on;
    
    # HTTP connection and headers - make sure it supports keepalive
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    
    # Connection settings
    proxy_connect_timeout 1s;
    
    # Save host in a different variable, because Nginx
    # overwrites it when proxying
    proxy_set_header X-Var-Host $http_host;
}
 
# Cluster of PHP servers
upstream php_cluster_http {
    server 192.168.5.2:7000;
    server 192.168.5.3:7000 backup;
  
    # Keep the connections alive to avoid overhead
    keepalive 100;
}

For all servers which handle these PHP requests we add the following configuration:

server {
    listen 192.168.5.2:7000 ssl;
    listen 192.168.5.3:7000 ssl;
  
    # https server name equal to the proxy name
    server_name php_cluster_http;
  
    # Valid certificate for php_cluster_http common name
    ssl_certificate        /etc/nginx/php.server.crt;
    ssl_certificate_key    /etc/nginx/php.server.key;
    ssl_client_certificate /etc/nginx/ca.crt;
    ssl_verify_client      on;
  
    # Send everything to fastcgi local server
    location / {
        # Set internal variables for PHP
        include        /etc/nginx/fastcgi_params.conf;
        fastcgi_pass   $http_x_var_fcgi_pass;
        fastcgi_param  HTTPS on;
        fastcgi_param  SCRIPT_NAME  $http_x_var_fcgi_documentroot/app.php;
        fastcgi_param  SCRIPT_FILENAME  $http_x_var_fcgi_documentroot/app.php;
        fastcgi_param  SYMFONY_ENV $http_x_var_fcgi_symfony_env;
        fastcgi_param  HTTP_HOST $http_x_var_host;
        fastcgi_param  REMOTE_ADDR $http_x_var_fcgi_ip;
        fastcgi_send_timeout 600;
        fastcgi_read_timeout 600;
    }
}
 
upstream php_local {
  server 127.0.0.1:9000;
}

As you can see we use two sets of certificates - a server certificate (php.server.crt) with the Common Name “php_cluster_http”, and a client certificate (php.client.crt) which can be any certificate signed by our root certificate.

The client checks the Common Name of the server certificate (it must equal “php_cluster_http”) and that it was signed by our CA, while the server only checks that the client certificate was signed by our CA.

The connection only succeeds if both client and server have valid certificates. That way nobody can connect to either one of them (or play man-in-the-middle) without the connection failing.

Because the HTTP connections use keepalive, the overhead of the encryption and proxying should be relatively small. On my servers I could not detect a difference in speed when measuring regular website requests - I suspect it is between 1 and 5 milliseconds at most.

In PHP-FPM the only thing that needs to be changed is the “listen” directive in the pool configuration, so PHP-FPM listens only on 127.0.0.1:9000.
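
On Debian the pool file is something like /etc/php/7.x/fpm/pool.d/www.conf (the exact path depends on your PHP version and pool name), and the relevant part then looks like this:

; Listen only on localhost so FastCGI traffic never leaves the machine
listen = 127.0.0.1:9000

; Optionally also restrict the allowed FastCGI clients
listen.allowed_clients = 127.0.0.1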

Elasticsearch

With Elasticsearch 5 a plugin called “X-Pack” was introduced which now handles all security aspects (and many other things) for Elasticsearch. With it you can add a certificate for both inter-node communication and for client access.

This is what I ended up adding to my elasticsearch.yml configuration:

# Disable most parts of X-Pack - we are only interested in SSL
xpack.ml.enabled: false
xpack.monitoring.enabled: false
xpack.watcher.enabled: false

# Enable security, which also has SSL in it
xpack.security.enabled: true

# Allow anonymous access with all privileges - the
# authorization is already guaranteed with our custom
# CA certificates which are validated on both sides
xpack.security.authc:
  anonymous:
    roles: superuser

# No auth tokens
xpack.security.authc.token.enabled: false

# SSL settings and certificates
xpack.ssl.verification_mode: certificate
xpack.ssl.supported_protocols: TLSv1.2
xpack.ssl.key:                     /etc/elasticsearch/x-pack/es.server.key
xpack.ssl.certificate:             /etc/elasticsearch/x-pack/es.server.crt
xpack.ssl.certificate_authorities: [ "/etc/elasticsearch/x-pack/ca.crt" ]

# Enable SSL between nodes
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate

# Enable/Force SSL with clients
xpack.security.http.ssl.enabled: true
xpack.ssl.client_authentication: required
xpack.security.http.ssl.client_authentication: required

Some surprising things which cost me quite a bit of time:

  • Anonymous users have no access to anything by default as soon as you enable X-Pack security. I think this is an overreaction of the Elasticsearch team to compensate for the rather relaxed security defaults of the past, because in my example I am already using a mandatory client certificate which is checked. Authentication is therefore provided by that certificate, and an additional layer of role-based authorization would be overkill.
  • All the certificates have to be in /etc/elasticsearch or a subdirectory of it. No symbolic links are allowed (which seems like a bug to me), so you really have to copy your certificates there for real, otherwise Elasticsearch will not start.
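
For reference, copying the certificates into place can look like this (the ownership and permissions are assumptions - adjust them to how your Elasticsearch user is set up):

mkdir -p /etc/elasticsearch/x-pack
cp ca.crt es.server.crt es.server.key /etc/elasticsearch/x-pack/
chown -R root:elasticsearch /etc/elasticsearch/x-pack
chmod 640 /etc/elasticsearch/x-pack/*.crt /etc/elasticsearch/x-pack/*.key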

So far so good. Being a client in this setup is quite similar to the PHP example from before: you need to connect via HTTPS and present a certificate signed by our CA.

It would be possible to connect via HTTPS directly from PHP, but performance would suffer a lot: the SSL connections would not be reused between different requests, so every page request would create its own SSL connection, with all the crypto overhead that comes with it. That is a disadvantage of the shared-nothing architecture of PHP-FPM - connection reuse is not possible.

Nginx comes to the rescue for this problem too. Instead of sending HTTPS requests directly to Elasticsearch, we can send them to our local Nginx instance via HTTP, and Nginx can proxy the HTTPS requests - and also reuse these HTTPS connections, so the HTTPS overhead should be negligible.

In Nginx, the local proxy looks something like this:

# Elasticsearch relay
server {
    listen 127.0.0.1:999;

    location / {
        # Pass to elasticsearch cluster
        proxy_pass https://elasticsearch;
  
        # SSL verification
        proxy_ssl_protocols           TLSv1.2;
        proxy_ssl_certificate         /etc/nginx/php.client.crt;
        proxy_ssl_certificate_key     /etc/nginx/php.client.key;
        proxy_ssl_ciphers             HIGH:!aNULL:!MD5;
        proxy_ssl_trusted_certificate /etc/nginx/ca.crt;
        proxy_ssl_verify       on;
        proxy_ssl_verify_depth 2;
        proxy_ssl_session_reuse on;
  
        # HTTP connection and headers - make sure it supports keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
  
        # Connection settings
        proxy_connect_timeout 1s;
    }
}

# Elasticsearch reverse proxy
upstream elasticsearch {
    server 192.168.5.5:9434;
    server 192.168.5.6:9434 backup;
    
    # Up to 50 keep-alive connections
    keepalive 50;
}

The local proxy connects to https://elasticsearch via one of the elasticsearch upstream IPs and verifies that the Elasticsearch certificate has a Common Name of “elasticsearch” and that it is signed by our CA. It also presents its own certificate signed by our CA - in this case I just reused the client certificate from the PHP example. It does not matter which certificate we use, as long as it is signed by our custom CA.

Within our PHP application we now connect to http://127.0.0.1:999 and talk to the Nginx proxy exactly as we previously talked to Elasticsearch - the PHP application does not notice the encryption, because it is all handled between Nginx and Elasticsearch.
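
With the official elasticsearch-php client (assuming the elasticsearch/elasticsearch Composer package), for example, only the host entry changes - the index name and query below are placeholders:

<?php
// Assumes the official elasticsearch/elasticsearch Composer package
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Point the client at the local Nginx relay instead of the Elasticsearch nodes
$client = ClientBuilder::create()
    ->setHosts(['http://127.0.0.1:999'])
    ->build();

// Example request - index name and query are placeholders
$result = $client->search([
    'index' => 'articles',
    'body'  => ['query' => ['match_all' => new \stdClass()]],
]);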

MariaDB (also applicable to MySQL)

After all this I felt confident to also tackle MariaDB - only the TCP connections between PHP and MariaDB remained unencrypted. Usually the database is especially security relevant (sensitive data, access credentials for large amounts of data), so securing all database connections seemed even more worthwhile compared to PHP-FPM and Elasticsearch.

MariaDB does support SSL connections which I have been using for off-site replication for a few years now. Yet connecting to MariaDB via SSL had the same major drawback we encountered with Elasticsearch: Each PHP process would create its own SSL connection to MariaDB, which would be super slow, and all the crypto overhead would not be reused across different page requests.

Even worse, SSL support in MariaDB & MySQL is still not too stellar. There is no global option in the MariaDB/MySQL configuration to force every client to present a certificate signed by a specific CA (client certificates can only be required per user account, via REQUIRE X509 or REQUIRE SUBJECT), so client validation at the connection level is not possible.

This might seem like overkill, as you usually also need a valid username and password to get access to MariaDB, but forcing a client to have a certificate signed by a custom CA would make it impossible to even try out a username-password combination on the MariaDB server. It would stop any direct attack or man-in-the-middle attempt from the get-go. After implementing it that way for Elasticsearch it seems weird that the two major databases offer nothing comparable at the connection level, and that there are seemingly no plans for such a feature.

Nginx comes to the rescue again - since version 1.9.0 Nginx has a stream module. In addition to having an http { } section in your Nginx config which handles HTTP traffic, you can have a stream section which handles TCP streams. MariaDB connections are a TCP stream, so Nginx can proxy these TCP connections!

A drawback of this method is that Nginx has to run on the database server too - I didn’t need Nginx there before, so this is an additional service running on the database server just to proxy these connections. Because Nginx is lightweight and easy to configure this still seemed like a good option.

This is the stream section config for Nginx on the MariaDB server:

stream {
    # MariaDB/MySQL relay
    server {
        listen 3307 ssl;

        # Pass to mysql cluster
        proxy_pass mysql_local;

        # Connection settings
        proxy_connect_timeout 1s;

        # SSL configuration - use server certificate & key
        ssl_certificate         /etc/nginx/mysql.server.crt;
        ssl_certificate_key     /etc/nginx/mysql.server.key;
        ssl_protocols           TLSv1.2;
        ssl_ciphers             "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256";
        ssl_prefer_server_ciphers on;
        ssl_client_certificate  /etc/nginx/ca.crt;
        ssl_verify_client       on;
        ssl_session_tickets     off;
        ssl_session_cache       shared:SSL:100m;
        ssl_session_timeout     600m;
        ssl_handshake_timeout   5s;
    }
    
    # Database reverse proxy, MariaDB listening on port 3306
    upstream mysql_local {
        server 127.0.0.1:3306;
    }
}

Nginx listens on port 3307 for TCP connections. It has its own certificate with a Common Name of “mysql”, and it makes sure the connecting client has a valid certificate signed by our CA, so a connection is only established if both parties present certificates from our CA. We can also enforce TLS 1.2 and secure ciphers.

Quite important here is the ssl_session_cache - I have not tested the caching aspect thoroughly yet, but because we are establishing plain TCP connections we are missing all the nice HTTP features we relied on with PHP-FPM and Elasticsearch, like HTTP keepalive. With the SSL session cache we should at least be able to reuse SSL sessions and avoid a full handshake for every new connection.

The client has the following Nginx stream configuration:

stream {
    # MariaDB/MySQL relay
    server {
        listen 127.0.0.1:998;

        # Pass to mysql cluster
        proxy_pass mysql;

        # Connection settings
        proxy_connect_timeout 1s;

        # SSL configuration - use server certificate & key
        proxy_ssl  on;
        proxy_ssl_certificate         /etc/nginx/mysql.server.crt;
        proxy_ssl_certificate_key     /etc/nginx/mysql.server.key;
        proxy_ssl_protocols           TLSv1.2;
        proxy_ssl_ciphers             "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256";
        proxy_ssl_trusted_certificate /etc/nginx/ca.crt;
        proxy_ssl_verify              on;
        proxy_ssl_verify_depth        2;
        proxy_ssl_session_reuse       on;
    }
    
    # Database reverse proxy
    upstream mysql {
        server 192.168.5.5:3307;
    }
}

We are reusing the server certificate for the client too - one could also use a different certificate, as long as it is signed by our CA.

Our PHP application in PHP-FPM can now connect to 127.0.0.1:998 without worrying about encryption, because Nginx handles the encryption between client and server, so both PHP and MariaDB remain oblivious to it.
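
In PHP this is just a matter of pointing the PDO DSN at the local relay - the database name and credentials below are placeholders:

<?php
// Connect to the local Nginx stream relay instead of the remote MariaDB server;
// Nginx forwards this plain TCP connection over TLS to the database host.
$pdo = new PDO(
    'mysql:host=127.0.0.1;port=998;dbname=example_db;charset=utf8mb4',
    'example_user',
    'example_password',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);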

On our MariaDB server, we can also change the host name restrictions for our database users. I had 192.168.5.% as a restriction there for most users, and thanks to our Nginx proxy I was able to change it to 127.0.0.1. This makes the attack surface of MariaDB itself a lot smaller, and Nginx becomes the new barrier of entry for logging in to our MariaDB server.
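
Changing the allowed host of an existing account can be done with a simple RENAME USER statement (the user name is a placeholder):

-- The account keeps its password and privileges, only the allowed host changes
RENAME USER 'app_user'@'192.168.5.%' TO 'app_user'@'127.0.0.1';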

Problems with our TCP proxy

There was one new problem with the connections to MariaDB from PHP. While the HTTP proxies (used for PHP-FPM and Elasticsearch) have built-in HTTP mechanisms for reconnecting when a keepalive connection somehow goes away, TCP connections have no automatic reconnect or similar measures. So if a proxied TCP connection becomes unavailable (for example because of an Nginx restart on either the client or the server, or because the connection was closed for other reasons) and the application thinks the connection still exists, there will be a “MySQL server has gone away” error.

Because TCP is much more low-level than HTTP, the application has to devise its own strategies for these circumstances. In my case I changed my Database class in PHP so that PDO exceptions with the error message “MySQL server has gone away” are detected and a reconnect is attempted multiple times with increasing wait times in between; after reconnecting to the server the SQL query is sent again. This works very well and makes the application more resilient in general - because a temporary unavailability of a database server could happen anyway, and you might not want to display an error message prematurely or interrupt a long-running cron job.
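
A minimal sketch of that retry logic (the function name, retry limit and $reconnect callable are made up for illustration, and it assumes PDO::ERRMODE_EXCEPTION as in the connection example above) could look like this:

<?php
// Sketch: run a query and retry with a fresh connection if the proxied
// TCP connection has silently gone away ("MySQL server has gone away").
function runQueryWithReconnect(PDO &$pdo, callable $reconnect, string $sql, array $params = [], int $maxRetries = 5): PDOStatement
{
    for ($attempt = 0; ; $attempt++) {
        try {
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            return $stmt;
        } catch (PDOException $e) {
            $gone = stripos($e->getMessage(), 'server has gone away') !== false;
            if (!$gone || $attempt >= $maxRetries) {
                throw $e; // not a lost connection, or we finally give up
            }
            sleep($attempt + 1);   // increasing wait time between attempts
            $pdo = $reconnect();   // $reconnect() returns a fresh PDO instance
        }
    }
}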

This becomes more complex if you use transactions, because losing the server connection also means the transaction has to be restarted from scratch, so you will need a solution in your application for that. It might still be worth it to implement these safeguards just to have a very resilient application - so even if you restart your MySQL server there are no half-finished transactions which leave your application in an undefined state. Often applications probably rely a bit too much on a “perfect” database connection which is never interrupted or unavailable.

Adding other services

Because I only use PHP-FPM, MariaDB and Elasticsearch, I only implemented solutions for these services. But the examples above cover all the avenues needed to extend these concepts to other services:

  • Redis works with TCP connections, so accessing Redis via SSL (if it is not on the same server) could be solved in a similar way to our working MariaDB example - see the sketch below. The same TCP proxy problems could occur here as with MariaDB, meaning we would need to handle disconnects or timeouts gracefully within our applications and reconnect if necessary.
  • MongoDB also works with TCP connections, so the same approach as for MariaDB applies.
  • Most other databases (like PostgreSQL, Oracle, etc.) usually use TCP connections too, so it would be similar to the MariaDB solution.
  • XML or JSON APIs (or SOAP, or whatever format you prefer) all work via HTTP, which is the best-case scenario for a proxy, and they are easy to secure. Often these APIs have built-in methods of upgrading them to HTTPS, or we can just use our own HTTPS proxy.

In general it should be possible to secure every kind of service with the described methods, by using Nginx as a proxy.
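
To illustrate how little changes per service, a client-side stream block for Redis could look like this (the ports, the Redis server IP and the reused client certificate are assumptions - the Redis host would run a server-side relay just like the MariaDB one):

stream {
    # Local Redis relay - the application connects to 127.0.0.1:997 without TLS
    server {
        listen 127.0.0.1:997;
        proxy_pass redis;

        proxy_ssl                     on;
        proxy_ssl_certificate         /etc/nginx/php.client.crt;
        proxy_ssl_certificate_key     /etc/nginx/php.client.key;
        proxy_ssl_trusted_certificate /etc/nginx/ca.crt;
        proxy_ssl_verify              on;
        proxy_ssl_verify_depth        2;
        proxy_ssl_session_reuse       on;
    }

    # Nginx on the Redis host terminates TLS on 6380 and forwards to 127.0.0.1:6379;
    # its certificate needs a Common Name / subjectAltName of "redis"
    upstream redis {
        server 192.168.5.7:6380;
    }
}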

Results & performance

All traffic is now encrypted, and even if someone accesses the local network of my servers, all the data passed between the servers would be useless to them. Even if someone managed to get access to my switch (maybe a vulnerability, maybe a backdoor, who knows), the data passing through it could not be deciphered. In a world where you never know where the next vulnerability will be this seems like a sensible precaution - and never completely trusting a network is a good rule of thumb.

One question did go through my mind though: would all this extra encryption have an effect on performance? One of my websites has an average page generation time of about 25-26ms (very fast in comparison with most websites), so the effect was likely to be noticeable. Because page generation time is measured inside PHP, it only captures the additional time needed for the MariaDB and Elasticsearch encryption - and in my case this additional overhead is about 4-5ms, so I am up to 29-31ms now. I can live with that, and I suspect there will be ways to further reduce the speed impact of encryption - maybe by better connection caching, or by newer encryption methods in the future.

Alternative solution: VPN

I secured individual services with individual certificates while using the existing local network - this makes securing each service a bit more difficult, but the design of the network stays very simple.

Another option would be to connect all servers via VPN. You need a VPN server which connects all servers, and then the servers only use the VPN IPs to talk to each other. All the services can use unencrypted connections within the VPN network because the VPN connection encrypts all the network traffic anyway. These are the pros I see with this method:

  • All services are automatically secure and encrypted, you can add services and different protocols without changing the network, adding certificates, or adding proxies.
  • Performance would likely be better because VPN encryption is established only once between the servers, compared to having several encrypted connections with different certificates and proxies and keeping these connections alive.
  • Nginx as a proxy would not be necessary, the applications could talk to each other directly again (via FastCGI, TCP, or HTTP), which reduces the complexity per service.
  • VPN and especially OpenVPN is a well established and tested solution to secure a network.
  • VPN can connect distant networks with each other quite easily, creating one large trusted network.

But there are also some negatives:

  • Two VPN servers need to be added to the infrastructure (having only one would create a new single point of failure), with each server being connected to each VPN server and listening in each VPN network.
  • Without the VPN the servers cannot talk to each other at all, or if they can, it is unencrypted.
  • The network design becomes more complex, and you have to distinguish once more between the secure VPN network and a potentially insecure outside network. Even small misconfigurations (like listening on an IP or interface which is not secured by the VPN) can lead to a lapse in security.
  • If the VPN is somehow misconfigured or is flooded by requests the whole internal network becomes unavailable, instead of only one service failing.
  • Encryption becomes invisible for the applications, the developers and the sysadmins, which can erode the knowledge about security or the need for it. By securing and testing individual services you can incorporate security at a more noticeable level.
  • If an unauthorized person gets access to the VPN (by getting hold of valid credentials for it, or by exploiting a security vulnerability) the whole network is at risk, instead of only one service being breached.

I am not sure which method really wins overall - I like the idea of securing individual services, and of having them fail individually as soon as something is wrong. For large networks or connected networks a VPN may be a lot easier and more natural, and it probably needs less performance tuning. So I think both are viable, and it depends on the requirements. For my infrastructure it seems fine to secure the three described services individually, yet if I had 10 services and a high likelihood of adding even more I would probably prefer a VPN.

Possible improvements

While this setup is very secure, it could be even more secure if the client certificates were more thoroughly checked. Right now for all our proxies (PHP-FPM, Elasticsearch and MariaDB) the client certificates just have to be signed by our CA root certificate to be accepted - but the contents of the client certificates do not matter at all.

Being able to only allow one specific certificate to connect could improve security even more. There probably is a way to do that (with all the options Nginx and Elasticsearch provide), so I guess I just have to find out. As long as all the custom certificates signed by our CA are safe and inaccessible it does not matter much, yet it would be nice to be able to specifically allow only certain certificates, so someone getting their hands on one of our certificates would still be unable to access other parts of the infrastructure.
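
For the Nginx HTTP proxies, one approach that should work is pinning the SHA1 fingerprint of the allowed client certificate via the $ssl_client_fingerprint variable. A rough sketch for the receiving HTTPS server block (the fingerprint is a placeholder - take it from "openssl x509 -noout -fingerprint -sha1 -in php.client.crt", lowercased and with the colons removed):

# Only accept one specific client certificate, identified by its SHA1 fingerprint
if ($ssl_client_fingerprint != "d2e22901ddf95f6a4c2c0d8b326bbe23bb5b2647") {
    return 403;
}

The stream proxy for MariaDB would need a different construct (the stream module has no if directive), and Elasticsearch itself would need its own mechanism for this kind of pinning.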