Developing stateful anycast architecture: SSL+HTTP/3

October 9, 2020

A few months ago I wrote about the CDN that I’ve been building to learn about anycast routing (link). Almost everything about the CDN has changed since then, as I’ve completely rewritten the system. One of the things I wanted to try on the new platform was serving HTTP over the CDN. Easy enough: I added a few lines to the configuration profile and instructed the provisioning process to deploy a Caddy webserver on each of the nodes. Caddy served a preset JSON response object containing the name of the edge node serving the content. I made an HTTP request with curl and everything worked. I then realized a problem: the domain I had picked for the project is under the .dev TLD, which is on the HSTS preload list, so major browsers will only connect to it over HTTPS. This means it’s time to face the elephant in the room: TLS certificates.

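I typically let Caddy handle SSL entirely, as it has excellent automatic integration with the Let’s Encrypt CA. Caddy takes care of the request and renewal processes completely transparently to the user, which makes things nice and easy. With an anycast deployment it’s not that easy: every node answers on the same address, so the validation request can land on a different node than the one that asked for the certificate.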

Let’s Encrypt supports a few ACME challenge types to validate control of the domain before issuing a certificate. The two primary ones are HTTP-01 and DNS-01, which use HTTP and DNS respectively. The DNS challenge works by publishing a TXT record under the domain (at _acme-challenge.<domain>), and the HTTP challenge works by serving a text file from the /.well-known/acme-challenge/ path on the domain’s webserver. I chose the HTTP-01 challenge for this project in order to maintain separation between the DNS and HTTP parts of my CDN.
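To make the two challenge types concrete, here is a minimal Python sketch of what each one requires you to publish, following the ACME specification (RFC 8555). The function names are made up for illustration, and the token, thumbprint, and domain in the usage note are placeholders rather than values from the CDN.

```python
import base64
import hashlib
from typing import Tuple

# Fixed path prefix that the validation server requests for HTTP-01.
ACME_HTTP_PREFIX = "/.well-known/acme-challenge/"


def key_authorization(token: str, account_thumbprint: str) -> str:
    """Join the challenge token and the ACME account key thumbprint with a dot."""
    return f"{token}.{account_thumbprint}"


def http01_resource(token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return the (URL path, response body) to serve over plain HTTP on port 80."""
    return ACME_HTTP_PREFIX + token, key_authorization(token, account_thumbprint)


def dns01_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return the (record name, TXT value) to publish for DNS-01.

    The value is the unpadded base64url encoding of the SHA-256 digest
    of the key authorization.
    """
    digest = hashlib.sha256(
        key_authorization(token, account_thumbprint).encode("ascii")
    ).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", value
```

For example, `http01_resource("tok", "thumb")` gives the path `/.well-known/acme-challenge/tok` with the body `tok.thumb`, while `dns01_record("example.dev", "tok", "thumb")` gives a TXT record at `_acme-challenge.example.dev`. With HTTP-01 the only moving part is a static file, which is why it can stay entirely on the HTTP side of the CDN.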

Here’s how the process works. First, the controller uses certbot to initialize the certificate request with Let’s Encrypt and specifies that it would like to use the HTTP-01 ACME challenge. Let’s Encrypt then responds with the token and validation string, and certbot invokes a callback on the controller to signal an update. This update propagates the challenge to the anycast nodes, which serve the file so that it can be picked up by the closest Let’s Encrypt validation server. Once validation is complete, Let’s Encrypt issues a signed certificate to the controller, which sends the certificate to the anycast nodes along with a signal to remove the ACME challenge. Certificate renewals work the same way.

TLS certificate flow
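The propagation step in this flow can be sketched roughly as follows. The sketch assumes certbot is run in manual mode with `--manual-auth-hook` and `--manual-cleanup-hook`, in which case certbot hands the challenge to the hook through the `CERTBOT_TOKEN` and `CERTBOT_VALIDATION` environment variables. The `EdgeNode` class and the function names are hypothetical: an in-memory stand-in for the real mechanism that pushes files to the nodes, which is not shown here.

```python
from typing import Dict, List, Mapping, Optional

ACME_HTTP_PREFIX = "/.well-known/acme-challenge/"


class EdgeNode:
    """Hypothetical stand-in for the web root of one anycast node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, str] = {}  # URL path -> response body
        self.certificate: Optional[str] = None

    def serve(self, path: str) -> Optional[str]:
        """Return the body for a path, or None if the node would answer 404."""
        return self.files.get(path)


def publish_challenge(nodes: List[EdgeNode], env: Mapping[str, str]) -> str:
    """Auth hook: copy the challenge file to every node.

    All nodes need it because the controller cannot know which node
    the validation request will be routed to.
    """
    path = ACME_HTTP_PREFIX + env["CERTBOT_TOKEN"]
    for node in nodes:
        node.files[path] = env["CERTBOT_VALIDATION"]
    return path


def remove_challenge(nodes: List[EdgeNode], env: Mapping[str, str]) -> None:
    """Cleanup hook: delete the challenge file once validation has finished."""
    path = ACME_HTTP_PREFIX + env["CERTBOT_TOKEN"]
    for node in nodes:
        node.files.pop(path, None)


def install_certificate(nodes: List[EdgeNode], cert_pem: str) -> None:
    """Distribute the issued certificate to every node."""
    for node in nodes:
        node.certificate = cert_pem
```

In a real hook script, `env` would be `os.environ`, and the loops would be replaced by whatever transport the controller uses to reach the nodes. The ordering is the important part: publish everywhere before validation starts, then remove the challenge and install the certificate once it has been issued.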

HTTP/3 (or HTTP over QUIC) is the newest version of the HTTP protocol, and its primary benefit is a redesign of how stateful connections work. HTTP/1 used a TCP transport and in most cases multiple TCP sessions (one per concurrent request). This led to extra overhead and, more importantly for an anycast deployment without TCP connection synchronization, a higher likelihood of dropped connections if something goes wrong. HTTP/2 improved upon this by multiplexing multiple requests onto a single TCP connection, and HTTP/3 takes it a step further. With HTTP over QUIC, requests are served over a UDP transport, which lets the protocol itself control the retransmission of packets. This is beneficial for spotty connections, but also for the case of an anycast node going offline without connection synchronization. HTTP/3 is able to retransmit packets without the problem of a node failover breaking the sequence of a TCP connection. This is great for anycast because it adds reliability on top of an already highly resilient network topology.

It’s important to note that, at the time of writing, HTTP/3 is not enabled by default in any major browser or webserver, as it’s still considered an experimental feature. Like IPv6, it’s a game of who takes the first leap and forces the other parties to catch up with their implementations. Either way, HTTP/3 is a promising technology and has clear uses in anycast. I look forward to seeing how the protocol evolves in the future.