The second article covering my attempt to implement a HTTP/3 server from scratch in Rust. Having looked at the QUIC protocol at length in the previous article, this one examines how HTTP/3 is implemented atop it.
This is the 2nd of the 2 articles that currently make up the “HTTP/3 in Practice” series.
In the previous article, we took a fairly in-depth tour of QUIC, the transport protocol that HTTP/3 is built on top of, instead of the TCP used by previous HTTP versions.
This article looks at how HTTP/3 uses QUIC to implement HTTP semantics. Since QUIC was itself designed with this goal in mind, I was expecting this to be a significantly shorter article, since it’s just mapping simple HTTP semantics on to a transport designed for it. It turned out quite a bit longer than I originally expected, however — partly that’s because of some brief(ish!) digressions into previous HTTP versions for context, but partly because HTTP/3 has some complexities, especially around the way it compresses request and response headers.
I am going to assume that you’ve at least got a good working understanding of HTTP/1.1 and at least a passing acquaintance with HTTP/2 — if you’re not familiar with HTTP/2 then you might like to read the article I wrote a few years ago about it.
In case you’re short on time, the tl;dr of this article is that HTTP/3 maintains the same request methods, status codes and header fields as previous versions, but maps these concepts to the underlying transport in a different way. If that’s enough detail for you then no need to thank me for giving you back a few minutes of your life you might otherwise have spent reading this article. If you’re interested in drilling into things in a little more detail then this is the article for you — read on!
I believe it’s often helpful to know a little historical context of how a system has evolved to understand some of its behaviours today, and in this section I’ll run through a very brief summary of how HTTP has evolved since its first incarnation in 1991. You may already be quite familiar with this, or might simply disagree with me that this is useful, and if so I suggest you skip to the section on HTTP Semantics, which briefly outlines the high-level semantics of HTTP, which remain the same in HTTP/3, or skip that as well and jump straight to the Tour of HTTP/3 section.
In principle there have been four major versions of HTTP in common usage. The first version was HTTP as it was defined in the original 1991 proposal by Tim Berners-Lee — it consisted of making a TCP connection, sending a single `GET /path/to/file` request and then disconnecting. These days, this is generally known as HTTP/0.9. It was never described by any rigorous standard, so it’s questionable whether it even qualifies as a version.

The second version, HTTP/1.0, was the first to be properly defined in RFC 1945, which was published in 1996. It defined much more recognisable request and response formats, modelled after MIME, and it specified multiple request methods: `GET`, `HEAD` and `POST`. It also added the HTTP version field, as well as specification of the content type and the status codes that we still use today.
This was quite quickly replaced by HTTP/1.1 in RFC 2068, released only a year later. This addressed some concerns with the original specification, and I think was regarded by many at the time as the proper HTTP/1 specification. It made the `Host` header mandatory, which was important to enable the now common practice of hosting multiple domains on a single IP address, and it also added persistent connections to address the increasing waste of requiring a fresh TCP handshake for every resource on increasingly multimedia websites.
HTTP/1.1 also added a number of other features which were less used at the time, such as `100 Continue` semantics and a slew of additional request methods, namely `PUT`, `PATCH`, `DELETE`, `CONNECT`, `TRACE` and `OPTIONS`. Some of these features were initially little-used, but most did find later use. Amazon S3, for example, makes good use of `100 Continue`, and the additional methods later proved quite useful for RESTful APIs.
HTTP/1.1 proved remarkably stable and is still in very wide use today. Partly this has been down to a solid initial design, but it has also been updated on a few occasions to address some minor issues:
- `ETag` headers.

Despite HTTP/1.1’s longevity, there was eventually a newer version, HTTP/2, published as RFC 7540 in 2015. As I mentioned earlier, I discussed this in a little more detail previously, so I’ll try to be brief. This standard doesn’t really mess with the request/response semantics, but what it does try to do is address some of the underlying transport performance issues. It does this in several ways:

- It moves from a textual format to binary framing, making messages quicker and less ambiguous to parse.
- It multiplexes multiple concurrent request/response streams over a single TCP connection.
- It compresses headers using a scheme called HPACK, to save bandwidth on repetitive header fields.
- It allows servers to proactively push responses to clients before they’re requested.
- It allows requests to be assigned priorities relative to each other.
This brings us almost up to date. HTTP/2 adoption grew for several years, rising from around 16% of websites in August 2016 until plateauing at around 65-70% in November 2020¹. Adoption rose most quickly among the most popular websites, unsurprisingly, but there’s still a long tail of smaller websites which don’t support it, which is a testament to how effective HTTP/1 still is. There’s also been a small but puzzling drop in support in the last few years, and I’m not quite sure of the reason for that — perhaps site admins don’t see the point in maintaining HTTP/2 support once they’ve added HTTP/3, but that’s just a guess. The vast majority of browsers support HTTP/2, at least for those users who’ve updated them at some point since 2015.
HTTP/3 was released most recently, its very first draft being released at the end of 2016 under the title “HTTP over QUIC”, and being referred to as HTTP/3 from the end of 2018. It became a full Standards Track RFC in June 2022. As with HTTP/2, this version of the standard is primarily concerned with improving the transport performance by shifting the underlying transport from TCP to QUIC — as well as the performance benefits, this also enshrines TLS as a core and mandatory part of the standard. It has seen increasing support among websites since early 2020, reaching around 20% of websites at time of writing, and almost 30% among the top 1000 sites.
Since the improvements in page load latency should be more substantial than with HTTP/2, I’m hopeful that there’s more incentive for sites to update — but it’s still a very young standard in the scheme of things, and there are probably a ton of proxies and other middleboxes that don’t support it properly yet, which may be a barrier for some time to come. Whilst updating of browsers is fairly fast, due to the automatic update process, updating websites is much more driven by the perceived benefits, and these are potentially less compelling for smaller sites. Updating middleboxes is slowest of all — many companies may have hardware that’s outside its support contract and no longer receiving substantive firmware updates, and many vendors may prefer to use HTTP/3 support to incentivise customers to upgrade to newer hardware as opposed to adding it to existing device firmware. These costs must be justified by benefits, and frankly a lot of websites are “good enough”.
In any case, I appear to have drifted from the history to the future of HTTP, so it’s time to move on. Before I do so, I’ll just mention one more thing, which is that the HTTP/1.1 RFCs mentioned above have been once again obsoleted by a reorganisation of the standards which happened in 2022, at the same time that HTTP/3 became an RFC. There are now two standards which are independent of the HTTP version in use:

- RFC 9110: HTTP Semantics
- RFC 9111: HTTP Caching
Each specific version then has its own RFC for the version-specific aspects:

- RFC 9112: HTTP/1.1
- RFC 9113: HTTP/2
- RFC 9114: HTTP/3
The rest of the discussion in this article will be based on these standards, and not give any regard to the previous versions now obsoleted.
Before getting into the aspects of HTTP which change with HTTP/3, I’ll run through a quick summary of the current state of HTTP semantics, as outlined in RFC 9110. These are independent of the HTTP version in use, and some understanding of them is important to comprehend how HTTP/3 applies them to QUIC. If you’re already very familiar with previous HTTP versions then I doubt there’s anything much for you to learn here, so you can skip ahead to the Tour of HTTP/3 if you like.
HTTP is a client/server protocol where a client requests resources from a server, where resources are identified by URLs² — the notion of a resource is intentionally kept vague, but it will typically be either a file or dynamically-generated data. A client forms a connection and makes requests, and the server responds to each request with some representation of the requested resource, or an error indicating why it cannot or will not do so.
HTTP is a stateless protocol, so each request message must contain the full context required for the server to process it, and each response must only depend on the specific request to which it’s responding, plus any server-side state that might also be relevant. No state is maintained between requests on the same connection, and indeed HTTP doesn’t generally assume that all requests on a single connection come from the same client — for example, an intermediate proxy might maintain a single connection to a server on behalf of multiple clients.
HTTP messages, both requests and responses, consist of:

- control data, such as the request method and target URL, or the response status code
- a header section of name/value fields
- optional content, forming the body of the message
- an optional trailer section of further fields
In HTTP/1.1 the control data was the initial line of the request or response and the header fields were on subsequent lines, but HTTP/2 and HTTP/3 have their own ways to include this information.
Header fields are name/value pairs, and the HTTP specification defines the meaning of many headers — going through them all is rather beyond the scope of this brief summary. Request headers typically specify things like the format of data expected in response, information about any content sent with the request, and some cache control information to save the server responding if the client already has the latest version of the resource cached. Response headers specify information about the returned content, redirections to other locations to find the resource, and information about how long the returned data may be cached.
Here’s a simple HTTP/1.1 request, which doesn’t contain any content — note the use of ␍␊ (`\r\n`) as line delimiters, which is specific to HTTP/1.1 and only used in the header section.
```
GET /blog/posts/2023/index.html HTTP/1.1␍␊
Host: www.andy-pearce.com␍␊
User-Agent: curl/7.79.1␍␊
Accept: */*␍␊
␍␊
```
Here is a typical response to that request in HTTP/1.1:
```
HTTP/1.1 200 OK␍␊
Server: nginx/1.18.0 (Ubuntu)␍␊
Date: Thu, 06 Apr 2023 10:26:41 GMT␍␊
Content-Type: text/html␍␊
Content-Length: 27325␍␊
Last-Modified: Tue, 28 Mar 2023 07:53:03 GMT␍␊
Connection: keep-alive␍␊
ETag: "64229cdf-6abd"␍␊
Accept-Ranges: bytes␍␊
␍␊
<!DOCTYPE html>
<html lang="en">
<head>
...
```
The `200` in the response above is the status code of the response, which is a general indication of the success or failure of the request. These status codes have the same semantics across all versions of HTTP, though representations of them may differ. The first digit of the code indicates the general type of status:
- `1xx` — Informational: interim responses, such as `100 Continue`, which indicates the client should continue to send the full request, and `101 Switching Protocols`, which is used with an `Upgrade` header to switch to a later HTTP version or some other protocol entirely.
- `2xx` — Success: the precise meaning of a `2xx` code depends on the type of request, but `200 OK` is the most common response, typically with content included which represents the entire resource. Another common example is `206 Partial Content`, which is used when the server responds to a range request and the content only includes the specified portion of the resource representation.
- `3xx` — Redirection: typically `301 Moved Permanently` or `302 Found` in conjunction with a `Location` header indicating the new URL of the resource. The difference between these is whether the client should use the new URL in future requests (for `301`), or continue to use the old URL (for `302`) as this new URL is only temporary.
- `4xx` — Client Error: examples include `404 Not Found`, where a URL does not map to any resource the server recognises, and `400 Bad Request`, where the client’s request syntax appears to be invalid.
- `5xx` — Server Error: for example `500 Internal Server Error` if, say, some component on the server has crashed, or `503 Service Unavailable` if the server would normally be able to respond but is currently overloaded or undergoing scheduled maintenance.

§15 of RFC 9110 has a full list of all 46 specific status codes defined at time of writing.
Whilst a resource is an abstract and generic concept in HTTP, it must have some sort of representation as a series of bytes to be sent down the connection to the client. How a resource is mapped to the bytes that represent it is a topic with some subtleties, which I’ll try to briefly summarise here.
Let’s start by running through the various response header fields which are relevant to the content in the response — this isn’t an exhaustive list, it just includes the basic options which affect how the response is processed.
- `Content-Type`: The media type of the content, with common examples being `text/html` and `image/png`.
- `Content-Encoding`: Any additional encoding applied to the content on top of that indicated by `Content-Type`. A common example is `gzip`, which indicates that the server is sending a gzip-compressed version of the data.
- `Content-Length`: The size of the content in bytes, as sent after any `Content-Encoding` has been applied, so if the content is gzipped then this size will be the compressed size, for example.
- `Transfer-Encoding`: Any connection-specific encodings applied to the message, with common examples being `chunked` and `gzip`, and it’s also important to note that this header is mutually exclusive with `Content-Length`. The header has a very limited use in HTTP/2 and is not permitted in HTTP/3 at all.
- `Last-Modified`, which specifies a timestamp at which the resource was last changed, and `ETag`, which specifies a checksum or similar opaque value which is expected to change if the resource is altered. This allows clients to make conditional requests with headers reflecting these values back, allowing servers to respond with `304 Not Modified` if the client can safely continue to use a cached version of the resource — this improves performance whilst allowing cached copies to be invalidated promptly when required.

These feel like the important aspects required to understand HTTP. Whilst there is, of course, significantly more detail in the RFC, there are also a number of additional topics I’ve chosen not to discuss at all. A rough list of these, and the section in RFC 9110 in which to find them, is shown below.
- the `Expect` header.

So now to the main topic of this article: the HTTP/3 protocol itself. It borrows heavily from the changes that HTTP/2 already made, and essentially just maps those semantics to a QUIC transport instead of TCP.
Over the following sections, we’ll try to answer these questions:

- How does a client discover that a server supports HTTP/3?
- How is a HTTP/3 connection established?
- What streams does HTTP/3 create, and what is each used for?
- What frames are sent on those streams, and what do they do?
- How are request and response headers compressed?
- How does server push work?
Let’s start at the beginning. Presuming that a browser or other client supports HTTP/3, it can’t assume that every server does — and servers can’t possibly afford to drop HTTP/1 and HTTP/2 support until they’re certain every single browser supports HTTP/3. Given all this, how does a client discover whether a given URL should be fetched over HTTP/3 or not?
For HTTP/2 this situation was resolved by making an initial HTTP/1.1-style connection and then upgrading if the server supports it. This is done using the `Upgrade` header if using the `http` scheme, although I believe more or less all web browsers have decided to only support HTTP/2 when using `https`. In that case, a TLS extension called Application Layer Protocol Negotiation (ALPN), specified in RFC 7301, is used. For HTTP/3, however, the use of QUIC means an inline upgrade won’t work: since QUIC is UDP-based rather than TCP, the existing connection is useless.
The first thing to note is that since QUIC always uses TLS, only the `https` scheme is valid for use with HTTP/3 — so any URL which uses `http` is only going to be TCP-based. Beyond this, there currently seem to be two ways to discover HTTP/3 support, and we’ll look at both below.
The first approach relies on a mechanism specified in RFC 7838 called HTTP Alternative Services. This uses a HTTP header, `Alt-Svc`, to advertise an “alternative service” which the server would prefer the client use if possible. This can be used to direct to other hosts, ports and protocols — it is the client’s decision whether to open a new connection and make the request again on the alternative service.
You’ll need to read the RFC if you want the full details, but to illustrate the principle, here’s an example of a possible such header:
```
Alt-Svc: h3="www3.andy-pearce.com:8003"; ma=3600, h2=":8002"; ma=3600
```
This means that the server would prefer that the client reconnect using HTTP/3 to host `www3.andy-pearce.com` on port `8003`. Failing that, it would prefer the client connect using HTTP/2 to the same host as the current request but on port `8002`. The `ma=3600` indicates that both these preferences can be cached for an hour (3600 seconds). The protocol names `h2` and `h3` come from the IANA registry for TLS extensions, as they’re the same names used for ALPN.
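To make the structure concrete, here’s a rough sketch of pulling such a header value apart. `parse_alt_svc` is my own illustrative helper rather than any real library function, and it glosses over grammar corner cases such as commas inside quoted strings:

```python
def parse_alt_svc(value: str) -> list[dict[str, str]]:
    """Naive parse of an Alt-Svc value into one dict per advertised service."""
    services = []
    for entry in value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        protocol, _, authority = parts[0].partition("=")
        # The authority is a quoted "host:port", where an empty host means
        # the same host as the current connection.
        service = {"protocol": protocol.strip(),
                   "authority": authority.strip().strip('"')}
        for param in parts[1:]:
            name, _, val = param.partition("=")
            service[name.strip()] = val.strip()
        services.append(service)
    return services

# parse_alt_svc('h3="www3.andy-pearce.com:8003"; ma=3600, h2=":8002"; ma=3600')
# → [{'protocol': 'h3', 'authority': 'www3.andy-pearce.com:8003', 'ma': '3600'},
#    {'protocol': 'h2', 'authority': ':8002', 'ma': '3600'}]
```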
One downside of this approach is that it adds latency to the connection, since the client must set up TLS over TCP before making the HTTP/1.1 request whose response contains this header — since one of the prime benefits of QUIC is a reduction in latency, this feels quite counterproductive.
It’s also worth noting that the same approach can be used with a new frame type `ALTSVC` in HTTP/2, although since HTTP/2 is typically already upgraded from HTTP/1.1 anyway, I don’t see a great deal of value in this at present.
This is where the second approach comes in, however, and it relies on a new `HTTPS` DNS record type. This approach is specified in Internet Draft draft-ietf-dnsop-svcb-https-latest, which is at version 12 at time of writing.
The standard defines two new DNS record types, `SVCB` (“Service Binding”) and `HTTPS` — the latter is essentially a special case of the former, and since it’s the one that you should use for HTTP services then that’s what we’re going to look at here. The principle is actually quite similar to the `Alt-Svc` header mentioned above, but it’s a lot faster for a client to do a quick check for a DNS record than it is to run through the full TLS handshake and make a request.
Each HTTPS
record has the following attributes:
HTTPS
records, which won’t be expected to change much.HTTPS
records. This one indiates the priority of this service, which allows multiple alternatives to be specified at different priorities..
) to indicate that it is the same as the name of this record.alpn
which specifies the protocols that can be used on this alternative service. The port
parameter can also be used to specify a different port to use for this service.Let’s see an example of this — I’m using the dnspython library in these snippets, so lookup and decode the HTTPS
records. First let’s look up google.com
.
```python
>>> import dns.resolver
>>> ans = dns.resolver.resolve("google.com", "HTTPS")
>>> ans.rrset[0].to_text()
'1 . alpn="h2,h3"'
```
There’s only a single record of priority 1 which specifies `.` as the target, hence still `google.com` in this example, and the only parameter specified is `alpn="h2,h3"`, which means that you should be able to make requests with HTTP/2 or HTTP/3 to port 443 of their hostname.
A slightly more interesting example comes courtesy of `cloudflare.com`:
```python
>>> ans = dns.resolver.resolve("cloudflare.com", "HTTPS")
>>> ans.rrset[0].to_text()
'1 . alpn="h3,h3-29,h2"
    ipv4hint="104.16.132.229,104.16.133.229"
    ipv6hint="2606:4700::6810:84e5,2606:4700::6810:85e5"'
```
(Output manually wrapped and indented for readability)
This is saying something similar, although you can see that `h3-29` is included — this means the version of HTTP/3 specified by draft 29 of the specification, before it became an RFC. You can also see the `ipv4hint` and `ipv6hint` parameters, which are a time-saver so the client doesn’t need to do additional DNS lookups for the `A` and/or `AAAA` records for the domain.
This DNS-based approach does not appear to have caught on massively so far, mind you. I found a handy list of the top 1000 websites in CSV form (citation needed) and did a DNS lookup of `HTTPS` records on each of them. Only 127 of them had such records at all, and only 58 of them mentioned “h3” in their `HTTPS` record. Hopefully this approach will become more popular over time, because it’s a real shame to think the latency improvements of HTTP/3 will be largely nullified by having to make a HTTP/1.1 connection first.
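For the curious, my check was along these lines — a quick sketch using dnspython, where `hostnames` is assumed to be the list of domains loaded from that CSV:

```python
import dns.exception
import dns.resolver

def h3_advertised(hostname: str) -> bool:
    """True if any HTTPS record for this host mentions h3 at all."""
    try:
        answer = dns.resolver.resolve(hostname, "HTTPS")
    except dns.exception.DNSException:
        return False    # no HTTPS records, or the lookup failed entirely
    return any("h3" in record.to_text() for record in answer.rrset)

# supporting = [host for host in hostnames if h3_advertised(host)]
```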
Once the client has decided to set up a HTTP/3 connection, it follows the usual QUIC connection process as described in the previous article.
There are a couple of specific considerations when using HTTP/3 over QUIC. The first is that the server name must be sent to the server, as with the `Host` header in HTTP/1.1. This is done using Server Name Indication (SNI), which is specified in §3 of RFC 6066.
The second consideration is that the `h3` token must be specified in the ALPN TLS extension, as mentioned earlier in this article in the context of HTTP/2. The client can also offer other protocols if it wishes.
Once the connection is made, both endpoints create a unidirectional control stream and send a `SETTINGS` frame down it to negotiate connection settings — streams are discussed in the next section, and frame types are discussed in the section after it.
Now we’ve looked at the ways that a client might use to decide to connect with HTTP/3, we’ll look at the different QUIC streams that it creates.
For requests and responses, HTTP/3 uses bidirectional QUIC streams, always created by the client³. Since QUIC allows each endpoint to limit the number of concurrent streams, the HTTP/3 standard expects that each end permits at least 100 concurrent bidirectional streams at a time, to avoid reducing performance by reducing parallelism.
A HTTP message, either request or response, consists of up to three sections:

- a single `HEADERS` frame, containing control data and initial headers
- zero or more `DATA` frames containing message content
- an optional final `HEADERS` frame containing trailing headers

The client initiates the stream and sends a single request — each request stream should only ever contain a single request. The server sends its response, and then the stream is closed.
We’ll look at the specifics of these frame types a little later in this article.
Unlike bidirectional streams, unidirectional streams may be created in either direction. Also, they are used for multiple different purposes, so the first thing sent on each stream is a single variable-length integer (encoded as per §16 of the QUIC RFC) indicating the stream type.
The initial standards define four stream types:
| ID | Type |
|---|---|
| `0x00` | Control |
| `0x01` | Push |
| `0x02` | Encoder |
| `0x03` | Decoder |
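To give a feel for what reading this type byte involves, here’s a minimal sketch. `decode_varint` implements the variable-length integer scheme from §16 of the QUIC RFC (the top two bits of the first byte select a 1, 2, 4 or 8 byte encoding), and the function names are my own:

```python
STREAM_TYPES = {0x00: "control", 0x01: "push",
                0x02: "qpack encoder", 0x03: "qpack decoder"}

def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a QUIC variable-length integer; returns (value, next offset)."""
    first = data[offset]
    length = 1 << (first >> 6)          # top two bits: 1, 2, 4 or 8 bytes
    value = first & 0x3F                # remaining six bits start the value
    for i in range(1, length):
        value = (value << 8) | data[offset + i]
    return value, offset + length

def classify_unidirectional_stream(initial_bytes: bytes) -> str:
    """Identify a unidirectional stream from its first byte(s)."""
    stream_type, _ = decode_varint(initial_bytes)
    # Unknown stream types must be tolerated, for future extensibility.
    return STREAM_TYPES.get(stream_type, f"unknown (0x{stream_type:02x})")
```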
The last two stream types are defined in a separate document, RFC 9204, which specifies the QPACK header compression technique used to save bandwidth on commonly repeated header text. The use of these streams is discussed as part of the Header Compression section later in this article.
Control streams are used for messages which apply to the connection as a whole, not just a single request or response. Both endpoints create a single such stream at the start of the connection, and the first frame on each should be a `SETTINGS` frame to negotiate connection-specific settings. There are additional frame types which can be sent on this stream, such as `GOAWAY` to initiate a graceful connection shutdown, and these are discussed later in this article.
Push streams can only be initiated by the server and are used for the optional “server push” feature that was introduced in HTTP/2, although HTTP/3 uses different mechanisms to implement the same principle. The use of this stream is discussed in more detail in the later Server Push section.
HTTP/3 uses frames to carry all information, where frames are serialised across QUIC streams⁴. A HTTP frame header consists simply of a type and a length, both of which are variable-length integers as per §16 of the QUIC RFC, and these are immediately followed by the frame payload.
Different frames are only valid on some stream types. The frame types are:
| ID | Type | Streams |
|---|---|---|
| `0x00` | `DATA` | Request & Push |
| `0x01` | `HEADERS` | Request & Push |
| `0x03` | `CANCEL_PUSH` | Control |
| `0x04` | `SETTINGS` | Control |
| `0x05` | `PUSH_PROMISE` | Request |
| `0x07` | `GOAWAY` | Control |
| `0x0d` | `MAX_PUSH_ID` | Control |
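Given how simple this framing is, a parser is only a few lines. Here’s a minimal sketch, repeating the `decode_varint` helper from the stream types example so the snippet stands alone, and with no attempt at error handling or partially-received frames:

```python
def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a QUIC variable-length integer; returns (value, next offset)."""
    first = data[offset]
    length = 1 << (first >> 6)
    value = first & 0x3F
    for i in range(1, length):
        value = (value << 8) | data[offset + i]
    return value, offset + length

def parse_frame(data: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Parse one HTTP/3 frame; returns (frame type, payload, next offset)."""
    frame_type, offset = decode_varint(data, offset)
    length, offset = decode_varint(data, offset)
    payload = data[offset:offset + length]
    return frame_type, payload, offset + length

# A DATA frame carrying "hello": type 0x00, length 0x05, then the payload.
frame_type, payload, _ = parse_frame(b"\x00\x05hello")
assert frame_type == 0x00 and payload == b"hello"
```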
#### `HEADERS` and `DATA`

As mentioned earlier, request streams use `HEADERS` and `DATA` frames to encapsulate requests and responses. The use of the `HEADERS` frame is tied in with the encoder and decoder streams, and specified in a different RFC, so I’ve discussed that all together in the Header Compression section a little later in this article.
The `DATA` frame is simply the frame header (type and length) followed by a series of bytes which form the data content. The earlier `HEADERS` frame contains all the context required to process the request or response, so the `DATA` frames just contain raw content. A given response may span as many `DATA` frames as necessary to transfer it.
Note that unlike HTTP/1.1, neither HTTP/2 nor HTTP/3 allows the use of chunked encoding. It is not required, as each request/response consumes a single stream, so in the absence of a `Content-Length` header, closing the request stream can be used to indicate the end of the response. As such, the `Content-Length` header is really just advisory, to allow clients to implement download progress bars or similar — however, the RFC does say that endpoints SHOULD provide it if the size of the content is known in advance⁵.
#### `SETTINGS`

A `SETTINGS` frame consists of the usual type and length, and the remainder of the frame consists of pairs of variable-length integers. The first of the pair is a numeric identifier for the setting — these values are specified in a registry managed by IANA. The second integer is the value itself — all settings are numeric.
Every setting has a default, and endpoints should use those defaults initially until receiving the `SETTINGS` frame from their peer. Clients do not need to explicitly wait for `SETTINGS` from the server, but they should process all received traffic before sending anything, just to maximise their chances of seeing the `SETTINGS` frame first. Also, clients using 0-RTT QUIC traffic should use settings from the previous session rather than the defaults, although of course these should be updated by any `SETTINGS` frame subsequently received from the server.
The main RFC defines only a single setting, `SETTINGS_MAX_FIELD_SECTION_SIZE`, which allows an endpoint to specify an upper bound on the size of header it will accept on each HTTP message — by default there is no such limit.
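To show how little there is to this frame, here’s a hedged sketch of building one: the identifier `0x06` for `SETTINGS_MAX_FIELD_SECTION_SIZE` comes from the IANA registry, `encode_varint` is the writing counterpart of the decoder sketched earlier, and the 16 KiB limit is just an arbitrary example value:

```python
SETTINGS_FRAME_TYPE = 0x04
SETTINGS_MAX_FIELD_SECTION_SIZE = 0x06

def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (RFC 9000 §16)."""
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | (0x40 << 8)).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | (0x80 << 24)).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | (0xC0 << 56)).to_bytes(8, "big")
    raise ValueError("value too large to encode as a varint")

def encode_settings(settings: dict[int, int]) -> bytes:
    """Build a SETTINGS frame: type, length, then (id, value) varint pairs."""
    payload = b"".join(encode_varint(ident) + encode_varint(value)
                       for ident, value in settings.items())
    return (encode_varint(SETTINGS_FRAME_TYPE)
            + encode_varint(len(payload)) + payload)

frame = encode_settings({SETTINGS_MAX_FIELD_SECTION_SIZE: 16384})
```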
The RFC covering how header fields are encoded also defines two more settings, `QPACK_MAX_TABLE_CAPACITY` and `QPACK_BLOCKED_STREAMS`, and these are discussed in the Header Compression section below.
#### `CANCEL_PUSH`, `PUSH_PROMISE` and `MAX_PUSH_ID`

These are discussed in the later Server Push section.
#### `GOAWAY`

This frame can be sent by either endpoint at any point to initiate a graceful shutdown of the connection. If sent by the server, it includes the client-initiated stream ID indicating the final stream that the server has handled, or still intends to handle. If sent by the client, the highest push ID is sent — the concept of push IDs will be explained later in the Server Push section, but suffice to say this indicates the final server-initiated push that the client intends to handle.
The sender of the `GOAWAY` then refuses and rejects any additional streams beyond this limit, and the receiver should not initiate any more such streams.
This approach aims to allow both endpoints to have a consistent idea of which streams were handled before the QUIC connection itself is torn down. That said, the endpoint isn’t under an obligation to tear the connection down after a `GOAWAY` — it can simply leave it to become idle and be closed later.
We now know enough to see what a simple connection and first couple of requests might look like in HTTP/3. The diagram below starts once the QUIC connection is established between a client and server, and shows it requesting an `index.html` and associated `style.css`.
This diagram is somewhat simplified, however, as it assumes no use of server push and ignores any additional communication on the streams used for header compression — we’ll look at both these mechanisms in the following sections.
In this section we’ll look at how header values are transmitted in `HEADERS` frames. A technique called QPACK is used for compressing these headers, which is specified in a separate document, RFC 9204. This is similar to the situation in HTTP/2, where a method called HPACK was used, specified in RFC 7541. Since HPACK relies on compressed field sections being transmitted in order, and this can’t be guaranteed by QUIC across different streams, the QPACK method was developed instead.
The `HEADERS` frames carry both control data and header fields. The control data is carried by mapping it into pseudo-headers, which start with a colon (`:`). For requests, the following pseudo-headers are defined:

- `:method`: The request method, such as `GET` or `POST`.
- `:scheme`: Omitted for `CONNECT` requests. Specifies the scheme from the request URL (e.g. `https`).
- `:path`: Omitted for `CONNECT` requests. Contains the path and query parts of the request URL.
- `:authority`: The authority component of the request URL, equivalent to the `Host` header in HTTP/1.1.
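To make this mapping concrete, here’s a small illustrative sketch of converting a request’s control data into pseudo-headers; the function and its output format are my own, not anything specified in the RFCs:

```python
from urllib.parse import urlsplit

def request_pseudo_headers(method: str, url: str) -> list[tuple[str, str]]:
    """Map a method and URL onto the HTTP/3 request pseudo-headers."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return [(":method", method),
            (":scheme", parts.scheme),
            (":authority", parts.netloc),
            (":path", path)]

# request_pseudo_headers("GET", "https://www.andy-pearce.com/blog/index.html")
# → [(':method', 'GET'), (':scheme', 'https'),
#    (':authority', 'www.andy-pearce.com'), (':path', '/blog/index.html')]
```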
For responses, only a single pseudo-header is defined, which is mandatory:

- `:status`: The numeric status code (e.g. `200`), as per the core HTTP semantics in RFC 9110.

Header strings must be mapped to integer IDs to be used within the `HEADERS` frame to identify header fields. To do this mapping, QPACK uses two different tables:

- A static table of predefined entries covering common header fields and values, fixed in the QPACK RFC.
- A dynamic table of entries specific to this connection, built up incrementally by the encoder.
Entries in the tables can refer to just the name of the field, or a name/value pair, for maximal compression of common values. For example, the value of the `:method` pseudo-header is an enumeration with low cardinality, so each name/value pair is represented as a unique entry in the static table.
The encoder is responsible for converting header names and values to wire representations, which can be mapped IDs or just plain literal representations. It is also responsible for maintaining the dynamic table, which is separately stored at each endpoint. The encoder uses the encoder stream to transmit instructions to the decoder in the other endpoint to add entries to the dynamic table, to allow the decoder to do its job.
The decoder component is responsible for taking the compressed representations that the encoder generates, and converting them back to textual names and values for the application to consume. It does this by referring to both the static and dynamic table entries that have been built for the connection. It uses a separate decoder stream to send back acknowledgements.
Since both endpoints need to send headers — request headers in one case, response headers in the other — they both have an encoder and decoder, and so there are two encoder streams and two decoder streams. In the next subsection we’ll look at how the tables are used to actually encode a `HEADERS` frame, and in the subsection after we’ll look at how the streams are used to synchronise the encoder/decoder pairs.
#### `HEADERS`

To populate a `HEADERS` frame, two basic data types are used:

- Prefixed integers, where an integer occupies the low-order bits of an initial byte and spills over into subsequent bytes if it doesn’t fit.
- String literals, which are sequences of bytes preceded by a prefixed integer giving their length, and which may optionally be Huffman-coded.
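The prefixed integer scheme is the same one HPACK uses, and it’s compact enough to sketch. Here’s a hedged encoder for it, with the prefix width passed in since different QPACK fields use different widths:

```python
def encode_prefixed_int(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """Encode an integer into an N-bit prefix plus continuation bytes."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])       # fits entirely in the prefix
    out = bytearray([flags | limit])        # an all-ones prefix means "continues"
    value -= limit
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)   # 7 bits per byte, top bit = more
        value >>= 7
    out.append(value)
    return bytes(out)

# e.g. 1337 with a 5-bit prefix encodes as 0x1F 0x9A 0x0A
assert encode_prefixed_int(1337, 5) == bytes([0x1F, 0x9A, 0x0A])
```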
The `HEADERS` frame itself consists of the usual frame header that we saw earlier (frame type and length) followed by a field section. The field section itself starts with a header containing two values, which are then followed by the fields themselves — the encoding of the fields uses the static and dynamic tables mentioned earlier. The two values in the header are:

- The required insert count, which indicates how many insertions into the dynamic table the decoder must have seen before it can decode this field section.
- The base, which acts as the reference point for any relative indices into the dynamic table used by the field lines that follow.
After this header, the remainder of the `HEADERS` frame consists of the header fields themselves. These can take any of several forms depending on how the header field has been encoded.
Putting all this together, the diagram below shows the format of the `HEADERS` frame as a whole, including the options for the field lines. You don’t need to worry too much about the specific layouts, unless you’re actually planning to implement a HTTP/3 library⁶ — but I think it’s often helpful to see something drawn out to get an idea of how it fits together.
Since QPACK is a mandatory extension of HTTP/3, endpoints should set up the encoder and decoder streams at the same time as the control stream, which we discussed earlier. This means that each HTTP/3 connection will have six unidirectional streams in total, plus any used for server push (next section), and the bidirectional request streams. Each encoder/decoder pair works in a symmetric way, so this discussion isn’t specific to the client or server.
The encoder can send the following events down the encoder stream:

- Set the capacity of the dynamic table. The decoder can use the setting `QPACK_MAX_TABLE_CAPACITY` to specify an upper bound to this, and the setting defaults to zero, which effectively disables use of the dynamic table.
- Insert an entry into the dynamic table whose name references an existing static or dynamic table entry.
- Insert an entry into the dynamic table with a literal name.
- Duplicate an existing dynamic table entry, which is useful to refresh an old entry at risk of being evicted.

Conversely, a decoder can send the following events on its own stream:
- Section acknowledgement, which confirms that the field section for a given stream has been fully decoded.
- Stream cancellation, which indicates that a stream has been reset and any field sections on it will not be decoded.
- Insert count increment, which acknowledges insertions into the dynamic table even before receiving any `HEADERS` frames that refer to these new entries.

To put this all together, let’s see the exchange of instructions required to open a connection and send the first request. To keep the diagram somewhat simple, I’ve just considered the request encoding, so we’re only looking at the client encoder and server decoder here — when the server sends the response, the converse process will happen with the server encoder and client decoder on their respective streams.
One interesting thing to note is that after the client encoder has sent the instructions to the server decoder, it doesn’t need to wait for any acknowledgement back before starting to use them to encode header fields. This is safe because of the required insert count field in the field section header that we saw earlier, which indicates to the server’s decoder that it should wait until it’s received the corresponding number of updates before proceeding with the decode.
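As a rough illustration of what that implies on the decoder side, here’s a sketch of the sort of gate an implementation might use; the class and its methods are entirely my own invention to show the synchronisation, not anything from the RFC:

```python
import asyncio

class DynamicTable:
    """Toy dynamic table tracking only the total number of insertions."""

    def __init__(self) -> None:
        self.entries: list[tuple[bytes, bytes]] = []
        self.insert_count = 0
        self._changed = asyncio.Condition()

    async def insert(self, name: bytes, value: bytes) -> None:
        # Called as insert instructions arrive on the encoder stream.
        async with self._changed:
            self.entries.append((name, value))
            self.insert_count += 1
            self._changed.notify_all()

    async def wait_for_inserts(self, required_insert_count: int) -> None:
        # Called before decoding a field section: block until the dynamic
        # table has seen at least the required number of insertions.
        async with self._changed:
            await self._changed.wait_for(
                lambda: self.insert_count >= required_insert_count)
```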
The final feature of HTTP/3 we’ll look at here is server push. Essentially this is where a server predicts a request that the client is going to make and pre-emptively pushes it to the client to reduce latency of page loading. For example, if the client requests a particular HTML file, the server might quite reasonably assume that the client is also going to request the CSS and javascript files linked within it, so if the server has enough logic to figure this out (or is simply so configured by the administrator) then it can push these files and save a round trip for the client to request them after parsing the HTML.
Of course, this is something that should be used with caution — pushing a number of large images could take up a lot of network bandwidth, for example, and if a client is configured not to display images to the user then this would be totally wasted. But the RFC doesn’t talk about the logic servers should use to decide whether to use push, simply the mechanics of how it does so.
Every push is assigned a unique integer ID, allocated sequentially starting with zero. These are capped by the value sent in the `MAX_PUSH_ID` frame by the client, and the server is not permitted to use server push until the client has sent a first `MAX_PUSH_ID` frame to allow some space to allocate IDs — this means that clients can choose not to support server push by simply never sending this frame. The client is free to send another `MAX_PUSH_ID` frame at any time to allow the server to send additional pushes.
A push is always triggered by another client request — for example, a request for `/path/index.html` might trigger a push of `/path/style.css`. The first step is that the server sends a `PUSH_PROMISE` frame on the request stream for `/path/index.html`, and this frame contains the ID that’s been allocated to the push. It also contains the field section that would normally be in the `HEADERS` frame of the request sent by the client for this resource.
Not all requests can be pushed. In particular, requests that require request content cannot be pushed (for hopefully obvious reasons) and also typically only cacheable resources would be pushed. Also, generally only request methods which are safe as defined in §9.2.1 of RFC 9110 — i.e. those which are read-only — can be pushed in this way, and safe methods don’t typically take request payloads anyway.
A client can reject the push by sending a `CANCEL_PUSH` frame on the control stream, specifying the push ID — this should abort any server push that is planned or in progress. Failing that, however, the server opens a new stream of type `0x01` and then sends the push ID as a variable-length integer. It then proceeds to send response `HEADERS` and content `DATA` frames down this new push stream as if it were responding on a request stream. Once the push is complete, it closes the stream, also just like a request stream.
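Just to illustrate how little framing is involved, here’s a hedged sketch of the bytes a server might write at the start of a push stream, reusing the `encode_varint` helper from the `SETTINGS` sketch earlier:

```python
PUSH_STREAM_TYPE = 0x01

def push_stream_preamble(push_id: int) -> bytes:
    """The opening bytes of a push stream: stream type, then push ID."""
    return encode_varint(PUSH_STREAM_TYPE) + encode_varint(push_id)

# After this preamble the server just sends HEADERS and DATA frames,
# exactly as it would for a response on a request stream.
```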
That’s about it, really — aside from the server generating the request headers instead of the client, and the use of a push stream as opposed to a request stream, the process is essentially the same as responding to a normal client request. One aspect which might not immediately be obvious is that since streams are not synchronised, it’s entirely possible that the push stream arrives at the client before the `PUSH_PROMISE` which corresponds to it — therefore, clients must be written to deal with this, and should probably buffer up the received data in expectation of receiving a `PUSH_PROMISE` for it shortly. As we saw in the previous article, QUIC offers flow control mechanisms that the client can use to limit the amount of data it must buffer in this way.
So that’s it for our whirlwind tour of HTTP/3. My opinions of it are not entirely dissimilar to my views on HTTP/2, though rather more pronounced. There are definitely some clever aspects, but my main concerns remain complexity and inscrutability.
Let’s talk about the clever aspects first. HTTP/3 doesn’t really try to do all that much more than HTTP/2, it just leverages features of QUIC to do it in a way which is less prone to blocking interactions between streams. As with HTTP/2, I applaud the use of streams across a single connection, which promises worthwhile improvements to latency, as well as requiring fewer connections, which makes life easier for servers and all kinds of middleboxes. The server’s level of control over prioritisation of content is also potentially valuable, and hopefully web authors and developers don’t need to worry so much about where CSS and Javascript are introduced in their HTML if the server’s going to pre-emptively push it anyway. Another minor advantage is dropping the notions of transfer encodings and other connection-specific behaviour, which simplifies HTTP semantics somewhat.
That said, some of these aspects are only going to be of benefit if website administrators and/or webserver vendors put work into leveraging them, and clients indicate support for them. For example, a server would probably need to be configured to properly use server push because it relies on knowing relationships between request patterns of resources. Over time this is something that perhaps servers can handle more automatically, at least for static resources — it wouldn’t be hard to imagine a utility which would parse HTML and work out which other resources (CSS, Javascript, images, etc.) should be pushed to clients requesting it. This could be dumped out in some server-specific configuration file to configure the server push operation. Things are harder for dynamic content, but it might still be possible for a server to automatically detect recurring themes in request patterns and dynamically adjust its push strategy accordingly.
Which brings us on to the first of my other concerns: complexity. This protocol is, if anything, more complicated than HTTP/2, which was itself way more complicated than HTTP/1. The header compression in particular has lots of edge cases which will be quite hard to test systematically. Also, although the encoder and request streams are nominally independent, the fact that the dynamic header field table is required to decode requests creates a serialisation constraint between them — it wouldn’t be hard to see how this could negatively impact performance if both server and client authors aren’t quite careful. The fact that potentially multiple request streams could all be blocked on receiving some update on the encoder stream, which might have randomly been dropped or delayed, is something that I’d be wary of.
A good amount of this complexity is in the header compression, as it was with HTTP/2, and I still question whether this is justified, if I’m honest. HTTP connections commonly transfer many megabytes of data, once you consider chunky HTML files with lots of layout, sprawling CSS to cover the whole site, a myriad of Javascript files, and large image files and other media. Are a couple of KB of request and response headers really so much of an overhead in this process to warrant the difficulties of binary representations, dynamically managed lookup tables, and Huffman coding? I’m not suggesting these measures don’t save bandwidth, and I’m not suggesting that saving bandwidth isn’t a worthy goal in principle, but complexity is a cost and costs must be justified — whether the complexity of header compression is justified by the practical benefits it will bring is, in my view, open to debate.
That said, some of this complexity can be side-stepped completely by implementors wishing to keep things simple. Although the header compression must be supported to some extent, endpoints can refuse to support their peer using the dynamic table, and are of course under no obligation to use it themselves. Similarly, server push is disabled by default unless the client explicitly allows it.
My other concern is the inscrutability of the protocol, by which I mean the degree of difficulty in inspecting traffic for debugging purposes. Firstly, QUIC itself is quite opaque, even if you disregard its use of TLS. On top of this, HTTP/3 layers its own framing over QUIC’s streams, which are themselves implemented by frames on top of packets on top of UDP datagrams. Taken all together, this is going to make it really hard to work out what’s going on without copious amounts of logging in the endpoints. With HTTP/1 and HTTP/2, at least you could disable TLS for your local development, which allowed tools like Wireshark to be used to see what’s going on. I’m sure traffic sniffers could, in principle, also help with QUIC (and, by extension, HTTP/3), but they’ll need some help getting hold of the TLS keys, and this is going to add friction to debugging.
If you take the complexity and the inscrutability together, I feel that this adds up to it being much, much harder to debug and test implementations than it was in the old HTTP/1 days. This is bad news for stability, because obscure bugs are also more likely with this complex protocol — race conditions between endpoints and streams, for example. This might mean that implementations end up riddled with all sorts of odd bugs for a long time, which is going to be quite frustrating, and also a barrier to adoption if HTTP/3 gets an unfair reputation for unreliability due to buggy implementations in some language or other.
On the flip side, if this raises the barrier to entry for anyone and their dog writing their own HTTP client implementations, perhaps there might actually be some counterintuitive upsides in focusing more attention on improving the fewer implementations that do exist, which hopefully become more stable, performant and flexible as a result. If a single solid implementation of a client exists for a language, it’s got a much better chance of being adopted into the standard library and making things better for a lot of developers, rather than fragmenting developer efforts across multiple competing libraries.
Still, these concerns are entirely unproven right now — it’s too early in HTTP/3’s lifecycle to be making these sorts of assertions with any degree of confidence. This is one reason why I’m planning to follow through with actually implementing it — it may prove simpler (or perhaps even more complicated!) than I’m estimating here.
Overall, I don’t want to come across as too negative on HTTP/3 — I don’t think any of the aspects are badly designed per se, and some of them have a lot of great potential. Given HTTP’s ubiquity across the web, it is worth looking for ways to improve things, and it’s not always clear up front whether any given feature will be pointless and over-engineered, or a game-changing enhancement that we all come to love. There’s an argument for being ambitious and throwing a lot at the web to see what sticks — rarely used parts of the protocol can always be ignored or trimmed back in later versions.
But I’m always wary of standards where a good chunk of the complexity is inherent, as opposed to in optional extensions, because of the risk that implementors end up ignoring the hard parts and diverging from the standards — non-compliant implementations end up hurting everyone if they see any kind of wide usage. I think that’s really my primary fear here, that we’re going to see a lot of rubbish client and server implementations which are riddled with issues, and if these become popular then everyone else starts to become obliged to work around them with their own hacks and deviations from the standards. It could become like the browser wars all over again.
But I would be only too pleased for these fears to be proven unfounded — that’s the direction in which I’m always happy to be wrong.
In any case, that’s all we have for this article. As usual, I hope it’s been interesting and/or helpful in some way. Also as usual, I’ll caution that there may be errors in my elaborations above — it’s based on pulling information together across a comparatively large number of RFCs and other sources, and due to the immature nature of the protocol then there are comparatively few sources with which to corroborate my understanding. If you do spot anything you think is a mistake, I’d very much appreciate you letting me know.
Next time I plan to shift my focus to Rust and implementing a simple UDP server and client, just to get the hang of things.
My statistics are all based on data from the Internet Archive State of the Web report from March 2016 to March 2023, which is the latest available at time of writing. I did some cursory checks in other reports and didn’t see anything that differed too wildly, but I can’t claim to have done anything approaching exhaustive research. ↩
You might notice the RFCs use the term URI, and I’m instead using the term URL. If you don’t already know the difference, and you just want to get things done (which is, after all, the very loosely linking theme of this blog) then I suggest you simply don’t worry about it. If not knowing pains you, then take a read of RFC 3986. My reasons for using URL are simply that I think it’s a more familiar term to most people and doesn’t prompt unnecessary confusion about what a URI is and how it differs. Anyone who understands the distinction will not be confused whichever term is used. ↩
The standard does mention that server-initiated streams could be potentially added in a future extension, but unless such an extension is supported by the client and has been negotiated, the client should regard creation of such streams as an error. ↩
Pausing for a moment just to set the context, that’s HTTP/3 frames over QUIC streams over QUIC frames over QUIC packets over UDP datagrams over IP packets over whatever type of packets the underlying transport uses. Network protocols love these sorts of layered abstractions, but to be fair to both HTTP/3 and QUIC they’ve put some effort into making sure the header overheads of these layers are smallish. ↩
In particular, proxies trying to translate HTTP/2 or HTTP/3 from a client back to HTTP/1.1 to send to a server would probably be forced to buffer up an entire (potentially large) request before sending, so that they can generate a `Content-Length` header — this is because server support for chunk-encoded requests has always been poor. Proxies going from server to client would be OK: they could just add a `Transfer-Encoding: chunked` and send each `DATA` frame as a separate chunk, as chunked encoding of responses is well supported in clients. ↩
But who’d be crazy enough to come up with a plan like that when there are already perfectly good options out there? ↩