Learn how to implement HTTP caching properly in your APIs
Cache headers have been an essential part of the HTTP specification from the very beginning. They have played a crucial role in scaling the Web to the enormous size that it has today. Or at least, that statement is true when we talk about the “human Web”. Unfortunately, the vast majority of APIs (the “machine Web”) either completely ignore HTTP-level caching or have implementations that are different levels of broken.
What follows is a quick guide for implementing HTTP caching properly in your APIs.
The Role of HTTP Caching
If you ask a randomly selected developer what caching is for, more likely than not the answer will be that it’s for “making things faster”. That is a very generic answer, and it is not accurate for network-level caching (of which HTTP caching is a part). Network-level caching is not for making a slow computation faster. Its primary purpose is to increase scalability: the throughput, measured in requests per second, that we can sustain without performance degrading relative to a single-user scenario. Generally speaking, response time with only a few users connected should already be at the desired level, achieved through other forms of optimization.
Network-level caching is appropriate for improving throughput (scalability) but not response time under low levels of load (speed).
Two Use-Cases for HTTP Caching
Given the role of HTTP caching explained above, you can use this type of caching both for mostly static data and for dynamic (i.e. rapidly changing) data.
If you are dealing with dynamic data, you cannot cache for long periods of time, such as days, hours or sometimes even minutes because data becomes stale too quickly. That doesn’t mean, however, that such data shouldn’t be cached at all.
Since network-level caching is mostly used for increasing the throughput, it makes sense to cache responses even for very short periods of time. Let’s see how this works.
If you have low load-levels, there’s no sense in caching something for several seconds, as most clients won’t be able to reach cached data. The rate of cache misses compared to cache hits will be very high. However, when the load increases (let’s say to hundreds of requests-per-second) even a five-second cache will be a lifesaver, satisfying thousands of requests from cache, rather than origin, providing very effective protection for the backend systems by avoiding database hits etc. The amazing thing about this type of caching is that it becomes more effective as the load on the system increases (more cache hits before cache expires).
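The effect of load on a short-lived cache can be estimated with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes a single shared cache, evenly spread traffic for one hot resource, and exactly one origin fetch per expiry window. The function name is hypothetical.

```python
def expected_hit_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Approximate cache hit ratio for one hot resource behind one shared cache.

    Assumes evenly spread traffic and exactly one origin fetch (a miss) per
    TTL window; every other request in that window is served from cache.
    """
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window <= 1:
        # Fewer than one request per window: effectively every request misses.
        return 0.0
    return (requests_per_window - 1) / requests_per_window
```

Under these assumptions, one request per second with a five-second lifetime yields 4 hits out of every 5 requests, while 200 requests per second with the same lifetime yields 999 hits out of every 1,000.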
When I worked at NPR, we were using this type of caching extensively. With a news Web site, you obviously cannot cache content for too long—reporters and journalists will update an article as they see fit and they have very little patience for the article not refreshing despite the edit they made. However, if you cache content for even very short time periods, you get huge payback when millions of readers are simultaneously accessing the very same page/API, during a breaking news event of some kind.
This is the kind of scenario in which caching even dynamic content makes a lot of sense. Most people who have worked on high-traffic systems will have used this approach.
The second type of caching that can and should be used extensively deals with nearly-static data. In any application you have plenty of such data-sets: lists of countries, states, cities (any domain), insurance providers (healthcare), podcasts, series and topics (news media), currencies (banking) etc. These lists do change and generally we have no idea when they will change but we do know that they change quite infrequently. Nearly-static data-sets are very effective targets for long-term caching.
In general, the way we cache long-lasting data-sets in HTTP is different from the way we do it with dynamic data-sets.
Caching Dynamic Data with HTTP
If you have had an opportunity to hear Mike Amundsen talk about distributed systems architecture, you may already know that in distributed systems deployed over large geographic areas (e.g. the Web) you cannot rely on the existence of a shared understanding of “now” (or time in general). This, among other things, has to do with basic physics—information cannot propagate instantaneously due to the limit on speed of light. For instance, if a server in Chicago, at 11:55:00am local time, tells a client in Melbourne that a response is valid until 11:55:02am Chicago time:
- We will have to be sure that timezone conversions are done properly by every participant of that exchange.
- We will need to assume that the clocks on the Chicago server and the Melbourne client are perfectly synchronized (generally, a pipe dream).
- For the client to leverage the cache, a response from Chicago server to Melbourne client will need to arrive in less than two seconds, otherwise the cache will already be invalid by the time the response is received. Considering the distance between Chicago and Melbourne, the theoretical limitation of the speed of light and the actual speed of data transmission on the public Web (which is much slower), this goal may be unattainable.
In distributed systems, deployed at large distances (such as the Web), the above assumptions are so unrealistic that using date-based caching instructions, such as the Expires header, is highly ineffective. The same is true for the combination of Last-Modified and If-Modified-Since headers, which also rely on a shared understanding of date-time.
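One way to sidestep the shared-clock problem is a relative lifetime such as Cache-Control: max-age, which is expressed as a number of seconds rather than a wall-clock timestamp. The following is a minimal sketch, assuming hypothetical helper names and a deliberately simplified freshness check that ignores the Age header and network transit time.

```python
def dynamic_cache_headers(max_age_seconds: int) -> dict[str, str]:
    """Headers for a short-lived response, using a relative lifetime.

    max-age is a number of seconds, not a wall-clock timestamp, so the
    client does not need a clock that agrees with the server's.
    """
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must be non-negative")
    return {"Cache-Control": f"max-age={max_age_seconds}"}


def is_fresh(received_at: float, max_age_seconds: int, now: float) -> bool:
    """Client-side freshness check that uses only the client's own clock.

    received_at and now are readings of the same local monotonic clock
    (for example time.monotonic()). Simplification: the Age header and
    network transit time are ignored.
    """
    return (now - received_at) < max_age_seconds
```

Because both timestamps come from the same local clock, the Chicago server and the Melbourne client never have to agree on what time it is.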
Caching Near-Static Data Sets
If you are caching resources (API responses) for sufficiently long periods of time (hours, days, potentially months) you usually do not have to worry about the issues related to date-time-based caching, which were described in the previous section.
To cache near-static data, where we cannot predict when the data will become stale but know that it will change eventually, though not soon, you can use one of two approaches:
- Entity Tags (ETags) that don’t rely on shared agreement on time
- The Last-Modified HTTP header, which is date-time-centric
Let’s see how each of these works:
Using Entity Tags
In this workflow, for each response, the server provides an ETag header in the response. For a specific “version” of the data, ETag has to be unique and constant, until the data changes:
    HTTP/1.1 200 OK
    Content-Type: application/vnd.uber+json; charset=UTF-8
    Expires: Sat, 01 Jan 1970 00:00:00 GMT
    Pragma: no-cache
    ETag: "88d979a0a78942b5bda05ace4214556a"

    … the rest of the response …
In a number of implementations, ETag is some sort of a hash of the response payload but it can really be anything as long as it’s unique and consistent with the change of data (same response = same ETag).
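As an illustration of the hash-based approach, here is a minimal sketch. The function name is hypothetical, and SHA-256 is an arbitrary choice for this example, not something the API above is known to use.

```python
import hashlib


def make_etag(payload: bytes) -> str:
    """Return an ETag derived from the exact bytes of the response body."""
    digest = hashlib.sha256(payload).hexdigest()
    # Entity tags are quoted strings on the wire, e.g. ETag: "abc123".
    return f'"{digest}"'
```

The same bytes always produce the same tag, and any change to the payload produces a different one, which is the only property the workflow relies on.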
Important: Please note that while we are discussing ETags in the context of caching, the ETag HTTP header is not, technically, a “caching” header per se. It’s part of the RFC7232 – Conditional Requests specification, separate from the RFC7234 – HTTP/1.1 Caching spec. HTTP clients handle ETags in the response and cache instructions independently. This is why the example response above has Pragma: no-cache and the Expires header set in the distant past. That specific API wants you to use ETags for determining the freshness of the response but not cache headers. The two approaches can get in each other’s way through double caching, if they instruct the client with inconsistent hints. In general, you should either always use only one approach in your responses (explicitly disabling the other) or make absolutely sure that the two hints lead to the same result, regardless of which one the client is paying attention to or even if the client respects both instructions (“good” clients should).
The reason the example API decided to explicitly disable caching is that some clients (e.g. Web browsers) make some default assumptions about the cache-ability of content when no cache headers are present. To the best of my knowledge, no major HTTP client makes assumptions on ETags, so we typically don’t need to worry about undefined ETags.
Once the client receives the response and sees the ETag value, it should save the ETag and the response in the local store and issue subsequent requests to the same data with the If-None-Match header that points to the value of the ETag saved:
    GET /countries HTTP/1.1
    Host: api.example.org
    Accept: application/vnd.uber+json
    If-None-Match: "88d979a0a78942b5bda05ace4214556a"
If the data-set hasn’t been modified on the server, the server must respond with HTTP 304 and an empty body:
    HTTP/1.1 304 Not Modified
    Content-Type: application/vnd.uber+json
    Expires: Sat, 01 Jan 1970 00:00:00 GMT
    Pragma: no-cache
    ETag: "88d979a0a78942b5bda05ace4214556a"
If the data-set has been modified on the server, the server must respond with a full HTTP 200 response and the new value of ETag.
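The whole ETag exchange, server side and client side, can be sketched in a few functions. This is an illustration under stated assumptions rather than a reference implementation: all names are hypothetical, headers are plain dictionaries with exact-case keys, the ETag is a SHA-256 hash of the payload, and tags are assumed to contain no commas.

```python
import hashlib

Response = tuple[int, dict[str, str], bytes]


def make_etag(payload: bytes) -> str:
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison of an If-None-Match value against the current ETag."""
    if if_none_match.strip() == "*":
        return True

    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    # Simplification: assumes the tags themselves contain no commas.
    candidates = (opaque(tag) for tag in if_none_match.split(","))
    return opaque(current_etag) in candidates


def handle_get(request_headers: dict[str, str], payload: bytes) -> Response:
    """Server side: answer 304 if the client's copy is current, else 200."""
    etag = make_etag(payload)
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and etag_matches(if_none_match, etag):
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, payload


class EtagClientCache:
    """Client side: remember (ETag, body) per URL and revalidate with it."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, bytes]] = {}

    def request_headers(self, url: str) -> dict[str, str]:
        cached = self._store.get(url)
        return {"If-None-Match": cached[0]} if cached else {}

    def accept_response(self, url: str, response: Response) -> bytes:
        status, headers, body = response
        if status == 304:
            return self._store[url][1]  # reuse the stored body
        if "ETag" in headers:
            self._store[url] = (headers["ETag"], body)
        return body
```

In this sketch the server still has to produce the payload in order to hash it, so the saving is in bytes sent over the network. A real service might store a version identifier next to the data to avoid that work.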
Using Last-Modified
In this workflow, for each response, the server provides a “Last-Modified” header in the response, containing the last date the specific data was modified:
    HTTP/1.1 200 OK
    Last-Modified: Mon, 07 Dec 2015 15:29:14 GMT
    Content-Length: 23456
    Content-Type: application/vnd.uber+json; charset=UTF-8

    … the rest of the response …
Once the client receives the response and sees the Last-Modified header, it should save the value of the Last-Modified date-time and the corresponding response in the local store (cache). The client should issue subsequent requests to the same data with the If-Modified-Since header that points to the value of the date-time saved:
    GET /countries HTTP/1.1
    Host: api.example.org
    Accept: application/vnd.uber+json
    If-Modified-Since: Mon, 07 Dec 2015 15:29:14 GMT
If the data-set hasn’t been modified on the server since the date-time indicated, the server must respond with HTTP 304 and an empty body:
    HTTP/1.1 304 Not Modified
    Last-Modified: Mon, 07 Dec 2015 15:29:14 GMT
If the data-set has been modified on the server, the server must respond with a full HTTP 200 response and the new value of the “Last-Modified” header.
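The server side of the Last-Modified workflow can be sketched with Python's standard library, which can format and parse HTTP dates. The function names and the dictionary-based headers are assumptions made for this example. The sketch treats an unparseable If-Modified-Since value as if the header were absent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP date, e.g. for Last-Modified."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def handle_get(request_headers: dict[str, str], payload: bytes, last_modified: datetime):
    """Return (status, headers, body) for a GET with optional If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": http_date(last_modified)}

    raw = request_headers.get("If-Modified-Since")
    if raw is not None:
        try:
            since = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        # Only compare when the parsed value carries a timezone.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers, b""
    return 200, headers, payload
```

Note that the client simply echoes back the date-time it received, so the comparison happens entirely against the server's own record of when the data changed.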
One More Thing: The Vary Header
When using cache-controlling headers (such as Cache-Control: max-age and Expires), the communicating parties have to determine what constitutes “access to the same resource representation”. This is generally determined by asking: “Is the full request URL the same between two interactions?” However, depending on what you are doing, two requests with the same full URL may point to different resource representations because some of the headers in the exchange are different. Case in point: you don’t want a cache to serve a stored JSON representation when the client is asking for an XML representation at the same API endpoint. In these two cases, the URL is the same between the two interactions, but the expected content type of the payload differs because of the HTTP header values.
If the values of HTTP headers have to be taken into account during HTTP caching, you need to use a special HTTP header called Vary. This header tells the communicating parties which request headers they need to pay attention to, besides the URL, when making caching determinations.
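One way to picture what Vary does is to think of it as extending the cache key. The sketch below is an illustration only: the function name is hypothetical, and real caches normalize header values more carefully than this.

```python
def cache_key(url: str, request_headers: dict[str, str], vary: str = "") -> tuple:
    """Build a cache key from the URL plus the request headers named by Vary.

    vary is the value of the response's Vary header, e.g. "Accept" or
    "Accept, Accept-Language". An empty value means the URL alone is the key.
    """
    # Header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value.strip() for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    if "*" in names:
        # "Vary: *" signals that a stored response cannot be matched by headers.
        raise ValueError("Vary: * means the response cannot be matched from cache")
    return (url,) + tuple((name, normalised.get(name, "")) for name in names)
```

With Vary: Accept on the response, a request for application/json and a request for application/xml at the same URL produce different keys, so the JSON representation is never served to the client asking for XML.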
You can see an example of effective usage of the Vary header in my recent blog post where I discussed usage of the Prefer header.
(This post was originally published on my personal blog.)