HTTP cache optimization

Written by Ramón Saquete

The cache headers of the HTTP protocol let you control how each type of file on a web page is cached in the user’s browser. Used correctly, they significantly improve performance for returning visitors to our site. Used incorrectly, we could end up indefinitely caching content and redirects that we actually want to be refreshed.

Operation of the HTTP cache

Caching means saving a duplicate of the original data so that it can be accessed faster the next time. The HTTP cache stores the files of a website in the browser or in an intermediate proxy cache, so that subsequent requests for the same files do not have to go to the origin server. Instead, they are either served from the browser cache (which is very fast because the request does not have to travel over the network) or retrieved from a proxy server (which in theory should be closer, or at least faster at serving files, than the origin server). This speeds up the retrieval of all files on subsequent page loads and saves server resources.

HTTP cache operation
Example of HTTP cache operation in a client browser. The first request reaches the server, which returns the complete file. When the browser receives it, it saves it in the cache. The second request is served directly from the browser cache, saving another request to the server.
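
The flow described above can be sketched in a few lines of Python. This is only a toy model, not how any real browser is implemented: the `BrowserCache` class, the `fake_origin` function and the 60-second lifetime are all hypothetical names and values chosen for illustration.

```python
import time


class BrowserCache:
    """Toy private cache: stores each body together with its expiry time."""

    def __init__(self, fetch_from_origin, clock=time.monotonic):
        self._fetch = fetch_from_origin   # callable(url) -> (body, max_age_seconds)
        self._clock = clock
        self._entries = {}                # url -> (body, expires_at)
        self.origin_requests = 0          # how many times we went to the server

    def get(self, url):
        entry = self._entries.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]               # fresh copy: no network request at all
        # Missing or expired: ask the origin server and store the result.
        self.origin_requests += 1
        body, max_age = self._fetch(url)
        self._entries[url] = (body, self._clock() + max_age)
        return body


def fake_origin(url):
    # Hypothetical origin server: every file may be reused for 60 seconds.
    return f"contents of {url}", 60


cache = BrowserCache(fake_origin)
first = cache.get("/style.css")    # goes to the server
second = cache.get("/style.css")   # served from the cache
```

After both calls, `cache.origin_requests` shows that the origin was contacted only once for the two requests.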

In an intermediate proxy, the same files are cached for multiple clients, so a file requested by one user may not need to be requested from the server again for another user. However, care must be taken not to cache the personal data of a logged-in user, as we do not want to show one user data that belongs to another. We can see how it works visually in the following image:

operation of a proxy cache
Example of proxy server operation. We see that 4 requests are made and the server only has to serve 1.
If the proxy is on the server side (we cannot reach the server without going through it), it is called a reverse proxy. If, on the other hand, it is on the client side (the client goes out to the Internet through the proxy), it is called a forward proxy.
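
As a minimal sketch of the shared-cache behaviour, the following Python model serves four users from one stored copy while refusing to store a personal page. The `ProxyCache` class, the `demo_origin` function, the URLs and the user names are hypothetical, and freshness is ignored to keep the example short.

```python
class ProxyCache:
    """Toy shared cache: one stored copy is reused for every client."""

    def __init__(self, origin):
        self._origin = origin        # callable(url, user) -> (body, is_private)
        self._shared = {}            # url -> body, visible to all users
        self.origin_requests = 0

    def get(self, url, user):
        if url in self._shared:
            return self._shared[url]
        self.origin_requests += 1
        body, is_private = self._origin(url, user)
        if not is_private:
            # Only responses without personal data may be shared.
            self._shared[url] = body
        return body


def demo_origin(url, user):
    # Hypothetical origin: /account is personal, everything else is shared.
    if url == "/account":
        return f"account page of {user}", True
    return f"contents of {url}", False


proxy = ProxyCache(demo_origin)
for user in ("ana", "ben", "carla", "dani"):
    proxy.get("/logo.png", user)
requests_for_logo = proxy.origin_requests
```

Here four clients request the same file and the origin is contacted once, while a request for `/account` would go to the origin for each user.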

Cache headers for server responses

The HTTP cache is configured with different parameters that tell the browser or intermediate proxy cache how the file should be cached or stored. These parameters appear in the headers of the HTTP response that carries the transmitted file.
Loading time is only saved if we have correctly configured the HTTP header parameters on the server. There are several ways to configure them: we can specify that the file does not need to be requested again from the server on a subsequent page load, or we can configure the parameters so that the server is asked whether the file has changed. If it has not, the server does not have to return the file; it only has to indicate that the file has not been modified, with a 304 response code.

Let’s look in depth at the headers available and the configuration options for each of them. If you are not interested in going deeper, go directly to the case studies further down.

Cache-control

It is used to define the type of cache policy to be used. It can be applied individually to each file, but it is normal to set one policy per file type. The values, or directives, that we are going to use in the responses we send from the server are the following:

  • max-age: is the maximum time in seconds that the response can be reused on the client without requesting it again from the server. For example “max-age=60” indicates that the file will be reused from the client cache for 1 minute, starting from the time the request was made. This policy applies to both proxies and clients.
  • s-maxage: same as above, but only applicable to intermediate proxies.
  • no-store: indicates that the response will never be stored on the client so it will always be downloaded from the server.
  • no-cache: indicates that the client must always check with the server whether its copy is up to date and, if it is not, download it again. With this value “max-age” is ignored. The logical thing would be to use this directive or the previous one (“no-store”), but not both, although you will often see pages that use both at the same time. When this happens, browsers apply the more restrictive “no-store” directive. Both are set because Internet Explorer 6 incorrectly treated “no-cache” as “no-store”, although this is no longer necessary now that this browser is deservedly extinct.
  • must-revalidate: indicates that once the response has expired, it must not be reused without first being revalidated with the server. This directive exists because the client can be configured to apply its own cache policy, such as keeping expired responses for a period of time, or it may serve expired responses if the server does not respond. With “must-revalidate” we tell the client to ignore its configuration and obey the parameters we return from the server. This policy applies to both clients and proxies.
  • proxy-revalidate: same as above but only for proxies.
  • public: the response can be cached on intermediate proxy servers as it does not contain private information of a specific user. Proxies usually assume that this is the default option.
  • private: contrary to the previous one, it indicates that the response contains private information and should not be stored in an intermediate proxy. We can indicate the value “public” or “private”, but it does not make sense to use both at the same time.
  • no-transform: an intermediate proxy must not transform the response, for example to optimize the compression of an image or minify a CSS file.
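
As a rough illustration of how a cache might read these directives, the following Python sketch parses a Cache-Control value and works out how long the response may be reused. The function names `parse_cache_control` and `freshness_lifetime` are hypothetical, and the parsing is simplified: it splits on commas, so quoted arguments that contain commas are not handled, and a non-numeric argument would raise an error.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "public") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(value, shared_cache):
    """Seconds the response may be reused without contacting the server."""
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d:
        return 0                      # never reused without asking the server
    if shared_cache and "private" in d:
        return 0                      # proxies must not reuse private responses
    if shared_cache and d.get("s-maxage") is not None:
        return int(d["s-maxage"])     # s-maxage only applies to proxies
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0
```

For example, `freshness_lifetime("max-age=60, s-maxage=600", shared_cache=True)` uses the proxy-specific value, while the same call with `shared_cache=False` falls back to max-age.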

There are more directives, but they are for the Cache-Control header sent by the client. The ones we are interested in here are the ones we have seen, since they are the ones the server can send.

ETag

It is called a validation token. It is usually a hash of the file (a hexadecimal number that changes when the file is modified). If the file has expired in the cache, or if the policy is set to always revalidate, the client sends the ETag value to the server in the “If-None-Match” header. If the server sees that it matches the value of its current version of the file, it sends a response with code 304 (not modified); otherwise, it sends the new file with a new ETag.
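
The exchange can be sketched in Python as follows. This is a simplified model under stated assumptions: `make_etag` and `respond` are hypothetical names, the ETag is derived from a truncated SHA-256 hash (servers are free to generate it differently), and If-None-Match is assumed to carry a single value rather than a list.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted validation token from the file contents."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's copy is still valid: no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First visit: the full file is returned together with its ETag.
status, headers, payload = respond(b"hello")
# Revalidation: the client sends back the ETag it stored.
status_again, _, payload_again = respond(b"hello", headers["ETag"])
```

If the file changes on the server, the hash changes, the comparison fails and the new file is returned with status 200 and a new ETag.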

Last-Modified

It must contain the date of the last modification of the file. When the file is already in the client’s cache, the “If-Modified-Since” header is sent in the request with the value of the last date received, in a similar way to the previous parameter. If the server has a later date, it returns the file; if not, it returns a 304 response.

The “no-cache”, “max-age” and “must-revalidate” policies depend on ETag or Last-Modified, since these parameters are needed to revalidate requests. At least one of them must be present, and both can be used at the same time. Without any form of validation, the complete response will be returned from the server every time the client revalidates.
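
The date-based check can be sketched with Python’s standard `email.utils` module, which knows how to read and write HTTP-style dates. The function name `respond_if_modified` is hypothetical, and the sketch assumes `last_modified` is a timezone-aware UTC datetime and that the client sends a well-formed date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified(last_modified: datetime, if_modified_since=None):
    """Return (status, headers) for a file last changed at `last_modified` (UTC)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        client_date = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= client_date:
            return 304, headers       # not modified since the client's copy
    return 200, headers               # newer on the server: send the file


modified = datetime(2017, 11, 14, 11, 44, 47, tzinfo=timezone.utc)
status, headers = respond_if_modified(modified)
revalidated, _ = respond_if_modified(modified, headers["Last-Modified"])
```

The first call models the initial download; the second models a revalidation in which the client echoes the date it received.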

Vary

Indicates whether the content should be cached separately depending on the value of some other header in the request. The values we use will depend on how the website has been programmed. For example, “Vary: User-Agent” indicates that if the User-Agent header of the client’s request is different from that of a previous request (this happens, for example, when we click on the “request desktop site” option in a mobile browser), then the browser must cache the returned result as a separate file (so we cache both the mobile and the desktop version). This is necessary for websites where the HTML generated on the server changes depending on the User-Agent.

As another example, if we use the value “Cookie”, we are saying that the response is different when the client sends a different cookie. This is usually applied when the page changes after a cookie is modified.

In this way we could indicate any parameter of the HTTP request header, except the method, since only requests using the GET method are susceptible to caching. In fact, if we make a POST, PUT or DELETE request to the URL of a cached file, the cached copy is invalidated and the request is sent to the server again.
With this header it is always recommended to use the “Accept-Encoding” value, to prevent intermediate proxies from returning files in a compression format that the client does not accept. For example, if a client that accepts the Brotli compression format requests a file, the file is returned in that format and stored in the cache. If the next client only accepts gzip, or no compression at all, and the proxy returns the cached Brotli version of the same file, an error will occur.
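
One way to picture what Vary does is to think of the cache key as the URL plus the values of the listed request headers. The Python sketch below follows that idea; `cache_key` is a hypothetical helper, it compares raw header values without normalizing them, and it does not handle the special value “Vary: *”.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    varying = tuple((n, lowered.get(n, "")) for n in names)
    return (url, varying)


brotli_client = {"Accept-Encoding": "br"}
gzip_client = {"Accept-Encoding": "gzip"}

key_br = cache_key("/app.js", brotli_client, "Accept-Encoding")
key_gzip = cache_key("/app.js", gzip_client, "Accept-Encoding")
```

With “Vary: Accept-Encoding” the two clients get different keys, so the Brotli copy is never handed to the gzip-only client. Without Vary, both requests would map to the same entry.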

Age

It is only set by proxies and indicates the time, in seconds, that the file has been in the proxy cache. This is necessary to control the time the file spends in cache, since that time should be counted from when the server sent it to the proxy and not from when the proxy sent it to the client.
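
In arithmetic terms, the time a client may still reuse a response is the “max-age” value minus the Age reported by the proxy. A minimal sketch, assuming a hypothetical helper named `remaining_freshness` and an Age header that contains a whole number of seconds:

```python
def remaining_freshness(max_age, age_header=None):
    """Seconds a response can still be reused after it leaves the proxy."""
    # A missing Age header means the response came straight from the origin.
    age = int(age_header) if age_header is not None else 0
    return max(0, max_age - age)


# The file has already spent 45 of its 60 seconds in the proxy.
seconds_left = remaining_freshness(60, "45")
```

If the Age is greater than max-age, the result is clamped to zero, which means the copy is already expired when the client receives it.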

Cache headers for the old HTTP/1.0 protocol

There are probably no longer any clients that only support this protocol; it is normal to find HTTP/1.1 or HTTP/2, so these headers are no longer necessary. We only need to know what they mean in case we encounter them.

Expires

This is the old version of the “Cache-Control” header, and it only lets you set the date on which the response expires. It is not the same as the “max-age” directive of “Cache-Control”, which sets a duration counted from the time the request was made; if “max-age” also appears, the value of “Expires” is ignored. By setting a date in the past, we are saying that the response should not be cached.
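
The precedence between the two can be sketched as follows. `lifetime_seconds` is a hypothetical helper that assumes well-formed Date and Expires values (a malformed date would raise an error) and measures the Expires lifetime from the response’s Date header.

```python
from email.utils import parsedate_to_datetime


def lifetime_seconds(headers):
    """Freshness lifetime from response headers; max-age wins over Expires."""
    cache_control = headers.get("Cache-Control", "")
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return int(arg)           # Expires is ignored when max-age is present
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        sent = parsedate_to_datetime(headers["Date"])
        # A date in the past gives a lifetime of zero: do not reuse.
        return max(0, int((expires - sent).total_seconds()))
    return 0


old_style = {
    "Date": "Tue, 14 Nov 2017 11:44:47 GMT",
    "Expires": "Tue, 14 Nov 2017 11:49:47 GMT",
}
from_expires = lifetime_seconds(old_style)
from_max_age = lifetime_seconds({**old_style, "Cache-Control": "public, max-age=60"})
```

In the first call only Expires is available, so the lifetime is the five minutes between the two dates; in the second, max-age takes over.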

Pragma

This header is used with the value “no-cache” to indicate that the response should not be cached in the browser, in the same way as “Cache-Control” with the value “no-store”.

Case studies: which cache headers to apply to which file types?

Below are the optimal parameters for the most common cases. To adjust them to a specific case, it is best to review all the options described in the previous section, especially the Vary header.

Dynamic files

These are the HTML files that we want to be updated as soon as they change, so the header that should be set for these files would be:

Cache-Control: no-cache, must-revalidate
Vary: Accept-Encoding
ETag: "5a0ad72f-1396"
Last-Modified: Tue, 14 Nov 2017 11:44:47 GMT

Within Cache-Control, we will include the public or private directive depending on whether the response can be shared among several users or not, respectively.
Another option would be to cache these files for a very limited time, even if they have changed, such as 5 seconds:

Cache-Control: max-age=5, must-revalidate

This solution is less aggressive and would save resources in situations where the user reloads the page as soon as it loads.

Static files

Images (including the favicon), JavaScript files, CSS and fonts should be cached for as long as possible, as they are resources that rarely change. If we want to force any of these files to update, we can change its name and its reference in the HTML. There are build tools that automatically generate a version number or a hash in the file name and include it in the code. This is what makes it possible to cache for a long time and still synchronize the update of all caches of all files whenever we want. The technique is called fingerprinting when a hash is used, since the hash is a fingerprint of the file, and revving when a revision number is used.

When developers do not use an automatic tool to generate the file name and link it from the code, what they usually do instead of fingerprinting is add a query string to the URL (for example: style.css?v=1). This avoids having to rename the file and still refreshes the cache, but it is not the best option because some proxy servers do not cache URLs with a query string.
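
As a minimal sketch of fingerprinting, the following Python function inserts a short content hash before the file extension. The name `fingerprinted_name`, the use of SHA-256 and the 8-character digest are assumptions made for this example; real build tools have their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: style.css -> style.<hash>.css"""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_name = fingerprinted_name("css/style.css", b"body { color: black; }")
new_name = fingerprinted_name("css/style.css", b"body { color: navy; }")
```

Because the name depends only on the content, an unchanged file keeps its URL and stays cached, while any edit produces a new URL that no cache has seen before.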

The maximum time we should indicate is one year (31536000 seconds):

Cache-Control: public, max-age=31536000
Vary: Accept-Encoding
ETag: "5a0ad72f-1396"
Last-Modified: Tue, 14 Nov 2017 11:44:47 GMT

In this case we don’t care if the client keeps it in cache for longer than necessary, because if we want to update it we will change the name of the file and therefore it is not necessary to add the must-revalidate directive.

Files that should not be cached

This situation only occurs with files that contain confidential user information. We can use:

Cache-Control: no-cache, no-store, must-revalidate

In this case it is not necessary to indicate whether public or private storage is allowed, because we are saying that the response should not be cached at all.
As a matter of security, we must be stricter here and add no-cache to keep compatibility with IE6. It also does no harm to add the old “Expires” header with a date in the past and “Pragma: no-cache”.
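
The three cases above can be gathered in a small lookup, sketched here in Python. The category names (“dynamic”, “static”, “confidential”) and the `cache_headers` function are hypothetical, and the date used for Expires is just one arbitrary date in the past. ETag and Last-Modified are left out because they depend on each file.

```python
# Cache-Control values taken from the cases described in this section.
CACHE_POLICIES = {
    "dynamic": "no-cache, must-revalidate",
    "static": "public, max-age=31536000",
    "confidential": "no-cache, no-store, must-revalidate",
}


def cache_headers(kind: str) -> dict:
    """Return the response cache headers for one of the three file categories."""
    headers = {"Cache-Control": CACHE_POLICIES[kind]}
    if kind == "confidential":
        # Legacy headers for old clients: an expiry date in the past.
        headers["Pragma"] = "no-cache"
        headers["Expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    else:
        headers["Vary"] = "Accept-Encoding"
    return headers


html_headers = cache_headers("dynamic")
asset_headers = cache_headers("static")
```

A server or framework hook could call such a function once per response, choosing the category from the file type or the route.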

External domain files

If we include files from external domains, their cache parameters will not depend on our server, but on the server where they are hosted.
JavaScript files from Google Analytics, AdWords and other APIs are external domain files that should not be cached for long. It would be impractical for all owners of pages with these scripts to have to change the referenced file name in the HTML every time Google releases an update.
When there are scripts that are not cached or have short cache times, as in this case, automatic web optimization tools such as Google PageSpeed flag it as an issue to correct. However, the only way to correct it is to download the files and serve them from our own server, and it is better not to do that, because we would have a problem whenever Google updates the code. This is one of those cases where it is better to sacrifice performance in favor of maintainability and code reliability.

Avoid caching 301 redirects

In 301 and 308 redirects, since they are permanent, the Cache-Control parameters are also taken into account. So be careful with this, because if we set up a 301 redirect that is not correct and the client’s browser caches it for a year, there will be no way to force it to update. It is therefore recommended that 301 redirects are not cached (here we can simply use Cache-Control: no-store, must-revalidate). Besides, you never know if you will want to revert to the old URLs in the future, or after a migration gone wrong. If this happens and the old redirects are cached, when we set the new ones we could cause an infinite redirection loop in the user’s browser: it would go from one URL to another, which would send it back to the previous one, and so on until the browser gave an error.
302 and 307 redirects, being temporary, do not have this problem: browsers do not cache them unless the cache headers explicitly allow it.

Avoid caching 404 errors

404 errors are also cached, so, as with redirects, it is advisable to avoid this, because if at some point we want a URL that gives 404 to return results again, we do not want some clients to keep caching the error.
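
The advice in this section and the previous one can be expressed as a small rule keyed on the status code. This is only a sketch: `cache_control_for_status` is a hypothetical helper, including 308 (the other permanent redirect code) is our own addition, and reusing the same “no-store, must-revalidate” value for 404 responses is an assumption, since any policy that prevents long-term caching would serve.

```python
# Status codes whose responses we do not want clients to keep.
UNCACHED_STATUSES = (301, 308, 404)


def cache_control_for_status(status: int, default: str) -> str:
    """Pick the Cache-Control value for a response based on its status code."""
    if status in UNCACHED_STATUSES:
        return "no-store, must-revalidate"
    return default


redirect_policy = cache_control_for_status(301, "public, max-age=31536000")
normal_policy = cache_control_for_status(200, "public, max-age=31536000")
```

Applying the rule at a single point in the server makes it harder to ship a long-lived redirect or error page by accident.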

Conclusion

As we have already seen in other entries, such as CSS optimization or JavaScript optimization, WPO issues are complex. However, the HTTP cache is relatively easy to implement and improves performance considerably. The rest of the techniques, although complex, are necessary to ensure a good user experience, especially on mobile, so do not fail to apply them in your projects.

Ramón Saquete
Web developer and technical SEO consultant at Human Level. Graduated in Computer Engineering and Technical Engineering in Computer Systems. He is also a Technician in Computer Applications Development and later obtained the Pedagogical Aptitude Certification. Expert in WPO and indexability.
