Going Crazy with Caching – HTTP Caching and Logged in Users

  • David Buchmann

HTTP caching is an efficient way to make your application scalable and achieve great response times under heavy load. The basic assumption of HTTP caching is that, at least for some time, the same web request will lead to an identical response. As long as “same” simply means the same domain name and path, you will get many cache hits. When users are logged in, we have the opposite situation, where potentially everybody sees different content. Let's take a closer look to see where we can still find safe uses for HTTP caching, even with logged-in users.

Controlling the HTTP Cache Behaviour

An HTTP request is not only the URL, but also the headers. Some headers are only used for statistics or are not relevant to your application. But for some web applications, certain headers matter: the Accept-Language header can be used to decide on the content language, or, when building an API, the Accept header can be used to choose whether to encode the answer in JSON or XML.

HTTP responses can use the Vary header to declare which request headers lead to distinct responses on the same URL. An HTTP cache uses Vary to keep the variants of the same URL apart. This works well when there are few variants – you will still get frequent cache hits. However, if every request comes with a different header value, caching on the server side no longer makes sense. There is no benefit in storing responses in the cache that will rarely be reused. Even worse, this wastes resources that should be used for caching relevant data.
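To make this concrete, here is a minimal sketch – plain JavaScript, not actual proxy code – of how a cache might build its storage key from the URL plus the request headers named in the response's Vary header:

```javascript
// Minimal model of Vary handling: the cache key is the URL plus the
// values of every request header listed in the response's Vary header.
// Illustration only; real proxies are far more sophisticated.
function cacheKey(url, requestHeaders, varyHeader) {
  const varied = (varyHeader || '')
    .split(',')
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0)
    .sort()
    .map((h) => `${h}=${requestHeaders[h] || ''}`);
  return `${url}|${varied.join('|')}`;
}

// With "Vary: Accept-Language", requests that differ only in that header
// get distinct cache entries:
const german = cacheKey('/news', { 'accept-language': 'de' }, 'Accept-Language');
const english = cacheKey('/news', { 'accept-language': 'en' }, 'Accept-Language');
// Without a Vary header, both requests share a single cache entry.
```

If the varied header takes a different value on almost every request – a session cookie, for example – every key is unique and the cache never gets a hit.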

For this reason, caching proxies like Varnish will, by default, not attempt any caching as soon as an Authorization or Cookie header is present in the request. Cookies are commonly used to track a session in the application, meaning the user might see a personalized page that cannot be shared with any other user. If you force caching with cookies and have your application send a Vary: Cookie header, you end up in the situation described above, where you get no value out of your cache.
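This default behaviour can be modelled as a simple predicate – a rough JavaScript sketch of what the built-in request handling of a proxy like Varnish effectively decides (simplified; the real logic also considers the request method and more):

```javascript
// Simplified model of the default rule in caching proxies such as Varnish:
// any request carrying credentials bypasses the cache.
function isCacheableByDefault(requestHeaders) {
  const names = Object.keys(requestHeaders).map((h) => h.toLowerCase());
  return !names.includes('cookie') && !names.includes('authorization');
}
```

Everything in the rest of this article is, in one way or another, about getting requests past this rule safely.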

The rest of this article digs into various ways to still get some HTTP caching:

  • Avoid session cookies and remove them when no longer needed
  • Delegate to the frontend: “cheating” with JavaScript
  • Different cache rules for different parts of the page
  • User context: cache by permission group

Avoiding Logins

If your application does not provide any user-specific content, avoid setting a session cookie (like PHPSESSID for PHP). If only a few pages need a session, remove the cookie afterwards, or limit it to the relevant sub-path. Note that frontend tools like Google Analytics set cookies in JavaScript and use them to track the user over page reloads. For this reason, you need to make your caching proxy remove all cookies that are not relevant to the backend. The Varnish documentation shows VCL code for doing exactly that.
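The Varnish docs do this with a regular expression in VCL; the same idea can be sketched in JavaScript, assuming PHPSESSID is the only cookie the backend actually cares about:

```javascript
// Keep only the cookies the backend needs (here: PHPSESSID) and drop
// frontend-only cookies such as Google Analytics' _ga / _gid.
// Hypothetical helper mirroring the cookie cleanup from the Varnish docs.
function stripFrontendCookies(cookieHeader, keep = ['PHPSESSID']) {
  const kept = (cookieHeader || '')
    .split(';')
    .map((c) => c.trim())
    .filter((c) => keep.includes(c.split('=')[0]));
  // An empty result means: remove the Cookie header entirely,
  // so the request becomes cacheable again.
  return kept.join('; ');
}

stripFrontendCookies('_ga=GA1.2.123; PHPSESSID=abc123; _gid=x');
// → 'PHPSESSID=abc123'
```

The important detail is the empty-result case: once no backend-relevant cookie remains, the header should be dropped altogether, restoring the default cacheable behaviour.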

In a similar vein, if you build an API that is password protected with the Authorization header but returns the exact same output for all consumers, you can move the authorization check to the caching proxy and then remove the Authorization header to allow caching. For Varnish, there is a vmod to use an Apache htpasswd file within Varnish.

Delegate to the Frontend

Some websites have only a small user-specific part, showing the logged-in user's name and maybe a dialog to log out or go to the personal profile page. In this case, you can have the application send a cookie with the user name and profile link to the browser, and build some JavaScript into the website that alters the page on every load to display that user's name and update the profile link. On the HTTP cache, you discard cookies everywhere except on the profile, login and logout pages, which won't be cached. Suddenly, most of your website is cacheable.
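A sketch of the frontend part, assuming the application sets a cookie named `username` (the cookie name and element id are invented for illustration):

```javascript
// Read a cookie value by name; returns null if the cookie is not set.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(';')) {
    const [key, ...rest] = part.trim().split('=');
    if (key === name) return decodeURIComponent(rest.join('='));
  }
  return null;
}

// In the browser, personalize the (cached, anonymous) page after load.
// Guarded so the helper above also works outside a browser.
if (typeof document !== 'undefined') {
  const user = readCookie(document.cookie, 'username');
  if (user) {
    document.getElementById('greeting').textContent = 'Hello ' + user;
  }
}
```

The page the cache serves is identical for everyone; only the browser makes it personal.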

Different Cache Rules for Different Parts

If you need different cache rules for different parts of a website, you can use Ajax on the client to have the browser assemble the site from multiple requests. Each request can have its own caching rules (e.g. whether to vary on the Cookie header, but also the cache lifetime).

Multiple requests involve overhead, especially on slow connections such as mobile devices. Therefore, it makes sense to aggregate the different parts on the server side. The technology for this is Edge Side Includes (ESI), which lets the application tell the caching proxy how to assemble the page. Using ESI, it can even make sense to cache the login-information fragment with a Vary: Cookie header so that it can be embedded into every page for that user.
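What the proxy does with ESI can be modelled as a simple substitution – a toy JavaScript sketch, assuming each fragment is fetched (and cached) with its own rules:

```javascript
// Toy model of ESI assembly: the proxy replaces each <esi:include>
// tag with the fragment fetched for its src URL. In reality, every
// fragment request has its own cache lifetime and Vary behaviour.
function assembleEsi(page, fetchFragment) {
  return page.replace(/<esi:include src="([^"]+)"\s*\/>/g, (_match, src) =>
    fetchFragment(src)
  );
}

// The page body can be cached for everyone, while the login-box
// fragment is cached per user via Vary: Cookie.
const page = '<body><esi:include src="/fragments/login-box"/><p>News</p></body>';
```

The outer page stays fully shared; only the small includes carry the per-user cost.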

User Context: Cache by Group

Sometimes, the cached content does not actually differ for each individual user. Take for example a newspaper site with a subscription model: as an anonymous user, or without an active subscription, you can see the front page and read a few selected public articles. If you are logged in and have an active subscription, however, you may read all articles. In this simplified example, the website has only two variants: “guest” and “subscriber”. But on the caching proxy, we can't tell the two apart from the request alone. A guest might be logged in but have no active subscription. (Note: even if you could only log in with an active subscription, the caching proxy has no way of telling whether a session is valid. Simply requiring a session cookie would be very weak security.)

In this scenario, you can have the caching proxy make an initial request to obtain a hash for the context of the current user. This context hash is built by the web application from the permissions and preferences of the logged-in user. The caching proxy then adds the context hash header to the original request and forwards that request to the application. The application sees the credentials of the actual user and builds the answer. If the answer depends only on the context but not on the individual user, it can be sent with a Vary header on the context hash. This means the permission checks within the application do not need to be rewritten. The response can be cached, and if the context hash of a different user matches, that user gets a cache hit. The context hash lookup itself is another place where caching and varying on the Cookie / Authorization headers makes sense.

I gave a talk on this topic at PHP Benelux in Antwerp at the end of January. The slides are online if you want to have a look at them. I am happy to help you with tricky caching setups with Varnish – contact me to discuss conditions and rates.
