Why every web developer should read the HTTP specifications

  • Patrick Zahnd

Last week, some fellow Liipers and I attended JSDay and PHPDay in Verona, Italy. We saw many interesting talks and took home lots of new ideas and knowledge.

In this blog post I want to share what I learned about HTTP caching and RESTful applications during the workshop by Fabien Potencier and the talk by David Zuelke.

HTTP caching

Workshop “Caching on the Edge” by Fabien Potencier –  phpday.it/2011/session/caching-edge

Fabien started his workshop with the question: “Who in this room has read the HTTP specifications?” As you might already guess, there were very few who could answer “Yes”. To be exact: nobody but himself.

He did not seem very surprised, but he was not pleased to hear people joking about it either. He meant the question very seriously, and over the next few hours we came to understand why.

Some facts about HTTP caching

HTTP/1.1 allows caching anything by default. That simply means that any HTTP response without a “Cache-Control” header may be cached.

In practice, most caching servers already avoid caching anything that involves “Cache-Control”, “Cookie” / “Set-Cookie”, “WWW-Authenticate”, POST / PUT requests and certain status codes. It is important to know that they don't have to.

Cache headers only work with “safe” HTTP methods like GET and HEAD. It's important to avoid changing the state of a server on a GET request. You should only change its state if you have to.

Cookies also prevent a page from being cached.
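The rules of thumb above can be sketched as a small check. This is a deliberately simplified heuristic, not an implementation of the full HTTP caching rules; the function name `shared_cache_may_store` and the exact set of conditions are assumptions made for illustration.

```python
# Simplified heuristic for illustration only; real caches follow many more rules.
SAFE_METHODS = {"GET", "HEAD"}


def shared_cache_may_store(method, request_headers, response_headers):
    """Return True if a shared cache (e.g. a reverse proxy) would plausibly store the response."""
    # Only "safe" methods are candidates for caching.
    if method.upper() not in SAFE_METHODS:
        return False

    # Header names are case-insensitive, so normalise them first.
    req = {name.lower(): value for name, value in request_headers.items()}
    res = {name.lower(): value for name, value in response_headers.items()}

    # Cookies usually mean the page is personalised, so caches skip it.
    if "cookie" in req or "set-cookie" in res:
        return False

    # Collect the directive names of Cache-Control, ignoring values like max-age=60.
    directives = {
        part.strip().split("=")[0].lower()
        for part in res.get("cache-control", "").split(",")
        if part.strip()
    }
    if directives & {"no-store", "private"}:
        return False

    # No Cache-Control header at all: caching is allowed by default.
    return True
```

For example, `shared_cache_may_store("GET", {}, {})` returns `True`, while the same call with a `Cookie` request header returns `False`.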

The cache headers

It's not always easy to decide which caching header to use in which case. Here are some important facts about caching headers that you should know:

  • The Expires header is unsafe to use, because there is no way to make sure that the clocks of all the machines a response passes through agree. You should usually prefer the Cache-Control header with the “max-age” directive.
  • If you use the “max-age” directive of the Cache-Control header, every cache on the way adds or updates an “Age” response header to account for the time the response has already spent in caches.
  • With both Expires and Cache-Control, a client can revalidate a stale copy by sending an “If-Modified-Since” request header, based on the “Last-Modified” date of the cached response.
  • To prevent reverse proxies from caching, you can add the “private” directive. With it, only the browser is allowed to cache. Conversely, the “s-maxage” directive sets a lifetime that applies only to shared caches such as proxies. This is used for Edge Side Includes (ESI).
  • To tag a version of a page with an identifier such as a hash, you can use the ETag header. A client can check it by sending the “If-None-Match” header in its request.
  • Pages that have not been modified can respond with the status code 304 “Not Modified”.
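To make the interplay of max-age, Age, ETag and 304 more concrete, here is a minimal sketch of the validation logic. The helper names `make_etag`, `is_fresh` and `respond` are made up for this example, and the sketch assumes the request carries a single ETag (real requests may send a list or `*`).

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Derive the tag from the content; ETag values are quoted strings.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def is_fresh(max_age: int, age: int) -> bool:
    # A cached copy is fresh while its Age is below max-age (both in seconds).
    return age < max_age


def respond(body: bytes, request_headers: dict, max_age: int = 60):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is still valid: answer without a body.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"hello", {})
revalidated, _, _ = respond(b"hello", {"If-None-Match": headers["ETag"]})
print(status, revalidated)  # prints: 200 304
```

The point of the 304 answer is that only headers travel over the wire; the client keeps using the body it already has.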

As we use PHP, we should be aware that PHP by default automatically adds a “Cache-Control: no-store, no-cache” header whenever a session is started (session_start()).

Edge side includes (ESI)

Edge side includes make it possible to cache individual parts of a page separately. You can also set up separate cache controls for each include.

ESI relies on two headers: the reverse proxy sends a “Surrogate-Capability” request header to identify itself and announce that it understands ESI, and the origin server answers with a “Surrogate-Control” response header telling the proxy to process the ESI tags.

You can then use the ESI-tags as follows:

<esi:include src="/path/to/content" />

You can find more information about ESI at  www.w3.org/TR/esi-lang
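To illustrate what a reverse proxy does with such a tag, here is a toy sketch that replaces every include with a separately fetched fragment. It only handles the self-closing form shown above; the `render_esi` function and the in-memory `fragments` dictionary are assumptions for the example and have nothing to do with how Varnish implements ESI.

```python
import re

# Matches only the simple self-closing form: <esi:include src="..." />
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def render_esi(page: str, fetch_fragment) -> str:
    """Replace each ESI include with the fragment returned by fetch_fragment(src)."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), page)


# Hypothetical fragment store standing in for separately cached sub-responses.
fragments = {"/path/to/content": "<p>cached fragment</p>"}
page = '<div><esi:include src="/path/to/content" /></div>'

print(render_esi(page, fragments.__getitem__))
# prints: <div><p>cached fragment</p></div>
```

Because each fragment is fetched on its own, it can carry its own cache lifetime, independent of the page around it.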

Several reverse proxies support edge side includes, but the most common one at this time is Varnish. If you are interested in reading more about Varnish, you can check out the project page:  www.varnish-cache.org

RESTful Applications

Talk “Designing HTTP Interfaces and RESTful Web Services” by David Zuelke –  joind.in/3013

APIs are another topic in most of our projects. I attended this talk because I have already started to implement the internal API for the Migipedia project. It was very interesting to see what RESTful web services are supposed to look like.

HTTP interface design

To start, here are some bad URLs to look at; they will serve as examples to correct later on.

  • liip.ch/api/1.0/products.format
  • liip.ch/api/1.0/product/show/333.format
  • liip.ch/api/1.0/products/filter/chocolate/sort/desc.format
  • liip.ch/api/1.0/photos/filter/product/333.format
  • liip.ch/api/1.0/photo/show/1234.format

REST – defined by Roy Thomas Fielding

To answer what exactly could be done better, you have to know the definition of REST. The REST approach could be used anywhere; it just happens to match HTTP very well. The constraints REST is built on:

  • There has to be a Client / Server connection
  • It must be stateless
  • The pages should be cacheable
  • It has to be a layered system
  • The interface must be uniform
    • A URL identifies a resource
    • Sub URLs are sub resources (HTTP specification)
    • URLs have an implicit hierarchy
    • Methods perform operations on resources
    • Operation is implicit and is not part of the URL
    • Hypermedia formats are used to represent the data
    • Link relations are used to navigate a service

URL problems solved

liip.ch/api/1.0/products/show.format

  • Versions should be handled through the media type (hypermedia format) that the client requests
  • Formats can likewise be requested through the media type in the Accept header
  • The actions (show, add) should be expressed through HTTP methods (GET, POST)

Solution:

liip.ch/api/products

Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml

Methods: GET, POST
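On the server side, the version and format then have to be read from the Accept header. The following is a minimal sketch under the assumption that all media types follow the `vnd.ch.liip.api.vN+format` pattern used above; the `parse_accept` function is hypothetical and ignores quality values (`q=`) for brevity.

```python
import re

# Pattern for the vendor media types used in this post, e.g.
# application/vnd.ch.liip.api.v1+xml -> version 1, format "xml"
MEDIA_TYPE = re.compile(r"^(?:text|application)/vnd\.ch\.liip\.api\.v(\d+)\+(\w+)$")


def parse_accept(accept: str):
    """Return (version, format) pairs in the order the client listed them."""
    result = []
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()  # drop parameters such as q=0.8
        match = MEDIA_TYPE.match(media_type)
        if match:
            result.append((int(match.group(1)), match.group(2)))
    return result


header = "text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml"
print(parse_accept(header))
# prints: [(1, 'html'), (1, 'xml')]
```

This keeps the URL stable across versions: a new API version only introduces a new media type.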

liip.ch/api/1.0/product/show/333.format

  • The actions (show, delete) should be accessed by using methods (GET, DELETE)
  • Resource names should always be plural (the name should not change between single and multiple entries)

Solution:

liip.ch/api/products/333

Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml

Methods: GET, DELETE, …

liip.ch/api/products/filter/chocolate/sort/desc

  • Filters should be passed as GET (query string) parameters

Solution:

liip.ch/api/products?filter=chocolate&sort=desc

Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml

Method: GET
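With filters in the query string, standard URL tooling can build and read them. A small sketch using Python's standard library; the helper names `build_url` and `read_filters` are made up for the example, and the `http://` scheme is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base: str, **params) -> str:
    # urlencode takes care of escaping special characters in the values.
    return base + "?" + urlencode(params)


def read_filters(url: str) -> dict:
    # parse_qs returns a list per key; keep only the first value of each.
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


url = build_url("http://liip.ch/api/products", filter="chocolate", sort="desc")
print(url)                # prints: http://liip.ch/api/products?filter=chocolate&sort=desc
print(read_filters(url))  # prints: {'filter': 'chocolate', 'sort': 'desc'}
```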

liip.ch/api/photos/filter/product/333

  • Sub resources have to be sub URLs
  • Retrieve (GET), add (POST) or erase (DELETE) sub resources by using the corresponding methods. Keep in mind, though, to protect your resources from being changed or deleted by unauthorized users

Solution:

liip.ch/api/products/333/photos

Accept: text/vnd.ch.liip.api.v1+html, application/vnd.ch.liip.api.v1+xml

Methods: GET, POST, DELETE
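Put together, the corrected URLs lead to a routing table in which the path names the resource and the HTTP method names the operation. The sketch below is only an illustration: the `ROUTES` table, the operation labels and the `dispatch` function are assumptions, and authorization is left out.

```python
import re

# Each route maps a resource URL pattern to the operations its methods perform.
ROUTES = [
    (re.compile(r"^/api/products$"),
     {"GET": "list products", "POST": "add product"}),
    (re.compile(r"^/api/products/(\d+)$"),
     {"GET": "show product", "DELETE": "delete product"}),
    (re.compile(r"^/api/products/(\d+)/photos$"),
     {"GET": "list photos", "POST": "add photo", "DELETE": "delete photos"}),
]


def dispatch(method: str, path: str):
    """Return (status, result) for a request; the operation never appears in the URL."""
    for pattern, operations in ROUTES:
        match = pattern.match(path)
        if match:
            if method not in operations:
                return 405, "Method Not Allowed"
            return 200, (operations[method], match.groups())
    return 404, "Not Found"


print(dispatch("GET", "/api/products/333/photos"))
# prints: (200, ('list photos', ('333',)))
```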

Uncommon HTTP methods

Besides the commonly used HTTP methods, there are a number of often forgotten but useful methods worth keeping in mind. To be brief, I picked the two that are, in my opinion, the most useful:

  • OPTIONS

If a client asks for information using this method, it should get the available HTTP methods for this specific location and user.

  • PATCH

This method signals a partial update of the given resource. This can come in handy for keeping a history of changes, and it reduces the amount of data to be sent.
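Both methods can be sketched in a few lines. The in-memory `products` store, the `ALLOWED` table and the `handle` function are hypothetical, the per-user part of OPTIONS is not modelled, and PATCH is shown as a simple dictionary merge, which is only one of several possible patch formats.

```python
# Hypothetical in-memory data standing in for a real backend.
ALLOWED = {"/api/products/333": ["GET", "PATCH", "DELETE"]}
products = {"/api/products/333": {"name": "Chocolate", "price": 2.5}}


def handle(method, path, body=None):
    """Return (status, headers, data) for a request on a product resource."""
    if path not in products:
        return 404, {}, None
    allow = {"Allow": ", ".join(ALLOWED[path])}
    if method == "OPTIONS":
        # Tell the client which methods are available for this location.
        return 204, allow, None
    if method not in ALLOWED[path]:
        return 405, allow, None
    if method == "PATCH":
        # Partial update: only the fields sent by the client are changed.
        products[path].update(body or {})
        return 200, {}, products[path]
    if method == "DELETE":
        del products[path]
        return 204, {}, None
    return 200, {}, products[path]  # GET


print(handle("OPTIONS", "/api/products/333")[1])
# prints: {'Allow': 'GET, PATCH, DELETE'}
print(handle("PATCH", "/api/products/333", {"price": 3.0})[2])
# prints: {'name': 'Chocolate', 'price': 3.0}
```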

Hyperlinks

There was another problem: most APIs on the web do not provide the very thing that makes the Web tick, namely hyperlinks.

Every response should contain hyperlinks to related or further contents. The consumer of the API should be able to find these pages by parsing the response. In an XML response for example, you can add related hyperlinks (XLinks) in the following format:

<atom:link rel="product" type="application/vnd.ch.liip.api.v1+xml" href="http://liip.ch/api/products/333" />
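On the consumer side, such links can be discovered by parsing the response instead of hard-coding URLs. A minimal sketch with Python's standard XML parser; the surrounding `<photo>` element and the `extract_links` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def extract_links(xml_text: str) -> dict:
    """Map each link relation (rel) to its target URL (href)."""
    root = ET.fromstring(xml_text)
    return {
        link.get("rel"): link.get("href")
        for link in root.iter("{%s}link" % ATOM_NS)
    }


# Hypothetical API response for a photo that links back to its product.
response = """<photo xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="product" type="application/vnd.ch.liip.api.v1+xml"
             href="http://liip.ch/api/products/333" />
</photo>"""

print(extract_links(response))
# prints: {'product': 'http://liip.ch/api/products/333'}
```

A client written this way follows the `rel` names and keeps working even if the URL structure behind them changes.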

Conclusion

As you can see, both topics have HTTP headers in common. As a web engineer, you need to understand those headers to create web applications that conform to the HTTP specifications. That's why you should at least read some parts of the HTTP specifications, for example those on headers and status codes.

I hope this post has brought these topics a bit closer to you and given you an idea of how to use HTTP headers for caching and what RESTful applications should look like. If you want to learn more, I recommend reading the following pages:

HTTP headers –  net.tutsplus.com/tutorials/other/http-headers-for-dummies

REST –  en.wikipedia.org/wiki/Representational_State_Transfer

XLinks –  en.wikipedia.org/wiki/XLink

Lovefilm.com API (suggestion by D. Zuelke)

Thank you very much for your attention.

