In the Hypertext Transfer Protocol (HTTP) specification, content negotiation is the mechanism used to serve the best-suited content to the user when several equivalent versions of the content are available for a given URI. The best-suited content is determined through one of three mechanisms:
- specific HTTP headers sent by the client (server-driven negotiation)
- the 300 Multiple Choices or 406 Not Acceptable HTTP response codes sent by the server (agent-driven negotiation)
- a cache (transparent negotiation)
In server-driven negotiation, the browser (or any other kind of agent) sends several HTTP headers along with the URI. These headers describe the user's preferences. The server uses them as hints, and an internal algorithm lets it choose the best content to serve to the client. The algorithm is server-specific and not defined in the standard. See, for example, the Apache 2.2 negotiation algorithm.
The HTTP/1.1 standard gives an exhaustive list of the standard headers that may be used in a server-driven negotiation algorithm (Accept, Accept-Charset, Accept-Encoding, Accept-Language and User-Agent). Nevertheless, it allows the server to use other aspects in its algorithm: either aspects outside the request itself or extension header fields, i.e., headers not defined in the HTTP/1.1 standard.
The Accept: header
Defined in the HTTP/1.1 standard, section 14.1, the Accept: header lists the MIME types of the media that the agent is willing to process. Its value is a comma-separated list of MIME types, each optionally combined with a quality factor, a parameter giving the relative degree of preference between the different MIME types.
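The comma-separated, quality-weighted format described above can be illustrated with a small parser. This is a minimal sketch, not the algorithm of any particular server: the function name parse_accept is hypothetical, a malformed quality factor is treated as 0 by assumption, and the precedence of more specific media ranges over wildcards is deliberately ignored.

```python
from typing import List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header value into (media_type, q) pairs, best first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # the quality factor defaults to 1 when it is absent
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not acceptable"
        entries.append((media_type.lower(), q, position))
    # Highest q first; ties keep the order in which the client listed them.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(media_type, q) for media_type, q, _ in entries]


if __name__ == "__main__":
    print(parse_accept("text/html, application/xml;q=0.9, */*;q=0.8"))
```

A server could walk the returned list and serve the first MIME type it is able to produce.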
The Accept: header is defined by the browser, or any other user agent, and can vary according to the context, for example when fetching a document entered in the address bar or a resource linked via an <audio> element. Neither the HTTP standard nor the HTML one defines specific MIME types to use in specific contexts.
The Accept-Charset: header
Defined in the HTTP/1.1 standard, section 14.2, this header indicates to the server which character encodings the user agent understands. Traditionally, browsers set it to a different value for each locale, for example ISO-8859-1,utf-8;q=0.7,*;q=0.7 for a Western European locale.
Considering that:
- UTF-8 is now well supported by all relevant user agents,
- the presence of the header increases the configuration-based entropy exposed,
- the presence of the header increases the data transmitted with each request,
- almost no sites use the value of this header to choose content during the negotiation,
browsers have stopped sending this header with each request, starting with Internet Explorer 8, Safari 5, Opera 11 and Firefox 10. In the absence of Accept-Charset:, servers can simply assume that UTF-8 and the most common character sets are understood by the client.
The Accept-Encoding: header
Defined in the HTTP/1.1 standard, section 14.3, this header defines the acceptable content-encodings, mainly the supported compression methods. The following values are possible:
gzip||A format using the Lempel-Ziv coding (LZ77), with a 32-bit CRC. This is originally the format of the UNIX gzip program. The HTTP/1.1 standard also recommends that servers supporting this content-encoding recognize x-gzip as an alias, for compatibility purposes.||RFC 1952|
compress||A format using the Lempel-Ziv-Welch (LZW) algorithm. The value name was taken from the UNIX compress program, which implemented this algorithm. Like the compress program, which has disappeared from most UNIX distributions, this content-encoding is used by almost no browsers today, partly because of a patent issue (the patent expired in 2003).||HTTP/1.1|
deflate||Using the zlib structure (defined in RFC 1950), with the deflate compression algorithm (defined in RFC 1951).||RFC 1950 and RFC 1951|
identity||Indicates the identity function (i.e., no compression and no modification). This token, unless explicitly excluded, is always deemed acceptable.||HTTP/1.1|
br||A format using the Brotli algorithm.||RFC 7932|
*||This wildcard represents any content-encoding not explicitly specified in the header.||HTTP/1.1|
- IANA maintains a registry with the complete list of official content encodings. Non-standard ones can be used, but must be prefixed with x-.
- Two other content encodings, bzip and bzip2, are sometimes used, though they are not standard. They implement the algorithms used by the two UNIX programs of the same names. Note that the first one was discontinued due to patent licensing problems.
- As long as the identity value is not explicitly forbidden, by an identity;q=0 or a *;q=0 without another explicitly set value for identity, the server must never send back a 406 Not Acceptable error.
- Even if both the client and the server support the same compression algorithms, the server may choose not to compress the body of a response if the identity value is also acceptable. Two common cases lead to this:
- The data to be sent is already compressed and a second compression won't reduce the amount of data to transmit. This may be the case with some image formats;
- The server is overloaded and cannot afford the computational overhead induced by compression. Typically, Microsoft recommends not compressing if a server uses more than 80% of its computational power.
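The identity rules in the notes above can be sketched as a selection function. This is an illustration under stated assumptions, not the algorithm of any real server: the names parse_qvalues and choose_encoding are hypothetical, the server is assumed to prefer any acceptable compressed encoding over identity, and a missing header is handled conservatively by sending the body uncompressed. A return value of None stands for the only situation in which a 406 Not Acceptable response would be warranted.

```python
from typing import Dict, Optional, Sequence


def parse_qvalues(header: str) -> Dict[str, float]:
    """Map each token of a comma-separated header value to its q (default 1.0)."""
    weights = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # assumption: a malformed q means "not acceptable"
        weights[token] = q
    return weights


def choose_encoding(header: Optional[str], supported: Sequence[str]) -> Optional[str]:
    """Pick a content-encoding; `supported` is in the server's preference order."""
    weights = parse_qvalues(header or "")
    wildcard = weights.get("*")

    def q_of(coding: str) -> float:
        if coding in weights:
            return weights[coding]
        if wildcard is not None:
            return wildcard
        # identity stays acceptable unless excluded; other codings must be listed
        return 1.0 if coding == "identity" else 0.0

    best, best_q = None, 0.0
    for coding in supported:
        if coding != "identity" and q_of(coding) > best_q:
            best, best_q = coding, q_of(coding)
    if best is not None:
        return best
    return "identity" if q_of("identity") > 0 else None


if __name__ == "__main__":
    print(choose_encoding("gzip;q=0.8, br;q=0.2", ["br", "gzip"]))
```

A server that is overloaded, or that is sending already-compressed data, could simply call the function with an empty supported list so that only identity is considered.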
The Accept-Language: header
Defined in the HTTP/1.1 standard, section 14.4, this header is used to indicate the language preference of the user. Its default value depends on the language of the graphical interface, but most browsers allow setting different language preferences.
Each language in this header can carry a quality factor. From w3.org:
- Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1". For example:
Accept-Language: da, en-gb;q=0.8, en;q=0.7
- This header, especially when user-modified, greatly increases the configuration-based entropy and may be used in HTTP fingerprinting of the user.
- Site designers must not be overzealous in using language detection via this header, as it can lead to a poor user experience:
- They should always provide a way to override the server-chosen language, e.g., by providing small links near the top of the page. Most user agents provide a default value for the Accept-Language: header, adapted to the user interface language, and end users often do not modify it, either because they do not know how or because they are not able to, as in an Internet café for instance.
- Once a user has overridden the server-chosen language, a site should no longer use language detection and should stick with the explicitly-chosen language. In other words, only entry pages of a site should select the proper language using this header.
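The advice above can be sketched as a language selection function in which an explicit user choice always wins over the header. This is only an illustration: the name pick_language and its parameters are hypothetical, the override is assumed to come from something like a cookie set when the user clicks a language link, and a wildcard range is simply mapped to the site default.

```python
from typing import Optional, Sequence


def pick_language(accept_language: str, available: Sequence[str],
                  default: str, override: Optional[str] = None) -> str:
    """Choose a language from `available` for the given Accept-Language value."""
    # An explicit choice made earlier by the user beats header-based detection.
    if override is not None and override in available:
        return override
    ranges = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        name, _, value = params.partition("=")
        q = 1.0  # the quality value defaults to 1
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # assumption: a malformed q means "not acceptable"
        if tag and q > 0:
            ranges.append((-q, position, tag))
    # Try the ranges from most to least preferred.
    for _, _, tag in sorted(ranges):
        if tag == "*":
            return default  # assumption: "any language" maps to the site default
        for language in available:
            lowered = language.lower()
            # A range matches an identical tag or a tag it prefixes ("en" -> "en-gb").
            if lowered == tag or lowered.startswith(tag + "-"):
                return language
    return default


if __name__ == "__main__":
    print(pick_language("da, en-gb;q=0.8, en;q=0.7", ["en", "fr"], "fr"))
```

With the header from the example above and a site available in en and fr, the sketch selects en; once the user has picked fr through a link, passing override="fr" keeps that choice on every later page.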
The User-Agent: header
Defined in the HTTP/1.1 standard, section 14.43, this header identifies the browser sending the request. This string may contain a space-separated list of product tokens and comments.
A product token is a name followed by a ‘/’ and a version number, like Firefox/4.0.1. There may be as many of them as the user agent wants. A comment is a free string delimited by parentheses. Obviously, parentheses cannot be used in that string. The inner format of a comment is not defined by the standard, though several browsers put several tokens in it, separated by ‘;’.
- Though there are legitimate uses of this header for selecting content, it is considered bad practice to rely on it to determine which features the user agent supports. Prefer feature-oriented object detection instead.
- Consider the User-Agent: header as a hint only. It may be altered by third-party tools or by the user. If serving tailored content according to this header, always provide a way to manually switch to the alternative content.
- Do not expect the product tokens to appear in a specific order, or the format of the comments to be fixed; always split the string into comments and product tokens first, then split each product token into product name and version number. Always take into account that the format of a comment may change in the future by providing an adequate fallback case.
- A website should not send a 406 Not Acceptable error code based on the user agent string. It is better to send less suited content than no content at all (see W3C Blog).
- This article describes the current Gecko user-agent strings.
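The parsing order recommended above (comments and product tokens first, then name and version) could look like the following sketch. The function names parse_user_agent and product_version are hypothetical, the sample string is only an example of a Firefox-style value, and comments are assumed not to contain nested parentheses, as described earlier.

```python
import re
from typing import List, Optional, Tuple

# Either a parenthesized comment, or a product token "name" or "name/version".
_TOKEN = re.compile(r"\(([^()]*)\)|([^\s/()]+)(?:/([^\s()]+))?")


def parse_user_agent(value: str) -> Tuple[List[Tuple[str, Optional[str]]], List[List[str]]]:
    """Split a User-Agent value into product tokens and comments."""
    products = []
    comments = []
    for match in _TOKEN.finditer(value):
        comment, name, version = match.groups()
        if comment is not None:
            # Many browsers separate the parts of a comment with ';'.
            comments.append([part.strip() for part in comment.split(";") if part.strip()])
        else:
            products.append((name, version))
    return products, comments


def product_version(products: List[Tuple[str, Optional[str]]], name: str) -> Optional[str]:
    """Look a product up by name, whatever its position; None is the fallback."""
    for product, version in products:
        if product.lower() == name.lower():
            return version
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
    products, comments = parse_user_agent(ua)
    print(products, comments, product_version(products, "Firefox"))
```

Looking products up by name rather than by position, and returning None when nothing matches, leaves room for the fallback case that the notes above ask for.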
The Vary: response header
Unlike the previous Accept-*: headers, which are sent by the client, the Vary: HTTP header is sent by the web server in its response. It indicates the list of headers used by the server during the server-driven content negotiation phase. The header is needed to inform the cache of the decision criteria so that it can reproduce them, allowing the cache to be functional while preventing it from serving erroneous content to the user.
The special value of ‘*’ means that the server-driven content negotiation also uses information not conveyed in a header to choose the appropriate content.
The Vary: header was added in version 1.1 of HTTP and is necessary for caches to work appropriately. A cache, in order to work with server-driven content negotiation, needs to know which criteria were used by the server to select the transmitted content. That way, the cache can replay the algorithm and serve acceptable content directly, without further requests to the server. Obviously, the wildcard ‘*’ prevents caching from occurring, as the cache cannot know what element is behind it.
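One way a cache could use the Vary: header is to build its lookup key from the request headers that the header names. The sketch below assumes a hypothetical cache_key function and a plain dictionary of request headers; it compares header values literally, which is a simplification of what a real cache does.

```python
from typing import Mapping, Optional, Tuple


def cache_key(method: str, uri: str, request_headers: Mapping[str, str],
              vary: str) -> Optional[Tuple]:
    """Build a cache key from the request headers named in a Vary value.

    Returns None when the response cannot be matched by a cache (Vary: *).
    """
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None  # the selection used information the cache cannot see
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {key.lower(): value.strip() for key, value in request_headers.items()}
    selected = tuple((name, normalized.get(name, "")) for name in sorted(names))
    return (method.upper(), uri, selected)


if __name__ == "__main__":
    french = {"Accept-Language": "fr", "User-Agent": "ExampleBrowser/1.0"}
    print(cache_key("GET", "/doc", french, "Accept-Language"))
```

Two requests that differ only in headers not listed in Vary: produce the same key and can share a cached response, while a different Accept-Language value yields a different key.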
Server-driven negotiation suffers from a few downsides:
- It doesn’t scale well. There is one header per feature used in the negotiation. If one wants to use screen size, resolution or other dimensions, a new HTTP header must be created.
- The headers must be sent with every request. This is not too problematic with few headers, but as they multiply, the growing message size would decrease performance.
- The more headers are sent, the more entropy is exposed, allowing for better HTTP fingerprinting and raising the corresponding privacy concerns.
HTTP allowed another negotiation type from the start: agent-driven negotiation. In this negotiation, when facing an ambiguous request, the server sends back a page containing links to the available alternative resources. The user is presented with the resources and chooses the one to use.
A drawback of this approach is that one more request is needed in order to fetch the real resource, slowing the availability of the resource to the user.
Also note that the caching of the resource is trivial, as each resource has a different URI.
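The agent-driven exchange described above could be sketched as a function that builds the page of links returned with a 300 Multiple Choices status. Everything here is an assumption made for illustration: the name multiple_choices, the tuple it returns and the HTML layout are all hypothetical, and the URIs and labels in the usage example are made up.

```python
from html import escape
from typing import Dict, Sequence, Tuple


def multiple_choices(alternatives: Sequence[Tuple[str, str]]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) listing each alternative as a link."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(uri, quote=True), escape(label))
        for uri, label in alternatives
    )
    body = "<ul>\n{}\n</ul>".format(items)
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return 300, headers, body


if __name__ == "__main__":
    status, headers, body = multiple_choices(
        [("/doc.en.html", "English"), ("/doc.fr.html", "Français")]
    )
    print(status)
    print(body)
```

Because each alternative lives at its own URI, the resource the user finally picks can be cached like any other.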
- RFC 2616, section 12, Content Negotiation in Hypertext Transfer Protocol -- HTTP/1.1, 1999, The Internet Society.
- RFC 2616, section 14.1, Accept in Hypertext Transfer Protocol -- HTTP/1.1, 1999, The Internet Society.
- Content Negotiation in Apache HTTP Server version 2.2, retrieved on May 8th, 2011.
- Content negotiation in Wikipedia, retrieved on June 10th, 2010.