Gzip compression from top to bottom

Written by Ramón Saquete

Gzip compression should be a basic requirement for any hosting company aiming to provide a good service. However, many hosts either do not enable it or do not configure their servers to get the best performance from it. It’s quite common to have to configure it ourselves, at domain level, via .htaccess, Web.Config or similar.

The technique consists of compressing the information on the server before sending it to the browser that requested it, which saves time transmitting the data over the network. To compress the information we can use the gzip format or, alternatively, the deflate format.

Gzip is an open compression format developed by the GNU project. Like most lossless compression algorithms, it reduces file size by replacing frequently repeated strings with shorter representations. More specifically, the gzip and deflate formats are based on the DEFLATE algorithm, which combines two techniques. The first is LZ77, published in 1977 by Abraham Lempel and Jacob Ziv, which replaces each repeated string with a short reference (a distance and a length) to an earlier occurrence of it. The second is Huffman coding, which encodes the resulting symbols as binary sequences that can be decoded unambiguously without separators and that are shorter for the most frequent symbols.
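As a rough illustration of why repeated strings matter, the following Python sketch uses the standard library `gzip` module to compress a repetitive HTML fragment and to check that decompression returns the identical bytes. The sample markup is made up for the example.

```python
import gzip

# Hypothetical sample: markup repeats the same tags over and over,
# which is exactly what this kind of compressor exploits.
rows = "".join(
    f"<tr><td class='name'>Item {i}</td><td class='price'>{i * 3} EUR</td></tr>\n"
    for i in range(200)
)
html = f"<table>\n{rows}</table>\n".encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"lossless:   {restored == html}")
```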

Why is it so important?

This is one of the easiest WPO (web performance optimization) techniques to apply: it doesn’t require us to modify our code, yet it provides a notable performance improvement, because it considerably reduces the amount of information each client has to download to view the page.

It’s especially important if we have embedded images in style sheets or HTML using Data URIs: encoding an image as Base64 text makes it about a third larger, and gzip compression brings it back close to its original size.
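The effect on Data URIs can be sketched in a few lines of Python. Pseudo-random bytes stand in for an already-compressed image here, which is an assumption made for the example; a real PNG or JPEG should behave in a similar way.

```python
import base64
import gzip
import random

# Stand-in for an already-compressed image: pseudo-random bytes
# (an assumption for the example, not a real image file).
rng = random.Random(0)
image_bytes = bytes(rng.getrandbits(8) for _ in range(30_000))

# Data URIs carry the image as Base64 text: 4 characters per 3 input bytes.
data_uri = b"data:image/png;base64," + base64.b64encode(image_bytes)

# Gzip the text as it would travel inside a CSS or HTML response.
gzipped_uri = gzip.compress(data_uri)

print(f"binary image:      {len(image_bytes)} bytes")
print(f"Base64 data URI:   {len(data_uri)} bytes")
print(f"data URI, gzipped: {len(gzipped_uri)} bytes")
```

The gzipped size should land close to the size of the binary image, because the Huffman stage removes most of the overhead that the text encoding added.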

How does it work?

To be able to use it, both the client and the server must support gzip compression. All compatible browsers (that is, every browser released in the last 10 years) include the following header in their HTTP/1.1 requests:

GET / HTTP/1.1
Host: www.humanlevel.com
Accept-Encoding: gzip, deflate

With this header, the client is telling the server: give me the home page for www.humanlevel.com, you can send it to me compressed with the gzip or deflate algorithms.

The server, upon seeing this request, can respond with compressed content or not. If it returns compressed content, it will include the following parameter in the response header:

Content-Encoding: gzip

With this parameter, the server is telling the client: I’m sending you the page compressed with gzip. This way, the client knows that it has to apply said algorithm to decompress it. When the header doesn’t appear, it means the content comes without compression.
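The exchange described above can be sketched in Python. This is a simplified model rather than a real Web server: the function names are hypothetical, and only the gzip case of Accept-Encoding is handled.

```python
import gzip


def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value allows gzip."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        # "gzip;q=0" explicitly refuses the encoding.
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def build_response(request_headers: dict, body: bytes):
    """Server side: return (headers, payload), compressing only if allowed."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if client_accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


def read_response(headers: dict, payload: bytes) -> bytes:
    """Client side: decompress only if Content-Encoding says gzip."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload
```

A request carrying `Accept-Encoding: gzip, deflate` gets a compressed payload plus `Content-Encoding: gzip`, while a request without the header gets the original bytes and no such header.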

In practice, compressed content almost always comes as gzip. The alternative is deflate, but it is rarely used: it offers no real gain, because gzip is simply a deflate stream with its own header and checksum, and every browser that decompresses deflate also decompresses gzip. In addition, servers and browsers have historically disagreed on whether deflate means a zlib-wrapped stream or a raw one, which makes gzip the safer choice.
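How closely the two formats are related can be checked with Python’s `zlib` module, where the `wbits` argument selects the container written around the compressed stream. In this sketch the same data is compressed three times with identical settings, changing only the container; the helper name and sample data are made up for the example.

```python
import zlib


def deflate_with_wrapper(data: bytes, wbits: int) -> bytes:
    """Compress with DEFLATE; wbits selects the container around the stream."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"body { margin: 0; padding: 0; } " * 100

raw = deflate_with_wrapper(data, -15)         # bare DEFLATE stream, no header
zlib_format = deflate_with_wrapper(data, 15)  # zlib container (HTTP "deflate")
gzip_format = deflate_with_wrapper(data, 31)  # gzip container (HTTP "gzip")

print(f"raw DEFLATE: {len(raw)} bytes")
print(f"zlib:        {len(zlib_format)} bytes")
print(f"gzip:        {len(gzip_format)} bytes")
```

The three outputs differ only by the header and trailer bytes that each container adds around the same compressed stream.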

For a time, Google Chrome also advertised Google’s own format, SDCH (Shared Dictionary Compression over HTTP), in Accept-Encoding. It never gained meaningful support in Web servers, even though it was proposed years ago, and Chrome has since removed it.

The server should additionally include the following parameter, together with the previous one:

Vary: Accept-Encoding

This header is aimed at intermediate proxy servers, and it solves the following situation. Imagine that a browser that doesn’t support gzip makes the first request, and the proxy stores the uncompressed page in its cache. All subsequent responses will be served uncompressed, even to browsers that asked for gzip. Conversely, if the first request comes from a browser that supports gzip, all subsequent responses will be served compressed, even to a browser that cannot decompress them and display the page to the user. With this header we instruct the proxy to cache a separate version for each distinct value of Accept-Encoding in the browser’s request, which solves the problem.
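A toy model of such a proxy cache, written in Python, shows the idea. The class and the `origin` function are hypothetical and leave out everything a real proxy does (expiry, case-insensitive header names, normalising header values); they only illustrate how the Vary header changes the cache key.

```python
import gzip


class TinyProxyCache:
    """Toy cache whose key includes the request headers named in Vary."""

    def __init__(self, origin):
        self.origin = origin    # callable(url, request_headers) -> (headers, body)
        self.vary_by_url = {}   # url -> header names listed in Vary
        self.entries = {}       # (url, varied header values) -> response

    def _key(self, url, request_headers):
        names = self.vary_by_url.get(url, ())
        return (url, tuple(request_headers.get(name, "") for name in names))

    def fetch(self, url, request_headers):
        key = self._key(url, request_headers)
        if key in self.entries:
            return self.entries[key]
        response = self.origin(url, request_headers)
        vary = response[0].get("Vary", "")
        self.vary_by_url[url] = tuple(n.strip() for n in vary.split(",") if n.strip())
        # Recompute the key now that the Vary header is known.
        self.entries[self._key(url, request_headers)] = response
        return response


def origin(url, request_headers):
    """Hypothetical origin server that compresses on request and sends Vary."""
    body = b"<html>hello</html>" * 20
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body
```

With the Vary header present, a client without gzip support and a client with it end up with separate cache entries, so neither receives the other’s version.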

While it’s true that gzip adds CPU cost (on the server to compress and on the client to decompress), the performance gained from faster transmission over the network is much more significant. That’s not only because the response weighs less, but also because it is split into fewer packets that have to be sent and reassembled at the destination.
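The packet count can be estimated with simple arithmetic. The sketch below assumes 1460 bytes of payload per TCP segment, a common figure on Ethernet links, and uses made-up page sizes; real networks vary.

```python
import math

# Assumption for the example: 1460 bytes of payload per TCP segment
# (a 1500-byte MTU minus 40 bytes of IP and TCP headers).
ASSUMED_MSS = 1460


def segments_needed(size_bytes: int, mss: int = ASSUMED_MSS) -> int:
    """Number of TCP segments needed to carry size_bytes of payload."""
    return math.ceil(size_bytes / mss)


page_size = 60_000        # hypothetical uncompressed HTML page
compressed_size = 20_000  # the same page at roughly one third of the size

print(segments_needed(page_size))        # 42
print(segments_needed(compressed_size))  # 14
```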

What should we keep in mind with regard to its configuration?

First, we must know what should go compressed and what shouldn’t, because if we compress certain files, we may lose in performance.

We should apply compression to files that aren’t already compressed: primarily HTML, JavaScript, CSS, XML and JSON, whether generated statically or dynamically. By doing this we’ll save between 60% and 70% of the size of these files on average.

If we attempt to compress files that are already compressed, we’ll slightly increase their size. This applies especially to images in JPG, GIF or PNG format, as well as to PDF files. On top of the CPU cost for the server to compress them and for the client to decompress them, the loading time will increase instead of decreasing. This is because there comes a point where information cannot be compressed any further: if we compress it again, all we do is make the file larger by adding the new header needed for decompression.
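The effect of compressing twice is easy to reproduce in Python. In this sketch the sample text is generated from a small made-up vocabulary, and the output of the first gzip pass plays the role of an already-compressed file.

```python
import gzip
import random

# Hypothetical sample text built from a small vocabulary.
rng = random.Random(7)
vocabulary = ["server", "client", "header", "request", "response", "cache",
              "browser", "compress", "gzip", "deflate", "proxy", "packet"]
text = " ".join(rng.choice(vocabulary) for _ in range(5_000)).encode("utf-8")

once = gzip.compress(text)   # first pass: large saving
twice = gzip.compress(once)  # second pass over already-compressed bytes

print(f"plain text:       {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The second pass finds nothing left to squeeze, so its output is expected to be slightly larger than its input.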

We should also consider the file size. It’s not worth compressing a file smaller than 1 KB, because it will most likely travel in a single packet anyway, so it takes the same time to transmit compressed or uncompressed, while the client and the server still pay the CPU cost of handling compressed data. It’s not easy to establish the minimum size for using compression, because it depends on the networks the response crosses to reach its destination. Generally speaking, compressing files below 1 or 2 KB tends to hurt performance.
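These rules can be condensed into a small decision function. The following Python sketch is only illustrative: the MIME type list and the 1 KB threshold are assumptions taken from the guidelines above, not values required by any server.

```python
# Assumed list of text-based types that benefit from compression.
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/css",
    "text/xml",
    "application/javascript",
    "application/x-javascript",
    "application/json",
    "application/xml",
}

# Assumed threshold; the right value depends on the network.
MIN_SIZE_BYTES = 1024


def should_compress(content_type: str, size_bytes: int) -> bool:
    """Compress only text-based types that are large enough to benefit."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in COMPRESSIBLE_TYPES and size_bytes >= MIN_SIZE_BYTES


print(should_compress("text/html; charset=utf-8", 20_000))  # True
print(should_compress("image/png", 20_000))                 # False
print(should_compress("text/css", 300))                     # False
```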

On some servers we can configure a cache of the compressed files, so that they don’t have to be compressed again on each request. When the original uncompressed file changes, the cached compressed copy is regenerated.
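The idea behind such a cache can be sketched in Python. The class below is hypothetical and keeps the compressed copies in memory, whereas real servers usually write them to disk; it recompresses a file only when its modification time or size changes.

```python
import gzip
import os


class CompressedFileCache:
    """Keeps one gzip copy per file and rebuilds it when the file changes."""

    def __init__(self):
        self._cache = {}       # path -> ((mtime_ns, size), gzip bytes)
        self.compressions = 0  # how many times a file was actually compressed

    def get(self, path: str) -> bytes:
        stat = os.stat(path)
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]   # file unchanged: reuse the compressed copy
        with open(path, "rb") as handle:
            compressed = gzip.compress(handle.read())
        self.compressions += 1
        self._cache[path] = (stamp, compressed)
        return compressed
```

Repeated calls to `get` for an unchanged file return the stored bytes without compressing again.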

Another interesting option we may find is the compression level: by spending more CPU on compression we can make the files weigh less and, the other way around, we can reduce the CPU cost at the expense of heavier files. The right setting depends on how much spare CPU the server has in each particular case.
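The trade-off can be measured with Python’s `gzip` module, which exposes the same 1 to 9 scale through its `compresslevel` argument. The sample text is made up, and the timings depend entirely on the machine, so this sketch only shows how to take the measurement.

```python
import gzip
import random
import time

# Hypothetical sample text built from a small vocabulary.
rng = random.Random(1)
vocabulary = ["server", "client", "header", "request", "response", "cache",
              "browser", "compress", "gzip", "deflate", "proxy", "packet"]
text = " ".join(rng.choice(vocabulary) for _ in range(20_000)).encode("utf-8")

sizes = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    sizes[level] = len(gzip.compress(text, compresslevel=level))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"level {level}: {sizes[level]} bytes in {elapsed_ms:.1f} ms")
```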

How can it be configured?

Let’s take a look at the most important parameters of the most common Web servers:

Apache 2.x

In Apache 2.x we have to activate the mod_deflate module (although it’s called deflate, it uses gzip) and set up the configuration as follows:

| Parameter | Action |
| --- | --- |
| SetOutputFilter DEFLATE | Enables compression. |
| AddOutputFilterByType DEFLATE text/html text/css application/x-javascript | Sets the MIME types of the files to be compressed. |
| DeflateCompressionLevel 5 | Sets the level of compression (between 1 and 9). |

Selecting the files to compress by MIME type is always better than selecting them by file extension, because compression then also applies to content that is generated dynamically.

Apache 1.3

In the older Apache 1.3, we must activate the mod_gzip module and configure it in the httpd.conf file (server-wide) or in an .htaccess file (at directory level):

| Parameter | Action |
| --- | --- |
| mod_gzip_on Yes | Activates the module. |
| mod_gzip_item_include file \.js$ | Activates compression for .js files. |
| mod_gzip_item_include mime ^text/html$ | Activates compression for files with the HTML MIME type. |
| mod_gzip_minimum_file_size | Sets the minimum file size to be compressed. |
| mod_gzip_can_negotiate Yes | Allows compressed files to be served. |
| mod_gzip_update_static Yes | Cached gzip files are updated automatically rather than manually. |

PHP

From PHP we can also enable compression with the following option of the php.ini file:

zlib.output_compression = On

If we have already enabled compression with an Apache module (the preferred option), we shouldn’t enable this one as well, because the two settings would conflict.

Internet Information Services 7

In IIS7 we have the following configuration screen, which can be accessed by clicking on “compression” within its general options:

[Screenshot: IIS7 compression configuration options]

Then, in the Web.Config file we must set the following values:

<system.webServer>
  <urlCompression doStaticCompression="true" doDynamicCompression="true" dynamicCompressionBeforeCache="true"/>
  <httpCompression directory="%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files">
    <scheme name="gzip" dll="%Windir%\system32\inetsrv\gzip.dll"/>
    <dynamicTypes>
      <add mimeType="text/*" enabled="true"/>
      <add mimeType="message/*" enabled="true"/>
      <add mimeType="application/javascript" enabled="true"/>
      <add mimeType="*/*" enabled="false"/>
    </dynamicTypes>
    <staticTypes>
      <add mimeType="text/*" enabled="true"/>
      <add mimeType="message/*" enabled="true"/>
      <add mimeType="application/javascript" enabled="true"/>
      <add mimeType="*/*" enabled="false"/>
    </staticTypes>
  </httpCompression>
</system.webServer>

The “doStaticCompression” and “doDynamicCompression” attributes enable compression of static and dynamic content respectively.
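Whichever server is used, the result can be checked by requesting a page the way a gzip-capable browser would and inspecting the response headers. The following Python sketch uses only the standard library; the function names are hypothetical, and the checks cover just the two headers discussed in this article.

```python
import urllib.request


def compression_report(headers: dict) -> list:
    """Return a list of problems found in a response's compression headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if lowered.get("content-encoding", "").lower() != "gzip":
        problems.append("response is not gzip-compressed")
    vary = [value.strip().lower() for value in lowered.get("vary", "").split(",")]
    if "accept-encoding" not in vary:
        problems.append("missing 'Vary: Accept-Encoding'")
    return problems


def fetch_headers(url: str) -> dict:
    """Request url announcing gzip support and return the response headers."""
    request = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Calling `compression_report(fetch_headers(url))` for a page of our own site returns an empty list when both headers are in place.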

I hope I’ve succeeded in explaining how gzip compression can be correctly configured.

What about your website? Does it have the gzip compression correctly configured?

Author: Ramón Saquete
Web developer at Human Level Communications online marketing agency. He's an expert in WPO, PHP development and MySQL databases.
