Caching for maximum performance and income: an SEO aspect!

When you run a larger site with a small team, the response times of sites like Google in the US or the computer magazine heise.de can seem incredible. I know how to build a fast site that scales reasonably well, but some targets are really hard to reach.

Current studies suggest that the ideal LOADING time of a complete page is under 2 seconds and that the response time (the first elements appearing in the browser window) should be under 250 milliseconds. For most websites that are generated on the fly (dynamic pages that build themselves from parameters such as language), these values are tough nuts to crack!

We decided to combine several technologies in one big “site creation technology basket” to deliver the pages of tradebit.com as fast as possible. Here is a short overview of the technologies and how we use them:

Memcached

Memcached is basically a small server that holds key-value pairs in memory. You connect to the memcached server and set a value like

setMemcacheValue("myKey","myValue")

and we use that mechanism to store COMPLETE web pages for maximum performance, just like Facebook does. The technology is explained nicely in this slideshare file.
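As a rough illustration of the full-page idea, here is a minimal Python sketch. It does not talk to a real memcached server: `TinyCache` is an in-memory stand-in with the same get/set shape, and `render_page`, `get_page` and the key layout are hypothetical names chosen for this example, not our production code.

```python
import time


class TinyCache:
    """In-memory stand-in for a memcached client (assumption, not a real client)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl=60):
        # Store the value together with its expiry timestamp.
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like a miss.
            del self._store[key]
            return None
        return value


render_calls = 0


def render_page(path, language):
    """Hypothetical expensive page builder (database queries, templates, ...)."""
    global render_calls
    render_calls += 1
    return f"<html><body>{path} in {language}</body></html>"


def get_page(cache, path, language, ttl=300):
    # The key must contain every parameter that changes the output.
    key = f"page:{language}:{path}"
    html = cache.get(key)
    if html is None:
        html = render_page(path, language)
        cache.set(key, html, ttl)
    return html


cache = TinyCache()
first = get_page(cache, "/downloads", "en")
second = get_page(cache, "/downloads", "en")  # served from the cache
german = get_page(cache, "/downloads", "de")  # different key, rendered again
```

The important design choice is the cache key: every parameter that changes the page (here the language) has to be part of it, otherwise visitors get each other's versions.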

It is also PERFECT for session storage in PHP, and the language is prepared for that out of the box on any modern Linux server. You just have to enable it in the config files under /etc/php.d/. Memcached is part of most Linux distributions or can be downloaded here. Take a look at the companies using it… pretty impressive!
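A configuration fragment for this could look roughly like the following. Treat it as a sketch: the server address is an assumption (a memcached instance on the same machine, default port), and which variant applies depends on which PHP extension your distribution ships.

```ini
; Variant for the "memcached" PHP extension
session.save_handler = memcached
session.save_path = "127.0.0.1:11211"

; Variant for the older "memcache" extension
; session.save_handler = memcache
; session.save_path = "tcp://127.0.0.1:11211"
```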

SQUID

The next product we could not live without is the proxy solution “Squid”. It runs as a caching server on dedicated boxes in front of our sites and handles all requests for static files like images, CSS or JavaScript. Many internet providers route their customers' web access through Squid so that parts of websites can be served from a local cache. It is also a good idea to do it the other way round and keep all those static file requests away from your WordPress installation, for example.

The setup we use is called a “transparent reverse proxy”: it funnels all incoming traffic to the Squid cache first, which checks whether an image or CSS file is already cached. Only if the file is NOT found does Squid ask the web server for it. Very nice for heavy traffic surges on specific pages!
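The lookup logic described above can be sketched in a few lines of Python. This is a toy model, not Squid itself: `Origin`, `ReverseProxy` and the list of static suffixes are assumptions made for the example, and a real proxy also honours expiry and cache-control headers, which are left out here.

```python
from pathlib import PurePosixPath

STATIC_SUFFIXES = {".png", ".jpg", ".gif", ".css", ".js"}


class Origin:
    """Hypothetical web server behind the proxy; counts the requests it sees."""

    def __init__(self):
        self.hits = 0

    def fetch(self, path):
        self.hits += 1
        return f"content of {path}".encode()


class ReverseProxy:
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def handle(self, path):
        is_static = PurePosixPath(path).suffix.lower() in STATIC_SUFFIXES
        if not is_static:
            # Dynamic pages are always passed through to the web server.
            return self.origin.fetch(path)
        if path not in self.cache:
            # Cache miss: ask the web server once and keep the answer.
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


origin = Origin()
proxy = ReverseProxy(origin)

# Simulate a traffic surge: 1000 visitors load the same logo and stylesheet.
for _ in range(1000):
    proxy.handle("/img/logo.png")
    proxy.handle("/css/site.css")

static_origin_hits = origin.hits  # only the two initial misses reach the origin

proxy.handle("/index.php")
proxy.handle("/index.php")
```

In this model, 2000 static requests cost the web server two fetches, while each dynamic request still goes through.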

Content Distribution Network (CDN)

A CDN is basically a network of Squid (or other caching) proxies distributed around the world and addressed through a special nameserver setup. For us, this is CDN.tradebit.org. If you are in Asia, for example, that domain resolves to a different address than it does for people in Europe.
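To make the nameserver trick concrete, here is a small Python sketch of region-dependent resolution. Everything in it is hypothetical: the host name `cdn.example.org`, the region names and the addresses (taken from the documentation-only ranges of RFC 5737) are placeholders, and a real CDN derives the region from the resolver's or client's IP address.

```python
# Hypothetical edge caches per region for one CDN host name.
ZONES = {
    "cdn.example.org": {
        "asia": "203.0.113.10",
        "europe": "198.51.100.10",
        "north_america": "192.0.2.10",
    },
}
DEFAULT_REGION = "north_america"


def resolve(hostname, client_region):
    """Return the edge address a geo-aware nameserver might hand out."""
    edges = ZONES[hostname]
    # Unknown regions fall back to a default edge instead of failing.
    region = client_region if client_region in edges else DEFAULT_REGION
    return edges[region]


asia_ip = resolve("cdn.example.org", "asia")
europe_ip = resolve("cdn.example.org", "europe")
fallback_ip = resolve("cdn.example.org", "antarctica")
```

The same host name yields a different address per region, so each visitor talks to a cache that is close by.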


Providers like Softlayer or the cloud solutions from Amazon feature such “plug and play” CDNs nowadays.

APC

The Alternative PHP Cache (APC) works like a specialized memcache whose only job is to speed up PHP execution. It keeps your code in memory in pre-compiled form and so avoids drive access. For us, this is a basic form of caching that has been working transparently for years!

[Graphic: APC PHP cache]

The graphic shows that APC served over 7.5 million code requests within 11.5 hours on just one of our 6 web servers. This amazing number comes from the many single files that a more complex PHP program includes (config, user classes, tool libraries, etc.). One complex dynamic page can easily use 15 or 20 different function libraries. These libraries provide functions (like “connect to database”) that you might not need on every page, and without a cache like this every include would result in hard drive access.
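The principle behind such a code cache can be shown with a short Python sketch. It is an analogy, not APC: `CodeCache` and the two tiny "library files" are made up for the example, Python's `compile()` stands in for PHP's compile step, and the sources are passed in as strings so the example stays self-contained (a real cache would also skip reading the file on a hit).

```python
class CodeCache:
    """Keeps compiled code objects in memory, keyed by file name."""

    def __init__(self):
        self._compiled = {}  # name -> (mtime, code object)
        self.compilations = 0
        self.hits = 0

    def load(self, name, source, mtime):
        cached = self._compiled.get(name)
        if cached is not None and cached[0] == mtime:
            self.hits += 1
            return cached[1]
        # First request, or the file changed: parse and compile again.
        self.compilations += 1
        code = compile(source, name, "exec")
        self._compiled[name] = (mtime, code)
        return code


# Hypothetical library files that one dynamic page includes.
LIBRARIES = {
    "config.py": "SITE = 'example'",
    "tools.py": "def shout(text):\n    return text.upper()",
}

code_cache = CodeCache()


def handle_request():
    namespace = {}
    for name, source in LIBRARIES.items():
        # Every request runs the included code, but compiles it at most once.
        exec(code_cache.load(name, source, mtime=1), namespace)
    return namespace["shout"](namespace["SITE"])


results = [handle_request() for _ in range(100)]
```

With two includes per page, 100 requests produce 2 compilations and 198 cache hits, which is how a handful of library files adds up to millions of served code requests.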

Sphinx, MySQL and Apache

For the sake of completeness, we have to mention that software packages like the Sphinx search server, the MySQL database server or the Apache web server do their own “built-in” caching. This is basically your starting point as a system administrator: get the memory usage of these packages in order FIRST. After that, you can play with the other solutions.

Take MySQL, for example: the database tries to keep the indices of your data in memory. If you configure the memory settings for that the wrong way, most other additions will not help you gain speed!
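Which setting matters depends on the storage engine. The fragment below names the two usual candidates; the sizes are placeholders only and have to be matched to the RAM of your own server and the size of your indices.

```ini
[mysqld]
# MyISAM tables: memory for index blocks
key_buffer_size = 512M
# InnoDB tables: memory for data and index pages
innodb_buffer_pool_size = 4G
```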

The same goes for Sphinx. Searching in memory always beats searching on hard drives, even SSDs.

Go and cache in peace!
