For a larger site run by a small team, the response times of sites like Google in the US or the German computer magazine heise.de can seem incredible. I know how to create a fast site that scales reasonably well, but some of these targets are really hard to reach.
Current studies show that an ideal LOADING time for a complete page is under 2 seconds, and that the response time (until the first elements appear in the browser window) is less than 250 milliseconds. For most websites that are created on the fly (dynamic pages that build themselves from parameters like language, etc.), these values are tough nuts to crack!
We decided to take several technologies and combine them into one big “site creation technology basket” to deliver the pages of tradebit.com as fast as possible. Here is a short overview of the technologies and how we use them:
Memcached
This software is basically a small server that holds key/value pairs in memory. You connect to the memcache server, store a value under a key and later fetch it back by that key. We use this to store COMPLETE webpages for maximum performance, just as Facebook does. The technology is explained nicely in this SlideShare presentation.
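A minimal sketch of full-page caching with a key/value store, in Python for illustration. The `InMemoryCache` class is a dict-based stand-in for a memcached server so the example runs on its own, and `render_page`, `get_page` and the `page:<language>:<path>` key format are hypothetical, not tradebit.com's actual code. The pattern is cache-aside: look the page up, and only build and store it on a miss.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Dict-based stand-in for a memcached server (key -> value with expiry).

    A real setup would use a memcached client library; this class only
    mimics the get/set idea so the sketch runs on its own.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl: float) -> None:
        # Store the value together with the moment it stops being valid.
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value


def render_page(path: str, language: str) -> str:
    # Hypothetical expensive page build (database queries, templates, ...).
    return f"<html lang='{language}'><body>{path}</body></html>"


def get_page(cache: InMemoryCache, path: str, language: str, ttl: float = 60.0,
             render: Callable[[str, str], str] = render_page) -> str:
    # The key must contain every parameter the page depends on, otherwise a
    # German visitor could be served the cached English version.
    key = f"page:{language}:{path}"
    html = cache.get(key)
    if html is None:
        html = render(path, language)  # cache miss: build the page once
        cache.set(key, html, ttl)
    return html
```

Calling `get_page(cache, "/download/123", "de")` twice builds the page once; the second call is served from memory until the TTL runs out. Putting the language into the key matters for pages that build themselves from parameters like language.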
It is also PERFECT for PHP session storage, and PHP is prepared for that out of the box on most modern Linux servers: you just have to enable it in the config files under /etc/php.d/. Memcached is part of most Linux distributions or can be downloaded from its project website. Take a look at the companies using it… pretty impressive!
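A shared session store matters once several web servers share the load: whichever server handles the next request can read the session another server wrote. Below is a minimal sketch of what such a store does, assuming a hypothetical `SessionStore` class with a dict standing in for memcached. In practice PHP's session handler does all of this for you once it is enabled; the sketch only shows the idea of an unguessable session id, a key prefix and an expiry that is refreshed on every request. The default TTL of 1440 seconds mirrors PHP's default session lifetime (`session.gc_maxlifetime`).

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class SessionStore:
    """Hypothetical shared session store; a dict stands in for memcached."""

    def __init__(self, ttl: float = 1440.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, dict]] = {}
        self._ttl = ttl  # seconds of inactivity before a session expires
        self._clock = clock

    def create(self, data: dict) -> str:
        # An unguessable, URL-safe session id.
        session_id = secrets.token_urlsafe(32)
        self.save(session_id, data)
        return session_id

    def save(self, session_id: str, data: dict) -> None:
        expires_at = self._clock() + self._ttl
        self._data["session:" + session_id] = (expires_at, dict(data))

    def load(self, session_id: str) -> Optional[dict]:
        key = "session:" + session_id
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._data.pop(key, None)  # unknown or expired session
            return None
        data = entry[1]
        # Sliding expiration: every request keeps an active session alive.
        self._data[key] = (self._clock() + self._ttl, data)
        return dict(data)
```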
Squid Cache
We use Squid in a setup called “transparent reverse proxy”: all internet traffic is funneled to the Squid Cache first, which checks whether an image or CSS file is already in its cache. Only if the file is NOT found does Squid ask the web server for it. This is very nice for heavy traffic surges on specific pages!
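The hit/miss logic described above can be sketched in a few lines. This is a minimal illustration, not how Squid is implemented: the `ReverseProxyCache` class, the `origin_fetch` callback standing in for the web server and the list of cacheable file extensions are assumptions, and real Squid decisions also depend on its configuration, HTTP cache headers and expiry, which the sketch ignores.

```python
from typing import Callable, Dict

# File types this sketch treats as cacheable static content (an assumption).
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")


class ReverseProxyCache:
    """Minimal cache-on-miss proxy logic in front of an origin web server."""

    def __init__(self, origin_fetch: Callable[[str], bytes]) -> None:
        self._origin_fetch = origin_fetch
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how often the web server was bothered

    def handle(self, path: str) -> bytes:
        if path in self._store:
            return self._store[path]      # cache hit: web server untouched
        self.origin_requests += 1
        body = self._origin_fetch(path)   # cache miss: ask the web server
        if path.endswith(STATIC_SUFFIXES):
            self._store[path] = body      # keep static files for next time
        return body
```

During a surge of 1000 requests for the same stylesheet, only the first one reaches the web server; dynamic pages such as `/index.php` are passed through every time.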
Content Distribution Network (CDN)
A CDN is basically a network of Squid (or other caching) proxies distributed around the world, which are addressed through a special nameserver setup. For us, this is CDN.tradebit.org. If you are in Asia, for example, that domain resolves to a different address for you than for people in Europe.
Providers like SoftLayer or Amazon's cloud services offer such “plug and play” CDNs nowadays.
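The nameserver trick behind a CDN domain like CDN.tradebit.org can be sketched as a lookup from the querying address to the nearest edge cache. Everything here is hypothetical: the `resolve` function, the region table and all IP addresses, which come from the reserved documentation ranges of RFC 5737. Note that an authoritative nameserver usually sees the visitor's DNS resolver rather than the browser itself, and real providers use maintained GeoIP databases.

```python
import ipaddress
from typing import Dict, List, Tuple

# Hypothetical client networks per region (RFC 5737 example ranges only).
CLIENT_NETWORKS: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("192.0.2.0/24"), "asia"),
    (ipaddress.ip_network("198.51.100.0/24"), "europe"),
]

# Hypothetical edge cache addresses per region.
EDGE_ADDRESSES: Dict[str, str] = {
    "asia": "203.0.113.10",
    "europe": "203.0.113.20",
    "default": "203.0.113.30",
}


def resolve(hostname: str, client_ip: str) -> str:
    """Answer an address query with the edge cache closest to the client."""
    if hostname.lower() != "cdn.tradebit.org":
        raise KeyError(f"not authoritative for {hostname}")
    addr = ipaddress.ip_address(client_ip)
    for network, region in CLIENT_NETWORKS:
        if addr in network:
            return EDGE_ADDRESSES[region]
    return EDGE_ADDRESSES["default"]  # unknown networks get a fallback edge
```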
Alternative PHP Cache (APC)
APC works like a specialized memcache just for speeding up PHP execution: it keeps your code precompiled in memory and so avoids drive access. For us, this is a basic form of caching that has worked transparently for years!
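The principle of an opcode cache translates directly to Python, whose built-in `compile()` turns source text into a code object much like PHP turns a script into opcodes. The `CompiledCodeCache` class below is a hypothetical illustration, not APC: it compiles each file once and reuses the compiled form until the file's modification time changes.

```python
import os
from types import CodeType
from typing import Dict, Tuple


class CompiledCodeCache:
    """Keeps compiled code objects in memory, keyed by file path."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[float, CodeType]] = {}
        self.hits = 0
        self.misses = 0

    def load(self, path: str) -> CodeType:
        # A stat() call is cheap compared with reading and compiling the file.
        mtime = os.stat(path).st_mtime
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            self.hits += 1
            return cached[1]
        self.misses += 1
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")  # parse + compile once
        self._cache[path] = (mtime, code)
        return code

    def run(self, path: str) -> dict:
        # Execute the (possibly cached) code and return its global names.
        namespace: dict = {}
        exec(self.load(path), namespace)
        return namespace
```

Running a config file five times compiles it once; after the file is edited, the changed modification time triggers exactly one recompile.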
The graphic shows that APC served over 7.5 million code requests within 11.5 hours on just one of our 6 web servers. This amazing number comes from the many individual files that are included in a more complex PHP program (config, user classes, tool libraries, etc.). A single complex dynamic page can easily use 15 or 20 different function libraries. These libraries provide functions (like “connect to database”) that you might not need on every page, and without a cache like APC, each of them would mean another hard drive access.
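A quick back-of-the-envelope check of these APC numbers. The second line is only a rough estimate, assuming that every code request comes from pages with 15 to 20 includes; the graphic does not break the requests down that way.

```latex
\text{rate} = \frac{7.5 \times 10^{6}\ \text{requests}}{11.5 \times 3600\ \text{s}}
            = \frac{7.5 \times 10^{6}\ \text{requests}}{41\,400\ \text{s}}
            \approx 181\ \text{requests/s}

\text{page views/s} \approx \frac{181}{20} \approx 9
\quad\text{to}\quad
\frac{181}{15} \approx 12
```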
Sphinx, MySQL and Apache
For the sake of completeness, we have to mention that software packages like the Sphinx search server, the MySQL database server and the Apache web server do their own “built-in” caching. This is your starting point as a system administrator: get the memory usage of these packages in order FIRST. After that, you can play with the other solutions.
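To make “get the memory usage in order FIRST” concrete, here is a minimal sketch of a memory budget check. The function name, the component names, the 1 GB reserve for the operating system and all sizes are hypothetical placeholders, not recommendations; the point is only that all caches together must fit into physical RAM, because anything that gets swapped out is no longer an in-memory cache.

```python
from typing import Dict


def check_memory_budget(ram_mb: int, caches_mb: Dict[str, int],
                        os_reserve_mb: int = 1024) -> int:
    """Return the MB left after all caches, or raise if they do not fit.

    Caches that together exceed physical RAM push the machine into swapping,
    and swapped-out cache pages live on disk again.
    """
    available = ram_mb - os_reserve_mb
    used = sum(caches_mb.values())
    if used > available:
        raise ValueError(f"caches need {used} MB, only {available} MB available")
    return available - used


# Hypothetical 16 GB web/database server; placeholder sizes in MB.
example_caches = {
    "mysql_buffers": 6144,
    "sphinx_indexes": 2048,
    "apache_workers": 3072,  # number of processes x memory per process
    "apc": 256,
    "memcached": 2048,
}
```

With these placeholder values, `check_memory_budget(16384, example_caches)` returns 1792 MB of headroom, while the same caches on an 8 GB machine raise a `ValueError`.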
Take MySQL, for example: the database tries to keep the indexes of your data in memory (the key buffer for MyISAM tables, the buffer pool for InnoDB). If you configure these settings the wrong way, most other additions will not help you gain speed!
The same applies to Sphinx: searching in memory always beats searching on disk, even on an SSD.
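Why a slightly too small memory setting can hurt so much for both MySQL and Sphinx can be illustrated with a small simulation. This is a plain LRU page buffer, a deliberate simplification (MySQL's InnoDB buffer pool, for instance, uses a modified LRU), and the workload of repeated scans over a 1000-page index is hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hit_ratio(page_requests: Iterable[int], capacity: int) -> float:
    """Simulate an LRU page buffer and return the fraction of cache hits."""
    buffer: "OrderedDict[int, None]" = OrderedDict()
    hits = total = 0
    for page in page_requests:
        total += 1
        if page in buffer:
            hits += 1
            buffer.move_to_end(page)        # mark as most recently used
        else:
            buffer[page] = None             # "read from disk" into memory
            if len(buffer) > capacity:
                buffer.popitem(last=False)  # evict least recently used page
    return hits / total if total else 0.0


# Hypothetical workload: queries that walk a 1000-page index, 10 times.
workload = [page for _ in range(10) for page in range(1000)]
```

With `lru_hit_ratio(workload, 1000)` the whole index fits and 90% of the page reads are hits (only the first pass goes to disk). With `lru_hit_ratio(workload, 999)` the buffer is one page too small, plain LRU evicts every page just before it is needed again, and the hit ratio drops to 0.0.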
Go and cache in peace!