On our main flagship we currently host around 40 million files with close to 80 Terabytes of data stored. That data is NOT stored in MySQL. Just the references and meta-data is stored and retrieved with MySQL and SphinxSearch. We needed to introduce Sphinx in 2006, when we reached over 1 million products and since then we have added more and more goodies to improve the speed of the whole platform. To distribute the load, we run a MySQL replication setup with 2 masters and now 8 slaves – 4 for each master.
I have talked about that setup on PerconaLive last year and for those, who deal with similar large setups, I wanted to add one more hint to avoid the suffering from large updates on tables.
A pretty neat trick is to put the load into an extra database. Let’s call this the “tmpDatabase”, which you can configure to be ignored by the replication process (see graphic above).
If you need to resort the catalog, do statistics or create really extensive updates on large tables: keep in mind, that these statements are all REPLICATED to all the slaves. Most of the time this load is totally unnecessary and can be avoided by doing that on a slave in a temporary database and the result can be copied into the master afterwards.
We do that in the middle of the night, when the catalog restructuring needs about 30 minutes. Copying the result over just takes 1 minute to load!