Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

In Scaling writes in MySQL (slides) Philip Tellis, while working for Yahoo, describes how using time based partitions they were able to increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). This was capacity enough to handle the load during Michael Jackson's memorial service. In summary, the secrets to scalable writes are:

  • Bulk inserts push up insert rate
  • Partitioning lets you insert more records
  • Partition based on incoming data for fast inserts

Partitioning is a standard approach for handling high write loads because it means data can be written to different hard disks in parallel. In this example Phillip created a separate table for each day with each table having it's own database file. Each table is partitioned on time, 12 partitions per day, 2 hours of data per partition. Huge log streams are often handled this way. Other advantages of this approach: 1) fast drop table operations 2) space for dropped tables is reclaimed immediately. The problems are that it's difficult  to: 1) read by a natural primary key 2) make cross partition queries. But the goal here is fast writes, not analytics or easy reads, so if you want to do something meaningful with the data you may need to load it into another system.

If you are ravenous for even more information take a look at Maximal write througput in MySQL, from the wizards at the MySQL Performance Blog.