HSCALE - Handling 200 Million Transactions Per Month Using Transparent Partitioning With MySQL Proxy
Update 2: A HSCALE benchmark finds HSCALE "adds a maximum overhead of about 0.24 ms per query (against a partitioned table)." Future releases promise much improved results.
Update: A new presentation at An Introduction to HSCALE.
After writing Skype Plans for PostgreSQL to Scale to 1 Billion Users, which shows how Skype smartly uses a proxy architecture for scaling, I'm now seeing MySQL Proxy articles all over the place. It's like those "get rich quick" books that say all you have to do is visualize a giraffe with a big yellow dot superimposed over it and by sympathetic magic giraffes will suddenly stampede into your life. Without realizing it I must have visualized transparent proxies smothered in yellow dots.
One of the brightest images is a wonderful series of articles by Peter Romianowski describing the evolution of their proxy architecture. Their application is an OLTP system executing 200 million transaction per month, tables with more than 1.5 billion rows, and a 600 GB total database size. They ran into a wall buying bigger boxes and wanted to move to a sharded architecture. The question for them was: how do you implement sharding?
In the first article four approaches to sharding were identified:
The proxy solution was selected because it's transparent to the application layer. Applications need not know about the partitioning scheme to make it work. Not mucking with apps is a big win. The downside is implementation complexity. How do you parse a query and and map it correctly to the right server? Will this cause a big performance degradation? How is this new more complex and dynamic system to be tested? Can we run the same queries they did before or will they have to rewrite parts of their application? A lot of questions to be worked out.
The second article starts working out those problems using MySQL Proxy. The process was broken into a few steps:
Some of the comments were concerned that a modulus scheme was being used to identify a partition. The recommendation was to use a directory service for mapping to partitions instead. A directory service allows you to logically map partitions behind the scenes and doesn't tie you to a deterministic physical mapping.
After getting all this working they generously released it to the world as HSCALE - Transparent MySQL Partitioning:
HSCALE is a plugin written for MySQL Proxy which allows you to transparently split up tables into multiple tables called partitions. In later versions you will be able to put each partition on a different MySQL server. Application based partitioning means that your split up your data logically and rewrite your application to select the right piece of data (i.e. partition) at any given time. More on application based partitioning. Read here some more about what could be done with HSCALE. HSCALE helps in application based partitioning. Using the MySQL Proxy it sits between your application and the database server. Whenever a sql statement is sent to the server HSCALE analyzes it to find out whether a partitioned table is used. It then tries to find out which partition the sql statement should go to.
Access release .1 at HSCALE 0.1 released - Partitioning Using MySQL Proxy.
The transparent proxy ability is very powerful, but what we are lacking that various companies have created internally is a partition management layer. How do you move partitions? How do you split partitions when a table outgrows the shard or performance declines? Lots of cool tools still to build.