Scaling Out MySQL

This post covers two main options for scaling out MySQL and compares them. The first is based on database clustering and the second on in-memory clustering, a.k.a. a data grid. Special emphasis is given to a pattern that shows how to scale an existing database without changing it, by combining a data grid with the database running as a background service. This pattern is referred to as Persistency as a Service (PaaS). The post also addresses many of the frequently asked questions about how performance, reliability, and scalability are achieved with this pattern.

Comments

Todd Hoff

Re: Scaling Out MySQL

Excellent article Nati. It fits nicely with the whole "memory is the new disk" meme. I love how you tackled objections head on in a very organized and clear manner. While reading I was wondering about a few things...

Is it possible to use the database in parallel with the grid? It would seem that changes made directly to the database wouldn't be reflected in the in memory versions and it would sidestep your transactions.

Do you have any customers who don't use a database? The database seems pretty useless at this point.

And how would you compare yourself with queue-based architectures? Storing work in queues and scaling up processing nodes is a pretty simple and robust architecture.

Re: Scaling Out MySQL

Hi Todd

Thanks for the compliments...
Now to your questions:

"Is it possible to use the database in parallel with the grid? It would seem that changes made directly to the database wouldn't be reflected in the in memory versions and it would sidestep your transactions."

Since there is no standard way to trigger events on updates to an existing database, updates made directly on the database will not be propagated to the data grid. We did a specific integration for one of our customers that used Sybase replicator as a means to capture updates from the database and propagate them back to the data grid, but that was tailored to Sybase at the time. Having said that, the recommended way is to let the data grid handle the changes and execute all updates through the data grid rather than through the database.
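To make the point above concrete, here is a minimal sketch of a grid that serves reads from memory and persists updates to the database in the background. It is an illustration under stated assumptions, not how GigaSpaces implements it: the `WriteBehindGrid` class, the `kv` table, and the use of SQLite in place of MySQL are all hypothetical choices made so the example runs on its own.

```python
import queue
import sqlite3
import threading


class WriteBehindGrid:
    """Toy in-memory store that persists updates to a database in the background."""

    def __init__(self, conn):
        self._data = {}                # in-memory copy that serves all reads
        self._pending = queue.Queue()  # updates waiting to be persisted
        self._lock = threading.Lock()  # serializes use of the shared connection
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        # The caller returns as soon as memory is updated; the database
        # write happens later on the background thread.
        self._data[key] = value
        self._pending.put((key, value))

    def get(self, key):
        return self._data.get(key)

    def _drain(self):
        while True:
            key, value = self._pending.get()
            with self._lock:
                self._conn.execute(
                    "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
                )
                self._conn.commit()
            self._pending.task_done()

    def flush(self):
        """Block until every queued update has been written to the database."""
        self._pending.join()


# check_same_thread=False lets the background thread reuse this connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
grid = WriteBehindGrid(conn)
grid.put("order:1", "pending")
grid.put("order:1", "shipped")
grid.flush()

# An update made directly on the database bypasses the grid entirely.
conn.execute("UPDATE kv SET v = 'cancelled' WHERE k = 'order:1'")
conn.commit()
print(grid.get("order:1"))  # still prints "shipped": the grid never saw the change
```

The last three lines show the problem Todd raised: a change made directly on the database is invisible to the in-memory copy, which is why all updates should go through the grid.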

"Do you have any customers who don't use a database? The database seems pretty useless at this point."

Most of our customers still use a database somewhere in their architecture. However, as I mentioned in my post, it is the role of the database that has changed. That role depends on the type of application:
In transactional applications, we used to rely on the database for maintaining in-flight transactions, high availability, and durability; with a data grid, we use the database for persistence only and the rest is managed by the data grid. In real-time analytical applications, the database is used during the initial load only, and the actual analytics happen in memory. In low-latency applications such as market data and billing, the database mostly runs as a background process used for auditing. The database is also used for integration with legacy systems and for reporting.

"And how would you compare yourself with queue-based architectures?"

It is interesting that you ask this question. One of the things that led to the inception of GigaSpaces was the realization that messaging and data go hand in hand. This realization came from the experience of working with B2B exchanges (the e-commerce sites of today). For years I have argued that one of the main fallacies in distributed systems is that we were taught to think of messaging and data as two separate things.
If you think about it, messaging is just another form of data. The only difference between a queue and a table is that a queue is ordered by the time at which objects are written, while rows in a table are ordered by their type. Does that justify totally separate products, semantics, and clustering models?
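The queue-versus-table argument can be sketched with a single store that offers both views over the same entries. This is only an illustration: `UnifiedStore`, `Entry`, and the method names are hypothetical and do not come from GigaSpaces or any other product.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Entry:
    seq: int       # position in write order (what a queue cares about)
    kind: str      # the entry's type (what a table groups by)
    payload: dict


class UnifiedStore:
    """One collection of entries, readable as a queue or as a table."""

    def __init__(self):
        self._entries = []              # kept in write order
        self._seq = itertools.count()

    def write(self, kind, payload):
        self._entries.append(Entry(next(self._seq), kind, payload))

    def take_oldest(self):
        """Queue view: remove and return the earliest-written entry, or None."""
        return self._entries.pop(0) if self._entries else None

    def select(self, kind):
        """Table view: return all entries of one type without removing them."""
        return [e for e in self._entries if e.kind == kind]


store = UnifiedStore()
store.write("order", {"id": 1})
store.write("payment", {"id": 7})
store.write("order", {"id": 2})

orders = store.select("order")   # table-style read: both orders, nothing removed
first = store.take_oldest()      # queue-style read: the first entry written
```

Both views run against the same data, so a single clustering and replication model would cover them, which is the point being argued.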

GigaSpaces started by implementing the JavaSpaces specification. JavaSpaces was one of the first attempts to combine the two worlds of messaging and data into one model, which was used primarily for coordinating parallel processing tasks using the master/worker pattern. We also support JMS and Mule, an open-source ESB implementation.
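The master/worker pattern mentioned above can be sketched with a tiny stand-in for a space. This is not the JavaSpaces API, which is Java and matches entries against template objects; the `Space` class below is a hypothetical simplification that keeps only blocking `write` and `take` and matches on a `kind` field.

```python
import threading


class Space:
    """Tiny space stand-in: write() adds an entry, take() removes a match."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()

    def take(self, kind):
        """Block until an entry of the given kind exists, then remove and return it."""
        with self._cond:
            while True:
                for i, entry in enumerate(self._entries):
                    if entry["kind"] == kind:
                        return self._entries.pop(i)
                self._cond.wait()


def worker(space):
    while True:
        task = space.take("task")
        if task["n"] is None:  # stop marker written by the master
            return
        space.write({"kind": "result", "n": task["n"], "square": task["n"] ** 2})


def run_master(numbers, worker_count=3):
    """Write one task per number, collect the results, then stop the workers."""
    space = Space()
    threads = [
        threading.Thread(target=worker, args=(space,)) for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        space.write({"kind": "task", "n": n})
    results = {}
    for _ in numbers:
        result = space.take("result")
        results[result["n"]] = result["square"]
    for _ in threads:
        space.write({"kind": "task", "n": None})  # one stop marker per worker
    for t in threads:
        t.join()
    return results


squares = run_master([1, 2, 3, 4])
```

Tasks and results live in the same shared store, so the master and the workers never address each other directly. Adding workers only means starting more threads (or, in a real grid, more processes) that take from the same space.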

I also discussed the need to combine the two in one of my recent posts, The Missing Piece in Cloud Computing: Middleware Virtualization.

HTH
Nati S.
