NoSQL Pain? Learn How to Read/write Scale Without a Complete Re-write
Monday, June 6, 2011 at 8:42AM
Lately I've been reading about more cases where people have started to realize the limits of NoSQL's promise of database scalability. Note the references below:
- Why does Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc?
- Why did Diaspora abandon MongoDB for MySQL?
- How scalable is CouchDB in practice, not just in theory?
Take MongoDB for example. It's damn fast, but it doesn't really know how to save data reliably to disk. I've had it set up in a replica pair to mitigate that risk. Guess what - both servers in the pair failed and corrupted their data files at the same day.
It appears that for many, the switch to NoSQL can be rather painful. IMO that doesn't necessarily mean that NoSQL is wrong in general, but it's a combination of 1) lack of maturity 2) not the right tool for the job.
That raises the question: what's the alternative?
In the following post I tried to summarize the lessons from a presentation by Ronnie Bodinger (Head of IT at Avanza Bank AB) on how they turned their current read-mostly scale architecture into a full read/write scale architecture without completely rewriting their existing application and while keeping the database as-is.
The lessons learned:
- Minimize the change by clearly identifying the scalability hotspots
- Keep the database as is
- Put an In Memory Data Grid as a front end to the database
- Use write-behind to reduce the synchronization overhead
- Use O/R mapping to map the data back into its original format
- Use standard Java APIs and frameworks to leverage the existing skill set
- Use two parallel (old/new) sites to enable gradual transition
- Use RAM for high performance access and disk for long term storage
- Use a commodity database and commodity hardware
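To make the write-behind idea from the list above more concrete, here is a minimal sketch in plain Java. It is not the Avanza implementation and not any data grid product's API: the class name `WriteBehindCache`, its methods, and the `databaseWriter` callback are all hypothetical names chosen for illustration. It assumes the application reads and writes an in-memory map, while pending changes are collected and pushed to the existing database later in one batch. The replication of the pending-write log to a backup node, which a real data grid would provide, is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical write-behind cache: reads and writes are served from RAM,
 * and the database is updated later, in batches.
 */
final class WriteBehindCache<K, V> {
    // In-memory front end that the application talks to.
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    // Pending writes, coalesced per key so only the latest value is stored.
    private final Map<K, V> dirty = new LinkedHashMap<>();
    // Stand-in for the O/R mapping layer that writes to the real database.
    private final Consumer<Map<K, V>> databaseWriter;

    WriteBehindCache(Consumer<Map<K, V>> databaseWriter) {
        this.databaseWriter = databaseWriter;
    }

    /** Updates RAM immediately and records the change for a later flush. */
    void put(K key, V value) {
        memory.put(key, value);
        synchronized (dirty) {
            dirty.put(key, value);
        }
    }

    /** Reads never touch the database in this sketch. */
    V get(K key) {
        return memory.get(key);
    }

    /** Number of keys whose latest value has not reached the database yet. */
    int pendingWrites() {
        synchronized (dirty) {
            return dirty.size();
        }
    }

    /**
     * Pushes all pending writes to the database in one batch and returns
     * how many keys were written. A real system would call this from a
     * background scheduler and re-queue the batch if the write fails;
     * this sketch does neither.
     */
    int flush() {
        Map<K, V> batch;
        synchronized (dirty) {
            if (dirty.isEmpty()) {
                return 0;
            }
            batch = new LinkedHashMap<>(dirty);
            dirty.clear();
        }
        databaseWriter.accept(batch);
        return batch.size();
    }
}
```

In this sketch the caller pays only for a memory write on `put`, and several updates to the same key collapse into a single database write at flush time, which is where the reduction in synchronization overhead comes from. The trade-off is the window between `put` and `flush`, which is why a production data grid keeps the pending log replicated on a second node.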
For a more detailed explanation read more here.

Reader Comments (6)
Finally, some discussion about when NoSQL breaks.
Great that it's fast, but how do I fix it when it falls over?
This is a strange conflation of mongodb with "nosql."
A single mongodb bug does not justify any broad conclusions about anything, except perhaps the quality of mongodb.
@Dave, 100% agreed. Mongo != NoSQL. NoSQL = {Mongo, Redis, Couch, Cassandra, etc...}
I've been a MySQL fan since mSQL and MySQL back in 1997; stable, reliable, actually pretty fast when architected properly, and somewhat read-scalable with replication and write-scalable with NDB. However, maintenance can get to be a chore (as with any SQL solution) with more than five or six nodes. I've also been a NoSQL convert for many years, but even fantastic solutions like Redis are still pretty immature. (For instance, Redis' Virtual Memory is now something that Salvatore is backing away from, and with good reason, and running out of memory is not dealt with gracefully.)
(Most) NoSQL solutions are terrific, as is MySQL. They both have a very real place in the data center. Keep in mind that things like MySQL are actually built ultimately on top of a NoSQL solution. For instance, MySQL is built on top of Berkeley DB (Sleepycat), which was the key-value de facto standard DB for years before the term NoSQL was even invented.
If I had to choose just one, I'd choose MySQL but that's because it's kind of a swiss army knife and can do a lot. Fortunately, I don't have to choose just one. ;-)
"doesn't really know how to save data reliably to disk"
This obviously refers to a very old MongoDB Version. This (what you call) "bug" is long gone.
"Guess what - both servers in the pair failed and corrupted their data files at the same day"
This is not NoSQL-specific. Replica sets in different environments would be a good idea.
To trash NoSql for selling GigaSpaces is so uncool!
Now imagine the data grid failing when write-behind is way behind :(
I rather go with write to disks solution every day.
Adi
Do you call that trashing?
"It appears that for many, the switch to NoSQL can be rather painful. IMO that doesn't necessarily mean that NoSQL is wrong in general, but it's a combination of 1) lack of maturity 2) not the right tool for the job."
"Now imagine the data grid failing when write-behind is way behind :("
If you had ever used a data grid, you'd know that a data-grid failure doesn't lose the state of the log to the database, because the log is synchronously replicated to a backup node, which continues the sync to the database.
"I rather go with write to disks solution every day."
Good luck! One thing you'll notice is that, to get the performance you expect, a write to disk isn't actually synced to disk. Reliability is guaranteed through replication to another backup node, just as with a data grid.