In our first post we busted the myth that cloud != high performance and outlined the steps to 1 Million TPS (100% reads in RAM) on 1 Amazon EC2 instance for just $1.68/hr. In this post we evaluate the performance of 4 Amazon instances when running a 4 node Aerospike cluster in RAM with 5 different read/write workloads and show that the r3.2xlarge instance delivers the best price/performance.
Several reports have already documented the performance of distributed NoSQL databases on virtual and bare metal cloud infrastructures:
If any of these items interest you there's a full description of each sponsor below. Please click to read more...
Cloud infrastructure services like Amazon EC2 have proven their worth with wild success. The ease of scaling up resources, spinning them up as and when needed and paying by unit of time has unleashed developer creativity, but virtualized environments are not widely considered as the place to run high performance applications and databases.
Cloud providers however have come a long way in their offerings and need a second review of their performance capabilities. After showing 1 Million TPS on Aerospike on bare metal servers, we decided to investigate cloud performance and in the process, bust the myth that cloud != high performance.
We examined a variety of Amazon instances and just discovered the recipe for processing 1 Million TPS in RAM on 1 Aerospike server on a single C3.8xlarge instance - for just $1.68/hr !!!
According to internetlivestats.com, there are 7.5k new tweets per second, 45k google searches per second and 2.3 Million emails sent per second. What would you build if you could process 1 Million database transactions per second for just $1.68/hr?
In this post, I’d like to introduce you to hamsterdb, an Apache 2-licensed, embedded analytical key-value database library similar to Google's leveldb and Oracle's BerkeleyDB.
hamsterdb is not a new contender in this niche. In fact, hamsterdb has been around for over 9 years. In this time, it has dramatically grown, and the focus has shifted from a pure key-value store to an analytical database offering functionality similar to a column store database.
hamsterdb is single-threaded and non-distributed, and users usually link it directly into their applications. hamsterdb offers a unique (at least, as far as I know) implementation of Transactions, as well as other unique features similar to column store databases, making it a natural fit for analytical workloads. It can be used natively from C/C++ and has bindings for Erlang, Python, Java, .NET, and even Ada. It is used in embedded devices and on-premise applications with millions of deployments, as well as serving in cloud instances for caching and indexing.
hamsterdb has a unique feature in the key-value niche: it understands schema information. While most databases do not know or care what kind of keys are inserted, hamsterdb supports key types for binary keys...
In the post I'll show you the way we developed quite simple architecture based on HAProxy, PHP, Redis and MySQL that seamlessly handles approx 1 billion requests every week. There’ll be also a note of the possible ways of further scaling it out and pointed uncommon patterns, that are specific for this project.
What data structure does your database use? It's not something the typical database user spends much time pondering. Since data structure is destiny, what data structure your database uses is a key point to consider in your selection process.
We know CouchDB uses a modified B+ tree. We've learned a lot fascinating details over the years about the use of Log-structured merge-trees in Cassandra, HBase and LevelDB. So B+ trees and LSMs seem familiar by now.
What may not be so familiar is Tokutek's Fractal Tree Indexing technology that is supposed to be even better than B+ trees and LSMs.
As a comparison between Fractal Tree Indexing and LSMs, Bradley Kuszmaul, Chief Architect at Tokutek, has written a detailed paper, a must read for the algorithmically inclined or someone interested in database internals: A Comparison of Log-Structured Merge (LSM) and Fractal Tree Indexing.
Here's a quick intro to Fractal Tree (FT) indexes: