What data structure does your database use? It's not something the typical database user spends much time pondering. Since data structure is destiny, what data structure your database uses is a key point to consider in your selection process.
We know CouchDB uses a modified B+ tree. We've learned a lot fascinating details over the years about the use of Log-structured merge-trees in Cassandra, HBase and LevelDB. So B+ trees and LSMs seem familiar by now.
What may not be so familiar is Tokutek's Fractal Tree Indexing technology that is supposed to be even better than B+ trees and LSMs.
As a comparison between Fractal Tree Indexing and LSMs, Bradley Kuszmaul, Chief Architect at Tokutek, has written a detailed paper, a must read for the algorithmically inclined or someone interested in database internals: A Comparison of Log-Structured Merge (LSM) and Fractal Tree Indexing.
Here's a quick intro to Fractal Tree (FT) indexes:
The idea behind FT indexes is to maintain a B tree in which each internal node of the tree contains a buffer. When a data record is inserted into the tree, instead of traversing the entire tree the way a B tree would, we simply insert the Eventually the root buffer will fill up with new data records. At that point the FT index copies the inserted records down a level of the tree. Eventually the newly inserted records will reach the leaves, at which point they are simply stored in a leaf node as a B tree would store them. The data records descending through the buffers of the tree can be thought of as messages that say “insert this record”. FT indexes can use other kinds of messages, such as messages that delete a record, or messages that update a record.
The paper starts with an explanation of write amplification, read amplification, and space amplification, the metrics that will be used to compare B trees, FT indexes, and LSMs. These are nicely detailed sections and well worth the price of admission.
Write amplification: the amount of data written to storage compared to the amount of data that the application wrote. Read amplification: the number of I/O’s required to satisfy a particular query. Space amplification: the space required by a data structure can be inflated by fragmentation or requirements for temporary copies of the data.
The paper then has sections explaining in great detail the inner magic of B trees, FT indexes, and LSMs. Which are also worth the price of admission.
As you might expect the analysis shows FT indexes the winner:
- FT indexes provide excellent write amplification, read amplification, and space amplification, both asymptotically and in practice.
- B trees have much higher write amplification than the other alternatives, and are good at read amplification, and pretty good at space amplification. Very few applications should use B trees.
- Leveled LSM’s incur significantly more read amplification than FT indexes both asymptotically and in practice. Very few applications should use leveled LSM’s.
- Size-tiered LSM’s can provide much lower write amplification than Leveled LSM’s or FT indexes at a horrible cost for range query and pretty bad space amplification. A workload that performs few reads and is not sensitive to space amplification, but is very sensitive to write costs might benefit from size-tiered LSM’s. In my experience, it is an unusual application that is so short on write bandwidth that it can gladly give up a factor of 13 on read amplification.
A solid data structure concept is not enough. The paper points out that the data structure a database uses is only one part of entire product. You must also consider features like MVCC, transactions with ACID recovery, two-phase distributed commit, backup, and compression.