Performance data for LevelDB, Berkley DB and BangDB for Random Operations
Thursday, November 29, 2012 at 9:15AM
Todd Hoff in Product, benchmark, key-value store

This is a guest post by Sachin Sinha, Founder of Iqlect and developer of BangDB.

The goal for the paper is to provide the performances data for following embedded databases under various scenarios for random operations such as write and read. The data is presented in graphical manner to make the data self explanatory to some extent.

The comparison has been done on the similar grounds (as much as possible) for all the dbs to measure the data as crisply and accurately as possible.

The results of the test show BangDB faster in both reads and writes:

Since BangDB may not be familiar to most people, here's a quick summary of its features: 

  1. Highly concurrent operations on B link -Tree
    1. Manipulation of the tree by any thread uses only a small constant number of page locks any time
    2. Search through the tree does not prevent reading any node. Search procedure in-fact does no locking most of the time
    3. Based on Lehman and Yao paper but extended further for performance improvement
  2. Concurrent Buffer Pools
    1. Separate pools for different types of data. This gives flexibility and better performance when it comes to managing data in the buffer pool in different scenarios
    2. Semi adaptive data flush to ensure performance degrades gracefully in case of data overflowing out of buffer
    3. Access to individual buffer header in the pool is extensively optimized which allows multiple threads to get right headers in highly efficient manner resulting in better performance
    4. 2 LRU lists algorithm for better temporal locality
  3. Other
    1. Write of data/log is always sequential
    2. Vectored read and write as far as possible
    3. Aries algorithm for WAL has been extended for performance. For ex; only index pages have metadata related to logging, data pages are totally free of any such metadata
    4. Key pages for index are further cached to shortcut various steps which results in less locking on highly accessed pages giving way towards better performance
    5. Slab allocator for most of the memory requirements 

Since leveldb writes data in sorted manner always(based on user defined comparator) hence I have used BTREE (and not HASH) for BDB and BangDB across all tests (though you should note that the BTREE way of organizing data is different than the way LSM-Tree does it). Since the best write performance by LevelDB is achieved using single thread and best read performance by using 4 threads(on 4 core machine), hence I have used the best possible numbers for all these dbs unless mentioned for a particular test.

For BangDB and BDB we have used 4 threads for both read and write. Of course these are the active number of threads performing the write and read but there could be more number of threads doing the background operations for each db. For ex; LevelDB uses background thread for compaction, similarly BangDB uses background threads for log flush, managing the buffer pool, read/write from disk, checkpointing etc. I have used the same machine and the similar db configuration (as much as possible) for all the dbs for the performance analysis.

 

Following machine ($750 commodity hardware) used for the test;

Following are the configuration that are kept constant throughout the analysis (unless restated before the test)

There are overall six test cases as described below. Before each test case, the different configuration parameters set for the test case are mentioned. For most of test cases large cache sizes(around 1 - 2GB) are used simply because mostly I have used relatively large number of operations (in the range of couple of millions and 100K operations here consumes around 50MB including key and value) and for many tests I needed to ensure that data doesn't overflow out of the buffer. But for few test cases I have repeated the tests using small memory as well (for test A and B) to show the numbers when cache size is smaller (in the range of 64MB). Other test cases can be repeated for small (or even smaller) cache sizes and similar trends (compared to the bigger cache size tests) could be seen.

The reason for showing the graph for individual test rather than putting a number against the test case is to assess the variability of the performance with different number of operations and other variable parameters. The typical number of operations are in millions starting with thousands.

Random Write and Read using single thread

No Overflow of data (out of the memory), 100K - 2M operations, following are the DB configurations for the test;


Same test case result for smaller cache size is shown below;

Random Write and Read using Multiple Threads

No Overflow of data (out of the memory), 100K - 2M operations, following are the DB configurations for the test;


Same test case result for smaller cache size is shown below;

Random Write and Read of 3GB of data using 1.5GB Buffer Cache

50% Overflow of data (50% to/from disk), 1M - 10M operations, following are the DB configurations for the test;


Random Write and Read vs the number of concurrent threads

No Overflow of data (out of the memory), 1M operations, following are the DB configurations for the test;


Random Write and Read for Large Values

No Overflow of data (out of the memory), 10K - 100K operations, following are the DB configurations for the test;


Random Write and Read simultaneously in overlapped manner

No Overflow of data (out of the memory), 1M operations. The test basically first inserts around (100-x)% of data for x% write and (100-x)% read and then tries to write remaining x% of data with (100-x)% of simultaneous read by 4 concurrent threads and time elapsed is computed only for the later x% write with (100-x)% read. If data is already present then it updates the existing data. following are DB configuration for the test;


 

Summary

I have just covered very few use cases for the performance analysis, for example the sequential read and writes, batch operations are not covered. Also the changes in various configurable parameters may change the graphs completely. BDB and BangDB support Hash based access of the data as well but not covered here because LevelDB stores data in sorted order only.

If we see the amount of data written by individual dbs, we will find that BangDB is writing more data than the other two dbs. BangDB doesn't use compression at the moment hence this can be improved in future if compression is supported.

BDB write performance goes down drastically when writing more than the allocated buffer. For multiple thread tests the dbs perform best (as expected) when number of threads is close to the number of cores in the machine, with the exception of LevelDB write, which works best when single thread performs the operation.

All dbs are available for download freely hence one can/may cover more scenarios comparing these three or other/more such dbs. The test files for the performance analysis is available for download.

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.