Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom
“You just can't have it all” is a phrase most of us are accustomed to hearing, and one that many still believe to be true when discussing the speed, scale, and cost of processing data. High-speed data processing requires more memory resources, which drives up cost, because memory tends to be far more expensive than commodity disk drives. The idea that data systems cannot reliably deliver both scale and fast access, not to mention at the right cost, has long been debated, though the sense of such limitations was cemented by computer scientist Eric Brewer, who introduced us to the CAP theorem.
The CAP Theorem and Limitations for Distributed Computer Systems
With this theorem, Brewer stated that it is impossible for a distributed computer system to provide all three of the following guarantees simultaneously:
- Consistency (every node sees the same data at the same time)
- Availability (every request receives a response)
- Partition Tolerance (the system continues to operate despite arbitrary failures of parts of the system)
We can evaluate different approaches to data management by these three properties and by the tradeoffs each solution makes when emphasizing one over another. For example, what is the tradeoff of putting more emphasis on consistency? Most likely, it is reduced availability or partition tolerance.
Traditional RDBMS solutions offered consistency at the expense of partition tolerance and cost. In-memory caching solutions such as Memcached offered a different set of tradeoffs, for example speed over some degree of consistency (between the in-memory data and the data held on disk). The In-Memory Data Grid, or In-Memory Computing as it is called today, extended the use of memory as the system of record and thus enabled a higher degree of consistency by reducing the dependency on the underlying disk-based databases. The new generation of NoSQL databases took a different approach and offered speed and scale at low cost over consistency, by utilizing distributed commodity storage and relying heavily on asynchronous data flow to speed up data processing.
Quite often there is a direct correlation between those tradeoffs and cost. For example, a high degree of consistency often relies on synchronous data-flow and replication operations, which come at a cost in speed. To overcome that speed limit we often need memory or other flavors of high-speed storage, which translates to higher cost. On the other hand, if we can compromise on consistency, we can offer speed and scale using commodity resources, without relying on high-cost hardware.
As technology advances and the demand for cost-effective, scalable, and reliable data systems grows, new solutions are coming to light. One of the most favored is a combination of two items we are already familiar with: RAM and flash.
Why SSD Is a Viable Storage Solution
Many of the previously mentioned alternatives were built under the assumption that disk is the bottleneck. If disk is the bottleneck, we need layers of optimizations to minimize access to it. SSD provides a high-speed storage device, which invalidates the idea that disk is the bottleneck. This is a fundamental change to one of the core assumptions behind the design of many databases today. Let me explain.
For example, if the disk is slow it makes sense to put various filters in place to minimize access to it. As the disk gets faster, all of those filters become overhead; in many cases it would be faster to write directly to disk without those additional layers. In a similar way, the use of asynchronous operations needs to change as well. When the disk is slow it makes sense to use asynchronous operations to defer writes. However, when the disk is no longer the bottleneck, it is faster and vastly simpler to access it directly. This also avoids many of the consistency tradeoffs mentioned earlier, which are often a byproduct of those asynchronous operations.
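To make the contrast concrete, here is a minimal sketch of the two write paths, with a hypothetical `DiskStore` interface standing in for the real storage layer: a write-behind queue that defers disk writes, and a direct synchronous write that becomes preferable once the disk is no longer the bottleneck.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical storage interface standing in for the real disk layer.
interface DiskStore {
    void write(String key, byte[] value);
}

class WritePaths {
    private final DiskStore disk;
    // Write-behind: buffer writes and flush them asynchronously.
    private final BlockingQueue<Runnable> writeBehindQueue = new LinkedBlockingQueue<>();

    WritePaths(DiskStore disk) {
        this.disk = disk;
        // Background drainer thread simulating an asynchronous flush loop.
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    writeBehindQueue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    // Slow-disk era: defer the write; the caller returns immediately,
    // but RAM and disk are briefly inconsistent.
    void putAsync(String key, byte[] value) {
        writeBehindQueue.add(() -> disk.write(key, value));
    }

    // Fast-SSD era: write straight through; no queue, no flush logic,
    // and no window of inconsistency.
    void putDirect(String key, byte[] value) {
        disk.write(key, value);
    }
}
```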
Putting SSD and RAM Together Using Off-Heap Storage
RAM storage can be as big as necessary and it is incredibly quick, but it costs about ten times more than flash. By putting SSD and RAM together we can optimize the cost/performance ratio. The main challenge in doing so remains consistency: how do we keep the data in memory and on flash in sync so that, from an end-user perspective, the integration looks completely seamless?
One method of such integration is referred to as off-heap storage.
Off-heap storage is often implemented as a plug-in used by the in-memory data store to offload its data from RAM onto flash and vice versa.
Traditionally, off-heap storage was implemented with shared memory, which is basically block storage that provides external access to the same RAM device. By doing so, we bypass the overhead of managing data in RAM through the Java heap. The limitation of that approach is that capacity is still bounded by the size of RAM. It also forces an external data-management and garbage-collection layer to manage this external heap.
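As a rough illustration of that approach, the sketch below stores serialized values in a direct `ByteBuffer`, that is, outside the Java heap and invisible to the garbage collector, while only a small index stays on-heap. The application must do its own space accounting, which is exactly the external data-management burden described above. This is a simplified example, not how any particular data grid implements it.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Minimal off-heap store: values live in a direct buffer outside the Java
// heap; only the small (key -> offset/length) index stays on-heap.
class OffHeapStore {
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}

    void put(String key, byte[] value) {
        int offset = buffer.position();
        buffer.put(value);                        // bytes are copied off-heap
        index.put(key, new int[] {offset, value.length});
        // Note: freeing and compacting this region is the application's job;
        // the JVM garbage collector never sees it.
    }

    byte[] get(String key) {
        int[] loc = index.get(key);
        if (loc == null) return null;
        byte[] out = new byte[loc[1]];
        ByteBuffer view = buffer.duplicate();     // independent read cursor
        view.position(loc[0]);
        view.get(out);
        return out;
    }
}
```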
With SSD, users can now keep the memory and the SSD device synchronized and regularly up to date, meaning storage is both scalable and consistent.
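One simple way to keep the two tiers seamless to the caller is a write-through policy: every update goes synchronously to both RAM and the SSD-backed store before the call returns, and reads are served from RAM with flash as the fallback. A minimal sketch, assuming a hypothetical `SsdStore` interface:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface over the flash tier (e.g. an SSD-backed key/value file).
interface SsdStore {
    void write(String key, byte[] value);
    byte[] read(String key);
}

// Write-through tiering: RAM holds the fast copy, SSD the authoritative copy.
class TieredStore {
    private final Map<String, byte[]> ram = new ConcurrentHashMap<>();
    private final SsdStore ssd;

    TieredStore(SsdStore ssd) { this.ssd = ssd; }

    void put(String key, byte[] value) {
        ssd.write(key, value);   // durable copy first...
        ram.put(key, value);     // ...then the fast copy; both are visible on return
    }

    byte[] get(String key) {
        byte[] v = ram.get(key);
        return (v != null) ? v : ssd.read(key); // fall back to flash on a RAM miss
    }
}
```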
Some SSD drivers provide a key/value interface, which makes the integration vastly simpler, as the driver takes care of its own data management. Having said that, most in-memory implementations that use SSD as off-heap storage provide fairly limited functionality in terms of query and transaction support. This is because those implementations integrate with the SSD through a plain disk-storage interface.
Using SSD as a Foundation for a Flash Database
To really make the best use of SSD and maximize its potential, we need to think of SSD as a foundation for the database itself. This requires a tighter, more native integration with the SSD in order to overcome some of the limitations of many current In-Memory Data Grid and SSD implementations.
The specific set of features needed for this sort of integration includes the following (a minimal interface sketch follows the list):
- A portable, native key/value interface that works against any flash device.
- Using Flash as Durable Storage - flash can be used as a persistent data store, not just as an extension of RAM. As such, we can leverage the durability of flash to speed up data loading from the underlying flash device in case of a planned or unplanned recovery process.
- Support for batch operations - batch operations are a common way to speed up access to flash by minimizing the number of cross-boundary calls between the RAM device and the underlying flash devices.
- Transactional - to support transactional access, we need to extend batch operations so they fail or succeed as a single atomic operation.
- Query index support - to speed up query and access time, the data needs to be indexed in a way that fetching a particular object by its ID points directly to its physical location on disk.
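Pulled together, those requirements suggest a contract roughly like the one below. This is only a sketch of what such a flash-native interface could look like; all names are illustrative and not tied to any particular vendor's API.

```java
import java.util.Map;

// Illustrative flash-native key/value contract; names are hypothetical.
interface FlashStore {
    // Portable key/value interface over any flash device.
    void put(byte[] key, byte[] value);
    byte[] get(byte[] key);

    // Durable storage: make preceding writes survive restart or recovery.
    void flush();

    // Batch operations: one crossing of the RAM/flash boundary for many entries.
    void putAll(Map<byte[], byte[]> batch);

    // Transactional access: the whole batch commits or fails atomically.
    void putAllAtomic(Map<byte[], byte[]> batch) throws TransactionFailedException;

    // Index support: resolve an object ID directly to its physical location.
    long locate(byte[] id); // returns the on-flash offset, or -1 if absent
}

class TransactionFailedException extends Exception {
    TransactionFailedException(String msg) { super(msg); }
}
```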
To make this a practical reality, GigaSpaces partnered with SanDisk, which through its Fusion-io acquisition owns the majority of the SSD market. As part of this partnership, we implemented a new version of our In-Memory Data Grid, XAP MemoryXtend, which is now integrated with the SanDisk ZetaScale interface. ZetaScale implements all of the above features as a general-purpose flash disk interface. Through this integration we are now able to utilize the XAP RAM-based storage for handling complex queries and analytics as well as transactional support. We can also leverage XAP cluster support to integrate multiple flash devices mounted on various machines and make them work as one big transactional database (as illustrated in the diagram below).
The XAP MemoryXtend cluster consists of a number of partitions, each plugged into a local SSD drive. Each partition can have at least one backup running on a separate machine to ensure availability. The XAP client is a smart proxy that abstracts the underlying cluster and exposes all of the physical partitions through a single data-grid interface. The proxy routes a write or query request to a particular partition when we are looking for a specific data item. In the case of aggregated queries, it invokes a parallel query against all nodes and consolidates the results into a single result set.
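A sketch of the routing logic such a smart proxy might apply, assuming a hypothetical `Partition` handle (this is not the XAP client code itself): single-key operations hash to exactly one partition, while aggregated queries fan out to all partitions in parallel and merge the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical handle to one data-grid partition (primary plus backup behind it).
interface Partition {
    void write(String key, byte[] value);
    List<byte[]> query(String predicate);
}

class SmartProxy {
    private final List<Partition> partitions;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    SmartProxy(List<Partition> partitions) { this.partitions = partitions; }

    // Single-item request: route to exactly one partition by key hash.
    void write(String key, byte[] value) {
        int idx = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(idx).write(key, value);
    }

    // Aggregated query: fan out to every partition in parallel,
    // then consolidate into a single result set.
    List<byte[]> query(String predicate) throws Exception {
        List<Future<List<byte[]>>> futures = new ArrayList<>();
        for (Partition p : partitions) {
            futures.add(pool.submit(() -> p.query(predicate)));
        }
        List<byte[]> results = new ArrayList<>();
        for (Future<List<byte[]>> f : futures) {
            results.addAll(f.get()); // waits for each partition's partial result
        }
        return results;
    }
}
```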
Flash Database as a Service
As flash devices are now supported by cloud providers such as Amazon, it only makes sense to leverage this capability and offer this sort of flash-based database on demand.
This is where XAP Deployment and Orchestration comes in handy, as it automates the deployment and management of XAP clusters across a variety of cloud infrastructures.
For more information, check out The Next Big Disruption in Big Data.