Datanet: a New CRDT Database that Lets You Do Bad Bad Things to Distributed Data

We've had databases targeting consistency. These are your typical RDBMSs. We've had databases targeting availability. These are your typical NoSQL databases.

If you're using your CAP decoder ring you know what's next...what databases do we have that target making concurrency a first class feature? That promise to thrive and continue to function when network partitions occur?

Not many, but we have a brand new concurrency oriented database: Datanet - a P2P replication system that utilizes CRDT algorithms to allow multiple concurrent actors to modify data and then automatically & sensibly resolve modification conflicts.

Datanet is the creation of Russell Sullivan. Russell spent over three years hidden away in his mad scientist lair researching, thinking, coding, refining, and testing Datanet. You may remember Russell. He has been involved with several articles on HighScalability and he wrote AlchemyDB, a NoSQL database, which was acquired by Aerospike.

So Russell has a feel for what's next. When he built AlchemyDB he was way ahead of the pack and now he thinks practical, programmer friendly CRDTs are what's next. Why?

Concurrency and data locality. To quote Russell:

Datanet lets you ship data to the spot where the action is happening. When the action happens it is processed locally, so your system's reactivity is insanely quick. This is pretty much the opposite of the non-concurrent case where you need to go to a specific machine in the cloud to modify a piece of data regardless of where the action takes place. As your system grows, the concurrent approach is superior.
We have been slowly moving away from transactions towards NoSQL for reasons of scalability, availability, robustness, etc. Datanet continues this evolution by taking the next step and moving towards extreme distribution: supporting tons of concurrent writers.
The shift is to more distribution in computation. We went from one app-server & one DB to app-server-clusters and clustered-DBs, to geographically distributed data-centers, and now we are going much further with Datanet: data is distributed anywhere you need it, to a local cache that functions as a database master.

How does Datanet work?

In Datanet, the same piece of data can simultaneously exist as a write-able entity in many many places in the stack. Datanet is a different way of looking at data: Datanet more closely resembles an internet routing protocol than a traditional client-server database ... and this mirrors the current realities that data is much more in flight than it used to be.

What bad bad things can you do to your distributed data? Here's an amazing video of how Datanet recovers quickly, predictably, and automatically from Chaos Monkey level extinction events. It's pretty slick.

Here's an email interview I did with Russell. He goes into a lot more detail about Datanet and what it's all about. I think you will find it interesting.

Let's start with your name and a little of your background?

My name is Russell Sullivan. I am a distributed database architect. I created AlchemyDB which was acquired by Aerospike where I became the principal architect and these days I am working on a new startup/project called Datanet.

What have you brought to show and tell today?

A shiny new open source CRDT (conflict-free replicated data types) based data synchronization system named Datanet.

Datanet is a P2P replication system that utilizes CRDT algorithms to allow multiple concurrent actors to modify data and then automatically & sensibly resolve modification conflicts.

Datanet aims to achieve ubiquitous write-through caching.

CRDT replication capabilities can be added to any cache in your stack, meaning modifications to these caches are globally & reliably replicated. Locally modifying data yields massive gains in latency, produces a more efficient replication stream, & is extremely robust.

It’s time to pre-fetch data to compute :)

So AlchemyDB was a key-value store and now you are creating a CRDT database? That's a big change. Why have you gone in that direction?

CRDTs open up new use cases: you can do things with them that are not feasible without them. Plus I kept running into problems in the field where a CRDT system would be the perfect solution. Lastly, Marc Shapiro and his people formalized the field of CRDTs between 2011 and 2014, so the timing worked.

What problems do you see where CRDTs are the perfect solution?

CRDT systems have some very interesting properties when compared to traditional data systems.

First, in a CRDT system you locally modify your data and then immediately return to computation. This is great for say an app-server request that looks up 5 pieces of data and then writes 2 pieces of data. If you are using a traditional DB, you have 7 network I/Os for that request. If you are using a CRDT system and you have the necessary pieces of data locally cached, you do ZERO network I/Os (in the fast path ... you still have to replicate the 2 writes, but this happens later via lazy replication).
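The request described above can be sketched in a few lines. This is only an illustration under stated assumptions: `LocalCache`, `handle_request`, and the `network_ios` counter are hypothetical names, not Datanet's API, and the replication step just drains a queue instead of talking to real peers.

```python
from collections import deque


class LocalCache:
    """Hypothetical local write-through cache; not Datanet's actual API."""

    def __init__(self, initial):
        self.data = dict(initial)   # keys already cached locally
        self.pending = deque()      # writes awaiting lazy replication
        self.network_ios = 0        # network round trips made so far

    def read(self, key):
        return self.data[key]       # served locally, no network I/O

    def write(self, key, value):
        self.data[key] = value      # applied locally first
        self.pending.append((key, value))

    def flush(self):
        """Replicate queued writes later, off the request's fast path."""
        sent = 0
        while self.pending:
            self.pending.popleft()  # a real system would send this to peers
            self.network_ios += 1
            sent += 1
        return sent


def handle_request(cache):
    """Five reads and two writes, as in the example above."""
    values = [cache.read(f"k{i}") for i in range(5)]
    cache.write("k0", values[0] + 1)
    cache.write("k1", values[1] + 1)
    return cache.network_ios        # I/Os spent while serving the request


cache = LocalCache({f"k{i}": i for i in range(5)})
fast_path_ios = handle_request(cache)   # 0 network I/Os in the fast path
replicated = cache.flush()              # the 2 writes replicate afterwards
print(fast_path_ios, replicated)
```

The request itself finishes without touching the network; the two writes still cost replication traffic, but that cost is paid after the caller has already returned to computation.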

Second, since CRDT systems have very loose requirements on the immediacy of replication they are inherently more robust in architectures with variable replication latencies (aka: replication over WAN). CRDT systems have no expectations of speedy or reliable replication, so hiccups, lag, or outages in the replication path are not disasters, they are merely delays that will be correctly processed as resources become available.

These two points are very different: the first is all about doing everything locally and the second is about being robust to bad WAN networks. But they are subtly and closely related: both are byproducts of the fundamentally different (zero-coordination, asynchronous, etc...) manner in which CRDT systems replicate data.

Can you go up a level or two? What kind of problems can you solve using Datanet that you can't solve or are hard to solve with existing systems?

Datanet is unique in that it's a distributed system where data can be modified concurrently by multiple actors and the system automatically resolves any conflicts. To illustrate when these conflicts happen, let's use the example of a service that goes through multiple iterations of scaling.

In the beginning the service has a single app server and a single database. As load grows a memcache cluster is added and the single app server becomes a cluster of app servers. Next, geographical failover is added via an additional data center. Finally, to decrease request latency and increase request throughput, the cluster of app servers is made stateful by adding a write-through cache to every app server. Every one of these system enhancements introduces a new point where additional actors concurrently modify data. In this architecture the list of actors is: each data center's database, each data center's memcache cluster, and every app server in both data centers.

If we have 10 app servers per data center this means we have 24 concurrent actors modifying data. Traditional systems can only deal with a single concurrent actor modifying data, so they partition requests accordingly. For instance, a given user is software load balanced to a sticky data center and then a sticky app server in that data center, and this app server's write-through cache is only utilized for per-user information.

If this architecture used Datanet, the partitioning logic would no longer be needed as all 24 actors are free to modify data concurrently!
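The count of 24 follows directly from the actor list above. A quick check, with constant names chosen here purely for illustration:

```python
DATA_CENTERS = 2
APP_SERVERS_PER_DC = 10

# Each data center contributes one database, one memcache cluster,
# and its app servers (each with its own write-through cache).
actors_per_dc = 1 + 1 + APP_SERVERS_PER_DC
total_actors = DATA_CENTERS * actors_per_dc
print(total_actors)  # 24
```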

With Datanet, app server logic can be stateless: each app server can locally modify partitionable data (e.g. per-user) as well as global data (e.g. global user login counter) and then replicate asynchronously to every other actor caching the data, where CRDT logic resolves the conflicts. The entire replication flow is wildly different from traditional flows: it has advantages in latency and robustness and of course some trade-offs :)

The current state of affairs is that CRDTs are beautiful academia gradually emerging to become practical systems.

Datanet provides a simple API: JSON and presents CRDTs as a simple metaphor: local write through caches that replicate globally.

JSON is simple: you set strings and numbers, you increment numbers, you add and delete elements from an array, it's all CS 101. Datanet hides the complexities of the underlying commutative replicated data types (e.g. distributed zero coordination garbage collection) and clearly communicates how merge operations resolve conflicts. The goal is to lower the barrier to entry from someone with multiple PhDs to the junior developer level :)

To address CRDTs' architectural complexities, Datanet's messaging focuses on local operations: an actor modifies the contents of its local cache and this cache also receives external modifications (via Datanet) when other actors modify the same data. Each Datanet actor is a state machine that receives and applies internal and external modifications and stores the results in a local cache.

Changing the belief that CRDTs are impractical is important as we're not talking about niche use cases. CRDT style replication has been shown by Peter Bailis of Stanford to be applicable to 70% of the database calls in a large collection of top-level github projects.

What actually is a CRDT? What operations do you support and what's special about your implementation?

CRDTs are pretty hard to explain, so I try to explain them using the following example: A social media site has a rack of app-servers, and each app-server updates a global-login-counter on user-login. Since updating a counter (by one) is a commutative operation, replication requirements become looser: individual increment-by-one operations can be applied in an arbitrary order and the final value of global-login-counter always turns out the same.

Using CRDTs each app-server increments a local global-login-counter and lazily broadcasts the increment operation to all other app-servers who then apply the operation. At any point in time, there is no guarantee that all app-servers will have the same value for global-login-counter, there are bound to be increment-by-one replication operations in flight, but the app-servers' values quickly converge to the same value and all increments are guaranteed to be applied (i.e. no data loss).

This concept of commutative operations being locally applied, then lazily broadcast, then remotely applied, and eventually all actors converging to the same value can be generalized to include data types far more complex than a counter, and this family of data types is called CRDTs.
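The login-counter example can be simulated in a few lines. This is a minimal sketch of an operation-based counter, not Datanet's implementation: `CounterReplica` and `replicate` are hypothetical names, and the sketch assumes every increment is delivered exactly once to each other replica (no loss, no duplicates), with only the delivery order left arbitrary.

```python
import random


class CounterReplica:
    """One app server's copy of the global login counter."""

    def __init__(self, name):
        self.name = name
        self.value = 0
        self.outbox = []            # increments not yet broadcast

    def login(self):
        self.value += 1             # apply locally, return immediately
        self.outbox.append(1)

    def receive(self, delta):
        self.value += delta         # remote increments commute


def replicate(replicas, rng):
    """Deliver each pending increment to every other replica, in shuffled order."""
    messages = []
    for src in replicas:
        for delta in src.outbox:
            for dst in replicas:
                if dst is not src:
                    messages.append((dst, delta))
        src.outbox.clear()
    rng.shuffle(messages)           # delivery order is arbitrary
    for dst, delta in messages:
        dst.receive(delta)


replicas = [CounterReplica(f"app{i}") for i in range(3)]
for replica, logins in zip(replicas, (2, 5, 1)):
    for _ in range(logins):
        replica.login()

before = [r.value for r in replicas]   # replicas disagree: [2, 5, 1]
replicate(replicas, random.Random(0))
after = [r.value for r in replicas]    # all converge to 8
print(before, after)
```

Because addition is commutative, any shuffle of the in-flight increments ends with the same total on every replica; the replicas only disagree while operations are still in flight.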

Can you explain the architecture behind Datanet?

Datanet's architecture begins at the individual actor level. Actors are mobile-devices, browsers, app-servers, memcache-clusters ... they are all database masters that have a local write-thru cache of a portion of the entire system's data-set.

Actors modify data in their local cache, then broadcast the modifications peer-to-peer to all other actors caching the modified data, and the actors receiving these modifications will apply them to their local cache.
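A rough sketch of that flow, under loud assumptions: `Actor`, `Network`, and `modify` are hypothetical names invented for illustration, the "network" is an in-process list rather than a real peer-to-peer transport, and values are plain counters so that modifications are simple commutative increments.

```python
class Actor:
    """Hypothetical actor holding a local cache of a subset of the keys."""

    def __init__(self, name, cached_keys):
        self.name = name
        self.cache = {key: 0 for key in cached_keys}

    def apply_remote(self, key, delta):
        self.cache[key] += delta    # apply a modification made elsewhere


class Network:
    """In-process stand-in for the peer-to-peer transport (an assumption)."""

    def __init__(self, actors):
        self.actors = actors

    def broadcast(self, sender, key, delta):
        # Only actors that cache the modified key receive the modification.
        recipients = [a for a in self.actors
                      if a is not sender and key in a.cache]
        for actor in recipients:
            actor.apply_remote(key, delta)
        return [a.name for a in recipients]


def modify(network, actor, key, delta):
    actor.cache[key] += delta       # 1. modify the local cache
    return network.broadcast(actor, key, delta)  # 2. broadcast to peers


phone = Actor("phone", ["user:1"])
app1 = Actor("app1", ["user:1", "logins"])
app2 = Actor("app2", ["logins"])
network = Network([phone, app1, app2])

recipients = modify(network, phone, "user:1", 3)
print(recipients, app1.cache, app2.cache)
```

Here only `app1` receives the phone's modification, since `app2` does not cache `user:1`. A real system would deliver asynchronously; the synchronous loop just keeps the sketch short.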

Besides Actors there is a centralized component that deals w/ K-safety (e.g. guarantee of 2 copies), drives distributed garbage collection, and deals with re-syncing actors who were offline. Datanet's hybrid architecture is p2p for optimal replication latency & centralized for high-availability.

How much does Datanet cost? What's the licensing?

Datanet is free, released under a BSD open source license.

What's your dream for the future of Datanet?

Datanet's CRDT algorithms can add value anywhere a write-thru-cache can add value, which is a long list of places in modern stacks. This means the future for Datanet has tons of possibilities, so the current dream is to penetrate as many places in the stack as possible, which we refer to as "Ubiquitous write thru caching".

For example, at the micro level write-thru-caches can be utilized at the per-thread & per-process level for performance gains. At the data-center level write-thru-caches can prefetch data to compute for serverless architectures and provide high-availability for stateful app-server-clusters.

At the WAN level write-thru-caches merge the conflicts inherent in multiple-data-center-replication and make it feasible for CDNs to host dynamic real-time data at-the-edge.
