Update on Scalable Causal Consistency For Wide-Area Storage With COPS

Here are a few updates on the article Paper: Don’t Settle For Eventual: Scalable Causal Consistency For Wide-Area Storage With COPS from Mike Freedman and Wyatt Lloyd.

Q: How software architectures could change in response to casual+ consistency?

A: I don't really think they would much. Somebody would still run a two-tier architecture in their datacenter:  a front-tier of webservers running both (say) PHP and our client library, and a back tier of storage nodes running COPS.  (I'm not sure if it was obvious given the discussion of our "thick" client -- you should think of the COPS client dropping in where a memcache client library does...albeit ours has per-session state.)

Q: Why not just use vector clocks?

A: The problem with vector clocks and scalability has always been that the size of vector clocks in O(N), where N is the number of nodes.  So if we want to scale to a datacenter with 10K nodes, each piece of metadata must have size O(10K).  And in fact, vector clocks alone only allow you to learn Lamport's happens-before relationship -- they don't actually govern how you would enforce causal consistency in the system.  You'd still need to employ either a serialization point or explicit dependency checking, as we do, in order to provide the desired consistency across the datacenter.

Distributed systems protocols (and DB systems) typically think about sites that include all the data they wish to establish causal ordering over.  In COPS, a datacenter node only has a shard of the total data set, so you are fundamentally going to need to do some consistency enforcement or verification protocol.

In short:  Vector clocks give you potential ordering between two observed operations.  We use explicit dependency metadata to tell a server what other operation it depends on, because that operation likely resides only on other servers in the cluster!

Q: Why would you pick this over Cassandra?

A:  I think it introduces a different consistency model.  To my knowledge, Cassandra doesn't actually allow you to achieve *any* consistency guarantees *across* keys, it's support for transactions and strong consistency is limited to each row (as is BigTable, PNUTS, etc.).  COPS allows you to achieve gain stronger consistency than eventual *across* rows/tablets/servers, while preserving the ALPS properties. This is why one of the directions of future work we are looking at is actually seeing how difficult it would be to "port" COPS' algorithms to Cassandra (or maybe even sharded MySQL) to provide cross-key consistency properties.

Q: Please explain the tweet: Conflicts are always possible without transactions! Likelihood ^ as consistency v from: strong -> causal -> eventual.

A: In your post, you noticed we have to deal with conflicting updates to the same key in a similar way to eventual consistency, either applying the last-writer-wins rule or some app-specific function.  This is a good point, and one of the things I like about eventual consistency is that it forces people to acknowledge that conflicts are possible and that you have to deal with them.

What's sort of surprising to many is that they are possible even with strong consistency (linearizability).  Even if two updates don't conflict explicitly, they can still conflict logically.  For instance, if two people (U and I) write to the same key (lock initially 0, U: lock = 1, I: lock = 1) at about the same time, the updates will be ordered by real time, but they should really conflict (the second lock = 1 should fail!) given the intentions of people using the system.

Instead, what you need for those sort of situations are transactions. At a minimum, you need test-and-set operations to ensure conflict-free operation!