« Stuff The Internet Says On Scalability For March 14th, 2014 | Main | Douglas Adams - 3 Rules that Describe Our Reactions to Technologies »

Paper: Scalable Eventually Consistent Counters over Unreliable Networks

Counting at scale in a distributed environment is surprisingly hard. And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory, How to update video views count effectively?, Numbers Everyone Should Know (sharded counters).

Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has "over and under counting problem in partitioned environments."

Which is often fine. But if you want more accuracy there's a PN-counter, which is a CRDT (convergent replicated data type) where "all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."

And there's a paper you can count on that goes into more details: Scalable Eventually Consistent Counters over Unreliable Networks:

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network “likes”. Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Hand- off Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict- free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.

Reader Comments (1)

As a complement to the paper there is a presentation that gives an overview of some alternatives and a general description of the mechanisms in the paper in these slides and a reference Clojure implementation in github.

March 15, 2014 | Unregistered CommenterCarlos Baquero

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>