Friday, June 18, 2010

Paper: The Declarative Imperative: Experiences and Conjectures in Distributed Logic

The Declarative Imperative: Experiences and Conjectures in Distributed Logic was written by UC Berkeley's Joseph Hellerstein for a keynote speech he gave at PODS. The video version of the talk is here. You may have heard of Mr. Hellerstein through the Berkeley Orders Of Magnitude project (BOOM), whose purpose is to help people build systems that are orders of magnitude (OOM) bigger than those being built today, with OOM less effort than traditional programming methodologies require. A noble goal, which may be why BOOM was rated a top 10 emerging technology for 2010 by MIT Technology Review. Quite an honor.

The motivation for the talk is a familiar one: it's a dark period for computer programming and if we don't learn how to write parallel programs the children of Moore's law will destroy us all. We have more and more processors, yet we are stuck on figuring out how the average programmer can exploit them. The BOOM solution is the Bloom language which is based on Dedalus: 

Dedalus is a temporal logic language that serves as a clean foundation for Bloom. The key insight in Dedalus is that distributed programming is about time, not about space, and programmers should focus their attention on data, invariants, and changes in time. Issues of spatial layout are set aside where they belong: as performance details that need to be addressed as part of tuning, or managed by an optimizer. Dedalus is an evolution of our earlier Overlog language, which in turn was based on Datalog. Where Overlog had complicated operational semantics, Dedalus is pure temporal logic with no need for the programmer to understand the behavior of the interpreter/compiler.
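To give a flavor of what "pure temporal logic" looks like, here is a sketch of two Dedalus-style rules, paraphrased from the Dedalus papers (syntax approximate, not runnable): body atoms are implicitly true "now", and the head is deduced either in the same instant, at the next local instant (`@next`), or at some non-deterministically chosen later time, possibly on another node (`@async`):

```
% Persistence by induction: an item still on the queue now, and not
% being dequeued, remains on the queue at the next instant.
queue(Item)@next :- queue(Item), notin dequeue(Item);

% Communication: a sent message is delivered to Node at some
% non-deterministic future time (models network delay).
delivered(Node, Msg)@async :- send(Node, Msg);
```

Note there is no mutable state here: "change over time" is just deduction from one instant to the next, which is what lets the programmer ignore the interpreter's operational behavior.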

Dedalus is completely declarative and logic based. This is nothing like what most of us are used to. I have to say the data flow language approach makes more intuitive sense to me, but I feel this is something I should try to understand. Hopefully a Dummies book will be out soon :-) 

Mr. Hellerstein wrote an overview of the paper in a blog post. What follows is from that post. 

The goal of the talk is to: 1) summarize seven years of experience using logic to build distributed systems and network protocols (including P2, DSN, and recent BOOM work), and 2) set out some ideas about the foundations of distributed and parallel programming that fell out from that experience. 

The big ideas he discusses in the paper are:

  1. The CALM Conjecture. CALM stands for “Consistency And Logical Monotonicity”. The conjecture maps out the systems that can be implemented in an eventually consistent fashion without any coordination or waiting across sites. The answer? Systems that can be expressed in monotonic logic.  As a somewhat simplified rubric, this includes systems that (a) do append-only updates (no overwriting) and (b) never try to aggregate data across sites via counting, summing, voting, finding a max or min, etc.  (This is more restrictive than what CALM intends: these behaviors can in fact be “used” monotonically in special cases.  More on that another day.)  Interestingly, cross-site joins and recursion are included: these are monotonic operations and can easily be implemented in an eventually consistent, streaming fashion.
  2. The CRON Conjecture. CRON stands for “Causality Required Only for Non-Monotonicity”.  It’s an extension or corollary of CALM to address classic distributed systems complexities of coordinating clocks across machines (Lamport’s famous foundations and follow-ons).  It basically says that you can toss out all of that distributed clocks stuff if your program is monotonic (a corollary of CALM), and also says that you need that stuff if the results you ship between machines are involved in non-monotonic operations—again, aggregation and state overwrites are the standard examples. As a side-note, it also argues that monotonic systems will work correctly even if messages are delivered at an earlier time than they are sent!  (If you think this sounds nuts, think about restarting a process but reusing its message logs from earlier runs to save work.)
  3. The Fateful Time Conjecture. This one is more cosmic — it tries to explain the underlying purpose of Time in computing (and maybe in general).  The basic idea is that Time (meaning both the sequentiality of program steps in a single “thread”, and coordination of steps across threads/machines) is needed for only one purpose: to prevent multiple possible states from co-occurring. I.e. the purpose of time is to seal fate at each instantaneous moment. What happens if you don’t seal fate?  One of two things: (a) logically impossible things like “P and Not P” co-occur, or (b) multiple non-deterministic choices among valid possibilities co-occur (so-called “multiple worlds”). I’m claiming that if you are using synchronization in a program — i.e. preventing things from happening in parallel — then you should convince yourself that the parallelism you avoided would have led to those conditions.  Otherwise you are Wasting Time.  (Did you think through this the last time you wrote two consecutive lines in Java/Python/C?  I didn’t think so.  Hopefully Bloom will help automate that thinking for you!)
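The monotonicity idea behind CALM and CRON can be made concrete without any Bloom or Dedalus. An append-only set is monotonic: set union is commutative, associative, and idempotent, so replicas converge no matter the order (or redelivery) of updates — no coordination needed. A "last write wins" register is non-monotonic: its answer depends on delivery order, which is exactly where Lamport-style clocks come in. A minimal sketch in plain Python (names are mine, not from the paper):

```python
from functools import reduce

# Monotonic: grow-only set. Union is commutative, associative, and
# idempotent, so every delivery order (even with duplicates) yields
# the same final state -- eventually consistent with no coordination.
def apply_monotonic(state, update):
    return state | {update}

# Non-monotonic: overwrite ("current value = last write"). The final
# state depends on which update arrived last, so replicas must
# coordinate on ordering to agree.
def apply_overwrite(state, update):
    return update

updates = ["a", "b", "c"]
# Three delivery schedules, including a reordering and a duplicate.
orders = [updates, list(reversed(updates)), ["b", "a", "c", "a"]]

set_results = [reduce(apply_monotonic, order, frozenset()) for order in orders]
reg_results = [reduce(apply_overwrite, order, None) for order in orders]

print(len(set(set_results)) == 1)  # True: all orders converge
print(len(set(reg_results)) == 1)  # False: order changed the answer
```

This is also why the CRON conjecture's "messages delivered before they are sent" remark isn't as crazy as it sounds: if the merge function is order-insensitive, replaying an old message log early cannot change the outcome.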


Reader Comments (1)

I wonder how far applications that use the CQRS architecture (as advocated by Greg Young et al.) would need to be extended to fall under the rules outlined in this article. Not a lot, I guess.

June 18, 2010 | Unregistered CommenterErik van Oosten
