DistSys May 2022: Keeping CALM: When Distributed Consistency is Easy

boris · May 24, 2022, 2:58pm

Please sign up on the Luma page to get access to the this and future events

Introduction

Nearly all of the software we use today is part of a distributed system. Apps on your phone participate with hosted services in the cloud; together they form a distributed system. Hosted services themselves are massively distributed systems, often running on machines spread across the globe. “Big data” systems and enterprise databases are distributed across many machines. Most scientific computing and machine learning systems work in parallel across multiple processors. Even legacy desktop operating systems and applications like spreadsheets and word processors are tightly integrated with distributed backend services.

Distributed systems are tricky, so their ubiquity should worry us. Multiple unreliable machines are running in parallel, sending messages to each other across network links with arbitrary delays. How can we be confident that our programs do what we want despite this chaos?

This problem is urgent, but it is not new. The traditional answer has been to reduce this complexity with memory consistency guarantees: assurances that the accesses to memory (heap variables, database keys, etc) occur in a controlled fashion. However, the mechanisms used to enforce these guarantees—coordination protocols—are often criticized as barriers to high performance, scale
and availability of distributed systems.

Recording

https://vimeo.com/713370739

Chat Log

08:38:29	 From Quinn Wilton : You're doing great!
08:39:54	 From Adam Lancaster : "You can't keep pointing at things and saying they are monotonic"
08:40:03	 From Adam Lancaster : "Immutable data structutes"
08:40:17	 From Philipp Krüger : :D
08:45:22	 From Adam Lancaster : "CRDTs are an OO lens on a long tradition of prior work that exploits commutativity to achieve determinism under concurrency."
08:46:47	 From Philipp Krüger : You can't keep pointing at things and saying they are transducers"
08:46:52	 From Philipp Krüger : :P
08:48:45	 From Brooklyn Zelenka (@expede) : 😛
08:49:32	 From Marc-Antoine Parent : From wikipedia: According to University of California, Berkeley computer scientist Eric Brewer, the theorem first appeared in autumn 1998.[5] It was published as the CAP principle in 1999[10] and presented as a conjecture by Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC).[11] In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, rendering it a theorem.[1]
08:50:10	 From Brian : Ah great, thanks!
08:50:53	 From Adam Lancaster : You can have CAPM then. Consistency, Availability, Partition tolerance and montonicity
08:56:45	 From Quinn Wilton : Bottom left of page 1
08:56:53	 From Quinn Wilton : > a coordination-free implementation called Anna ran over two orders of magnitude faster by eliminating that coordination
08:57:27	 From Philipp Krüger : Beat me to it :P
09:02:14	 From Quinn Wilton : The key paragraph here is right above where it's scrolled too, top left
09:05:37	 From Marc-Antoine Parent : the full proof would be in https://arxiv.org/pdf/1012.2858v1.pdf
09:05:50	 From Brooklyn Zelenka (@expede) : > Relational transducers for declarative networking
09:05:54	 From Brooklyn Zelenka (@expede) : Nice
09:08:55	 From Adam Lancaster : left as an exercise to the reader
09:09:05	 From Adam Lancaster : "how can I make my program monotonic"
09:09:48	 From Na : at least they call that out in the "questions" section
09:10:42	 From Na : "Note that a coupon is not a clear mathematical inverse of any action in the original program; domain-aware compensation often goes beyond typical type system logic."
09:18:53	 From Philipp Krüger : LASP
09:19:00	 From Quinn Wilton : https://lasp-lang.readme.io/
09:19:13	 From Marc-Antoine Parent : https://blog.acolyer.org/2015/08/17/lasp-a-language-for-distributed-coordination-free-programming/
09:19:32	 From Marc-Antoine Parent : https://github.com/lasp-lang/lasp
09:20:22	 From Quinn Wilton : Sorry I have a bit of an audio delay so I keep cutting people off
09:22:54	 From Marc-Antoine Parent : lasp fails to compile with latest erlang
09:23:18	 From Adam Lancaster : no one has made a keep CALM and carry on joke yet
09:27:26	 From Adam Lancaster : thanks all I have to drop off!!
09:27:32	 From Adam Lancaster : 👋
09:28:00	 From Justin : This has been fascinating.  Got part way through the paper, but it was enough to generally follow discussion.
09:28:42	 From Quinn Wilton : I love that pape
09:28:51	 From Na : https://martin.kleppmann.com/papers/bft-crdt-papoc22.pdf

Quinn · May 24, 2022, 5:45pm

During the call I gave a brief explanation of why CAP doesn’t apply to CALM, but I wasn’t super satisfied with my answer and it came out kind of rambly.

I went back to the 2002 proof of the CAP Theorem to pull out the exact gap, and it stems from the choice of consistency model. For their proof, they choose atomic consistency, which requires that every replica process the same events in the same order.

Since monotonic programs eschew this traditional idea of order, they’re able to converge without requiring atomic consistency like this, and so the proof doesn’t actually apply to the space of programs covered by CALM, which instead opts for a confluence based consistency model that focuses on the observed outcome of a program’s execution.

As is often the case with popular results from academia, it’s important to keep in mind the exact conditions under which the result holds and was proven, because that sort of nuance is often lost as the result makes its way to a wider audience.

None of this is to say that the CAP Theorem is incorrect or anything like that, but it’s more narrowly applicable than you’d expect from the way it’s used as a buzzword.