“CRDT Emulation, Simulation, and Representation Independence” will appear at ICFP 2025
My research group has a new paper, “CRDT Emulation, Simulation, and Representation Independence”, appearing at ICFP this year! This project was headed up by my PhD student Nathan Liittschwager, with help from another PhD student in my group, Jonathan Castello, and our collaborator Stelios Tsampas. You can read our preprint (warts and all, but soon to be improved, thanks to feedback from the ICFP reviewers) on arXiv, but here’s a quick summary.
Conflict-free replicated data types – CRDTs to their friends – are data structures designed for fault tolerance and high availability in distributed systems. They replicate the same data across many physical locations, which helps ensure that some replica will be available and close at hand when clients need it. CRDTs have traditionally been taxonomized into state-based and op-based flavors. State-based CRDTs synchronize with each other by periodically sending their local state to each other, while op-based CRDTS synchronize by sending a stream of locally applied operations. It has long been known that these two flavors of CRDT can “emulate” each other: if you have a state-based CRDT, you can turn it into an op-based one by putting a kind of op-based wrapper around it, and vice versa. Shapiro et al. described these “wrapper” algorithms in their famous 2011 article introducing CRDTs.
But…what does “emulate” actually mean? In what sense does an emulated CRDT object actually behave like the original? We argue that this question actually matters, because researchers love to publish results about one or the other of state-based or op-based CRDTs and then argue that their results are general due to the existence of Shapiro et al.’s wrapper algorithms. For instance, in “Automated Parameterized Verification of CRDTs”, Nagar and Jagannathan write that their technique for op-based CRDTs “naturally extends to state-based CRDTs since they can be emulated by an op-based model”, and in “Katara: synthesizing CRDTs with verified lifting”, Laddad et al. write that their synthesized state-based CRDTs “can always be translated to op-based CRDTs if necessary”. But what results about an original object actually “transfer” over to an emulated object when we do this wrapping? That’s the question we tackled in this paper!
We specify and formalize CRDT emulation in terms of simulation by modeling CRDTs and their interactions with the network as transition systems. It turns out that emulation can be understood as weak simulations between the transition systems of the original and emulating CRDT systems. Getting the details right here was rather involved, because state-based and op-based CRDTs place different requirements on the behavior of the underlying network, both with regard to causal order of messages, and with regard to both the granularity of the messages themselves. With our main result in hand, though, we can then pretty straightforwardly get a good old-fashioned representation independence result about state-based and op-based CRDTs: client programs indeed can’t tell which flavor of CRDT they are interacting with, as long as they are limited to making certain (realistic) observations.
This paper has been in progress for more than two and a half years. Our collaboration got under way when Nathan and Stelios met way back at POPL 2023, when Nathan was presenting a different (but similar in spirit!) project at the POPL Student Research Competition. Nathan and Stelios got to talking, and we all began collaborating on what would eventually become this paper. Nathan presented some early ideas at CALCO in 2023, but the journey from there to this paper was long and challenging, and it’s wonderful to see our efforts finally bearing fruit!
It was Nathan’s idea to use different colors in the paper for the op-based and state-based CRDT semantics respectively, and to me, that suggests a tantalizing analogy to a long line of work on compiler correctness, like in Patterson and Ahmed’s “The Next 700 Compiler Correctness Theorems” (or lots of Amal’s other work). Of course, Amal would probably be doing this with logical relations instead. But it’s cool (and, I guess, not surprising in retrospect!) to see that emulator correctness is kind of like compiler correctness, and I’d like to continue to explore this direction of research beyond CRDTs in the future. Emulation is everywhere, and what it means for an emulator to be “correct” is an interesting question!
Comments