This winter, I once again have the chance to offer a graduate seminar course on any topic I want. (Yay!) A couple of months ago, I brainstormed a list of topics I liked and polled my grad students to see if there were any that particularly appealed to them. The results of the poll were, shall we say, indecisive.

Narrowing down to just one topic that would make everyone happy seemed like a lost cause. Fortunately, I didn’t have to pick just one! Two of the topics in particular – multitier/choreographic programming, and local-first software – seemed to present an especially interesting juxtaposition, and so “Distributed Software Systems: Global-First and Local-First Perspectives” was born.

Global first!

In the global-first part of the course, we’ll study multitier programming and choreographic programming – distributed programming paradigms in which a single, unified program expresses the behavior of multiple participants. We call this approach “global-first” because in multitier and choreographic programming, one begins by considering the behavior of an entire system rather than the behavior of individual participants. While various approaches to distributed programming purport to “let you think about your application as a single program that happens to run on multiple machines, rather than a collection of programs running on different machines that talk to one another”, global-first programming takes this point of view to the extreme: in global-first programming, you have no choice but to think of your application as a single program.

Our exploration of the multitier and choreographic programming research literature will include pioneering work, such as the Links programming language from the mid-2000s, as well as recent work such as the Pirouette language. As we read these papers, we’ll consider the strengths and weaknesses of a global-first approach. For example, a strength of choreographic programming is “deadlock-freedom-by-design”: because communication between participants can only be expressed by a single language construct (which compiles to a message send on one participant and a receive on another), it is impossible for the resulting collection of programs to contain a mismatched send and receive. On the other hand, much work on choreographic programming assumes that communication must be instantaneous, synchronous, and lossless. Can the promise of deadlock-freedom-by-design actually be realized on an asynchronous Internet where “everything fails all the time”? Furthermore, does a global-first approach make sense for applications where we may not know in advance how all participants in an execution will behave or how they should interact?
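To make the send/receive pairing concrete, here is a toy sketch in Python. The tiny choreography language (`Comm`, `Local`), the `project` function, and the bookseller example are all hypothetical and invented for illustration. This is not how Pirouette or any other real choreographic language is implemented. Each `Comm` step in the single global program is projected into a `send` for its sender and a matching `recv` for its receiver, so the projected per-participant programs cannot disagree about who sends what to whom.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Comm:
    """Global step: `sender` sends the value named `var` to `receiver`."""
    sender: str
    receiver: str
    var: str


@dataclass(frozen=True)
class Local:
    """Global step: `role` performs a purely local computation."""
    role: str
    action: str


Step = Union[Comm, Local]
LocalStep = Tuple[str, ...]


def project(choreo: List[Step], role: str) -> List[LocalStep]:
    """Toy endpoint projection: keep only what `role` does, turning each
    Comm into a send (for the sender) or a receive (for the receiver)."""
    program: List[LocalStep] = []
    for step in choreo:
        if isinstance(step, Comm):
            if step.sender == role:
                program.append(("send", step.receiver, step.var))
            elif step.receiver == role:
                program.append(("recv", step.sender, step.var))
        elif step.role == role:
            program.append(("do", step.action))
    return program


def messages(program: List[LocalStep], kind: str, peer: str) -> List[str]:
    """Names of values that `program` sends to (kind="send") or receives
    from (kind="recv") `peer`, in program order."""
    return [s[2] for s in program if s[0] == kind and s[1] == peer]


def sends_match_receives(choreo: List[Step], roles: List[str]) -> bool:
    """Check that every pair of projected programs agrees, in order,
    on the messages flowing between them."""
    programs = {r: project(choreo, r) for r in roles}
    return all(
        messages(programs[a], "send", b) == messages(programs[b], "recv", a)
        for a in roles
        for b in roles
        if a != b
    )


# A hypothetical two-party bookseller choreography.
bookseller = [
    Local("buyer", "title = choose_title()"),
    Comm("buyer", "seller", "title"),
    Local("seller", "price = lookup_price(title)"),
    Comm("seller", "buyer", "price"),
]

for role in ("buyer", "seller"):
    print(role, project(bookseller, role))
print(sends_match_receives(bookseller, ["buyer", "seller"]))  # True
```

This sketch deliberately leaves out the hard parts of real projection, such as conditionals where only one participant knows which branch was taken. It also says nothing about what happens when the network delays or drops a message, which is exactly the open question about asynchronous settings.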

Local first!

In seeming contrast to multitier and choreographic programming is the notion of “local-first” software, in which network connections are expected to be intermittent, different participants may be running different versions of an application, and individuals operate autonomously with a minimum of centralization and coordination. The local-first ethos also emphasizes user agency, privacy, security, and long-term preservation of data, and we hope to honor these principles in our study of local-first software – while resisting techno-utopianism and longtermism.

In this part of the course, we will aim to develop an understanding of what local-first software requires technically, especially from a languages and systems perspective. Local-first software should work when a network connection is absent or intermittent, and after reconnecting, there should be a way to resolve Alice’s edits against the edits made by Bob and Carol while Alice was offline. Network storage and peer-to-peer connectivity may be necessary. Conflict-free replicated data types – data structures designed for replication across multiple participants in a setting where coordination may be infrequent – have been proposed as a foundational technology for local-first software, and have received much attention from the programming languages and verification research communities over the last ten years. Content-addressable storage will also be a topic of discussion: it is particularly well suited to decentralized systems, in which what matters is the content of the data being retrieved rather than where it currently happens to be stored.
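As a rough illustration of the CRDT idea, here is a minimal Python sketch of a state-based grow-only counter (G-Counter), one of the simplest textbook CRDTs. The `GCounter` class and the Alice/Bob/Carol scenario are hypothetical, and a real local-first editor would need richer CRDTs (for example, for text sequences). Each replica increments only its own entry, and `merge` takes the entrywise maximum, which is commutative, associative, and idempotent, so replicas converge regardless of the order in which, or how many times, they exchange state.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter: one non-decreasing count per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        """Local update: needs no coordination, so it works offline."""
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        """The counter's value is the sum over all replicas' entries."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Join with another replica's state by taking entrywise maxima."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


alice, bob, carol = GCounter("alice"), GCounter("bob"), GCounter("carol")

alice.increment(2)  # Alice updates while offline
bob.increment(3)
carol.increment()
bob.merge(carol)    # Bob and Carol sync while Alice is away
carol.merge(bob)

alice.merge(bob)    # Alice reconnects and syncs in both directions
bob.merge(alice)
carol.merge(alice)
bob.merge(alice)    # a duplicate sync changes nothing (idempotence)

print(alice.value(), bob.value(), carol.value())  # 6 6 6
```

Counting is far simpler than editing a shared document, but the property on display here, that replicas which have seen the same set of updates end up in the same state no matter how those updates were delivered, is the one that CRDTs for richer data structures aim to preserve.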

At the end of the course, we’ll attempt to synthesize what we’ve learned from studying global-first and local-first distributed software systems into a coherent whole. Perhaps these paradigms need not be at odds with each other. Can we write local-first software in a global-first paradigm?

Students wanted!

If you think all this sounds interesting, I’ve posted a draft schedule of readings for those interested in following along with the course material. If you’re a computer science grad student (or an ambitious undergrad, perhaps?) at UC Santa Cruz, check out the course overview and consider enrolling in the class – there’s currently a waitlist, but it’s likely that some people will drop. I’d also welcome anyone who just wants to hang out and audit the class. Finally, if you’re not a computer science student at UC Santa Cruz, but you think you might want to be, then let’s talk.
