Course retrospective: Languages and Abstractions for Distributed Programming

December 31, 2018

My first term here at UC Santa Cruz has wrapped up, and with it, my graduate seminar course, Languages and Abstractions for Distributed Programming. I had a delightful group of six students, all from the UCSC CS Ph.D. program; over ten weeks, we wrote a group blog, read twenty-six papers, and hosted six guest speakers.

Readings, responses, and presentations

As is typical for a graduate seminar, most of the work students did in this course consisted of reading papers, writing responses to them, and giving class presentations about them. We began with the CAP trade-off, then spent a couple weeks exploring the zoo of distributed consistency models – from session guarantees to causal consistency (with and without convergent conflict handling) to linearizability. Once we were thoroughly tired of consistency models, we spent some time on the theory and practice of replicated data types, as well as on languages, abstractions, and verification tools for combining consistencies in distributed systems. In the last month of the course, we looked at lots more languages and frameworks for distribution, with a side trip into abstractions for configuration management at the end.

One of the hardest things about planning the readings for the course is that ten weeks really isn’t very long. I had to leave plenty of good stuff off of our reading list because we just didn’t have room for it. Still, I ended up being reasonably happy with the set of readings we chose. My students had diverse interests, and it was hard to please everyone – it seemed like every time someone really liked a particular paper, someone else disliked it just as strongly – but in the end, I think everyone got to read some papers they liked. The students’ paper preferences for presentations were also sufficiently diverse that I was able to assign everybody papers that they had actually requested. (Whether they ended up still liking those papers after they were done reading and presenting on them is another question, but at least I was able to give everyone some papers they initially thought they wanted!)

Each student presented on three papers. I think that three is a lot to ask, and it probably would have been better if each student had only had to do two presentations, which would have been possible if we’d had nine students enrolled instead of six. I also did three of the paper presentations myself, two of which I think went fine and one of which I think went poorly. The paper for that latter one was Viotti and Vukolić’s survey of notions of consistency, and since that paper – as well as some of the others on our reading list – had some sort of dependency on material in Sebastian Burckhardt’s (free) book Principles of Eventual Consistency, I decided to just devote the class to talking about the material from Burckhardt’s book, instead of about the Viotti and Vukolić paper that students had actually read. This might have been a reasonable pedagogical choice if I’d managed to connect what I talked about to the reading that students had done, but I pretty much failed to do so.¹

If I could do it over, there are a couple of things that I’d do differently regarding the logistics of reading responses. Students were supposed to write a short response to every paper we read. I told them that they could skip up to four of these responses with no consequences, and that if they wanted to skip one, they should email me. In retrospect, the “email me” step was totally unnecessary! I should have just told them to go ahead and skip up to four responses, no questions asked, and saved myself having to read a bunch of mail.

Another logistical hiccup had to do with the way that reading response submissions worked. Like almost everything else in the course, reading responses were public on GitHub. In order to comply with US law (specifically, FERPA) and UC policy, though, I needed to give students a way to submit their work that would protect their right to not have their homework be made publicly available and identifiable as theirs. (Students also have the right to not have the fact that they’re even taking a certain class be made public.) After talking with our staff about FERPA compliance, the solution I arrived at was to assign each student a random ID number for the term, which they used in the file naming convention for their reading response submissions. However, since students were using their own GitHub accounts to submit homework, the random ID numbers didn’t actually anonymize submissions. To fix that, I told them that they were welcome to create a separate GitHub account just for use in my course that didn’t reveal any personal information (with a username that was, say, a random string), and that they could opt out of having their names associated with their public blog posts. Nobody took me up on either of those suggestions, so it appears that no one minded having their name associated with their work in the course. Still, it probably would have been easier to keep the reading responses private and share them with the class on some internal forum, rather than making people do the random ID number thing.

The course blog

In addition to reading papers, writing responses to them, and giving presentations about them, my students poured a lot of work into the course blog. Every student contributed two posts to the blog, as well as serving as an editor for two posts written by their classmates. I gave the students some suggestions for what to write about and provided feedback on the ideas they came to me with, but they were ultimately responsible for coming up with their own topics for the blog. I was delighted by the results:

Sohum Banerjea’s posts showed both impressive depth and impressive breadth. On the depth side, he dug into the appendix of a recent POPL paper and picked it apart to make sense of the apparent correspondence between weak memory models and distributed memory consistency. On the breadth side, he surveyed a bunch of PL-flavored papers from Joe Hellerstein’s “Progressive Systems” seminar course from a few years back.
Austen Barker wrote a great two-part series on implementing a CRDT for sets and graphs, with a particular focus on enabling garbage collection. He had the clever idea of garbage-collecting deltas and tombstones together, which I’d love to see evaluated further.
Natasha Mittal’s first contribution to the blog was a discussion of consistency – or lack thereof – in the Cassandra database system, tying concepts we’d discussed in class to the practicalities of a real system. For her second post, she wrote an overview of the Erlang programming language, which I learned a lot from – her post is filled with great pointers into Joe Armstrong’s 2003 thesis and other Erlang resources.
In what were perhaps the most “research-y” posts of anyone in the class, Aldrin Montana looked at several language-level mechanisms for mixed-consistency programming and discussed how they might be used to support mixed consistency in a programmable storage system, then followed that up with a detailed look at the internals of the (strongly consistent) Ceph storage system and a discussion of how it might be modified to support a wider range of consistency options.
Dev Purandare bravely tackled the topic that no one else dared to go near: he wrote an overview of distributed consensus protocols and what makes them hard to implement. He followed this up with a deep dive into the DistAlgo DSL to explore what language-level support for implementing consensus protocols looks like.
Finally, Abhishek Singh wrote a wonderful two-part series on Ellis and Gibbs’ classic operational transformation algorithm for collaborative text editing, including his own implementation in Go. I understand OT a lot better after reading these posts!

Blogging ate up more of the students’ time than I had anticipated it would. I had imagined them spending about thirty hours per post, but some people spent much more time than that. Some students also didn’t like having to write two posts. My reasoning had been that one large post would feel unapproachable, while two smaller posts with distinct deadlines would break up the work into manageable chunks. The feedback I got from some students, though, was that they felt like they had to do the equivalent of two course projects, which hadn’t been my intention.

Although every post had a student editor, I ended up pretty heavily editing all twelve posts myself, working together with the students and using Google Docs to make comments and provide suggested edits. I had imagined that having student editors would take some of the editing burden off me, and also give students some practice with editing each other’s writing. Although I think this worked to some extent, I didn’t really give students much guidance on how to edit each other’s work, so some of the advice they gave each other contradicted what I would have said. I also felt that the students were just too nice to each other a lot of the time!

I think some students were happy with the amount of attention I was paying to the quality of their writing. Others may have found it irritating. It’s certainly true that if I’d had more than six students, I wouldn’t have been able to give students’ posts the amount of individual attention that I did this fall. (One option for a bigger class might be to have students write posts together in small groups.) Also, with the way the deadlines for finishing posts worked, I ended up with several posts to edit at once. A better approach might have been to assign particular weeks to students when they had to finish their posts, with no more than two per week, so that my editing work could have been better spaced out.

In the end, I think all our hard work paid off: we got some really nice reactions to the blog! This tweet from KC Sivaramakrishnan (who was the author of one of my favorite papers from the course) really made my year:

I am particularly impressed with the quality of the blog posts! Very well researched and written so clearly. I highly recommended reading all of the posts. https://t.co/XIAjl3XRuZ
— KC Sivaramakrishnan (@kc_srk) November 22, 2018

I’m by no means the first instructor to incorporate public blogging into a computer science seminar course. Two good examples I know of are the Understanding and Securing TLS and Security and Privacy of Machine Learning seminars run by David Evans at UVA. In those courses, teams of students worked together to write posts about each class meeting. (The classes met for a single, long session once a week.) I decided to do it differently: for us, the blog was a replacement for a traditional course project rather than a record of what was discussed in class. Anyway, I’m interested in talking to other people who’ve used blogging as part of teaching CS classes; let me know what worked for you and what didn’t!

Guest speakers

We were fortunate to have a star lineup of guest speakers in the course:

Brandon Holt told us about his work on consistency types. This paper is part of a recent wave of work on mixed-consistency programming models that also includes MixT and Quelea. Brandon’s a really good speaker, and I’m delighted that he was willing to stop by.
My colleague Peter Alvaro made time to come chat with us about Bloom! I was already familiar with this line of work, but there’s no substitute for having an expert in to talk about it, and hearing Peter’s take on things really helped solidify some of the concepts for me.
Ankush Desai visited from Berkeley to tell us about his work on the P programming language, as well as his recent follow-up work on ModP. I liked Ankush’s papers so much that I started a novelty Twitter account inspired by them.
Michael Isard visited from Google to give a great retrospective talk on Naiad and timely dataflow. Out of all the guest speakers we had, this one was the most well-attended by faculty.
Arjun Guha visited all the way from the east coast to talk about Puppet configuration verification and repair. Arjun was our only non-Californian guest speaker. For the most part, I tried to only invite people who were relatively local, since I didn’t have budget to pay for guest speaker travel. So inviting Arjun was a spur-of-the-moment thing, and I was pleasantly surprised when he was actually willing to come all the way to Santa Cruz. And then he almost had to give a spontaneous whiteboard talk when our building’s power went out! Fortunately, thanks to Owen Arden’s god-tier last-minute room-finding skills, we were able to find a place for Arjun’s talk in the adjacent building, although not before he had drawn his title slide on the whiteboard in the original room.
Finally, Mohsen Lesani visited from UC Riverside to talk about brand-new work on replication coordination analysis and synthesis at our end-of-term party! This wasn’t one of the papers that we read for class, but I thought it capped off the class nicely and was a good follow-up to some of the ones we did read, like the “‘Cause I’m Strong Enough” paper.

For all the external speakers (that is, everyone except Peter), I opened up the talk to people outside of our class and made an effort to advertise it, because I didn’t want people to have to come from far away to give a talk to only six students. These efforts were sometimes quite successful and other times not at all successful. In the future, instead of asking speakers to discuss specific papers that we read for the course, I might ask them to talk more generally about their work (as some of the speakers went ahead and did anyway). Such talks would likely have broader appeal to people not enrolled in the course, who probably hadn’t read the paper. Organizing things that way would also make it possible to invite people who didn’t happen to be authors of papers we were reading, but who were nevertheless doing exciting and relevant work.

Overall, I’m pleased with how the guest speakers went. I learned a lot from having them, and I’m not just talking about learning how to arrange campus parking passes for visitors, although that is indeed a useful skill to have picked up.

Impact beyond {UCSC, academia}?

The way I ran this course was influenced by my own Ph.D. experience, during which I got lots of training in how to communicate my work to other people in my own narrowly-focused academic subfield, but not much training in how to communicate to anyone else. I wanted to do better, so I asked my students to try to aim their blog posts at a “general technical audience”, in the hope that the blog might have some impact beyond our narrow slice of academia. I never really defined “general technical audience” to anyone’s satisfaction, though, including my own. Although people said nice things about the blog, the people saying them tended to be, well, other academics in my subfield, just at different institutions. So, although we did reach people beyond UCSC, I don’t know if I can claim that the blog succeeded at communicating with an audience beyond academia. What could we try instead? Maybe ten-minute !!Con-style talks would be worth a shot.

Having said all that, one other way in which I think the course did have a noticeable impact beyond UCSC is that a group of students at CU Boulder have created a reading group based on it! I met the student running the reading group, David Moon, back at ICFP in September, and we had a long lunch conversation. The papers he chose for the group ended up being a subset of my course’s reading list, with one more good one that we didn’t have room for (the Verdi paper!) added at the end. I’m absolutely thrilled that this reading group is happening, and I hope that it means that a few more people have the opportunity to think about this particular set of readings as a collection and consider the connections and potential connections between them.

¹ Burckhardt’s book (which should really just be called Principles of Consistency) builds up a lot of mathematical machinery to define what he calls operation contexts, which can be thought of as a graph of events that affect the result of an operation. The concept of an operation context is necessary to define a replicated data type specification, which in turn is necessary to specify a consistency model like the ones in Viotti and Vukolić’s survey. I failed at putting all these pieces together in my lecture, but I hope that the students at least got something out of being exposed to Burckhardt’s specification framework, which has been used in a lot of follow-up work. In particular, Quelea takes some of the interesting parts of Burckhardt’s framework and turns them into a programming language, which I find extremely cool. ↩
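To make operation contexts a little more concrete, here is a minimal Python sketch under simplifying assumptions. It models an operation context as a set of visible events plus a visibility relation among them, and treats a replicated data type specification as a function from an operation and its context to a return value. The names `Event`, `Context`, `counter_spec`, and `add_wins_set_spec` are hypothetical illustrations, not taken from the book, and the sketch omits the arbitration order that Burckhardt's full definitions also carry.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Event:
    eid: int            # hypothetical unique event identifier
    op: str             # e.g. "inc", "add", "remove"
    arg: object = None  # element argument, if any

@dataclass(frozen=True)
class Context:
    # Events visible to the operation being specified, plus the visibility
    # relation among them as (earlier_eid, later_eid) pairs.
    # (Simplification: no arbitration order.)
    events: FrozenSet[Event]
    vis: FrozenSet[Tuple[int, int]]

def counter_spec(op: str, ctx: Context) -> int:
    # A read of a replicated counter returns the number of visible
    # increments; the order in which they happened does not matter.
    assert op == "read"
    return sum(1 for e in ctx.events if e.op == "inc")

def add_wins_set_spec(op: str, ctx: Context) -> frozenset:
    # A read of an add-wins set returns every element x that has some
    # visible add(x) that is not itself visible to a visible remove(x).
    assert op == "read"
    result = set()
    for a in ctx.events:
        if a.op != "add":
            continue
        removed = any(
            r.op == "remove" and r.arg == a.arg and (a.eid, r.eid) in ctx.vis
            for r in ctx.events
        )
        if not removed:
            result.add(a.arg)
    return frozenset(result)

# A remove(x) that saw the first add(x), and a second add(x) concurrent with it.
add1 = Event(1, "add", "x")
rem2 = Event(2, "remove", "x")
add3 = Event(3, "add", "x")
ctx = Context(frozenset({add1, rem2, add3}), frozenset({(1, 2)}))
print(add_wins_set_spec("read", ctx))  # frozenset({'x'})
```

The point of the example is that the return value depends only on the context: because the second add is not visible to the remove, the concurrent add "wins" and `x` stays in the set, regardless of the order in which replicas happened to apply the events.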

Lindsey Kuper
