Schedule of topics and readings

This schedule is tentative, and it is neither sound (that is, if a particular reading or topic is listed here, that doesn’t mean we’ll cover it!) nor complete (that is, if a particular reading or topic is not listed here, that doesn’t mean we won’t cover it!).

The course will alternate between Discussion and Lecture days. For days marked Discussion, you must turn in your reading response for that day’s reading assignment on Canvas by noon Pacific time on the day prior to the day of class. For example, for the paper to be discussed on Wednesday, October 7, you must turn in your reading response on Canvas by noon Pacific time on Tuesday, October 6.

Date Topic Notes
Friday, 10/2 Course overview  
Monday, 10/5 Lecture: distributed systems: what and why?; time and clocks; Lamport diagrams Start-of-course survey due by start of class today
Wednesday, 10/7 Discussion: Jim Waldo et al., “A Note on Distributed Computing” (1994) Reading response due noon PT on 10/6; scribes’ wiki write-up due 11:59:59pm PT on 10/14
Friday, 10/9 Lecture: causality and happens-before; network models; state and events; partial orders; total orders; Lamport clocks  
Monday, 10/12 Discussion: Leslie Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System” (CACM 1978) (skip the section “Physical Clocks”) Reading response due noon PT on 10/11; scribes’ wiki write-up due 11:59:59pm PT on 10/19
Wednesday, 10/14 Lecture: vector clocks; protocol runs and anomalies; delivery vs. receiving; FIFO delivery; causal delivery  
Friday, 10/16 Discussion: Reinhard Schwarz and Friedemann Mattern, “Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail” (Distributed Computing, 1994) (sections 1-3 only) Reading response due noon PT on 10/15; scribes’ wiki write-up due 11:59:59pm PT on 10/26
Monday, 10/19 Lecture: totally-ordered delivery; implementing FIFO delivery; implementing causal broadcast  
Wednesday, 10/21 Discussion: Kenneth Birman, André Schiper, and Pat Stephenson, “Lightweight Causal and Atomic Group Multicast” (TOCS, 1991) (sections 1-5 only) Reading response due noon PT on 10/20; scribes’ wiki write-up due 11:59:59pm PT on 10/28
Friday, 10/23 Lecture: uses of causality in distributed systems; consistent snapshots; Chandy-Lamport snapshot algorithm  
Monday, 10/26 Discussion: K. Mani Chandy and Leslie Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems” (TOCS, 1985) Reading response due noon PT on 10/25; scribes’ wiki write-up due 11:59:59pm PT on 11/2
Wednesday, 10/28 Lecture: Chandy-Lamport wrap-up: limitations, assumptions, properties; uses of snapshots; centralized vs. decentralized algorithms; recap of delivery guarantees and protocols; safety and liveness; reliable delivery  
Friday, 10/30 Discussion: Bowen Alpern and Fred B. Schneider, “Defining Liveness” (Information Processing Letters, 1985) Reading response due noon PT on 10/29; scribes’ wiki write-up due 11:59:59pm PT on 11/6
Monday, 11/2 Lecture: fault classification and fault models; two generals problem  
Wednesday, 11/4 Discussion: Joseph Y. Halpern and Yoram Moses, “Knowledge and Common Knowledge in a Distributed Environment” (JACM, 1990) (sections 1-4 only) Reading response due noon PT on 11/3; scribes’ wiki write-up due 11:59:59pm PT on 11/11
Friday, 11/6 Lecture: implementing reliable delivery; idempotence; at-least-once/at-most-once/exactly-once delivery; unicast/broadcast/multicast; reliable broadcast; implementing reliable broadcast; preview of replication  
Monday, 11/9 Discussion: Felix C. Gärtner, “Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous Environments” (ACM Computing Surveys, 1999) Reading response due noon PT on 11/8; scribes’ wiki write-up due 11:59:59pm PT on 11/16
Wednesday, 11/11 No class (Veterans Day)  
Friday, 11/13 Lecture: replication; total order vs. determinism; consistency models: FIFO, causal, “strong”; primary-backup replication; chain replication; latency and throughput  
Monday, 11/16 Discussion: Robbert van Renesse and Fred B. Schneider, “Chain Replication for Supporting High Throughput and Availability” (OSDI 2004) Reading response due noon PT on 11/15; scribes’ wiki write-up due 11:59:59pm PT on 11/23
Wednesday, 11/18 Lecture: handling node failure in replication protocols; introduction to consensus; problems equivalent to consensus; the FLP result; introduction to Paxos  
Friday, 11/20 Discussion: Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process” (JACM 1985) Reading response due noon PT on 11/19; scribes’ wiki write-up due 11:59:59pm PT on 11/27
Monday, 11/23 Lecture: Paxos: the interesting parts; Multi-Paxos  
Wednesday, 11/25 Discussion: Leslie Lamport, “Paxos Made Simple” (2001) Reading response due noon PT on 11/24; scribes’ wiki write-up due 11:59:59pm PT on 12/2
Friday, 11/27 No class (Thanksgiving break)  
Monday, 11/30 Lecture: fault tolerance of Paxos; brief mention of other consensus protocols: Viewstamped Replication, Zab, Raft; passive vs. active (state machine) replication  
Wednesday, 12/2 Discussion: Tushar Chandra, Robert Griesemer, and Joshua Redstone, “Paxos Made Live: An Engineering Perspective” (invited talk, PODC 2007) Reading response due noon PT on 12/1; scribes’ wiki write-up due 11:59:59pm PT on 12/9
Friday, 12/4 Lecture: eventual consistency; strong convergence and strong eventual consistency; intro to application-specific conflict resolution; network partitions; availability; the consistency/availability trade-off; anti-entropy with Merkle trees; gossip; quorum consistency  
Monday, 12/7 Discussion: Giuseppe DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store” (SOSP 2007) Reading response due noon PT on 12/6; scribes’ wiki write-up due 11:59:59pm PT on 12/14
Wednesday, 12/9 Lecture: sharding; consistent hashing  
Friday, 12/11 End-of-course wrap-up and AMA; overview of Lindsey’s research  
Wednesday, 12/16 No final exam. Enjoy your winter break!