Course retrospective: SMT Solving and SolverAided Systems
Last fall, at the start of my second year at UC Santa Cruz, I taught a graduate seminar on SMT solving and solveraided systems!
I was lucky I had the chance to offer this course at all. I had originally been scheduled to teach something more typical and boring, but in spring 2019, as my first year at UCSC was ending, my department chair asked me if I wanted to instead teach a special topics grad seminar in the fall, on anything I wanted. Hell yeah! Sign me up! I couldn’t believe my good fortune, since I’d already gotten to teach one such course in my first year. The topic I’d chosen that first time, Languages and Abstractions for Distributed Programming, had been really fun, but pretty squarely in my comfort zone as a researcher. For this one, I decided I would venture outside of my comfort zone and learn something new.
Choosing a topic
It wasn’t too hard to decide what I wanted to learn. For years, ever since my time working with the Reluplex team at Stanford while I was at Intel Labs, I’d wanted to know more about how the guts of SMT solvers worked. In my field, programming languages (PL), lots of people use solvers as blackbox tools without being too concerned about their implementation. But to build things like, for instance, Reluplex, which is a domainspecific SMT solver for verifying safety properties of neural networks, deeper knowledge of solver internals is required.
I was excited by the idea of domainspecific SMT solvers. Just as with domainspecific compilers, a domainspecific solver could exploit domain knowledge for efficiency (and elegance!). But, while I had some kind of a clue about how to implement a compiler, I had very little of a clue about how to implement an SMT solver. This was rather embarrassing, considering I had just published a paper arguing that people ought to be building domainspecific SMT solvers. Well, time to learn. I would force myself to learn about solver internals by signing myself up to teach about them!
I decided that the first three weeks of my course would have a lecture format and focus on solver internals. After that, it would switch to a seminar format in which students would read and present papers on solveraided languages (like Dafny, Liquid Haskell, and Rosette), solveraided systems (like KLEE), and domainspecific solvers (like Reluplex). The papers I chose came from a wide variety of venues – verification, testing, security, operating systems, and several distinct flavors of PL – reflecting my own expanding interests, or at least reflecting my attempt to keep my own PL myopia in check.
Six students signed up to take my class for credit (a number that, to my amazement, grew to eight over the first few weeks), and a handful of others audited. Most people in the room were grad students, but a few were undergrads, and there were even a couple of professors who sat in from time to time (!).
The first three weeks: SMT solver innards
Once I knew that I was going to spend three weeks lecturing about solvers, I went to look for courses at other institutions that seemed to have have a goodsized overlap with what I wanted to teach. Two that stood out were Işıl Dillig’s Automated Logical Reasoning at UT Austin, and Emina Torlak’s ComputerAided Reasoning for Software at UW. Both were influential in how I designed this part of my course. (If you like slides, I strongly recommend both of their lecture slides!)
I saw that some folks I trusted had recommended Kroening and Strichman’s Decision Procedures: An Algorithmic Point of View as a textbook, so I began reading it and making lecture notes based on the first few chapters. I also decided to go with Marijn Heule and Oliver Kullmann’s very nice 2017 CACM article “The Science of Brute Force” as the first reading assignment in the course, and I made the first lecture entirely based on it.
I’ve been meaning to clean up my lecture notes before sharing them – make them less embarrassingly loquacious, maybe add some proper diagrams instead of the ASCII art – but let’s face it, that’s never gonna happen. So, here they are, in 17point font because the use case is mesquintingatmylaptopfromtwometersawaywhilewritingfuriouslyontheboard. The PDFs are generated from Markdown, using Pandoc; if you want the original Markdown documents (for instance, if you want to regenerate the PDFs with a smaller font size), here they are on GitHub! If you find any issues with my lecture notes – whether they’re typos, major conceptual misunderstandings on my part, or anything in between – I’d appreciate pull requests.
 “The science of brute force” (goes with Heule and Kullman’s article)
 Introduction to decision procedures, part 1 (goes with sections 1.01.3 (pp. 114) of Kroening and Strichman)
 Introduction to decision procedures, part 2 (goes with sections 1.41.7 (pp.1423) of Kroening and Strichman)
 Introduction to SAT solving, part 1 (goes with sections 2.12.2.3 (pp. 2738) of Kroening and Strichman)
 Introduction to SAT solving, part 2 (goes with sections 2.2.42.2.9 (pp. 3850) of Kroening and Strichman)
 Introduction to SMT solving (goes with sections 3.13.4 (pp. 5972) of Kroening and Strichman)
 Theories: Equality and uninterpreted functions (goes with sections 4.14.5 (pp. 7791) of Kroening and Strichman)
 Theories: Linear arithmetic (goes with sections 5.15.2 (pp. 97106) of Kroening and Strichman)
Here’s what the board looked like at the end of the last day of class!
tag urself I'm the "what the spec allows"/"what the compiler does" diagram pic.twitter.com/OPIp72j4fx
— Lindsey Kuper (@lindsey) October 21, 2019
If I had wanted to keep lecturing after three weeks, we could’ve continued marching through Kroening and Strichman and talked about a bunch more theories and decision procedures for them, but at this point, we had a basic understanding of how solvers worked. I had planned to do one more lecture, about the NelsonOppen method of combining decision procedures, but I ended up dropping it because it wasn’t actually essential background for the papers we were reading, and because two additional students joining the class meant I had to rearrange the schedule to make room for one more day of student project presentations at the end.
The rest of the course: solveraided systems
Since the whole point of teaching a seminar is to sit down, shut up, and give students a chance to talk instead (and then offer feedback on what they say), the rest of the course consisted of paper presentations by students, mostly on solveraided systems and languages. We read 18 papers, 16 of which were presented by students (with each student presenting two papers) and two of which were presented by guest speakers.
Each student had to turn in a short response for each of the readings they didn’t present. Unlike with my fall 2019 seminar, I didn’t make the reading responses public on GitHub. Instead, we used a private forum that was part of our course management system, and it was set up so that students had to turn in a response themselves before being able to see other people’s. I tried to leave comments in response to everyone’s responses, every time, but this was a lot of work, and it didn’t really do much to foster discussion, as I had hoped it might. In the future, I might just leave one general toplevel responsetotheresponses for each reading.
Here’s what we read:
 We started with a bunch of work on symbolic execution for finding bugs: the EXE bugfinding tool and its codesigned solver, STP, then HAMPI, an STPbased solver for string constraints, and then EXE’s famous successor, KLEE. I had been aware of some of these tools, but hadn’t actually read the papers about them until this course. I’m really glad I did! It was an intentional choice to put these four papers first, because I wanted to give people a break from the more theoretical material from the lectures and show them that solveraided systems were immediately useful for fuzzing the software that they used every day (like the GNU coreutils!).
 We read about Rosette, which is a domainspecific language embedded in Racket, intended for building solveraided languages.^{1} We then had two guest speakers to talk about their work that used or built on Rosette: Mangpo Phothilimthana visited to talk about her Rosettebased work on synthesizing efficient GPU kernels, and James Bornholt visited to talk about his symbolic profiler for Rosette. We also read about Smten, which could be thought of as Rosette’s Haskellbased cousin. I think Smten is really cool, although it hasn’t taken off as a platform for solveraided programming in the way that Rosette has (though G2Q is related followon work).
 There are other ways of augmenting a programming language with a SMT solver, aside from “use the solver to implement a runtime symbolic execution strategy”! We read about liquid types, an SMTpowered decidable refinement type system, and its realization in Liquid Haskell, which required reckoning with laziness; these tools use the solver entirely at compile time, in contrast to Rosette or Smten. We read about Dafny, a verification language that compiles both to executable code and (by way of Boogie) to verification conditions intended to be used as input to a solver. And we read about Kaplan, an extension to Scala that enables SMTpowered constraint programming, and that can make use of the solver both at run time and at compile time.
 We read two papers on custom solver theories: the aforementioned Reluplex paper, which I’ve written about before, and “SMT solving for the theory of ordering constraints” by Cunjing Ge et al. In this second paper, the authors observed that Z3’s integer difference logic was more expressive a theory than they needed for their domain of SMTpowered data race detection. A less expressive theory not only did everything they needed, but had an asymptotically more efficient decision procedure. I like this paper because, like Reluplex, it’s about a situation where a custom solver theory was called for, but unlike Reluplex, it’s not about adding stuff to an existing theory, but rather, taking stuff away.
 Finally, we read about solverpowered verification tools for a variety of domains: web page layout with Cassius, OS system calls with Hyperkernel, and my personal favorite, distributed data consistency, with Indigo and Quelea. (The Quelea paper was actually the only paper in the intersection of this course and my previous special topics course. I might really like the Quelea paper, y’all.)
Course projects
I asked every student to do an independent project as part of the course. This was different from my previous grad seminar, where, in lieu of traditional course projects, I had asked students to contribute to a public course blog. The blog had been a success, but largely because I poured an immense amount of effort into editing it, and I didn’t think I would be able to handle that level of effort again, especially not now that I was teaching a topic that was further from my area of expertise. So we stuck with the typical course project approach. I’ve seen many an overambitious course project crash and burn, so I told students to aim low and spend no more than 40 hours on their project over the course of the quarter.
At the halfway point of the course, I asked students to turn in a twopage project proposal, including an overview, background, related work, the approach, the expected outcomes and deliverables of the project, and a timeline. None of this was legally binding, and some projects changed drastically between this point and the end of the course, which was to be expected. For the final project report, I asked for 812 pages. For projects that involved developing some sort of software artifact, which was most of them, I asked the students to turn that in, too. Finally, each student gave a short inclass presentation about their project during the last week of class.
The students carried out a nice variety of projects, reflecting the diversity of their interests:
 A liquid type checker implemented using Rosette.
 A solver for the classic task scheduling problem of “which courses should I take to get a degree, and when?”, also implemented using Rosette.
 A largescale graph data analysis, carried out with the help of the Z3 Python API.
 Spelunking the source code of an opensourcebututterlyundocumented solveraided system, Indigo, getting it running, and commenting it extensively.
 A comparison of two solveraided tools for checking the equivalence of SQL queries, Cosette and EQUITAS, and an experimental evaluation of Cosette’s capabilities.
 A survey of nonlinear arithmetic solvers and their applications to hybrid systems.
 A survey of applications of SAT and SMT solving to software security. (Two students independently decided to do this.)
Some students struggled with carrying out an independent project, while others were almost too independent: their project ended up being only quite loosely connected to the material we had actually studied. The most successful of the projects were those that took something we had looked at in the course as a jumpingoff point and then explored something new. Personally, I thought the Rosettebased projects stood out in this regard, which I think says something about the power of Rosette as a platform for creating solveraided tools!
Overall reflections
I was happy with my choice of Kroening and Strichman as a textbook. That said, when I got to conflictdriven clause learning (CDCL), the algorithm at the guts of modern SAT (and hence SMT) solvers, I went looking around for alternative reading material. Chapter 4 of the Handbook of Satisfiability is a standard reference that I found useful, and the aforementioned slides from Işıl and Emina’s courses were helpful, too.
In the end, though, I decided to teach CDCL in a slightly different way from the sources I’d seen. I first presented a solver that only does Boolean constraint propagation (BCP), and hence can only solve an “easy” subset of SAT problems. In particular, it cannot solve problems that require making guesses and then backtracking if those guesses turn out to wrong. Then I extended that solver to a CDCL solver in the next lecture, with support for guessing and backtracking. We talked about clause learning (that is, adding “conflict clauses” to the formula you’re trying to solve), but we glossed over the details of conflict analysis. It helped that the Heule and Kullman article that we started the course with had already discussed clause learning at a high level, too, so we weren’t going into it cold.
I only briefly mentioned CDCL’s predecessor, DPLL, and I never showed a complete DPLL solver. DPLL is simpler than CDCL: it also uses BCP, but it doesn’t do clause learning. Clause learning is conceptually not so hard (learn from your past bad decisions, so you don’t make them again!), but the details are tricky. So, for students thought it would be helpful to learn about DPLL in isolation before moving on to CDCL, I also pointed them to this “SAT Solver Primer” by David Mitchell, recommended to me by Chungchieh Shan, that takes that approach. In fact, Mitchell starts with something that’s even simpler than DPLL: a bruteforce algorithm that doesn’t do any BCP at all. Instead, it checks every possible assignment to see if it is a satisfying assignment. This approach can solve all SAT problems if you’re willing to wait long enough, but is intractably slow. He then moves on to DPLL, then CDCL.
What I did in my lectures was kind of the opposite. I asked students to first consider an algorithm that only does BCP. But that algorithm – which I called the “elementary school” solver – only works on formulas that can be solved without guessing, which to me are the ones that are reminiscent of elementaryschool logic puzzles. Starting with BCP alone made sense to me, because BCP seems more like what humans do intuitively when we’re, say, solving sudoku puzzles. I later gave a talk called “Reasoning Under Uncertainty in SMT Solving, Research, and Life” at PLMW @ POPL 2020 that takes this same pedagogical tack.
As for the rest of the course, the reading list was kind of all over the map, and we only had time for a superficial discussion of many of the readings. My hope is that, even if students didn’t take in all the gory details of the individual papers, they began to map out the connections between them. I can say that for me personally, the most exciting new connection I made was identifying two ways of thinking about symbolic execution, which I’ll call “the KLEE way” and “the Rosette way”. KLEE is “an operating system for symbolic processes” and a symbolic interpreter for LLVM instructions, and Rosette is a “symbolic virtual machine”. Those things sure sound like they ought to be related – and they are! But KLEE uses symbolic execution as a tool for finding bugs in programs that are normally executed concretely, whereas in Rosetteland, symbolic execution is the normal, default execution strategy for programs. I think there’s a shift in mindset that takes place when one starts writing programs that are actually supposed to be executed symbolically all the time, not just for debugging. Once you make that shift, it necessitates all kinds of helpful programming tools, like James’s symbolic profiler, or a symbolic type system! This is all very recent work, and I’m sure there’s a lot more interesting work to do in this vein.
Symbolic execution, by the way, is an old idea that long predates the rise of SMT solvers. This was also true of some of the other applications for SMT solving that we explored in the course – refinement types, for instance. SMT solving is merely an implementation strategy for these ideas, and not necessarily the only implementation strategy. But once you are using SMT as an implementation strategy, it seems to open up really interesting new possibilities, and I think that makes SMT worth studying purely for its own sake, too!

Once, years ago, I asked Ras Bodik if one should think of Rosette as a solveraided DSL for building solveraided DSLs. He said, “Yes, but we don’t describe it that way, because people’s heads would explode.” ↩
Comments