My work draws on the research traditions of programming languages, distributed systems, and software verification. I work on programming-language-based approaches to building concurrent and distributed software systems that are correct and efficient. The unifying principle of my work is to use high-level abstractions to express software systems in a way that not only avoids compromising performance, but actually enables it.

This page summarizes some of what I’ve worked on over the years, although the most recent stuff is inevitably missing. As of early 2023, my research group has been investigating library-level choreographic programming, surveying bugs in serverless runtime systems, and using solver-aided tools to build trustworthy distributed systems.

Using solver-aided languages and tools to build reliable distributed systems

To guard against machine failures and keep data close to clients who need it, modern internet services store multiple replicas of the same application data within and across data centers. Replication introduces the problem of keeping geo-distributed copies of data consistent with one another in the face of network partitions and unpredictable message latency. To avoid costly and conservative synchronization protocols, many real-world systems provide only weak consistency guarantees (e.g., causal consistency or PRAM consistency), which permit certain kinds of disagreement among replicas. There has been much recent interest in programming-language support for specifying and verifying such consistency properties. Although these properties are typically beyond the scope of what traditional type checkers or compiler analyses can guarantee, solver-aided programming languages – languages with integrated SAT or SMT solving capabilities, applied at compile time, at run time, or both – are up to the task.
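
To make the “solver-aided” idea concrete, here is a minimal Liquid Haskell sketch (purely illustrative, not taken from any of the work described below): the refinement types are checked at compile time by handing verification conditions to an SMT solver.

```haskell
-- Refinement type alias: the Ints that are strictly positive.
{-@ type Pos = {v:Int | v > 0} @-}

-- Liquid Haskell asks the SMT solver to prove that adding 1 to a
-- positive Int yields a positive Int; an ordinary type checker cannot
-- express this property, but the solver discharges it automatically.
{-@ incr :: Pos -> Pos @-}
incr :: Int -> Int
incr x = x + 1
```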

In our OOPSLA ‘20 paper, my collaborators and I used the Liquid Haskell solver-aided language to prove that replicated data structures are strongly convergent – the property that replicas that have applied the same set of update operations, in any order, have equivalent state. We were able to state and prove the strong convergence property at the level of a generic interface in Liquid Haskell, and then plug in a data-structure-specific commutativity proof for each data structure we wanted to verify. Carrying out the proof in this modular fashion required extending Liquid Haskell with the ability to state and prove properties at the typeclass level, a feature with many applications beyond our work.
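
The sketch below (plain Haskell, with invented names rather than the paper’s actual interface) shows the shape of such a generic interface: each replicated data type supplies an apply function for its update operations, together with an obligation that any two operations commute. In the real development the commutativity law is a refinement-typed typeclass method whose body is a machine-checked proof; here it is stated as an ordinary Bool-valued method for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A generic interface for replicated data structures (names are hypothetical).
class Replicated s where
  type Op s
  apply :: s -> Op s -> s

  -- Law: applying two operations in either order yields the same state.
  -- In Liquid Haskell this would be a proof obligation checked per instance.
  commutes :: s -> Op s -> Op s -> Bool

-- A toy instance: a grow-only counter whose only operation is "add n".
newtype GCounter = GCounter Int deriving (Eq, Show)

instance Replicated GCounter where
  type Op GCounter = Int
  apply (GCounter c) n = GCounter (c + n)
  commutes st op1 op2 =
    apply (apply st op1) op2 == apply (apply st op2) op1  -- addition commutes
```

Strong convergence then follows at the level of the interface: replicas that apply the same set of operations, in any order, end up in equal states.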

More broadly, in our SNAPL ‘19 paper, my colleague Peter Alvaro and I make a case for domain-specific SMT-based tools that exploit the mathematical foundations of distributed consistency. Such consistency-aware solver-aided tools would support the rapid implementation of formally verified programming abstractions that guarantee distributed consistency. In the long run, we aim to democratize the development of such domain-specific solvers by creating a framework that brings the implementation of new theory solvers within the reach of programmers who are not experts in SMT solver internals.

Verification for safety-critical neural networks

In 2017, my group at Intel Labs began sponsoring and collaborating with a team of researchers at Stanford and HUJI who are extending the capabilities of automated verification tools, such as SMT solvers, to formally verify properties of deep neural networks used in safety-critical systems. One of the findings from this work was that in order to scale up SMT-based verification to realistic networks, we need to develop domain-specific solvers that incorporate domain knowledge about neural network internals. The Reluplex solver developed by our collaborators exemplifies this idea of exploiting high-level domain-specific abstractions for efficiency. In the longer term, I’m interested in creating a framework that makes it easy to develop such high-performance domain-specific solvers, analogously to how, for instance, the Delite framework aimed to make it easy to develop high-performance DSLs.

For more on this line of work, aside from the posts in the “neural network verification” category on this blog, see our short paper that appeared at SysML ‘18 and the accompanying poster.

Parallelizing Julia with a non-invasive DSL

Computational scientists often prototype their software using a “productivity” language such as MATLAB, Python, R, or Julia. When it comes time to scale up such prototypes to run on a high-performance computing system, though, the code typically has to be rewritten in an “efficiency” language like C++. This rewriting costs considerable time and money and often results in brittle code that is harder to experiment with. Domain-specific languages (DSLs) are one well-regarded solution to the efficiency/productivity trade-off, but it is difficult to design a DSL that anticipates every use case the programmer will eventually encounter.

With my colleagues at Intel Labs, I’m working on ParallelAccelerator, a compiler for the Julia programming language that attempts to bridge this productivity-efficiency gap in scientific computing. ParallelAccelerator discovers and exploits the implicit parallelism in source programs that use parallel programming patterns. For the most part, these patterns are already present in standard Julia, so programmers can use ParallelAccelerator to compile Julia programs with only minor changes to source code. Our results show that ParallelAccelerator enables orders-of-magnitude speedups over plain Julia on a variety of scientific computing workloads, without requiring the programmer to extensively rewrite code or add lots of type annotations to guide the compiler.

For more on ParallelAccelerator, aside from the posts in the “ParallelAccelerator” category on this blog, see:

LVars: lattice-based data structures for deterministic parallel and distributed programming

Parallel programming is notoriously difficult. Programs can yield inconsistent answers, or even crash, due to unpredictable interactions between parallel tasks. But it doesn’t have to be this way: deterministic-by-construction programming models offer programmers the promise of freedom from subtle, hard-to-reproduce nondeterministic bugs in parallel code. To make that promise a reality, though, deterministic-by-construction parallel programming models must sharply restrict sharing of state between parallel tasks, usually either by disallowing sharing entirely or by restricting it to one type of data structure, such as single-assignment locations.
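
As a concrete example of the single-assignment style, here is a small hypothetical snippet using Haskell’s monad-par library (which comes up again below): each IVar is written exactly once, and a get blocks until the corresponding put has happened, so the program’s result is the same under every schedule.

```haskell
import Control.Monad.Par (runPar, new, put, get, fork)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Two forked tasks each fill in their own single-assignment IVar;
-- 'get' blocks until the IVar is full, so the sum below is deterministic.
sumOfFibs :: Integer
sumOfFibs = runPar $ do
  a <- new
  b <- new
  fork (put a (fib 20))
  fork (put b (fib 21))
  x <- get a
  y <- get b
  return (x + y)

main :: IO ()
main = print sumOfFibs
```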

A common theme that emerges in the study of such deterministic-by-construction programming models – from venerable ones like Kahn process networks, to modern single-assignment-based models like Haskell’s monad-par library and Intel’s Concurrent Collections system – is that the determinism of the system hinges on some notion of monotonicity. By taking monotonicity as a starting point, then, we can generalize and unify existing models. For my dissertation research with Ryan Newton, I developed LVars, a new model for deterministic-by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a specified lattice. LVars maintain determinism by allowing only monotonic writes and “threshold” reads to and from shared data.
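
Here is a toy model of the idea (deliberately not the real LVish library; just a sketch over a single lattice, the natural numbers ordered by <=, built on STM): writes take the join of the old and new states, and a threshold read blocks until the state reaches a given threshold, revealing nothing more than that the threshold has been passed.

```haskell
import Control.Concurrent.STM

-- An "LVar" holding a natural number, ordered by <=; the join is 'max'.
type NatLVar = TVar Int

newLVar :: IO NatLVar
newLVar = newTVarIO 0

-- Monotonic write: the state can only grow with respect to the lattice order.
putLVar :: NatLVar -> Int -> IO ()
putLVar lv n = atomically (modifyTVar' lv (max n))

-- Threshold read: block (via 'retry') until the state is at least 't'.
-- The caller learns only that the threshold was reached, not the exact state,
-- which is what keeps observations deterministic under arbitrary schedules.
getThreshold :: NatLVar -> Int -> IO ()
getThreshold lv t = atomically $ do
  s <- readTVar lv
  if s >= t then return () else retry
```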

Finally, LVar-style threshold reads also carry over to the setting of convergent replicated data types (CvRDTs), which specify the behavior of eventually consistent replicated objects in a distributed system. In recent work, I’ve extended the CvRDT model to support deterministic, strongly consistent threshold queries. The technique generalizes to any lattice, and hence any CvRDT, and allows deterministic observations to be made of replicated objects before the replicas’ states converge.

For more on LVars, aside from the posts in the “LVars” category on this blog, see:

River Trail: parallel programming in JavaScript

In 2014, I joined Intel Labs and began contributing to River Trail, a library, JIT compiler, and Firefox extension that together enable data-parallel programming in JavaScript. The River Trail library provides a ParallelArray array type and a variety of potentially parallel operations on it, like map, reduce, and filter; it works in conjunction with the Firefox extension, which enables programs written using the library to run in parallel, via OpenCL. Together with my colleagues at Intel, I rewrote the River Trail Firefox extension in JavaScript, replacing the old binary XPCOM extension. This was my first major JavaScript project, and I learned a lot about browser extensions.

For more on River Trail, see the tutorial, as well as the OOPSLA 2013 paper by Stephan Herhut, Richard Hudson, Tatiana Shpeisman, and Jaswanth Sreeram.

Various contributions to the Rust programming language

Rust is a new systems programming language being developed by Mozilla Research with the goals of safety, concurrency, and speed. During the summers of 2011 and 2012, I worked on the Rust team at Mozilla, making various contributions to the self-hosted compiler. Rust remains under active development by a team of paid and volunteer contributors; follow the progress on GitHub!

During summer 2011, I added self-dispatch, object extension, and method overriding to the prototype-based object system that Rust had at the time. Prototype-based object systems are usually found only in dynamic languages such as Self and JavaScript, and adding these features to Rust posed an unusual challenge, since dynamic object extension and method overriding cause the type of the current object to change at run time. I gave a talk about this project to my research group in September 2011. (Although the prototype-based object system was scrapped soon afterward, a vestige of my work lives on: the self keyword in Rust!)

During summer 2012, I made various contributions to Rust, including integer-literal suffix inference and some work on extending the Rust trait system to support Haskell-like default methods. I gave a talk about the latter project at Mozilla in August 2012. (When I started this work, traits were still called ifaces; I changed the keyword to trait!)
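
For readers unfamiliar with the Haskell feature being mirrored, here is a minimal example of a typeclass default method (illustrative only, and it says nothing about how the Rust side was implemented): an instance can inherit the default definition or override it.

```haskell
-- A typeclass with one required method and one default method.
class Shape a where
  area :: a -> Double

  describe :: a -> String
  describe x = "a shape with area " ++ show (area x)  -- default method

newtype Circle = Circle Double

instance Shape Circle where
  area (Circle r) = pi * r * r
  -- 'describe' is inherited from the default definition.

main :: IO ()
main = putStrLn (describe (Circle 1.0))
```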

Parametric polymorphism through run-time sealing

Languages with parametric polymorphism provide the compile-time guarantee that programs behave uniformly regardless of the types they are instantiated with. In such languages, this parametricity guarantee forms the basis of data abstraction and makes possible Wadler’s “theorems for free!” Ordinarily, embedding untyped code fragments in a statically typed language would break parametricity. However, it’s possible to preserve parametricity in such a setting by the use of dynamic sealing, which enforces data abstraction by using uniquely generated keys to seal values that should be kept opaque.
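
As a small, hypothetical Haskell illustration of what parametricity buys (not connected to the proof described below): any function of type [a] -> [a] cannot inspect its elements, so it automatically commutes with map, one of Wadler’s free theorems.

```haskell
-- 'reorder' is one arbitrary inhabitant of the type [a] -> [a].
reorder :: [a] -> [a]
reorder = reverse . take 3

-- Free theorem for the type [a] -> [a]:
--   map f . reorder == reorder . map f    (for every f)
freeTheoremHolds :: Bool
freeTheoremHolds =
  map (* 2) (reorder [1 .. 10 :: Int]) == reorder (map (* 2) [1 .. 10])

main :: IO ()
main = print freeTheoremHolds  -- prints True
```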

I worked with Amal Ahmed and Jacob Matthews on an update to their previous work on proving parametricity for a multi-language system that combines System F with a Scheme-like language. Our updated proof employs a cross-language, step-indexed logical relation that uses possible worlds to model the semantics of key generation. Using this possible-worlds semantic model, we can show two results: first, that System F’s parametricity guarantee is preserved when interoperating with Scheme, and, second, that parametricity is preserved in a Scheme-only setting that interacts with System F only to implement a polymorphic contract system. I gave a talk about this project at the Northeastern University Programming Languages Seminar in February 2011.

Testing abstract representations of CPU semantics

Junghee Lim and Thomas Reps’ Transformer Specification Language (TSL) system comprises a domain-specific programming language for specifying the semantics of a processor at the level of the instruction set architecture, together with a run-time system for generating static analysis engines for executables compiled for that instruction set. TSL’s ability to generate static analysis engines from ISA specifications makes it a powerful tool for analyzing executables, but it’s of limited value if the ISA specifications are wrong in the first place. We are therefore interested in verifying the correctness of TSL’s ISA specifications.

During summer 2010, I worked with David Melski and the research team at GrammaTech to implement a prototype system for testing TSL specifications. Given a CPU emulator generated from a TSL spec, our system attaches the emulator to a Linux process running on a physical CPU, initializes the emulator’s state from that of the process, and compares the states of the emulated and physical CPUs after the execution of each instruction. If the states differ, that suggests a possible bug in the specification from which the emulator was generated. Our emulator-testing framework is flexible enough that it could be extended to support testing of abstract emulators that have an abstract interpreter at their core, in addition to the concrete CPU emulators it supports now. This work was done as part of the Safety in Numbers research project and appears in the project report; I also gave a talk about my work on the project to my research group in October 2010.

A pattern matcher for miniKanren

miniKanren is a declarative logic programming system implemented in a pure functional subset of Scheme. In 2009, I worked with Andy Keep, Michael Adams, Will Byrd, and Dan Friedman on implementing a pattern matcher for miniKanren. Our self-imposed challenge was to implement the pattern matcher using Scheme’s syntax-rules macro system, as opposed to a more powerful macro system such as syntax-case. Our pattern matcher implementation uses CPS macros to control the order in which macro expansion takes place, delaying expansion of a macro that generates variable bindings until after other macro expansions have happened. The ability to control the order in which our macros expand allows us to generate more concise code. However, we discovered a situation in which delayed expansion caused incorrect code to be generated, leaving variables unbound that should have been bound. The problem can be worked around either by using syntax-case or by using syntax-rules while relying less heavily on CPS. Our paper on this project appeared at the 2009 Scheme Workshop, giving me an upper bound of 4 on my Erdős number.