Update (February 2017): I’ve written a follow-up to this post!


A few days ago, my friend Stephen Tu pointed me in the direction of “Java Generics are Turing Complete”, a “short note” that Radu Grigore recently posted on arXiv. In it, Grigore shows that in Java, subtyping is undecidable (his “Theorem 1”) and that therefore, type checking is undecidable. He then remarks:

For Java, Theorem 1 implies that there cannot exist a formally verified type checker: All Java type checkers have bugs.

Stephen wanted to know what exactly the claim of “all Java type checkers have bugs” meant. To understand what this means, let’s take a closer look at Grigore’s result.

If a programming language’s type system is Turing-complete, that means you can write programs that express arbitrary computations entirely within types. A Turing-complete type system makes type checking undecidable. This is not so unusual: Grigore points out right off the bat that Haskell, C++, and Scala have all turned out to have Turing-complete type systems, and that therefore, type checking for those languages is undecidable too.

Grigore formally models a fragment of Java’s type system involving type variables and classes. This small part of the type system is all that’s necessary to show undecidability of type checking. He defines a Turing machine called a “subtyping machine” that will take two types t and t' and a class table as input and will halt if and only if t is a subtype of t' according to the class table. He then shows how to construct inputs t and t' and a class table that will make the subtyping machine halt if and only if a given Turing machine halts on a given input.1

The details of Grigore’s construction are a bit hairy, and I don’t understand all of them, but the point is that because the halting problem is undecidable, it’s also undecidable whether the subtyping machine will halt. Therefore, it’s undecidable whether t is a subtype of t'. So the subtyping decision problem is undecidable, and hence, so is the type checking decision problem.

Having satisfied ourselves that Java type checking is undecidable, then, what does Grigore mean when he says that “all Java type checkers have bugs”?

One point of view goes like this: A bug is a violation of a specification. If the specification for the program “Java type checker” says that that program is always supposed to produce a correct yes-or-no answer, then, indeed, the undecidability of type checking would mean that “all Java type checkers have bugs”. However, I find it pretty hard to fault a type checker for not being able to satisfy a mathematically unsatisfiable spec. If it is impossible to write a type checker that will produce a correct yes-or-no answer for every input it’s given (as Grigore’s undecidability result implies), then the fault is really with the spec, not the type checkers that inevitably fail to satisfy it. Of course, in program verification, the point is to show whether or not a program satisfies a spec, not to philosophize about whether the spec is “correct”.

Another possibility is that Grigore considers nontermination of a type checker to always be a bug, regardless of what any specification might say. Grigore discusses a scenario in which subtype checking would need to happen at runtime in order to implement instanceof; this is not the case today, but it might be the case in a future version of Java, if the language adopts reified generics. In that case, nontermination might be construed as a security problem: a malicious caller of a function that uses instanceof might be able to construct an input that causes the function call not to terminate, causing a sort of denial of service.

However, as Grigore also points out, such a situation can be avoided easily enough with a timeout. Indeed, for Haskell, the GHC compiler gets around the possibility of nonterminating type checking with a fixed reduction depth, which is currently set to 200 by default. Furthermore, with GHC, it’s only possible to write a nonterminating type if you’re using the UndecidableInstances language extension. As a comment in the guts of the GHC typechecker rather opinionatedly points out:

For civilised (decidable) instance declarations, each increase of depth removes a type constructor from the type, so the depth never gets big; i.e. is bounded by the structural depth of the type.

So, for a program where typechecking is decidable, there will always be some finite bound above which the counter will not go. However, that bound could be somewhere north of 200, so, if the programmer is sure that type checking will terminate (that is, if she isn’t using UndecidableInstances), she can feel free to disable the check by setting -freduction-depth=0.

To summarize, “all Java type checkers have bugs” feels like a bit of a stretch for me. I have sympathy for the idea that nontermination of type checking is always a bug, but it’s the sort of bug that can be easily (albeit uglily) gotten around with a timeout or a fixed depth limit. All that said, I think Grigore’s undecidability result is really cool.

Thanks to various people who discussed aspects of this post with me, including Gabriel Scherer, Billy Bowman, Wren Romano, Alan O’Donnell, Greg Price, and Stephen Tu. They may or may not agree with my conclusions.

  1. The two-step process doesn’t seem to be essential to the proof; Grigore says, “There is nothing deep about subtyping machines: they are introduced simply because the new notation will make it much easier to notice certain patterns.” 

Comments