# Write an interpreter

I spent last week volunteering as a resident at Hacker School, where I helped a number of students write small interpreters. For the most part, the students I worked with weren’t exactly beginning programmers. One of them, for instance, was a robotics Ph.D. student who had written a trajectory follower that helped a robot autonomously navigate through the desert.

Clearly, it’s possible to get quite far in a programming career without having written an interpreter – or, at least, without having consciously done so. So, why does it matter? Why should you write an interpreter?

To me, one reason is that an interpreter is the quintessential program that operates on programs. To be comfortable with interpreters is to be comfortable with the idea of *code as data*, a powerful and ubiquitous idea. Programs that operate on programs include things like interpreters and compilers, but also things like emulators and debuggers. Much of the programming world regards these kinds of programs as magical, but they aren’t.

So, if you’ve never written an interpreter, or if you’re not sure if you have, let’s write a tiny one, right now!

## A tiny language of arithmetic expressions

We’ll start by interpreting simple arithmetic expressions, like `(5 + 3)`

or `((4 - 2) * (5 + (3 + 9)))`

.

To *interpret* an expression means to *evaluate* it: to determine its value. For instance, the value of `(5 + 3)`

might be `8`

, and the value of `((4 - 2) * (5 + (3 + 9)))`

might be `34`

. So, our interpreter will be a program that takes an arithmetic expression as input and returns an integer as output.

Let’s write down a grammar for expressions we want to be able to interpret. For now, we’ll handle integers and a few two-argument operations on them: addition, subtraction, and multiplication.

```
Expr =:: int
| (Expr + Expr)
| (Expr - Expr)
| (Expr * Expr)
```

Because the grammar is defined recursively, it allows for nested expressions like `(1 + ((3 * 7) - 8))`

, as well as simple ones like `5`

and `(1000 * 3)`

.

If we use a language that allows us to define our own data types, we can turn our grammar into a data type for expressions. In Haskell, for instance, it might look like this:

```
data Expr = Number Int
| Plus Expr Expr
| Minus Expr Expr
| Times Expr Expr
```

Here, `Number`

, `Plus`

, `Minus`

, and `Times`

can be thought of as “tags” that we’ve put on each variant of `Expr`

so that we can tell them apart. These “tags”, though, are really *value constructor* functions that we can call to construct various values of type `Expr`

.

The `Plus`

value constructor takes two arguments, each of which must itself be of type `Expr`

; it’s the same for `Minus`

and `Times`

. The `Number`

value constructor, though, only expects an `Int`

, which is a type built in to Haskell.

With our `Expr`

type defined, we can construct a few `Expr`

s by calling the value constructors. We can use GHCi’s `:t`

command to see the type of each one, making sure that it is in fact an `Expr`

:

```
*Interp> :t Number 3
Number 3 :: Expr
*Interp> :t Plus (Number 3) (Number 6)
Plus (Number 3) (Number 6) :: Expr
*Interp> :t Plus (Number 3) (Times (Number 6) (Number 1))
Plus (Number 3) (Times (Number 6) (Number 1)) :: Expr
```

Do things like `3`

, `(3 + 6)`

, and `(3 + (6 * 1))`

count as `Expr`

s? Not quite. We’ve skipped a step, which is to write a *parser* that will convert a string like `"(3 + (6 * 1))"`

into its `Expr`

representation of `Plus (Number 3) (Times (Number 6) (Number 1))`

.

We’ll come back to parsing in a little while. For now, let’s concern ourselves with `Expr`

s rather than strings. We want to be able to interpret `Expr`

s like `Plus (Number 3) (Times (Number 6) (Number 1))`

, evaluating them to integer values like `9`

.

## The `interp`

function

We know that any combination of addition, subtraction, and multiplication operations on integers will result in another integer. (We conveniently left out division, which would mean dealing with non-integral values.) So, our interpreter can be a function that takes an expression and returns an integer. In Haskell, its signature would look like this:

```
interp :: Expr -> Int
```

Since there are four value constructors for `Expr`

s, the `interp`

function will have four cases to handle. The first case is easy: If `interp`

is passed a `Number`

, then all it has to do is return the `Int`

wrapped up inside.

```
interp :: Expr -> Int
interp e = case e of
Number n -> n
```

Trying out our code, we see that a `Number`

just evaluates to itself, represented as an `Int`

.

```
*Interp> interp (Number 4)
4
```

At this point, though, if we try to call `interp`

on any value that isn’t a `Number`

, we’re in trouble:

```
*Interp> interp (Plus (Number 4) (Number 5))
*** Exception: Interp.hs:(10,12)-(14,42): Non-exhaustive patterns in case
```

GHCi is complaining that we haven’t covered all the possibilities for `Expr`

s. So, let’s write the rest of the cases:

```
interp :: Expr -> Int
interp e = case e of
Number n -> n
Plus e1 e2 -> (interp e1) + (interp e2)
Minus e1 e2 -> (interp e1) - (interp e2)
Times e1 e2 -> (interp e1) * (interp e2)
```

Just as the `Plus`

, `Minus`

, and `Times`

value constructors in the `Expr`

type are defined recursively, so are the `Plus`

, `Minus`

, and `Times`

clauses of `interp`

. To interpret a `Plus`

expression, we interpret its two subexpressions `e1`

and `e2`

, then just add the results together using Haskell’s built-in `+`

operation. We do the analogous thing for `Minus`

and `Times`

.

`(Plus (Number 4) (Number 5))`

works fine now:

```
*Interp> interp (Plus (Number 4) (Number 5))
9
```

And fancier, nested expressions work too:

```
*Interp> interp (Plus (Number 3) (Times (Number 6) (Number 1)))
9
*Interp> interp (Plus (Number 3) (Times (Number 6) (Number (-3))))
-15
```

Hooray! We’ve written a tiny, ten-line interpreter. Here’s the whole thing:

```
data Expr = Number Int
| Plus Expr Expr
| Minus Expr Expr
| Times Expr Expr
interp :: Expr -> Int
interp e = case e of
Number n -> n
Plus e1 e2 -> (interp e1) + (interp e2)
Minus e1 e2 -> (interp e1) - (interp e2)
Times e1 e2 -> (interp e1) * (interp e2)
```

## Parsing

It’s cumbersome to have to write long `Expr`

s like `(Plus (Number 3) (Times (Number 6) (Number 1)))`

to pass to `interp`

. Being able to write the string `"(3 + (6 * 1))"`

would be much more convenient. This is where parsing comes in. The job of a parser is to turn strings of *concrete syntax* like `"(3 + (6 * 1))"`

into *abstract syntax* representations like `(Plus (Number 3) (Times (Number 6) (Number 1)))`

.

Some people enjoy writing parsers. Others find it unbearably tedious. Fortunately, if you belong to the latter group, *parser generators* help automate the work. We can use Happy, a parser generator for Haskell, to generate a parser for our tiny language.

In a Happy grammar file of a few dozen lines, we provide a grammar for our language, plus a little supporting code. The Happy documentation explains how to go about creating such a file. (In fact, I had never used a parser generator before writing this post, but I didn’t have any trouble creating the Happy grammar file. It was largely a matter of paring down the provided example in the Happy docs to the smaller grammar we’re using here!)

Happy will take this grammar file and generate a Haskell module that provides `lexer`

and `parser`

functions. The `lexer`

function turns a string into a list of *tokens*:

```
*Main> lexer "(3 + (6 * 1))"
[TokenOB,TokenInt 3,TokenPlus,TokenOB,TokenInt 6,TokenTimes,TokenInt 1,TokenCB,TokenCB]
```

And `parser`

turns this list of tokens into an `Expr`

:

```
*Main> (parser . lexer) "(3 + (6 * 1))"
Plus (Number 3) (Times (Number 6) (Number 1))
```

Finally, once we have an `Expr`

, we can pass it on to our interpreter, `interp`

:

```
*Main> (interp . parser . lexer) "(3 + (6 * 1))"
9
```

We intentionally kept the grammar for our language excruciatingly simple in order to make it as easy as possible to create the Happy grammar file. This leads to some annoyances, like the fact that we have to parenthesize all operations. For instance, `"(3 + 4)"`

parses, but `"3 + 4"`

doesn’t. The parser doesn’t handle negative integers well, either.

```
*Main> (parser . lexer) "(3 + 4)"
Plus (Number 3) (Number 4)
*Main> (parser . lexer) "3 + 4"
*** Exception: Parse error
*Main> (parser . lexer) "3 + (-4)"
*** Exception: Parse error
```

We could fix these bugs at the cost of complicating the grammar. Nothing about the interpreter itself would need to change, though; `"(3 + 4)"`

and `"3 + 4"`

both ought to parse into `Plus (Number 3) (Number 4)`

, and negative integers present no difficulty for `interp`

. The only hitch is in turning them into `Expr`

s in the first place.

## Next time: variables

So far, we’ve managed to interpret a tiny, severely restricted language of arithmetic expressions. It would be nice to eventually grow our language into some semblance of a real programming language. To do that, sooner or later we’ll need a notion of variables. Luckily, it’s quite straightforward to add support for variables to the interpreter and parser we’ve got now. I’ll cover variables in my next post.

*The code from this post, plus a little additional scaffolding, is on GitHub.*

## Comments