Write an interpreter: variables
This post is the second in a series. Last time, we started coding up a toy interpreter for a language of arithmetic expressions. This time, we’ll extend the language to include variables and update our interpreter to handle that extended language.
A slightly less tiny grammar of expressions
Our interpreter only knows how to deal with expressions: things of type
Expr. To be able to interpret a bigger language, we’ll need to expand the interpreter’s notion of what an
Let’s add a new variant to our
data Expr = Number Int | Var String | Plus Expr Expr | Minus Expr Expr | Times Expr Expr
This is what we had before, but with one new variant added:
Var is a value constructor for variables, just as
Number is a value constructor for integers. A convenient way to represent a variable is as a string, such as
Var will take a
Var added to the definition of
Expr, we can write down some
Exprs with variables in them:
*Interp> :t Var "x" Var "x" :: Expr *Interp> :t Plus (Var "x") (Number 5) Plus (Var "x") (Number 5) :: Expr *Interp> :t Plus (Var "x") (Var "y") Plus (Var "x") (Var "y") :: Expr
GHCi reports that these expressions are indeed
Exprs. Now we just have to teach our interpreter how to deal with them.
When given an expression, the job of an interpreter is to return that expression’s value. This is easy enough for the interpreter to accomplish when we’re working with expressions like
(3 + 4) or
((203 + (6 * (34 - 1))) + 7000). But what is the value of an expression containing a variable, like
(x + 5)?
To interpret expressions that contain variables, merely having the expressions isn’t enough. But if we could get our hands on some extra information that would tell us that, say, the value of
42, then we would know how to interpret
(x + 5), or any other expression containing
x. This extra information is called an environment.
A simple way to represent an environment might be a list of pairs. For instance,
[("x", 42), ("y", 5)] could represent the environment in which
x’s value is
y’s value is
5. Instead of using a list of pairs, we’ll use Haskell’s
Map data structure from the
Data.Map library, which comes with many convenient operations – but it doesn’t really matter what data structure we use, as long as it lets us associate variables with values.
Map, let’s define our own
Env type for environments. Since we want to map variables to values, or in our case,
Ints, we’ll instantiate the
Map type with
import qualified Data.Map as Map type Env = Map.Map String Int
Env type defined, we can define a function
lookupEnv that will let us look things up in an environment.
lookupEnv is just a thin wrapper around the
lookup function that
Data.Map already provides.
lookupEnv :: Env -> String -> Int lookupEnv env x = case Map.lookup x env of Just n -> n Nothing -> error "Unbound variable!"
lookupEnv operation takes two arguments:
env, the environment, and
x, the variable we’re trying to look up. If the lookup succeeds,
lookupEnv returns the value it found; otherwise, it raises an exception.
We haven’t yet written anything that will let us add new bindings to an environment. For now, we can roll our own environments by calling the convenient
fromList, which takes an argument like
[("x", 42), ("y", 5)] and produces the corresponding
*Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "x" 42 *Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "y" 5 *Interp> lookupEnv (Map.fromList [("x", 42), ("y", 5)]) "z" *** Exception: Unbound variable!
Now that we have environments and a way to look things up in them, we’re ready to add support for variables to
First, we’ll tweak the type of
interp. Before, it simply took an
Expr and returned an
Int. Now that we have to be able to interpret variables, we’ll need to have an environment handy to look up their values in. We can pass one in as a second argument to
interp :: Expr -> Env -> Int
Next, in the definition of
interp, we’ll add a new case to handle
Var expressions. When we encounter a
Var, we have to determine its value. Fortunately, this is easy: we just call the
lookupEnv function we defined, passing in the environment and the variable as its arguments.
None of the other cases of
interp will need to make use of the environment at all. But since
interp now takes an environment argument, each of the recursive calls in the
Times cases will have to pass the environment along.
interp e env = case e of Var x -> lookupEnv env x Number n -> n Plus e1 e2 -> (interp e1 env) + (interp e2 env) Minus e1 e2 -> (interp e1 env) - (interp e2 env) Times e1 e2 -> (interp e1 env) * (interp e2 env)
We now have what’s known as an environment-passing interpreter.
We’re ready to interpret some expressions with variables. We’ll have to remember to give
interp two arguments: first, the
Expr to be interpreted, and second, the environment to interpret them in. How will we get an environment to pass to
interp? We can do something like
Map.fromList [("x", 42), ("y", 5)], or, if we just want an empty environment, we can call
Map.empty. Let’s define a quick shorthand for empty environments:
emptyEnv :: Env emptyEnv = Map.empty
Now we’re ready to call
interp. What happens if we try to interpret an
Expr containing variables, but we only give it
*Interp> interp (Plus (Var "x") (Number 5)) emptyEnv *** Exception: Unbound variable!
The interpreter correctly reports that
emptyEnv isn’t enough information to interpret
Var "x". Nevertheless,
emptyEnv is handy for when we just want to interpret expressions without variables.
*Interp> interp (Times (Number 2) (Number 0)) emptyEnv 0
But, when we do have variables to interpret, we need to give
interp an environment that binds them to values.
interp (Plus (Var "x") (Number 5)) (Map.fromList [("x", 42)]) 47
Hooray! Our language now supports variables. Here’s the complete interpreter:
import qualified Data.Map as Map data Expr = Number Int | Var String | Plus Expr Expr | Minus Expr Expr | Times Expr Expr type Env = Map.Map String Int emptyEnv :: Env emptyEnv = Map.empty lookupEnv :: Env -> String -> Int lookupEnv env x = case Map.lookup x env of Just n -> n Nothing -> error "Unbound variable!" interp :: Expr -> Env -> Int interp e env = case e of Var x -> lookupEnv env x Number n -> n Plus e1 e2 -> (interp e1 env) + (interp e2 env) Minus e1 e2 -> (interp e1 env) - (interp e2 env) Times e1 e2 -> (interp e1 env) * (interp e2 env)
Since we can now interpret expressions containing variables, it would be convenient to be able to lex and parse them as well, so that we can write, say,
"(10 + (x + y))" instead of
Plus (Number 10) (Plus (Var "x") (Var "y")).
Last time, we generated a parser for our language from a Happy grammar file. Adding support for variables requires only a couple of tweaks to that file. Happily, we can deal with lexing and parsing variables in a way analogous to how we’re already dealing with integers. The resulting Happy grammar file is a little longer, but no more complicated.
*Main> lexer "(10 + (x + y))" [TokenOB,TokenInt 10,TokenPlus,TokenOB,TokenVar "x",TokenPlus,TokenVar "y",TokenCB,TokenCB] *Main> (parser . lexer) "(10 + (x + y))" Plus (Number 10) (Plus (Var "x") (Var "y")) *Main> import qualified Data.Map as Map *Main Map> (interp . parser . lexer) "(10 + (x + y))" (Map.fromList [("x", 42), ("y", 5)]) 57
Next time: functions
Now that we have support for variables in our language, we’re on our way to being able to write functions that take arguments. In the next post in this series, we’ll extend our interpreter to be able to handle functions and function calls.
All of the code from this post, plus a little additional scaffolding, is on GitHub. Here’s a diff of everything that changed from last time.