Recently I’ve started reading the book by Andrew W. Appel titled Modern compiler implementation in ML. I’ve picked it up because I’ve heard some really good reviews about the ML version of it. Also there are other editions available that use C and Java. Each chapter covers a single phase of the compilation process and comes with some initial ML code and programming excercies. At the end you will have a working optimizing compiler.
I’m building a compiler for the first time and it is a lot of fun. In these blog post series, I’ll be working through the book and explaining my solutions to each chapter.
Getting started Link to heading
The author describes Tiger as a simple (but nontrivial) language of the Algol family, which could be easily modified to be functional or object-oriented (maybe I’ll skip the OO-part of the book).
Here is an example program in Tiger
type tree = {
key: string,
left: tree,
right: tree
}
function prettyprint(tree: tree) : string =
let
var output := ""
function write(s: string) =
output := concat(output, s)
function show(n: int, t: tree) =
let
function indent(s: string) = (
for i := 1 to n do
write(" ");
output := concat(output, s);
write("\n")
)
in
if t = nil
then ident(".")
else (
indent(t.key);
show(n + 1, t.left);
show(n + 1, t.right)
)
end
in
show(0, tree);
output
end
For my implementation I’ve decided to use OCaml. It is close enough to SML so with even basic ML experience it should be fairly straightforward to port the code examples.
Straight-line programs Link to heading
The first “warm-up” assignment is to implement a program analyzer and interpreter for the straight-line programming language. We have the following types:
type id = string
type binop = Plus | Minus | Times | Div
type stm = Compound of stm * stm
| Assign of id * exp
| Print of exp list
and exp = Id of id
| Num of int
| Op of exp * binop * exp
| Eseq of stm * exp
So for the following sample straight-line program
a := 5 + 3;
b := (print(a, a - 1), a * 10);
print(b)
The corresponding AST would be
Compound(
Assign("a", Op(Num 5, Plus, Num 3)),
Compound(
Assign("b",
Eseq(
Print [Id "a"; Op(Id "a", Minus, Num 1)],
Op(Num 10, Times, Id "a"))),
Print [Id "b"]))
The first task is to write a function
val maxargs : stm -> int
This function tells the maximum number of args of any print
statement within any subexpression of a given statement.
Sounds easy, right? The implementation is pretty straightforward too:
open Base
open Straightline
let rec maxargs stm =
let maxargs_exp = function
| Eseq (stm, _) -> maxargs stm
| _ -> 0
in match stm with
| Print exps ->
let len = List.length exps in
let exp_lens = List.map ~f:maxargs_exp exps in
let max_exp_len = List.max_elt ~compare:Int.compare exp_lens in
Int.max len (Option.value ~default:0 max_exp_len)
| Compound (exp1, exp2) ->
let l1 = maxargs exp1 in
let l2 = maxargs exp2 in
Int.max l1 l2
| Assign (_, exp) ->
maxargs_exp exp
Next, we need to write another function that “interprets” program in this language.
val eval : stm -> unit
A simple interpreter could look like this:
open Base
open Stdio
open Straightline
let eval x op y =
match op with
| Plus -> x + y
| Minus -> x - y
| Times -> x * y
| Div -> x / y
let rec interp_stm stm env =
let print env0 exp =
let (v, env1) = interp_exp exp env0 in
print_endline (Int.to_string v);
env1
in
match stm with
| Print exps ->
List.fold_left ~init:env ~f:print exps
| Assign (key, exp) ->
let (data, env1) = interp_exp exp env in
Map.set env1 ~key ~data
| Compound (stm1, stm2) ->
let env1 = interp_stm stm1 env in
interp_stm stm2 env1
and interp_exp exp env =
match exp with
| Id k ->
let v = Map.find_exn env k in (v, env)
| Num x -> (x, env)
| Op (exp1, op, exp2) ->
let (x, env1) = interp_exp exp1 env in
let (y, env2) = interp_exp exp2 env1 in
let r = eval x op y in
(r, env2)
| Eseq (stm, exp) ->
let env1 = interp_stm stm env in
interp_exp exp env1
Here we’re just recursively traversing a tree, keeping a map of identifiers, evaluating arithmetic operations and printing values.
Excersises Link to heading
A couple of more excercises about trees, nothing super exciting. So I won’t paste the rest of my solutions here. To implement a balanced-tree (ex. d) I’ve read the chapter about Red-Black Trees in Okasaki book. The code is here.
Well, thats all for part 1! The next one is about lexical analysis where we’ll write a simple lexer for the Tiger language.
Conclusion Link to heading
By the time I’ve decided to start writing this post I’ve already finished reading 6th chapter and here are some first impressions:
- The book is clear and enjoyable to read. Programming assignments and exercises are well thought out
- Implementing the typechecker (chapter 5) wasn’t easy, for me it took a week to make it work and typecheck correctly every given example program
- Chapter 6 feels easier (less code to write), but requires some basic knowledge about microprocessor architectures (SPARK, MIPS)
I wouldn’t recommend starting with this book to someone with zero compiler design knowledge. But it is definitely a must read for people who already know the basics of compiler construction.