Recently I’ve started reading the book by Andrew W. Appel titled Modern compiler implementation in ML. I’ve picked it up because I’ve heard some really good reviews about the ML version of it. Also there are other editions available that use C and Java. Each chapter covers a single phase of the compilation process and comes with some initial ML code and programming excercies. At the end you will have a working optimizing compiler.

I’m building a compiler for the first time and it is a lot of fun. In these blog post series, I’ll be working through the book and explaining my solutions to each chapter.

Getting started Link to heading

The author describes Tiger as a simple (but nontrivial) language of the Algol family, which could be easily modified to be functional or object-oriented (maybe I’ll skip the OO-part of the book).

Here is an example program in Tiger
type tree = {
  key: string,
  left: tree,
  right: tree
}

function prettyprint(tree: tree) : string =
  let
    var output := ""

    function write(s: string) =
      output := concat(output, s)

    function show(n: int, t: tree) =
      let
        function indent(s: string) = (
          for i := 1 to n do
             write(" ");
             output := concat(output, s);
             write("\n")
        )
      in
        if t = nil
        then ident(".")
        else (
          indent(t.key);
          show(n + 1, t.left);
          show(n + 1, t.right)
        )
      end
  in
    show(0, tree);
    output
  end

For my implementation I’ve decided to use OCaml. It is close enough to SML so with even basic ML experience it should be fairly straightforward to port the code examples.

Straight-line programs Link to heading

The first “warm-up” assignment is to implement a program analyzer and interpreter for the straight-line programming language. We have the following types:

type id = string

type binop = Plus | Minus | Times | Div

type stm = Compound of stm * stm
         | Assign of id * exp
         | Print of exp list

 and exp = Id of id
         | Num of int
         | Op of exp * binop * exp
         | Eseq of stm * exp

So for the following sample straight-line program

a := 5 + 3;
b := (print(a, a - 1), a * 10);
print(b)

The corresponding AST would be

Compound(
  Assign("a", Op(Num 5, Plus, Num 3)),
  Compound(
    Assign("b",
      Eseq(
        Print [Id "a"; Op(Id "a", Minus, Num 1)],
        Op(Num 10, Times, Id "a"))),
    Print [Id "b"]))

The first task is to write a function

val maxargs : stm -> int

This function tells the maximum number of args of any print statement within any subexpression of a given statement.

Sounds easy, right? The implementation is pretty straightforward too:

open Base
open Straightline

let rec maxargs stm =
  let maxargs_exp = function
    | Eseq (stm, _) -> maxargs stm
    | _ -> 0
  in match stm with
  | Print exps ->
    let len = List.length exps in
    let exp_lens = List.map ~f:maxargs_exp exps in
    let max_exp_len = List.max_elt ~compare:Int.compare exp_lens in
    Int.max len (Option.value ~default:0 max_exp_len)
  | Compound (exp1, exp2) ->
    let l1 = maxargs exp1 in
    let l2 = maxargs exp2 in
    Int.max l1 l2
  | Assign (_, exp) ->
    maxargs_exp exp

Next, we need to write another function that “interprets” program in this language.

val eval : stm -> unit

A simple interpreter could look like this:

open Base
open Stdio
open Straightline

let eval x op y =
  match op with
  | Plus  -> x + y
  | Minus -> x - y
  | Times -> x * y
  | Div   -> x / y

let rec interp_stm stm env =
  let print env0 exp =
    let (v, env1) = interp_exp exp env0 in
    print_endline (Int.to_string v);
    env1
  in
  match stm with
  | Print exps ->
    List.fold_left ~init:env ~f:print exps
  | Assign (key, exp) ->
    let (data, env1) = interp_exp exp env in
    Map.set env1 ~key ~data
  | Compound (stm1, stm2) ->
    let env1 = interp_stm stm1 env in
    interp_stm stm2 env1

and interp_exp exp env =
  match exp with
  | Id k ->
    let v = Map.find_exn env k in (v, env)
  | Num x -> (x, env)
  | Op (exp1, op, exp2) ->
    let (x, env1) = interp_exp exp1 env in
    let (y, env2) = interp_exp exp2 env1 in
    let r = eval x op y in
    (r, env2)
  | Eseq (stm, exp) ->
    let env1 = interp_stm stm env in
    interp_exp exp env1

Here we’re just recursively traversing a tree, keeping a map of identifiers, evaluating arithmetic operations and printing values.

Excersises Link to heading

A couple of more excercises about trees, nothing super exciting. So I won’t paste the rest of my solutions here. To implement a balanced-tree (ex. d) I’ve read the chapter about Red-Black Trees in Okasaki book. The code is here.

Well, thats all for part 1! The next one is about lexical analysis where we’ll write a simple lexer for the Tiger language.

Conclusion Link to heading

By the time I’ve decided to start writing this post I’ve already finished reading 6th chapter and here are some first impressions:

  • The book is clear and enjoyable to read. Programming assignments and exercises are well thought out
  • Implementing the typechecker (chapter 5) wasn’t easy, for me it took a week to make it work and typecheck correctly every given example program
  • Chapter 6 feels easier (less code to write), but requires some basic knowledge about microprocessor architectures (SPARK, MIPS)

I wouldn’t recommend starting with this book to someone with zero compiler design knowledge. But it is definitely a must read for people who already know the basics of compiler construction.