Extending the Monkey Interpreter

27 Jun, 2020

I just finished reading Thorsten Ball's book Writing an Interpreter in Go. It takes you through writing a programming language from scratch called Monkey. It's a fun project book that's lighter on the theory but gets your hands dirty lexing tokens, parsing for syntax and evaluating Monkey code.

Now that I've finished reading it, I thought I would check my understanding and try to extend the Monkey language a little bit. I noticed Monkey is missing any kind of looping structure, so I'm going to add a while loop.

Adding the keyword

This starts from the completed interpreter at the end of the book, so we already have everything we need to get started. Step one for adding a new keyword to the language is letting our lexer know about it. Following along with Thorsten's writing and programming style, the first thing to do is write some tests. We have a test for the lexer, so let's extend it with what my loop will look like:

// lexer_test.go
funcTestNextToken(t *testing.T) {
    input := `
    // [...]
    let i = 0;
    while (i < 10) {
        i = i + 1;
    }
    `

    test := []struct {
        expectedType 	   token.TokenType
        expectedLiteral string
    } {
        // [...]
        {token.LET, "let"},
        {token.IDENT, "i"},
        {token.ASSIGN, "="},
        {token.INT, "0"},
        {token.SEMICOLON, ";"},
        {token.WHILE, "while"},
        {token.LPAREN, "("},
        {token.IDENT, "i"},
        {token.LT, "<"},
        {token.INT, "10"},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.IDENT, "i"},
        {token.ASSIGN, "="},
        {token.IDENT, "i"},
        {token.PLUS, "+"},
        {token.INT, "1"},
        {token.SEMICOLON, ";"},
        {token.RBRACE, "}"},

        {token.EOF, ""},
    }

    // [...]
}

The rest of the test execution remains the same. As it stands now, we can't compile because token.WHILE is not defined. Let's add it.

// token.go
const (
    // Keywords
    // [...] 
    WHILE = "WHILE"
)

Great. Now the tests run, but fail. Progress!

--- FAIL: TestNextToken (0.00s)
    D:\Source\go\interpreterbook\src\monkey\lexer\lexer_test.go:170: tests[91] - tokentype wrong. Expected="WHILE" got="IDENT"
FAIL
FAIL	monkey/lexer	0.402s

Looking through our lexer code, most of the work happens in the NextToken function. In our default case we check if our lexer's current character is a letter, and if so we read it and check if it's a keyword or identifier. Then we call our handy function token.LookupIdent, which returns the keyword TokenType if it is a keyword, or IDENT otherwise. All we have to do is add our while keyword to our keywords map:

// token.go
var keywords = map[string]TokenType {
    // [...]
    "while": WHILE,
}

And just like that our test cases pass.

Extending the AST

Before we go any further with writing code, let's take a minute to think about what our while loop will look like. Luckily, while loops are the most simple loop in terms of syntax, with fewer moving parts than a for or for-in range loop. Here's our loop from the test case above:

while (i < 10) {
    i = i + 1;
}

If we break that down a little bit, what we have is:

while (<condition>) {
    <body>
}

We're going to add this to our abstract syntax tree, so we need to decide first whether our loop is a Statement or an Expression. According to Thorsten's definition, expressions produce values, and statements don't. I'm used to languages where loops don't produce values, or at least not directly (we can edit variables in a loop, creating side effects). We don't want to use a loop as the value of an assignment, or as a condition in a loop. I mean, we could do this, but that would require thinking outside the box a bit, and I'm really just trying to add a simple feature to this language.

Our while loop syntax is a lot like the IfExpression that we already have defined in Monkey. Let's create a WhileStatement that follows along.

// ast.go
// [...]

// WhileStatement represents an while loop
type WhileStatement struct {
    Token     token.Token // The 'while' token
    Condition Expression
    Body      *BlockStatement
}

func (we *WhileStatement) statementNode() {}

// TokenLiteral for WhileStatement
func (we *WhileStatement) TokenLiteral() string {
    return we.Token.Literal
}

func (we *WhileStatement) String() string {
    var out bytes.Buffer

    out.WriteString("while")
    out.WriteString(we.Condition.String())
    out.WriteString(" ")
    out.WriteString(we.Body.String())

    return out.String()
}

Nothing fancy here. We declare a struct for our while expression that has a token, condition and loop body. Then we implement the Node and Statement interfaces. Now that we have our AST node for while defined, we can move on to writing parser tests.

Parsing while loops

Again, let's start with a test.

// parser_test.go
func TestWhileStatement(t *testing.T) {
    input := `while (i < 10) { let i = i + 1 }`

    l := lexer.New(input)
    p := New(l)
    program := p.ParseProgram()
    checkParserErrors(t, p)

    if len(program.Statements) != 1 {
        t.Fatalf("program.Statements does not contain %d statements. got=%d\n",
            1, len(program.Statements))
    }

    stmt, ok := program.Statements[0].(*ast.WhileStatement)

    if !ok {
        t.Fatalf("program.Statements[0] is not ast.WhilleStatement. got=%T", program.Statements[0])
    }

    if !testInfixExpression(t, stmt.Condition, "i", "<", 10) {
        return
    }

    if len(stmt.Body.Statements) != 1 {
        t.Errorf("body is not 1 statements. got=%d", len(stmt.Body.Statements))
    }

    body, ok := stmt.Body.Statements[0].(*ast.LetStatement)
    if !ok {
        t.Fatalf("Statements[0] is not ast.ExpressionStatement. got=%T", stmt.Body.Statements[0])
    }

    if !testLetStatement(t, body, "i") {
        return
    }
}

Again, nothing fancy here. (In fact, this is almost the exact same as TestIfExpression!) But when we run it, we get some expected failures:

--- FAIL: TestWhileStatement (0.00s)
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:976: Parser has 5 errors
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:978: 	parser error: "no prefix parse function found for WHILE"
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:978: 	parser error: "no prefix parse function found for LET"
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:978: 	parser error: "expected next token to be :, got IDENT instead"
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:978: 	parser error: "no prefix parse function found for ="
    D:\Source\go\interpreterbook\src\monkey\parser\parser_test.go:978: 	parser error: "no prefix parse function found for }"
FAIL
FAIL	monkey/parser	0.322s

The first error tells us exactly what we need to do. There's no prefix parse function for our while keyword, so we need to add it. Since while is a statement, let's add it beside the other existing statements in the parseStatement function.

// parser.go
func (p *Parser) parseStatement() ast.Statement {
    switch p.curToken.Type {
    // [...]
    case token.WHILE:
        return p.parseWhileStatement()
    default:
        return p.parseExpressionStatement()
    }
}

// [...]

func (p *Parser) parseWhileStatement() ast.Statement {
    statement := &ast.WhileStatement{
        Token: p.curToken,
    }

    if !p.expectPeek(token.LPAREN) {
        return nil
    }

    p.nextToken()
    statement.Condition = p.parseExpression(LOWEST)

    if !p.expectPeek(token.RPAREN) {
        return nil
    }

    if !p.expectPeek(token.LBRACE) {
        return nil
    }

    statement.Body = p.parseBlockStatement()

    return statement
}

Again, this is almost the exact same as parseIfExpression() but with the Alternative property removed. Running the test again shows we've got it to pass. That wasn't too tough!

I did notice at this point that Monkey doesn't allow existing variables to be reassigned like in other languages. Here's what I mean:

>> let x = 5;
>> x = 10;
        no prefix parse function found for =

We can work around this by reusing the let keyword like we did in the tests above.

>> let y = true;
>> let y = false;
>> y
false

It's kind of unusual, but something we can maybe fix at a later date. It turns out for our simple cases here this isn't an issue. Consider the loop body we had in our test cases above:

let x = x + 1;

What happens here is we are assigning a variable to the environment, but the value of the variable comes back from the environment while we evaluate the expression. In a language like C# this wouldn't work, we'd get an error saying we couldn't redeclare the variable with that name because it already exists in the enclosing scope. We could update Monkey to do something similar, but for now we have a workaround. Let's move on to evaluation.

Updating the evaluator

Again, we start with some tests. We can make good use of the existing helper functions in our evaluator tests, so we don't actually have to write too much:

// evaluator_test.go
func TestWhileLoop(t *testing.T) {
    tests := []struct {
        input    string
        expected int64
    }{
        {
            "let x = 0; while (x < 10) { let x = x + 1; } x",
            10,
        },
        {
            "let x = 0; while (false) { let x = x * 123; } x",
            0,
        },
    }

    for _, tt := range tests {
        evaluated := testEval(tt.input)
        testIntegerObject(t, evaluated, tt.expected)
    }
}

Of course, right away this fails.

--- FAIL: TestWhileLoop (0.00s)
    D:\Source\go\interpreterbook\src\monkey\evaluator\evaluator_test.go:547: object has wrong value. got=0, want=10
FAIL
FAIL	monkey/evaluator	0.330s

We haven't actually looped yet! Let's check out our evaluator's Eval function, where all the magic happens. We need to add a case for our new while statement:

// evaluator.go
func Eval(node ast.Node, env *object.Environment) object.Object {
    switch node := node.(type) {
    // [...]
    case *ast.WhileStatement:
        return evalWhileExpression(node, env)

    // [...]
}

func evalWhileExpression(we *ast.WhileStatement, env *object.Environment) object.Object {
    condition := Eval(we.Condition, env)

    if isError(condition) {
        return condition
    }

    for isTruthy(condition) {
        Eval(we.Body, env)
        condition = Eval(we.Condition, env)
    }

    return NULL
}

Again, there's really not much special here. The trick is we need to keep evaluating our condition statement, otherwise we'll loop forever (we still might, but that's up to the person writing the Monkey code). We can run our tests again, and they pass with flying colours.

Playing with loops in the REPL

Now we're ready to test out our loops using the REPL.

>> let helloTimes = fn(name, times) { let i = 0; while (i < times) { puts("hello " + name); let i = i + 1; } };
>> helloTimes("greg", 5)
hello greg
hello greg
hello greg
hello greg
hello greg
null
>>

Not bad! Of course, there's still room to improve. Our while loop checks whether or not the value in our condition is truthy, as opposed to true, so we can make weird loops like while ("hello") {} that run forever. It's the same way with Javascript, so I can't be too hard on Monkey. We could also improve by adding the ability to redefine variables so we don't need to overuse the let keyword. There's also a number of operators that we could easily add to the language, or add other nice quality of life improvements like having line numbers on our error messages. Monkey is a great starting point however, and the book taught me a lot about interpreters. If you're at all curious, even if you don't know Go, I think it's a great read. You can also see my branch adding the while loop here.

#go #interpreters #monkey