calculator-parser.js

Jump To … +

calculator-backends.js calculator-parser.js

¶

calculator-parser.js

a simple calculator language
¶

This program parses a very simple language that just does a little basic arithmetic. Here are some simple examples of the sort of thing you can write in the calculator language:
- 2 + 2
- 1 * 2 + 3 * 4 + 5 / 6
- 3 + 1/(7 + 1/(15 + 1/(1 + 1/(292 + 1/(1 + 1/(1 + 1/1))))))
- 1 / ((z + 1) * (z - 1))
If you’d like to try it out, open calculator.html.
¶

Part One – Breaking code down into tokens

This function, tokenize(code), takes a string code and splits it into tokens, the numbers, words, and symbols that make up our little calculator mini-language.

function tokenize(code) {
    var results = [];
    var tokenRegExp = /\s*([A-Za-z]+|[0-9]+|\S)\s*/g;

    var m;
    while ((m = tokenRegExp.exec(code)) !== null)
        results.push(m[1]);
    return results;
}

Let’s test as we go!

var assert = require('assert');
assert.deepEqual(tokenize("123\n"), ["123"]);
assert.deepEqual(tokenize("2+2"), ["2", "+", "2"]);
assert.deepEqual(tokenize("+-*/"), ["+", "-", "*", "/"]);
assert.deepEqual(tokenize("   1   * 24 +\n\n  pi"), ["1", "*", "24", "+", "pi"]);
assert.deepEqual(tokenize("()"), ["(", ")"]);
assert.deepEqual(tokenize("    "), []);

¶

Here are a few helper functions for working with tokens. To keep things simple, a number is any sequence of digits.
```
function isNumber(token) {
    return token !== undefined && token.match(/^[0-9]+$/) !== null;
}
```

And a name, or identifier, is any sequence of letters.

function isName(token) {
    return token !== undefined && token.match(/^[A-Za-z]+$/) !== null;
}

Tests.

assert(isNumber("123"));
assert(!isNumber("x"));
assert(!isNumber("-"));
assert(isName("xyz"));
assert(!isName("+"));

¶

Part Two – The parser
¶

The parser’s job is to decode the input and build a collection of objects that represent the code.

(This is just like the way a Web browser decodes an HTML file and builds the DOM. The part that does that is called the HTML parser.)
¶

Parse the given string code as an expression in our little language.
```
function parse(code) {
```
¶

Break the input into tokens.
```
    var tokens = tokenize(code);
```
¶

The parser will do a single left-to-right pass over tokens, with no backtracking. position is the index of the next token. Start at 0. We’ll increment this as we go.
```
    var position = 0;
```

peek() returns the next token without advancing position.

    function peek() {
        return tokens[position];
    }

consume(token) consumes one token, moving position to point to the next one.

    function consume(token) {
        assert.strictEqual(token, tokens[position]);
        position++;
    }

¶

Now we have the functions that are actually responsible for parsing. This is the cool part. Each group of syntax rules is translated to one function.

Parse a PrimaryExpr—that is, tokens matching one of the three syntax rules below. Whatever kind of expression we find, we return the corresponding JS object.

PrimaryExpr :

Number

Name

( Expr )

    function parsePrimaryExpr() {
        var t = peek();

        if (isNumber(t)) {
            consume(t);
            return {type: "number", value: t};
        } else if (isName(t)) {
            consume(t);
            return {type: "name", id: t};
        } else if (t === "(") {
            consume(t);
            var expr = parseExpr();
            if (peek() !== ")")
                throw new SyntaxError("expected )");
            consume(")");
            return expr;
        } else {

If we get here, the next token doesn’t match any of the three rules. So it’s an error.

            throw new SyntaxError("expected a number, a variable, or parentheses");
        }
    }

MulExpr :

PrimaryExpr ( * PrimaryExpr | / PrimaryExpr )^*

    function parseMulExpr() {
        var expr = parsePrimaryExpr();
        var t = peek();
        while (t === "*" || t === "/") {
            consume(t);
            var rhs = parsePrimaryExpr();
            expr = {type: t, left: expr, right: rhs};
            t = peek();
        }
        return expr;
    }

Expr :

MulExpr ( + MulExpr | - MulExpr )^*

    function parseExpr() {
        var expr = parseMulExpr();
        var t = peek();
        while (t === "+" || t === "-") {
            consume(t);
            var rhs = parseMulExpr();
            expr = {type: t, left: expr, right: rhs};
            t = peek();
        }
        return expr;
    }

¶

Now all that remains, really, is to call parseExpr() to parse an Expr.
```
    var result = parseExpr();
```
¶

Well, one more thing. Make sure parseExpr() consumed all the input. If it didn’t, that means the next token didn’t match any syntax rule, which is an error.
```
    if (position !== tokens.length)
        throw new SyntaxError("unexpected '" + peek() + "'");

    return result;
}
```

And test it.

assert.deepEqual(
    parse("(1 + 2) / 3"),
    {
        type: "/",
        left: {
            type: "+",
            left: {type: "number", value: "1"},
            right: {type: "number", value: "2"}
        },
        right: {type: "number", value: "3"}
    });

calculator-parser.js

Part One – Breaking code down into tokens

Part Two – The parser