Part 0 and code reference: https://github.com/quangIO/Lucix

In this short blog, we will use ANTLR (it's awesome) to create our parser.

Grammar

There are many formal articles you can easily find in literature about parsing but in this blog, I will go with the informal, lazy approach. Basically, I want my language to look like this

test() -> i32 {
    return 0;
}

main () -> i32 {
    c: i32 = 3;
    if a > b {
        c = test();
    }
    return c;
}

What we have

  • Variables: a, b, c
    • Declare as identifier ‘:’ type (‘=’ expression)? ‘;’ (? means the thing before is optional)
  • Number literals: 3,…
  • Operators: +, -, *, /,…
  • Expressions:
    • something + something (or multiply, subtract …, [insert operator here])
    • Numbers and identifiers themeselve are also expressions
    • We may have unary operators, so something like -expression are also expressions
    • Parenthensis (expression)
    • Assignment
  • Statements
    • if
    • for loop
    • return
    • expression “;”
    • {...} block
  • Types
    • i32 only for now for simplicity
  • Functions
    • Declare as identifier '(' functionParameters? ')' ('->' type)? block

What we should not have

  • Ambiguity

What we want: Abstract syntax tree

Know about expression trees? From those trees, it is easy to do transformations, calculations, or compilation to other format, on expressions. The same thing goes with syntax trees. For example, parsing the above grammar will return this one

Expression Tree
Lucix AST (truncated)
More about AST:

How to write the rules in ANTLR

If you have programmed in Haskell or know about BNF, this blog is useless for you I think. If you don't, search BNF up ;)… or if you are lazy, you don't really need to. Feel free to tinker around this.

Tips: you can install ANTLR plugin in Intellij, copy this into Lucix.g4, go to that file, right click on application and test the rules

grammar Lucix;

application:
    (variableDeclaration | functionDeclaration)* ;

variableDeclaration:
    ID ':' type ('=' expression)? ';' ;

functionDeclaration:
    ID '(' functionParameters? ')' ('->' type)? block ;

type: 'i32' ;

functionParameters:
    parameter (',' parameter)* ;

parameter:
    ID ':' type ;

expression
    : ID '(' expressionList? ')'            # CallExpr
    | '-' expression                        # MinusExpr
    | expression op=('*' | '/') expression  # MulDivExpr
    | expression op=('+' | '-') expression  # AddSubExpr
    | expression op=('<' | '>') expression  # LeGeExpr
    | ID                                    # IdExpr
    | INT                                   # IntExpr
    | '(' expression ')'                    # ParensExpr
    | assign                                # AssignExpr
    ;

expressionList:
    expression (',' expression)* ;

block:
    '{' statement* '}' ;

statement
    : variableDeclaration
    | ifStatement
    | block
    | assign ';'
    | jumpExpression
    | expression   ';' ;

ifStatement
 : 'if' expression block ('else' ifStatement? block)?
 ;

assign:
    ID '=' expression ;

jumpExpression:
    'return' expression? ';' ;

ID:
    Letter (Letter | Digit)* ;

INT:
    Digit+ ;

fragment Digit:
    [0-9] ;

fragment Letter:
    [a-zA-Z] ;

SPACE:
    [ \t\n\r]+ -> skip ;

COMMENT:
    ';;' ~[\r\n]* -> skip ;

Follow the white/black/asian/latinX… rabbit?

Good starting point is recursive parser, LL, … Checking out grammars from other programming languages is also quite useful

Next parts: Code generation; JIT compilation; Memory + FFI; Garbage collector