
Implementing the Oil Expression Language

andychu edited this page Jul 3, 2019 · 33 revisions

Turn Oil's expression grammar into an AST #387

Demo:

bin/osh -c 'var x = 1 + 2 * 3;'

This already works. (Right now a semicolon or a newline is accepted as the terminator; we should also accept EOF.)

Code

  • https://github.com/oilshell/oil/tree/master/oil_lang
    • grammar.pgen2 is literally Python 3's grammar!!!
    • expr_parse.py contains the public interface that the rest of the code uses. It turns a stream of tokens into an AST, in two steps under the hood: tokens -> parse tree, then parse tree -> AST.
    • expr_to_ast.py -- the "transformer" i.e. parse tree -> AST step
  • frontend/syntax.asdl is the unified OSH and Oil code representation
    • Scroll down to OIL LANGUAGE, and then everything we care about is under the expr type.
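To make the two steps concrete, here is a minimal sketch of tokens -> parse tree -> AST for a tiny arithmetic subset. It is not Oil's code: it swaps in a hand-written recursive-descent parser for the pgen2-generated one, and every name in it (tokenize, Parser, to_ast, Const, Binary, parse_expr) is hypothetical. The point is only that the parse tree mirrors the grammar rules, and a separate "transformer" collapses it into a compact AST.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy lexer (assumption: Oil's real lexer is mode-based and far richer).
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(s: str) -> List[str]:
    s = s.rstrip()
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if not m:
            raise SyntaxError("bad input at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

# Step 1: tokens -> parse tree. Each node is (rule_name, children), and
# children keep every token, including operators and parentheses.
PNode = Tuple[str, list]

class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def arith_expr(self) -> PNode:  # arith_expr: term (('+'|'-') term)*
        children = [self.term()]
        while self.peek() in ('+', '-'):
            children.append(self.next())
            children.append(self.term())
        return ('arith_expr', children)

    def term(self) -> PNode:  # term: atom (('*'|'/') atom)*
        children = [self.atom()]
        while self.peek() in ('*', '/'):
            children.append(self.next())
            children.append(self.atom())
        return ('term', children)

    def atom(self) -> PNode:  # atom: NUMBER | '(' arith_expr ')'
        tok = self.next()
        if tok == '(':
            inner = self.arith_expr()
            if self.next() != ')':
                raise SyntaxError("expected ')'")
            return ('atom', ['(', inner, ')'])
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected a number, got %r" % tok)
        return ('atom', [tok])

# Step 2: parse tree -> AST (the "transformer").
@dataclass
class Const:
    value: int

@dataclass
class Binary:
    op: str
    left: 'Expr'
    right: 'Expr'

Expr = Union[Const, Binary]

def to_ast(pnode: PNode) -> Expr:
    name, children = pnode
    if name == 'atom':
        if children[0] == '(':
            return to_ast(children[1])  # parentheses vanish in the AST
        return Const(int(children[0]))
    # arith_expr and term: fold "operand (op operand)*" left-associatively
    result = to_ast(children[0])
    for i in range(1, len(children), 2):
        result = Binary(children[i], result, to_ast(children[i + 1]))
    return result

def parse_expr(s: str) -> Expr:
    p = Parser(tokenize(s))
    tree = p.arith_expr()
    if p.peek() is not None:
        raise SyntaxError("unexpected token %r" % p.peek())
    return to_ast(tree)

print(parse_expr("1 + 2 * 3"))
```

Note that single-child rule nodes (a term wrapping one atom) and parentheses exist only in the parse tree; the transformer drops them, which is why the AST is so much smaller.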

Related Code

Things We Want to Add

  • LHS and RHS of assignments
    • Python distinguishes LHS and RHS after parsing and before AST construction, i.e. in this "transformer", and we'll follow the same strategy. That is, certain expr nodes can appear on both LHS and RHS, and others can only appear on the RHS.
  • All the operators
    • unary, binary
    • ternary operator: a if cond else b
    • including in, not in, is, is not
    • subscripting, slicing
    • Small changes from Python:
      • Python's // is spelled div
      • Python's ** is spelled ^ (following R and other mathematical languages)
      • Python's ^ is spelled xor
    • lower priority, but we'll probably end up having:
      • starred expressions on LHS and RHS for "splatting". (Might use @ operator instead?)
      • chained comparisons like 3 < x <= 5
  • function calls f(x, y=3). Includes method calls with . operator, e.g. mydict.clear()
    • To start, all the functions will be builtins. User function definitions come later!
  • Literals
    • dict -- except keys are "bare words", like JS
    • list
    • tuples, although I want to disallow 1-tuples like x,
    • bool -- true and false, following C, Java, JS, etc.
      • not True and False because types are generally capitalized Str, Dict, List
    • integer
    • float
    • probably sets, although the syntax might be different to allow for dict punning, like {key1, key2} taking their values from surrounding scope
    • string: single-quoted strings are like Python strings, but double-quoted strings allow interpolation. This involves lexer modes. (Already implemented to a large extent.)
    • later: homogeneous arrays
      • @[ mycommand --flag1 --flag2 ] -- uses the "command" lexer mode for "bare words"
      • @[1 2 3]
  • Comprehensions (lower priority)
    • list, dict, set
  • Function literals (lower priority)

Testing Strategy

TODO: We should talk about this.

Generally I test things very quickly with osh -n -c, or an interactive shell, but we should somehow record those tests. The simplest thing to do is to write some Python unit tests that take strings and print out the AST. Maybe they don't even need to make assertions?
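A rough sketch of what such a "print the AST" unit test could look like. Big assumption: parse_to_ast_string below calls Python's own parser as a stand-in so the example runs by itself; in Oil it would call the expression parser instead. All names here (parse_to_ast_string, CASES, ExprParseTest, run_tests) are hypothetical.

```python
import ast
import unittest

def parse_to_ast_string(code_str: str) -> str:
    # Stand-in (assumption): uses Python's parser, not Oil's.
    return ast.dump(ast.parse(code_str, mode="eval").body)

# Strings that are normally tried by hand at the command line.
CASES = [
    "1 + 2 * 3",
    "a if cond else b",
    "f(x, y=3)",
    "3 < x <= 5",
]

class ExprParseTest(unittest.TestCase):

    def testPrintAsts(self):
        for code_str in CASES:
            with self.subTest(code=code_str):
                printed = parse_to_ast_string(code_str)
                print("%-20s %s" % (code_str, printed))
                # No assertion on the AST shape: the test only fails if
                # parsing raises or prints nothing.
                self.assertTrue(printed)

def run_tests() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExprParseTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

if __name__ == "__main__":
    run_tests()
```

Even without assertions on the tree, this records the inputs somewhere durable and catches crashes; the printed output can be eyeballed or diffed later.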

NOTE: The way I hacked everything together was with pgen2/pgen2-test.sh all. (You can run less than everything by passing a particular function in that file, like parse-exprs or oil-productions, instead of all.)

  • Idea: Can we compare against Python somehow? That might come into play more in execution, rather than parsing.

Typing

The whole front end is statically typed with MyPy now. The types/osh-parse.sh script checks it in Travis.

I usually get the code working, and then add types. However, filling in types first is conceivable. ASDL types map to MyPy types in a straightforward way.
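As an illustration of that mapping, here is a hand-written sketch of how a small ASDL sum type could correspond to annotated Python classes. The schema fragment and the class names are made up for this example; they are not copied from frontend/syntax.asdl, and they are not what the ASDL code generator actually emits.

```python
# Hypothetical ASDL fragment:
#
#   expr = Const(int value)
#        | Binary(string op, expr left, expr right)

class expr_t:
    """Base class standing in for the `expr` sum type."""

class Const(expr_t):
    def __init__(self, value: int) -> None:
        self.value = value  # ASDL int -> Python int

class Binary(expr_t):
    def __init__(self, op: str, left: expr_t, right: expr_t) -> None:
        self.op = op        # ASDL string -> Python str
        self.left = left    # ASDL expr -> the expr_t base class
        self.right = right

def pretty(node: expr_t) -> str:
    """Render a node as an s-expression, dispatching on the variant."""
    if isinstance(node, Const):
        return str(node.value)
    if isinstance(node, Binary):
        return "(%s %s %s)" % (node.op, pretty(node.left), pretty(node.right))
    raise AssertionError("unknown node type: %r" % node)

tree = Binary("+", Const(1), Binary("*", Const(2), Const(3)))
print(pretty(tree))  # (+ 1 (* 2 3))
```

With annotations like these, MyPy can flag a call such as Binary("+", 1, 2), because an int is not an expr_t.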

Building

See Contributing, but

build/dev.sh minimal

should be enough (on an Ubuntu/Debian machine).
