| # Chapter 1: Toy Language and AST |
| |
| [TOC] |
| |
| ## The Language |
| |
| This tutorial will be illustrated with a toy language that we’ll call “Toy” |
| (naming is hard...). Toy is a tensor-based language that allows you to define |
| functions, perform some math computation, and print results. |
| |
| Given that we want to keep things simple, the codegen will be limited to tensors |
| of rank <= 2, and the only datatype in Toy is a 64-bit floating point type (aka |
| ‘double’ in C parlance). As such, all values are implicitly double precision, |
| `Values` are immutable (i.e. every operation returns a newly allocated value), |
| and deallocation is automatically managed. But enough with the long description; |
| nothing is better than walking through an example to get a better understanding: |
| |
| ```toy |
| def main() { |
| # Define a variable `a` with shape <2, 3>, initialized with the literal value. |
| # The shape is inferred from the supplied literal. |
| var a = [[1, 2, 3], [4, 5, 6]]; |
| |
| # b is identical to a, the literal tensor is implicitly reshaped: defining new |
| # variables is the way to reshape tensors (element count must match). |
| var b<2, 3> = [1, 2, 3, 4, 5, 6]; |
| |
| # transpose() and print() are the only builtin, the following will transpose |
| # a and b and perform an element-wise multiplication before printing the result. |
| print(transpose(a) * transpose(b)); |
| } |
| ``` |
| |
| Type checking is statically performed through type inference; the language only |
| requires type declarations to specify tensor shapes when needed. Functions are |
| generic: their parameters are unranked (in other words, we know these are |
| tensors, but we don't know their dimensions). They are specialized for every |
| newly discovered signature at call sites. Let's revisit the previous example by |
| adding a user-defined function: |
| |
| ```toy |
| # User defined generic function that operates on unknown shaped arguments. |
| def multiply_transpose(a, b) { |
| return transpose(a) * transpose(b); |
| } |
| |
| def main() { |
| # Define a variable `a` with shape <2, 3>, initialized with the literal value. |
| var a = [[1, 2, 3], [4, 5, 6]]; |
| var b<2, 3> = [1, 2, 3, 4, 5, 6]; |
| |
| # This call will specialize `multiply_transpose` with <2, 3> for both |
| # arguments and deduce a return type of <3, 2> in initialization of `c`. |
| var c = multiply_transpose(a, b); |
| |
| # A second call to `multiply_transpose` with <2, 3> for both arguments will |
| # reuse the previously specialized and inferred version and return <3, 2>. |
| var d = multiply_transpose(b, a); |
| |
| # A new call with <3, 2> (instead of <2, 3>) for both dimensions will |
| # trigger another specialization of `multiply_transpose`. |
| var e = multiply_transpose(c, d); |
| |
| # Finally, calling into `multiply_transpose` with incompatible shape will |
| # trigger a shape inference error. |
| var f = multiply_transpose(transpose(a), c); |
| } |
| ``` |
| |
| ## The AST |
| |
| The AST from the above code is fairly straightforward; here is a dump of it: |
| |
| ``` |
| Module: |
| Function |
| Proto 'multiply_transpose' @test/Examples/Toy/Ch1/ast.toy:4:1' |
| Params: [a, b] |
| Block { |
| Return |
| BinOp: * @test/Examples/Toy/Ch1/ast.toy:5:25 |
| Call 'transpose' [ @test/Examples/Toy/Ch1/ast.toy:5:10 |
| var: a @test/Examples/Toy/Ch1/ast.toy:5:20 |
| ] |
| Call 'transpose' [ @test/Examples/Toy/Ch1/ast.toy:5:25 |
| var: b @test/Examples/Toy/Ch1/ast.toy:5:35 |
| ] |
| } // Block |
| Function |
| Proto 'main' @test/Examples/Toy/Ch1/ast.toy:8:1' |
| Params: [] |
| Block { |
| VarDecl a<> @test/Examples/Toy/Ch1/ast.toy:11:3 |
| Literal: <2, 3>[ <3>[ 1.000000e+00, 2.000000e+00, 3.000000e+00], <3>[ 4.000000e+00, 5.000000e+00, 6.000000e+00]] @test/Examples/Toy/Ch1/ast.toy:11:11 |
| VarDecl b<2, 3> @test/Examples/Toy/Ch1/ast.toy:15:3 |
| Literal: <6>[ 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00] @test/Examples/Toy/Ch1/ast.toy:15:17 |
| VarDecl c<> @test/Examples/Toy/Ch1/ast.toy:19:3 |
| Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:19:11 |
| var: a @test/Examples/Toy/Ch1/ast.toy:19:30 |
| var: b @test/Examples/Toy/Ch1/ast.toy:19:33 |
| ] |
| VarDecl d<> @test/Examples/Toy/Ch1/ast.toy:22:3 |
| Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:22:11 |
| var: b @test/Examples/Toy/Ch1/ast.toy:22:30 |
| var: a @test/Examples/Toy/Ch1/ast.toy:22:33 |
| ] |
| VarDecl e<> @test/Examples/Toy/Ch1/ast.toy:25:3 |
| Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:25:11 |
| var: b @test/Examples/Toy/Ch1/ast.toy:25:30 |
| var: c @test/Examples/Toy/Ch1/ast.toy:25:33 |
| ] |
| VarDecl f<> @test/Examples/Toy/Ch1/ast.toy:28:3 |
| Call 'multiply_transpose' [ @test/Examples/Toy/Ch1/ast.toy:28:11 |
| Call 'transpose' [ @test/Examples/Toy/Ch1/ast.toy:28:30 |
| var: a @test/Examples/Toy/Ch1/ast.toy:28:40 |
| ] |
| var: c @test/Examples/Toy/Ch1/ast.toy:28:44 |
| ] |
| } // Block |
| ``` |
| |
| You can reproduce this result and play with the example in the |
| `examples/toy/Ch1/` directory; try running `path/to/BUILD/bin/toyc-ch1 |
| test/Examples/Toy/Ch1/ast.toy -emit=ast`. |
| |
| The code for the lexer is fairly straightforward; it is all in a single header: |
| `examples/toy/Ch1/include/toy/Lexer.h`. The parser can be found in |
| `examples/toy/Ch1/include/toy/Parser.h`; it is a recursive descent parser. If |
| you are not familiar with such a Lexer/Parser, these are very similar to the |
| LLVM Kaleidoscope equivalent that are detailed in the first two chapters of the |
| [Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html). |
| |
| The [next chapter](Ch-2.md) will demonstrate how to convert this AST into MLIR. |