| # C Type Names |
| |
| This is implementated in `naming.{h,cc}`. |
| |
| STG does not contain full type names for every type node in the graph. In order |
| to meaningfully describe type changes, STG needs to be able to render C and C++ |
| type names back into source-level syntax. |
| |
| ## Implementation |
| |
| The STG type `Name` is the basic entity representing the human-friendly name of |
| a graph node. Values of this type are computed recursively by `Describe` and are |
| memoised in a `NameCache`. |
| |
| `Name` copes with the inside-out C type syntax in a fairly uniform fashion. |
| There is some slightly special code for `Qualifier` decoration. |
| |
| Infinite recursion prevention in `Describe` is done in the most straightforward |
| fashion possible. One consequence of this is that if there is a naming cycle, |
| the memoised names for nodes in the cycle will depend on where it was entered. |
| |
| The rest of this document covers the concepts and design principles. |
| |
| ## Background |
| |
| In sensible operator grammars, composition can be done using precedence levels. |
| |
| Example with binary operators (there are minor adjustments needed if operators |
| have left or right associativity): |
| |
| **op** | **precedence** |
| ------ | -------------- |
| `+` | 0 |
| `*` | 1 |
| *num* | 2 |
| |
| ```haskell |
| data Expr = Number Int | Times Expr Expr | Plus Expr Expr |
| |
| show_paren p x = if p then "(" ++ x ++ ")" else x |
| |
| shows_prec p (Number n) = show n |
| shows_prec p (Times e1 e2) = show_paren (p > 1) $ shows_prec 2 e1 ++ "*" ++ shows_prec 2 e2 |
| shows_prec p (Plus e1 e2) = show_paren (p > 0) $ shows_prec 1 e1 ++ "+" ++ shows_prec 1 e2 |
| ``` |
| |
| The central idea is that expressions are rendered in the context of a precedence |
| level. Parentheses are needed if the context precedence is higher than the |
| expression's own precedence. Atomic values can be viewed as expressions having |
| maximal precedence. The default precedence context for printing an expression is |
| the minimal one; no parentheses will be emitted. |
| |
| ## The more-than-slightly-bonkers C declaration syntax |
| |
| C's type syntax is closely related to the inside-out declaration syntax it uses |
| and has the same precedence rules. A simplified, partial precedence table for |
| types might look like this. |
| |
| **thing** | **precedence** |
| ------------ | -------------- |
| `int` | 0 |
| `refer *` | 1 |
| `elt[N]` | 2 |
| `ret(args)` | 2 |
| `identifier` | 3 |
| |
| The basic (lowest precedence) elements are: |
| |
| * primitive types, possibly CV-qualified |
| * typedefs, possibly CVR-qualified |
| * struct, union and enum types, possibly CV-qualified |
| |
| The "operators" in increasing precedence level order are: |
| |
| * pointer-to, possibly CVR-qualified |
| * function (return type) and array (element type) |
| |
| The atomic (highest precedence) elements are: |
| |
| * variable names |
| * function names |
| |
| ### CVR-qualifiers |
| |
| The qualifiers `const`, `volatile` and `restrict` appear to the right of the |
| pointer-to operator `*` and are idiomatically placed to the left of the basic |
| elements.[^1] They can be considered as transparent to precedence. |
| |
| [^1]: They can be idiomatically placed to the right, but that's a different |
| idiom. |
| |
| ### User-defined types |
| |
| Struct, union and enum types can be named or anonymous. The normal case is a |
| named type. Anonymous types are given structural descriptions. |
| |
| ### Pointer, array and function types |
| |
| To declare `x` as a pointer to type `t`, we declare the dereferencing of `x` as |
| `t`. |
| |
| ```c++ |
| t * x; |
| ``` |
| |
| To declare `x` as an array of type `t` and size `n`, we declare the `n`th |
| element of `x` as `t`. |
| |
| ```c++ |
| t x[n]; |
| ``` |
| |
| To declare `x` as a function returning type `t` and taking args `n`, we declare |
| the result of applying `x` to `n` as `t`. |
| |
| ```c++ |
| t x(n); |
| ``` |
| |
| The context precedence level for rendering `x`, which may be a complex thing in |
| its own right, is 2 in the case of arrays and functions and 1 in the case of |
| pointers. |
| |
| We need to do things inside-out now, because the outer type is a leaf of the |
| type expression tree. Instead we say `x` has a precedence and the leaf type |
| wraps around it, using parentheses if `x`'s precedence is less than the leaf |
| type (which typically happens if `x` is a pointer type). |
| |
| In each of these cases `x` has been replaced by another type that mentions `y`. |
| |
| ```c++ |
| t * * y; // y is a pointer to pointer to t |
| t * y[m]; // y is an array of pointer to t |
| t * y(m); // y is a function returning pointer to t |
| t (* y)[n]; // y is a pointer to an array of t |
| t y[m][n]; // y is an array of arrays of t |
| t y(m)[n]; // if a function could return an array, this is what it would look like |
| t (* y)(n); // y is a pointer to a function returning t |
| t y[m](n); // if an array could contain functions, this is what it would look like |
| t y(m)(n); // if a function could return a function, this is what it would look like |
| ``` |
| |
| ## Concise and Efficient Pretty Printer (sketch) |
| |
| This builds a type recursively, so that outside bits will be rendered first. The |
| recursion needs to keep track of a left piece, a right piece and the precedence |
| level of the hole in the middle. |
| |
| ```haskell |
| data LR = L | R deriving Eq |
| |
| data Type = Basic String | Ptr Type | Function Type [Type] | Array Type Int | Decl String Type |
| |
| render_final expr = ll ++ rr where |
| (ll, _, rr) = render expr |
| |
| render (Basic name) = (name, 0, "") |
| render (Ptr ref) = add L 1 "*" (render ref) |
| render (Function ret args) = add R 2 ("(" ++ intercalate ", " (map final_render args) ++ ")") (render ret) |
| render (Array elt size) = add R 2 ("[" ++ show size ++ "]") (render elt) |
| render (Decl name t) = add L 3 name (render t) |
| |
| add side prec text (l, p, r) = |
| case side of |
| L -> (ll ++ text, prec, rr) |
| R -> (ll, prec, text ++ rr) |
| where |
| paren = prec < p |
| ll = if paren then l ++ "(" else if side == L then l ++ " " else l |
| rr = if paren then ")" ++ r else r |
| ``` |
| |
| To finally print something, just print the left and right pieces in sequence. |
| |
| The cunning bit about structuring the pretty printer this way is that `render` |
| can be memoised. With the expectation that any given type will appear many times |
| in typical output, this is a big win. |
| |
| NOTE: C type qualifiers are a significant extra complication and are omitted |
| from this sketch. They must appear to the left or right of (the left part of) a |
| type name. Which side can be determined by the current precendence. |
| |
| NOTE: Whitespace can be emitted sparingly as specific side / precedence contexts |
| can imply the impossibility of inadvertently joining two words. |