| # Design notes |
| |
| (I wrote some high-level design observations in |
| [a blog post](https://neugierig.org/software/blog/2022/03/n2.html), whose |
| contents maybe ought to migrate here.) |
| |
| ## Build states |
| |
| While building, we have a bunch of `Build` objects that represent individual |
| build steps that go through a series of states. Here I give a picture about how |
| those states work. |
| |
| Imagine a hypothetical build, represented here by a series of boxes with arrows |
| representing outputs. n2 models both builds and files in its graph because build |
| steps may produce more than one output, so it's more complex than this, but it |
| will do for this discussion. Builds start in the `Unknown` state, just a default |
| state shown here in gray. |
| |
| <img src='build1.png' width='267'> |
| |
| The user requests bringing some build up to date which we mark by the `Want` |
| state, and traverse the dependencies backwards to mark all inputs also as |
| `Want`. |
| |
| <img src='build2.png' width='267'> |
| |
| Any build that doesn't depend on another build is marked `Ready`. |
| |
| <img src='build3.png' width='267'> |
| |
| The main loop pops a `Ready` build and examines its inputs/output to judge |
| whether it needs to be brought up to date. Here, we examined the upper left |
| build and determined it was already up to date and marked it `Done`. |
| |
| For each downstream output of that build, we check whether all the inputs to |
| that output are `Done` and if so, we mark it `Ready`. This makes the build to |
| the right `Ready`. Note that `Ready` means "ready to be checked for |
| up-to-date-ness", and is purely a function of which builds we've checked off and |
| not any on-disk file state. |
| |
| <img src='build4.png' width='267'> |
| |
| In the next iteration of the loop we pop the lower left `Ready` and determine it |
| is out of date. It then moves to state `Queued` and added to the relevant pool. |
| |
| <img src='build5.png' width='267'> |
| |
| Concurrently with visiting `Ready` builds, if we have available execution |
| parallelism, we examine all the pools that have spare depth for any `Queued` |
| builds and start executing those tasks, moving the build to the `Running` state. |
| For example, this build might be in the `console` pool, which has `depth=1` |
| meaning that pool can only run one build step at a time, which means it might |
| remain `Queued` if we're already running a build in that pool. |
| |
| And similarly, if any running builds complete, we move them to the `Done` state |
| and repeat the same logic done above to mark downstream builds `Ready`. |
| |
| There is more to this. For example there's a `Failed` state, which is used in |
| builds with the `-k` flag, that lets us keep running even when some build steps |
| fail. But the above is most of the picture. |
| |
| ## Missing files |
| |
| What happens if a file referenced in a build rule isn't present? |
| |
| For the purpose of ordering: a build is "ready" when all dependent builds have |
| been brought up to date, and that is independent of whether the files are |
| present or not. |
| |
| Ninja was pretty lax about missing files. If one step generated a file and a |
| later step consumed it, nothing verified whether that file was actually |
| produced; Ninja just ran the steps in order. So to follow Ninja, n2 also allows |
| any file required by a build step to be absent as long as it is declared to be |
| generated by another build step. |
| |
| In particular, here are some use cases: |
| |
| - A missing output, after a build runs, is allowed. This is used in build files |
| as a marker for build rules that want to always run. |
| - Build steps that only have order-only dependencies on another step don't need |
| any files on disk for n2 manage their relative order. |
| - Discovered inputs may disappear without breaking builds. Imagine a C file that |
| includes a header. After building, we note that changes to the header should |
| prompt a rebuild. But if you remove the include and the header file at the |
| same time, we should still allow you to build despite the historical |
| dependency. |
| |
| But if any of those files are missing we call the step dirty without bothering |
| to compute a hash. In this manner we never compute a hash involving any missing |
| files. |
| |
| Further, CMake generates Ninja files that claim a build step generates a |
| [depfile](https://ninja-build.org/manual.html#_depfile) when it doesn't. Ninja |
| treats this as an empty depfile, not an error. (See |
| [#80](https://github.com/evmar/n2/issues/80).) |
| |
| ## Parsing |
| |
| Parsing .ninja files is part of the critical path for n2, because it must be |
| complete before any other work can be done. Some properties of the n2 parser |
| that break abstraction to this end: |
| |
| - There is no separate lexer. Ninja syntax is not really lexer friendly in the |
| first place. |
| - When possible, `$variable` expansions happen as they're encountered, so that |
| we don't need to build up a parsed representation of strings and carry around |
| variable environments. |
| - Parsed entities that deal with paths are generic over a `Path` type, and path |
| strings are converted to Paths as they are parsed. (In practice the `Path` |
| type is `graph::FileId`, but the parsing code isn't aware of this type |
| directly.) This (and the previous bullet) allows the parser to reuse a single |
| `String` buffer when parsing paths, which is the bulk of what the parser does. |
| |
| ## Unicode |
| |
| Ninja |
| [was intentionally "encoding agnostic"](https://ninja-build.org/manual.html#ref_lexer), |
| which is to say it treated input build files as any byte stream that is ASCII |
| compatible. In other words, any string of bytes found in a `build.ninja` is |
| passed verbatim through printing to stdout and to the OS for path operations, |
| which meant Ninja was compatible with both UTF-8 and other encodings. The intent |
| is that those other encodings occur on Linuxes, especially in East Asia, and |
| also it means Ninja doesn't need any specific handling of even UTF-8. |
| |
| It looks like since my time, |
| [Ninja on Windows changed its input files to require UTF-8](https://github.com/ninja-build/ninja/pull/1915). |
| As you can see from the comments on that bug, this area is unfortunately pretty |
| subtle. |
| |
| Windows is particularly fiddly not only because its native path representation |
| is UTF-16 -- which is incompatible with the original byte stream assumption made |
| by Ninja and which requires conversions -- but further because Ninja needs to |
| parse the `/showIncludes` output from the MSVC compiler, which is localized. See |
| the `msvc_deps_prefix` variable in |
| [the Ninja docs on deps handling](https://ninja-build.org/manual.html#_deps); |
| there have been lots of bug reports over the years from people with Chinese |
| output that is failing to parse right due to Windows code page mess. |
| |
| In any case, n2 doesn't support any of this for now, and instead just follows |
| Ninja in treating paths as bytes. (n2 doesn't support `/showIncludes` or MSVC at |
| all yet.) |
| |
| It's possibly a better design to require input files to always be UTF-8, though |
| I think I'd want to better understand the `/showIncludes` situation. (The above |
| Ninja bug observed this was a breaking change in Ninja, too.) Possibly we could |
| mandate "input files are UTF-8, and if you need something other than UTF-8 in |
| your `msvc_deps_prefix` it's on you to escape the exact byte sequence you |
| desire". However, I'm not clear on whether all possible Windows paths are |
| representable as UTF-8 or if we'd need to go down the WTF-8 route for full |
| compatibility. |
| |
| ## Tracking build state |
| |
| While building, we have a bunch of `Build` objects that represent individual |
| build steps that go through a series of states. To represent these well I went |
| through a few patterns and eventually came up with a design I'm pretty happy |
| with. |
| |
| First, for each `Build` we store its current state. This lets us quickly answer |
| questions like "is the build id X ready or not?" (You could imagine storing this |
| directly in the `Build` or in a side HashMap from id to state, but that's an |
| implementation detail.) We use this for things like tracking whether we've |
| already visited a given `Build` when doing a traveral of the graph while |
| loading. This also has the benefit of ensuring a given `Build` is always in |
| exactly one known state. |
| |
| Second, we store data structures on the side for states where we care about |
| having quicker views onto this state. The idea here is that depending on the |
| particular needs of a given state we can model those needs specially. For |
| example, we need to be able to grab the next `Ready` build to work on it, so |
| there's a `VecDeque` holding those, while builds that go into the `Queued` state |
| queue into separate run pools, and builds that are `Running` are just tracked |
| with an integer counter on the run pool. |
| |
| ## Spawning subprocesses |
| |
| Ninja (and n2) use `posix_spawn` to spawn subprocesses (on non-Windows). I saw a |
| blog post recently that called `posix_spawn` something like a bloated wrapper |
| for simpler primitives, but another view on it is is that `posix_spawn` has the |
| platform-local libc's author's best knowledge about how to safely run processes. |
| |
| In particular, I was surprised to learn that Rust's built-in process spawning |
| library [leaks file descriptors](https://github.com/evmar/n2/issues/14). (See |
| also [the upstream Rust bug](https://github.com/rust-lang/rust/issues/95584), |
| which says Cargo also ran into this.) |
| |
| ## Subprocess command lines |
| |
| Ninja (and n2) model the commands they execute as strings, in that the build |
| file language has you construct a `command = ...` string. However, on Unixes |
| commands are fundamentally arrays of arguments. To convert a string to an array |
| there must be some sort of parsing, and for this purpose we use `/bin/sh` to |
| execute subcommands. |
| |
| `/bin/sh` is well-specified by POSIX and has the semantics everyone expects for |
| how commands work, from quoting to `$FOO`-style environment expansion to |
| `a && b` job control, pipelines, and so on. And it's also used everywhere on |
| Unixes so it is fast. |
| |
| However it has some downsides, particularly in that it means the person |
| generating the `.ninja` file has to know about these rules as well. For example, |
| consider a build rule like this: |
| |
| ``` |
| build foo$ bar baz: ... |
| command = pummel $out |
| ``` |
| |
| Per the Ninja syntax rules (which are _not_ the shell rules), that build has two |
| outputs, the files `foo bar` and `baz`. The Ninja variable `$out` then expands |
| to the string `foo bar baz`, and once the shell parses the command line the |
| `pummel` command receives three arguments in argv: `foo`, `bar`, and `baz`, |
| which is not what you wanted. |
| |
| For these kinds of problems there are further shell tricks around handling |
| spaces like how the string `"$@"` is expanded. In Ninja's case, we have the |
| escape hatch that the `.ninja` file author can just provide the extra text |
| explicitly, like in |
| [this bug report and workaround](https://github.com/ninja-build/ninja/issues/1312). |
| It's pretty unsatisfying though. |
| |
| In principle, it might be a better design for the Ninja language to instead have |
| an array type that would allow us to construct an argv array directly. This |
| would avoid shell-related quoting issues and save us a `/bin/sh` invocation on |
| each process spawn. However, this is a pretty large departure from how Ninja |
| currently works. |
| |
| Additionally, on Windows the platform behavior is reversed. On Windows command |
| lines are fundamentally strings, and it's up to each executable to interpret |
| those strings as they want. (Does this seem like a recipe for bugs? Yes it does. |
| See [this blog post](https://neugierig.org/software/blog/2011/10/quoting.html)'s |
| "Command lines" section for more on this.) So even if Ninja did work with arrays |
| we'd then need some sort of stringification for Windows. |
| |
| On the subject of Windows: Ninja (and n2) spawn processes directly without an |
| intervening shell. This has the consequence that commands like |
| `my-compiler && echo hello` do not what you'd expect; the opposite of the Ninja |
| behavior on Unixes. The `build.ninja` author instead has to themselves write out |
| `cmd /c my-command` if they want the shell involved. We did this only because |
| (in my recollection) the overhead of spawning `cmd` on Windows was noticeable. |
| And further, `cmd` randomly limits its argument to 8kb. |
| |
| PS: in writing this section I noticed that cmd has |
| [terrifying quoting rules](https://stackoverflow.com/questions/355988/how-do-i-deal-with-quote-characters-when-using-cmd-exe). |
| |
| ## Variable scope |
| |
| Ninja syntactic structures (`build`, `rule`) are at some level just lists of |
| key-value bindings that ultimately combine to set properties on individual build |
| steps, such as the `command = ...` command line. |
| |
| Additionally, bindings can be referenced by other bindings via the `$foo` or |
| `${foo}` syntax. This means variable lookup traverses through a hierarchy of |
| scopes. The intent was this was simple enough that the behavior is |
| straightforward, which in retrospect's insight really just means |
| "underspecified". (Forgive me! I hacked Ninja together in a few weekends and |
| made the all too easy mistake of "this is so simple I don't really need to think |
| it through".) |
| |
| ### Basics |
| |
| First, here's a high level description that conveys the idea but omits details. |
| |
| Consider a build file like the following: |
| |
| ``` |
| var = $A |
| |
| rule r |
| command = $B |
| |
| build output-$C: r |
| var2 = $D |
| ``` |
| |
| The `build` block stamps out a build step using the rule `r`, and the properties |
| of that build step are found by looking up attributes like `command`. |
| |
| When evaluating the expressions marked `$A`/`$B`/`$C`/`$D` above, these are the |
| scopes to consider, in order of nesting: |
| |
| 1. The toplevel scope, which defines `var` above. |
| 1. The build variable scope, which defines `var2` above. |
| 1. The build file list scope, which defines the implicit `$in` etc. variables. |
| 1. The rule scope, which defines `command` above. |
| |
| References found in each may refer to the scopes above in the list. For example, |
| at the time the `command = ...` is evaluated, the in-scope variables include |
| `$in`, `$var2`, and `$var`. |
| |
| ### Details |
| |
| Unfortunately, [Hyrum's Law](https://www.hyrumslaw.com/) means that Ninja files |
| in the wild depend on whatever the Ninja implementation allows, which is more |
| complex than the above. I don't think it's really worth fleshing out all the |
| idiosyncracies of Ninja's implementation as long as existing builds work, but |
| some details do matter. |
| |
| In particular, Ninja has particular behaviors around variable references found |
| within the same scope. Consider: |
| |
| ``` |
| var = 1 |
| var = ${var}2 |
| rule ... |
| command = echo $depfile |
| depfile = abc |
| ``` |
| |
| In Ninja, the value of `var` is `12`: the assignment proceeds from top down. But |
| within a `rule` block, the variable lookup of `$depfile` instead refers forward |
| to `abc`, which means there is a possibility of circular references, along with |
| logic that attempts to detect and warn in those cases(!). |