(I wrote some high-level design observations in a blog post, whose contents maybe ought to migrate here.)
While building, we have a bunch of `Build` objects that represent individual build steps that go through a series of states. Here I give a picture of how those states work.
Imagine a hypothetical build, represented here by a series of boxes with arrows representing outputs. (n2 models both builds and files in its graph because build steps may produce more than one output, so it's more complex than this, but it will do for this discussion.) Builds start in the `Unknown` state, just a default state shown here in gray.
The user requests bringing some build up to date, which we mark with the `Want` state, and we traverse the dependencies backwards to mark all inputs also as `Want`.
Any build that doesn't depend on another build is marked `Ready`.
The main loop pops a `Ready` build and examines its inputs/outputs to judge whether it needs to be brought up to date. Here, we examined the upper left build, determined it was already up to date, and marked it `Done`.
For each downstream output of that build, we check whether all the inputs to that output are `Done`, and if so, we mark it `Ready`. This makes the build to the right `Ready`. Note that `Ready` means "ready to be checked for up-to-date-ness", and is purely a function of which builds we've checked off, not any on-disk file state.
In the next iteration of the loop we pop the lower left `Ready` build and determine it is out of date. It then moves to the `Queued` state and is added to the relevant pool.
Concurrently with visiting `Ready` builds, if we have available execution parallelism, we examine all the pools that have spare depth for any `Queued` builds and start executing those tasks, moving the build to the `Running` state. For example, this build might be in the `console` pool, which has `depth=1`, meaning that pool can only run one build step at a time; the build might remain `Queued` if we're already running a build in that pool.
And similarly, if any running builds complete, we move them to the `Done` state and repeat the same logic as above to mark downstream builds `Ready`.
There is more to this. For example, there's a `Failed` state, used with the `-k` flag, which lets us keep running even when some build steps fail. But the above is most of the picture.
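The state flow described above can be sketched roughly as follows. The names here (`BuildState`, `mark_done`, the adjacency vectors) are illustrative, not n2's actual types:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum BuildState {
    Unknown,
    Want,
    Ready,
    Queued,
    Running,
    Done,
    Failed,
}

struct Graph {
    states: Vec<BuildState>,     // current state, indexed by build id
    inputs: Vec<Vec<usize>>,     // for each build, the builds it depends on
    downstream: Vec<Vec<usize>>, // for each build, the builds that depend on it
}

impl Graph {
    // When a build finishes, check whether any downstream build now has
    // all of its inputs Done, and if so promote it to Ready.
    fn mark_done(&mut self, id: usize) -> Vec<usize> {
        self.states[id] = BuildState::Done;
        let mut newly_ready = Vec::new();
        for &d in &self.downstream[id] {
            if self.states[d] == BuildState::Want
                && self.inputs[d].iter().all(|&i| self.states[i] == BuildState::Done)
            {
                newly_ready.push(d);
            }
        }
        for &d in &newly_ready {
            self.states[d] = BuildState::Ready;
        }
        newly_ready
    }
}

fn main() {
    // Build 2 depends on builds 0 and 1; everything starts as Want.
    let mut g = Graph {
        states: vec![BuildState::Want; 3],
        inputs: vec![vec![], vec![], vec![0, 1]],
        downstream: vec![vec![2], vec![2], vec![]],
    };
    assert_eq!(g.mark_done(0), vec![]); // build 2 still waits on build 1
    assert_eq!(g.mark_done(1), vec![2]); // now build 2 becomes Ready
    assert_eq!(g.states[2], BuildState::Ready);
}
```

Note how `Ready` here is purely a function of the `states` table, matching the point above that readiness doesn't consult any on-disk file state.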
What happens if a file referenced in a build rule isn't present?
For the purpose of ordering: a build is "ready" when all the builds it depends on have been brought up to date, independent of whether the files are present or not.
Ninja was pretty lax about missing files. If one step generated a file and a later step consumed it, nothing verified whether that file was actually produced; Ninja just ran the steps in order. So to follow Ninja, n2 also allows any file required by a build step to be absent as long as it is declared to be generated by another build step.
In particular, here are some use cases:
But if any of those files are missing we call the step dirty without bothering to compute a hash. In this manner we never compute a hash involving any missing files.
Further, CMake generates Ninja files that claim a build step generates a depfile when it doesn't. Ninja treats this as an empty depfile, not an error. (See #80.)
Parsing `.ninja` files is on the critical path for n2, because parsing must complete before any other work can begin. Some properties of the n2 parser that break abstraction to this end:

- `$variable` expansions happen as they're encountered, so that we don't need to build up a parsed representation of strings and carry around variable environments.
- Paths are interned into a `Path` type, and path strings are converted to Paths as they are parsed. (In practice the `Path` type is `graph::FileId`, but the parsing code isn't aware of this type directly.)
- This (and the previous bullet) allows the parser to reuse a single `String` buffer when parsing paths, which is the bulk of what the parser does.

Ninja was intentionally "encoding agnostic", which is to say it treated input build files as any byte stream that is ASCII-compatible. In other words, any string of bytes found in a `build.ninja` is passed verbatim through printing to stdout and to the OS for path operations, which meant Ninja was compatible with both UTF-8 and other encodings. The expectation was that those other encodings occur on Linuxes, especially in East Asia, and this approach also means Ninja doesn't need any specific handling of even UTF-8.
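The buffer-reuse property from the parser bullets above can be sketched like this. The `Interner` type and its `usize` ids are hypothetical stand-ins for the real `graph::FileId` machinery:

```rust
use std::collections::HashMap;

// Maps path text to a small integer id, standing in for graph::FileId.
struct Interner {
    ids: HashMap<String, usize>,
}

impl Interner {
    fn intern(&mut self, path: &str) -> usize {
        if let Some(&id) = self.ids.get(path) {
            return id;
        }
        let id = self.ids.len();
        self.ids.insert(path.to_string(), id);
        id
    }
}

// The parser owns one String buffer, reads each path into it, and
// immediately converts it to an id, so the buffer can be reused for
// the next path instead of allocating per path.
fn parse_paths(input: &str, interner: &mut Interner) -> Vec<usize> {
    let mut buf = String::new(); // single reused buffer
    let mut out = Vec::new();
    for token in input.split_whitespace() {
        buf.clear();
        buf.push_str(token); // the real parser also handles $-escapes here
        out.push(interner.intern(&buf));
    }
    out
}

fn main() {
    let mut interner = Interner { ids: HashMap::new() };
    let ids = parse_paths("foo.o bar.o foo.o", &mut interner);
    assert_eq!(ids, vec![0, 1, 0]); // repeated paths map to the same id
}
```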
It looks like, since my time, Ninja on Windows has changed its input files to require UTF-8. As you can see from the comments on that bug, this area is unfortunately pretty subtle.
Windows is particularly fiddly not only because its native path representation is UTF-16 -- which is incompatible with the original byte-stream assumption made by Ninja and requires conversions -- but also because Ninja needs to parse the `/showIncludes` output from the MSVC compiler, which is localized. See the `msvc_deps_prefix` variable in the Ninja docs on deps handling; there have been lots of bug reports over the years from people whose Chinese output fails to parse right due to Windows code page mess.
In any case, n2 doesn't support any of this for now, and instead just follows Ninja in treating paths as bytes. (n2 doesn't support `/showIncludes` or MSVC at all yet.)
It's possibly a better design to require input files to always be UTF-8, though I think I'd want to better understand the `/showIncludes` situation. (The above Ninja bug observed this was a breaking change in Ninja, too.) Possibly we could mandate "input files are UTF-8, and if you need something other than UTF-8 in your `msvc_deps_prefix` it's on you to escape the exact byte sequence you desire". However, I'm not clear on whether all possible Windows paths are representable as UTF-8 or if we'd need to go down the WTF-8 route for full compatibility.
While building, we have a bunch of `Build` objects that represent individual build steps that go through a series of states. To represent these well I went through a few patterns and eventually came up with a design I'm pretty happy with.

First, for each `Build` we store its current state. This lets us quickly answer questions like "is the build id X ready or not?" (You could imagine storing this directly in the `Build` or in a side HashMap from id to state, but that's an implementation detail.) We use this for things like tracking whether we've already visited a given `Build` when doing a traversal of the graph while loading. This also has the benefit of ensuring a given `Build` is always in exactly one known state.
Second, we store data structures on the side for states where we want quicker views onto that state. The idea is that, depending on the particular needs of a given state, we can model those needs specially. For example, we need to be able to grab the next `Ready` build to work on, so there's a `VecDeque` holding those, while builds that go into the `Queued` state queue into separate run pools, and builds that are `Running` are just tracked with an integer counter on the run pool.
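A rough sketch of these per-state side structures, with hypothetical names rather than n2's actual fields:

```rust
use std::collections::VecDeque;

struct Pool {
    depth: usize,            // max concurrent builds (console pool: 1)
    queued: VecDeque<usize>, // Queued builds wait here, per pool
    running: usize,          // Running builds are just a counter
}

struct BuildStates {
    ready: VecDeque<usize>, // pop_front to get the next Ready build
    pools: Vec<Pool>,
}

impl Pool {
    // Start a queued build only if the pool has spare depth.
    fn try_start(&mut self) -> Option<usize> {
        if self.running < self.depth {
            if let Some(id) = self.queued.pop_front() {
                self.running += 1;
                return Some(id);
            }
        }
        None
    }
}

fn main() {
    let mut s = BuildStates {
        ready: VecDeque::from(vec![0, 1]),
        pools: vec![Pool { depth: 1, queued: VecDeque::from(vec![5, 6]), running: 0 }],
    };
    assert_eq!(s.ready.pop_front(), Some(0));    // next Ready build to check
    assert_eq!(s.pools[0].try_start(), Some(5)); // pool has spare depth
    assert_eq!(s.pools[0].try_start(), None);    // depth=1, so 6 stays Queued
}
```

The point of the design is that each state gets exactly the bookkeeping its consumers need and no more.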
Ninja (and n2) use `posix_spawn` to spawn subprocesses (on non-Windows). I saw a blog post recently that called `posix_spawn` something like a bloated wrapper over simpler primitives, but another view is that `posix_spawn` embodies the platform libc author's best knowledge about how to safely run processes.
In particular, I was surprised to learn that Rust's built-in process spawning library leaks file descriptors. (See also the upstream Rust bug, which says Cargo also ran into this.)
Ninja (and n2) model the commands they execute as strings, in that the build file language has you construct a `command = ...` string. However, on Unixes commands are fundamentally arrays of arguments. To convert a string to an array there must be some sort of parsing, and for this purpose we use `/bin/sh` to execute subcommands.
`/bin/sh` is well-specified by POSIX and has the semantics everyone expects for how commands work, from quoting to `$FOO`-style environment expansion to `a && b` job control, pipelines, and so on. And it's also used everywhere on Unixes, so it is fast.
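Concretely, the spawn amounts to something like the following sketch. (This uses Rust's `std::process` for brevity; as noted above, the real code goes through `posix_spawn`.)

```rust
use std::process::Command;

// Hand the whole `command = ...` string to /bin/sh, which supplies the
// quoting, $VAR expansion, && chaining, pipelines, and so on.
fn spawn_via_shell(cmdline: &str) -> std::io::Result<std::process::Output> {
    Command::new("/bin/sh").arg("-c").arg(cmdline).output()
}

fn main() {
    let out = spawn_via_shell("echo hello && echo world").unwrap();
    assert!(out.status.success());
    assert_eq!(String::from_utf8_lossy(&out.stdout), "hello\nworld\n");
}
```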
However, it has some downsides, particularly in that it means the person generating the `.ninja` file has to know about these rules as well. For example, consider a build rule like this:
```
build foo$ bar baz: ...
  command = pummel $out
```
Per the Ninja syntax rules (which are not the shell rules), that build has two outputs, the files `foo bar` and `baz`. The Ninja variable `$out` then expands to the string `foo bar baz`, and once the shell parses the command line the `pummel` command receives three arguments in argv: `foo`, `bar`, and `baz`, which is not what you wanted.
For these kinds of problems there are further shell tricks around handling spaces, like how the string `"$@"` is expanded. In Ninja's case, we have the escape hatch that the `.ninja` file author can just provide the extra text explicitly, as in this bug report and workaround. It's pretty unsatisfying though.
In principle, it might be a better design for the Ninja language to instead have an array type that would allow us to construct an argv array directly. This would avoid shell-related quoting issues and save a `/bin/sh` invocation on each process spawn. However, this is a pretty large departure from how Ninja currently works.
Additionally, on Windows the platform behavior is reversed: command lines are fundamentally strings, and it's up to each executable to interpret those strings as it wants. (Does this seem like a recipe for bugs? Yes it does. See this blog post's "Command lines" section for more on this.) So even if Ninja did work with arrays, we'd then need some sort of stringification for Windows.
On the subject of Windows: Ninja (and n2) spawn processes directly without an intervening shell. This has the consequence that commands like `my-compiler && echo hello` do not do what you'd expect -- the opposite of the Ninja behavior on Unixes. The `build.ninja` author instead has to write out `cmd /c my-command` themselves if they want the shell involved. We did this only because (in my recollection) the overhead of spawning `cmd` on Windows was noticeable. And further, `cmd` arbitrarily limits its arguments to 8KB.
PS: in writing this section I noticed that `cmd` has terrifying quoting rules.
Ninja syntactic structures (`build`, `rule`) are at some level just lists of key-value bindings that ultimately combine to set properties on individual build steps, such as the `command = ...` command line.
Additionally, bindings can be referenced by other bindings via the `$foo` or `${foo}` syntax. This means variable lookup traverses a hierarchy of scopes. The intent was that this was simple enough for the behavior to be straightforward, which in retrospect really just means "underspecified". (Forgive me! I hacked Ninja together in a few weekends and made the all-too-easy mistake of "this is so simple I don't really need to think it through".)
First, here's a high level description that conveys the idea but omits details.
Consider a build file like the following:
```
var = $A
rule r
  command = $B
build output-$C: r
  var2 = $D
```
The `build` block stamps out a build step using the rule `r`, and the properties of that build step are found by looking up attributes like `command`.
When evaluating the expressions marked `$A`/`$B`/`$C`/`$D` above, these are the scopes to consider, in order of nesting:

1. The top-level file scope, which contains `var` above.
2. The `build` block's bindings, which contain `var2` above.
3. The built-in `$in` etc. variables.
4. The `rule` block's bindings, which contain `command` above.

References found in each may refer to the scopes above it in the list. For example, at the time the `command = ...` is evaluated, the in-scope variables include `$in`, `$var2`, and `$var`.
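The lookup order can be sketched as a chain of scopes, each falling back to its parent. The types here are hypothetical, not n2's implementation:

```rust
use std::collections::HashMap;

struct Scope<'a> {
    vars: HashMap<&'a str, &'a str>,
    parent: Option<&'a Scope<'a>>,
}

impl<'a> Scope<'a> {
    // Look up a variable here, falling back to the enclosing scope.
    fn lookup(&self, name: &str) -> Option<&'a str> {
        self.vars
            .get(name)
            .copied()
            .or_else(|| self.parent.and_then(|p| p.lookup(name)))
    }
}

fn main() {
    // The file scope defines var; the build block defines var2 and the
    // built-in $in; an expression evaluated there sees all of them.
    let file = Scope { vars: HashMap::from([("var", "A")]), parent: None };
    let build = Scope {
        vars: HashMap::from([("var2", "D"), ("in", "input.c")]),
        parent: Some(&file),
    };
    assert_eq!(build.lookup("in"), Some("input.c"));
    assert_eq!(build.lookup("var2"), Some("D"));
    assert_eq!(build.lookup("var"), Some("A")); // falls back to file scope
    assert_eq!(build.lookup("missing"), None);
}
```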
Unfortunately, Hyrum's Law means that Ninja files in the wild depend on whatever the Ninja implementation allows, which is more complex than the above. I don't think it's really worth fleshing out all the idiosyncrasies of Ninja's implementation as long as existing builds work, but some details do matter.
In particular, Ninja has specific behaviors around variable references found within the same scope. Consider:
```
var = 1
var = ${var}2

rule ...
  command = echo $depfile
  depfile = abc
```
In Ninja, the value of `var` is `12`: assignment proceeds top-down. But within a `rule` block, the variable lookup of `$depfile` instead refers forward to `abc`, which means there is a possibility of circular references, along with logic that attempts to detect and warn in those cases(!).
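The top-down behavior for top-level bindings can be sketched with a tiny eager expander. Only the `${name}` form is handled here; the real expansion is richer, and the `expand` helper is hypothetical:

```rust
use std::collections::HashMap;

// Expand ${name} references against the bindings accumulated so far.
fn expand(expr: &str, env: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = expr;
    while let Some(i) = rest.find("${") {
        out.push_str(&rest[..i]);
        let after = &rest[i + 2..];
        let end = after.find('}').expect("unterminated ${");
        out.push_str(env.get(&after[..end]).map(String::as_str).unwrap_or(""));
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let mut env = HashMap::new();
    // var = 1
    let v = expand("1", &env);
    env.insert("var".to_string(), v);
    // var = ${var}2  -- evaluated eagerly, so it sees the previous binding
    let v = expand("${var}2", &env);
    env.insert("var".to_string(), v);
    assert_eq!(env["var"], "12");
}
```

Because each assignment is expanded immediately against the environment so far, no cycle is possible at the top level; the forward references inside `rule` blocks are what reintroduce that possibility.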