# The Cargo Vet Algorithm

The heart of `vet` is the "[resolver](https://github.com/mozilla/cargo-vet/blob/main/src/resolver.rs)" which takes in your build graph and your supply_chain dir, and determines whether `vet check` should pass.

If `check` fails, it tries to determine the reason for that failure (which as we'll see is a non-trivial question). If you request a `suggest` it will then try to suggest "good" audits that will definitely satisfy `check` (which is again non-trivial).

These results are a basic building block that most other commands will defer to:

* `vet check` (the command run with bare `vet`) is just this operation
* `vet suggest` is this operation with all suggestable exemptions deleted
* `vet certify` fills in any unspecified information using this operation
* `vet regenerate` generally uses this operation to know what to do

For the sake of clarity, this chapter will also include some discussion of "initialization" which gathers up the input state that the resolver needs.

## Initialization Steps

This phase is generally just a bunch of loading, parsing, and validating. Different commands
may vary slightly in how they do these steps, as they may implicitly be --locked or --frozen,
or want to query hypothetical states.

1. Acquire the build graph ([cargo metadata][] via the [cargo_metadata][] crate)
2. Acquire the store (`supply_chain`) (load, parse, validate)
3. Update the imports (fetch, parse, validate)
4. Check `audit-as-crates-io` (check against local cargo registry)
## Resolve Steps

These are the logical steps of the resolver, although they are more interleaved than this
initial summary implies:

1. Build data structures
    1. Construct the `DepGraph`
    2. Construct the `CriteriaMapper`
2. Determine the required criteria for each package
    1. Apply requirements for dev-dependencies
    2. Propagate policy requirements from roots out to leaves
3. Resolve the validated criteria for each third party (crates.io) package
    1. Construct the `AuditGraph`s for each package (and check violations)
    2. Search for paths in the audit graph validating each requirement
4. Check if each crate validates for the required criteria
    1. Record caveats which were required in order to satisfy these criteria
5. Suggest audits to fix leaf failures (the dance of a thousand diffs)

Here in all of its glory is the entirety of the resolver algorithm today in
abbreviated pseudo-rust. Each of these steps will be elaborated on in the
subsequent sections.
```rust ,ignore
// Step 1a: Build the DepGraph
let graph = DepGraph::new(..);

// Step 1b: Build the CriteriaMapper
let mapper = CriteriaMapper::new(..);

// Step 2: Determine the required criteria for each package
let requirements = resolve_requirements(..);

// Step 3: Resolve the validated criteria for each third-party package
for package in &graph.nodes {
    if !package.is_third_party {
        continue;
    }

    // Step 3a: Construct the AuditGraph for each package
    let audit_graph = AuditGraph::build(..);

    // Step 3b: Search for paths in the audit graph validating each requirement
    let search_results = all_criteria.map(|criteria| audit_graph.search(criteria, ..));

    // Step 4: Check if the crate validates for the required criteria
    for criteria in requirements[package] {
        match &search_results[criteria] {
            ..
        }
    }
}

// If there were any conflicts with violation entries, bail!
if !violations.is_empty() {
    return ResolveReport { conclusion: Conclusion::FailForViolationConflict(..), .. };
}

// If there were no failures, we're done!
if failures.is_empty() {
    return ResolveReport { conclusion: Conclusion::Success(..), .. };
}

// Step 5: Suggest time! Compute the simplest audits to fix the failures!
let suggest = compute_suggest(..);
return ResolveReport { conclusion: Conclusion::FailForVet(..), .. };
```
As we determine the required criteria in a separate pass, all analysis after
that point can be performed in any order. Requirements analysis starts on root
nodes and is propagated downwards to leaf nodes.
# Step 1a: The DepGraph (Processing Cargo Metadata)

All of our analysis derives from the output of [cargo metadata][] and our
interpretation of that, so it's worth discussing how we use it, and what we
believe to be true of its output.

Our interpretation of the metadata is the DepGraph. You can dump the DepGraph with
`cargo vet dump-graph`. Most commands take a `--filter-graph` argument which will
force us to discard certain parts of the DepGraph before performing the operation
of the command. This can be useful for debugging issues, but we recommend only doing
this while `--locked` to avoid corrupting your store.
By default we run `cargo metadata --locked --all-features`. If you pass `--locked` to vet,
we will instead pass `--frozen` to `cargo metadata`. `--all-features` can be negated
by passing `--no-all-features` to vet. We otherwise expose the usual feature flags of
cargo directly.
The reason we pass `--all-features` is that we want the "maximal" build graph, which
all "real" builds are simply a subset of. Cargo metadata in general provides this, but
will omit optional dependencies that are locked behind disabled features. By enabling them all,
we should get every possible dependency for every possible feature and platform.
By validating that the maximal build graph is vetted, all possible builds should in turn
be vetted, because they are simply subsets of that graph.
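
For illustration, acquiring this maximal graph through the [cargo_metadata][] crate looks roughly like the following sketch (not the exact code vet runs; the real invocation threads through more flags and error handling):

```rust
use cargo_metadata::{CargoOpt, Metadata, MetadataCommand};

// A sketch of gathering the "maximal" build graph: --all-features ensures
// optional dependencies hidden behind disabled features still show up.
fn load_maximal_metadata() -> cargo_metadata::Result<Metadata> {
    MetadataCommand::new()
        .features(CargoOpt::AllFeatures)
        // Roughly corresponds to the --locked/--frozen behavior described above.
        .other_options(vec!["--locked".to_string()])
        .exec()
}
```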
Cargo metadata produces the build graph in a kind of awkward way where some information
for the packages is in `"packages"` and some information is in `"resolve"`, and we need
to manually compute lots of facts like "roots", "only for tests", and "[topological sort][]"
(metadata has a notion of roots, but it's not what you think, and mostly reflects an
internal concept of cargo that isn't useful to us).

If we knew about it at the time we might have used [guppy][] to handle interpreting
cargo metadata's results. As it stands, we've hand-rolled all that stuff.
Cargo metadata largely uses [PackageId][]s as primary keys for identifying a package
in your build, and we largely agree with that internally, but some human-facing interfaces
like audits also treat (PackageName, [Version][]) as a valid key. This is a true
statement on crates.io itself, but may not hold when you include unpublished packages,
patches/renames(?), or third party registries. We don't really have a solid disambiguation
strategy at the moment; we just assume it doesn't happen and don't worry about it.

The resolver primarily uses a PackageIdx as a primary key for packages, which is an interned PackageId.
The DepGraph holds this interner.
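
Conceptually the interner is tiny. A minimal sketch (with made-up names, not the actual DepGraph internals) looks like:

```rust
use std::collections::HashMap;

/// A small index type standing in for the resolver's PackageIdx.
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct PackageIdx(usize);

/// Sketch of a PackageId interner: map each id to a dense index once,
/// then use the cheap index everywhere else.
#[derive(Default)]
struct PackageInterner {
    ids: Vec<String>,                     // PackageId strings, indexed by PackageIdx
    lookup: HashMap<String, PackageIdx>,  // reverse lookup
}

impl PackageInterner {
    fn intern(&mut self, id: &str) -> PackageIdx {
        if let Some(&idx) = self.lookup.get(id) {
            return idx;
        }
        let idx = PackageIdx(self.ids.len());
        self.ids.push(id.to_string());
        self.lookup.insert(id.to_string(), idx);
        idx
    }

    fn resolve(&self, idx: PackageIdx) -> &str {
        &self.ids[idx.0]
    }
}
```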
## Dealing With Cycles From Tests

The resolver assumes the maximal graph is a [DAG][], which is an almost true statement
that we can make true with a minor desugaring of the graph. There is only one situation
where the cargo build graph is not a DAG: the tests for a crate. This can happen very
easily, and is kind of natural, but also very evil when you first learn about it.

As a concrete example, there is kind of a conceptual cycle between [serde](https://github.com/serde-rs/serde/blob/master/serde/Cargo.toml) and [serde_derive](https://github.com/serde-rs/serde/blob/master/serde_derive/Cargo.toml). However serde_derive is a standalone crate, and serde (optionally)
pulls in serde_derive as a dependency... unless you're testing serde_derive, and then serde_derive
quite reasonably depends on serde to test its output, creating a cyclic dependency on itself!

The way to resolve this monstrosity is to realize that the *tests* for serde_derive are actually
a different package from serde_derive, which we call serde_derive_dev (because cargo calls test
edges "dev dependencies"). So although the graph reported by cargo_metadata looks like a cycle:
```
serde <------+
  |          |
  |          |
  +--> serde_derive
```

In actuality, serde_derive_dev breaks the cycle and creates a nice clean DAG:

```
  +-- serde_derive_dev --+
  |          |           |
  v          |           v
serde        |      test_only_dep
  |          |           |
  |          v          ...
  +--> serde_derive
```
There is a subtle distinction to be made here for packages *only* used for tests:
these wouldn't be part of the build graph without dev-dependencies (dev edges) but
they are still "real" nodes, and all of their dependencies are "real" and still
must form a proper DAG. The only packages which can have cycle-causing dev-dependencies,
and therefore require a desugaring to produce "fake" nodes, are *workspace members*.
These are the packages that will be tested if you run `cargo test --workspace`.

Actually doing this desugaring is really messy, because a lot of things about the "real"
node are still true about the "fake" node, and we generally want to talk about the "real"
node and the "fake" node as if they were one thing. So we actually just analyze the build graph
in two steps. To understand how this works, we need to first look at how DAGs are analyzed.
Any analysis on a [DAG][] generally starts with a [topological sort][], which is just a fancy way of saying you do depth-first-search ([DFS][]) on every root and only use a node after you've searched all its children (this is the post-order, for graph people). Note that each iteration of DFS reuses the
"visited" set from the previous iterations, because we only want to visit each node once.

Also note that knowing the roots is simply an optimization: you can just run DFS on every node and you will get a valid topological order. We run it for all the workspace members, which includes all of
the roots, but none of the test-only packages, which will be useful for identifying test-only packages
when we get to our desugaring. (You may have workspace members which in fact are only for testing,
but as far as `vet` is concerned those are proper packages in their own right -- those packages are
however good candidates for a `safe-to-run` policy override.)
The key property of a DAG is that if you visit every node in a topological
order, then all the transitive dependencies of a node will be visited before it.
You can use this fact to compute any property of a node which recursively
depends on the properties of its dependencies. More plainly, you can just have a
for-loop that computes the properties of each node, and blindly assume that any
query about your dependencies will have its results already computed. Nice!
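
As a toy example of that pattern (hypothetical names; the real resolver computes richer per-node results this way), here is how "the set of transitive dependencies" falls out of a single loop over a topological order:

```rust
use std::collections::HashSet;

/// Toy node: `deps` are indices of direct dependencies.
struct Node {
    deps: Vec<usize>,
}

/// `order` is a topological sort (dependencies before dependents), so by the
/// time we visit a node, every dependency's result has already been computed.
fn transitive_deps(nodes: &[Node], order: &[usize]) -> Vec<HashSet<usize>> {
    let mut results: Vec<HashSet<usize>> = vec![HashSet::new(); nodes.len()];
    for &idx in order {
        let mut all = HashSet::new();
        for &dep in &nodes[idx].deps {
            all.insert(dep);
            // Safe to read: `dep` was visited earlier in the topological order.
            all.extend(results[dep].iter().copied());
        }
        results[idx] = all;
    }
    results
}
```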
In our algorithm, however, we actually visit in reverse-topological order, so
that we know all reverse-dependencies of a node will be visited before it. This
is because criteria requirements are inherited from reverse-dependencies (or,
equivalently, pushed out from a crate to its dependencies).
With that established, here is the *actual* approach we use to emulate the "fake" node desugaring:

1. analyze the build graph without dev deps (edges), which is definitely a DAG
2. add back the dev deps and reprocess all the nodes as if they were the "fake" node

The key insight to this approach is that the implicit dev nodes are all roots -- nothing
depends on them. As a result, adding these nodes can't change which packages the "real"
nodes depend on, and any analysis done on them is valid without the dev edges!
When doing the topological sort, because we only run DFS from workspace members,
the result of this is that we will visit all the nodes that are part of a "real" build
in the first pass, and then the test-only packages in the second pass. This makes computing
"test only" packages a convenient side-effect of the topological sort. Hopefully it's clear
to you that the resulting ordering functions as a topological sort as long as our recursive
analyses take the form of two loops like so:
```
for node in topological_sort:
    analysis_that_DOESNT_query_dev_dependencies(node)
for node in topological_sort:
    analysis_that_CAN_query_dev_dependencies(node)
```
The second loop is essentially handling all the "fake" dev nodes.

Note that when we run this in a reversed manner to ensure that
reverse-dependencies have been checked before a crate is visited, we need to do
the dev-dependency analysis first, as the dev-dependency "fake" nodes are
effectively appended to the topological sort.
## The DepGraph's Contents

The hardest task of the DepGraph is computing the topological sort of the packages as
described in the previous section, but it also computes the following facts for each package
(node):

* [PackageId][] (primary key)
* [Version][]
* name
* is_third_party (is_crates_io)
* is_root
* is_workspace_member
* is_dev_only
* normal_deps
* build_deps
* dev_deps
* reverse_deps
Whether a package is third party is deferred to [cargo_metadata][]'s [is_crates_io][] method,
but is overridable by `audit-as-crates-io` in config.toml. This completely changes how the
resolver handles validating criteria for that package. Packages which aren't third party
are referred to as "first party".

Roots are simply packages which have no reverse-deps, which matters because those will
implicitly be required to pass the default root policy (safe-to-deploy) if no other policy
is specified for them.
Workspace members must pass a dev-policy check, which is the only place where
we query dev-dependencies (in the fabled "second pass" from the previous section).

Dev-only packages are only used in tests, and therefore will only be queried in
dev-policy checks (and so by default only need to be safe-to-run).
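
Put together, the per-package record looks roughly like this sketch (field names and types are approximations, not the exact struct in the source):

```rust
use cargo_metadata::PackageId;
use semver::Version;

/// Index into the DepGraph's interned package list (see Step 1a).
type PackageIdx = usize;

/// An approximation of the per-package facts the DepGraph records.
struct PackageNode {
    package_id: PackageId,      // primary key from cargo metadata
    version: Version,
    name: String,
    is_third_party: bool,       // is_crates_io, possibly overridden by audit-as-crates-io
    is_root: bool,              // no reverse-deps: gets the default root policy
    is_workspace_member: bool,  // subject to dev-policy checks
    is_dev_only: bool,          // only ever used by tests
    normal_deps: Vec<PackageIdx>,
    build_deps: Vec<PackageIdx>,
    dev_deps: Vec<PackageIdx>,
    reverse_deps: Vec<PackageIdx>,
}
```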
# Step 1b: The CriteriaMapper

The CriteriaMapper handles the process of converting between criteria names and
CriteriaIndices. It's basically an interner, but made more complicated by the existence
of builtins, imports, and "implies" relationships.

The resolver primarily operates on CriteriaSets, which are sets of CriteriaIndices.
The purpose of this is to try to handle all the subtleties of criteria in one place
to avoid bugs, and to make everything more efficient.

Most of the resolver's operations are things like "union these criteria sets" or
"check if this criteria set is a superset of the required one".

There is currently an artificial maximum limit of 64 criteria for you and all
your imports to make CriteriaSets efficient (they're just a u64 internally).
The code is designed to allow this limit to be easily raised if anyone ever hits
it (either with a u128 or a proper BitSet).
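
A minimal sketch of that representation (assuming the u64 backing described above; the real type has more helpers):

```rust
/// A set of criteria indices, packed into a u64 (hence the current
/// 64-criteria limit mentioned above).
#[derive(Copy, Clone, Default, PartialEq, Eq)]
struct CriteriaSet(u64);

impl CriteriaSet {
    fn add(&mut self, criteria_idx: usize) {
        self.0 |= 1 << criteria_idx;
    }
    fn union_with(&mut self, other: &CriteriaSet) {
        self.0 |= other.0;
    }
    /// "Does this set satisfy everything `required` asks for?"
    fn contains(&self, required: &CriteriaSet) -> bool {
        self.0 & required.0 == required.0
    }
}
```

With this representation, the union and superset operations the resolver performs constantly are single integer instructions.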
Imported criteria are pre-mapped onto local criteria while acquiring the store,
by using a CriteriaMapper in the imported namespace to determine implied
criteria, and then applying the mappings specified in the `criteria-map` to
determine the corresponding local criteria. This avoids worrying about imported
namespaces when running the actual resolver, and helps avoid potential issues
with large numbers of criteria.
The biggest complexity of this process is handling "implies". This makes a
criteria like safe-to-deploy *actually* mean safe-to-deploy AND safe-to-run in most
situations. The CriteriaMapper will precompute the [transitive closure][] of
implies relationships for each criteria as a CriteriaSet. When mapping the name
of a criteria to CriteriaIndices, this CriteriaSet is the thing returned.

When mapping a criteria set to a list of criteria names, we will elide implied criteria
(so a `["safe-to-deploy", "safe-to-run"]` will just be `["safe-to-deploy"]`).
## Computing The Transitive Closure of Criteria

The [transitive closure][] of a criteria is the CriteriaSet that would result if you
add the criteria itself, every criteria it implies, every criteria THEY imply,
and so on. This resulting CriteriaSet is effectively the "true" value of a criteria.

We do this by constructing a directed "criteria graph" where an "implies" is an edge.
The transitive closure for each criteria can then be computed by running depth-first-search
([DFS][]) on that node, and adding every reachable node to the CriteriaSet.
That's it!
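
In sketch form (hypothetical shape, using plain u64 bitsets as in the previous section):

```rust
/// `implies[i]` lists the criteria directly implied by criteria `i`
/// (all indices < 64, per the current criteria limit).
/// Returns, for each criteria, the bitset of every criteria reachable
/// via "implies" edges, including itself.
fn transitive_closures(implies: &[Vec<usize>]) -> Vec<u64> {
    (0..implies.len())
        .map(|start| {
            let mut closure = 0u64;
            let mut stack = vec![start];
            while let Some(criteria) = stack.pop() {
                if closure & (1 << criteria) != 0 {
                    continue; // already visited
                }
                closure |= 1 << criteria;
                stack.extend(implies[criteria].iter().copied());
            }
            closure
        })
        .collect()
}
```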
Being able to precompute the transitive closure massively simplifies the resolver,
as it means we never have to re-evaluate the implies relationships when unioning
CriteriaSets, making potentially O(n<sup>3</sup>) operations into constant time ones,
where n is the number of criteria (the criteria graph can have O(n<sup>2</sup>) edges,
and a criteria set can have O(n) criteria, and we might have to look at every edge of
the graph for every criteria whenever we add one).
The *existence* of the transitive closure is however not a fundamental truth. It
exists because we have artificially limited what import maps and implies relationships
are allowed to do. In particular, if you ever allowed an implies relationship that requires
*two different criteria* to imply another, the transitive closure would not be
a useful concept, and we'd be forced to re-check every implies rule whenever
a criteria got added to a criteria set (which is happening constantly in the resolver).
[See this issue for a detailed example demonstrating this problem](https://github.com/mozilla/cargo-vet/issues/240).
# Step 2: Determine the required criteria for each package

In general, every package requires that all dependencies satisfy the same
criteria which were required for the original package. This is handled by
starting at the root crates, and propagating the required `CriteriaSet` outwards
towards the leaves. In some cases, the `policy` table will specify alternative
criteria to place as a requirement on dependencies, which will be used instead
of normal propagation.
In order to avoid the cyclic nature of dev-deps, these edges are handled
first. As all dependencies of dev-dependencies are normal dependencies, we can
rely on the normal non-cyclic requirement propagation after the first edge, so
we only need to apply the requirements one level deep in this first phase. By
default, this requirement is `safe-to-run`, though it can be customized through
the `policy`.

Afterwards, we start at the root crates in the graph and work outwards, checking
if we need to apply policy requirements, and then propagating requirements to
dependencies. This results in every crate having a corresponding `CriteriaSet`
of the criteria required for the audit.
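
A simplified sketch of the propagation (criteria sets as plain u64 bitsets; the policy handling and dev-dependency seeding are heavily abbreviated here):

```rust
/// Simplified requirement propagation: visit packages so that every
/// reverse-dependency is processed before the package itself, then push
/// each package's requirement onto its dependencies.
fn propagate_requirements(
    reverse_topo: &[usize],          // reverse-dependencies before dependencies
    deps: &[Vec<usize>],             // direct (normal + build) dependencies
    policy_override: &[Option<u64>], // alternative criteria from the policy table, if any
    mut requirements: Vec<u64>,      // seeded beforehand with root and dev policies
) -> Vec<u64> {
    for &pkg in reverse_topo {
        // What this package's dependencies must satisfy: either an explicit
        // policy, or whatever was required of the package itself.
        let required_of_deps = policy_override[pkg].unwrap_or(requirements[pkg]);
        for &dep in &deps[pkg] {
            requirements[dep] |= required_of_deps;
        }
    }
    requirements
}
```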
# Step 3a: The AuditGraph

The AuditGraph is the graph of all audits for a particular package *name*.
The nodes of the graph are [Version][]s and the edges are delta audits (e.g. `0.1.0 -> 0.2.0`).
Each edge has a list of criteria it claims to certify, and dependency criteria that the
dependencies of this package must satisfy for the edge to be considered "valid" (see
the next section for details).

There is an implicit Root Version which represents an empty package, meaning that throughout
much of the audit graph, versions are represented as `Option<Version>`.
When trying to validate whether a particular version of a package is audited, we also add
a Target Version to the graph (if it doesn't exist already).

Full audits are desugared to delta audits from the Root Version (so an audit for `0.2.0` would
be lowered to a delta audit from `Root -> 0.2.0`).
Exemptions are desugared to full audits (and therefore deltas) with a special
DeltaEdgeOrigin indicating their origin. This is used to deprioritize the edges
so that we can more easily detect exemptions that aren't needed anymore.

Imported audits are lowered in the exact same way as local criteria, but with
a special DeltaEdgeOrigin to indicate their origin, to allow us to deprioritize
imported audits, and determine exactly which audits are needed.

A special DeltaEdgeOrigin is also used for imported wildcard criteria,
indicating both which wildcard audit is responsible, as well as which publisher
information is being used.
With all of this established, the problem of determining whether a package is audited for a given
criteria can be reduced to determining if there *exists* a path from the Root Version to the
Target Version along edges that certify that criteria. Suggesting an audit similarly becomes
finding the "best" edge to add to make the Root and Target connected for the desired criteria.
## Checking Violations

During AuditGraph construction violations are also checked. Violations have a [VersionReq][] and
a list of violated criteria. They claim that, for all versions covered by the VersionReq, you believe
that the listed criteria are explicitly violated. An error is produced if any edge is
added to the AuditGraph where *either* endpoint matches the VersionReq and *any* criteria
it claims to be an audit for is listed by the violation.
This is an extremely complicated statement to parse, so let's look at some examples:

```
violation: safe-to-deploy, audit: safe-to-deploy -- ERROR!
violation: safe-to-deploy, audit: safe-to-run    -- OK!
violation: safe-to-run,    audit: safe-to-deploy -- ERROR!
violation: [a, b],         audit: [a, c]         -- ERROR!
```
One very notable implication of this is that a violation for `["safe-to-run", "safe-to-deploy"]`
is actually equivalent to `["safe-to-run"]`, not `["safe-to-deploy"]`! This means that the normal
way of handling things, turning the violation's criteria into one CriteriaSet and checking
if `audit.contains(violation)`, is incorrect!

We must instead do this check for each individual item in the violation:

```rust
let has_violation = violation.iter().any(|item| audit.contains(item));
```
It may seem a bit strange to produce an error if *any* audit is in any way contradicted
by *any* violation. Is that necessary? Is that sufficient?

It's definitely sufficient: it's impossible to validate a version without having an audit edge
with an end-point in that version.

I would argue that it's also *necessary*: the existence of any audit (or exemption)
that is directly contradicted by a violation is essentially an integrity error on the
claims that we are working with. Even if you don't even use the audit for anything
anymore, people who are peering with you and importing your audits might be, so you
should do something about those audits as soon as you find out they might be wrong!

There is currently no mechanism for mechanically dealing with such an integrity error,
even if the audit or violation comes from a foreign import. Such a situation is serious
enough that it merits direct discussion between humans. That said, if this becomes
enough of a problem we may eventually add such a feature.
# Step 3b: Searching for paths in the `AuditGraph`

A lot of the heavy lifting for this task is in Step 3a (AuditGraph).

Trying to validate all criteria at once is slightly brain-melty (because
different criteria may be validated by different paths), so as a simplifying
step we validate each criteria individually (so everything I'm about to
describe happens in a for loop).

If all we care about is finding out if a package has some criteria, then all
we need to do is run depth-first-search ([DFS][]) from the Root Node and see if it reaches
the Target Node, with the constraint that we'll only follow edges that are
valid (based on the already validated criteria of our dependencies).

If it does, we've validated the criteria for the Target Version. If it doesn't,
then we haven't.
But things are much more complicated because we want to provide more feedback
about the state of the audits:

* Did this validation require an exemption? (Is it fully audited?)
* Did this validation even use any audits? (Is it at least partially audited?)
* Did this validation need any new imports? (Should we update imports.lock?)
* What nodes were reachable from the Root and reverse-reachable from the Target? (candidates for suggest)
This is accomplished by running the search off of a priority queue, rather than
using a stack, such that we only try to use the "best" edges first, and can
be certain that we don't try to use a "worse" edge until we've tried all of the
paths using better edges.
The best edge of all is a local audit. If we can find a path using only
those edges, then we're fully audited, we don't need any exemptions we
might have for this package (a lot of caveats to this, so we don't really
make that conclusion reliably), and the imports.lock doesn't need to be updated.

If we need to add back in exemptions to find a path, then the exemptions
were necessary to validate this criteria.

If we need to add back in new imports to find a path, then we need to update
imports.lock to cache necessary audits for --locked executions. (The fact
that this comes after exemptions means we may be slightly imprecise about
whether something is "fully audited" when updating imports, as subsequent
runs won't get this far. We think this is worth the upside of minimizing
imports.lock updates.)
If any of those succeed, then we return Ok(..), communicating that the
package validates this criteria, along with any caveats, back to the caller.

Otherwise, we'll return Err(..), and consider the current node to blame. If this
criteria is required, this package will require additional audits or exemptions
to successfully vet.
In doing this, we also compute the nodes that are reachable from the Root
Version and the nodes that are reverse-reachable from the Target Version.
The latter is computed by following all edges backwards, which is to say that
in Step 3a the AuditGraph also builds another directed graph with all the edges
reversed, and we rerun the algorithm with Root and Target swapped.
This information is useful because in the Err case we want to suggest a diff to
audit, and any diff from the Root Reachable nodes to the Target Reachable nodes
is sufficient.

All search results are stored in the ResolveResult for a node along with
validated criteria and other fun facts we found along the way. The
contents of the ResolveResult will be used by our reverse-dependencies
in steps 2 and 3.
It's worth noting here that delta audits can "go backwards" (i.e. `1.0.1 -> 1.0.0`),
and all of this code handles that perfectly fine without any special cases.
It *does* make it possible for there to be cycles in the AuditGraph, but
[DFS][] doesn't care about cycles at all since you keep track of nodes you've
visited to avoid revisits (slightly complicated by us iteratively introducing edges).
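
One way to picture the prioritized search described above (a hedged sketch, not the actual implementation): give each edge a "caveat level" based on its DeltaEdgeOrigin (local audit < exemption < fresh import) and expand nodes in order of the worst caveat needed to reach them, so that the first time the Target pops out we know exactly which caveats were required.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap, HashSet};

/// Caveat level of an edge's origin: 0 = local audit, 1 = exemption,
/// 2 = freshly imported audit (illustrative encoding, not the real one).
type Caveat = u8;

/// Returns the smallest "worst caveat" needed to reach `target` from the
/// Root node (index 0), or None if the target is unreachable at all.
fn search(edges: &HashMap<usize, Vec<(usize, Caveat)>>, target: usize) -> Option<Caveat> {
    let mut heap = BinaryHeap::new();
    let mut visited = HashSet::new();
    heap.push(Reverse((0u8, 0usize))); // (worst caveat so far, node), Root = 0
    while let Some(Reverse((caveat, node))) = heap.pop() {
        if node == target {
            return Some(caveat);
        }
        if !visited.insert(node) {
            continue;
        }
        for &(next, edge_caveat) in edges.get(&node).into_iter().flatten() {
            heap.push(Reverse((caveat.max(edge_caveat), next)));
        }
    }
    None
}
```

Because taking the max along a path is monotone, the first time the Target is popped its caveat level is as small as possible, which is the property that lets the resolver distinguish "fully audited" from "needed an exemption" from "needs an imports.lock update".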
# Step 4: Checking if each crate validates for the required criteria

This step is a fairly trivial combination of the results from Step 2 (computing
requirements) and Step 3 (resolving validated criteria) - for each package, we
check if the validated criteria is a superset of the requirements, and if it is
then we're successful, otherwise we're not.

We'll record which criteria failed so we can suggest better audits in the
errored case, and combine the caveats from successful runs in the success case
to get a combined result for each crate, rather than for each individual
criteria.
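
In bitset terms the per-package check is just this sketch:

```rust
/// Compare validated criteria against required criteria (both as u64 bitsets)
/// and report which required criteria were not validated.
fn check_package(validated: u64, required: u64) -> Result<(), u64> {
    let missing = required & !validated;
    if missing == 0 {
        Ok(()) // the real resolver also combines the caveats of the successful paths
    } else {
        Err(missing) // the failed criteria, handed to Step 5 to suggest audits
    }
}
```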
# Step 5: Suggesting Audits (Death By A Thousand Diffs)

This step takes the failed packages from Step 4 and recommends audits that will
fix them. In Step 3b we compute the Root Reachable Nodes and the Target
Reachable Nodes for a disconnected package. In this phase we use those as
candidates and try to find the best possible diff audit.

More specifically, we use the intersection of all the Root Reachable Nodes
for every criteria this package failed (ditto for Target Reachable).
By using the intersection, any diff we recommend from one set to the other
is guaranteed to cover all required criteria, allowing us to suggest a single
diff to fix everything. Since the Root and Target are always in their respective
sets, we are guaranteed that the intersections are non-empty.
So how do we pick the *best* diff? Well, we straight up download every version of the package that
we have audits for and diff-stat all the combinations. Smallest diff wins! Does that sound horrible
and slow? It is! That's why we have a secret global diff-stat cache on your system.

Also we don't *literally* diff every combination. We turn the O(n<sup>2</sup>) diffs
into only O(n) diffs with a simple heuristic: for each Target Reachable Node,
we find the closest version *smaller* than that version and the closest version
*bigger* than that version. We then diff that node against only those two versions.
This may potentially miss some magical diff where a big change is made and then reverted,
but this diffing stuff needs some amount of taming!
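
Here is a sketch of that heuristic over plain `semver::Version`s (the real code works on AuditGraph node indices and also feeds the resulting pairs through the diff-stat cache):

```rust
use semver::Version;

/// For each target-reachable version, pick the closest root-reachable
/// version on either side of it; those pairs are the only diffs we stat.
/// `None` stands for the implicit Root Version (i.e. a full-audit suggestion).
fn diff_candidates(
    root_reachable: &[Option<Version>],
    target_reachable: &[Version],
) -> Vec<(Option<Version>, Version)> {
    let mut candidates = Vec::new();
    for to in target_reachable {
        // Largest root-reachable version smaller than `to` (Root counts as smallest)...
        let below = root_reachable
            .iter()
            .filter(|from| from.as_ref().map_or(true, |ver| ver < to))
            .max();
        // ...and smallest root-reachable version larger than `to`.
        let above = root_reachable
            .iter()
            .filter(|from| from.as_ref().map_or(false, |ver| ver > to))
            .min();
        for from in [below, above].into_iter().flatten() {
            candidates.push((from.clone(), to.clone()));
        }
    }
    candidates
}
```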
It's worth noting that [Version][]s don't form a proper metric space: we cannot compute
the "distance" between two Versions in the abstract, and then compare that to the "distance"
between two other versions. Versions *do* however have a total ordering, so we *can*
compute minimum and maximum versions, and say whether a version is bigger or smaller
than another. As a result it's possible to compute "the largest version that's smaller than X"
and "the smallest version that's larger than X", which is what we use. There is however
no way to say whether the smaller-maximum or the bigger-minimum is closer to X, so we must
try both.
It's also worth reiterating here that diffs *can* go backwards. If you're on 1.0.0 and
have an audit for 1.0.1, we will happily recommend the reverse-diff from `1.0.1 -> 1.0.0`.
This is slightly brain melty at first but nothing really needs to specially handle this,
it Just Works.

Any diff we recommend from the Root Version is "resugared" into recommending a full audit
(and is also computed by diffing against an empty directory). It is impossible
to recommend a diff to the Root Version, because there cannot be audits of the
Root Version.
[cargo metadata]: https://doc.rust-lang.org/cargo/commands/cargo-metadata.html
[cargo_metadata]: https://docs.rs/cargo_metadata/latest/cargo_metadata/
[is_crates_io]: https://docs.rs/cargo_metadata/latest/cargo_metadata/struct.Source.html#method.is_crates_io
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
[PackageId]: https://docs.rs/cargo_metadata/latest/cargo_metadata/struct.PackageId.html
[Version]: https://docs.rs/semver/latest/semver/struct.Version.html
[VersionReq]: https://docs.rs/semver/latest/semver/struct.VersionReq.html
[guppy]: https://docs.rs/guppy/latest/guppy/
[topological sort]: https://en.wikipedia.org/wiki/Topological_sorting
[transitive closure]: https://en.wikipedia.org/wiki/Transitive_closure
[DFS]: https://en.wikipedia.org/wiki/Depth-first_search