blob: 28aca5c7d4fd259aefeae713d046e02e38682181 [file] [log] [blame] [view] [edit]
# The Compile-time Benchmark Suite
This file describes the programs in the compile-time benchmark suite and explains why they
were included.
The suite changes over time. Sometimes the code for a benchmark is updated, in
which case a small suffix will be added (starting with "-2", then "-3", and so
on.)
There are three categories of compile-time benchmarks, **Primary**, **Secondary**, and
**Stable**.
## Primary
These are real programs that are important in some way, and worth tracking.
They mostly consist of real-world crates.
- **bitmaps-3.1.0**: A bitmaps implementation. Stresses the compiler's trait
handling by implementing a trait `Bits` for the type `BitsImpl<N>` for every
`N` value from 1 to 1024.
- **cargo-0.60.0**: The Rust package manager. A large program, and an important
part of the Rust ecosystem.
- **clap-3.1.6**: A command line argument parser library. A crate used by many
Rust programs.
- **cranelift-codegen-0.82.1**: The largest crate from a code generator. Used by
wasmtime. Stresses obligation processing.
- **diesel-1.4.8**: A type safe SQL query builder. Utilizes the type system to
ensure a lot of invariants. Stresses anything related to resolving
trait bounds, by having a lot of trait impls for a large number of different
types.
- **exa-0.10.1**: An `ls` replacement. A widely-used utility, and a binary
crate.
- **helloworld**: A trivial program. Gives a lower bound on compile time.
- **html5ever-0.26.0**: An HTML parser. Stresses macro parsing code.
- **hyper-0.14.18**: A fairly large crate. Utilizes async/await, and used by
many Rust programs. The crate uses cargo features to enable large portions of its
structure and is built with `--features=client,http1,http2,server,stream`.
- **image-0.24.1**: Basic image processing functions and methods for
converting to and from various image formats. Used often in graphics
programming.
- **libc-0.2.124**: An interface to `libc`. Contains many declarations of
types, constants, and functions, but relatively little normal code. Stresses
the parser. A very widely-used crate.
- **regex-1.5.5**: A regular expression parser. Used by many Rust programs.
- **ripgrep-13.0.0**: A line-oriented search tool. A widely-used utility, and a
binary crate.
- **serde-1.0.136**: A serialization/deserialization crate. Used by many other
Rust programs.
- **serde_derive-1.0.136**: A proc-macro sub-crate used by `serde`. Used by
many other Rust programs. Stresses declarative macro expansion somewhat.
- **stm32f4-0.14.0**: A crate that has many thousands of blanket impl blocks.
It uses cargo features to enable large portions of its structure and is
built with `--features=stm32f410` to have faster benchmarking times.
- **syn-1.0.89**: A library for parsing Rust code. An important part of the Rust
ecosystem.
- **typenum-1.17.0**: A library that encodes integer computation within the trait system. Serves as
a stress test for the trait solver, but at the same time it is also a very popular crate.
- **unicode-normalization-0.1.19**: Unicode character composition and decomposition
utilities. Uses huge `match` statements that stress the compiler in unusual
ways.
- **webrender-2022**: A web renderer. A large, complex crate used by Firefox
and Servo. Webrender isn't released regularly so this is a development
version (revision da1df33). The `-2022` suffix distinguishes it from earlier
Webrender versions that used to be used in this benchmark suite.
## Secondary
These are either artificial programs or real crates that stress one particular aspect of the
compiler in interesting ways.
- **await-call-tree**: A tree of async fns that await each other, creating a
large type composed of many repeated `impl Future` types. Such types caused
[poor performance](https://github.com/rust-lang/rust/issues/65147) in the
past.
- **coercions**: Contains a static array with 65,536 string literals, which
caused [poor performance](https://github.com/rust-lang/rust/issues/32278) in
the past.
- **ctfe-stress-5**: A stress test for compile-time function evaluation.
- **deeply-nested-multi**: A small program containing multiple examples
([one](https://github.com/rust-lang/rust/issues/38528),
[two](https://github.com/rust-lang/rust/issues/72408),
[three](https://github.com/rust-lang/rust/issues/75992))
of code that caused exponential behavior in the past.
- **deep-vector**: A test containing a single large vector of zeroes, which
caused [poor performance](https://github.com/rust-lang/rust/issues/20936) in
the past. Stresses macro expansion and type inference.
- **derive**: A large number of simple structs with a `#[derive]` attribute for common built-in traits such as Copy and Debug.
- **externs**: A large number of extern functions has caused [slowdowns in the past](https://github.com/rust-lang/rust/pull/78448).
- **helloworld-tiny**: A trivial program optimized with flags that should reduce binary size.
Gives a lower bound on compiled binary size.
- **issue-46449**: A small program that caused [poor
performance](https://github.com/rust-lang/rust/issues/46449) in the past.
- **issue-58319**: A small program that caused [poor
performance](https://github.com/rust-lang/rust/issues/58319) in the past.
- **issue-88862**: A MCVE of a program that had a
[severe performance regression](https://github.com/rust-lang/rust/issues/88862)
when trying to normalize large opaque types with late-bound regions.
- **many-assoc-items**: Contains a struct with many associated items, which
caused [quadratic behavior](https://github.com/rust-lang/rust/issues/68957)
in the past.
- **match-stress**: Contains examples
(one involving [a huge enum](https://github.com/rust-lang/rust/issues/7462),
one involving
[`exhaustive_patterns`](https://github.com/rust-lang/rust/pull/79394)) of
`match` code that caused bad performance in the past.
- **projection-caching**: A small program that causes extremely, deeply nested
types which stress the trait system's projection cache. Removing that cache
resulted in hours long compilations for some programs using futures,
actix-web and other libraries with similarly nested type combinators.
- **regression-31157**: A small program that caused a [large performance
regression](https://github.com/rust-lang/rust/issues/31157) from the past.
- **ripgrep-13.0.0-tiny**: A line-oriented search tool, optimized with flags that should reduce
binary size.
- **token-stream-stress**: A proc-macro crate. Constructs a long token stream
much like the `quote` crate does, which caused [quadratic
behavior](https://github.com/rust-lang/rust/issues/65080) in the past.
- **tt-muncher**: Calls a quadratic TT muncher macro (based on `quote::quote!`)
with a long input, which stresses macro expansion.
- **tuple-stress**: Contains a single array of 65,535 nested `(i32, (f64, f64,
f64))` tuples. The data was extracted and reduced from a [program dealing
with grid coordinates](https://github.com/urschrei/ostn15_phf) that was
causing rustc to [run out of
memory](https://github.com/rust-lang/rust/issues/36799).
- **ucd**: A Unicode crate. Contains large statics that
[stress](https://github.com/rust-lang/rust/issues/53643) the borrow checker's
implementation of NLL.
- **unify-linearly**: Contains many variables that all have equality relations
between them, which caused [exponential
behavior](https://github.com/rust-lang/rust/pull/32062) in the past.
- **unused-warnings**: Contains many unused imports, which caused [quadratic
behavior](https://github.com/rust-lang/rust/issues/43572) in the past.
- **wf-projection-stress-65510**: A stress test which showcases [quadratic
behavior](https://github.com/rust-lang/rust/issues/65510) (in the number of
associated type bounds).
- **wg-grammar**: A parser generator.
[Stresses](https://github.com/rust-lang/rust/issues/58178) the borrow
checker's implementation of NLL.
## Stable
These are benchmarks used in the
[dashboard](https://perf.rust-lang.org/dashboard.html). They provide the
longest continuous data set for compiler performance. As a result, they are
quite old (e.g. 2017 or earlier), and not necessarily reflective of typical
Rust code being written today.
- **encoding**: An old crate providing character encoding support. Contains
some large tables.
- **futures**: v0.1.0 of the popular `futures` crate, which was used by many
Rust programs. Newer versions of this crate (e.g. v0.3.21 from February 2021)
contain very little code, instead relying on sub-crates. This makes them less
interesting as benchmarks, because we only measure final crate compilation.
This is why there is no futures crate among the primary benchmarks.
- **html5ever**: See above. This is an older version (v0.5.4) of the crate.
- **inflate**: An old implementation of the DEFLATE algorithm. Contains
a very large function containing many locals and basic blocks, which stresses
obligation processing.
- **regex**: See above. This is an older version of the crate.
- **piston-image**: See above. This is an older version of the `image` crate.
- **style-servo**: An old version of Servo's `style` crate. A large crate, and
one used by old versions of Firefox. Built with `--features=gecko`.
- **syn**: See above. This is an older version (0.11.11) of the crate.
- **tokio-webpush-simple**: A simple web server built with a very old version
of tokio. Uses futures a lot, but doesn't use `async`/`await`.
# How to update/add/remove benchmarks
## Add a new benchmark
- Decide on which category it belongs to. Probably primary if it's a real-world
crate, and secondary if it's a stress test or intended to catch specific
regressions.
- If it's a third-party crate:
- If you are keen: talk with a maintainer of the crate to see if there is
anything we should be aware of when using this crate as a compile-time
benchmark.
- Look at [crates.io](https://crates.io) to find the latest (non-prerelease) version.
- Download it with `collector download -c $CATEGORY -a $ARTIFACT crate $NAME $VERSION`.
The `$CATEGORY` is probably `primary`. `$ARTIFACT` is either `library` or `binary`, depending
on what kind of artifact does the benchmark build.
- It makes it easier for reviewers if you split things into two commits.
- In the first commit, just add the code for the entire benchmark.
- Do this by doing `git add` on the new directory.
- There is no need to remove seemingly unnecessary files such as
documentation or CI configuration.
- In the second commit, do everything else.
- Add `[workspace]` to the very bottom of the benchmark's `Cargo.toml`, if
doesn't already have a `[workspace]` section. This means commands like
`cargo build` will work within the benchmark directory.
- Add any necessary stuff to the `perf-config.json` file.
- If the benchmark is a sub-crate within a top-level crate, you'll need a
`"cargo_toml"` entry.
- If you get a "non-wrapped rustc" error when running it, you'll need a
`"touch_file"` entry.
- See [`collector/src/benchmark/mod.rs`](https://github.com/rust-lang/rustc-perf/blob/12cb796f8a932a891b385ba23a36d78a2867ace1/collector/src/benchmark/mod.rs#L24-L27) for a complete reference.
- Consider adding one or more `N-*.patch` files for the `IncrPatched`
scenario.
- If it's a primary benchmark, you should definitely do this.
- These usually consist of a patch that adds a single
`println!("testing");` statement somewhere.
- Creating the patch against what you've committed so far might be useful.
Use `git diff` from the repository root, or `git diff --relative` within
the benchmark directory. Copy the output into the `N-*.patch` file.
- Do a test run with an `IncrPatched` scenario to make sure the patch
applies correctly, e.g. `target/release/collector bench_local +nightly
--id Test --profiles=Check --scenarios=IncrPatched
--include=$NEW_BENCHMARK`
- Add the new entry to `collector/compile-benchmarks/README.md`.
- Add a new licensing entry to `collector/compile-benchmarks/REUSE.toml` (see existing entries
for inspiration).
- If the benchmark is artificial, use the `MIT OR Apache-2.0` license and set Rust Project
developers as the copyright owners (see e.g. `await-call-tree` as an example).
- If the benchmark is a third-party crate, make sure to use its license. Try to find the
copyright owner in the crate's `COPYRIGHT` or `README` files. If you cannot find it, consider
using the copyright owner `<crate-name> contributors`.
- `git add` the `Cargo.lock` file, if it's not already part of the
benchmark's committed code.
- If the benchmark has a `.gitignore` file that contains `Cargo.lock`,
you'll need to comment out that line so that `Cargo.lock` gets uploaded
in the PR.
- Consider the benchmarking time for the benchmark.
- First, measure the entire compilation time with something like this, by
doing this within the benchmark directory is good:
```
CARGO_INCREMENTAL=0 cargo check ; cargo clean
CARGO_INCREMENTAL=0 cargo build ; cargo clean
CARGO_INCREMENTAL=0 cargo build --release ; cargo clean
```
- Second, compare the final crate time with these commands:
```
target/release/collector bench_local +nightly --id Test \
--profiles=Check,Debug,Opt --scenarios=Full --include=$NEW,helloworld
target/release/site results.db
```
(See [here](../../site/README.md) for instructions on how to build the website).
Then switch to wall-times, compare `Test` against itself, and toggle the
"Show non-relevant results"/"Display raw data" check boxes to make sure it
hasn't changed drastically.
- E.g. `futures` was changed so it's just a facade for a bunch of
sub-crates, and the final crate time became very similar to `helloworld`,
which wasn't interesting.
- File a PR, including the two sets of timing measurements in the description.
## Remove a benchmark
- It makes it easier for reviewers if you split things into two commits.
- In the first commit just remove the old code.
- Do this with `git rm -r` on the directory.
- In the second commit do everything else.
- Remove the entry from `collector/compile-benchmarks/README.md`.
- `git grep` for occurrences of the old benchmark name (e.g. in
`.github/workflows/ci.yml` or `ci/check-*.sh`) and see if anything needs
changing... usually not.
- File a PR.
## Update a benchmark
- Do this in two steps.
- First add the new version of the benchmark. Compare the benchmarking time
of the two versions to make sure nothing outrageous has happened. Once the
PR is merged, make sure it's running correctly.
- Second, remove the old version of the benchmark.
Doing it in two steps ensures we have continuity of profiling coverage in the
case where something goes wrong.
- Compare the benchmarking time of the two versions.
- When adding the new version, for `perf-config.json` and the `N-*.patch`
files, use the corresponding files for the old version as a starting point.
# Benchmark update policy
## Background
rustc-perf is a "living benchmark suite" that is regularly changed. Some
benchmarks in rustc-perf are verbatim copies of third-party crates. We
periodically do a mass update of these benchmarks.
Benefits of this approach:
- We ensure we are measuring compilation of the crates most people are using.
This is most relevant for popular crates.
- We get coverage of newer language features.
Costs of this approach:
- It takes time and effort.
- We lose some data continuity.
- But the stable set of benchmarks used for the dashboard are not affected,
and they provide the greatest continuity.
- If the code hasn't changed much, it won't have much effect.
## Update policy
- The third-party crates should be updated every three years. This is a
reasonable refresh period that is neither too short or too long. It happens
to match the Rust edition cycle, but this is just coincidence.
- All third-party crates that have had at least one new release should be
updated, even if not much code has changed. This avoids having to make
decisions about whether a crate has changed enough.
- When doing this mass update, there may be some benchmarks that are deemed no
longer interesting and removed. For example, in the 2022 update we found that
the `futures` crate was no longer interesting because all the functionality
had been split into sub-crates that rustc-perf doesn't measure. Likewise,
there may be some new benchmarks that are added.
- New versions should be added before old versions are removed, to ensure
continuity of profiling coverage.
- The ad hoc addition and removal of individual benchmarks can continue
independently of this update cycle, as per the judgment of the rustc-perf
maintainers.
History:
- The first mass update of third-party crates occurred in [March/April
2022](https://hackmd.io/d9uE7qgtTWKDLivy0uoVQw).