2022-01-26 Triage Log

An awesome week. There were some bits of noise from PR #91032, which landed and then had to be backed out (and may soon land again), and we continue to wrestle with how to classify which things to include in rollup PRs. But overall there were some very real wins to the compiler’s performance, and they are definitely reflected in the total bootstrap time graph. Great job!

Triage done by @pnkfelix. Revision range: 7bc7be860f99f4a40d45b0f74e2d01b02e072357..c54dfee65126a0ac385d55389a316e89095a0713

4 Regressions, 5 Improvements, 4 Mixed; 3 of them in rollups

29 comparisons made in total

Regressions

Update some rustc dependencies to deduplicate them #92896

Average relevant regression: 0.5%
Largest regression in instruction counts: 0.6% on full builds of match-stress-enum check
6 of the 7 significant relevant regressions were to variants of match-stress-enum.
PR author guesses that it could be noise injected via one of the dependency updates, specifically hashbrown 0.11.0 to 0.11.2.
Left it untriaged for now (I would like to circle back and see whether there's any way to test that hypothesis; but if it goes untouched for a week, then we might just rubber-stamp it as triaged).

Disable drop range tracking in generators #93165

Average relevant regression: 284.5%
Largest regression in instruction counts: 879.2% on full builds of deeply-nested-async check
This regression was expected; it is a result of backing out PR #91032, which was included in rollup PR #93138, but was reverted due to correctness concerns (discussed under “Mixed” section below).

Revert “Do not hash leading zero bytes of i64 numbers in Sip128 hasher” #93014

Average relevant regression: 1.5%
Average relevant improvement: -0.7%
Largest regression in instruction counts: 7.9% on incr-full builds of clap-rs check
This regression was expected; this PR reverts a perf optimization to restore correctness.

Store a Symbol instead of an Ident in AssocItem #93095

Average relevant regression: 0.8%
Largest regression in instruction counts: 2.1% on incr-patched: compile one builds of regex check
This PR was already triaged: it is a correctness fix for incremental compilation, and the lesser of two evils (when compared to PR #92837).

Improvements

Improve capacity estimation in Vec::from_iter #92138

Average relevant improvement: -1.4%
Largest improvement in instruction counts: -3.0% on full builds of projection-caching doc
This hopefully closes the loop on a nearly four-year-old hypothesized performance fix, issue #48994.
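For context, `Vec::from_iter` (which is what `collect::<Vec<_>>()` calls) can consult the iterator's `size_hint` when deciding how much to allocate up front. The sketch below only illustrates what `size_hint` reports for a few standard adaptors; it does not reproduce the logic changed in the PR, and the `hints` function is a hypothetical helper.

```rust
// Illustrative only: shows what `size_hint` reports, not the PR's logic.
// `hints` is a hypothetical helper, not part of the standard library.
fn hints() -> [(usize, Option<usize>); 3] {
    // `map` preserves the exact length of the underlying range.
    let exact = (0..10u32).map(|x| x * 2);
    // `filter` cannot know how many items survive, so its lower bound is 0.
    let filtered = (0..10u32).filter(|x| x % 2 == 0);
    // `chain` adds the bounds of its two halves.
    let chained = (0..3u32).chain(5..9u32);
    [exact.size_hint(), filtered.size_hint(), chained.size_hint()]
}

fn main() {
    for h in hints().iter() {
        println!("{:?}", h);
    }
    // Whatever estimate is used, the resulting capacity must cover the length.
    let v: Vec<u32> = (0..10u32).filter(|x| x % 2 == 0).collect();
    assert!(v.capacity() >= v.len());
}
```

A weak lower bound, as with `filter`, is the kind of case where the initial capacity estimate matters most, since the vector may have to grow while collecting.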

Make Decodable and Decoder infallible. #93066

Average relevant improvement: -0.8%
Largest improvement in instruction counts: -2.1% on full builds of helloworld doc
Wow. This compare page has an impressive amount of green.
Perhaps even more notably, the bootstrap time had 8 seconds (-1.141%) shaved off via this PR, and from the bootstrap graph, the bulk of that improvement has stuck.
Huge kudos to @nnethercote (the PR author) here.
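What "infallible" means at the API level can be sketched as below, assuming (as the PR title suggests) that decode methods stop returning `Result`. Both decoder types are hypothetical, simplified stand-ins over a byte slice, not rustc's real traits.

```rust
// Hypothetical, simplified decoders; not rustc's actual `Decoder` trait.
struct FallibleDecoder<'a> {
    data: &'a [u8],
    pos: usize,
}

struct InfallibleDecoder<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> FallibleDecoder<'a> {
    // Every read returns a `Result`, so every caller must propagate errors.
    fn read_u8(&mut self) -> Result<u8, String> {
        let b = *self
            .data
            .get(self.pos)
            .ok_or_else(|| "unexpected end of input".to_string())?;
        self.pos += 1;
        Ok(b)
    }

    fn read_pair(&mut self) -> Result<(u8, u8), String> {
        Ok((self.read_u8()?, self.read_u8()?))
    }
}

impl<'a> InfallibleDecoder<'a> {
    // Reads return the value directly; malformed input panics instead.
    fn read_u8(&mut self) -> u8 {
        let b = self.data[self.pos];
        self.pos += 1;
        b
    }

    fn read_pair(&mut self) -> (u8, u8) {
        (self.read_u8(), self.read_u8())
    }
}

fn main() {
    let bytes = [7u8, 9u8];
    let mut f = FallibleDecoder { data: &bytes, pos: 0 };
    let mut i = InfallibleDecoder { data: &bytes, pos: 0 };
    assert_eq!(f.read_pair(), Ok((7, 9)));
    assert_eq!(i.read_pair(), (7, 9));
}
```

The infallible variant trades recoverable errors for simpler call sites with no `Result` wrapping or `?` propagation, which is one plausible source of savings in code that decodes a great deal of data.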

Use indexmap to avoid sorting LocalDefIds #90842

Average relevant improvement: -1.1%
Largest improvement in instruction counts: -1.1% on incr-unchanged builds of ctfe-stress-4 check
This is part of foundational work (namely #90317) to make our incr. comp. system more robust, in that we want to ensure that information untracked for incremental compilation does not indirectly influence values and cause stable hashes to deviate.
The point is: the motivation here is not performance. That's just a happy accident, from what I can tell.

Rustdoc: remove ListAttributesIter and use impl Iterator instead #92353

Average relevant improvement: -2.6%
Largest improvement in instruction counts: -4.2% on full builds of deeply-nested doc
This is a rustdoc performance improvement.
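The general pattern named in the PR title, replacing a hand-written iterator struct with a function returning `impl Iterator`, can be sketched as follows. The `Attr` type, `ListsIter`, and both functions are hypothetical stand-ins and not rustdoc's actual code.

```rust
// Hypothetical stand-in for an attribute: a name plus a list of nested items.
struct Attr {
    name: &'static str,
    items: Vec<&'static str>,
}

const EMPTY: &[&str] = &[];

// Style 1: a hand-written iterator struct that tracks its own state.
struct ListsIter<'a> {
    attrs: std::slice::Iter<'a, Attr>,
    name: &'static str,
    current: std::slice::Iter<'a, &'static str>,
}

impl<'a> Iterator for ListsIter<'a> {
    type Item = &'static str;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.current.next() {
                return Some(*item);
            }
            // Current list exhausted: advance to the next attribute, if any.
            let attr = self.attrs.next()?;
            if attr.name == self.name {
                self.current = attr.items.iter();
            }
        }
    }
}

fn lists_manual<'a>(attrs: &'a [Attr], name: &'static str) -> ListsIter<'a> {
    ListsIter { attrs: attrs.iter(), name, current: EMPTY.iter() }
}

// Style 2: the same traversal built from standard adaptors.
fn lists_impl<'a>(
    attrs: &'a [Attr],
    name: &'static str,
) -> impl Iterator<Item = &'static str> + 'a {
    attrs
        .iter()
        .filter(move |attr| attr.name == name)
        .flat_map(|attr| attr.items.iter().copied())
}

fn main() {
    let attrs = vec![
        Attr { name: "doc", items: vec!["hidden", "inline"] },
        Attr { name: "cfg", items: vec!["test"] },
        Attr { name: "doc", items: vec!["no_inline"] },
    ];
    let a: Vec<_> = lists_manual(&attrs, "doc").collect();
    let b: Vec<_> = lists_impl(&attrs, "doc").collect();
    assert_eq!(a, b);
    println!("{:?}", b);
}
```

The adaptor-based version is shorter and leaves the state machine to the compiler; whether it is also faster depends on the surrounding code, so the measured gains above should be credited to the PR itself rather than to this pattern in general.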

Rollup of 10 pull requests #93069

Average relevant improvement: -0.7%
Largest improvement in instruction counts: -1.0% on full builds of ripgrep opt
This was auto-classified as not relevant and pnkfelix does not know why, so it's been moved into “Improvements”.

Mixed

Rollup of 17 pull requests #93138

Average relevant regression: 1.6%
Average relevant improvement: -57.9%
Largest improvement in instruction counts: -89.5% on full builds of deeply-nested-async check
Largest regression in instruction counts: 2.6% on full builds of await-call-tree check
The noted improvement from this roll-up was due to the inclusion of PR #91032, “Introduce drop range tracking to generator interior analysis”.
PR #91032 injected a family of ICEs, such as issue #93161, so the feature it added is being disabled.
As for the improvement: The PR author, @eholk, made a note hypothesizing that the improvement to deeply-nested-async may be an artifact of how much is pruned from the generator type. (This may be a sign of an overly artificial benchmark; I wrote a comment asking for more clarification there.)
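The two headline figures for deeply-nested-async in this log (a -89.5% improvement when PR #91032 landed, and an 879.2% regression when the feature was disabled in #93165) look wildly different, but they describe nearly the same change measured from opposite baselines. A minimal sketch of the arithmetic, using a hypothetical helper name:

```rust
/// Percent change needed to undo a given percent change.
/// For example, a -50% improvement is undone by a +100% regression.
/// (`inverse_percent_change` is a hypothetical helper for illustration.)
fn inverse_percent_change(pct: f64) -> f64 {
    let ratio = 1.0 + pct / 100.0; // new / old
    (1.0 / ratio - 1.0) * 100.0
}

fn main() {
    // Figures quoted in this log for full builds of deeply-nested-async check.
    let improvement = -89.5;
    let regression = 879.2;
    println!(
        "undoing {}% takes {:+.1}%",
        improvement,
        inverse_percent_change(improvement)
    );
    println!(
        "undoing {}% takes {:+.1}%",
        regression,
        inverse_percent_change(regression)
    );
}
```

Undoing a -89.5% change takes roughly +852%, and undoing a +879.2% change takes roughly -89.8%, so the two reported numbers are close to exact inverses. The remaining gap is plausibly because the two measurements were taken at different commits.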

Emit simpler code from format_args #91359

Average relevant regression: 1.2%
Average relevant improvement: -0.7%
Largest improvement in instruction counts: -2.2% on full builds of cranelift-codegen check
Largest regression in instruction counts: 3.0% on full builds of html5ever opt
These performance differences were anticipated ahead of time. @simulacrum posted a nice analysis explaining the probable root cause.
Notably, “with -Csymbol-mangling-version=v0 the hashes (changes to which cause LLVM's workload to change) go away; [...] this patch is pretty much an improvement in terms of emitted IR (as roughly expected).”

Update hashbrown to 0.12.0 #92998

Average relevant regression: 1.0%
Average relevant improvement: -0.9%
Largest improvement in instruction counts: -9.4% on incr-patched: println builds of webrender opt
Largest regression in instruction counts: 2.5% on incr-unchanged builds of externs debug
“an overall win but with a bit of noise since this code is extremely sensitive to inlining.”

Rollup of 8 pull requests #93288

Average relevant regression: 1.5%
Average relevant improvement: -0.7%
Largest improvement in instruction counts: -0.8% on incr-full builds of keccak check
Largest regression in instruction counts: 2.0% on incr-full builds of stm32f4 check
stm32f4 regressed on many axes (check/debug/opt/doc, full and incremental); inflate check regressed on both full and incremental builds. keccak improved slightly.
It is not obvious what caused the changes here in this rollup.
stm32f4 was added in part to test compiler trait machinery.
After looking at each of the 8 PRs in the rollup, the most likely causes are either PR #93064 (“Properly track DepNodes in trait evaluation provisional cache”) or PR #93175 (“Implement stable overlap check considering negative traits”). Left comments on each PR asking if they should have had perf runs.
I am leaving this unmarked (i.e. untriaged) for now.
