2024-07-02 Triage Log

We saw a large set of primary benchmarks regress, mostly due to PR #120924 (lint_reasons and #[expect]) and PR #120639 (new effects desugaring). Separately, a couple of rollup PRs (#127076, #127096) had regressions limited to relatively few benchmarks; pnkfelix was unable to isolate an injecting PR that could be identified as the root cause (outside assistance welcome!).

Triage done by @pnkfelix. Revision range: c3d7fb39..cf2df68d

Summary:

(instructions:u)             mean    range           count
Regressions ❌ (primary)      1.0%    [0.2%, 2.8%]    109
Regressions ❌ (secondary)    1.4%    [0.3%, 8.0%]    50
Improvements ✅ (primary)     -1.3%   [-4.3%, -0.2%]  41
Improvements ✅ (secondary)   -1.3%   [-4.4%, -0.2%]  75
All ❌✅ (primary)             0.4%    [-4.3%, 2.8%]   150

4 Regressions, 3 Improvements, 11 Mixed; 7 of them in rollups. 59 artifact comparisons made in total.
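The log does not say how the "All ❌✅ (primary)" row is derived, but its figures are consistent with a count-weighted mean of the primary regression and improvement rows: (109 × 1.0% + 41 × (-1.3%)) / 150 ≈ 0.37%, which rounds to the reported 0.4%. A minimal Rust sketch of that cross-check follows, assuming that definition; `Row` and `combined_mean` are hypothetical helpers, not rustc-perf APIs.

```rust
/// One row of a triage table: mean relative change (percent) and the number
/// of benchmark/profile/scenario cases it covers. Hypothetical type.
struct Row {
    mean_pct: f64,
    count: u32,
}

/// Count-weighted mean across rows (e.g. primary regressions plus primary
/// improvements). Returns None when the rows cover no cases at all.
fn combined_mean(rows: &[Row]) -> Option<f64> {
    let total: u32 = rows.iter().map(|r| r.count).sum();
    if total == 0 {
        return None;
    }
    let weighted: f64 = rows.iter().map(|r| r.mean_pct * r.count as f64).sum();
    Some(weighted / total as f64)
}

fn main() {
    // Primary rows from the summary table (already rounded to one decimal).
    let primary = [
        Row { mean_pct: 1.0, count: 109 },
        Row { mean_pct: -1.3, count: 41 },
    ];
    let total: u32 = primary.iter().map(|r| r.count).sum();
    let mean = combined_mean(&primary).unwrap();
    println!("All (primary): {:.1}% over {} cases", mean, total);
}
```

Because the table means are already rounded, a recomputed value can differ from the reported one in the last digit: for #127096, the rounded inputs give (7 × 0.5% + 12 × (-3.4%)) / 19 ≈ -1.96%, which prints as -2.0% rather than the reported -1.9%.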

Regressions

Rollup of 7 pull requests #126951 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      0.5%    [0.5%, 0.6%]    3
Regressions ❌ (secondary)    -       -               0
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             0.5%    [0.5%, 0.6%]    3
  • regressions are all to serde incr-patched:println {check, debug, opt}.
  • on its own, the regression is limited to instruction counts, and seems minor enough to not warrant deeper investigation.
  • (the 30-day history tells a slightly more complex story, though)
  • marked as triaged

Let's #[expect] some lints: Stabilize lint_reasons (RFC 2383) #120924 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      0.6%    [0.2%, 1.9%]    142
Regressions ❌ (secondary)    0.6%    [0.1%, 1.5%]    79
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             0.6%    [0.2%, 1.9%]    142
  • wide collection of regressions.
  • the PR discussion indicates that the regression may be inherent to how #[expect] is implemented, though it also calls it “likely” that the implementation can be optimized further.
  • not marking as triaged.
  • (my hope is that someone will look into the “further optimizations” that xFrednet alludes to in the PR discussion; once a reasonable amount of investigation has been done there, we can mark this as triaged.)
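For readers unfamiliar with the feature: RFC 2383 adds a `reason = "..."` field to lint attributes and a new `#[expect(...)]` attribute. `#[expect]` suppresses a lint the way `#[allow]` does, but if the lint never fires, the compiler reports `unfulfilled_lint_expectations`. Supporting that requires tracking which expectations were fulfilled, which is at least plausibly related to the overhead discussed above; the log does not confirm the cause. A minimal example, assuming Rust 1.81 or later (where these attributes are stable); `answer` is a hypothetical function:

```rust
// Assumes Rust 1.81+, where `#[expect]` and `reason = "..."` are stable.

fn answer() -> u32 {
    // The `unused_variables` lint does fire on `scratch`, so this expectation
    // is fulfilled: the lint is suppressed and nothing is reported.
    #[expect(unused_variables, reason = "kept only to illustrate #[expect]")]
    let scratch = 10;

    // Had `scratch` been used, `#[allow]` would stay silent, whereas
    // `#[expect]` would report `unfulfilled_lint_expectations`. Reporting that
    // requires remembering, per expectation, whether a matching lint was emitted.
    42
}

fn main() {
    println!("{}", answer());
}
```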

Update browser-ui-test version to 0.18.0 #127010 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    7.2%    [7.2%, 7.2%]    1
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             -       -               0
  • already marked as triaged (secondary benchmark deep-vector is being noisy at the moment).
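This log relies on a noise-filtering significance threshold (deep-vector's 7.2% swing here is dismissed as noise) but does not say how that threshold is computed. As an illustration only, here is one generic approach: judge a new change against the spread of the benchmark's own past run-to-run variation, using interquartile-range (Tukey-style) fences. The function names, the choice of `k`, and the sample histories are assumptions, and this is not claimed to be rustc-perf's exact method.

```rust
/// Linear-interpolated quantile of already-sorted, non-empty data (q in 0.0..=1.0).
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Hypothetical noise filter: `history` holds a benchmark's past run-to-run
/// changes (percent) on commits assumed to be perf-neutral. A new change is
/// treated as significant only if it lies outside [Q1 - k*IQR, Q3 + k*IQR].
fn is_significant(history: &[f64], change_pct: f64, k: f64) -> bool {
    assert!(!history.is_empty(), "need historical data");
    let mut sorted = history.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    change_pct < q1 - k * iqr || change_pct > q3 + k * iqr
}

fn main() {
    // Made-up histories for illustration, not real rustc-perf data.
    let quiet = [-0.1, 0.0, 0.05, -0.05, 0.1, 0.0];
    let noisy = [-8.0, 7.5, -0.2, 8.2, -7.0, 0.3];
    println!("quiet, +0.6%: {}", is_significant(&quiet, 0.6, 3.0)); // true
    println!("noisy, +7.2%: {}", is_significant(&noisy, 7.2, 3.0)); // false
}
```

With a quiet history, even a 0.6% change stands out; with a history that already swings by several percent in both directions, a 7.2% change is indistinguishable from noise.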

Implement new effects desugaring #120639 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      0.3%    [0.2%, 0.6%]    72
Regressions ❌ (secondary)    0.4%    [0.1%, 0.9%]    24
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             0.3%    [0.2%, 0.6%]    72
  • Biggest (>=0.4%) primary regressions: regex, bitmaps, typenum, stm32f4, exa. (19 variants of those five benchmarks.)
  • the PR author (fee1-dead) has made a couple of follow-up attempts to address the regressions, but none has landed yet.
  • not marking as triaged, in order to encourage addressing the regressions. (note however: the cycles:u metric didn't regress, at least not past our noise-filtering significance threshold. Nor did task-clock:u. It is not totally clear how much effort is warranted here, apart from a desire to keep the instruction count low just because that is our most stable proxy for “computational effort”)

Improvements

Save 2 pointers in TerminatorKind (96 → 80 bytes) #126784 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    -       -               0
Improvements ✅ (primary)     -0.4%   [-0.5%, -0.2%]  9
Improvements ✅ (secondary)   -0.1%   [-0.1%, -0.1%]  4
All ❌✅ (primary)             -0.4%   [-0.5%, -0.2%]  9
  • improvements are to serde and diesel.
  • skimming the 30-day history indicates that the effect is real, though it may have been somewhat undone by subsequent changes.
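The PR title reports a 16-byte saving (two 8-byte pointers on a 64-bit host). The log does not say how that was achieved, so the sketch below uses hypothetical types (not rustc's MIR types) to show one common, generic way to shrink an enum: since an enum is at least as large as its largest variant, moving a large, rarely used payload behind a `Box` shrinks every value. Smaller nodes mean less memory to allocate and copy, which is a plausible but unverified explanation for instruction-count wins like these.

```rust
// Hypothetical types for illustration; these are not rustc's MIR types.

/// A rare variant with a large inline payload forces every value of the
/// enum to be at least that large.
#[allow(dead_code)]
enum InlinePayload {
    Common(u32),
    Rare([u64; 4]), // 32 bytes stored inline
}

/// Moving the rare payload behind a `Box` leaves only a pointer inline, so
/// every value shrinks, at the cost of a heap allocation whenever the rare
/// variant is constructed.
#[allow(dead_code)]
enum BoxedPayload {
    Common(u32),
    Rare(Box<[u64; 4]>), // one pointer stored inline
}

// Compile-time size check: the build fails if the boxed layout is not smaller.
const _: () = assert!(
    std::mem::size_of::<BoxedPayload>() < std::mem::size_of::<InlinePayload>()
);

fn main() {
    println!(
        "inline: {} bytes, boxed: {} bytes",
        std::mem::size_of::<InlinePayload>(),
        std::mem::size_of::<BoxedPayload>()
    );
}
```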

rustdoc: use current stage if download-rustc enabled #126728 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    -       -               0
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -8.0%   [-8.0%, -8.0%]  1
All ❌✅ (primary)             -       -               0
  • (this was just deep-vector noise)

Rollup of 9 pull requests #127174 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    0.2%    [0.2%, 0.2%]    1
Improvements ✅ (primary)     -0.4%   [-1.1%, -0.2%]  46
Improvements ✅ (secondary)   -1.3%   [-2.9%, -0.2%]  36
All ❌✅ (primary)             -0.4%   [-1.1%, -0.2%]  46
  • this had broad improvements to instruction counts, but the cycles metric reports 13 regressions (one of them, unicode-normalization, in a primary benchmark) and only one improvement (in a secondary benchmark).
  • nonetheless, not worth investigating further.

Mixed

Rollup of 9 pull requests #126878 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    0.4%    [0.3%, 0.5%]    6
Improvements ✅ (primary)     -0.4%   [-0.5%, -0.3%]  4
Improvements ✅ (secondary)   -0.3%   [-0.3%, -0.3%]  1
All ❌✅ (primary)             -0.4%   [-0.5%, -0.3%]  4
  • regressions are all to secondary benchmark: coercions.
  • marking as triaged

Add SliceLike to rustc_type_ir, use it in the generic solver code (+ some other changes) #126813 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    0.5%    [0.4%, 0.8%]    7
Improvements ✅ (primary)     -0.4%   [-0.6%, -0.3%]  12
Improvements ✅ (secondary)   -0.7%   [-2.2%, -0.2%]  9
All ❌✅ (primary)             -0.4%   [-0.6%, -0.3%]  12
  • regressions are all to secondary benchmark: match-stress
  • marking as triaged

Also get add nuw from uN::checked_add #126852 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      0.6%    [0.3%, 0.9%]    4
Regressions ❌ (secondary)    0.4%    [0.3%, 0.4%]    2
Improvements ✅ (primary)     -0.3%   [-0.3%, -0.3%]  1
Improvements ✅ (secondary)   -1.3%   [-1.4%, -0.9%]  7
All ❌✅ (primary)             0.4%    [-0.3%, 0.9%]   5
  • PR was analyzed and thought to be a net win, despite the anticipated regression to compiler instruction counts
  • but there was a bystander follow-up comment that the result here might be a pessimization, at least for Intel x86.
  • not marking as triaged, in hopes that follow-up comment gets addressed in some manner.
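For context on the PR title: `nuw` is LLVM's "no unsigned wrap" flag on an integer `add`. `checked_add` returns `Some` only when the sum does not overflow, so on that path the addition is, semantically, a `nuw` add; as the title suggests, the PR lets the compiler see that too. The sketch below only spells out those semantics in plain Rust; `checked_add_explicit` is a hypothetical helper, and it does not show how the standard library or the PR actually emits the flag.

```rust
/// Same result as `u32::checked_add`, written with `overflowing_add` so the
/// non-wrapping path is explicit.
fn checked_add_explicit(a: u32, b: u32) -> Option<u32> {
    let (sum, wrapped) = a.overflowing_add(b);
    if wrapped {
        None
    } else {
        // Here `a + b` is known not to have wrapped around. LLVM can express
        // that fact with the `nuw` ("no unsigned wrap") flag on an `add`.
        Some(sum)
    }
}

fn main() {
    for (a, b) in [(2u32, 3u32), (u32::MAX, 1), (u32::MAX - 1, 1)] {
        // Cross-check against the standard library's `checked_add`.
        assert_eq!(checked_add_explicit(a, b), a.checked_add(b));
        println!("{a} + {b} -> {:?}", checked_add_explicit(a, b));
    }
}
```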

ast: Standardize visiting order for attributes and node IDs #125741 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    0.3%    [0.2%, 0.3%]    3
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -0.3%   [-0.4%, -0.2%]  12
All ❌✅ (primary)             -       -               0
  • solely regressions to secondary benchmark: tt-muncher.
  • marking as triaged

Rollup of 8 pull requests #126965 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    3.3%    [1.7%, 5.8%]    9
Improvements ✅ (primary)     -       -               0
Improvements ✅ (secondary)   -3.0%   [-5.7%, -0.3%]  2
All ❌✅ (primary)             -       -               0
  • solely regressions to secondary benchmark: derive
  • marking as triaged

Remove more PtrToPtr casts in GVN #126844 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      1.3%    [0.3%, 2.9%]    4
Regressions ❌ (secondary)    -       -               0
Improvements ✅ (primary)     -0.7%   [-1.1%, -0.4%]  2
Improvements ✅ (secondary)   -0.3%   [-0.3%, -0.3%]  1
All ❌✅ (primary)             0.6%    [-1.1%, 2.9%]   6
  • Main primary regressions to opt-full benchmarks ripgrep (2.89%), webrender (1.11%), html5ever (0.70%).
  • Some interesting discussion on the PR about the cause; e.g., are PRs like this, each of which shrinks individual MIR bodies, enabling more inlining and thus more bloat overall?
  • but I do not think any of that would cause us to undo this particular change; there are higher level inlining and code-generation policies that need to be revisited.
  • marking as triaged.

Rollup of 6 pull requests #127014 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    8.2%    [8.2%, 8.2%]    1
Improvements ✅ (primary)     -0.2%   [-0.2%, -0.2%]  1
Improvements ✅ (secondary)   -2.2%   [-5.0%, -0.2%]  13
All ❌✅ (primary)             -0.2%   [-0.2%, -0.2%]  1
  • already marked as triaged (sole regression was to noisy deep-vector)

Rollup of 6 pull requests #127076 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      1.4%    [0.6%, 2.1%]    2
Regressions ❌ (secondary)    -       -               0
Improvements ✅ (primary)     -1.8%   [-2.7%, -0.8%]  2
Improvements ✅ (secondary)   -0.7%   [-6.2%, -0.2%]  17
All ❌✅ (primary)             -0.2%   [-2.7%, 2.1%]   4
  • regressions are to opt-full: image (2.11%) and cargo (0.61%).
  • eyeballing the self-profile results provides a hint that we might be spending more time in LLVM optimizations passes after this rollup PR landed.
  • fired off some follow-up rust-timer builds on a couple of potential culprits, but I admit that I’m only making semi-educated guesses. (outcome: It wasn’t PR #124741 nor PR #126970)
  • not marking as triaged, but if no one can identify a root cause within a week, then we should just mark it so at that point.

Rollup of 11 pull requests #127096 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      0.5%    [0.3%, 0.7%]    7
Regressions ❌ (secondary)    1.1%    [0.2%, 1.6%]    7
Improvements ✅ (primary)     -3.4%   [-6.2%, -1.2%]  12
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             -1.9%   [-6.2%, 0.7%]   19
  • all 7 primary regressions are variants of syn; all but one are incremental.
  • skimming the detailed results reports for the top three regressing variants, I see the following queries at the top of the ordering by time-delta: incr_comp_persist_dep_graph, hir_crate, codegen_copy_artifacts_from_incr_cache, early_lint_checks...
  • what in this rollup would have impacted those incremental-compilation related queries?
  • PR #1270668 already had its own dedicated rustc-perf run.
  • (is this potentially just fallout noise from internal API changes like that in PR #127071?)
  • fired off a rust-timer build against that, just to scratch that itch.
  • not marking as triaged, but if no one can identify a root cause within a week, then we should just mark it so at that point.

Automatically taint InferCtxt when errors are emitted #126996 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      -       -               0
Regressions ❌ (secondary)    0.5%    [0.4%, 0.9%]    7
Improvements ✅ (primary)     -0.2%   [-0.2%, -0.2%]  1
Improvements ✅ (secondary)   -       -               0
All ❌✅ (primary)             -0.2%   [-0.2%, -0.2%]  1
  • all regressions are to the secondary match-stress benchmark, and were anticipated by a perf run during review
  • marking as triaged.

Avoid MIR bloat in inlining #127113 (Comparison Link)

(instructions:u)             mean    range           count
Regressions ❌ (primary)      1.1%    [0.3%, 2.8%]    6
Regressions ❌ (secondary)    1.6%    [1.5%, 1.9%]    6
Improvements ✅ (primary)     -0.8%   [-2.2%, -0.2%]  17
Improvements ✅ (secondary)   -1.6%   [-4.5%, -0.2%]  18
All ❌✅ (primary)             -0.3%   [-2.2%, 2.8%]   23
  • regressed opt-full html5ever, diesel, hyper, and clap. Also regressed ripgrep and regex in two isolated opt incremental scenarios.
  • overall gains more than it loses, as noted after the perf run done during PR development.
  • the big impact was to binary sizes, where the improvement is pretty clear.
  • marking as triaged.

Untriaged Pull Requests