2024-01-16 Triage Log

This week had some small regressions that did not warrant further investigation, several of which were dismissed as being noise/blips in the data. There were also a number of gains. (Don‘t get exicited about that 20.6% improvement, its an measurement artifact from a temporary blip in the PR that immediately preceded this week’s triage.)

Triage done by @pnkfelix. Revision range: 76101eec..f9c2421a

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.7%	[0.6%, 0.7%]	2
Regressions ❌ (secondary)	3.1%	[0.8%, 4.1%]	9
Improvements ✅ (primary)	-1.2%	[-20.6%, -0.2%]	133
Improvements ✅ (secondary)	-0.8%	[-7.3%, -0.1%]	31
All ❌✅ (primary)	-1.2%	[-20.6%, 0.7%]	135

3 Regressions, 5 Improvements, 5 Mixed; 3 of them in rollups 55 artifact comparisons made in total

Regressions

Rollup of 10 pull requests #119754 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.4%	[2.4%, 2.4%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

The 2 regressing (and secondary) benchmarks are tt-muncher debug {incr-full, full}. Its not transient.
I‘ve skimmed over the list of PR’s in the rollup. None of them are obvious culprits here. I looked at the ones related to debuginfo (#118903) and to code-coverage (#119033 and #119681), but none of those seem likely to be to blame here
Since this only affects a secondary benchmark, and only the instruction count (e.g. not cpu-clock:u nor wall-time for these two benchmarks), I do not think its worth further investigation and I'm going to mark it as triaged.

Exhaustiveness: use an Option instead of allocating fictitious patterns #119688 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.8%	[3.6%, 4.1%]	6
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

This impacted the 6 variants of match-stress {incr-full,full} x {check,debug,opt}
I think the impact on match-stress was probably well-anticipated, and within a reasonable range for a stress-test benchmark.
Note that #119688 was a precursor to some further cleanup code (namely to remove the use of a local-arena within exhaustiveness checking).
Marking as triaged.

never patterns: Check bindings wrt never patterns #119610 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.3%, 0.4%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[0.3%, 0.4%]	3

This impacted 3 variants of unicode-normalization-0.1.19: debug incr-unchanged and check {incr-unchanged, incr-patched:println}.
Interestingly, during two different try runs, those three variants were found to have improved by similar amounts by this PR.
there's some weird interaction between that benchmark and the code paths impacted by this PR, and I do not think its worth investing effort in further investigation.
marking as triaged.

Improvements

macro_rules: Add an expansion-local cache to span marker #119693 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.4%	[-20.5%, -0.2%]	80
Improvements ✅ (secondary)	-0.8%	[-1.9%, -0.3%]	16
All ❌✅ (primary)	-1.4%	[-20.5%, -0.2%]	80

the bitmaps changes (-20.5%, -17.9%, -13.1%) are all artifacts of returning to normal after a blip in the previous PR.

A more efficient slice comparison implementation for T: !BytewiseEq #116846 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-0.9%, -0.2%]	15
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.6%]	1
All ❌✅ (primary)	-0.5%	[-0.9%, -0.2%]	15

it is too bad that work in PR #100124 stalled.

Remove a large amount of leb128-coded integers #119791 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.1%	[1.1%, 1.1%]	1
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.2%]	5
Improvements ✅ (secondary)	-0.3%	[-0.5%, -0.1%]	12
All ❌✅ (primary)	-0.3%	[-0.3%, -0.2%]	5

the 1.1% hit is to deep-vector debug full. It may be transient; the history is pretty up-and-down at the time of this PR, and has settled at a lower level than where it was when this PR landed.
in any case, the gains elsewhere, especially bootstrap, outweigh the loss to that one secondary benchmark. (Which ... I guess is what the rustc-perf bot now computes as well, since it categorized this as an Improvement rather than Mixed?)

Exhaustiveness: track overlapping ranges precisely #119396 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-1.7%, -0.2%]	32
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.4%	[-1.7%, -0.2%]	32

Rollup of 6 pull requests #119889 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.8%	[-1.9%, -1.7%]	4
Improvements ✅ (secondary)	-4.3%	[-7.4%, -1.3%]	2
All ❌✅ (primary)	-1.8%	[-1.9%, -1.7%]	4

Mixed

Support async recursive calls (as long as they have indirection) #117703 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	1
Improvements ✅ (primary)	-0.3%	[-0.4%, -0.3%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.3%	[-0.4%, -0.3%]	3

this is weird, it looks like an inverse blip occurred on the preceding PR, where tt-muncher check incr-unchanged had a single point with -0.2% instruction-count, and then it preceding to “return to normal” on the succeeding PRs.
(Its harder for me to explain away “inverse blips” ...)
but at the same time, this does not seem like a significant regression by our usual metrics.
marking as triaged.

Rollup of 9 pull requests #119767 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.3%	[0.4%, 2.3%]	2
Regressions ❌ (secondary)	0.9%	[0.5%, 1.2%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	1.3%	[0.4%, 2.3%]	2

primary regressions: syn opt-full regressed by 2.3%, bitmaps check-incr-full by 0.35%. secondary regressions: coercions debug-full by 1.23%, ctfe-stress check-full by 0.51%
from the overall history, it seems like syn opt-full returned to “normal” with later PRs that don‘t necessarily seem like they would have affected syn (e.g. PR #117449). bitmap check-incr-full’s trend is likewise downward after this point.
marking as triaged

Add assume into NonZeroIntX::get #119452 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.4%, 0.9%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.7%	[-0.7%, -0.7%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[-0.7%, 0.9%]	5

scottmcm writes: “Instructions have a couple red in instruction counts for opt, but that's entirely reasonable for something intended to enable optimizations. Notably, the cycles are green, with no regressions. So I think this is fine.”
marking as triaged

Avoid some redundant work in GVN #119439 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.8%	[0.6%, 0.9%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.7%	[-1.1%, -0.5%]	4
Improvements ✅ (secondary)	-0.4%	[-0.4%, -0.4%]	1
All ❌✅ (primary)	0.0%	[-1.1%, 0.9%]	8

primary regressions are regex-1.5.5 debug-full, opt-incr-patched:Job, incr-full, and exa opt-full.
the exa regression looks like a blip. The regex ones were predicted during a try run for the PR. I assume they were deemed acceptable as they are offset improvements elsewhere (or dismissed as noise?)
marking as triaged.

Sandwich MIR optimizations between DSE. #119672 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.7%	[0.2%, 1.4%]	14
Regressions ❌ (secondary)	0.5%	[0.2%, 2.7%]	14
Improvements ✅ (primary)	-1.0%	[-2.2%, -0.2%]	31
Improvements ✅ (secondary)	-0.9%	[-2.2%, -0.2%]	10
All ❌✅ (primary)	-0.4%	[-2.2%, 1.4%]	45

already marked as triaged by @lqd with the comment “As seen in the previous runs: some nice wins on bigger benchmarks, and overall gains outweigh the few losses.”