2024-06-03 Triage Log

A quiet week; we did have one quite serious regression (#115105, “enable DestinationPropagation by default”), but it was quickly reverted (#125794). The only other PR identified as potentially problematic was rollup PR #125824, but even that is relatively limited in its effect.

Triage done by @pnkfelix. Revision range: a59072ec..1d52972d

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 2.0%]	28
Regressions ❌ (secondary)	0.4%	[0.2%, 0.6%]	6
Improvements ✅ (primary)	-0.4%	[-1.2%, -0.2%]	30
Improvements ✅ (secondary)	-0.5%	[-0.9%, -0.2%]	24
All ❌✅ (primary)	0.2%	[-1.2%, 2.0%]	58

3 Regressions, 5 Improvements, 6 Mixed; 4 of them in rollups
57 artifact comparisons made in total
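
As a rough cross-check when reading these tables, the "All (primary)" row appears to be consistent with a count-weighted mean of the regression and improvement rows. The sketch below is only an illustration using the rounded figures printed in this log; the helper name `combined_mean` is hypothetical and is not part of rustc-perf, and the exact aggregation used by the tooling is an assumption here.

```python
def combined_mean(reg_mean, reg_count, imp_mean, imp_count):
    """Count-weighted mean (in percent) of a regression row and an improvement row.

    Hypothetical helper for illustration only; inputs are the rounded
    values shown in the tables, so the result is approximate.
    """
    total = reg_count + imp_count
    if total == 0:
        return None  # corresponds to a "-" row in the tables
    return (reg_mean * reg_count + imp_mean * imp_count) / total


# Summary table, primary rows: 28 regressions at 0.9%, 30 improvements at -0.4%
summary = combined_mean(0.9, 28, -0.4, 30)
print(f"summary: {summary:.1f}%")

# Rollup #125649, primary rows: 12 regressions at 1.3%, 1 improvement at -0.1%
rollup = combined_mean(1.3, 12, -0.1, 1)
print(f"#125649: {rollup:.1f}%")
```

With the rounded inputs above, the two results round to 0.2% and 1.2%, matching the "All (primary)" rows of the summary table and of rollup #125649 respectively.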

Regressions

Rollup of 5 pull requests #125649 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.3%	[0.5%, 2.1%]	12
Regressions ❌ (secondary)	0.3%	[0.3%, 0.4%]	3
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.2%	[-0.1%, 2.1%]	13

all 12 of the regressing primary benchmarks are diesel-1.4.8 (in a variety of configurations).
problem was isolated to PR #125089 (improve diagnostic output of non_local_definitions lint)
Urgau notes: “The lint triggers nearly 150 times in the version of diesel used by rustc-perf, so the benchmark has become a bit a linting machinery benchmark”
cc rustc-perf#1819
marked as triaged.

Rollup of 5 pull requests #125665 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.5%, 0.5%]	1
Regressions ❌ (secondary)	0.4%	[0.2%, 0.5%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.5%	[0.5%, 0.5%]	1

helloworld is the sole primary regression.
marked as triaged (my own opinion is that helloworld is a useful canary when it regresses by a more significant amount than this)
(also, the 30-day history shows the story for helloworld to be quite a bit more complicated than what is presented by the effects of this single PR; there are lots of spikes mixed in there)

fn_arg_sanity_check: fix panic message #125695 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.8%	[0.8%, 0.9%]	4
Regressions ❌ (secondary)	0.5%	[0.2%, 1.5%]	19
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.8%	[0.8%, 0.9%]	4

helloworld is the sole primary regression.
(already) marked as triaged

Improvements

Omit non-needs_drop drop_in_place in vtables #122662 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.6%	[-1.5%, -0.2%]	9
Improvements ✅ (secondary)	-0.5%	[-1.0%, -0.2%]	18
All ❌✅ (primary)	-0.6%	[-1.5%, -0.2%]	9

improvements are to helloworld-opt and regex-opt.
small but seems real given nature of PR (largely a binary size reduction)

Update cargo #125682 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-0.6%, -0.5%]	3
Improvements ✅ (secondary)	-0.4%	[-0.5%, -0.2%]	12
All ❌✅ (primary)	-0.5%	[-0.6%, -0.5%]	3

improvements are to helloworld-check
probably just noise

Stabilize custom_code_classes_in_docs feature #124577 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.8%, -0.2%]	9
Improvements ✅ (secondary)	-0.7%	[-0.9%, -0.5%]	2
All ❌✅ (primary)	-0.4%	[-0.8%, -0.2%]	9

improvements are to various doc-full benchmarks.
Probably measurement bias (unless somehow the stability checks are a noticeable expense?)

Increase vtable layout size #123572 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.6%, -0.2%]	10
All ❌✅ (primary)	-	-	0

all “improvements” are to secondary benchmarks: unify-linearly, match-stress, and unused-warnings
(the improvement from this PR is expected to be realized in runtime performance, especially for code heavy with vtable lookups. It's unsurprising that it wouldn't have a noticeable effect on the compiler toolchain.)

Avoid checking the edition as much as possible #125828 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.3%]	6
Improvements ✅ (secondary)	-0.4%	[-0.5%, -0.3%]	4
All ❌✅ (primary)	-0.4%	[-0.4%, -0.3%]	6

this is recovering performance that was lost in PR #123865

Mixed

Create const block DefIds in typeck instead of ast lowering #124650 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	2
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	3
Improvements ✅ (primary)	-0.5%	[-0.6%, -0.3%]	6
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	-0.3%	[-0.6%, 0.2%]	8

as previously noted by @lqd: “Tiny changes, and overall more gains than losses, probably not worth investigation effort imho.”
marking as triaged

Rollup of 8 pull requests #125691 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.4%	[0.4%, 0.4%]	1
Regressions ❌ (secondary)	1.2%	[1.2%, 1.2%]	1
Improvements ✅ (primary)	-0.5%	[-0.6%, -0.4%]	2
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	-0.2%	[-0.6%, 0.4%]	3

regression to image-opt-full
improvements to webrender-2022-opt-full and regex-opt-incr-patched
had a broad (if small) improvement to binary sizes, which was isolated to PR #124251
overall wins seem to outweigh losses; marking as triaged.

don't inhibit random field reordering on repr(packed(1)) #125360 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.9%	[0.6%, 1.3%]	13
Regressions ❌ (secondary)	0.4%	[0.4%, 0.4%]	2
Improvements ✅ (primary)	-0.7%	[-0.8%, -0.7%]	4
Improvements ✅ (secondary)	-0.4%	[-0.7%, -0.3%]	13
All ❌✅ (primary)	0.5%	[-0.8%, 1.3%]	17

regressed bitmaps and typenum; improved helloworld
instruction counts were affected but not cycle counts; one theory is that object code has extra offset computations or niche computations...
since cycle count was not affected, does not seem worth further investigation; marking as triaged

Enable DestinationPropagation by default. #115105 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.0%	[0.3%, 3.4%]	18
Regressions ❌ (secondary)	1.3%	[0.3%, 3.3%]	22
Improvements ✅ (primary)	-0.5%	[-4.0%, -0.2%]	23
Improvements ✅ (secondary)	-0.8%	[-1.6%, -0.2%]	18
All ❌✅ (primary)	0.2%	[-4.0%, 3.4%]	41

was reverted due to injecting a big regression for “Building stage1 codegen backend gcc”
marking as triaged

Revert “Auto merge of #115105 - cjgillot:dest-prop-default, r=oli-obk” #125794 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.2%, 4.2%]	18
Regressions ❌ (secondary)	0.8%	[0.1%, 1.5%]	20
Improvements ✅ (primary)	-1.0%	[-3.2%, -0.3%]	15
Improvements ✅ (secondary)	-1.5%	[-3.1%, -0.4%]	18
All ❌✅ (primary)	-0.2%	[-3.2%, 4.2%]	33

revert of above PR
marking as triaged

Rollup of 7 pull requests #125824 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.7%	[0.5%, 1.1%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[-0.4%, 1.1%]	4

instruction-counts regressed webrender-2022-opt-full, cargo-opt-{incr-patched, full}
cycle-counts regressed webrender-2022-opt-{full, incr-full}, cranelift-codegen-opt-incr-full, and clap-opt-incr-patched
history view for webrender shows that the cycle-count effect seems real though not quite as pronounced as the original measurements indicate.
there are many potential candidates for the cause here in this rollup.
not marking as triaged; doing some followup perf runs on individual PRs