2024-05-07 Triage Log

Largely uneventful week; the most notable shifts were judged to be false alarms arising from changes related to cfg-checking (either cargo enabling it, or the addition of cfgs such as rustfmt to the “well-known cfgs list”).

Triage done by @pnkfelix. Revision range: c65b2dc9..69f53f5e

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	3.0%	[0.2%, 19.5%]	65
Regressions ❌ (secondary)	1.3%	[0.2%, 4.5%]	103
Improvements ✅ (primary)	-0.9%	[-2.2%, -0.2%]	24
Improvements ✅ (secondary)	-0.7%	[-1.4%, -0.4%]	23
All ❌✅ (primary)	1.9%	[-2.2%, 19.5%]	89

3 Regressions, 2 Improvements, 3 Mixed; 5 of them in rollups
54 artifact comparisons made in total
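
As a sanity check on the summary table, the "All (primary)" row looks consistent with a count-weighted mean of the primary regression and improvement rows. That relationship is inferred from the numbers and is not stated in the log; `pooled_mean` is a hypothetical helper written only for this check.

```rust
/// Count-weighted mean of several (mean, count) groups.
/// Assumption: the "All" row pools the regression and improvement
/// rows this way; the log itself does not say how it is computed.
/// Returns NaN if the total count is zero.
fn pooled_mean(groups: &[(f64, u32)]) -> f64 {
    let total: u32 = groups.iter().map(|&(_, n)| n).sum();
    let weighted: f64 = groups
        .iter()
        .map(|&(mean, n)| mean * f64::from(n))
        .sum();
    weighted / f64::from(total)
}
```

Calling `pooled_mean(&[(3.0, 65), (-0.9, 24)])` gives roughly 1.95, which agrees with the reported 1.9% once the rounding of the input means is taken into account.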

Regressions

Rollup of 7 pull requests #124675 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.2%, 1.2%]	11
Regressions ❌ (secondary)	0.8%	[0.4%, 1.3%]	17
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.5%	[0.2%, 1.2%]	11

all primary regressions are to doc-full scenarios, and the 1.2% is to helloworld.
not worth teasing apart a rollup PR.
marking as triaged.

Update cargo #124684 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	2.4%	[0.2%, 19.1%]	83
Regressions ❌ (secondary)	1.6%	[0.2%, 5.7%]	92
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.4%	[0.2%, 19.1%]	83

syn (mostly check builds, but also a debug incr-unchanged and opt incr-unchanged) had regressions ranging from 7.24% all the way up to 19.11%.
The most plausible hypothesis is that this is due to an explosion in the number of warnings emitted for this benchmark. (According to Urgau's analysis, the number of warnings went from ~200 up to 1800.)
This means the code ends up becoming, at least in part, a benchmark of the lint machinery, regardless of whether that is our intent or not.
see also rustc-perf#1819 “Consider passing -Awarnings (or similar) to avoid false alarms from lint reporting”
marking as triaged.
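
To illustrate the kind of code that can start warning once cfg-checking is on, here is a minimal sketch. It assumes the extra warnings are `unexpected_cfgs` diagnostics (the log attributes the shift to cfg-checking but does not name the lint), and `hypothetical_fast_path` is an invented cfg name, not something taken from syn.

```rust
/// Reports which code path this build selected.
/// `hypothetical_fast_path` is an invented cfg used only for illustration.
/// Nothing declares it, so with cfg-checking enabled each use of the name
/// is expected to be reported by the `unexpected_cfgs` lint.
fn selected_path() -> &'static str {
    if cfg!(hypothetical_fast_path) {
        "fast"
    } else {
        "portable"
    }
}

/// Compiled only when the (undeclared) cfg is set, so it is normally absent.
#[cfg(hypothetical_fast_path)]
fn fast_only_helper() {}
```

The code compiles and behaves the same with or without the check; only the diagnostics change. That is why a crate with many such cfg uses can shift a check-build benchmark toward measuring warning emission rather than the compiler proper.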

Rollup of 3 pull requests #124784 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	5
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[0.2%, 0.4%]	5

all regressions were to syn, to various incr-unchanged and incr-patched:println scenarios.
current hypothesis is that this is due to PR #124742, which adds rustfmt to the well-known cfgs list.
that hypothesis implies that this is a (mostly-)false alarm, much like #124684.
marking as triaged
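
For context, a sketch of a common idiom that relies on the `rustfmt` cfg. Whether syn uses this exact pattern is not stated in the log, and `identity3` is a made-up example function.

```rust
/// A hand-aligned table that the author does not want reformatted.
/// Under rustc the `rustfmt` cfg is not set, so the `cfg_attr` below
/// expands to nothing. The idiom relies on rustfmt treating the cfg
/// as set and therefore honoring the `rustfmt::skip` request.
#[cfg_attr(rustfmt, rustfmt::skip)]
fn identity3() -> [[u8; 3]; 3] {
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]
}
```

With `rustfmt` on the well-known cfgs list, a use like this is no longer treated as an unexpected cfg name, so the set of diagnostics the compiler emits for such code changes even though the generated code does not.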

Improvements

Rollup of 10 pull requests #124646 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-2.8%, -0.2%]	24
Improvements ✅ (secondary)	-0.9%	[-1.6%, -0.3%]	9
All ❌✅ (primary)	-1.0%	[-2.8%, -0.2%]	24

the bulk of the improvements are to variations of html5ever and serde_derive.
skimming over the rollup list, I cannot identify an immediate root cause for the improvement, but for now will treat it as a happy accident.

Some hir cleanups #124401 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.1%	[-0.2%, -0.1%]	3
Improvements ✅ (secondary)	-1.1%	[-2.0%, -0.2%]	2
All ❌✅ (primary)	-0.1%	[-0.2%, -0.1%]	3

all improvements are to variations of typenum
the hir cleanups in question are largely to store AnonConst (e.g. for array lengths) in the HIR arena, and then move the ConstArg span over to AnonConst span instead.
inspection of typenum didn't show any particular cases that seemed like they would stress AnonConst; maybe the benefit comes more from the places where we now pass a span by value instead of passing a pointer to it.

Mixed

Account for immutably borrowed locals in MIR copy-prop and GVN #123602 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.9%]	10
Regressions ❌ (secondary)	0.8%	[0.2%, 2.6%]	4
Improvements ✅ (primary)	-0.5%	[-1.1%, -0.2%]	6
Improvements ✅ (secondary)	-0.5%	[-1.0%, -0.3%]	8
All ❌✅ (primary)	0.0%	[-1.1%, 0.9%]	16

html5ever opt-full regressed by 0.92%; libc in various incremental scenarios regressed by 0.30% to 0.39%.
the libc changes were anticipated in the perf build prior to merge; html5ever opt-full was not predicted there.
pnkfelix hypothesizes that this just reflects some extra work from the compiler attempting the copy-propagation and global-value-numbering MIR optimizations on a larger set of immutably-borrowed locals, and is acceptable given the expected benefits.
marking as triaged

Rollup of 8 pull requests #124703 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.2%, 0.6%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.0%	[-1.5%, -0.5%]	4
All ❌✅ (primary)	0.5%	[0.2%, 0.6%]	4

image opt-full regressed by 0.63%; html5ever debug-{incr-full,full} by ~0.5%; html5ever opt-incr-unchanged by 0.21%
already triaged by Kobzol, who hypothesizes that PR #124700 modified some inlining decisions.

Rollup of 4 pull requests #124716 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.3%	[0.3%, 0.5%]	6
Improvements ✅ (primary)	-0.8%	[-0.8%, -0.8%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.8%	[-0.8%, -0.8%]	1

all regressions are secondary (specifically on unused-warnings benchmark)
regression identified by Kobzol as caused by PR #124584 “Various improvements to entrypoint code”
seems like noise to pnkfelix
marked as triaged
