2024-03-05 Triage Log

A bunch of noise this week which has been dropped from the report (but may be present in the summary figures). As a result, the week is pretty busy in amount of changes, but the net effect is nearly neutral to a slight regression for most workloads.

Triage done by @simulacrum. Revision range: 71ffdf7ff7ac6df5f9f64de7e780b8345797e8a0..41d97c8a5dea2731b0e56fe97cd7cb79e21cff79

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.6%	[0.1%, 2.0%]	136
Regressions ❌ (secondary)	0.8%	[0.2%, 2.6%]	78
Improvements ✅ (primary)	-0.6%	[-1.2%, -0.3%]	9
Improvements ✅ (secondary)	-0.6%	[-1.0%, -0.2%]	14
All ❌✅ (primary)	0.6%	[-1.2%, 2.0%]	145

2 Regressions, 0 Improvements, 10 Mixed; 4 of them in rollups 51 artifact comparisons made in total

Regressions

Weekly cargo update #112865 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.1%	[0.2%, 2.9%]	3
Regressions ❌ (secondary)	1.1%	[0.2%, 1.9%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.1%	[0.2%, 2.9%]	3

Doesn't appear to be entirely noise (though some of the delta is likely due to bimodality). However marking this regression as triaged since investigating is likely to be painful and the regressions are predominantly in secondary benchmarks.

Add a scheme for moving away from extern "rust-intrinsic" entirely #120675 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.6%	[0.2%, 1.0%]	93
Regressions ❌ (secondary)	0.8%	[0.2%, 2.4%]	28
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[0.2%, 1.0%]	93

This regression is being addressed in a followup PR: https://github.com/rust-lang/rust/pull/122010

Mixed

syms for legacy numeric constants diag items #121667 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.5%	[0.3%, 1.1%]	3
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.2%]	2
Improvements ✅ (secondary)	-0.8%	[-1.5%, -0.2%]	3
All ❌✅ (primary)	-0.3%	[-0.3%, -0.2%]	2

Regression seems likely to be noise (bimodality) or not significant enough to investigate further. Marking as triaged. Some of the improvements seem genuine.

Diagnostic renaming #121489 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.2%	[1.2%, 1.2%]	1
Improvements ✅ (primary)	-0.9%	[-0.9%, -0.8%]	4
Improvements ✅ (secondary)	-0.5%	[-0.8%, -0.4%]	9
All ❌✅ (primary)	-0.9%	[-0.9%, -0.8%]	4

Regression is essentially guaranteed to be bimodality, not a real change. Marking as triaged.

Rollup of 10 pull requests #121790 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.6%	[0.6%, 0.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.6%, -0.3%]	2
Improvements ✅ (secondary)	-1.9%	[-2.4%, -1.5%]	2
All ❌✅ (primary)	-0.1%	[-0.6%, 0.6%]	3

The single primary benchmark regression (Cargo) change looks to be both above any noise floor and does not show signs of reverting. However, it's not significant enough (given limit to just Cargo) to investigate further.

Rollup of 7 pull requests #121804 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.6%]	7
Regressions ❌ (secondary)	1.8%	[1.7%, 1.9%]	6
Improvements ✅ (primary)	-0.8%	[-0.9%, -0.8%]	4
Improvements ✅ (secondary)	-0.5%	[-0.8%, -0.2%]	8
All ❌✅ (primary)	-0.1%	[-0.9%, 0.6%]	11

Trying to track down the cause of the regression. Suspecting #121000 based on the regressed benchmarks.

Combine Sub and Equate #121462 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.5%	[0.2%, 0.9%]	13
Regressions ❌ (secondary)	0.5%	[0.3%, 0.8%]	17
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.5%	[-0.9%, -0.4%]	7
All ❌✅ (primary)	0.5%	[0.2%, 0.9%]	13

Future PRs may improve perf here, but this is targeting clean up and preparing for other refactoring first:

Some ideas how to reduce the perf impact. Even if they don't completely remove it, this will allow significant improvements and cleanups going forward and generally simplifies the core type system. I would merge this even with these regressions.

(https://github.com/rust-lang/rust/pull/121462#issuecomment-1973282815)

Rollup of 12 pull requests #121859 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.9%	[0.5%, 3.3%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.8%	[-1.1%, -0.5%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[-1.1%, 3.3%]	4

Cargo regression looks real but otherwise this looks like noise to me. I‘m going to mark as triaged, I don’t think it merits digging through individual PRs to try and isolate it.

Don‘t grab variances in TypeRelating relation if we’re invariant #121864 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	1
Improvements ✅ (primary)	-0.6%	[-0.8%, -0.2%]	11
Improvements ✅ (secondary)	-0.5%	[-0.6%, -0.4%]	8
All ❌✅ (primary)	-0.6%	[-0.8%, -0.2%]	11

Regression is likely real, but is in a secondary benchmark and incr-unchanged scenario. Not worth further investigation, particularly given the improvements.

Always generate GEP i8 / ptradd for struct offsets #121665 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	3
Regressions ❌ (secondary)	0.5%	[0.2%, 1.0%]	12
Improvements ✅ (primary)	-0.7%	[-1.6%, -0.3%]	14
Improvements ✅ (secondary)	-0.5%	[-1.0%, -0.3%]	12
All ❌✅ (primary)	-0.5%	[-1.6%, 0.4%]	17

Changes are expected since this changed a lot of our codegen. In practice though mostly not meaningful (i.e., some improvements, some regressions).

Rollup of 5 pull requests #121955 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.7%	[0.2%, 3.7%]	17
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.8%, -0.2%]	10
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[-0.8%, 3.7%]	27

Likely to be caused by a correctness fix: https://github.com/rust-lang/rust/pull/121955#issuecomment-1975993842

perf: improve write_fmt to handle simple strings #121001 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.7%	[0.5%, 3.6%]	4
Regressions ❌ (secondary)	0.3%	[0.3%, 0.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.3%	[-1.3%, -1.3%]	1
All ❌✅ (primary)	1.7%	[0.5%, 3.6%]	4

Improvements to runtime (or at least assembly quality), at the cost of a bit more time in LLVM.