2022-09-20 Triage Log

This was a fairly negative week for compiler performance, with regressions overall up to 14% on some workloads (primarily incr-unchanged scenarios), largely caused by #101620. We are still chasing down either a revert or a fix for that regression, though a partial mitigation in #101862 has been applied. Hopefully the full fix or revert will be part of the next triage report.

We also saw a number of other regressions land, though most were much smaller in magnitude.

Triage done by @simulacrum. Revision range: 17cbdfd07178349d0a3cecb8e7dde8f915666ced..8fd6d03e22fba2930ad377b87299de6a37076074

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	3.8%	[0.3%, 14.0%]	148
Regressions ❌ (secondary)	4.3%	[0.3%, 23.6%]	141
Improvements ✅ (primary)	-6.4%	[-24.8%, -0.5%]	24
Improvements ✅ (secondary)	-2.1%	[-4.0%, -0.4%]	12
All ❌✅ (primary)	2.4%	[-24.8%, 14.0%]	172

1 Regressions, 2 Improvements, 6 Mixed; 2 of them in rollups 43 artifact comparisons made in total

Regressions

Cleanup slice sort related closures in core and alloc #101816 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.1%	[1.1%, 1.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.1%	[1.1%, 1.1%]	1

This regression occurred in a single benchmark and only in the incremental scenario, so it isn't worth follow up investigation at this time.

Improvements

Partially revert #101433 #101902 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.5%	[-0.7%, -0.4%]	6
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.6%]	1
All ❌✅ (primary)	-0.5%	[-0.7%, -0.4%]	6

Do not fetch HIR node when iterating to find lint. #101862 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.6%	[-2.3%, -0.2%]	24
Improvements ✅ (secondary)	-13.1%	[-39.1%, -3.2%]	6
All ❌✅ (primary)	-0.6%	[-2.3%, -0.2%]	24

Mixed

Simplify caching and storage for queries #101307 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.7%	[1.7%, 1.7%]	1
Improvements ✅ (primary)	-0.2%	[-0.3%, -0.2%]	7
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	3
All ❌✅ (primary)	-0.2%	[-0.3%, -0.2%]	7

This change is either neutral or a slight improvement; the one regression occurs in a benchmark and only in one scenario, which is also prone to noise, so further investigation isn't necessary.

Rollup of 6 pull requests #101805 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.4%	[0.4%, 0.5%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.5%	[-5.0%, -4.1%]	6
All ❌✅ (primary)	0.4%	[0.4%, 0.5%]	2

Using the newish rollup-introspection tooling, @nnethercote narrowed this down to #101433. Put a comment on that PR and marked it as a perf-regression.

Compute lint levels by definition #101620 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	4.1%	[0.4%, 14.5%]	133
Regressions ❌ (secondary)	5.2%	[0.2%, 64.1%]	131
Improvements ✅ (primary)	-5.1%	[-24.3%, -0.4%]	30
Improvements ✅ (secondary)	-1.0%	[-1.2%, -0.5%]	5
All ❌✅ (primary)	2.4%	[-24.3%, 14.5%]	163

Regression is partially addressed by #101862, but not completely. Week/week diff remains negative. Left a comment asking us to pursue the revert for the time being, since the suggested fix for this regression didn't recover performance fully.

Derive various impls instead of hand-rolling them #101858 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.2%	[0.2%, 2.2%]	11
Regressions ❌ (secondary)	0.9%	[0.3%, 2.1%]	18
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.7%	[-0.9%, -0.5%]	6
All ❌✅ (primary)	1.2%	[0.2%, 2.2%]	11

This was a fairly large regression in a number of benchmarks; #101893 is pursuing further investigation here and tracking either improvements or a possible revert. It seems like moving to the derive'd impls may have also been a (correct) behavior change in some cases so a straightforward revert may be undesirable.

Change FnMutDelegate to trait objects #101857 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.2%	[0.9%, 1.6%]	9
Improvements ✅ (primary)	-1.2%	[-1.2%, -1.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.2%	[-1.2%, -1.2%]	1

Regressions are limited to secondary benchmarks and stress tests at that. The goal here is an improvement in bootstrap times (measured at ~1.5% earlier, though the actual PR merge was during a window where that measurement was failing). That improvement is worth the slight loss on stress tests.

Rollup of 7 pull requests #101949 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	3
Regressions ❌ (secondary)	0.8%	[0.2%, 3.5%]	15
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.4%	[-1.6%, -1.3%]	6
All ❌✅ (primary)	0.2%	[0.2%, 0.2%]	3

Still working on identifying the causes. Likely to be a minor enough delta that spending too much more time won't make sense.