2023-09-13 Triage Log

An interesting week. We saw a massive improvement to instruction counts across over a hundred benchmarks, thanks to #110050, an improved encoding scheme for the dependency graphs that underlie incremental compilation. However, these instruction-count improvements did not translate directly into cycle-time improvements. We also saw an improvement to our artifact sizes due to #115306. Beyond that, we had a scattering of small instruction-count regressions that were justified because they were associated with bug fixes.

Triage done by @pnkfelix. Revision range: 15e52b05..7e0261e7

Summary:

(instructions:u)	mean	range	count
Regressions ❌ (primary)	2.8%	[0.7%, 10.2%]	11
Regressions ❌ (secondary)	1.5%	[0.4%, 7.7%]	9
Improvements ✅ (primary)	-1.7%	[-5.9%, -0.2%]	112
Improvements ✅ (secondary)	-1.3%	[-2.7%, -0.4%]	41
All ❌✅ (primary)	-1.3%	[-5.9%, 10.2%]	123
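
As a sanity check on how to read the table, the "All (primary)" mean appears consistent with a count-weighted average of the primary regression and improvement rows. The sketch below assumes that interpretation (it is not confirmed by this log), and the helper name `combined_mean` is hypothetical. Because the published means are already rounded, the result only matches to the displayed precision.

```python
def combined_mean(groups):
    """Count-weighted mean of (mean_percent, count) pairs.

    Assumption: the "All" row is a weighted average of the
    regression and improvement rows; the log does not state this.
    """
    total = sum(count for _, count in groups)
    return sum(mean * count for mean, count in groups) / total


# Primary rows from the summary table above: (mean %, count).
regressions = (2.8, 11)
improvements = (-1.7, 112)

overall = combined_mean([regressions, improvements])
print(f"{overall:.1f}%")  # prints "-1.3%"
```

The same arithmetic can be applied to any of the per-PR tables in this log to see how much a single large regression pulls the combined mean.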

3 Regressions, 2 Improvements, 5 Mixed; 2 of them in rollups
84 artifact comparisons made in total

Regressions

Add FreezeLock type and use it to store Definitions #115401 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	11
Regressions ❌ (secondary)	0.3%	[0.3%, 0.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[0.2%, 0.4%]	11

The impact here is hypothesized to be due to a serial/parallel trade-off: we benchmark the serial case and observe a small regression, while the parallel case sees an improvement of roughly the same magnitude.
Marked as triaged

Rollup of 6 pull requests #115672 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	4.2%	[0.8%, 9.8%]	5
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	4.2%	[0.8%, 9.8%]	5

already marked as triaged
all five regressions are to doc benchmarks, due to a new feature added in https://github.com/rust-lang/rust/pull/115201

Use the same DISubprogram for each instance of the same inlined function within a caller #115417 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	1.0%	[0.6%, 1.3%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[0.6%, 1.3%]	3

already marked as triaged
regression was expected, though we may be able to claw back performance after resolving rust#115455

Improvements

Span tweaks #115594 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.4%]	1
Improvements ✅ (secondary)	-0.4%	[-0.5%, -0.3%]	6
All ❌✅ (primary)	-0.4%	[-0.4%, -0.4%]	1

Disentangle Debug and Display for Ty. #115661 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.2%]	4
Improvements ✅ (secondary)	-0.3%	[-0.5%, -0.2%]	3
All ❌✅ (primary)	-0.3%	[-0.3%, -0.2%]	4

Mixed

Represent MIR composite debuginfo as projections instead of aggregates #115252 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	3.9%	[3.9%, 3.9%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	2
Improvements ✅ (secondary)	-0.4%	[-0.4%, -0.3%]	4
All ❌✅ (primary)	1.1%	[-0.3%, 3.9%]	3

The single regression is to exa-0.10.1-opt-full
However, nnethercote noted that this PR introduced broad (if small) regressions to linked artifact (aka binary) sizes (in both opt and debug settings)
not marking as triaged

Use a specialized varint + bitpacking scheme for DepGraph encoding #110050 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.5%	[0.3%, 0.8%]	4
Improvements ✅ (primary)	-1.7%	[-5.8%, -0.3%]	104
Improvements ✅ (secondary)	-1.4%	[-2.9%, -0.5%]	32
All ❌✅ (primary)	-1.7%	[-5.8%, -0.3%]	104

on its surface, the improvements to instruction counts here clearly outweigh the regressions
it is worth noting that the cycle counts did not see the same trends; there were zero improvements and 7 primary regressions to cycle counts.
still, marking as triaged; this PR has gone through enough performance evaluation already.

Rollup of 7 pull requests #115665 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.7%	[0.6%, 0.7%]	2
Regressions ❌ (secondary)	0.6%	[0.5%, 0.7%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	0.7%	[0.6%, 0.7%]	2

primary regressions were helloworld-check (incr-unchanged and incr-patched:println)
marking as triaged; not worth investigating a rollup for that benchmark.

Avoid a source_span query when encoding Spans into query results #115657 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.4%	[0.3%, 0.4%]	2
Regressions ❌ (secondary)	0.7%	[0.4%, 1.0%]	7
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.4%]	2
Improvements ✅ (secondary)	-0.5%	[-0.6%, -0.4%]	4
All ❌✅ (primary)	-0.0%	[-0.4%, 0.4%]	4

primary regressions are to diesel-check (full and incr-full).
This is fixing a soundness issue with the dep-graph maintenance; therefore, these regressions seem tolerable.
Marking as triaged

Encode only MIR reachable from other crates #115306 (Comparison Link)

(instructions:u)	mean	range	count
Regressions ❌ (primary)	0.8%	[0.3%, 2.4%]	15
Regressions ❌ (secondary)	1.9%	[0.3%, 9.1%]	7
Improvements ✅ (primary)	-1.3%	[-2.7%, -0.4%]	12
Improvements ✅ (secondary)	-0.9%	[-1.2%, -0.7%]	5
All ❌✅ (primary)	-0.1%	[-2.7%, 2.4%]	27

the big (>1%) primary regressions were to three check-incr-unchanged cases: cranelift-codegen-0.82.1, html5ever-0.26.0, and hyper-0.14.18
the regressions seem unfortunate, but tolerable given the improvement to linked artifact sizes
marking as triaged