2023-09-13 Triage Log

An interesting week. We saw a massive improvement to instruction counts across over a hundred benchmarks, thanks to #110050, an improved encoding scheme for the dependency graphs that underlie incremental compilation. However, these instruction-count improvements did not translate into improvements in cycle counts. We also saw an improvement to our artifact sizes due to #115306. Beyond that, we had a scattering of small instruction-count regressions that were justified because they were associated with bug fixes.

Triage done by @pnkfelix. Revision range: 15e52b05..7e0261e7

Summary:

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [0.7%, 10.2%] | 11 |
| Regressions ❌ (secondary) | 1.5% | [0.4%, 7.7%] | 9 |
| Improvements ✅ (primary) | -1.7% | [-5.9%, -0.2%] | 112 |
| Improvements ✅ (secondary) | -1.3% | [-2.7%, -0.4%] | 41 |
| All ❌✅ (primary) | -1.3% | [-5.9%, 10.2%] | 123 |

3 Regressions, 2 Improvements, 5 Mixed; 2 of them in rollups
84 artifact comparisons made in total
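
The "All ❌✅ (primary)" rows in these tables behave like count-weighted averages of the primary regression and improvement rows: for the summary, (2.8% × 11 + (-1.7%) × 112) / 123 ≈ -1.3%, which matches the reported mean. The Rust sketch below recomputes that row and sorts a PR into Regression, Improvement, or Mixed depending on whether it has regressions, improvements, or both (primary and secondary combined), a rule consistent with how the PRs in this log are grouped; for example, #110050 lands in Mixed only because of its four secondary regressions. It assumes each mean is an arithmetic mean of per-benchmark percentage changes; `Row`, `combined_mean`, `classify`, and the `Neutral` category are hypothetical names, not rustc-perf's code. Since the table means are rounded to 0.1%, the recomputed value is only approximate.

```rust
/// One row of a perf summary table: mean percentage change and result count.
/// (Hypothetical type for illustration; not taken from rustc-perf.)
#[derive(Clone, Copy, Debug)]
struct Row {
    mean_pct: f64,
    count: u32,
}

/// Recompute an "All" row as the count-weighted average of a regression row
/// and an improvement row. Assumes both means are arithmetic means of
/// per-benchmark percentage changes.
fn combined_mean(regressions: Row, improvements: Row) -> Row {
    let count = regressions.count + improvements.count;
    if count == 0 {
        return Row { mean_pct: 0.0, count: 0 };
    }
    let total = regressions.mean_pct * regressions.count as f64
        + improvements.mean_pct * improvements.count as f64;
    Row { mean_pct: total / count as f64, count }
}

#[derive(Debug, PartialEq)]
enum Category {
    Regression,
    Improvement,
    Mixed,
    /// Hypothetical: no significant changes (not a category used in this log).
    Neutral,
}

/// Classify a PR by whether it has any regressions, any improvements, or both,
/// counting primary and secondary benchmarks together.
fn classify(regression_count: u32, improvement_count: u32) -> Category {
    match (regression_count > 0, improvement_count > 0) {
        (true, false) => Category::Regression,
        (false, true) => Category::Improvement,
        (true, true) => Category::Mixed,
        (false, false) => Category::Neutral,
    }
}

fn main() {
    // Primary regression and improvement rows from the summary table.
    let all = combined_mean(
        Row { mean_pct: 2.8, count: 11 },
        Row { mean_pct: -1.7, count: 112 },
    );
    println!("All (primary): {:.1}% over {} results", all.mean_pct, all.count);

    // #110050: 0 primary + 4 secondary regressions, 104 + 32 improvements.
    println!("{:?}", classify(4, 104 + 32));
}
```

Running it prints `All (primary): -1.3% over 123 results` followed by `Mixed`. The same arithmetic reproduces the mixed rows below, e.g. (3.9% × 1 + (-0.3%) × 2) / 3 = 1.1% for #115252.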

Regressions

Add FreezeLock type and use it to store Definitions #115401 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 11 |
| Regressions ❌ (secondary) | 0.3% | [0.3%, 0.3%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 11 |
  • The impact here is hypothesized to be due to a serial/parallel trade-off: we benchmark the serial case and observe a small regression, while the parallel case sees an improvement of roughly the same magnitude.
  • Marked as triaged

Rollup of 6 pull requests #115672 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.2% | [0.8%, 9.8%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.2% | [0.8%, 9.8%] | 5 |

Use the same DISubprogram for each instance of the same inlined function within a caller #115417 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.6%, 1.3%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [0.6%, 1.3%] | 3 |
  • already marked as triaged
  • the regression was expected, though we may be able to claw back some performance once rust#115455 is resolved

Improvements

Span tweaks #115594 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |
| Improvements ✅ (secondary) | -0.4% | [-0.5%, -0.3%] | 6 |
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |

Disentangle Debug and Display for Ty. #115661 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.2%] | 4 |
| Improvements ✅ (secondary) | -0.3% | [-0.5%, -0.2%] | 3 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.2%] | 4 |

Mixed

Represent MIR composite debuginfo as projections instead of aggregates #115252 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.9% | [3.9%, 3.9%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.3%] | 4 |
| All ❌✅ (primary) | 1.1% | [-0.3%, 3.9%] | 3 |
  • The single regression is to exa-0.10.1-opt-full
  • However, nnethercote noted that this PR introduced broad, if small, regressions to linked artifact (i.e., binary) sizes, in both opt and debug builds
  • not marking as triaged

Use a specialized varint + bitpacking scheme for DepGraph encoding #110050 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.5% | [0.3%, 0.8%] | 4 |
| Improvements ✅ (primary) | -1.7% | [-5.8%, -0.3%] | 104 |
| Improvements ✅ (secondary) | -1.4% | [-2.9%, -0.5%] | 32 |
| All ❌✅ (primary) | -1.7% | [-5.8%, -0.3%] | 104 |
  • on the surface, the improvements to instruction counts here clearly outweigh the regressions
  • it is worth noting that cycle counts did not follow the same trend: there were zero improvements and 7 primary regressions in cycle counts.
  • still, marking as triaged; this PR has gone through enough performance evaluation already.
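
The PR title names two general techniques, varint encoding and bit-packing. As a generic illustration only (this log does not describe the actual encoding in #110050, and the sketch below is not it), here is unsigned LEB128 varint encoding plus a hypothetical header that bit-packs a 4-bit tag and a 28-bit length into one `u32`. The point is that small integers, such as most edge indices in a dependency-graph node, take one to three bytes instead of a fixed four or eight. `write_varint`, `read_varint`, `pack_header`, `unpack_header`, and the tag/length layout are all assumptions made for this example.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint: 7 payload bits per
/// byte, with the high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one unsigned LEB128 varint from the start of `bytes`, returning the
/// value and the number of bytes consumed. Returns `None` if the input ends
/// mid-varint or the encoded value does not fit in a `u64`.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        let payload = u64::from(byte & 0x7f);
        // Reject payload bits that would be shifted past bit 63.
        if shift >= 64 || (shift > 0 && payload >> (64 - shift) != 0) {
            return None;
        }
        value |= payload << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Hypothetical header: bit-pack a 4-bit tag (e.g. a node kind) and a 28-bit
/// length into one u32, with the length in the high 28 bits.
fn pack_header(tag: u8, len: u32) -> u32 {
    assert!(tag < 16 && len < (1 << 28), "field out of range");
    (len << 4) | u32::from(tag)
}

/// Inverse of `pack_header`.
fn unpack_header(header: u32) -> (u8, u32) {
    ((header & 0xf) as u8, header >> 4)
}

fn main() {
    // A node with tag 2 and three edges, written as header + varint edges.
    let edges: [u64; 3] = [3, 300, 70_000];
    let mut buf = Vec::new();
    write_varint(u64::from(pack_header(2, edges.len() as u32)), &mut buf);
    for &edge in &edges {
        write_varint(edge, &mut buf);
    }
    // Compare against four fixed-width u32 fields (header + three edges).
    println!("{} bytes instead of {}", buf.len(), 4 * (1 + edges.len()));

    // Decode the header, then exactly `len` edges.
    let (header, mut pos) = read_varint(&buf).expect("header");
    let (tag, len) = unpack_header(header as u32);
    let mut decoded = Vec::new();
    for _ in 0..len {
        let (edge, used) = read_varint(&buf[pos..]).expect("edge");
        decoded.push(edge);
        pos += used;
    }
    assert_eq!((tag, decoded.as_slice()), (2, &edges[..]));
}
```

Running it prints `7 bytes instead of 16`: the header and the edge `3` take one byte each, `300` takes two, and `70000` takes three. The trade-off is extra shifting and branching per value on decode, in exchange for a smaller encoded graph.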

Rollup of 7 pull requests #115665 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.6%, 0.7%] | 2 |
| Regressions ❌ (secondary) | 0.6% | [0.5%, 0.7%] | 5 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | 0.7% | [0.6%, 0.7%] | 2 |
  • primary regressions were helloworld-check (incr-unchanged and incr-patched:println)
  • marking as triaged; not worth investigating a rollup for that benchmark.

Avoid a source_span query when encoding Spans into query results #115657 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.3%, 0.4%] | 2 |
| Regressions ❌ (secondary) | 0.7% | [0.4%, 1.0%] | 7 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 2 |
| Improvements ✅ (secondary) | -0.5% | [-0.6%, -0.4%] | 4 |
| All ❌✅ (primary) | -0.0% | [-0.4%, 0.4%] | 4 |
  • primary regressions are to diesel-check (full and incr-full).
  • This PR fixes a soundness issue in dep-graph maintenance, so these regressions seem tolerable.
  • Marking as triaged

Encode only MIR reachable from other crates #115306 (Comparison Link)

| (instructions:u) | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.8% | [0.3%, 2.4%] | 15 |
| Regressions ❌ (secondary) | 1.9% | [0.3%, 9.1%] | 7 |
| Improvements ✅ (primary) | -1.3% | [-2.7%, -0.4%] | 12 |
| Improvements ✅ (secondary) | -0.9% | [-1.2%, -0.7%] | 5 |
| All ❌✅ (primary) | -0.1% | [-2.7%, 2.4%] | 27 |
  • the big (>1%) primary regressions were to three check-incr-unchanged cases: cranelift-codegen-0.82.1, html5ever-0.26.0, and hyper-0.14.18
  • the regressions seem unfortunate, but tolerable given the improvement to linked artifact sizes
  • marking as triaged