2022-07-27 Triage Log

Overall it was a mostly good week, with some very significant wins among the secondary benchmarks. Rollups continue to complicate the triage process.

Triage done by @pnkfelix. Revision range: 8bd12e8c..50166d5e

Summary:

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.2%	3.2%	6
Improvements 🎉 (primary)	-1.8%	-21.2%	199
Improvements 🎉 (secondary)	-2.6%	-9.0%	124
All 😿🎉 (primary)	-1.8%	-21.2%	199

5 Regressions, 4 Improvements, 4 Mixed; 4 of them in rollups
61 artifact comparisons made in total

Regressions

Rollup of 9 pull requests #99520 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	2.0%	2.7%	4
Regressions 😿 (secondary)	1.3%	2.5%	29
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	2.0%	2.7%	4

Of the 4 primary regressions, 3 are helloworld check, regressing by 2.5% to 2.7% on various incr scenarios. The last is ripgrep check, but that only regressed by 0.36%.
From looking at the graph of helloworld-check over time, the regression to helloworld-check that was injected here was legitimate, as it plateaued up there for 4 or 5 days until it jumped back down due to PR #99677.
PR #99677 was put in to address regressions injected by PR #97786, which was rolled up in PR #98656. Looking at the data from that rollup, it appears that helloworld-check there also regressed by 2.6%; so it seems to me like the regression injected by #99520 is probably still persisting; its presence is just masked by the effect of PR #98656...
Perhaps the regression is coming from the following queries/functions: stability_implications, metadata_decode_entry_stability_implications, defined_lib_features, metadata_decode_entry_defined_lib_features, all of which are present in the new commit but not the base commit. Were all of those added as part of PRs in this rollup?
If the above queries are indeed to blame for the regression here, then I think that would be tied to PR #99212, “introduce implied_by in #[unstable] attribute”.
Not marking as triaged. I'm leaving the perf-regression marker in place until we at least confirm which PR was the cause; then we can better evaluate whether the regression is an acceptable price to pay.

move considering_regions to the infcx #99501 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.4%	0.4%	2
Regressions 😿 (secondary)	0.4%	0.5%	5
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.4%	0.4%	2

The secondary regressions were already anticipated by the PR reviewer. The primary regressions are both diesel and they look like blips in the data to me from the graph.
Marking as triaged.

Sync in portable-simd subtree #99491 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.5%	1.0%	11
Regressions 😿 (secondary)	0.8%	1.3%	20
Improvements 🎉 (primary)	-0.2%	-0.2%	1
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.4%	1.0%	12

All of the regressions here are on doc profiles. I don't think it's worth us spending time trying to figure out 1% regressions to rustdoc performance.
Marking as triaged.
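
The "All" row in the table above appears consistent with a count-weighted mean of the regression and improvement rows. A minimal sketch of that arithmetic, assuming that is how the row is derived; the function name combined_mean is illustrative only, and the inputs are the already-rounded table values, so the result is approximate:

```python
def combined_mean(groups):
    """Count-weighted mean of (mean_percent, count) pairs."""
    total = sum(count for _, count in groups)
    if total == 0:
        return None  # no results at all, i.e. an "N/A" row
    return sum(mean * count for mean, count in groups) / total


# Primary rows for #99491 as listed in the log: regressions, then improvements.
portable_simd_primary = [(0.5, 11), (-0.2, 1)]
print(round(combined_mean(portable_simd_primary), 1))  # prints 0.4
```

The same calculation applied to the primary rows for #99251 (0.2% over 3 and -0.4% over 7) gives roughly -0.2%, matching that table's "All" row as well.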

Fix hack that remaps env constness. #99521 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.5%	0.8%	7
Regressions 😿 (secondary)	0.6%	0.6%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.5%	0.8%	7

This regression was anticipated by the PR author and analyzed by the reviewer.
Marking as triaged.

Rollup of 8 pull requests #99792 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.5%	0.8%	9
Regressions 😿 (secondary)	1.8%	2.9%	6
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	0.5%	0.8%	9

Primary regressions were to clap (check full, check incr-full, and doc full), libc (doc full), hyper (check full, check incr-full, and doc full), image (doc full), and webrender (doc full).
The significance factor points mostly to the clap cases (with 4.13x, 3.25x, and 7.15x respectively to each of the scenarios I listed above for clap).
The detailed query data for clap check full indicates that the problem is mostly in metadata_decode_entry_item_attrs and visible_parent_map; those are the ones that had a significant time delta that end up explaining the overall time delta (0.003 + 0.003 > 0.005).
visible_parent_map slowdown may be due to PR #99698.
The slowdown to metadata_decode_entry_item_attrs may be due to PR #99712? Hard to say.
The secondary regressions are all to the projection-caching benchmark, which regressed by 1.2% to 2.9% in various scenarios. That regression seems to be due to a combination of both the metadata_decode_entry_item_attrs and visible_parent_map regressions, as well as a little bit more time spent in type_op_prove_predicate, evaluate_obligation, and normalize_projection_ty. Not sure why though; I don't think those got touched by this rollup. Maybe just different execution paths from the stdlib changes that did come in with this rollup?
Leaving comments on both the rollup PR and the two suspect PRs from the rollup. Not marking as triaged.
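
The clap check full reasoning above (0.003 + 0.003 > 0.005) can be written out as a small check: take the per-query time deltas, largest first, and stop once their running sum covers the overall delta. This is only a sketch of that reasoning, not how the perf site itself attributes time. The helper name queries_explaining is hypothetical, the unit of the deltas is not stated in the log, and only the two deltas quoted there are used:

```python
def queries_explaining(deltas, overall_delta):
    """Pick the largest per-query deltas until their sum covers overall_delta."""
    chosen, running = [], 0.0
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    for name, delta in ranked:
        if running >= overall_delta:
            break  # the queries chosen so far already account for the total
        chosen.append(name)
        running += delta
    return chosen, running


# The two deltas and the overall delta quoted in the log for clap check full.
deltas = {
    "metadata_decode_entry_item_attrs": 0.003,
    "visible_parent_map": 0.003,
}
chosen, running = queries_explaining(deltas, 0.005)
print(chosen, running >= 0.005)
```

With these inputs neither query alone covers the overall delta, but the two together do, which is why both are treated as suspects.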

Improvements

Revert "Rollup merge of #98582 - oli-obk:unconstrained_opaque_type, r… #99495 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.6%	-2.6%	136
Improvements 🎉 (secondary)	-1.0%	-5.5%	93
All 😿🎉 (primary)	-0.6%	-2.6%	136

Rollup of 7 pull requests #99506 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-1.4%	-20.7%	35
Improvements 🎉 (secondary)	-1.1%	-2.8%	19
All 😿🎉 (primary)	-1.4%	-20.7%	35

The -20.7% improvement was to webrender-2022 (check profile, incr-patched:println scenario).
Not quite sure which PR in the rollup yielded that kind of improvement. Maybe PR #99486 sidestepped some pathological string construction(s) and comparison(s) in webrender?
The primary benchmarks other than webrender all observed <1% improvement.

Tweak SubstFolder implementation #99600 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	1.6%	1.6%	1
Improvements 🎉 (primary)	-0.4%	-0.6%	22
Improvements 🎉 (secondary)	-1.6%	-3.6%	14
All 😿🎉 (primary)	-0.4%	-0.6%	22

Remove new allocations from imported_source_files #99677 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-1.5%	-9.9%	132
Improvements 🎉 (secondary)	-3.2%	-9.8%	77
All 😿🎉 (primary)	-1.5%	-9.9%	132

Mixed

Improve the function pointer docs #98180 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.2%	0.3%	3
Regressions 😿 (secondary)	0.4%	0.4%	8
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-1.2%	-1.2%	1
All 😿🎉 (primary)	0.2%	0.3%	3

The regressions above are all in doc generation, and they are all minor.
Marked as triaged.

Rollup of 11 pull requests #99567 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	0.5%	0.5%	1
Improvements 🎉 (primary)	-0.3%	-0.3%	4
Improvements 🎉 (secondary)	-0.7%	-1.0%	5
All 😿🎉 (primary)	-0.3%	-0.3%	4

Sole (small) regression was to secondary benchmark wg-grammar (doc full scenario), of 0.54%.
Not worth trying to tease that out of a rollup.

rustc_expand: Switch FxHashMap to FxIndexMap where iteration is used #99320 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	0.4%	0.4%	1
Improvements 🎉 (primary)	-1.1%	-1.8%	11
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	-1.1%	-1.8%	11

Sole (small) regression was to secondary benchmark tt-muncher (check incr-unchanged scenario), of 0.41%
Seems like a justifiable cost given that 11 primary benchmarks were improved by a mean -1.1%.

Upgrade indexmap and thorin-dwp to use hashbrown 0.12 #99251 (Comparison Link)

	mean	max	count
Regressions 😿 (primary)	0.2%	0.2%	3
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-0.4%	-0.5%	7
Improvements 🎉 (secondary)	-1.4%	-1.4%	2
All 😿🎉 (primary)	-0.2%	-0.5%	10

7 primary improvements, eight on diesel in check+opt full+incr-full profiles, in the range -0.31% to -0.47%; 3 primary regressions, two on diesel in debug+opt incr-unchanged, all roughly 0.23%.
The change here was in part motivated by a soundness fix. So the relatively small regression here is easily outweighed by the soundness fix (and the fact that there were more significant improvements to boot is icing on the cake).
Marking as triaged.
