2022-05-31 Triage Log

A good week: the regressions were small, and some already have follow-up PRs in flight to address them; meanwhile we saw a big improvement from PR #97345, which adds more fast paths for quickly exiting comparisons between two types (such as BitsImpl<M> and BitsImpl<N> for const integers M and N). This improved compile-times for the bitmaps benchmark by 50-65% in some cases (including the trunk nalgebra, according to independent investigation from nnethercote). That same PR had more modest improvements (1% to 2%) to the compile-times for a number of other crates. Many thanks to lcnr and nnethercote for some excellent work here!
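To make the fast-path idea concrete, here is a minimal sketch of what a "deep" fast-reject check does; the SimpleTy enum and may_unify function below are made up for illustration and are not rustc's actual fast_reject data structures.

```rust
// Illustrative only: walk two simplified type trees and bail out as soon as a
// structural mismatch is found, e.g. differing const arguments such as
// BitsImpl<1> vs BitsImpl<1024>.
enum SimpleTy {
    Adt { name: &'static str, args: Vec<SimpleTy> },
    ConstInt(u128),
    Unknown, // anything we cannot decide cheaply (inference variables, etc.)
}

fn may_unify(a: &SimpleTy, b: &SimpleTy) -> bool {
    use SimpleTy::*;
    match (a, b) {
        // Never reject when either side is something we can't decide cheaply.
        (Unknown, _) | (_, Unknown) => true,
        // Two distinct concrete const arguments can never unify: reject early.
        (ConstInt(x), ConstInt(y)) => x == y,
        // Same ADT only if the names match and every argument could still match.
        (Adt { name: n1, args: a1 }, Adt { name: n2, args: a2 }) => {
            n1 == n2
                && a1.len() == a2.len()
                && a1.iter().zip(a2.iter()).all(|(x, y)| may_unify(x, y))
        }
        _ => false,
    }
}

fn main() {
    let bits_1 = SimpleTy::Adt { name: "BitsImpl", args: vec![SimpleTy::ConstInt(1)] };
    let bits_1024 = SimpleTy::Adt { name: "BitsImpl", args: vec![SimpleTy::ConstInt(1024)] };
    // BitsImpl<1> vs BitsImpl<1024> is rejected here, without ever invoking
    // the much more expensive full trait-solving machinery.
    assert!(!may_unify(&bits_1, &bits_1024));
}
```

The real routine has to stay conservative (it never rejects when an inference variable or other undecided component appears), but the early-exit shape is the same, and for benchmarks like bitmaps that involve very many such comparisons the savings add up.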

Triage done by @pnkfelix. Revision range: 43d9f385..0a43923a

Summary:

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 0.8% | 0.9% | 3 |
| Improvements πŸŽ‰ (primary) | -4.0% | -65.9% | 227 |
| Improvements πŸŽ‰ (secondary) | -2.0% | -7.7% | 217 |
| All πŸ˜ΏπŸŽ‰ (primary) | -4.0% | -65.9% | 227 |

3 Regressions, 1 Improvement, 9 Mixed; 0 of them in rollups. 59 artifact comparisons made in total.

Regressions

Proc macro tweaks #97004 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.2% | 0.2% | 1 |
| Regressions 😿 (secondary) | 0.6% | 1.5% | 10 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | N/A | N/A | 0 |
| All πŸ˜ΏπŸŽ‰ (primary) | 0.2% | 0.2% | 1 |
  • apparently, making Buffer<T> non-generic (it's solely instantiated at u8) caused a performance regression.
  • there is ongoing follow-up work to address the regression, either by making the buffer generic again or by adding #[inline] annotations (both options are sketched below).
  • see e.g. PR 97539
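As a rough sketch of the two follow-up options mentioned above (the type and method names here are hypothetical; the real Buffer lives in proc_macro's bridge and is more involved than this):

```rust
// Option 1: keep the buffer generic. Downstream crates monomorphize it, so the
// hot methods are codegen'd (and easily inlined) at the use site.
pub struct GenericBuffer<T> {
    data: Vec<T>,
}

impl<T> GenericBuffer<T> {
    pub fn push(&mut self, value: T) {
        self.data.push(value);
    }
}

// Option 2: keep the concrete u8-only buffer, but mark the hot methods
// #[inline] so they can still be inlined across the crate boundary without
// relying on LTO.
pub struct ByteBuffer {
    data: Vec<u8>,
}

impl ByteBuffer {
    #[inline]
    pub fn push(&mut self, byte: u8) {
        self.data.push(byte);
    }
}

fn main() {
    let mut g = GenericBuffer { data: Vec::new() };
    g.push(1u8);
    let mut b = ByteBuffer { data: Vec::new() };
    b.push(1);
}
```

Either way the presumed goal is the same: keep the tiny, hot buffer operations from turning into out-of-line calls on the proc-macro client side.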

rustdoc: include impl generics / self in search index #96652 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.5% | 1.0% | 3 |
| Regressions 😿 (secondary) | 1.1% | 1.1% | 3 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | N/A | N/A | 0 |
| All πŸ˜ΏπŸŽ‰ (primary) | 0.5% | 1.0% | 3 |
  • This regressed doc-generation for a few primary benchmarks, but I think that might be an inherent cost of a change like this. I marked it as triaged based on that assumption.

Move things to rustc_type_ir #97287 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.5% | 0.7% | 9 |
| Regressions 😿 (secondary) | 1.1% | 1.1% | 3 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | N/A | N/A | 0 |
| All πŸ˜ΏπŸŽ‰ (primary) | 0.5% | 0.7% | 9 |
  • During development, this PR was identified as regressing one primary benchmark, bitmaps (in a variety of contexts), and the PR author judged it better to take that perf hit and deal with it later.
  • In the PR that landed, the number of regressing variants was smaller, but the affected set of benchmarks slightly larger: both bitmaps and unicode-normalization are affected, regressing by 0.35% to 0.70%.
  • My very brief inspection of the flamegraphs (old, new) didn't show any smoking guns.
  • At this point I think we can just live with this performance hit.

Improvements

Add suggestion for relaxing static lifetime bounds on dyn trait impls in NLL #97284 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (primary) | -0.2% | -0.2% | 1 |
| Improvements πŸŽ‰ (secondary) | -5.3% | -5.9% | 6 |
| All πŸ˜ΏπŸŽ‰ (primary) | -0.2% | -0.2% | 1 |
  • This managed to improve performance (slightly), probably because it restructured the CallArgument and that had fallout that ended up being positive overall (despite our intuition being that it would hurt performance due to the increase in size).
  • It might be nice to confirm that hypothesis independently (by isolating just that structural change and confirming that it has a similar effect on performance here) ...
  • ... but the improvements are essentially isolated to just the secondary wg-grammar benchmark, so it's not really worth too much digging, unless we think it might reveal other structural changes we should make elsewhere.

Mixed

add a deep fast_reject routine #97345 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 0.9% | 1.6% | 10 |
| Improvements πŸŽ‰ (primary) | -11.0% | -65.8% | 45 |
| Improvements πŸŽ‰ (secondary) | -0.5% | -0.9% | 12 |
| All πŸ˜ΏπŸŽ‰ (primary) | -11.0% | -65.8% | 45 |
  • this was great, as noted in summary.

Move various checks to typeck so them failing causes the typeck result to get tainted #96046 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 0.9% | 1.1% | 4 |
| Improvements πŸŽ‰ (primary) | -0.3% | -0.4% | 4 |
| Improvements πŸŽ‰ (secondary) | -0.3% | -0.3% | 24 |
| All πŸ˜ΏπŸŽ‰ (primary) | -0.3% | -0.4% | 4 |
  • This had some small improvements to primary benchmarks.
  • The CTFE stress test regressed, but I assume that was expected, since this was a change to the CTFE engine (to address some ICEs).

Update jemalloc to v5.3 #96790 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 0.3% | 0.3% | 1 |
| Improvements πŸŽ‰ (primary) | -0.9% | -6.2% | 212 |
| Improvements πŸŽ‰ (secondary) | -1.1% | -3.2% | 191 |
| All πŸ˜ΏπŸŽ‰ (primary) | -0.9% | -6.2% | 212 |
  • This had various improvements to our primary benchmarks' instruction counts.
  • The other big thing to observe: the max-rss was improved in several primary benchmarks (by 1% to 6%), and some secondary benchmarks saw even more significant improvements to their max-rss.

Split dead store elimination off dest prop #97158 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.5% | 1.3% | 15 |
| Regressions 😿 (secondary) | 0.6% | 2.3% | 12 |
| Improvements πŸŽ‰ (primary) | -0.4% | -1.9% | 50 |
| Improvements πŸŽ‰ (secondary) | -0.6% | -1.3% | 33 |
| All πŸ˜ΏπŸŽ‰ (primary) | -0.2% | -1.9% | 65 |
  • The changes here were investigated by the PR author.
  • Marking as triaged based on their investigation.

Try to cache region_scope_tree as a query #97383 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 1.0% | 1.0% | 3 |
| Improvements πŸŽ‰ (primary) | -1.7% | -4.7% | 98 |
| Improvements πŸŽ‰ (secondary) | -2.4% | -7.7% | 43 |
| All πŸ˜ΏπŸŽ‰ (primary) | -1.7% | -4.7% | 98 |

proc_macro: don't pass a client-side function pointer through the server. #97461 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.5% | 0.5% | 2 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (primary) | -0.4% | -0.8% | 11 |
| Improvements πŸŽ‰ (secondary) | -2.6% | -5.4% | 10 |
| All πŸ˜ΏπŸŽ‰ (primary) | -0.3% | -0.8% | 13 |
  • On the primary benchmarks, this mostly yielded small improvements; the one exception was serde_derive (debug), which regressed by ~0.5% for the full and incr-full variants.
  • Looking at the flamegraphs for the before and after commits, and at the table at the bottom of the details page, it seems like the instruction-count regression is in codegen_module.
  • Looking at the history of serde_derive-debug (massively zoomed in on the “Percent Delta from First” graph kind), it seems reasonable to think that something did happen on this PR.
  • ... but I also don't really think it's a big enough regression to be worth tearing our hair out over. This is a (smallish) win overall, and even for serde_derive-debug it is a small regression in the context of much larger wins, so overall the trajectory is good.
  • Marked as triaged.

Replace #[default_method_body_is_const] with #[const_trait] #96964 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.2% | 0.3% | 9 |
| Regressions 😿 (secondary) | 0.2% | 0.3% | 3 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | -1.1% | -1.1% | 3 |
| All πŸ˜ΏπŸŽ‰ (primary) | 0.2% | 0.3% | 9 |
  • The primary regressions are mostly in variants of stm32f4, with two serde and one syn variant thrown in for good measure.
  • The improvements are solely to ctfe stress test, which I guess makes sense given the PR?
  • Anyway, the regressions seem minor, and they are contained to an unstable feature that the stdlib is using.
  • Marking as triaged.

improve format impl for literals #97480 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | 0.4% | 0.5% | 4 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | -1.5% | -1.5% | 1 |
| All πŸ˜ΏπŸŽ‰ (primary) | 0.4% | 0.5% | 4 |
  • This adds a fast path so that format!("literal") will compile into the same code as "literal".to_owned() (see the sketch at the end of this list).
  • The primary regression is solely contained to bitmaps.
  • It's possible that the regression to bitmaps is due to format!("literal") being totally unused in that code; all instances of format! there take an additional argument. So it's possible that the extra code to check whether to use the fast path is slowing things down there.
  • But I personally don't believe that explanation here: unless I'm misunderstanding the code, there is some amount of macro-expansion into multiple instances of format!, but most of the expanded code is going to be dominated by all the impl blocks, not the relatively few format! instances. (Unless I massively misunderstand how the macros and/or codegen and/or inlining end up linking up here.)
  • In other words, I don't really believe even my own best hypothesis for what is happening here.
  • But I also do not think the regression here is large enough to warrant further investigation.
  • Marking as triaged.
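For reference, the fast path described in the first bullet means the two functions below should now compile to essentially the same code:

```rust
// With the fast path, a format! call whose template is a plain literal (no
// arguments, no placeholders) boils down to the same thing as calling
// .to_owned() on the literal directly.
fn via_format() -> String {
    format!("hello world")
}

fn via_to_owned() -> String {
    "hello world".to_owned()
}

fn main() {
    assert_eq!(via_format(), via_to_owned());
}
```

As noted above, bitmaps only uses format! with additional arguments, so it never takes this fast path.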

errors: simplify referring to fluent attributes #97357 (Comparison Link)

|  | mean | max | count |
|:---:|:---:|:---:|:---:|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 0.4% | 0.6% | 9 |
| Improvements πŸŽ‰ (primary) | N/A | N/A | 0 |
| Improvements πŸŽ‰ (secondary) | -0.4% | -0.5% | 4 |
| All πŸ˜ΏπŸŽ‰ (primary) | N/A | N/A | 0 |
  • There were some regressions here, but they are few, minor, and contained solely to secondary benchmarks (specifically projection-caching and wg-grammar). Marking as triaged.

Untriaged Pull Requests