Unexpected 3 x performance regression starting with Rust version 1.67

scottmcm · April 25, 2023, 6:09am

I do too! I've been calling it "move-only fields".

Noratrieb · April 25, 2023, 7:45am

repr(packed) does forbid potentially unaligned references, but it doesn't stop you from taking pointers to it (addr_of!) and reading from them. It still doesn't overlap any fields.
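A minimal sketch of that point, assuming a hypothetical `Packed` struct (not from the thread) and using `repr(C, packed)` only to make the layout deterministic for the example. Taking a reference to `value` would be rejected by the compiler, but a raw pointer via `addr_of!` plus `read_unaligned` is fine:

```rust
use std::mem::size_of;
use std::ptr::addr_of;

// Hypothetical example type; `repr(C, packed)` is an assumption made here
// so that the field order and offsets are predictable.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32, // sits at offset 1, so it is usually misaligned
}

/// Reads the possibly unaligned `value` field without ever creating a reference to it.
fn read_value(p: &Packed) -> u32 {
    // SAFETY: the pointer is derived from a valid `&Packed`, and
    // `read_unaligned` does not require the pointer to be aligned.
    unsafe { addr_of!((*p).value).read_unaligned() }
}

fn main() {
    let p = Packed { tag: 1, value: 0xDEAD_BEEF };
    // The fields are packed back to back (1 + 4 bytes) and still do not overlap.
    println!("size_of::<Packed>() = {}", size_of::<Packed>());
    println!("value = {:#x}", read_value(&p));
}
```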

the8472 · April 25, 2023, 9:35am

I don't think that should apply to a hypothetical field alignment larger than that of the containing type

Consider

struct Foo(u8, [u8; 4]);
struct Bar(Foo, u32);

By sorting the array to the front, bar.0.1 becomes a 4-aligned access.
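One way to observe this on a given toolchain is to measure where the array actually lands. This is only a sketch: `nested_array_offset` is a helper name made up for illustration, and because the default representation leaves field order unspecified, the printed offset may differ between compiler versions.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct Foo(u8, [u8; 4]);
#[allow(dead_code)]
struct Bar(Foo, u32);

/// Byte offset of `bar.0.1` from the start of `bar` (hypothetical helper).
fn nested_array_offset(bar: &Bar) -> usize {
    let base = bar as *const Bar as usize;
    let array = (bar.0).1.as_ptr() as usize;
    array - base
}

fn main() {
    let bar = Bar(Foo(1, [2, 3, 4, 5]), 6);
    let offset = nested_array_offset(&bar);
    println!("size_of::<Foo>() = {}", size_of::<Foo>());
    println!("align_of::<Bar>() = {}", align_of::<Bar>());
    // Whether this is true depends on the field order the compiler picked.
    println!("offset of bar.0.1 = {offset}, 4-aligned: {}", offset % 4 == 0);
}
```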

quaternic · April 25, 2023, 1:40pm

Good point, I hadn't considered the effect on nested structures. Although that does only apply to code that accesses Foo as a field of Bar. For that case you could also get the same effect by placing the non-reordered Foo in bytes 3..8 or 7..12, but I can see how that would have its own issues, like creating interior padding and just being more difficult to implement (for one, types would need to have a property for "a preferred unaligned offset relative to some larger alignment").

I did take a look at the layout code yesterday, thinking I might try what happens by clamping the field sorting key to the alignment of the containing type, but it looked like that isn't yet computed at the point where the ordering is done, so it wouldn't have been a sufficiently trivial change for me to test.

cuviper · April 25, 2023, 11:16pm

I think that's kind of accidental, e.g. (Foo, u16, u32) will lay out Foo on 2-byte alignment. Maybe we could save it as the preferred alignment in layout.align().pref? But I'm not sure how else that gets used, and in this case we'd have size 5 and pref 4 for Foo, which is also weird because the size wouldn't be a multiple of the alignment.

the8472 · April 26, 2023, 9:10am

I don't think there's a better strategy for (Foo, u16, u32) if we want to avoid interior padding. But communicating the preferred alignment might help with (u32, u8, Foo). Then it could bump Foo to the front of the align-1 group because that offset is still 4-aligned.
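The arithmetic for the (u32, u8, Foo) case can be sketched with a toy model. This is not how rustc's layout code works; `pack_offsets` is a hypothetical helper that just places align-1 fields back to back after the u32, to show why the order inside the align-1 group matters:

```rust
/// Hypothetical helper: places fields back to back starting at `start` and
/// returns each field's offset. Only meaningful for align-1 fields.
fn pack_offsets(start: usize, sizes: &[usize]) -> Vec<usize> {
    let mut next = start;
    sizes
        .iter()
        .map(|&size| {
            let offset = next;
            next += size;
            offset
        })
        .collect()
}

fn main() {
    const U8_SIZE: usize = 1;
    const FOO_SIZE: usize = 5; // Foo(u8, [u8; 4])

    // The u32 occupies bytes 0..4, so the align-1 group starts at offset 4.
    let u8_first = pack_offsets(4, &[U8_SIZE, FOO_SIZE]);
    let foo_first = pack_offsets(4, &[FOO_SIZE, U8_SIZE]);

    // With the u8 first, Foo starts at 5; with Foo first, it starts at 4,
    // which is still 4-aligned relative to the containing type.
    println!("u8 first:  Foo at offset {}", u8_first[1]);
    println!("Foo first: Foo at offset {}", foo_first[0]);
}
```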

The layout optimization code is getting complex, though, and if mikebenfield finishes his multi-niche PR it'll need a refactor, because sorting won't be able to find the best placement for all fields anymore.

afetisov · April 26, 2023, 10:38am

I don't see how that's beneficial. Sounds like it would just encourage people to rely on incorrect assumptions. The worst kind of error is the one which happens very rarely in a very important piece of code. Personally I would prefer if something like -Zrandomize-layout were the default, where the compiler would choose a random layout unless it has good reasons to prefer a specific one.

mathstuf · April 26, 2023, 4:59pm

Reproducible builds and benchmarking would become a lot harder (or at least need more docs). For example, one could not bisect over Rust releases to find perf regressions without remembering to turn this off (or potentially being led down false paths during the search).

scottmcm · April 26, 2023, 5:19pm

I do think there are some interesting options here. Like if the total field size of a repr(rust) type is 2ⁿ, we could try implicitly giving it an align of at least 2^(n/2), so a wrapper on a [u8; 1024] field can be aligned-simd-copied.

(I haven't figured out the details for that, though, since there are lots of ways that could also make things worse. Like struct Foo([u8; 63]); getting rounded up could be terrible if it's often used as Option<Foo> or (Blah, Foo). But maybe this post will inspire someone to have a better idea than what I could come up with.)
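A rough sketch of both halves of that idea. Everything named here is an assumption for illustration: `suggested_align` is a made-up helper (it rounds n/2 down for odd n, which the post does not specify), and `Rounded` uses an explicit `repr(align(64))` to stand in for what an implicit round-up might do:

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct Plain([u8; 63]);

// Stand-in for a hypothetical implicit round-up of the alignment.
#[allow(dead_code)]
#[repr(align(64))]
struct Rounded([u8; 63]);

/// Hypothetical rule: for a total field size of 2^n, suggest an alignment
/// of 2^(n/2) (n/2 rounded down, an assumption); otherwise suggest nothing.
fn suggested_align(total_field_size: usize) -> Option<usize> {
    if total_field_size.is_power_of_two() {
        let n = total_field_size.trailing_zeros();
        Some(1usize << (n / 2))
    } else {
        None
    }
}

fn main() {
    println!("suggested align for 1024 bytes: {:?}", suggested_align(1024));
    println!("Plain:   size {}, align {}", size_of::<Plain>(), align_of::<Plain>());
    println!("Rounded: size {}, align {}", size_of::<Rounded>(), align_of::<Rounded>());
    // The Option discriminant has to go somewhere, so the over-aligned
    // version pays a whole extra alignment unit for it.
    println!("Option<Plain>:   size {}", size_of::<Option<Plain>>());
    println!("Option<Rounded>: size {}", size_of::<Option<Rounded>>());
}
```

Comparing the two `Option` sizes on your own toolchain shows the cost being worried about here.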

afetisov · April 26, 2023, 5:27pm

That's true, but it would also help to find hidden false assumptions which will break reproducible builds and benchmarks anyway, just like in the OP. Arguably proper benchmarking and reproducibility already requires controlling many other sources of randomness, so this shouldn't be a significant extra issue. With reproducible builds, you should pass a separate flag to rustc which would disable all randomness. With benchmarking you should arguably do the same, because stuff like different optimizer behaviour based on random heuristics, or even a different order of functions in the binary, can already cause unexpected changes in performance. You should either eliminate all randomness, or treat it as some extra noise.

scottmcm · April 26, 2023, 8:12pm

This is why it also came with -Z layout-seed.

Benchmarking multiple times with different seeds sounds like a great way to find places where layout could be better.
