Unexpected 3 x performance regression starting with Rust version 1.67

I do too! I've been calling it "move-only fields".


repr(packed) does forbid potentially unaligned references, but it doesn't stop you from taking pointers to it (addr_of!) and reading from them. It still doesn't overlap any fields.


I don't think that should apply to a hypothetical field alignment larger than that of the containing type


struct Foo(u8, [u8; 4]);
struct Bar(Foo, u32);

By sorting the array to the front bar.0.1 becomes a 4-aligned access.


Good point, I hadn't considered the effect on nested structures. Although that does only apply to code that accesses Foo as a field of Bar. For that case you could also get the same effect by placing the non-reordered Foo in bytes 3..8 or 7..12, but I can see how that would have its own issues like creating interior padding and just being more difficult to implement (For one, types would need to have a property for "a preferred unaligned offset relative to some larger alignment" ).

I did take a look at the layout code yesterday, thinking I might try what happens by clamping the field sorting key to the alignment of the containing type, but it looked like that isn't yet computed at the point where the ordering is done, so it wouldn't have been a sufficiently trivial change for me to test.

1 Like

I think that's kind of accidental, e.g. (Foo, u16, u32) will layout Foo on 2-byte alignment. Maybe we could save as preferred in layout.align().pref? But I'm not sure how else that gets used, and in this case we'd have size 5 and pref 4 for Foo, which is also weird to not be a multiple of the size.

I don't think there's a better strategy for (Foo, u16, u32) if we want to avoid interior padding. But communicating the preferred alignment might help with (u32, u8, Foo). Then it could bump Foo to the front of the align-1 group because that offset is still 4-aligned.

The layout optimization code is getting complex though and if mikebenfield finishes his multi-niche PR it'll need a refactor because sorting won't be able to find the best placement for all fields anymore.

I don't see how that's beneficial. Sounds like it would just encourage people to rely on incorrect assumptions. The worst kind of error is the one which happens very rarely in very important piece of code. Personally I would prefer if something like -Zrandomize-layout were the default, where the compiler would choose a random layout unless it has good reasons to prefer a specific one.


Reproducible build folks and benchmarking would become a lot harder (or at least need more docs). For example, one could not bisect over Rust releases finding perf regressions without remembering to turn this off (or potentially being led down false paths during the search).

1 Like

I do think there are some interesting options here. Like if the total field size of a repr(rust) type is 2n, we could try implicitly giving it an align of at least 2½n, so a wrapper on a [u8; 1024] field can be aligned-simd-copied.

(I haven't figured out the details for that, though, since there's lots of way that could also make things worse. Like struct Foo([u8; 63]); getting rounded up could be terrible if it's often used as Option<Foo> or (Blah, Foo). But maybe this post will inspire someone to have a better idea than what I could come up with.)

That's true, but it would also help to find hidden false assumptions which will break reproducible builds and benchmarks anyway, just like in the OP. Arguably proper benchmarking and reproducibility already requires controlling many other sources of randomness, so this shouldn't be a significant extra issue. With reproducible builds, you should pass a separate flag to rustc which would disable all randomness. With benchmarking you should arguably do the same, because stuff like different optimizer behaviour based on random heuristics, or even a different order of functions in the binary, can already cause unexpected changes in performance. You should either eliminate all randomness, or treat it as some extra noise.

This is why it also came with -Z layout-seed.

Benchmarking multiple times with different seeds sounds like a great way to find places where layout could be better.


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.