Rolling out (or unrolling) struct field reorderings

Then, if we go this route, let’s just call it ReprLinear, and we are one parser modification away from allowing repr(linear).

I like the optimization fuel approach, honestly. I just think the default should be all, not none.

One more thing: how does allowing Rust to redorder fields impact the separate discussion about specifying drop order of struct fields?

I did not mean to imply it follows. I meant to imply it would make sense to me and would seem to fulfill a common need. Otherwise, if I wish to have two structs with identical layouts, I am forced to disable reordering and other optimizations, which seems unnecessary. But I'm not sure honestly what's the best setup here. I don't claim to have given this deep thought or gathered statistics or anything like that.

I don’t think it matters. You still have the necessary info and infrastructure to drop fields in any order, in so far as I understand how that works. Nothing should be connecting in-memory order to drop order short of a decision to actually do it.

This is the emergent behavior we have now.

That said, I don’t see why this is useful? What are the realistic needs for transmuting between different Rust structs in Rust, outside of the FFI context, or some other context where you need full control of the layout anyway? Short of unsafe things to do with extending lifetimes, of course.

I don’t think C++ makes this guarantee: fields are in order and in practice it works out this way, but I’m pretty sure nothing stops you adding extra padding to one or other of the “equivalent” structs as a compiler dev.

Sorry I’m late to the party, I just saw this thread.

To respond to Niko’s original message, I think it was not very useful to revert this particular change. At least for Servo the “damage” was done, we already had (with eddyb’s help) found the type that was missing #[repr(C)] and got a build to pass the test suite.

I do agree that future changes that can be similarly disruptive should be handled better. With an announcement around the time the change lands in Nightly, and a compiler flag. I agree with other commenters that the change should be enabled by default, with a flag to opt out. (Otherwise very few people/projects will actually try it.) An opt-out flag is still very useful, if only to confirm which compiler change caused a regression. Without it, we’re left with bisecting the compiler’s git history and spending a lot of time building all these compiler versions.

eddyb, I disagree with you that a flag in not necessary. Brute-forcing a work around with sed on $CARGO_HOME honestly sounds like a pain, and something that might be very easy to get wrong (so you might end up not testing what you think your testing, and draw incorrect conclusions). And there is no saying that the next disruptive compiler change will happen have a nice work-around with existing syntax like #[repr(C)].

As to randomizing the field order, this idea reminds me of rr’s chaos mode. It can be very a useful debugging tool, but I think it should absolutely not be enabled by default. Even if only in cargo’s “debug” mode, which often really is “take slightly less of forever to build” mode, not “chaos” aka “please do everything you can to break my program” mode.

1 Like

Wait, I don't understand your response.

The reason to revert the change, at least as I understand it, was not to address a Servo regression (or at least, that was not the sole reason).

The reason to revert the change was to ensure that it did not leak out onto the stable channel.

In other words, we didn't want to silently change the struct layout for clients who are on the stable channel, at least not without providing:

  1. better tools for diagnosing when issues related to the change arise, and
  2. so-called "public service announcements" advertising that the change is coming.

Am I misunderstanding something about what happened?

Ok, maybe I’m the one who misunderstood. I though this change was merged after the 1.15 beta was cut. Or are we also worried about 1.16?

What happened is that the PR merged literally hours before the 1.15 beta. This wasn’t the intent; as the author I was almost immediately horrified because this is the kind of work that probably has a major bug somewhere. Then Servo broke, Eddyb immediately started fixing servo, and I immediately began a reverting PR (which wasn’t hard) so that we could backport it if desired. Reverting the revert PR is removing one line and fixing up some tests.

I think that for testing purposes, the best way to catch bugs is to simply reverse the field order in all structs. This avoids issues where you might miss a struct if a random shuffle happens to give you the same layout you were previously getting.

3 Likes

I’d like to add a vote for keeping both debug and release builds deterministic by default. A “chaos”/“please break my code” flag is also something I’d love to see, just not as a default.

5 Likes

Here[0] is some Real World breakage that was “caused” (probably “uncovered” is a better term) by field reordering. My hope is that it can help guide the design of the future transmute-related lints that have been discussed in this thread.

In this case, xmas-elf calls into zero functions, which funnel into zero::read_unsafe and zero::read_array_unsafe which both transmute the input into the desired generic output type[1]. For this transmuting to be safe, the generic type must be #[repr(C)], so the lint would need to see through generics to have caught this bug.

Additionally, there is already a zero::Pod trait, which would ideally have a trait bound something like Pod: ReprLinear to leverage the type system to help enforce safety. Has such an idea been proposed already?

[0] https://github.com/nrc/xmas-elf/pull/34 [1] https://github.com/nrc/zero/blob/1d571c9e4e844df5703cad164cb097e75f2828c1/src/lib.rs#L111-L120

3 Likes

I think that a trait representing C types is an RFC, and kind of beyond the scope of this thread. ReprLinear also needs an RFC to be publicly exposed, and I doubt that it will be defined enough that you can usefully transmute with it. I’m not saying that a trait for types which are repr(C) is a bad idea, mind you.

I don’t think a transmute lint can ever see through generics. It’s a good point, but we would have to be really clever to introduce this in such a way that it doesn’t trigger on large amounts of already-okay code. If we had a repr(C) trait, fine, but right now a function like fn convert<T, U>(x: T) -> U is acceptable. We would have to trigger the lint on specific monomorphizations. I don’t know if that’s possible or not.

I see two trade-offs at play here, two of them are incompatible:

  • deterministic builds (no randomization),
  • increased security (randomization),

and one consequence of no randomization + stable sort of the fields:

  • control: no randomization + stable sort allows users to pack fields that are accessed together close to each other. This allows packing hot fields in the same cache line, or packing fields in different cache lines to avoid false sharing.

So in a nutshell, no randomization gives us deterministic builds, which make debugging easier, and more control, which might result in better performance when used correctly.

However, I cannot recall the last time I was debugging something in C++ and struct layout optimization made that harder, so while I think that determinism does in theory make debugging easier, I am not convinced that the improvement is worth it in practice. The same goes for more control. Users wanting control are going to be using #[repr(C)] anyways (which I assume is not going to reorder fields if randomization is enabled).

OTOH Rust already has ASLR enabled by default, and randomized struct layout does make exploiting Rust programs significantly harder.

For these reasons I think that randomization is a better default, the advantages of determinism and control are just not worth it.

Following these thoughts, I think that a good default would be:

  • debug builds: true randomization to detect programs that assume a specific order early,
  • release builds: optimized randomization without guarantees about stably sorting the fields (to force those who want to rely on some kind of ordering to use #[repr(C)],
  • better #[repr(C)]: diagnostics/warnings about padding/alignment/sub-optimal layouts that can be easily disabled, and maybe some language level constructs to control what goes into which cache line (e.g. crossbeam solutions to this are pragmatic, hacky, and non-portable).

These rules would make struct layout very easy to teach:

The layout of struct fields in Rust is undefined. If you need to make any assumptions about struct field layout use #[repr(C)].

An alternative to #[repr(C)] could be #[repr(manual)] or something like that (but this is bikeshedding).

1 Like

@nikomatsakis and I talked on IRC, and it looks like this is the approach we’re going to take. It will be a few days before I can open the PR, and I’ll probably need to lean on someone for it.

I plan to internally assume repr(optimize) by default, with repr(linear) being the one we actually choose to represent. Leaving the discussion of turning it on by default can probably wait until after the switch is in.

I’ll need to lean on someone for this: I have never ventured much beyond trans.

1 Like

I don’t think we can do random layouts in debug builds without significant effort because my understanding is that std is compiled in release mode always.

I know that we were discussing adding a flag that just turned everything off, and the problem with that ended up being that std could be compiled with it on. If the randomness isn’t stable, then it seems to me that a similar objection might arise here.

The problem I have with randomization by default is that if it ever does uncover a correctness or performance bug, that bug will probably be unfixable unless the randomization was deterministic/reproducible, i.e. there was an explicit seed value exposed to or provided by the user. While I would appreciate the ability to tell Rust to “generate ten random seeds and rank them by cargo bench results” or “do a release build with this seed after making sure it passes all tests”, having to talk about seed values by default seems like it’ll generate a lot of annoying noise/busywork during “normal development” for marginal gain. And I personally don’t think “it practically never happens” is a good enough counterargument when referring to bugs so subtle they only get uncovered by struct field reordering.

@camlorn

Can struct layout be controled in a crate-by-crate basis? Or does it need to be a “whole program” setting?

@Ixrec

FWIW what I meant is that optimizations that affect the layout of struct fields do uncover bugs, but while fixing those bugs the exact layout that the struct had did not really matter during debugging.

In particular, I cannot write down addresses of struct members anyways and expect them to be equal between runs due to ASLR, so a truly random order between builds (compilations) would not have make that particular bug harder to find/fix.

One thing worth mentioning is that a truly random debug layout prevents anybody from assuming any order and uncovers bugs. We can always go to a deterministic debug layout later if it is deemed worth it, but the reverse is not true without introducing breakage again.

In particular, the first attempt at fixing my bug was to manually sort the struct fields after alignment and still assume some layout (because I knew the fields were stably sorted and did not wanted to use reprC…).

If we guarantee some layout that can be inferred, some code somewhere is going to rely on it.

An alternative would be to fully guarantee struct layout, since optimal struct layout is a solved problem, we could guarantee that the struct fields are always stably sorted with decreasing alignment. That way all code can depend on it, and we can never break it. That might make Rust code more easily exploitable, but it is also an easy rule to teach and remember.

That only provides useful guarantees if the size and alignment of the fields is known, which is reasonable if the struct only contains basic integer types, but not in most other cases. For example, enum layout depends on optimizations which could be improved in the future, standard library types can change over time, and porting to new platforms can cause various changes. Thus depending on this guarantee would usually be at least somewhat risky; better to be more explicit about it.

Also, optimal struct layout isn't necessarily a solved problem. For one thing, the kinds of optimizations discussed in the "repr(transparent)" thread could also be applied automatically for fields the compiler can prove that code never takes a reference to (either for private fields/structs or, someday, perhaps even across crates). I don't really know whether this would be a good idea, since it creates nonobvious optimization hazards, but especially if it worked cross-crate, it could work well for certain common cases of space waste. Further, while space-optimality is easy to determine in lieu of such optimizations, time-optimality is another question. In principle the compiler could detect fields which are often accessed together and try to ensure they end up in the same cache line.

1 Like

Sorry I didn’t get back to this sooner; this thread and project is not forgotten, just misplaced by some stuff I’ve got going on.

We can control on a per-crate basis by introducing more reprs used internally only by the compiler, and adding them to types from the crate whose layout is to be controlled. This is what is being proposed with the -z struct-reordering flag above.

In the long run, how reprs work needs to be changed in order to make working with multiple reprs easier. I’ve promised to implement repr(pack=n) and repr(align=n), so this refactor will probably fall to me at some point (I don’t know who Windowsbunny is here, but if they’re reading, this wasn’t forgotten either). At that point, we just make more fields in the let’s call it the layout descriptor, and then that entire struct goes into metadata.

1 Like