Discussion: Editions in Rust-GCC (and other Rust compilers)

I think there are other options too:

Basically, I think it's a win that people have to say if they're relying on something. That way we can do things like field randomization on unannotated structs, as well as various other tricks that haven't even been invented yet, on structs that don't need such rigid rules.
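For comparison, here's roughly what "saying you rely on something" looks like with today's attributes. This is a minimal sketch with made-up struct names: #[repr(C)] pins the field order and padding, so its offsets can be asserted, while the unannotated struct is left to the compiler, and its printed size is only whatever the current rustc happens to choose.

```rust
use std::mem::{offset_of, size_of};

// Opted in: field order and padding follow the C rules, so other code
// (or another compiler) may rely on these offsets.
#[allow(dead_code)]
#[repr(C)]
struct Annotated {
    a: u8,
    b: u32,
    c: u8,
}

// Not opted in: rustc may reorder (or, with -Zrandomize-layout, shuffle)
// these fields, so nothing about this layout is guaranteed.
#[allow(dead_code)]
struct Unannotated {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Guaranteed by repr(C): a at 0, b padded up to 4, c at 8, total size 12.
    assert_eq!(offset_of!(Annotated, a), 0);
    assert_eq!(offset_of!(Annotated, b), 4);
    assert_eq!(offset_of!(Annotated, c), 8);
    assert_eq!(size_of::<Annotated>(), 12);

    // Observable but unspecified: whatever layout the current compiler picked.
    println!(
        "Unannotated: size {}, b at offset {}",
        size_of::<Unannotated>(),
        offset_of!(Unannotated, b)
    );
}
```

With nightly's -Zrandomize-layout, the second line of output may differ from a default build, while the assertions on the #[repr(C)] struct keep holding. That asymmetry is the freedom being argued for.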

@Nemo157, @josh, @scottmcm, I like all of those ideas! If the representation could also be designed to be forwards/backwards compatible[1], then successive Rust specs could build on earlier specs. Given enough time (Rust 2075 maybe?) we might know what the layout spec is supposed to be, and we (or our successors) could nail it down at that point.

Oh, and here's my take on the naming convention for the layout: #[repr(InteroperableRust_2027)]. The suffix is the edition year, which should let us keep on changing the spec, but only on edition boundaries.


  1. When I say 'forwards compatible', it doesn't have to be anything particularly clever; it could be something as simple as the linker emitting an error saying that you can't link version A.B.C objects to version X.Y.Z objects.
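A sketch of that "simple" check, assuming a hypothetical major.minor.patch stamp on each object file and the assumed policy that only a difference in the major component is a hard link error (the footnote itself only asks for an error on incompatible versions; ReprVersion and check_link are made-up names):

```rust
use std::fmt;

/// Hypothetical version stamp a compiler could attach to each object file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ReprVersion {
    major: u32,
    minor: u32,
    patch: u32,
}

impl fmt::Display for ReprVersion {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

/// Assumed policy: objects link only if their major versions match.
fn check_link(a: ReprVersion, b: ReprVersion) -> Result<(), String> {
    if a.major == b.major {
        Ok(())
    } else {
        Err(format!("cannot link version {a} objects with version {b} objects"))
    }
}

fn main() {
    let v1_2 = ReprVersion { major: 1, minor: 2, patch: 0 };
    let v1_4 = ReprVersion { major: 1, minor: 4, patch: 1 };
    let v2_0 = ReprVersion { major: 2, minor: 0, patch: 0 };

    assert!(check_link(v1_2, v1_4).is_ok());
    // Prints: error: cannot link version 1.2.0 objects with version 2.0.0 objects
    if let Err(e) = check_link(v1_2, v2_0) {
        println!("error: {e}");
    }
}
```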

I'd point out that even though they aren't required to support every standard, basically all implementations of C (from C89 all the way to C18) and C++ (C++98 through C++20) do so. I think it highly likely that editions will get the same treatment.

Is there any reason why one should couple the repr to editions at all? It is not technically needed, and at least as I understand it, older editions are merely code-level compatibility modes and are skin deep, meaning that later compiler stages should preferably not need to know about them at all.

The main reason is that it allows crate authors to opt in to, rather than opt out of, transitions in representations. If I have a crate that uses edition=2021 in its Cargo.toml, the compiler will know that it won't understand #[repr(InteroperableRust_2027)], and will do the right thing in that case[1]. On the other hand, if I update my crate's Cargo.toml to edition=2027, then I'm telling the compiler that it's fine to assume that #[repr(Rust)] can be #[repr(InteroperableRust_2027)], if that appears to be the best representation to use in a given case.

In between editions, we can introduce something like #[repr(InteroperableRust_2027)], but it would probably have to live on nightly until the next edition boundary.


  1. What the 'right' thing is will need to be discussed and agreed to by the community. Warnings, errors, compiler generated glue code, etc., etc., etc., are all possibilities, and are all beyond the scope of this topic.
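To make the opt-in mechanism concrete, here's a sketch of the decision the compiler would make under this proposal. It assumes a hypothetical 2027 edition and treats an explicit request on an older edition as an error, which is only one of the options the footnote leaves open. None of these functions exist in rustc.

```rust
/// Editions as an ordered enum; E2027 is a hypothetical future edition.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Edition {
    E2015,
    E2018,
    E2021,
    E2027,
}

const PRE_2027: &[&str] = &["Rust"];
const FROM_2027: &[&str] = &["Rust", "InteroperableRust_2027"];

/// Layouts the compiler may choose for a plain #[repr(Rust)] struct: crates
/// on an older edition never get the new layout unless they opt in.
fn allowed_layouts(edition: Edition) -> &'static [&'static str] {
    if edition >= Edition::E2027 { FROM_2027 } else { PRE_2027 }
}

/// An explicit request for a year-suffixed repr is rejected on editions that
/// predate it (an assumed choice; warnings or glue code are alternatives).
fn check_explicit(edition: Edition, repr: &str) -> Result<(), String> {
    if repr == "InteroperableRust_2027" && edition < Edition::E2027 {
        Err(format!("repr({repr}) needs edition 2027, but the crate uses {edition:?}"))
    } else {
        Ok(())
    }
}

fn main() {
    for ed in [Edition::E2021, Edition::E2027] {
        println!("{ed:?}: may pick {:?}", allowed_layouts(ed));
        println!("  explicit request: {:?}", check_explicit(ed, "InteroperableRust_2027"));
    }
}
```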

I don't think repr(Rust) should be tied to edition. The #[repr(InteroperableRust_2027)] would be fine, though.

I just realized that I was misunderstanding what you and @nacaclanga might have been saying. Just to be 100% clear, what both of you are saying is that #[repr(Rust)] should not be tied to any edition boundary, but that representations like #[repr(InteroperableRust_2027)] should be, right? If so, I agree with you.

Based on the discussions earlier, my feelings are (now) that #[repr(Rust)] shouldn't be specified (no spec) beyond the spec stating that relying on any given layout or representation in #[repr(Rust)] can lead to UB. Compilers are free to lay out #[repr(Rust)] as they see fit to optimize whatever it is they think needs to be optimized. However, #[repr(InteroperableRust_2027)] and other similar representations will be fully specified, and compilers are required to adhere strictly to the spec when users ask that something be laid out using #[repr(InteroperableRust_2027)] (or similar).

The only slightly strange thing I can see is if you want to override the compiler and force everything to be laid out using some interoperable form. Maybe something like #![repr(InteroperableRust_2027)] in lib.rs or main.rs should be a thing?

That is what I'm saying, yes.

I'll note that if it's named that way, it could be supported on all editions.

Even if it were named repr(InteroperableRust), that'd essentially be lowered to repr(InteroperableRustYear) in the compiler anyway.

I do think we should have some very small guarantees about repr(Rust), such as that (T1, T2, T3) and S(T1, T2, T3) and S { f1: T1, f2: T2, f3: T3 } have the same layout and can be transmuted between. But otherwise, yes.

This makes -Zrandomize-layout much more difficult, and precludes PGO field reordering. This isn't necessarily a bad thing, and being able to transmute Container<T> to Container<U> so long as used associated types are also field-order-equivalent is a big benefit, but it's important to keep in mind.

Then there's also the question of whether the field types need to be equivalent, or if transmute-compatible is enough. What about #[repr(transparent)] fields? Unless there's a more clever solution I'm missing, -Zrandomize-layout would basically be restricted to a seeded deterministic fn([Layout; N]) -> (Layout, [Offset; N]) rather than being able to produce a wider variety of permuted layouts.
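Here's a rough sketch of what such a restricted randomizer could look like. It is hypothetical, not rustc's implementation: the shuffle is driven only by the seed and the fields' sizes and alignments, so any two types with the same field layouts get the same permutation, which is what would keep tuples, tuple structs, and named structs transmute-compatible. Offsets are plain usize here, and the FNV-style mixing plus a xorshift64 generator feeding a Fisher-Yates shuffle were chosen only for brevity.

```rust
use std::alloc::Layout;

/// Hypothetical seeded randomizer: fn([Layout; N]) -> (Layout, [offset; N]).
/// The permutation depends only on the seed and the field layouts.
fn randomized_layout<const N: usize>(seed: u64, fields: [Layout; N]) -> (Layout, [usize; N]) {
    // Mix the seed with every field's size and alignment (FNV-style constants).
    let mut state: u64 = 0xcbf2_9ce4_8422_2325 ^ seed;
    for f in &fields {
        for v in [f.size() as u64, f.align() as u64] {
            state ^= v;
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    state |= 1; // xorshift would be stuck forever at zero

    // Fisher-Yates shuffle of field indices, driven by xorshift64.
    let mut order: [usize; N] = std::array::from_fn(|i| i);
    for i in (1..N).rev() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let j = (state % (i as u64 + 1)) as usize;
        order.swap(i, j);
    }

    // Lay the fields out in shuffled order, padding each to its alignment.
    let mut offsets = [0usize; N];
    let mut whole = Layout::from_size_align(0, 1).unwrap();
    for &i in &order {
        let (extended, offset) = whole.extend(fields[i]).unwrap();
        whole = extended;
        offsets[i] = offset;
    }
    (whole.pad_to_align(), offsets)
}

fn main() {
    let fields = [Layout::new::<u8>(), Layout::new::<u32>(), Layout::new::<u16>()];
    let (layout, offsets) = randomized_layout(42, fields);
    // Same seed and same field layouts always give the same answer.
    assert_eq!(randomized_layout(42, fields), (layout, offsets));
    println!("size {}, align {}, offsets {:?}", layout.size(), layout.align(), offsets);
}
```

The restriction described above shows up directly in the signature: since the function never sees the type's name or field identities, it cannot give two structurally identical types different layouts.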

If you do this, then Rust's privacy guarantees become void. Any struct can have its private fields hacked into by transmuting it into a local type with the same definition. Currently this is UB and quite likely to break; afterwards it would be an accepted workaround for privacy rules, making library evolution impossible.

In that case it would be more consistent and less error prone to go with Python's "pretty please don't touch this kthx" from the start.

That was my hope, but I wanted to give us an out if for some reason we need to make a breaking change. And, just to be clear, I'm assuming that Rust will have a lifespan similar to C, which means we need to have plans in place for at least 50 years, and more than likely 100, so this isn't for breaking changes in the next 10 years, but possibly after we're all dead of old age.

I could be on board with that, if the final binary included the edition information within it. That would let future compilers link with old binaries, provided the specs for the old edition's representation are known, and let you automatically update to the latest representation just by recompiling with the latest version of the compiler. Would you permit repr(InteroperableRustYear) for those (hopefully rare) cases when you really need to pin the representation to some given edition?

I second (third?) what @CAD97 and @afetisov have already said. I think of repr(Rust) kind of like a nightly version of the Rust representation. Anything goes, and the only compatibility guarantee is that if two chunks of code are compiled by the same version of the same compiler, then they must be linkable with one another.

The one feature I'd really like to see in a representation is a guarantee that the version of the representation in use can somehow be determined from the compiled binary alone, and that the compiler won't strip that information out regardless of optimization level. That'll allow linkers & loaders to work when all they have is the compiled binary. In keeping with my earlier statements about Rust needing plans for the next 50-100 years, that information should be encoded so that a) people can read it from the binary using nothing more than a hexdump, and b) we don't do something limiting, like trying to pack the information into a fixed (and tiny) field size.
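One possible shape for that, sketched under loud assumptions: the RUST-REPR: prefix, the marker contents, and the NUL-terminated encoding are all invented for illustration. The marker is plain ASCII, so it shows up in a hexdump, and its length isn't fixed. Note that #[used] only asks the compiler to keep the static in the object file; whether the linker keeps it is platform-dependent, so this alone doesn't deliver the guarantee being asked for.

```rust
/// Hypothetical marker format: ASCII prefix, free-form text, NUL terminator.
const PREFIX: &[u8] = b"RUST-REPR:";

/// Embedded in the binary; #[used] asks rustc to keep it even if unreferenced.
#[used]
static REPR_MARKER: &[u8] = b"RUST-REPR:InteroperableRust_2027\0";

/// Find every marker in a raw binary image and return the text after the
/// prefix, up to the terminating NUL (or the end of the image).
fn find_repr_markers(image: &[u8]) -> Vec<String> {
    let mut found = Vec::new();
    let mut i = 0;
    while i + PREFIX.len() <= image.len() {
        if &image[i..i + PREFIX.len()] == PREFIX {
            let start = i + PREFIX.len();
            let end = image[start..]
                .iter()
                .position(|&b| b == 0)
                .map_or(image.len(), |p| start + p);
            found.push(String::from_utf8_lossy(&image[start..end]).into_owned());
            i = end;
        } else {
            i += 1;
        }
    }
    found
}

fn main() {
    // Stand-in for a binary image: arbitrary bytes around the marker.
    let mut image = vec![0x7f, b'E', b'L', b'F', 0, 0];
    image.extend_from_slice(REPR_MARKER);
    image.extend_from_slice(&[0xde, 0xad, 0xbe, 0xef]);
    println!("{:?}", find_repr_markers(&image)); // ["InteroperableRust_2027"]
}
```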

This is effectively asking for the same RTTI[1] that type introspection needs. Even with C libraries, the compiled binary alone isn't enough, as you need the .lib in order to link. (I don't know the exact extent of C++ RTTI, but I believe the same holds true there.) The equivalent for Rust is the .rlib; I think it completely reasonable to require the .rlib's presence to determine the full representation information required to link. (And if generics are used, the .rlib is of course required to carry the MIR needed to monomorphize generic items in downstream crates.)

The .rlib format isn't specified and can change freely between rustc versions, but if we gain some sort of #[repr(interopEdition)], it would make sense to also couple this with stabilizing enough of the .rlib format to enable interop between different Rust compilers.

For the purpose of introspecting a binary itself, debug info already carries the information you're asking about, and it definitely still makes sense to keep the ability to strip debug type information: doing so can help protect the integrity of the executable a little longer, since malicious actors then have to reverse engineer the binary from scratch rather than working from debug symbols. (Or, at the very least, it satisfies "distributions must be obfuscated" requirements.)

[1]: runtime type information

I'm working on specifying metadata in lccc for rlibs. Currently I have part of the manifest format specified, as well as the form for declarative macros (CRML or "Compiled Rust Macro language"), with the language item table planned (mapping lang item names to xref indices). The formats are designed so that implementations that wish to interoperate can adopt the specifications and contribute. They're also designed to be extensible both inside of the spec (with new revisions) and between implementations (the various "Extra" tags, "Compiler Specific Archive Content specifications"). It also comes with an ABI spec (which is more complete) that is versioned (and the version is present in the manifest), and dictates layouts of just about everything at a language level, though the versioning allows an "easy" path to breaking this ABI for legitimate reasons (such as a new layout guarantee being adopted, or a new layout optimization becoming desired enough to mandate).

The idea was never designed to become some sort of "mandatory" interoperability specification, as it makes a number of trade-offs (particularly on the ABI side); however, it might be a good starting point for any kind of Rust implementation binary interoperability spec.
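For anyone wondering what "extensible between implementations" can mean in practice, here's a toy tag-length-value reader. The tag numbers, entry layout, and names are made up and are not the lccc format; it only shows the general technique of checking a versioned ABI entry while skipping extension tags a reader doesn't understand.

```rust
/// Toy tag-length-value manifest. Tags and layout are invented for
/// illustration and are NOT the lccc format.
const TAG_ABI_VERSION: u8 = 1;
const TAG_CRATE_NAME: u8 = 2;
const SUPPORTED_ABI: u32 = 3;

#[derive(Debug, Default)]
struct Manifest {
    abi_version: Option<u32>,
    crate_name: Option<String>,
    skipped_extra_tags: Vec<u8>,
}

/// Each entry: tag (1 byte), payload length (u32, little-endian), payload.
/// Unknown tags are skipped by length, so old readers tolerate new extensions.
fn parse_manifest(mut bytes: &[u8]) -> Result<Manifest, String> {
    let mut m = Manifest::default();
    while !bytes.is_empty() {
        if bytes.len() < 5 {
            return Err("truncated entry header".to_string());
        }
        let tag = bytes[0];
        let len = u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]) as usize;
        let rest = &bytes[5..];
        if rest.len() < len {
            return Err(format!("truncated payload for tag {tag}"));
        }
        let payload = &rest[..len];
        match tag {
            TAG_ABI_VERSION => {
                if len != 4 {
                    return Err("ABI version must be 4 bytes".to_string());
                }
                m.abi_version = Some(u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]));
            }
            TAG_CRATE_NAME => m.crate_name = Some(String::from_utf8_lossy(payload).into_owned()),
            other => m.skipped_extra_tags.push(other),
        }
        bytes = &rest[len..];
    }
    Ok(m)
}

/// Helper that encodes one entry.
fn entry(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![tag];
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

fn main() {
    let mut bytes = entry(TAG_ABI_VERSION, &SUPPORTED_ABI.to_le_bytes());
    bytes.extend(entry(0xF0, b"vendor-specific extra data"));
    bytes.extend(entry(TAG_CRATE_NAME, b"mycrate"));
    match parse_manifest(&bytes) {
        Ok(m) if m.abi_version == Some(SUPPORTED_ABI) => println!("compatible: {m:?}"),
        Ok(m) => println!("refusing to link: ABI version {:?}", m.abi_version),
        Err(e) => println!("malformed manifest: {e}"),
    }
}
```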

There already aren't any protections. Field privacy is not an anti-hacking feature, but a lint for cooperative programmers against accidental misuse. Rust does not prevent any intentional misuse anywhere within the code.

Do you have a link to all of this work? I'd love to look at it, and I suspect that a lot of others would too.

Agreed. Part of the reason I said 'somehow' is because there are numerous ways of capturing this information, one of which @InfernoDeity is working on already. My reasoning for making it possible to inspect using the simplest possible tools is because code tends to live far longer than anyone expects; witness the number of COBOL programmers that are in high demand. That said, with proper specs, version control, backups, etc., etc., etc., programmers of the future should have an easier time of it when trying to figure out what this pile of bits is, and how to work with it.

While I understand the reasoning for this, security through obscurity is a terrible security model, and I would rather not encourage its use.

Hear, hear. Security Through Obscurity is no security at all.

How's that true? What is a non-UB way to access the private fields of a type defined in a different library, without modifying the library itself or doing something like scanning /proc/mem?