If you do this, then Rust's privacy guarantees become void. Any struct can have its private fields hacked into by transmuting it into a local type with the same definition. Currently this is UB and quite likely to break; afterwards it will be an accepted workaround for privacy rules, making library evolution impossible.
In that case it would be more consistent and less error prone to go with Python's "pretty please don't touch this kthx" from the start.
That was my hope, but I wanted to give us an out if for some reason we need to make a breaking change. And, just to be clear, I'm assuming that Rust will have a lifespan similar to C, which means we need to have plans in place for at least 50 years, and more than likely 100, so this isn't for breaking changes in the next 10 years, but possibly after we're all dead of old age.
I could be on board with that, if the final binary included the edition information within it. That would let future compilers link with old binaries, provided the specs for the old edition's representation are known, and let you automatically update to the latest representation just by recompiling with the latest version of the compiler. Would you permit repr(InteroperableRustYear) for those (hopefully rare) cases when you really need to pin the representation to some given edition?
I second (third?) what @CAD97 and @afetisov have already said. I think of repr(Rust) kind of like a nightly version of the Rust representation. Anything goes, and the only compatibility guarantee is that if two chunks of code are compiled by the same version of the same compiler, then they need to be able to be linked with one another.
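To make the "anything goes" point concrete, here is a minimal sketch comparing a `#[repr(C)]` struct with a default-layout one. The type and function names (`PinnedLayout`, `DefaultLayout`, `pinned_offsets`, `default_offsets`, `sizes`) are made up for illustration, and the concrete numbers in the comments assume a mainstream target where `u32` has size and alignment 4.

```rust
use std::mem::size_of;

// Hypothetical example types; the names are illustrative only.
// `#[repr(C)]` lays fields out in declaration order with C padding rules.
#[repr(C)]
pub struct PinnedLayout {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

// Default (`repr(Rust)`) layout: field order and padding are unspecified,
// so the compiler may reorder these fields however it likes.
pub struct DefaultLayout {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

fn addr<T>(r: &T) -> usize {
    r as *const T as usize
}

/// Byte offsets of `a`, `b`, `c` inside `PinnedLayout`.
pub fn pinned_offsets() -> [usize; 3] {
    let v = PinnedLayout { a: 0, b: 0, c: 0 };
    let base = addr(&v);
    [addr(&v.a) - base, addr(&v.b) - base, addr(&v.c) - base]
}

/// Byte offsets of `a`, `b`, `c` inside `DefaultLayout`.
/// The result is whatever this compiler version happened to choose.
pub fn default_offsets() -> [usize; 3] {
    let v = DefaultLayout { a: 0, b: 0, c: 0 };
    let base = addr(&v);
    [addr(&v.a) - base, addr(&v.b) - base, addr(&v.c) - base]
}

/// (size of the pinned struct, size of the default-layout struct).
pub fn sizes() -> (usize, usize) {
    (size_of::<PinnedLayout>(), size_of::<DefaultLayout>())
}
```

Calling `pinned_offsets()` gives the C layout (offsets 0, 4, 8 and a size of 12 under the assumption above). Nothing comparable can be relied on for `default_offsets()`; printing it only tells you what one compiler version chose for one build.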
The one feature I'd really like to see in a representation is a guarantee that it is possible to determine the version of the representation that is in use from the compiled binary alone somehow, with a guarantee that the compiler doesn't strip that information out regardless of optimization level. That'll allow linkers & loaders to work when all they have is the compiled binary to work with. In keeping with my earlier statements about Rust needing to have plans for use for the next 50-100 years, that information should be encoded so that a) people can look at the binary using nothing more than a hexdump, and b) we don't do something limiting, like trying to pack the information into a fixed (and tiny) field size.
This is effectively asking for the same RTTI[1] that type introspection needs. Even with C libraries, the compiled binary alone isn't enough, as you need the .lib in order to link. (I don't know the exact extent of C++ RTTI, but I believe the same holds true there.) The equivalent for Rust is the .rlib; I think it completely reasonable to require the .rlib's presence to determine the full representation information required to link. (And if generics are used, the .rlib is of course required to carry the MIR for the monomorphized functionality.)
The .rlib format isn't specified and can change freely between rustc versions, but if we gain some sort of #[repr(interopEdition)], it would make sense to also couple this with stabilizing enough of the .rlib format to enable interop between different Rust compilers.
For the purpose of introspecting a binary itself, debug info already carries the information you're asking about. It definitely still makes sense to keep the ability to strip the debug type information: it can help protect the integrity of the executable for a little longer if malicious actors have to reverse engineer the binary from scratch rather than have debug symbols to work from. (Or, at the very least, it satisfies "distributions must be obfuscated" requirements.)
I'm working on specifying metadata in lccc for rlibs. Currently I have part of the manifest format specified, as well as the form for declarative macros (CRML or "Compiled Rust Macro Language"), with the language item table planned (a heap of lang item names to xref indices). The formats are designed so that implementations that wish to interoperate could adopt the specifications and contribute. They're also designed to be extensible both inside of the spec (with new revisions) and between implementations (the various "Extra" tags, "Compiler Specific Archive Content specifications").
It also comes with an ABI spec (which is more complete) that is versioned (and the version is present in the manifest), and dictates layouts of just about everything at a language level, though the versioning allows an "easy" path to breaking this ABI for legitimate reasons (such as a new layout guarantee being adopted, or a new layout optimization becoming desired enough to mandate).
The idea was never designed to become some sort of "mandatory" interoperability specification, as it makes a number of trade-offs (particularly from the ABI side); however, it might be a good starting point for any kind of Rust implementation binary interoperability spec.
There already aren't any protections. Field privacy is not an anti-hacking feature, but a lint for cooperative programmers against accidental misuse. Rust does not prevent any intentional misuse anywhere within the code.
Do you have a link to all of this work? I'd love to look at it, and I suspect that a lot of others would too.
Agreed. Part of the reason I said 'somehow' is because there are numerous ways of capturing this information, one of which @InfernoDeity is working on already. My reasoning for making it possible to inspect using the simplest possible tools is that code tends to live far longer than anyone expects; witness the number of COBOL programmers that are in high demand. That said, with proper specs, version control, backups, etc., programmers of the future should have an easier time of it when trying to figure out what this pile of bits is, and how to work with it.
While I understand the reasoning for this, security through obscurity is a terrible security model, and I would rather not encourage its use.
How's that true? What is a non-UB way to access the private fields of a type defined in a different library, without modifying the library itself or doing something like scanning /proc/mem?
The point being made is that relying on layout details already is not language UB[1]. Making more layout guarantees for #[repr(Rust)] makes the hacks more portable between compilers, but it still remains library UB, in that the library author can change the layout and interpretation of the type's private fields without breaking compatibility.
You can claim that private fields could get OSSified into being layout stable, but I don't think making more guarantees on #[repr(Rust)] consistency meaningfully changes the calculus here. #[repr(Rust)] is already consistent in practice[2] (though you aren't supposed to rely on it, just like you're not supposed to rely on being able to transmute your way to private fields) as the PGO field reordering optimization is still purely theoretical and -Zrandomize-layout is an opt-in (partly due to not wanting to break code that wrongly assumes #[repr(Rust)] is consistent without reason).
[1] It's in the strange place between language UB and just library UB. Relying on layout choices made by the compiler is relying on implementation details, so nonportable (including between compiler versions), but it is not a violation of any rules of the Abstract Machine to read bytes out of a type's representation and then use them at whatever type you want.
[2] Disclaimer: niche availability is also factored into field ordering IIRC, so it's not just based on field size/align. Also don't rely on this until if/when a guarantee is provided, because the flag to break you already exists and could theoretically be turned on by default.
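A minimal sketch of the "language UB versus library UB" distinction discussed above. Everything here is hypothetical: the `library` module, `Counter`, `CounterMirror`, and `peek` are invented for illustration. Both types are marked `#[repr(C)]` as an assumption so that the layouts are guaranteed to match; with the default `#[repr(Rust)]` the same trick would depend on unspecified compiler choices.

```rust
mod library {
    // Hypothetical upstream type with private fields.
    // `#[repr(C)]` is an assumption made so the layout is specified.
    #[repr(C)]
    pub struct Counter {
        hits: u32,
        limit: u32,
    }

    impl Counter {
        pub fn new(hits: u32, limit: u32) -> Self {
            Counter { hits, limit }
        }
    }
}

// Local mirror: same field types in the same order, but public.
#[repr(C)]
pub struct CounterMirror {
    pub hits: u32,
    pub limit: u32,
}

/// Reinterprets a `Counter` as its mirror type.
pub fn peek(counter: library::Counter) -> CounterMirror {
    // SAFETY (language level only): both types are `#[repr(C)]` with
    // identical field types and order, so size, alignment, and offsets
    // match, and every bit pattern is a valid `u32`.
    // This still breaks the library's contract: the author may rename,
    // reorder, or reinterpret the private fields in any release.
    unsafe { std::mem::transmute::<library::Counter, CounterMirror>(counter) }
}
```

For example, `peek(library::Counter::new(3, 10))` yields a mirror whose `hits` is 3 and `limit` is 10. The abstract machine has no objection, but the code silently goes wrong the moment the library changes what its private fields mean.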
UB is not a security feature. In practice you can successfully perform operations that the language declares to be UB. In the end it's just a bunch of bytes in your own process and you have full access to all of it.
It would be a bad practice and ugly not-future-proof code, but that's not a security boundary.
I would very much want -Zrandomize-layout to be the default. It should have been the default from Rust 1.0, specifically to avoid future problems with people depending on it. Regardless, with respect to program correctness I have about as much compassion for depending on #[repr(Rust)] as for depending on double free and reading freed memory being OK because the allocator should just recycle memory pages for the same process. That is, absolutely zero compassion. The language should help to enforce correctness and portability, not encourage a can of worms like C++'s accidental ABI stabilization.
There is also no good reason why Rust couldn't optimize layouts in a type-specific way regardless of field similarity, even if it doesn't do so today.
It never works this way. Either you enforce that some property cannot be depended upon, or all of your observable behaviour is your public contract. Considering Rust's stability guarantees, this is very disturbing. It's not like we're talking about some weird corner case of unsafe semantics, it's something that is explicitly documented as unspecified and arbitrarily varying behaviour.
Rust goes to great lengths to enforce the privacy boundaries of libraries and modules, to enforce soundness and the possibility of API evolution. A stable #[repr(Rust)] means that subverting all those guarantees is a single Google search away, with a Stack Overflow question at the top: "how do I access private fields in Rust". At that point it will be copied everywhere, most of all by the people who understand the consequences the least. Why must I even bother with all the boilerplate and draconian restrictions of orphan rules, privacy, const correctness and explicit type declarations if I can't rely on them when I need them most?
We're not talking about security features, we're talking about which semantics the language guarantees, what counts as backwards compatibility and which way the APIs must be structured. I change a private field and the consumers of my library break. Is it their problem or mine?
This wastes both memory and execution time. -Z randomize-layout is intended as a debugging/fuzzing tool, and is not good for general use.
Rust never promises that it will break #[repr(Rust)] layout, only that it's allowed to.
You still don't know what fields might be present in the struct. Private fields can be removed, changed, and (as long as at least one private field existed from the start) added. They can also be maintaining arbitrary invariants, and have any meaning whatsoever. This can further change in any update whatsoever. This is true whether it's an unstable #[repr(Rust)] under -Z randomize-layout, or straight up #[repr(C)], as long as the actual layout of the structure isn't specified by the API of the type.
Theirs.
From a standard perspective, yes. Individual implementations do specify a reliable ABI (and bend over backwards to preserve it).
That doesn't need to be true. For example, you could pick one of the possible layout randomizations that doesn't increase the size of the struct from the minimum. Especially for things like (Box<T>, bool, bool, bool) there's, what, a dozen different layouts that are all minimum size?
And I don't think execution time is affected once it's minimum size, because things like cache locality are already not guaranteed because even without randomization you don't know what the order will be.
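As a rough check of the "a dozen" estimate, here is a small sketch that enumerates every field ordering and computes the struct size under a sequential, C-style layout. The helper names (`sequential_size`, `minimal_orderings`) are made up for this example, and it assumes a 64-bit target where `Box<T>` (for sized `T`) is 8 bytes with alignment 8. It only counts plain field orderings; a compiler is free to consider other placements too.

```rust
/// (size, align) of one field, in bytes.
pub type Field = (usize, usize);

fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

/// Size of a struct whose fields are placed one after another in the given
/// order, each padded up to its own alignment (the `repr(C)` algorithm).
pub fn sequential_size(fields: &[Field]) -> usize {
    let mut offset = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        offset = round_up(offset, align) + size;
        max_align = max_align.max(align);
    }
    // Trailing padding so that arrays of the struct stay aligned.
    round_up(offset, max_align)
}

/// Recursively visits every permutation of `fields[start..]` and records
/// the resulting struct size for each complete ordering.
fn permute(fields: &mut [Field], start: usize, sizes: &mut Vec<usize>) {
    if start == fields.len() {
        sizes.push(sequential_size(fields));
        return;
    }
    for i in start..fields.len() {
        fields.swap(start, i);
        permute(fields, start + 1, sizes);
        fields.swap(start, i);
    }
}

/// Returns (minimum size, number of field orderings that achieve it).
/// Fields are treated as distinct even if they have the same size/align.
pub fn minimal_orderings(fields: &[Field]) -> (usize, usize) {
    let mut work = fields.to_vec();
    let mut sizes = Vec::new();
    permute(&mut work, 0, &mut sizes);
    let min = sizes.iter().copied().min().unwrap_or(0);
    let count = sizes.iter().filter(|&&s| s == min).count();
    (min, count)
}
```

Under those assumptions, `minimal_orderings(&[(8, 8), (1, 1), (1, 1), (1, 1)])` returns `(16, 12)`: of the 24 orderings, the 12 that put the `Box` first or last stay at 16 bytes, while the other 12 grow to 24.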
Hmm, okay. Personally I don't think one should allow code to mix different stable APIs and hardcode them in a single crate (which would be possible with your notation). In C++ the implementation of the stable API can be selected with a compiler flag. I believe that's the way to go for Rust (where the stable API might not be the repr(Rust) but the repr(Portable) API). Hence there shouldn't be a #[repr(InteroperableRust_2027)] that can be attached to individual items.