Repr questions, and requests


#1

Hi folks.

I have two issues related to repr and a request that seems not-hard, but could totally be shot down.

Issue 1: I use Abomonation a bunch for serialization (and will continue to do so, until something with similar speed/ergo shows up), and it makes some assumptions about the stability of field layout. I’m comfortable (note: not “happy” :)) with the requirement that field layout is stable within a binary, and not expecting anything between binaries, but it would be really helpful to have some constancy across builds. It is a pain to put #[repr(C)] on all types, partly because (i) it is meant to be a light touch, and (ii) many types are system-provided (it would suck to not get to use String or tuples, etc.).

Issue 2: I’ve recently become interested in Rust shared libraries, as a way to modularly load up dataflow graphs. The interface is really narrow: the library code provides a method that builds a graph if handed an Allocator, and the server binary loads libraries and calls into these methods. However, the Allocator can mint channels of arbitrary types, and everything goes off the rails if the library and server binary have different opinions on the layouts of these types. Or if a second library which receives the data has a different opinion. Again, you could put #[repr(C)] on literally every type that you use, and avoid system types, but this seems like a horror.

Both of these issues suffer from the fact that repr(C) is not a trait, and so I can’t actually require it, and so things would just silently fail when field orders get monkeyed with. The library/server code I’ve written is like 20 lines total, issues no warnings, and just won’t work correctly once Rust starts changing field orders around. There is absolutely unsafe in both pieces of code, so the problem isn’t “Rust is not behaving correctly” so much as “Rust makes it very hard to write correctly behaving code”.
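To make "repr(C) is not a trait" concrete, here is a minimal sketch of the closest workaround I can think of: a hand-written unsafe marker trait. The names `StableLayout`, `Header`, and `bytes_to_send` are hypothetical and exist only for this example; nothing like them is provided by Rust or Abomonation. The compiler never checks the promise made by each `unsafe impl`, which is exactly the weakness described above.

```rust
use std::mem::size_of;

/// Hypothetical marker: the implementor promises that the type's layout
/// does not depend on compiler field reordering. The compiler cannot
/// verify this promise, hence `unsafe`.
unsafe trait StableLayout {}

// Primitives have a fixed size and no fields to reorder.
unsafe impl StableLayout for u8 {}
unsafe impl StableLayout for u32 {}

#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
}

// Sound only because `Header` is `#[repr(C)]` and its fields are `StableLayout`.
// Removing the `#[repr(C)]` above would still compile, silently.
unsafe impl StableLayout for Header {}

/// Stand-in for a channel or serializer entry point: it accepts only types
/// that opted in, and reports how many bytes it would copy.
fn bytes_to_send<T: StableLayout>(_value: &T) -> usize {
    size_of::<T>()
}

fn main() {
    let h = Header { tag: 1, len: 7 };
    println!("tag={} len={} bytes={}", h.tag, h.len, bytes_to_send(&h));
    // bytes_to_send(&(1u32, 2u8)); // does not compile: tuples never opted in
}
```

The bound keeps tuples and other system types out, but only by excluding them, which is the cost I would like to avoid.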

Ask: It seems not too hard (and please correct me once you’ve finished laughing) to indicate something like a binary-wide #[repr(C)] on all introduced types. For example, a Cargo.toml field or rustc parameter. All of the discussion on field re-ordering has made it sound like these options aren’t so hard to tweak, so maybe this wouldn’t be a tragic undertaking. My understanding is that std already has an opinion on its types and this isn’t something I can change, but it would be nice to be able to request a stable ABI for the nonce.

The request is not to have Rust commit to an ABI, but just a Cargo.toml entry or something that results in a consistent (if unspecified) field layout for … say any given version of std. I understand that (some of) you would prefer not to have anyone rely on field ordering, but loading and using shared libraries is really not meant to be some abusive use of underspecified internals.


I wanted to write some alternatives here, but I couldn’t think of any. What I really hoped was that #[repr(C)] would be transitive, requiring nested fields be repr(C) also so that I could wrap types that might transit binary boundaries, and restrict channel/serialization to such types. Apparently that doesn’t work, though.
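A small sketch of what "not transitive" means here, with made-up type names (`Wrapper`, `PairC`, `WrapperC`) and a hypothetical helper `offset_of_field`. The outer `#[repr(C)]` fixes where the tuple field sits, but says nothing about how the tuple orders its own elements; only the all-repr(C) version has offsets you can rely on.

```rust
use std::mem::size_of;

// The outer struct is `#[repr(C)]`, so `id` is placed before `pair`.
#[repr(C)]
struct Wrapper {
    id: u8,
    pair: (u32, u8), // the tuple itself still uses the default Rust layout
}

// The same data with every level marked `#[repr(C)]`.
#[repr(C)]
struct PairC {
    first: u32,
    second: u8,
}

#[repr(C)]
struct WrapperC {
    id: u8,
    pair: PairC,
}

/// Byte offset of `field` inside `base`, computed from the two addresses.
/// `field` must be a reference into `base`.
fn offset_of_field<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

fn main() {
    let c = WrapperC { id: 1, pair: PairC { first: 2, second: 3 } };
    // These offsets follow from the C layout rules.
    println!("WrapperC (id={}): first at {}, second at {}, size {}",
        c.id,
        offset_of_field(&c, &c.pair.first),
        offset_of_field(&c, &c.pair.second),
        size_of::<WrapperC>());

    let w = Wrapper { id: 1, pair: (2, 3) };
    // These are whatever this particular rustc chose for `(u32, u8)`.
    println!("Wrapper (id={}): tuple .0 at {}, .1 at {}",
        w.id,
        offset_of_field(&w.pair, &w.pair.0),
        offset_of_field(&w.pair, &w.pair.1));
}
```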


#2

#![apply_attr_to_all(struct | enum, repr(C))]

That’s something you could potentially implement in a procedural macro and apply to a crate/module.
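For illustration only: `apply_attr_to_all` above is not an existing attribute, and a real version would need its own proc-macro crate. A rough declarative approximation can be sketched with `macro_rules!`; the macro name `repr_c_all` and the types `Edge` and `Node` are made up for this example, and the macro only covers items written inside its braces.

```rust
use std::mem::size_of;

// Hypothetical helper: put `#[repr(C)]` on every item passed in.
// It does not look inside the items, so it only makes sense for
// struct/enum/union definitions.
macro_rules! repr_c_all {
    ($($item:item)*) => {
        $(
            #[repr(C)]
            $item
        )*
    };
}

repr_c_all! {
    pub struct Edge {
        pub weight: u8,
        pub src: u32,
        pub flag: u8,
    }

    pub struct Node {
        pub id: u32,
        pub kind: u16,
    }
}

fn main() {
    let e = Edge { weight: 1, src: 2, flag: 0 };
    let n = Node { id: 7, kind: 1 };
    println!("edge {} {} {} uses {} bytes", e.weight, e.src, e.flag, size_of::<Edge>());
    println!("node {} {} uses {} bytes", n.id, n.kind, size_of::<Node>());
}
```

With declaration order preserved, `Edge` keeps the padding around `src` instead of having its fields sorted, which is how you can tell the attribute was applied.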


#3

This wouldn’t address e.g. (T, U) types though, would it? They could still be laid out differently from build to build I think.

Edit: not trying to sound thankless, just making sure I understand when I see a solution that would actually solve the full problem I have.


#4

Oh, no, it wouldn’t be global. It… can’t be. Not without you recompiling libstd (because the compiled libstd already assumes some layouts).

We could probably guarantee that if you didn’t actually change the composition and full path of your nominal types you’ll get the same layout.

The main things we’d rather not guarantee are:

  • stability across compiler versions
  • writing the same definition twice in distinct locations (maybe crate-based? not sure) and getting the same layout (randomization will be helpful here)

Other than that, layout stability across rebuilds of your own code is very useful for incremental recompilation and we might want to guarantee a subset of it.


#5

The thing I am most interested in having is

writing the same definition twice in distinct locations (maybe crate-based? not sure) and getting the same layout (randomization will be helpful here)

I understand that you don’t want to be in the business of guaranteeing this sort of thing generally, but is there a good reason to prevent opting in to something like this? It seems very useful, especially for anyone who needs interacting Rust binaries. I’m not really sure what the alternative is (advise users to put #[repr(C)] everywhere, avoid generic system types?).


#6

FWIW the accepted solution there is to have a “common public root”, i.e. all parties compile against the same crate, and using the same compiler version.
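Roughly, the shape is as follows. This is a single-file sketch in which the modules `common`, `plugin`, and `server` stand in for three separate crates, and all names are invented for the example. In a real setup `common` would be its own crate that both the library and the server binary depend on, built with the same compiler.

```rust
// Stand-in for the shared crate: the only place the type is defined.
mod common {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Record {
        pub key: u32,
        pub flag: u8,
    }
}

// Stand-in for the loaded library: it produces values of the shared type.
mod plugin {
    use super::common::Record;

    pub fn produce() -> Vec<Record> {
        vec![
            Record { key: 1, flag: 0 },
            Record { key: 2, flag: 1 },
            Record { key: 3, flag: 1 },
        ]
    }
}

// Stand-in for the server binary: it consumes values of the same type,
// so there is only one definition whose layout has to be agreed on.
mod server {
    use super::common::Record;

    /// Sum the keys of all flagged records.
    pub fn consume(records: &[Record]) -> u32 {
        records.iter().filter(|r| r.flag != 0).map(|r| r.key).sum()
    }
}

fn main() {
    let records = plugin::produce();
    println!("sum of flagged keys = {}", server::consume(&records));
}
```

Neither side writes its own copy of `Record`, so the "same definition twice in distinct locations" case never comes up.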


#7

Eddyb and I talked a bit on IRC about this, and I got some re-assuring signals from him (in the form of specific words that my limited brain filtered down to just signals).

As long as one builds against a type exposed in some common library, it is pretty much (caveat: my understanding is limited) going to have its representation locked in at this point. No guarantees about what the representation is, but given that other users of the library can also use the type and will need to have compatible code, there isn’t a great deal of further flexibility in how it is represented.

I think my misunderstanding was that Rust had the ability to re-layout e.g. (u32, u8) in each binary, and perhaps it does “retain the right” to do this, but realistically speaking it cannot do this without breaking a great deal of compatibility.