Pre-RFC: Explicit Opt-in OIBIT for truly POD data and safe transmutes

If you’re worried about performance isn’t the better solution to use your own padding with #[repr(packed)]? Writing to a padding byte is definitely bad given tbu’s example—if that space is used to hold something else on stack, then you’d be clobbering that.

1 Like

I don’t get to write the types. Someone shows up with a Vec<(u8, u64)> and wants it serialized.

If Rust made a deal with LLVM that it points out padding bytes and promises never to use them, that would be a great example of one of these bargains.

If Rust didn’t make that deal, LLVM should keep its register spill out of locations my code writes to. :smile:

Edit: as far as I can see, writing to a padding byte isn’t undefined behavior (ideally because LLVM only uses locations it can see as unused). If that isn’t the case, someone should add it to the list of behavior considered undefined.

Edit 2: in case it clears up where I’m coming from / what I’d like to be “defined behavior” https://github.com/frankmcsherry/abomonation

The “reads of uninitialized memory” rule doesn’t mean a read in the sense of the CPU reading memory, but rather reading a value from memory into a Rust temporary or variable (expressions like *ptr). Using std::ptr::copy_nonoverlapping or libc::write on a pointer to uninitialized memory is fine.

As a first step, this can be done without additional compiler support:

pub unsafe trait Mappable: Copy
// no impl Mappable for .. {}
unsafe impl Mappable for u8, u16, u32, u64, i8, i16, i32, i64, f32, f64 {}
unsafe impl<T: Mappable> Mappable for [T]

I wonder if there’s a way (at runtime) to check for packed repr, maybe something like:

debug_assert!(size_of::<T>() == min_align_of::<T>())

I’d also like the transmute_mappable_ref and transmute_mappable_refmut to be investigated further, since they would be very useful and I don’t see how they could be illegal in Rust. And if it would be, maybe that can be worked around by returning a &UnsafeCell<D> instead of a &D.

1 Like

Oh, and one more thing. The source needs only to be Mappable in the map_bytes_mut case, not the map_bytes case, as reading from private values is not considered unsafe. And in transmute_mappable, only the destination needs to be Mappable.

Btw. Maybe we can call it Horse, as in “Has Only Really Safe Essence”. It goes well with the fact that we already have Cow in the standard library.

2 Likes

I like the idea, but I don’t think we could later introduce language support in a backcompat way if we make the trait unsafe, since all unsafe impl Mappable for MyType {} incantations would become incorrect when we did that. I think we could use the following trick however:

#[unstable]
pub trait EnableMappable {}
impl EnableMappable for u8, u16, u32, u64, i8, i16, i32, i64, f32, f64 {}
impl<T: EnableMappable> EnableMappable for [T] {}

pub trait Mappable: EnableMappable {}
impl<T: EnableMappable> Mappable for T {}

This allows users of std to bound types with Mappable since it is stable, but prevent them from writing their own impl-s since they’d need to implement EnableMappable which is unstable.

Then when language support comes in some form or another, we can get rid of the EnableMappable trait altogether. What do you think?


Regarding transmute_mappable_ref, I’ll admit I don’t know too much about LLVM internals, but my understanding is that the optimizer is allowed to assume that pointer of different types never alias. UnsafeCell wouldn’t help at all in this case since that only deals with pointers of the same type; I’m sure LLVM still assumes UnsafeCell<f32> and UnsafeCell<u32> never alias.

This is very useful in C++ where pointers, unless defined as restrict, are otherwise always considered aliasable. Due to Rust’s limits on mutability and aliasing, I think this is actually a non-issue: if I had a two pointers of different types pointing to the same underlying piece of memory, neither of them could modify it so LLVM may as well think they don’t alias.

I mention this in my pre-RFC and suggest being conservative initially. We can relax -> &[u8] to -> &D later (assuming the defaults-affect-inference changes land).

I’d really like someone’s input with more LLVM knowledge.


Reading private fields with map_bytes makes me queasy, but I take your point—mem::forget also makes me queasy. However, reading from padding still applies, so in both transmute_mappable and map_bytes the source needs to be at least ReprPacked.

Right, I see. What about this? I'm not sure default impls are supposed to work on bounded traits, but it seems to work as you'd expect, [edit]and I can't find anything to the contrary in the RFC[/edit], so it should be okay. I was also bit surprised by how the cycle was resolved, but that seems to be the behaviour even if you use normal traits, so I think it's the intended behaviour.

Edit: The errors are also remarkably good.

Edit 2: It occurred to me that all three traits are essentially equivalent. So, since OptInToMarker is the one you actually implement, it would probably be more ergonomic to make it the actual Marker trait, rename Marker to EligibleForMarker (this'll keep it obvious what it means in errors), and then rename the original EligibleForMarker to EligibleForMarkerHelper or something, since it's really only there to break the cyclic dependency between the other two traits (example).

1 Like

Breaking the cyclic dependency by relying on an unsafe trait to prevent safe impls tying into it is brilliant! It really looks like it could work! We can even make the trait you called Marker private—there’s no need to export it (though we may want to for docs).

We still need language support for ReprPacked (and bind MappableEligible by it) but that should be trivial to add by comparison.

I agree with your name changes in Edit 2 the ‘main’ trait is the OptIn one, how about something like this:

pub trait Mappable: MappableSafe {}

pub unsafe trait MappableSafe  {}
unsafe impl<T> MappableSafe for T
    where T: RecursivelyMappable /* + ReprPacked */ {}

unsafe trait RecursivelyMappable: Mappable {}
unsafe impl RecursivelyMappable for .. {}

impl<'a, T> !RecursivelyMappable for &'a T {}
impl<'a, T> !RecursivelyMappable for &'a mut T {}
impl !RecursivelyMappable for bool {}
impl !RecursivelyMappable for char {}

With your permission (and crediting you, of course) I’d like to use your idea and write out the final RFC tonight. Let me know!

I’m not really following w r t your EnableMappable, MappableEligible and so on, so maybe you’re saying the same thing as this:

We might actually need one MappableRead and one MappableWrite trait.

  • The MappableRead is simply anything that does not have padding. It could directly map to the proposed ReprPacked trait if we get compiler support for it.

  • The MappableWrite will always have to be an unsafe trait regardless of compiler support.

Or - and I think that’s better - we could potentially change the language reference to say that reading unitialized memory is safe, as long as it is interpreted as a MappableWrite type. In that case we wouldn’t need a MappableRead trait.

The thing I wrote with Mappable, MappableSafe and RecusivelyMappable is just steven099’s proposal of how to implement the proposal as described in the OP, but without any additional compiler support (except ReprPacked). As a user you’d still only need to care about Mappable really, the existence of Safe and Recursively enables the "a type can opt-in to Mappable if and only if all its members are also Mappable" semantics using only OIBITs and a clever trick to break an otherwise cyclic dependency.

Regarding your suggestion of MappableRead vs MappableWrite. We could indeed extend map_bytes to work on any type which is ReprPacked. I’ll mention this as an alternative. As it stands, reading uninitialized memory is unsafe, and if we want to further relax the bound we can do that in the future.

Why would MappableWrite have to be an unsafe trait? As long as we ensure it’s only defined when it does not cause UB and is explictly opted into, there’s no reason for it to be unsafe. The whole point of this proposal is to allow transmutes without unsafe code. As long as we gain support for:

  1. ReprPacked.
  2. Somehow disallowing the trait impl unless (i) all members of a type are public or (ii) the trait is implemented in the same module as the type.

I think Mappable can be a safe trait. Do you see any reason why it wouldn’t?

"a type can opt-in to Mappable if and only if all its members are also Mappable"

Right, now I get what you're trying to accomplish. If you can implement that restriction, then Mappable should not have to be an unsafe trait, I suppose.

What I don't get is what ReprPacked has to do with Mappable (overwriting padding is not unsafe, AFAIK, or is that wrong?)

As far as I understand, overwriting is even “more” unsafe, since LLVM can use it to put its register spill in it. If you store a type with padding on stack, then other things on stack can be put inside that padding. So when you’re overwriting the padding you’re actually modifying other local vars.

Edit: Also I should mention that even if you ignore reading/writing padding, you still need at least ReprC as a bound, since Rust’s layout is entirely undefined otherwise—randomized field order etc, anything goes.

1 Like

Feel free to use it however you want. Hopefully there aren't any more weird caveats. The only ones I see (other than the various safety issues others have mentioned) are that ReprPacked needs to rule out enums for you (I think it would anyway, since #repr(packed) is for structs), and that there's no way to rule out empty structs, since all zero of their fields have successfully opted in. Also, do you need an additional Public bound?

Note that, in the second example, I used the public-trait-in-a-private-mod trick to make the eligibility traits unimplementable by users. It seems reasonable to fully prevent this, since mucking with the eligibility test isn't really part of the interface you're trying to present, even if it can only be done unsafely. I'm not sure if this approach is too hackish, though. Either way, you might still want to make the eligibility traits unsafe in order to communicate to maintainers that adding impls can affect safety, even if this wouldn't affect the public API.

I'll point out that the orphan rules already rule out impls on any builtin types that aren't opted in within the crate offering the marker trait, so you don't strictly need the negative impls for them. But again, the negative impls make adding these opt-ins unsafe, which might be useful documentation for future maintainers of the crate. The exceptions to this are &T and &mut T, which the orphan rules allow clients of your crate to implement your traits for, provided that they own T, so these have to have explicit negative impls.

The only other thing I'll say is to keep in mind that the default-impl'd eligibility trait will appear in errors, while the helper trait won't, so it makes sense to tailor the naming with an emphasis on making the eligibility trait self documenting. It does make sense to also document the cause of any errors mentioning the eligibility trait in the documentation for the marker, as you mentioned.

1 Like
  1. Yeah, something like a Public bound would be useful. TBH I think Rust can do with more Marker traits like these; similar to type_traits in C++.
  2. I saw the public in private, but it’s a bit hacky (especially since you’ll see errors about MappableSafe which wouldn’t appear in docs). Unsafe trait strikes the right balance. Ultimately being able to override with unsafe code has precedent in Rust (Sync, Send).
  3. I know, I added RecursivelyMappable negative impl-s just for clarity.
  4. Agreed. I’m happy with the triple Mappable (a type has opted in to be mappable), MappableSafe (it’s safe to opt-in to Mappable) and RecursivelyMappable (all the members in a type are Mappable).

3.  Cool, just checking. You were saying in your OP that you didn’t need a negative impl for & and &mut since they’d need an opt-in anyways, so I wanted to mention that that logic applies better to bool and such, and not (I don’t think) to & and &mut, since I don’t see how a reference could ever safely be Mappable, but the orphan rules would allow third-party crates to safely opt-in for them if you didn’t have a negative impl. If you’ve abandoned that idea in favour of being explicit anyways, then it doesn’t matter.

2.1  Yeah, I’m iffy about it for sure. I think it would be useful to have a public trait that just can’t be implemented at all, it just seems like there should be a better way than a visibility hack. In this case in particular, there is no good reason to ever implement EligibleHelper/MappableSafe. In fact, unless/until specialization of some sort is added, you can’t. Even if you could, a type that implemented MappableSafe wouldn’t fit into the eligibility check properly, since it still wouldn’t implement RecursivelyMappable, which is needed in order for types that have it as a field to be RecursivelyMappable themselves. On that note:

4.  I don’t think the name RecursivelyMappable fits, since any type that implements it to say it’s ‘mappable safe’ wouldn’t actually be recursively mappable itself, since its fields wouldn’t necessarily all be Mappable. Instead, by implementing RecursivelyMappable, the type is actually claiming to be MappableSafe, so I think RecursivelyMappable should be called MappableSafe, and the original MappableSafe should be called something else, such as MappableSafeHelper, which makes it clear that it’s just helper to allow you to express the cyclic dependency between Mappable and MappableSafe. Ideally this cyclic dependency would be directly allowed and the helper could be removed.

2.2  As for unsafe traits having a precedent, that’s only true when the ability to unsafely implement a trait is part of the interface you’re trying to present. In this case:

  • .1  This could be true of RecursivelyMappable (cum MappableSafe per paragraph 4), since you might want to allow types that aren’t recursively Mappable to unsafely declare that they are safely mappable. So it makes sense for this trait to be public and unsafe.

  • .2  In the case of the original MappableSafe (cum MapableSafeHelper per paragraph 4), though, it should never be implemented—or even referenced—by a client type. It’s only there to overcome a shortcoming in the trait system, and could ideally be removed if that shortcoming were eliminated. So it should preferably be private, but it can’t be because Mappable depends on it. So this would be a good candidate either for the private mod hack, or else to use #[unstable]. I don’t think unsafe is sufficient, especially since it might become possible for people to implement it in the future, or to depend on it in some other way which would prevent it from being removed in the future if cyclic traits were allowed.

Been thinking more about it, I don’t think we need this whole opt-in infrastructure: not much point providing an opt-in mechanism if all your fields need to be public anyway. There’s no invariant you can hope to preserve by opting out of Mappable since anyone can clobber your fields to their heart’s content anyway. So I think the whole thing is actually much simpler. I’ve been writing out the RFC with the following definition:

pub unsafe trait Reinterpret: Public + ReprPacked + Copy {}
unsafe impl Reinterpret for .. {}

/// References of any kind are non-`Reinterpret`. Otherwise, by mapping to
/// `&mut [u8]` they could be made to an arbitrary location.
impl<'a, T> !Reinterpret for &'a T {}
impl<'a, T> !Reinterpret for &'a mut T {}

/// `bool`-s are not `Reinterpret` since the only valid values are 0 and 1.
impl !Reinterpret for bool {}

/// `char`-s are not `Reinterpret` since they need to enfore UTF-8 invariants
/// (0xff, for instance, is not a valid char).
impl !Reinterpret for char {}

Under this definition, explicit opt-outs for bool and char are neccessary I think since they’d otherwise be implicitly defined. I’ll be posting the RFC proper soon and it will be great to hear more of your excellent feedback there.

Btw, I don’t know if Reinterpret is better as a name, but at least it’s in line with verbs as trait names in Rust.

I don’t think you need to worry about the strict aliasing rules here: rust’s references already guarantee exclusivity or immutability, so the compiler doesn’t need to rely on peculiarities such as the strict aliasing rules in order to generate optimal code.

Thanks. I think you’re right, this is part of why it’s taken me so long to write the full RFC—LLVM doesn’t use type-based aliasing rules at all: http://llvm.org/docs/LangRef.html#pointer-aliasing-rules

This couple with my decision to go without an explicit opt in, changes my design almost entirely. The RFC proper is coming soon ™!

That's reasonable enough. Keep in mind, though, that opt-ins don't exist solely to protect invariants. They also serve to permit forward evolution of libraries. For example, Public seems like a candidate for opt-in rather than opt-out. If you look at the reasoning behind making Copy opt-in, some of that reasoning seems to apply to Public. Very few types would seem to actually want to implement it, whereas it seems like it would be fairly common for a type that is initially Public to add a private field in the future. At the same time, there isn't a pressing need to prevent types from accidentally not being Public. Unlike Send and Share, being able to ensure a given type has only public fields isn't necessary for the language to be usable.

ReprPacked is already essentially opt-in, due to requiring an attribute.

Reinterpret is hard to classify in one camp or the other. On the one hand, it not only requires Public and ReprPacked, but also requires those to be applied recursively, which is a subtle requirement that presents a backwards compatibility hazard for types that aren't designed with it in mind, which suggests it should be opt-in. On the other hand, the circumstances where it applies are fairly narrow. Types already have to opt-in to ReprPacked before Reinterpret becomes relevant, so it might be reasonable to assume that anyone opting in to the one is aware of the other. This doesn't so much argue directly for opt-in or opt-out, but more indicates that it might not matter, which in turn argues in favour of the simpler one, which happens to be opt-out.

Regardless, I think any RFC on this should treat this consideration, since I don't think it's sufficient to say that opt-out is fine because encapsulation isn't a concern.

Wow, I… weird coincidence. So I just came up with this today, it’s a similar system that uses the current OIBIT implementation to do something very similar. Take a look at: http://arcnmx.github.io/nue/pod/#example

It requires explicit opt-in on each type, while still denying opt-in on types that are bad (like references, char, bool, etc). It’s set up in such a way that (u16,) is PodType but struct NewType(u16) isn’t without an explicit impl Pod, and struct NewType(&'static u16) requires an unsafe impl PodType.

The unsafe impl can be used to override it and make whatever you want pod-able, but that’s probably a very bad idea in almost all cases. Unfortunately unstable features are required for the automagic safe impls…

I have some syntax extensions that handle the padding attributes, and a few other fancy weird things, just didn’t have a chance to clean them up tonight.