Specifying a set of transmutes from Struct<T> to Struct<U> which are not UB

Pulling this discussion out here to avoid derailing RFC #2756 any further.

This is largely stream-of-consciousness, describing a problem I would like to see solved and the issues I am aware of with solving it.

The problem

I want there to be a well-defined way to perform the following transmutes (especially the last one, which is impossible to achieve with safe code):

use std::collections::HashSet;
use std::mem::transmute;

// Newtype index wrapper for a graph node.
#[repr(transparent)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
struct Node(usize);

fn convert<'a>(vecs: Vec<Vec<usize>>, sets: Vec<HashSet<usize>>, set: &'a HashSet<usize>) {
    unsafe {
        // It is highly desirable for all of these to be valid operations.
        // Currently, however, they are UB.
        let _vecs = transmute::<Vec<Vec<usize>>, Vec<Vec<Node>>>(vecs);
        let _sets = transmute::<Vec<HashSet<usize>>, Vec<HashSet<Node>>>(sets);
        let _set = transmute::<&'a HashSet<usize>, &'a HashSet<Node>>(set);
    }
}

These sorts of transmutes (especially those that occur behind references) significantly decrease the cost of adopting a newtype wrapper in a crate, by allowing it to remain private to the crate rather than having to make it part of the public API.

Unfortunately, to my understanding, all of these are currently UB because Rust provides no guarantees about the layout[^1] of a generic struct for different T. That is to say, Type<T> and Type<U> can have different layouts even if T and U have the same layout (or are even ABI compatible); a prominent example is Option<u32> versus Option<NonZeroU32>. Beyond that, there is occasional talk of potentially making the compiler "randomize" struct layouts.
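For concreteness, the Option example can be checked with size_of. A minimal sketch: only the Option<NonZeroU32> size is a documented guarantee; that Option<u32> comes out larger is simply what current rustc does.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Returns (size of Option<u32>, size of Option<NonZeroU32>).
/// u32 and NonZeroU32 have the same size and alignment, but NonZeroU32
/// has a niche (the value 0) that Option can use to encode None.
fn option_sizes() -> (usize, usize) {
    (size_of::<Option<u32>>(), size_of::<Option<NonZeroU32>>())
}
```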

The precise set of requirements for well-defined transmutes is not yet clear. But from the discussion thus far, I can tell that, if it is even possible to specify a subset of transmutes that are well-defined, then we need at least all of the following:

  • Some unambiguous set of compatibility requirements for T and U.
  • Some unambiguous set of parametricity requirements for Type.
  • Cooperation from the compiler that, if Type meets the parametricity requirements, and T/U meet the compatibility requirements, then Type<T> and Type<U> must lay out their fields identically.

Let me go into each of these in more detail:

Compatibility requirements

Let's consider some of the types that we would like to be compatible:

  • #[repr(transparent)] struct Node(u32); should definitely be compatible with u32.
  • Vec<T> should be compatible with Vec<U> if T is compatible with U.
  • PhantomData<T> should be compatible with PhantomData<U> if T is compatible with U. (and maybe even if T/U aren't compatible!)
  • u32 should maaaaaybe be compatible with Option<NonZeroU32>.
  • Type<'a> and Type<'b> are compatible, as lifetimes are erased in codegen.
  • Compatibility should be symmetric and transitive.
    • Note: Symmetry might actually be undesirable, so that we can have NonZeroU32 -> u32. In that case, however, the term "compatible" needs to be replaced, and I am too lazy to rewrite this!

But let's also be mindful of counter-examples:

  • u32 is clearly not compatible with [u8; 4] as they have different alignments.
  • u32 is not compatible with NonZeroU32 even though NonZeroU32 is #[repr(transparent)].
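A small sketch of the alignment counterexample, assuming a common target where u32 has 4-byte alignment: the sizes match, so a by-value conversion is fine, but reinterpreting a &[u8; 4] as a &u32 would need the stricter alignment.

```rust
use std::mem::{align_of, size_of};

/// u32 and [u8; 4] have the same size but (on common targets) different
/// alignment, so a &[u8; 4] cannot in general be reinterpreted as a &u32.
fn same_size_different_align() -> bool {
    size_of::<u32>() == size_of::<[u8; 4]>() && align_of::<u32>() != align_of::<[u8; 4]>()
}

/// By-value conversion is always fine: from_ne_bytes copies the four bytes
/// into a properly aligned u32.
fn bytes_to_u32(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```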

The last counterexample is tricky. #[repr(transparent)] alone isn't enough to guarantee compatibility! There must also be no niches! Some possibilities I see for the requirement:

Implicit: The requirement for compatibility could simply be that T is #[repr(transparent)] and has no niches. Because the presence of niches depends on private details, the author of a newtype would need to provide a documented guarantee that the type will never contain niches.

Explicit: We could add an attribute stronger than #[repr(transparent)] (say, #[repr(field_transparent)]). This way, it can be explicitly opted into by the author of the newtype, and the requirements could be verified by the compiler. An opt-in attribute such as this doesn't seem too demanding as the code that wants to transmute a newtype is likely closely associated with that newtype.

In addition to one of the above, there should also be a recursion rule: If A and B are compatible, and Type<T> is parametric in T (as discussed below), then Type<A> and Type<B> are also compatible.
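One way to picture the compatibility relation and its recursion rule is as an unsafe marker trait. This is only a sketch: Compatible and cast_ref are hypothetical names, and the recursive impls are restricted to type constructors whose layout is already guaranteed today (arrays and PhantomData). Vec<T> is deliberately absent, since it is exactly the case under discussion.

```rust
use std::marker::PhantomData;

// Hypothetical: `T: Compatible<U>` asserts that T and U have identical
// layout and that every valid T is also a valid U.
unsafe trait Compatible<U> {}

#[repr(transparent)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
struct Node(usize);

// Base cases: a repr(transparent) newtype with no extra validity rules.
unsafe impl Compatible<usize> for Node {}
unsafe impl Compatible<Node> for usize {}

// Recursion rule, limited to constructors whose layout is guaranteed today.
unsafe impl<T: Compatible<U>, U, const N: usize> Compatible<[U; N]> for [T; N] {}
unsafe impl<T: Compatible<U>, U> Compatible<PhantomData<U>> for PhantomData<T> {}

/// Reinterpret a reference; sound only because of the Compatible bound.
fn cast_ref<T: Compatible<U>, U>(t: &T) -> &U {
    // SAFETY: Compatible promises identical size, alignment and validity.
    unsafe { &*(t as *const T as *const U) }
}
```

In this model, the whole debate is about which extra `unsafe impl`s (for Vec, HashSet, and so on) could ever be written soundly.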

Parametricity requirements

Compatibility of A and B is insufficient to guarantee that Type<A> and Type<B> have the same layout. This is because Type could use associated items:

// Peekable<A> and Peekable<B> can have different layouts
// even for compatible A and B
pub struct Peekable<I: Iterator> {
    iter: I,
    peeked: Option<Option<I::Item>>,
}

Suppose that some crate tries to transmute Struct<A> into Struct<B> when Struct uses an associated type like Peekable does. Who is to be held accountable? The code that defines A/B? The code that defines Struct? The code that performs the transmute?

We cannot blame this failing on the requirements of compatibility (i.e. we can't claim that the issue is that A and B are not truly compatible), because any downstream crate can implement a trait for A and B:

// crate a
#[repr(field_transparent)]
pub struct Node(u32);

// crate b
pub trait Trait { type Assoc; }
impl Trait for u32 { type Assoc = u32; }
impl Trait for Node { type Assoc = u8; }

pub struct Struct<T: Trait>(<T as Trait>::Assoc);

In the above, crate b clearly is the one responsible for the fact that Struct<u32> and Struct<Node> are not layout-equivalent.
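This can be observed directly. In the sketch below, #[repr(transparent)] stands in for the hypothetical #[repr(field_transparent)], and both crates are collapsed into one file:

```rust
use std::mem::size_of;

// repr(transparent) stands in for the proposed repr(field_transparent).
#[repr(transparent)]
pub struct Node(u32);

pub trait Trait { type Assoc; }
impl Trait for u32 { type Assoc = u32; }
impl Trait for Node { type Assoc = u8; }

// Not parametric in T: its only field is chosen by a trait impl.
pub struct Struct<T: Trait>(pub <T as Trait>::Assoc);

/// Returns (size of Struct<u32>, size of Struct<Node>).
fn struct_sizes() -> (usize, usize) {
    (size_of::<Struct<u32>>(), size_of::<Struct<Node>>())
}
```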

In the above examples, the trait bounds appearing on the struct clearly warn downstream users of the fact that these types may not be parametric in T. But in the future, specialization will allow this to occur even without any visible evidence in the public details of Struct<T>:

#![feature(specialization)]

pub trait Trait {
    type Assoc;
}

impl<T: ?Sized> Trait for T {
    default type Assoc = u32;
}

#[repr(transparent)]
struct Node(u32);
impl Trait for Node {
    type Assoc = u8;
}

// violates parametricity despite having no visible trait bounds
struct Struct<T> {
    foo: <T as Trait>::Assoc,
}

There could also conceivably be things like private_field: [u8; type_name::<T>().len()] in the future, if type_name ever becomes usable in const contexts.

Thus, we cannot blame the user for failing to verify that Struct<T> is parametric. It must be the responsibility of the author of the Struct to declare that their type is parametric. As with compatibility, this requirement could be implicit or explicit:

Implicit: When a type uses nothing like <T as Trait>::Assoc (or <Struct as Trait<T>>::Assoc) or functions like type_name::<T> that violate parametricity, the compiler guarantees that Type<T> and Type<U> are compatible if T and U are compatible. Users of a type should not rely on this fact unless it is explicitly guaranteed in the type's documentation (or is otherwise dead obvious, e.g. due to a lack of private fields).

Explicit: We could introduce an attribute like struct Struct<#[parametric] T> so that this guarantee can be made explicit, and even be checked by the compiler. Unfortunately, this is a lot more invasive than the related idea of #[repr(field_transparent)] and would cause tons of churn, because code that needs to add #[parametric] is not necessarily anywhere near the code that wants to transmute newtypes. We would want it on all sorts of types like &T, [T], Vec<T>, HashMap<K, V> (for both K and V), Wrapping<T>...

Cooperation from the compiler

Whatever requirements are chosen, it must become part of the language specification that Type<T> and Type<U> have identical layouts when these requirements are met.

If layout randomization is ever added to the compiler, each group of types with identical layouts must receive the same layout. I believe this is always possible. (if you consider the DAG of monomorphized types whose edges are #[repr(field_transparent)] compatibility relationships, it should form a forest, where each tree is rooted in a type that is not #[repr(field_transparent)]. That root node lives in a crate that must be upstream to all other types in the tree, thus all types in the tree can agree to use the layout of the root type.)
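The root-finding step of that argument can be sketched as follows. This is a hypothetical model, not compiler code: types are represented by strings, and `wraps` records which type each #[repr(field_transparent)] newtype wraps. Since every newtype wraps exactly one type and the edges cannot form a cycle, following them always ends at a single root.

```rust
use std::collections::HashMap;

/// Follow "newtype wraps inner type" edges until reaching a type that wraps
/// nothing; every type in the tree would then share that root's layout.
fn layout_root<'a>(ty: &'a str, wraps: &HashMap<&'a str, &'a str>) -> &'a str {
    let mut current = ty;
    while let Some(&inner) = wraps.get(current) {
        current = inner;
    }
    current
}
```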

Possible extension to #[repr(C)] types

There may also be use cases for declaring identically-shaped #[repr(C)] types to be compatible.

#[repr(C)] struct Foo { a: i32, b: u32 }
#[repr(C)] struct Bar { x: i32, y: u32 }

#[repr(field_transparent)] doesn't make sense for these types, but that annotation could still exist for newtypes. (#[repr(C)] types would simply constitute another set of "base cases" for compatible types).
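The base case itself is already sound today. A minimal sketch, reusing the Foo and Bar definitions above with derives added for testing:

```rust
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Foo { a: i32, b: u32 }

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bar { x: i32, y: u32 }

/// Sound: both structs have the same repr(C) layout, and the field types
/// match one-for-one, so every valid Foo is a valid Bar.
fn foo_to_bar(foo: Foo) -> Bar {
    // SAFETY: identical repr(C) layout with identical field types.
    unsafe { std::mem::transmute::<Foo, Bar>(foo) }
}
```

The open question is only whether a #[repr(Rust)] Struct<Foo> and Struct<Bar> inherit this compatibility.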

Randomization would be a bit more complicated when #[repr(C)] types appear as fields in #[repr(Rust)] types. In particular, given a generic #[repr(Rust)] type Struct, the types Struct<Foo> and Struct<Bar> could appear in separate crates with no single type to serve as a common ancestor. To solve this there would need to be a global random seed shared by all crates in the build tree so that each crate can independently generate the same randomized layout for these types.
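A sketch of what such a shared seed could look like. Everything here is hypothetical: the field order is derived from a build-wide seed plus each field's (size, alignment), so two crates that independently lay out Struct<Foo> and Struct<Bar> compute the same permutation. A real scheme would also have to account for niches.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical: permute field indices by a hash of (seed, index, shape).
/// DefaultHasher::new() is deterministic, so any crate given the same seed
/// and the same field shapes computes the same order.
fn randomized_field_order(global_seed: u64, field_shapes: &[(usize, usize)]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..field_shapes.len()).collect();
    order.sort_by_key(|&i| {
        let mut h = DefaultHasher::new();
        (global_seed, i, field_shapes[i]).hash(&mut h);
        h.finish()
    });
    order
}
```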


That's what I have so far. Any thoughts?


[^1]: To clarify, when I use the term "layout" in this post, I am referring to a type's size, alignment, ABI, niches, and the offsets of all of its fields. This is different from alloc::Layout, which only considers size and alignment.


Also, @hanna-kruppe, I apologize for not responding to this, as I wasn't sure what you were saying here:

And that's all just for actual layout, to say nothing of the aforementioned invariants which are just as much of a menace here. I stress these, @ExpHP, because they're just as much of an obstacle to safe transmutes as layout differences are. If you're fine with requiring the user to unsafely shoulder responsibility for asserting that no invariant is violated, it's not really a big step up from requiring them to also ensure layout compatibility.

As far as I can tell, the issue is that there's currently no way for a user to ensure layout compatibility.

In my suggestions above, the user still has the responsibility to ensure that the types involved supply these compatibility and parametricity guarantees (either by checking for annotations if we choose to add those, or by reading the documentation on the types). In this manner, the problem is finally brought down to the same level as all of the other unsafe invariants that the user needs to consider.

Making any sort of transmutation safe, however plausible it might be on the surface, sounds like a massive footgun in terms of safety.

The very first thing that came to my mind was exactly the parametricity issue you mentioned too (and which I have only read later). I'm not even remotely sure, however, that this is the only such problem to consider.

All these complex requirements, of which there are already many (and there may be more, yet latent ones), seem to be very hard to check correctly. I have a feeling that because of the parametricity requirements, trying to introduce something like this proposal would need to add further rules to coherence checking too, for one.

I really don't think this is the right tradeoff and I find the motivation quite weak too. If you are sure your types uphold such strong guarantees, I think you should abstract the conversions away, e.g. in a function, and declare this fact with a regular unsafe transmute, because this really does seem like an infrequent and dangerous enough operation to warrant an unsafe block.


I did not suggest making transmutation possible in safe code. Only to make it not UB.

Edit: Sorry for the triple negative.


Alright, but then why the "safe" in the title? Did I misunderstand and does that just mean "defined"? Because those are two very different concepts.

Because the phrasing necessary to put the word "defined" in there (and have it clearly refer to "the opposite of UB") is awkward. I'll rework it.


I see, sorry then! I think I'll leave my reply there anyway in case it comes up later.

I guess the core of the communication difficulty is that I can't really agree with this. Yes, currently Vec<T> is UB to transmute to Vec<U> for all T != U. Yes, many types have undefined layout. Hell, it's not even settled what "layout" should include (cc https://github.com/rust-lang/unsafe-code-guidelines/issues/122). But it seems clear to me that:

  1. the bottleneck (not just for Vec but for all types) will be whether the type in question is actually willing to commit to guaranteeing something
  2. in a whole lot of cases, there are flawed-but-workable tools to actually implement those guarantees

With these two components (commitment to a particular layout, and ability to actually ensure that layout) in hand, sound transmutes can be achieved, by looking at the type(s) in question and what each type guarantees and checking that it suffices for the conversion you want to do. For most types this will involve reading docs rather than source code, but such is life with encapsulation.

I stress this "layout rules first" approach because it's the ground truth. In your post you collected quite a few examples of incorrect reasoning principles: generalized shortcuts that one may be tempted to take to conclude that a transmute is OK, but which have some counter-examples. But the fact that these broad strokes fail doesn't mean any particular transmute is UB. Shortcuts are nice (if they're correct), but you can always fall back to working out what you can state about the layout of a concrete type (note: concrete as in e.g. Vec<T>, not meaning it has to be fully monomorphized).

This does require cooperation from the types in question, but this is required anyway, because if the author of that code doesn't explicitly support it, no amount of layout guarantees and staring at source code will give you a sound transmute. But when the types/authors in question are willing, we already do have significant tools for them to nail down size, alignment, field offsets, etc. of types:

  • repr(C) for structs and unions guarantees a deterministic layout that is only a function of field order and field types' sizes and alignments (+ overall struct alignment/packing, if any)
  • repr(C) and repr(Int) enums (even those with variants that have fields!) likewise have fully specified and deterministic layout(s)
  • even the Option<Nullable> "optimization", and the fact that it looks through repr(transparent) wrappers, is guaranteed (though nothing is guaranteed about bigger niches, enums that are structurally unlike Option, or non-transparent structs)
  • there is repr(transparent) too, which doesn't help with transmutes compared to repr(C), but provides more guardrails in exchange for being more limited (it does add call-ABI compatibility, but that's irrelevant for transmutes, except transmutes of function pointers)
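A minimal sketch of some of the guarantees listed above, using only documented rules: the Option niche looking through #[repr(transparent)] wrappers, and the layout of a primitive-representation enum with fields (a u8 tag followed by the payload, laid out like a repr(C) union of repr(C) structs). Handle, Id and Msg are illustrative names.

```rust
use std::num::NonZeroU32;

// The Option<Nullable> guarantee looks through repr(transparent) wrappers.
#[repr(transparent)]
struct Handle<'a>(&'a u8);

#[repr(transparent)]
struct Id(NonZeroU32);

// A repr(u8) enum with fields has a fully specified layout:
// each variant is like a repr(C) struct { tag: u8, fields... }.
#[repr(u8)]
enum Msg {
    Ping,
    Data(u32),
}
```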

You lose some layout optimizations by going for these tools, but in many cases those are at odds with the desired layout guarantees anyway. In other cases, some sensible deterministic layouts aren't expressible (at all, or at least idiomatically); for example, we lack:

  • a way to request layout which is deterministic but does field reordering based on fields' sizes and alignments
  • a theory of niches beyond just Option<Nullable> -- what's the niche of a given type, and how will fields' niches be used when laying out types?

That, IMO, is where we need to focus to enable more defined transmutes, beyond the significant subset already possible today. Such matters have been discussed to some degree in the unsafe code guidelines WG (e.g. here), though we're not really in a position as a WG to drive new feature proposals or push apparently-controversial things like "all layout is deterministic, with a reasonably small set of inputs".


I should probably spell out that I don't mean to imply it's easy to pick, describe, and enforce layout guarantees sufficient for sound transmutes. I also don't mean to say that the current state of what's defined and what isn't is satisfactory (I would very much like stronger guarantees and less uncertainty for most types and argue for them on occasion). But we already have enough tools that if you are dead set on making certain transmutes sound, you can achieve it at some cost.


That's exactly why we should design a safe API: so that the compiler can check the requirements, as opposed to leaving people to check them by hand (and likely screw it up) before using unsafe transmute.


Of particular concern is a Vec<T> where T has a size that is not a power of 2 (e.g. a 3-byte struct). Rust is likely to allocate this as an array of 4-byte items, where one byte is compiler-inserted padding. My understanding is that any attempt to transmute such a byte is always UB.

That's a non-issue. For the most part, the stride and the width of a type are equal; I don't know if Rust has non-ZST types where size < alignment? I seem to remember there being unpleasant corner cases last time I manipulated Layout values.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2c111f051ae28313529fc40f68a27cc7
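A small sketch along the same lines: a type's size is always a multiple of its alignment, so any padding belongs to the type itself and an array's stride is exactly size_of::<T>(). ThreeBytes and Padded are illustrative names.

```rust
use std::mem::{align_of, size_of};

// Three bytes with alignment 1: no padding is needed, so the size is 3.
struct ThreeBytes([u8; 3]);

// u16 + u8 gets one byte of padding so the size is a multiple of align 2.
struct Padded { a: u16, b: u8 }

/// True if T's array stride equals size_of::<T>() and the size is a
/// multiple of the alignment (both hold for every Rust type).
fn stride_matches_size<T>() -> bool {
    size_of::<[T; 4]>() == 4 * size_of::<T>() && size_of::<T>() % align_of::<T>() == 0
}
```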

I have a sketch of an idea that seems related in https://github.com/rust-lang/rust/issues/49792#issuecomment-379638786

I should write a real libs RFC for it...


Please check out my follow-up comment; it turns out I misunderstood the original intent of OP and so I replied to something else.

At least one part of Rust std can benefit from this machinery, namely Cell::as_slice_of_cells (and possibly Cell::from_mut). I personally would like to see these methods expressed as functions using a general safe-coercion mechanism rather than as weird ad-hoc methods.
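For reference, the two std methods mentioned here are stable and compose as follows; each is a sanctioned layout-based reinterpretation (&mut [T] to &Cell<[T]> to &[Cell<T>]). double_all_via_cells is an illustrative helper.

```rust
use std::cell::Cell;

/// Double every element through shared Cell references rather than &mut.
fn double_all_via_cells(values: &mut [u32]) {
    // &mut [u32] -> &Cell<[u32]> -> &[Cell<u32>]
    let cells: &[Cell<u32>] = Cell::from_mut(values).as_slice_of_cells();
    for c in cells {
        c.set(c.get() * 2);
    }
}
```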

My trouble is that these existing tools like #[repr(C)] are a tough sell; we'd never see that on our favorite containers like HashMap.

I do think this could be a significant help, as it sounds like a reasonable thing to have in general on many types. (It's also a strictly more powerful idea than #[parametric])

I doubt this. It would seem to imply that transmuting any type with padding is UB, which I just find too hard to believe.

In the particular case of Vec<T>, transmuting a Vec doesn't even copy this byte. It transmutes a pointer.

I don't think so? It sounds like you misinterpreted my post as providing a safe API for some transmutes.

What about my post causes people to keep misinterpreting it this way?

I suspect that this comes out of semantic confusion between "Rust-safe" and "not-UB" which are distinct but not orthogonal concepts. I'm not sure there's much we can do about this, though. =/

I think #[repr(transparent)] is enough to be compatible for transmutes in all cases.

The problem for NonZeroUsize and friends, which makes them incompatible with their possibly-zero counterparts, is the "nonstandard" #[rustc_layout_scalar_valid_range_start(1)] attribute, which removes values from being valid.

The restriction isn't "no niches", it's that there aren't "extra" rules added. (I think NonZero* displaying as #[repr(transparent)] is misleading because of the extra restriction. It's not transparent.)

The rules for transmute soundness as I understand currently are:

  • It is reflexive: sound_transmute::<A,A>
  • It is commutative: sound_transmute::<A,B> == sound_transmute::<B,A>
  • It is transitive: sound_transmute::<A,B> && sound_transmute::<B,C> implies sound_transmute::<A,C>
  • It is sound to transmute between types if their layout is identical (size, alignment, padding, field offsets, niches, everything):
    • It is never sound to transmute between two distinct #[repr(Rust)] types (excluding built-in types with defined layouts)
      • It is sound to transmute between compatible types behind &_, &mut _, [_; _], &[_], &mut [_], *const _, and *mut _
    • It is sound to transmute between two types when one type is a #[repr(transparent)] wrapper around the other (and not annotated with a rustc-internal attribute)
    • It is sound to transmute between two types with a defined layout (such as repr(C)) if they have the same defined layout.
      • For structs, this means compatible members in the same order for #[repr(C)].
      • For enums, this means the same number of variants and equivalent explicit discriminants with #[repr(C)], #[repr(Int)], or #[repr(C, Int)].
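A minimal sketch of the enum rule, using two illustrative fieldless enums with the same #[repr(u8)] and the same explicit discriminants:

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color { Red = 1, Green = 2 }

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Hue { Crimson = 1, Lime = 2 }

fn color_to_hue(c: Color) -> Hue {
    // SAFETY: both are one u8, and every valid Color discriminant
    // is also a valid Hue discriminant.
    unsafe { std::mem::transmute::<Color, Hue>(c) }
}
```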

Vec<T> to Vec<U> falls under the no-repr(Rust)-transmutes rule, and as such that is the rule we'd have to weaken to make it sound. The stuff behind the pointer is already valid; you could make a sound transmute today by decomposing the Vec into its raw parts and doing a cast there. It's the Vec layout itself which is allowed to change.
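A minimal sketch of that raw-parts approach for a single level, assuming Node is the #[repr(transparent)] newtype over usize from the OP. It never transmutes the Vec struct itself; it only reinterprets the element pointer.

```rust
use std::mem::ManuallyDrop;

#[repr(transparent)]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
struct Node(usize);

/// Convert Vec<usize> into Vec<Node> without copying the elements.
fn into_nodes(v: Vec<usize>) -> Vec<Node> {
    // Keep `v` from freeing its buffer; ownership moves to the new Vec.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: Node is repr(transparent) over usize, so the allocation has the
    // same size and alignment for Node, and every usize is a valid Node.
    unsafe { Vec::from_raw_parts(ptr.cast::<Node>(), len, cap) }
}
```

For Vec<Vec<usize>>, this has to be applied to each inner Vec (e.g. via into_iter().map(into_nodes).collect()), which goes through an iterator rather than being a zero-cost cast of the outer Vec.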

I think the solution would indeed be something along the line of #[parametric] where the guarantee provided would be that generic instantiations with compatible types would get the same layout. I don't know the best way to achieve it, but that seems good from a language POV.

Giving Vec a defined layout with #[repr(C)] would also work for just that case. The only thing we lose is the compiler's power to reorder the (ptr::Unique<T>, usize, usize) 3-tuple.


Correction to the OP: #[repr(C)] cannot do field reordering. It is sound to transmute between syntactically identical #[repr(C)] definitions. This is because the C standard lays out members in syntactical order.
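A small sketch of declaration-order layout, assuming Rust 1.77+ (for std::mem::offset_of!) and a target where u32 is 4-byte aligned. Header is an illustrative name.

```rust
use std::mem::{offset_of, size_of};

// repr(C): fields in declaration order, each at the next offset that
// satisfies its alignment, plus tail padding up to the struct's alignment.
#[repr(C)]
struct Header { tag: u8, value: u32, flag: u8 }

/// Returns (offset of tag, offset of value, offset of flag, total size).
fn header_layout() -> (usize, usize, usize, usize) {
    (
        offset_of!(Header, tag),
        offset_of!(Header, value),
        offset_of!(Header, flag),
        size_of::<Header>(),
    )
}
```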


On avoiding not-unsafe interpretation of unsafe soundness discussions: avoid the word safe like the plague. Use sound, valid, or defined instead. If really annoyed, put unsafe { } around every code example.


The issue with randomization isn't between two #[repr(C)] structs Foo and Bar, it's between Struct<Foo> and Struct<Bar> where Struct is #[repr(Rust)].

Misleading or not, the raison d'être of repr(transparent) is ABI, and the NonZero types are a prime example of types that would want to preserve ABI.

It should be fine to transmute from NonZeroU8 to u8. To allow this, I don't think sound transmute should require commutativity, and instead of requiring T and U to have the same set of niches, it should only require that the set of niches of T be a superset of the set of niches of U. This would allow transmutes that strictly increase the set of values a type can hold, like NonZeroU8 to u8.

Note that transmuting from NonZeroU8 to u8 is similar to transmuting from &T to *const T.
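A minimal sketch of the asymmetry, relying on the documented guarantee that NonZeroU8 has the same layout as u8 with 0 excluded. widen, narrow and ref_to_ptr are illustrative helpers.

```rust
use std::num::NonZeroU8;

/// Widening: every NonZeroU8 is a valid u8 (NonZeroU8::get does the same
/// thing without unsafe).
fn widen(n: NonZeroU8) -> u8 {
    // SAFETY: same layout as u8, and its valid values are a subset of u8's.
    unsafe { std::mem::transmute::<NonZeroU8, u8>(n) }
}

/// Narrowing: 0 is a valid u8 but not a valid NonZeroU8, so a check is
/// required; transmuting a 0 would be UB.
fn narrow(x: u8) -> Option<NonZeroU8> {
    NonZeroU8::new(x)
}

/// The pointer analogue: &T -> *const T only ever drops guarantees.
fn ref_to_ptr(r: &u8) -> *const u8 {
    r as *const u8
}
```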
