This is exactly what the simplified formulation is. However, constructability is also impacted by the visibility of the field type definitions, and the visibility of the paths of those definitions. Usually, Rust forbids private types in public type signatures, but the pub-in-priv trick circumvents this check. Ignoring this is a safety hazard.
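For illustration, a minimal sketch of the pub-in-priv trick (module and type names here are made up):

mod private {
    // `Secret` is declared `pub`, so the "private type in public
    // interface" check passes, but its path is unreachable from outside
    // the crate because the module `private` is not `pub`.
    pub struct Secret(pub u8);
}

// Compiles (on newer compilers possibly with a lint warning):
// downstream crates can see this public field but cannot name its type.
pub struct Wrapper {
    pub field: private::Secret,
}

A constructability analysis that looked only at field visibility would consider Wrapper publicly constructible, even though no downstream crate can actually name private::Secret.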
I haven't read through the whole RFC yet, but it seems like there are no comparisons to Haskell's safe coercions yet. The situation is somewhat similar. There was (and still is) a function called unsafeCoerce in Haskell that completely "disables" the type checker, like Rust's transmute, but can safely be used around e.g. newtypes (which are like #[repr(transparent)] structs). The safe coercions present themselves as a coerce method of a Coercible type class (type classes are what Rust's traits are based on) that the compiler resolves, AFAICT, similarly to the proposed TransmuteFrom/-Into, that is: also ad-hoc, based on the visibility of the constructors of the types involved (a distinction similar to the visibility of fields in Rust). Unlike this proposal, the safe coercions in Haskell only deal with newtypes and generic types, not with safely transmutable C-style structs, and they are an important implementation detail of the GeneralizedNewtypeDeriving feature, something that Rust is still missing but should, in my opinion, eventually get, too.
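To make the analogy concrete in Rust terms, here is a small sketch (the newtype is made up) of the kind of conversion that Haskell's coerce performs safely and that currently requires transmute in Rust:

// A newtype in the Haskell sense: #[repr(transparent)] guarantees the
// same layout as the single wrapped field.
#[repr(transparent)]
struct Meters(f64);

fn wrap(value: f64) -> Meters {
    // Sound because of #[repr(transparent)], but the compiler cannot
    // check that here; Haskell's `coerce` does this without `unsafe`.
    unsafe { std::mem::transmute(value) }
}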
They introduce so-called "roles" for parameters of generic types; in particular, those roles are called nominal, representational and phantom. Speaking in Rust examples: in a type like PhantomData<T>, the variable T would have a phantom role, with the effect that a PhantomData<T> could be freely transmuted into a PhantomData<S> for any S. A type like Vec<T> would have a parameter T with a representational role, meaning that Vec<T> could be transmuted into Vec<S> if and only if T can be transmuted into S. And a type like HashSet<T> has a parameter T with a nominal role, which means that HashSet<T> can never be transmuted into HashSet<S>, except when T and S are the same type.

The nominal role of HashSet<T> would be an API design decision, since changing the type inside the HashSet would change the Hash implementation of the contained type, which could put the whole HashSet into a totally broken state. A different situation, where nominal parameters are actually strictly necessary, would be something like
trait SomeTrait {
    type AssociatedType;
}

struct MyStruct<T: SomeTrait>(T, T::AssociatedType);
because there, changing T to some different S changes the associated type, and hence the layout of MyStruct<T>, even if T is transmutable into S.
The safe coercions of Haskell are always invertible, unlike this Rust proposal, so a representational role in Rust would probably need to come with some kind of notion of "variance".
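A small illustration of that asymmetry (not taken from the RFC): transmuting bool to u8 is always sound, but the reverse direction is not:

fn bool_to_u8(b: bool) -> u8 {
    // Always sound: every bool bit pattern (0 or 1) is a valid u8.
    unsafe { std::mem::transmute(b) }
}

fn u8_to_bool(x: u8) -> bool {
    // Unsound in general: any value other than 0 or 1 is an invalid
    // bool, so calling this with e.g. 2 is undefined behavior.
    unsafe { std::mem::transmute(x) }
}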
I'll continue reading up on how the current RFC perhaps already handles these kinds of issues, but I wanted to make sure this prior art is also taken into consideration.
This would probably want to be another Neglect* option, since inconsistent Hash/Eq implementations are only a logic error and not a memory safety error - you normally want to be protected from logic errors, but it makes sense to allow it as an option.
I actually knew someone who was working (alone) on a Rust safe-transmute system based on Haskell's Coercible (unfortunately the project is pretty much lost now). One thing I remember is that it did have explicit variance of type parameters.
[Mostly off-topic discussion about library UB]
Hmm, but in an unsafe-using HashContainer implementation, such a "logic error" (elsewhere named "library UB") could get promoted to a memory error pretty quickly upon attempting to actually use the container, depending on what exactly is done with the bits of the hash (which has now changed). E.g., if the set somehow cached that a bucket is present for a given element, but the element's hash has now changed, the bucket for that hash is no longer present.
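A sketch of that hazard (the container type is made up): a set that caches each element's hash on insertion, which a more elaborate unsafe implementation might trust for unchecked indexing:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical container storing each element alongside its hash.
struct CachedSet<T: Hash> {
    items: Vec<(u64, T)>,
}

impl<T: Hash> CachedSet<T> {
    fn insert(&mut self, value: T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        self.items.push((hasher.finish(), value));
    }
}

Transmuting a CachedSet<T> into a CachedSet<S> where S hashes differently leaves every cached hash stale; any unsafe code that trusted those hashes for unchecked accesses could then cause actual memory unsafety.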
In the case of HashMap and Hash the container has no way of relying on the results to be consistent anyways. For what it's worth, an impl Hash could use a random number generator or read values from the network. Nonetheless, with reasonable Hash impls around, you cannot break your hashmaps, and thus safe transmutations should not introduce a way to do this either without some extra hurdles. I agree that those hurdles don't need to include requiring unsafe in the case of HashMap.
Other data structures could be different, in particular they could rely on the correct implementation of sealed or unsafe traits and thus require a nominal type role to be safe because otherwise the transmutation would trigger some actual library UB.
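For example (a made-up container relying on a made-up unsafe trait):

// Hypothetical unsafe marker trait: implementors promise an ordering
// that is total and consistent, and unsafe code may rely on it.
unsafe trait TrustedOrd: Ord {}

// Hypothetical container whose unsafe internals (say, unchecked binary
// search) depend on `T: TrustedOrd` being implemented correctly.
struct SortedVec<T: TrustedOrd>(Vec<T>);

Even if T and S were layout-compatible, transmuting SortedVec<T> into SortedVec<S> would swap in S's Ord implementation, about which the container's unsafe code was promised nothing, so T would need a nominal role here.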
Edit: On second thought, for a type with a correct Hash implementation, it would be impossible to create a broken HashMap currently. Thus a function that receives a HashMap<i32, T> or so, uses unsafe, and requires the map not to be broken in order to avoid UB, could become unsound through this change.
Edit2: Wait... perhaps it is possible to create a broken HashMap<i32, T> because of the HashMap API using the Borrow trait. I'm not sure. Maybe it's still impossible because Borrow is used in retrievals but not in (key-changing) updates.
Even if it's true that you can't break a HashMap<i32, T> in safe code, it seems like that would be an irresponsible function to write. After all, some other function could provide a safe API wrapping unsafe code that edits the keys in a HashMap<i32, T> to make it logically invalid. Neither of them causes memory errors by itself, but if they both exist, then you get memory errors.
Now, you might say, "but that's the fault of the second function, because it modified the internals of a type that doesn't guarantee that it will remain memory-safe if you do that". However, the documentation of HashMap states (emphasis mine):
It is a logic error for a key to be modified in such a way that the key's hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or *unsafe code*.
Which is ambiguous, but might imply that you ARE permitted to modify keys using unsafe code, and at worst, you only get logic errors rather than memory errors. Which would mean that the second function IS allowed to be safe, while the function you describe would be obligated to be an unsafe fn even in current Rust.
Further, in nightly, we have the raw entry API, which already allows constructing a broken HashMap<i32, T> in safe code.
Borrow is a safe trait, thus it must not cause UB in safe Rust. If the std HashMap caused UB when a user called a safe method on it without ever having used unsafe, the user who wrongly used Borrow could not possibly be held responsible for causing UB. The documentation's requirements cannot be checked by the compiler, i.e. you cannot rely on the documented behavior in a way that could cause UB if someone ignored the documentation of Borrow.
The only way to hold the user responsible would be if Borrow were an unsafe trait and the documentation clearly outlined the requirements for its use not to cause UB. As a side note, the documentation uses the word "should", not "must", for the part mentioning Eq and co.
The same goes for transmute as it is used today. As long as the user upholds all the invariants required to call transmute without causing UB, i.e. (in short) it doesn't produce an invalid value, the transmuted value must not cause UB in safe code; otherwise, calling transmute would always be UB.
P.S.: Copied from the Rust documentation, as a reminder:
Rust is otherwise quite permissive with respect to other dubious operations. Rust considers it "safe" to:
- Deadlock
- Have a race condition
- Leak memory
- Fail to call destructors
- Overflow integers
- Abort the program
- Delete the production database
However any program that actually manages to do such a thing is probably incorrect. Rust provides lots of tools to make these things rare, but these problems are considered impractical to categorically prevent.
Did you mean safe for the second safer? What does it mean to be "safer"? Still unsafe? What does that actually mean?
Whoops, yes, that's exactly what I meant.
When writing unsafe code, it's important to be clear on what invariants you are taking upon yourself to enforce. mem::transmute makes this very difficult: it will freely allow you to do many types of invalid transmutations without communicating why those transmutations are invalid.
Our belief is that even when a completely safe transmute is not possible (e.g., u8 to bool), you should have unsafe alternatives which are less pitfall-prone than mem::transmute. This is achieved in the RFC by the options system.
I wasn't aware of this work, so thank you very much for pointing me towards it! I'm not much of a Haskell expert, so I hope you can be patient as I (inevitably) will make mistakes in my comparison of our RFC to the Haskell stuff.
As you note, the problems solved by this RFC and Haskell are slightly different: Coerce is, fundamentally, about safely exposing that newtypes are layout-equivalent to their member. (In contrast to our RFC, Coerce doesn't attempt to answer, for instance, if two u8s are interchangeable with a u16.)
Haskell takes an algebraic approach to this problem, reasoning at the level of type definitions, not type layouts. However, not all type parameters have an impact on a type's layout; for instance:
use std::marker::PhantomData;

#[repr(C)]
struct Bar<U>(PhantomData<U>);

#[repr(transparent)]
struct Foo<T, U>(T, Bar<U>);
Foo's layout is impacted solely by T, not U, but this isn't necessarily clear by looking at the definition of Foo. To reason about these scenarios, Haskell introduces the concept of type parameter roles; a type parameter has either a nominal, representational or phantom role.
Our RFC does not need the concept of roles, because it does not attempt to abstractly reason about type definitions. Rather, it reasons about type layouts. This example therefore does not pose a challenge to our proposal:
trait SomeTrait { type AssociatedType; }
#[repr(C)]
struct MyStruct<T: SomeTrait>(pub T, pub T::AssociatedType);
For a particular T, MyStruct<T> will have a particular layout. Our proposed TransmuteFrom trait reasons about the layouts of types (which are fully concrete), not the definitions (which may be somewhat abstract).
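To illustrate (with a made-up impl): once T is chosen, nothing abstract remains for the layout-based analysis:

trait SomeTrait {
    type AssociatedType;
}

struct A;

impl SomeTrait for A {
    type AssociatedType = u16;
}

#[repr(C)]
struct MyStruct<T: SomeTrait>(pub T, pub T::AssociatedType);

// For the concrete choice T = A, `MyStruct<A>` is laid out exactly like
// a #[repr(C)] struct with fields (A, u16).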
Please let me know if you think I'm missing anything!
I think the subsequent discussion might have gotten slightly side-tracked on this point. Vec and HashSet will not be transmutable by the mechanisms proposed by this RFC:
Damn, I totally forgot about this. Is it even true for something repr(transparent) like
#[repr(transparent)]
struct Wrapper<T>(T);
that you cannot transmute Vec<i32> into Vec<Wrapper<i32>> or Option<u8> into Option<Wrapper<u8>>?
Edit: Of course, either way the point stands that while transmuting &u8 into *const u8 is okay, transmuting Option<&u8> into Option<*const u8> definitely isn't.
You cannot. Rust currently makes no guarantee that the layout of Vec<i32> is interchangeable with Vec<Wrapper<i32>>!
Not in this case. However, although Option is not #[repr(C)], "option-like" enums carry some special layout guarantees that make certain transmutations sound.
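One such guarantee is the null-pointer niche, which is easy to observe (a small demonstration, not from the RFC):

use std::mem::size_of;

fn main() {
    // Guaranteed: &u8 is never null, so Option<&u8> can use the null
    // bit pattern as its `None` value and stays pointer-sized.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // *const u8 may be null, so Option<*const u8> needs a separate
    // discriminant and is strictly larger.
    assert!(size_of::<Option<*const u8>>() > size_of::<*const u8>());
}

This also shows, at the size level alone, why the Option<&u8>-to-Option<*const u8> transmutation from the earlier post cannot be sound.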
Interesting. I'm wondering if there is any drawback to introducing a guarantee for something like this.
If two fields have the same size and alignment, the order doesn't matter for correctness, but it might matter for performance, e.g. because one of the fields is used more often than the other and the struct spans more than a single cache line. If the compiler can figure that out, the ordering of fields could potentially change between compilations without the struct ever being touched. This might differ between the Vec containing T and the Vec containing Wrapper<T>.
I'm not sure.
Vecs aren't transmutable, but they are unsafely constructible via the Vec::from_raw_parts function. However, its invariants are rather strict; we may only convert a Vec<T> to Vec<U> if:
- U is transmutable from T
- U has the same size as T
- U has the same static alignment as T
Fortunately, our RFC can statically enforce all three of these invariants:
fn cast<Src, Dst>(src: Vec<Src>) -> Vec<Dst>
where
    Dst: TransmuteFrom<Src>
        + AlignEq<Src>
        + SizeEq<Src>,
{
    let (ptr, len, cap) = src.into_raw_parts();
    unsafe { Vec::from_raw_parts(ptr as *mut Dst, len, cap) }
}
(AlignEq and SizeEq are described here. They are defined in terms of TransmuteFrom, and do not require additional compiler support.)
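(One caveat on the sketch above: Vec::into_raw_parts is unstable at the time of writing, so a stable implementation would have to extract the pointer, length and capacity manually, e.g. via ManuallyDrop.)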
The moment you need to determine the layout of a type and the moment you know how it's used can easily be different compilations (i.e. of different crates). Furthermore, there's gotta be some rather deterministic algorithm for determining layout at work already, since you can have e.g.
- crate a defining a generic type A<T>
- crate b defining a type B
- crates c and d independently depending on a and b, using the type A<B>
- crate e depending on c and d, passing an A<B> from c to d
Every crate comes from its own call to rustc, possibly even in parallel where dependency allows it, so determining the layout of A<B> can only be done deterministically from nothing but the definitions of A<T> and B and possibly some decisions the compiler made and wrote down while compiling a and b (but a and b are independent from each other, too).
Of course, for a crate defining a private type Ty and using Vec<Ty>, it might be okay for rustc to include other factors in the decision of how Vec<Ty> is laid out. The question remains whether these kinds of optimizations will ever happen and whether they're worth it.
Don't forget about PGO / profile-guided optimization, which can inform rustc's layout decisions based on usage statistics. (E.g., historically, for some architectures, code that accesses offset 0 in a struct was more compact and faster than accesses at non-zero offsets into the struct.) Similarly, on modern cache-based architectures, arranging more frequently accessed fields in the same cache line may significantly improve performance. PGO provides the statistical access information needed for such field rearrangement.