I’ve decided to include SizeLeq and AlignLeq in the proposal because I think they give us a lot more power, but I could be convinced to remove them if folks think the proposal encompasses too much. A good compromise might be to remove just SizeLeq, since it’s less useful than AlignLeq, although my preference is to keep them both.
I’ve opted to include a derive for FromBits in addition to it being an auto trait. This is so that authors of types with private fields can still opt in if they want.
Can’t we use the From trait for this problem by making a wrapper type Bits<T> with appropriate From implementations for numeric types? So instead of f32::from_bits(0x0_i32) we would write f32::from(Bits(0x0_i32)).
On a side note, I would really like to see the as operator become sugar for the Into trait. For example, it could be really convenient to use it with units of measure: let dist_meters = dist_miles as Meter;. What is the main reason for not doing it?
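To make the suggestion concrete, here is a minimal sketch of what such a wrapper could look like. `Bits` is a hypothetical type, not part of the standard library, and the three impls are only illustrative. Note that today's `f32::from_bits` takes a `u32`, so the `i32` impl goes through an `as` cast first.

```rust
/// Hypothetical wrapper that marks a value as "raw bits".
/// This type is an assumption for illustration; it is not in std.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bits<T>(pub T);

impl From<Bits<u32>> for f32 {
    fn from(bits: Bits<u32>) -> f32 {
        // Reinterpret the 32 bits as an IEEE 754 single-precision float.
        f32::from_bits(bits.0)
    }
}

impl From<Bits<i32>> for f32 {
    fn from(bits: Bits<i32>) -> f32 {
        // `as` between i32 and u32 keeps the bit pattern unchanged.
        f32::from_bits(bits.0 as u32)
    }
}

impl From<Bits<f32>> for u32 {
    fn from(bits: Bits<f32>) -> u32 {
        // The reverse direction: expose the float's bit pattern.
        bits.0.to_bits()
    }
}
```

With these impls, both `f32::from(Bits(0x0_i32))` and `let y: f32 = Bits(0x0_i32).into();` compile, and each direction has to be written out by hand per pair of types.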
I think that might work. Users would need to impl From<Bits<T>> for U and the transitivity magic would need to work on that.
On a side note, I would really like to see the as operator become sugar for the Into trait, [...] What is the main reason for not doing it?
These are just different kinds of conversions: as performs lossy, non-value-preserving zero/sign-extending and truncating "conversions", while From and Into perform value-preserving, infallible conversions.
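A small sketch of the difference, using only standard library conversions. The function names are made up for illustration.

```rust
use std::convert::TryFrom;

/// `as` truncates silently: only the low 8 bits of the input survive.
pub fn truncate_with_as(x: i32) -> u8 {
    x as u8
}

/// `as` between same-width integers keeps the bits and changes the meaning.
pub fn reinterpret_sign(x: i32) -> u32 {
    x as u32
}

/// `From` is only implemented where every input value is preserved,
/// such as widening i32 to i64.
pub fn widen(x: i32) -> i64 {
    i64::from(x)
}

/// Narrowing is not available through `From`; `TryFrom` reports the failure.
pub fn checked_narrow(x: i32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

For example, `truncate_with_as(300)` yields 44 and `reinterpret_sign(-1)` yields `u32::MAX`, while `checked_narrow(300)` returns `None` instead of a wrong value.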
Hmmm interesting. You could do something like #[repr(transparent)] struct Bits<T>(T) and then give Bits<T> two from_ref(&T) -> &Bits<T> and from_mut(&mut T) -> &mut Bits<T> constructors so it'd work for references too.
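Roughly, the reference constructors could be sketched like this. `Bits` is still the hypothetical wrapper from this thread, and the sketch relies on `#[repr(transparent)]` guaranteeing that `Bits<T>` has the same layout as `T`.

```rust
/// Hypothetical wrapper; `repr(transparent)` gives it the layout of `T`.
#[repr(transparent)]
pub struct Bits<T>(pub T);

impl<T> Bits<T> {
    /// Views a shared reference to `T` as a shared reference to `Bits<T>`.
    pub fn from_ref(value: &T) -> &Bits<T> {
        // SAFETY: `Bits<T>` is `repr(transparent)` over `T`, so the pointer
        // cast preserves size, alignment, and validity; the lifetime is
        // tied to `value` by the signature.
        unsafe { &*(value as *const T as *const Bits<T>) }
    }

    /// Views a unique reference to `T` as a unique reference to `Bits<T>`.
    pub fn from_mut(value: &mut T) -> &mut Bits<T> {
        // SAFETY: same layout argument as `from_ref`; uniqueness is
        // inherited from the incoming `&mut T`.
        unsafe { &mut *(value as *mut T as *mut Bits<T>) }
    }
}
```

Writes through `Bits::from_mut(&mut x)` are visible in `x` afterwards, since both references point at the same memory.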
The big question I'd have is: how do you construct the From impl? One of the things that I like about this proposal is that, if we go with having either compiler assistance or a custom derive, the user doesn't have to reason about the (very subtle and complex) memory safety themselves.
Also, since From and From::from are safe, there's nothing stopping somebody from implementing From<Bits<T>>::from in a way that doesn't actually depend on the argument, but instead produces some default value. That, in turn, means that you can no longer use U: From<T> as a signal that it's safe to coerce a reference to T into a reference to U.
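As an illustration of that concern, here is a perfectly legal, entirely safe impl that ignores its input. `Bits` and `Celsius` are both hypothetical types invented for this sketch.

```rust
/// Hypothetical "raw bits" wrapper from this thread.
pub struct Bits<T>(pub T);

/// Hypothetical user type that opts in to the conversion.
#[derive(Debug, PartialEq)]
pub struct Celsius(pub f32);

impl From<Bits<u32>> for Celsius {
    fn from(_bits: Bits<u32>) -> Celsius {
        // Nothing forces this impl to look at the bits at all.
        // It compiles and is safe, yet it is not a reinterpretation.
        Celsius(0.0)
    }
}
```

Every input maps to `Celsius(0.0)`, so a bound like `Celsius: From<Bits<u32>>` tells generic code nothing about whether a `&u32` may be viewed as a `&Celsius`.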
So how does coerce work then? It looks like it's a safe function, which means that there's some mechanism for deciding whether coerce::<T, U> is valid for T and U.
If anything it could be a way to replace coerce AFAICT. That way you can do let y: U = Bits(x).into(); or let y = U::from_bits(Bits(x));. Whether that’s better than let y: U = coerce(x); or not… is debatable.
One thing that allows is for users to provide their own From<Bits<T>>::from implementations (if not right now, via specialization later), which might be something that is or isn’t desired.
(Sorry it took me a while to respond, this thread also moved so fast I couldn't just do this quickly on the side.^^ I'm probably late to the party but here you go.)
I usually consider padding to be uninitialized memory. This arises naturally because when you put a struct into uninitialized memory and then initialize it by writing to all fields, the padding will remain uninitialized. AFAIK C treats both as indeterminate values, and LLVM treats both as undef.
So, I think it is fair to consider these the same problem. That also reduces our number of problems by one.
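A sketch of that scenario, assuming a typical target where `u32` has 4-byte alignment. The struct `Padded` and both functions are made up for illustration.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr::addr_of_mut;

/// With `repr(C)` and a 4-byte-aligned `u32`, three padding bytes sit
/// between `tag` and `value`.
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}

/// Number of bytes in `Padded` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}

/// Initializes a `Padded` field by field inside uninitialized memory.
/// Both fields end up initialized; the padding bytes are never written.
pub fn init_by_fields() -> Padded {
    let mut slot = MaybeUninit::<Padded>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        // SAFETY: `p` points to properly aligned, writable storage for
        // `Padded`, and every field is written before `assume_init`.
        addr_of_mut!((*p).tag).write(1);
        addr_of_mut!((*p).value).write(2);
        slot.assume_init()
    }
}
```

The resulting value is fully valid as a `Padded`, even though three of its eight bytes were never written.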
On IRC, @hanna-kruppe asked about what is okay here from a pure language operation perspective, not just from a type perspective. Ignoring types, I think reading uninitialized memory is fine, but you can expect any operation on it to be immediate UB -- this includes bit-masking, or multiplying by 0. The only thing you can do with such an uninitialized value is store it back to memory. Moreover, conservatively, better assume that when you load a u32 and any byte is uninitialized, then the entire value you are loading is uninitialized. We may end up allowing more, but if you follow these rules you should be fine from an operations/LLVM standpoint.
Now, types may place additional restrictions; for example, &T cannot be NULL. From what I can tell, this proposed FromBits instance would end up with a &[u8] that contains uninitialized memory, and the question is whether that's okay. Essentially, this amounts to the question of whether "uninitialized" is a valid value for u8. I think the answer should be "no", and with my "Types as contracts" that's certainly the intention (though that is not implemented currently). I think when safe code calls a function returning u8, it should be able to rely on the fact that this u8 is initialized. Everything else is just a big hazard. With this interpretation, there would be UB the moment you load the uninitialized data into the u8, because the intention is that all types in scope are always valid. That's just like it's not okay to load the value 3 into a bool even if you never look at the bool.
These issues are the reason why the MaybeUninit type is being introduced. So, turning any sequence of bytes into a &[MaybeUninit<u8>] should be okay because you are no longer claiming that this is a valid u8.
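A minimal sketch of such a view. The helper name is hypothetical, and the `T: Copy` bound is an assumption added here to rule out types with interior mutability, which would need separate care.

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// Views any `Copy` value as a slice of possibly-uninitialized bytes.
/// Padding bytes are allowed to be uninitialized because the element
/// type never claims to hold a valid `u8`.
pub fn as_maybe_uninit_bytes<T: Copy>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: the pointer is non-null and valid for `size_of::<T>()` bytes,
    // `MaybeUninit<u8>` has alignment 1 and no validity requirement, and
    // the returned slice borrows `value` for its whole lifetime.
    unsafe {
        slice::from_raw_parts(
            value as *const T as *const MaybeUninit<u8>,
            size_of::<T>(),
        )
    }
}
```

Callers may only use `assume_init` on bytes they know to be initialized, for example every byte of a `u32`, but not the padding of a struct.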
C's "character types" rules are a crazy hack that I'd rather not replicate in Rust. Also, the "character types" exception is about TBAA/strict aliasing, which Rust doesn't have anyway. That's independent of whether, in C, a value of a type can be a "bad" indeterminate value even if it does not have a trap representation. In that regard, all the integer types are likely the same in C.
But anyway, we don't need such a strange hack in Rust. We have MaybeUninit, and if you implement memcpy in Rust you should do it by reading and writing MaybeUninit<u8>.
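For illustration, a byte-wise copy in that style might look like the following. This is a sketch with a made-up name, not how any real memcpy is implemented.

```rust
use std::mem::MaybeUninit;

/// Copies `src` into `dst` one byte at a time without ever claiming that
/// a byte is an initialized `u8`. Panics if the lengths differ.
pub fn copy_bytes(dst: &mut [MaybeUninit<u8>], src: &[MaybeUninit<u8>]) {
    assert_eq!(dst.len(), src.len(), "source and destination lengths differ");
    for (d, s) in dst.iter_mut().zip(src) {
        // `MaybeUninit<u8>` is `Copy`, so this moves the byte along
        // whether or not it is initialized.
        *d = *s;
    }
}
```

The function is entirely safe code: uninitialized bytes are carried from source to destination but never inspected.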
Yes. Essentially there is a special exception if you write &... as *..., and we pretend the reference never existed and you directly created a raw pointer. For now, better assume you have to exactly write this, syntactically.
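The syntactic form in question looks like this; the function is a made-up example that shows only the shape of the expression, not the aliasing rules behind it.

```rust
/// Creates a raw pointer with the literal `&mut ... as *mut ...` form
/// and writes through it.
pub fn write_through_raw(mut x: i32) -> i32 {
    // The reference is created and cast to a raw pointer in one expression.
    let p = &mut x as *mut i32;
    unsafe {
        // SAFETY: `p` points to the live local `x`, and no other
        // reference to `x` is in use while we write.
        *p += 1;
    }
    x
}
```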
I think there is one "bad" value, called "poison", that represents uninitialized data. Also see this paper that defines LLVM with poison (and without undef).