pre-RFC FromBits/IntoBits

OK, I’ve started working on a draft RFC. The first draft is not complete, but you folks may have feedback nonetheless. https://github.com/joshlf/rfcs/blob/joshlf/from-bits/text/0000-from-bits.md

A few things to note:

  • I’ve decided to include SizeLeq and AlignLeq in the proposal because I think they give us a lot more power, but I could be convinced to remove them if folks think the proposal encompasses too much. A good compromise might be to remove just SizeLeq, since it’s less useful than AlignLeq, although my preference is to keep them both.
  • I’ve opted to include a derive for FromBits in addition to it being an auto trait. This is so that authors of types with private fields can still opt in if they want.

Can’t we use the From trait for this problem by making a wrapper type Bits<T> with appropriate From implementations for numeric types? So instead of f32::from_bits(0x0_i32) we would write f32::from(Bits(0x0i32)).

On a side note, I would really like to see the as operator become sugar for the Into trait. For example, it could be really convenient to use it with units of measure: let dist_meters = dist_miles as Meter;. What is the main reason for not doing it?

I think that might work. Users would need to impl From<Bits<T>> for U and the transitivity magic would need to work on that.

On a side note, I would really like to see the as operator become sugar for the Into trait. [...] What is the main reason for not doing it?

These are just different types of conversions: as performs potentially lossy, non-value-preserving zero/sign-extending and truncating "conversions", while From and Into perform value-preserving, infallible conversions.
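To make the distinction concrete, here is a minimal sketch using only standard integer conversions. The function names are made up for illustration; the point is that as silently changes the value when it does not fit, while From only exists where the value is always preserved and TryFrom reports the failure.

```rust
use std::convert::TryFrom;

/// Widening via `From`: every `u8` is representable as a `u32`,
/// so the value is always preserved.
pub fn widen(x: u8) -> u32 {
    u32::from(x)
}

/// Narrowing via `as`: the high bits are silently dropped.
pub fn truncate(x: u32) -> u8 {
    x as u8
}

/// Sign reinterpretation via `as`: the bits are kept, the value changes.
pub fn reinterpret_sign(x: i8) -> u8 {
    x as u8
}

/// Narrowing via `TryFrom`: out-of-range inputs are reported, not truncated.
pub fn checked_narrow(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

With these definitions, truncate(300) yields 44 and reinterpret_sign(-1) yields 255, whereas checked_narrow(300) yields None. There is no From<u32> for u8 impl at all, which is the value-preserving guarantee in action.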

Hmmm interesting. You could do something like #[repr(transparent)] struct Bits<T>(T) and then give Bits<T> two from_ref(&T) -> &Bits<T> and from_mut(&mut T) -> &mut Bits<T> constructors so it'd work for references too.
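A rough sketch of what that wrapper could look like is below. Bits, from_ref and from_mut are the names from this thread, not an existing API, and the casts rely on #[repr(transparent)] guaranteeing that Bits<T> has the same layout as T.

```rust
/// Hypothetical wrapper from this thread; not part of the standard library.
#[repr(transparent)]
pub struct Bits<T>(pub T);

impl<T> Bits<T> {
    /// Reinterprets a shared reference to `T` as a reference to `Bits<T>`.
    pub fn from_ref(x: &T) -> &Bits<T> {
        // SAFETY: `Bits<T>` is `repr(transparent)` over `T`, so both types
        // have identical size, alignment and validity requirements.
        unsafe { &*(x as *const T as *const Bits<T>) }
    }

    /// Reinterprets a unique reference to `T` as a reference to `Bits<T>`.
    pub fn from_mut(x: &mut T) -> &mut Bits<T> {
        // SAFETY: same layout argument as in `from_ref`; uniqueness of the
        // borrow is carried over unchanged.
        unsafe { &mut *(x as *mut T as *mut Bits<T>) }
    }
}
```

Writing through Bits::from_mut(&mut v).0 then updates v itself, since both references point at the same memory.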

The big question I'd have is: how do you construct the From impl? One of the things that I like about this proposal is that, if we go with having either compiler assistance or a custom derive, the user doesn't have to reason about the (very subtle and complex) memory safety themselves.

Also, since From and From::from are safe, there's nothing stopping somebody from implementing From<Bits<T>>::from in a way that doesn't actually depend on the argument, but instead produces some default value. That, in turn, means that you can no longer use U: From<T> as a signal that it's safe to coerce a reference to T into a reference to U.
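As an illustration of that concern, the following sketch shows a perfectly legal, safe From<Bits<T>> impl that ignores its argument. Bits is the hypothetical wrapper discussed in this thread and Percent is a made-up type with an invariant; neither is an existing API.

```rust
/// Hypothetical wrapper from this thread.
#[repr(transparent)]
pub struct Bits<T>(pub T);

/// Made-up example type whose invariant is `0 <= value <= 100`.
#[derive(Debug, PartialEq)]
pub struct Percent(u8);

impl From<Bits<u8>> for Percent {
    fn from(_x: Bits<u8>) -> Self {
        // Entirely safe code: the input bits are ignored and a default
        // value is produced instead.
        Percent(0)
    }
}
```

Here Percent::from(Bits(255)) returns Percent(0), so the existence of the impl says nothing about whether a &u8 could be soundly reinterpreted as a &Percent. A byte value of 255 would violate the invariant if it were.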

If coerce would work with references then maybe something like:

impl<T, U> From<Bits<T>> for U where U: FromBits<T> {
    #[inline]
    fn from(x: Bits<T>) -> Self {
        coerce(x.0)  // EDIT: fixed bug, had coerce(x) before
    }
}

So how does coerce work then? It looks like it's a safe function, which means that there's some mechanism for deciding whether coerce::<T, U> is valid for T and U.

The same way that it is currently implemented:

fn coerce<T, U>(x: T) -> U where U: FromBits<T>

For references one needs to provide different blanket impls of From. I don't know if that can be done with specialization or not.

EDIT: had a bug in the way coerce is called, but like this it should work because the constraints are the same:

impl<T, U> From<Bits<T>> for U where U: FromBits<T> {
    #[inline]
    fn from(x: Bits<T>) -> Self {
        coerce(x.0)
    }
}

Oh, I read @newpavlov’s proposal as a way to replace FromBits, not augment it.

If anything it could be a way to replace coerce AFAICT. That way you can do let y: U = Bits(x).into(); or let y = U::from_bits(Bits(x));. Whether that’s better than let y: U = coerce(x); or not… is debatable.

One thing that allows is for users to provide their own From<Bits<T>>::from implementations (if not right now, via specialization later), which might be something that is or isn’t desired.
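For comparison, here is a self-contained sketch of how the pieces discussed above might fit together. Everything in it is an assumption for illustration: FromBits is modelled as a plain unsafe marker trait rather than the auto trait from the draft RFC, coerce is implemented with transmute_copy plus a size check, and the From impl is written for one concrete pair of types because a blanket impl over all U would be rejected by the orphan rules outside the standard library.

```rust
use std::mem;

/// Hypothetical marker trait: every bit pattern of `T` is a valid `Self`.
/// Implementors must uphold this, hence `unsafe`.
pub unsafe trait FromBits<T>: Sized {}

// SAFETY: every 32-bit pattern is a valid `f32`.
unsafe impl FromBits<u32> for f32 {}

/// Safe reinterpretation, gated on the `FromBits` bound.
/// This sketch requires equal sizes; the draft RFC may allow more.
pub fn coerce<T, U: FromBits<T>>(x: T) -> U {
    assert_eq!(mem::size_of::<T>(), mem::size_of::<U>());
    // SAFETY: the sizes match and `U: FromBits<T>` promises that any
    // bit pattern of `T` is a valid `U`.
    let out = unsafe { mem::transmute_copy::<T, U>(&x) };
    // Ownership of the bits has moved into `out`; do not drop `x` as well.
    mem::forget(x);
    out
}

/// Hypothetical wrapper from this thread.
#[repr(transparent)]
pub struct Bits<T>(pub T);

// Concrete impl standing in for the blanket impl shown above.
impl From<Bits<u32>> for f32 {
    #[inline]
    fn from(x: Bits<u32>) -> f32 {
        coerce(x.0)
    }
}
```

With this in place, both let y: f32 = Bits(0x3f80_0000u32).into(); and let y: f32 = coerce(0x3f80_0000u32); produce 1.0, so the two spellings differ only in ergonomics.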

(Sorry it took me a while to respond, this thread also moved so fast I couldn't just do this quickly on the side.^^ I'm probably late to the party but here you go.)

I usually consider padding to be uninitialized memory. This arises naturally because when you put a struct into uninitialized memory and then initialize it by writing to all fields, the padding will remain uninitialized. AFAIK C treats both as indeterminate values, and LLVM treats both as undef.

So, I think it is fair to consider these the same problem. That also reduces our number of problems by one :slight_smile:

On IRC, @hanna-kruppe asked about what is okay here from a pure language operation perspective, not just from a type perspective. Ignoring types, I think reading uninitialized memory is fine, but you can expect any operation on it to be immediate UB -- this includes bit-masking, or multiplying by 0. The only thing you can do with such an uninitialized value is store it back to memory. Moreover, conservatively, better assume that when you load a u32 and any byte is uninitialized, then the entire value you are loading is uninitialized. We may end up allowing more, but if you follow these rules you should be fine from an operations/LLVM standpoint.

Now, types may place additional restrictions, e.g. &T cannot be NULL. From what I can tell, this proposed FromBits instance would end up with a &[u8] that contains uninitialized memory, and the question is whether that's okay. Essentially, this amounts to the question of whether "uninitialized" is a valid value for u8. I think the answer should be "no", and with my "Types as contracts" that's certainly the intention (though that is not implemented currently). I think when safe code calls a function returning u8, it should be able to rely on the fact that this u8 is initialized. Everything else is just a big hazard. With this interpretation, there would be UB the moment you load the uninitialized data into the u8, because the intention is that all types in scope are always valid. That's just like it's not okay to load the value 3 into a bool even if you never look at the bool.

These issues are the reason why the MaybeUninit type is being introduced. So, turning any sequence of bytes into a &[MaybeUninit<u8>] should be okay because you are no longer claiming that this is a valid u8.
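A minimal sketch of such a view is shown below. The function name is made up, and the T: Copy bound is a conservative assumption added here to rule out interior mutability; it is not something the thread or the draft RFC specifies.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

/// Views the bytes of `x`, including any padding, without claiming that
/// they are initialized. Hypothetical helper, not a standard API.
pub fn as_maybe_uninit_bytes<T: Copy>(x: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: `x` is valid for reads of `size_of::<T>()` bytes for the
    // lifetime of the borrow, `MaybeUninit<u8>` has alignment 1, and it
    // places no validity requirement on the bytes it holds.
    unsafe {
        slice::from_raw_parts(
            x as *const T as *const MaybeUninit<u8>,
            mem::size_of::<T>(),
        )
    }
}
```

The returned slice always has size_of::<T>() elements. Turning any one of them into a u8 still requires an unsafe assume_init call, which is exactly where the caller has to argue that the byte is not padding.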

C's "character types" rules are a crazy hack that I'd rather not replicate in Rust. Also, the "character types" exception is about TBAA/strict aliasing, which Rust doesn't have anyway. That's independent of whether, in C, a value of a type can be a "bad" indeterminate value even if it does not have a trap representation. In that regard, all the integer types are likely the same in C.

But anyway, we don't need such a strange hack in Rust. We have MaybeUninit, and if you implement memcpy in Rust you should do it by reading and writing MaybeUninit<u8>.
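Such a byte-wise copy might look roughly like the following. This is only a sketch under the rules described above, with a made-up function name; real code would simply call ptr::copy_nonoverlapping.

```rust
use std::mem::MaybeUninit;

/// Copies `len` bytes from `src` to `dst` one byte at a time.
///
/// # Safety
/// `src` must be valid for reads of `len` bytes, `dst` must be valid for
/// writes of `len` bytes, and the two regions must not overlap.
pub unsafe fn copy_bytes(dst: *mut u8, src: *const u8, len: usize) {
    let src = src as *const MaybeUninit<u8>;
    let dst = dst as *mut MaybeUninit<u8>;
    for i in 0..len {
        // Loading a `MaybeUninit<u8>` makes no claim that the byte is
        // initialized, so padding bytes can be copied as well.
        let byte = unsafe { src.add(i).read() };
        // The only thing done with the possibly uninitialized byte is
        // storing it back to memory.
        unsafe { dst.add(i).write(byte) };
    }
}
```

Because the loop never inspects the bytes it moves, it can copy a struct with padding without ever asserting that the padding holds a valid u8.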

Yes. Essentially there is a special exception if you write &... as *..., and we pretend the reference never existed and you directly created a raw pointer. For now, better assume you have to exactly write this, syntactically.

I think there is one "bad" value, called "poison", that represents uninitialized data. Also see this paper that defines LLVM with poison (and without undef).

