Towards even smaller structs

https://doc.rust-lang.org/reference/items/unions.html#reading-and-writing-union-fields

Effectively, writing to and then reading from a union with the C representation is analogous to a transmute from the type used for writing to the type used for reading.

So it should be fine since we have a #[repr(c)] union and BitField is [repr(transparent)]

3 Likes

Sorry, but how would that work? How do you create a field that is an rvalue and that automatically generates a temporary to be referenced? And how do you even create a stand-alone value if bitfields by definition can only exist within their containing struct?

union + varying size fields from @Salabar:s example and a convenience macro!

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2b5b7a7926fff5e841a4ffb7042caf13

bitflags! {
    struct Packed: u8 {
        // name: SIZE => OFFSET,
        a: 1 => 0,
        b: 4 => 1,
        c: 2 => 5,
        
        // compile error if field overflows `u8`
        // d: 4 => 7, // ERROR
    }
}

fn main() {
    let mut packed = Packed { __data: 0 };

    unsafe {
        packed.a.set(1);
        
        packed.b.set(8);

        assert_eq!(packed.b.get(), 8)
    }
}

I mentioned this earlier in the thread: Towards even smaller structs - #9 by comex

Stated more clearly, the expression foo.bar, if bar is a #[flat] field, would copy bar out of the object and produce an rvalue of whatever type it is. Then Rust's existing rules for taking references to rvalues would apply. This can be simulated in current Rust by adding braces – {foo.bar} – which forces the expression to be an rvalue.

struct Foo { bar: i32 }
let foo = Foo { bar: 10 };

&{foo.bar};

  ^ OK, copies to the stack and takes a reference to that

std::ptr::addr_of!({foo.bar});

  ^ error[E0745]: cannot take address of a temporary

&mut {foo.bar};

  ^ Currently no error or warning, but probably should be an error when using flat fields without braces (heck, it should probably be a warning in all cases). It copies to the stack and only mutates the copy, not the original.

So the limitations are:

  • The field type has to be Copy for this to make sense.
  • Since &foo.bar points to a stack copy, its lifetime is not the same as the lifetime of foo. Any function like
fn get_bar(foo: &Foo) -> &i32 { &foo.bar }

         will obviously not work.

  • You can't usefully take a mutable reference. Though, I suppose the compiler could allow mutable references by making a stack copy, then automatically writing the modified value back to the original at the point the stack copy goes out of scope. But that would be going beyond "just make it an rvalue", it would require disabling NLL for those references, and I'm not sure if it's useful enough to justify the special case.

Still, it means that simple use cases, like calling &self methods on #[flat] fields, would 'just work'.

3 Likes

Alright. So it can be made work. I can't say I like it, though. It introduces an ugly special case that only works in a very limited scenario, and it does not generalize (only Copy types, only immutable refs). That is certainly bending the language in a dangerous direction.

One more take using my old approach. I use Cell to allow aliasing and I hide it to prevent a mutable borrow.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aa7a1d4cbceabe774a793a338fe9af40

Now you don't need any unsafe. One last piece of the puzzle is to make a macro to calculate offsets automatically.

I'm almost certain that code is not sound even tough Miri doesn't spot the ub.

 77 | fn parent(&self) -> &Cell<u8> {
 78 |    unsafe { &*(self as *const _ as *const Cell<u8>) } // this part is unsound because
 79 | }

Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined.

https://llvm.org/docs/LangRef.html#pointer-aliasing-rules

correct me if I'm wrong...

// assume this array is stored at address 15
let arr = [0u8; 3];

// this reference is associated with range 15..16 and 
// cannot be used to access any memory outside
let first = &arr[0];

// this reference based on `first` goes outside of the valid range
// and thus is ub
let second = unsafe { &*(first as *const u8).add(1) }; // range 16..17

// the same applies to zst;

// you cannot cast a `&T` to `&U` if `sizeof(T) < sizeof(U)` because

#[repr(transparent)]
struct Foo {
     a: (),
     b: u8,
}

// foo is stored at 4
let foo = Foo {
     a: (),
     b: 0,
}

// since `()` is a zst this reference is based on range 4..4
let a = &foo.a;

// UB: cannot use a pointer associated with range 4..4 to access memory
// outside the valid range 
let b = unsafe { &*(a as *const () as *const u8) }; // range 4..5

it does complain with -Zmiri-track-raw-pointers. (Indeed on creation of the &UnsafeCell<u8>, even without the write.)

The linked playground definitely is UB.

The same should apply to the playground of @Salabar which also fails on

MIRIFLAGS=-Zmiri-track-raw-pointers cargo +nightly miri run
error: Undefined Behavior: trying to reborrow for SharedReadWrite at alloc1345, but parent tag <18212> does not have an appropriate item in the borrow stack
   --> src/main.rs:86:13
    |
86  |             &*(self as *const _ as *const Cell<u8>)
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ trying to reborrow for SharedReadWrite at alloc1345, but parent tag <18212> does not have an appropriate item in the borrow stack
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
1 Like

Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined.

I'm pretty sure this means "Pointer must point to an actual memory, and not an arbitrary address".

// this reference is associated with range 15..16 and // cannot be used to access any memory outside let first = &arr[0];

At LLVM level, a borrow is an ordinary pointer with aliasing guarantees. This means you're fine as long as you stay within range of the original array and don't violate aliasing. Otherwise, stuff like this would be impossible to implement. Mapping Address Independent Pointer: offset_ptr - 1.35.0

NB:

A pointer value is based on another pointer value according to the following rules:

  • The result value of a bitcast is based on the operand of the bitcast .

LLVM IR does not associate types with memory

This means LLVM doesn't care how this pointer originated. Since BitFields happen to share an address with its parent structure (at least with repr(transparent)), the bitcast ought to be legal. My actual concern is that I haven't been able to confirm this to be a feature, but it could be worth codifying.

Same thing here, which is definitely not right.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f70c48a8d523ced47d035a05bb702afe

This one does feature <untagged> in the message though. More on that in the README.md of miri:

  • -Zmiri-track-raw-pointers makes Stacked Borrows track a pointer tag even for raw pointers. This can make valid code fail to pass the checks, but also can help identify latent aliasing issues in code that Miri accepts by default. You can recognize false positives by <untagged> occurring in the message -- this indicates a pointer that was cast from an integer, so Miri was unable to track this pointer. Note that it is not currently guaranteed that code that works with -Zmiri-track-raw-pointers also works without -Zmiri-track-raw-pointers , but for the vast majority of code, this will be the case.

The main pronlem with miri-track-raw-pointers is – as far as I understand – that it implements some form of stacked borrows which isn’t the official formal modal of Rust in any way (yet?) and also still in development, I guess. That’s because actual Rust has not formal model and thus you can’t always know if some code is considered UB or not because it may not have been decided yet. However, casting some &T reference into a &Cell<T> or similar things, is most definitely UB, according to Rust documentation, especially if you do use the cell for mutation afterwards. (I’m not 100% sure I know what the documentation currently has to say about the case where you don’t use the cell for mutation.)

Well – and, judging by the documentation cited above, it looks like miri-track-raw-pointers does not support pointer-to-integer-to-pointer roundtrips either.

I mean, the situation is even worse in case of your BitField<u8, (), …> because a &self reference is a reference to a zero-sized type. The reference doesn’t give you legal access to any memory whatsoever.

But focusing on a hypothetical &T -> &Cell<T> conversion, Cell is just a kind of safe wrapper for UnsafeCell and Cell.set() (using Cell::replace internally) creates a mutable reference to the target of the underlying UnsafeCell, so at last, if you go &T -> &Cell<T> -> call .set(), you’ve done the &T->&mut T conversion that’s always UB. In other words: BitField does not contain any interior mutability (due to lack of anything like UnsafeCell or wrappers like Cell/RefCell/Mutex/etc. so you just cannot soundly get a mutabel reference to any memory out of a shared reference to BitField<u8, (), …>

Some quotes:

Docs of UnsafeCell

If you have a reference &T , then normally in Rust the compiler performs optimizations based on the knowledge that &T points to immutable data. Mutating that data, for example through an alias or by transmuting an &T into an &mut T , is considered undefined behavior. UnsafeCell<T> opts-out of the immutability guarantee for &T : a shared reference &UnsafeCell<T> may point to data that is being mutated. This is called “interior mutability”.

All other types that allow internal mutability, such as Cell<T> and RefCell<T> , internally use UnsafeCell to wrap their data.

this alone (implicitly) pretty much says that transmuting &T into &UnsafeCell<T> is UB.

Nomicon

mem::transmute<T, U> takes a value of type T and reinterprets it to have type U . The only restriction is that the T and U are verified to have the same size. The ways to cause Undefined Behavior with this are mind boggling.

  • […]
  • Transmuting an & to &mut is UB.
    • Transmuting an & to &mut is always UB.
    • No you can't do it.
    • No you're not special.

Which is why I don't do this. :open_mouth: First I erase the aliasing guarantee by casting reference to pointer and then erase type information by changing pointer type. At this point, rustc has to put trust in my sanity and assume the object behind the pointer is in fact Cell and treat it as such, as doing otherwise would not make sense. Actually, I initially used normal transmute and that did not work out at all. This proves these two things are not interchangeable.

As I've already mentioned, there is an iffy part about casting ZST reference to a pointer. However, doing so is not undefined behavior, merely an unspecified one: it yields a consistent result which by blind luck provides us with a pretty cool technique. This behavior might change in the future, but it is in our power as a community to make it into a persistent feature if we deem it useful.

I think this might be worth discussing in a separate topic, though I'm not in the right state of mind for that atm.

Docs of UnsafeCell

There are two main way to cause UB using UnsafeCell:

  1. Mutably alias UnsafeCell itself. This is out ruled by UnsafeCell being inaccessible from the outside of BitField.
  2. Mutably alias object inside UnsafeCell behind the get() pointer. I use Cell which does not do it.

Dereferencing a const pointer to get a shared reference to Cell is perfectly sound. The way this pointer is computed is the only contention point.

I dunno If I managed to convince you, but I stand by these words.

I don’t know, this doesn’t appear to “prove” anything to me. In particular

sounds like UB to me. If the specification doesn’t provide any evidence that it’s not UB then where’s the difference?

The main point of the API seems be being able to use field syntax, anyways, otherwise you could just do things properly and create references that actually reference the thing they’re supposed to reference. In light of commonly accepted (as in, accepted to be sound) API like replace_with or take_mut, your whole API is kinda moot, anyways, because I can do this:

fn main() {
    let mut packed = Packed::default();
    replace_with::replace_with_or_abort(&mut packed.b, |b| {
        #[repr(C)]
        struct Pair<A, B>(A, B);
        let mut p = Pair(b, &&0);
        p.0.dbg(0); // modifies the least significant bits of the &&i32
        println!("{:?}", p.1); // =>> segmentation fault
        p.0
    });
}
3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.