Feature request: References to empty objects

The following code demonstrates that (on x86-64) Rust uses 8 bytes to store a reference to an empty struct. Because such a reference always refers to the same data, it would be enough to allocate 0 bytes for it. Please improve your optimizer.

use std::mem::size_of;

struct Empty {}

struct S1<'a> {
    pub e: &'a Empty,
    pub e2: Empty,
    pub v: u64,
}

struct S2<'a> {
    pub e: &'a Empty,
    pub e2: &'a Empty,
    pub v: u64,
}

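fn main() {
    // On x86-64 this prints 16: 8 (reference) + 0 (Empty) + 8 (u64).
    println!("{}", size_of::<S1>());
    // On x86-64 this prints 24: two 8-byte references + 8 (u64).
    println!("{}", size_of::<S2>());
}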

Zero-sized references were discussed and rejected before: RFC - Zero-Sized References by EpicatSupercell · Pull Request #2040 · rust-lang/rfcs · GitHub

Edit: You can write a simple PhantomData-based type that behaves like a zero-sized reference: Rust Playground
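A minimal sketch of such a type, assuming a concrete ZST named `Empty` as in the first post (`EmptyRef` is a hypothetical name). It stores only a `PhantomData` that carries the lifetime, and on deref it hands back a reference to a promoted constant, so no unsafe code is needed but the original address is not kept.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ops::Deref;

struct Empty {}

// Hypothetical zero-sized stand-in for `&'a Empty`: it only carries the lifetime.
#[derive(Clone, Copy)]
struct EmptyRef<'a>(PhantomData<&'a Empty>);

impl<'a> From<&'a Empty> for EmptyRef<'a> {
    // The address of the argument is discarded.
    fn from(_: &'a Empty) -> Self {
        EmptyRef(PhantomData)
    }
}

impl Deref for EmptyRef<'_> {
    type Target = Empty;
    fn deref(&self) -> &Empty {
        // `&Empty {}` is promoted to a `&'static Empty`; the returned address
        // is not the one that was passed to `from`.
        &Empty {}
    }
}

fn main() {
    let e = Empty {};
    let r = EmptyRef::from(&e);
    let _back: &Empty = &r; // deref coercion
    println!("{}", size_of::<EmptyRef>()); // prints 0
}
```

This only works because `Empty` is a concrete type whose address is known to be meaningless, which is exactly the limitation raised in the replies below.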


Your type is useless for my purpose:

My purpose is to shrink the reference to 0 bytes when the generic type it refers to happens to be zero-sized. If I did not have a generic type, but simply a zero-sized type, I would not need a reference to it at all.

I wrote the following in the feature request:


It is indeed possible to do zero-sized references without breaking existing code:

struct S<'a> {
    // Proposed attribute; it does not exist in Rust today.
    #[allow_zero_sized]
    pub e: &'a Empty,
}

Then S should be zero-sized.

Please implement this feature.

You can implement a zero-sized-reference-to-zero-sized-type in library code, there's no need to add some kind of attribute for it to the language.

You can only do this for a concrete ZST; for a generic type instantiated with a ZST, you can't know whether its address is meaningful or not. (Unless you add a trait that encodes this distinction, which could then be used to implement the differently sized references in library code too.)


This would be a breaking change, because there is code in the wild that relies on the numerical address of references to ZSTs remaining stable.


You can implement a zero-sized-reference-to-zero-sized-type in library code

I don't see how to do it, please help.

How do I create a "reference" type that is zero-sized if and only if the type T it refers to is a Singleton?

The idea of #[allow_zero_sized] is to use it only when you are sure that no such code is applied to this reference.

What if unsafe code is relying on this address stability for soundness? Then, you have broken Rust's safety guarantees.

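use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;

// Hypothetical marker: implementors promise that T is a ZST whose
// address carries no meaning.
unsafe trait Singleton {}

struct ZstRef<'a, T: Singleton>(PhantomData<&'a T>);

impl<'a, T: Singleton> From<&'a T> for ZstRef<'a, T> {
  fn from(_: &'a T) -> Self { Self(PhantomData) }
}

impl<T: Singleton> Deref for ZstRef<'_, T> {
  type Target = T;
  fn deref(&self) -> &Self::Target {
    // Safety: relies on the `Singleton` contract that T is zero-sized and
    // that any non-null, well-aligned address is acceptable for it.
    unsafe { NonNull::dangling().as_ref() }
  }
}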

(plus some associated-type indirection to choose between this and &'a T for other types, although avoiding a manual implementation for each concrete type would need specialization).
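A rough sketch of that indirection, assuming a hypothetical `RefRepr` trait that is implemented by hand for each concrete type (the manual work that specialization would remove). `Singleton`, `ZstRef`, `RefRepr` and `Holder` are illustrative names, not an existing API.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ops::Deref;
use std::ptr::NonNull;

// Hypothetical marker: implementors promise the type is a ZST whose address is meaningless.
unsafe trait Singleton {}

struct ZstRef<'a, T: Singleton>(PhantomData<&'a T>);

impl<T: Singleton> Deref for ZstRef<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `Singleton` promises that T is zero-sized and that any
        // non-null, well-aligned address is acceptable for it.
        unsafe { NonNull::dangling().as_ref() }
    }
}

// Hypothetical indirection: each type chooses how a `&'a Self` is stored.
trait RefRepr<'a>: 'a {
    type Ref: Deref<Target = Self>;
    fn make(r: &'a Self) -> Self::Ref;
}

struct Empty;
unsafe impl Singleton for Empty {}

// Manual impl for the ZST: store nothing.
impl<'a> RefRepr<'a> for Empty {
    type Ref = ZstRef<'a, Empty>;
    fn make(_: &'a Self) -> Self::Ref {
        ZstRef(PhantomData)
    }
}

// Manual impl for a non-ZST: store an ordinary reference.
impl<'a> RefRepr<'a> for u64 {
    type Ref = &'a u64;
    fn make(r: &'a Self) -> Self::Ref {
        r
    }
}

// Generic container: `r` takes 0 bytes for `Empty` and 8 bytes for `u64` on x86-64.
struct Holder<'a, T: RefRepr<'a>> {
    r: T::Ref,
    v: u64,
}

fn main() {
    let x = 5u64;
    let h: Holder<'_, u64> = Holder { r: <u64 as RefRepr<'_>>::make(&x), v: 1 };
    assert_eq!(*h.r + h.v, 6);

    let e = Empty;
    let z: Holder<'_, Empty> = Holder { r: <Empty as RefRepr<'_>>::make(&e), v: 2 };
    let _: &Empty = &z.r; // deref coercion through ZstRef

    // prints "16 8" on x86-64
    println!(
        "{} {}",
        size_of::<Holder<'static, u64>>(),
        size_of::<Holder<'static, Empty>>()
    );
}
```

Note that the choice is made by the impls, not by checking `size_of::<T>() == 0`: a generic `T` that happens to be zero-sized still gets an ordinary reference unless someone writes an impl that opts it in.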

The idea behind #[allow_zero_sized] is to use it in a struct that does not appear in unsafe code. If unsafe code is involved, it uses the structure as a whole, not field by field, so the zero-sized field won't be "noticed" and the unsafe code will work correctly (as long as we ensure that this field is not passed to unsafe code).

It seems you didn't understand my question:

How do I create a ZstRef that is zero-sized if and only if T is zero-sized? If T is not zero-sized, it should behave like a regular reference.

#[allow_zero_sized] could cause UB without any nearby unsafe, for example:

pub struct EvenAddr(());

impl EvenAddr {
    pub fn new<'a>(x: usize) -> &'a EvenAddr {
        if x % 2 == 0 && x > 0 {
            unsafe { std::mem::transmute(x) }
        } else {
            panic!()
        }
    }
    pub fn test(&self) {
        if (self as *const _ as usize) % 2 == 1 {
            // SAFETY: we validated that the address was even in `new`
            unsafe { std::hint::unreachable_unchecked() }
        }
    }
}

fn main() {
    let x: &EvenAddr = EvenAddr::new(2);
    x.test();
}

This is fine on its own, but if you lose the address of the &EvenAddr and use the canonical ZST address of dangling() (1 in this case since it's a 1ZST) then it becomes UB.

I also think this could be useful. If not for unsafe code relying on the size of a reference not changing, this would be a nice layout optimization to have. I don't know of a way to emulate it in Rust.

Is unsafe code allowed to rely on the layout of a struct containing a reference?

For:

struct A<'a> {
    v: &'a u32
}

struct B<'a> {
    v: &'a u64
}

Is size_of::<A>() == size_of::<B>() guaranteed?

One papercut could be structs accidentally turning into ZSTs, which would break unsafe code.

If not, this could be changed. If it is guaranteed, I don't think the annotated version breaks anything: a ZST reference could be coerced from the field on access.

  1. You seem to prove that #[allow_zero_sized] causes UB without using #[allow_zero_sized] in your code.

  2. What does it mean to "lose" an address? What is a "canonical" ZST? What is a 1ZST? What is dangling()? I don't understand you at all.

If it's repr(transparent), then yes.
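A minimal sketch of that answer, reusing the `A` and `B` names from the question (with explicit lifetimes). The const assertions are an assumption about how unsafe code might guard its layout expectations: the build fails if a struct ever stops being pointer-sized, which would also catch the "accidentally becomes a ZST" papercut. The size of `usize` for `&T` with a sized `T` comes from the `size_of` documentation.

```rust
use std::mem::size_of;

// With `repr(transparent)`, each struct is guaranteed to have the same layout as its single field.
#[repr(transparent)]
struct A<'a> {
    v: &'a u32,
}

#[repr(transparent)]
struct B<'a> {
    v: &'a u64,
}

// Compile-time guards: the build fails if these layout assumptions stop holding.
const _: () = assert!(size_of::<A<'static>>() == size_of::<usize>());
const _: () = assert!(size_of::<A<'static>>() == size_of::<B<'static>>());

fn main() {
    let (x, y) = (1u32, 2u64);
    let a = A { v: &x };
    let b = B { v: &y };
    println!("{} {}", a.v, b.v); // prints "1 2"
}
```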

Then the layout optimization could not apply in this case.

The point is that there is code that relies on references to ZSTs keeping their address (like the one shown above). Since #[allow_zero_sized] allows you to lose that information, it follows that it is unsound.

The expanded example that shows how this can lead to UB is:

pub struct EvenAddr(());

impl EvenAddr {
    pub fn new<'a>(x: usize) -> &'a EvenAddr {
        if x % 2 == 0 && x > 0 {
            unsafe { std::mem::transmute(x) }
        } else {
            panic!()
        }
    }
    pub fn test(&self) {
        if (self as *const _ as usize) % 2 == 1 {
            // SAFETY: we validated that the address was even in `new`
            unsafe { std::hint::unreachable_unchecked() }
        }
    }
}

struct EvenAddrRef<'a> {
    // Hypothetical attribute proposed above; not valid Rust today.
    #[allow_zero_sized]
    x: &'a EvenAddr
}

fn main() {
    let x: &EvenAddr = EvenAddr::new(2);
    let ear: EvenAddrRef = EvenAddrRef { x };
    
    // What is the reference used for the `&self` in the `test()` method call?
    // Since `x` is zero-sized, the information about the address returned
    // by `EvenAddr::new` was lost, so a new one has to be created.
    // The "canonical" way to create a reference to a ZST is to use
    // `std::ptr::NonNull::dangling`, however if you use that in this case
    // you get the address 1 which causes UB!
    ear.x.test();
}

Currently a reference to a ZST stores an address. That's the 8 bytes you want to avoid. If you make the reference take 0 bytes then you're not storing the address anymore, thus you "lose" it.

Sometimes code needs to create a reference to a ZST out of thin air (which would also happen in the above example when test is called on the field marked #[allow_zero_sized]). To do this, an address must be created, hence the question: what should that address be? The commonly agreed answer is the smallest non-null, aligned address, that is, the address equal to the type's alignment, which is also the one returned by NonNull::dangling. References to ZSTs with such addresses are called canonical references to ZSTs.

I guess they were referring to a ZST with an alignment of 1.

NonNull::dangling() is an associated function of the std::ptr::NonNull type that returns a dangling pointer that is non-null and well-aligned for the given type.
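A small sketch of those canonical addresses. `Aligned8` is a hypothetical ZST added for comparison. The documentation only promises that the pointer is non-null and well-aligned; that the address equals the alignment is how the current standard library implements it, so treat the exact values as an observation, not a guarantee.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

// Same 1-aligned ZST as in the EvenAddr example.
pub struct EvenAddr(());

// Hypothetical ZST with the alignment of u64 (8 on x86-64).
struct Aligned8([u64; 0]);

fn main() {
    let a1 = NonNull::<EvenAddr>::dangling().as_ptr() as usize;
    let a8 = NonNull::<Aligned8>::dangling().as_ptr() as usize;

    // In current implementations the dangling address equals the alignment.
    assert_eq!(a1, align_of::<EvenAddr>());
    assert_eq!(a8, align_of::<Aligned8>());

    // Address 1 is odd, so calling `EvenAddr::test` through such a
    // reference would reach `unreachable_unchecked`.
    println!("{} {}", a1, a8); // prints "1 8" on x86-64
}
```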


You claim that this is not undefined behavior?

According to my understanding, this is undefined behavior, because "In Rust, by contrast, the compiler guarantees that references will never be dangling references" (Rust book).

From the nomicon:

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type. As a consequence, if the span is empty, "dangling" is the same as "null".
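A short sketch of what that definition permits in practice. A pointer from NonNull::dangling() is non-null and aligned, so a reference or slice that spans zero bytes built from it is not "dangling" in the quoted sense. The standard library's `slice::from_raw_parts` documentation suggests `NonNull::dangling()` for exactly this zero-length case.

```rust
use std::ptr::NonNull;
use std::slice;

fn main() {
    // A zero-length slice needs a non-null, aligned data pointer but no allocation.
    let empty: &[u64] =
        unsafe { slice::from_raw_parts(NonNull::<u64>::dangling().as_ptr(), 0) };
    assert!(empty.is_empty());

    // Likewise a reference to a ZST spans zero bytes, so a non-null,
    // aligned address is enough for it to be valid.
    let unit: &() = unsafe { NonNull::<()>::dangling().as_ref() };
    assert_eq!(*unit, ());

    println!("{} {:?}", empty.len(), unit); // prints "0 ()"
}
```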
