Feature request: References to empty objects

vporton · March 13, 2023, 12:57pm

The following code demonstrates that (on x86-64) Rust uses 8 bytes to store a reference to an empty struct. Because such a reference always refers to the same data, 0 bytes would be enough to store it. Please improve your optimizer.

use std::mem::size_of;

struct Empty {}

struct S1<'a> {
    pub e: &'a Empty,
    pub e2: Empty,
    pub v: u64,
}

struct S2<'a> {
    pub e: &'a Empty,
    pub e2: &'a Empty,
    pub v: u64,
}

fn main() {
    println!("{}", size_of::<S1>());
    println!("{}", size_of::<S2>());
}

sahnehaeubchen · March 13, 2023, 1:09pm

Zero-sized references were discussed and rejected before: RFC - Zero-Sized References by EpicatSupercell · Pull Request #2040 · rust-lang/rfcs · GitHub

Edit: You can write a simple PhantomData-based type that behaves like a zero-sized ref: Rust Playground
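
The contents of the linked playground are not reproduced here, so the following is only a guess at the general shape of such a type: a minimal sketch, assuming a concrete zero-sized `Empty` whose address no code relies on. The names `EmptyRef` and `S` are hypothetical.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ops::Deref;

struct Empty {}

// Hypothetical name: a zero-sized stand-in for `&'a Empty`.
// It keeps the lifetime but stores no address.
struct EmptyRef<'a>(PhantomData<&'a Empty>);

impl<'a> From<&'a Empty> for EmptyRef<'a> {
    fn from(_: &'a Empty) -> Self {
        EmptyRef(PhantomData)
    }
}

impl Deref for EmptyRef<'_> {
    type Target = Empty;
    fn deref(&self) -> &Empty {
        // `Empty {}` is a constant expression without a destructor, so the
        // borrow is promoted to a static; the original address is NOT preserved.
        &Empty {}
    }
}

struct S<'a> {
    e: EmptyRef<'a>,
    v: u64,
}

fn main() {
    let empty = Empty {};
    let s = S { e: EmptyRef::from(&empty), v: 7 };
    let _inner: &Empty = &s.e; // deref coercion back to a plain reference
    println!("{} {} {}", size_of::<EmptyRef>(), size_of::<S>(), s.v);
}
```

Note that this only works because `Empty` is a known, concrete type; it does not answer the generic case.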

vporton · March 13, 2023, 1:30pm

Your type is useless for my purpose:

My purpose is to shrink the reference to 0 bytes when the generic type it refers to happens to be zero-sized. If I had a plain zero-sized type rather than a generic one, I would not need a reference to it at all.

I wrote the following in the feature request:

It is indeed possible to do zero-sized references without breaking existing code:

struct S<'a> {
    #[allow_zero_sized]
    pub e: &'a Empty,
}

Then S should be zero-sized.

Please implement this feature.

Nemo157 · March 13, 2023, 2:52pm

You can implement a zero-sized-reference-to-zero-sized-type in library code, there's no need to add some kind of attribute for it to the language.

You can only do this for a concrete ZST, you can't know for a generic type filled by a ZST whether its address is meaningful or not. (Unless you add a trait that encodes this distinction, which could then be used to do the differently sized references in library code too).

Jules-Bertholet · March 13, 2023, 3:21pm

This would be a breaking change, because there is code in the wild that relies on the numerical address of references to ZSTs remaining stable.

vporton · March 13, 2023, 8:13pm

You can implement a zero-sized-reference-to-zero-sized-type in library code

I don't see how to do it, please help.

How do I create a "reference" type that is zero-sized if and only if the type T to which it refers is a Singleton?

vporton · March 13, 2023, 8:14pm

The idea of #[allow_zero_sized] is to use it only when you are sure that no such code is applied to this reference.

Jules-Bertholet · March 13, 2023, 8:17pm

What if unsafe code is relying on this address stability for soundness? Then, you have broken Rust's safety guarantees.

Nemo157 · March 13, 2023, 8:28pm

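use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;

// Assumed marker trait (not defined in the original post): implementors
// promise that `T` is zero-sized and that no code depends on its address.
unsafe trait Singleton {}

struct ZstRef<'a, T: Singleton>(PhantomData<&'a T>);

impl<'a, T: Singleton> From<&'a T> for ZstRef<'a, T> {
  fn from(_: &'a T) -> Self { Self(PhantomData) }
}

impl<T: Singleton> Deref for ZstRef<'_, T> {
  type Target = T;
  fn deref(&self) -> &Self::Target {
    // Safety: something to do with `Singleton` allowing this
    unsafe { NonNull::<T>::dangling().as_ref() }
  }
}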

(and some associated type indirection to choose between this and &'a T for other types, though that needs specialization to not need manual implementation for concrete types).
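
One way to read "associated type indirection" is sketched below. This is an assumption about what was meant, not the poster's code: a hypothetical trait `RefRepr` lets each type choose its own reference representation through a generic associated type, and every type needs a manual impl because there is no specialization. The names `RefRepr`, `Unit`, `UnitRef`, and `Holder` are all made up for illustration.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical trait: each type picks the representation of references to it.
trait RefRepr {
    type Ref<'a>
    where
        Self: 'a;
    fn make_ref<'a>(r: &'a Self) -> Self::Ref<'a>;
}

// A zero-sized type that opts in to a zero-sized reference.
struct Unit;
struct UnitRef<'a>(PhantomData<&'a Unit>);

impl RefRepr for Unit {
    type Ref<'a> = UnitRef<'a> where Self: 'a;
    fn make_ref<'a>(_: &'a Self) -> Self::Ref<'a> {
        UnitRef(PhantomData)
    }
}

// An ordinary type keeps the ordinary reference.
impl RefRepr for u64 {
    type Ref<'a> = &'a u64 where Self: 'a;
    fn make_ref<'a>(r: &'a Self) -> Self::Ref<'a> {
        r
    }
}

// Generic struct whose size depends on the representation chosen by `T`.
struct Holder<'a, T: RefRepr + 'a> {
    r: T::Ref<'a>,
    v: u64,
}

fn main() {
    let unit = Unit;
    let n = 5u64;
    let a: Holder<Unit> = Holder { r: <Unit as RefRepr>::make_ref(&unit), v: 1 };
    let b: Holder<u64> = Holder { r: <u64 as RefRepr>::make_ref(&n), v: 2 };
    let _ = (&a.r, &b.r);
    println!("{} {}", a.v, b.v);
    println!("{} {}", size_of::<Holder<Unit>>(), size_of::<Holder<u64>>());
}
```

The sketch omits the way back from `T::Ref<'a>` to `&T`; for `Unit` that step would have to invent an address, which is where the soundness questions in this thread come in.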

vporton · March 13, 2023, 8:38pm

The idea behind #[allow_zero_sized] is to use it in a struct that does not appear in unsafe code. If unsafe code does use the struct, it uses it as a whole rather than field by field, so the zero-sized field won't be "noticed" and the unsafe code will still work correctly (provided we ensure that this field is never passed to unsafe code).

vporton · March 13, 2023, 8:39pm

It seems you didn't understand my question:

How do I create a ZstRef that is zero-sized if and only if T is zero-sized? If T is not zero-sized, it should behave as a regular reference.

Nemo157 · March 13, 2023, 8:53pm

#[allow_zero_sized] could cause UB without any nearby unsafe, for example:

pub struct EvenAddr(());

impl EvenAddr {
    pub fn new<'a>(x: usize) -> &'a EvenAddr {
        if x % 2 == 0 && x > 0 {
            unsafe { std::mem::transmute(x) }
        } else {
            panic!()
        }
    }
    pub fn test(&self) {
        if (self as *const _ as usize) % 2 == 1 {
            // SAFETY: we validated that the address was even in `new`
            unsafe { std::hint::unreachable_unchecked() }
        }
    }
}

fn main() {
    let x: &EvenAddr = EvenAddr::new(2);
    x.test();
}

This is fine on its own, but if you lose the address of the &EvenAddr and use the canonical ZST address of dangling() (1 in this case since it's a 1ZST) then it becomes UB.

NoamB · March 13, 2023, 9:02pm

I also think this could be useful. If not for unsafe code relying on the size of a reference not changing, this would be a nice layout optimization to have. I don't know of a way to emulate it in Rust.

Is unsafe code allowed to rely on the layout of a struct containing a reference?

For:

struct A<'a> {
    v: &'a u32
}

struct B<'a> {
    v: &'a u64
}

Is size_of::<A>() == size_of::<B>() guaranteed?

One papercut could be structs accidentally turning into ZSTs, which would break unsafe code.

If not, this could be changed. If it is guaranteed, I don't think the annotated version breaks anything - a ZST ref could be coerced from the field on access.

vporton · March 13, 2023, 9:17pm

You seem to prove that #[allow_zero_sized] causes UB without using #[allow_zero_sized] in your code.
What does it mean to "lose" an address? What is "canonical" ZST? What is 1ZST? What is dangling()? I don't understand you at all.

scottmcm · March 13, 2023, 9:17pm

If it's repr(transparent), then yes.
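
As a rough illustration (a sketch, assuming the two structs from the question with explicit lifetimes and `#[repr(transparent)]` added), the equality can be checked directly:

```rust
use std::mem::size_of;

#[repr(transparent)]
struct A<'a> {
    v: &'a u32,
}

#[repr(transparent)]
struct B<'a> {
    v: &'a u64,
}

fn main() {
    // References to `Sized` types are thin pointers with the size of `usize`.
    assert_eq!(size_of::<&u32>(), size_of::<usize>());
    assert_eq!(size_of::<&u64>(), size_of::<usize>());
    // `repr(transparent)` gives each wrapper the layout of its single field.
    assert_eq!(size_of::<A>(), size_of::<&u32>());
    assert_eq!(size_of::<B>(), size_of::<&u64>());
    assert_eq!(size_of::<A>(), size_of::<B>());

    // Use the fields so the example does more than measure sizes.
    let (x, y) = (1u32, 2u64);
    let (a, b) = (A { v: &x }, B { v: &y });
    println!("{} {}", a.v, b.v);
}
```

Without the attribute, the default representation makes no layout promises for the struct as a whole, even though the reference fields themselves are pointer-sized.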

NoamB · March 13, 2023, 9:25pm

Then the layout optimization could not apply in this case.

SkiFire13 · March 13, 2023, 9:32pm

The point is that there's code that relies on references to ZSTs maintaining their address (like the one shown above). Since #[allow_zero_sized] allows you to lose that information it follows that it is unsound.

The expanded example that shows how this can lead to UB is:

pub struct EvenAddr(());

impl EvenAddr {
    pub fn new<'a>(x: usize) -> &'a EvenAddr {
        if x % 2 == 0 && x > 0 {
            unsafe { std::mem::transmute(x) }
        } else {
            panic!()
        }
    }
    pub fn test(&self) {
        if (self as *const _ as usize) % 2 == 1 {
            // SAFETY: we validated that the address was even in `new`
            unsafe { std::hint::unreachable_unchecked() }
        }
    }
}

struct EvenAddrRef<'a> {
    #[allow_zero_sized]
    x: &'a EvenAddr
}

fn main() {
    let x: &EvenAddr = EvenAddr::new(2);
    let ear: EvenAddrRef = EvenAddrRef { x };
    
    // What is the reference used for the `&self` in the `test()` method call?
    // Since `x` is zero-sized, it lost the information about the address returned
    // by `EvenAddr::new`, so you need to create a new one.
    // The "canonical" way to create a reference to a ZST is to use
    // `std::ptr::NonNull::dangling`, however if you use that in this case
    // you get the address 1 which causes UB!
    ear.x.test();
}

Currently a reference to a ZST stores an address. That's the 8 bytes you want to avoid. If you make the reference take 0 bytes then you're not storing the address anymore, thus you "lose" it.

Sometimes code needs to create a reference to a ZST out of thin air (which would also happen in the above example when test is called on the field marked #[allow_zero_sized]). To do this, an address must be made up, which raises the question: what should that address be? It's commonly agreed to use the smallest non-null and aligned address, that is, the address equal to the alignment, which is also the one returned by NonNull::dangling. References to ZSTs with such addresses are called canonical references to ZSTs.

I guess they were referring to a ZST with alignment of 1.

NonNull::dangling() is an associated function of the std::ptr::NonNull type which returns a non-null, well-aligned (but dangling) pointer for the given type.
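
To make the terms concrete, here is a small sketch that inspects the address `NonNull::dangling` produces for two zero-sized types. The types `Zst1` and `Zst8` and the helper `dangling_addr` are hypothetical names for illustration. The assertions only check what the documentation promises (non-null and well-aligned); that the value equals the alignment exactly, as described above, is how current implementations behave rather than something the sketch relies on.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

// Zero-sized, alignment 1 (a "1ZST").
struct Zst1;

// Zero-sized, alignment 8.
#[repr(align(8))]
struct Zst8;

// Address of the pointer that `NonNull::dangling` hands out for `T`.
fn dangling_addr<T>() -> usize {
    NonNull::<T>::dangling().as_ptr() as usize
}

fn main() {
    assert_eq!(size_of::<Zst1>(), 0);
    assert_eq!(size_of::<Zst8>(), 0);

    // Non-null and aligned for each type.
    assert_ne!(dangling_addr::<Zst1>(), 0);
    assert_ne!(dangling_addr::<Zst8>(), 0);
    assert_eq!(dangling_addr::<Zst1>() % align_of::<Zst1>(), 0);
    assert_eq!(dangling_addr::<Zst8>() % align_of::<Zst8>(), 0);

    println!("{} {}", dangling_addr::<Zst1>(), dangling_addr::<Zst8>());
}
```

If the address printed for `Zst1` is odd, as it is when the value equals the alignment of 1, then a reference rebuilt from it would take the `unreachable_unchecked` branch in `EvenAddr::test`.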

vporton · March 13, 2023, 10:25pm

Are you claiming that this is not undefined behavior?

vporton · March 13, 2023, 10:33pm

According to my understanding, this is undefined behavior, because "In Rust, by contrast, the compiler guarantees that references will never be dangling references" (Rust book).

quinedot · March 13, 2023, 11:28pm

From the nomicon:

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type. As a consequence, if the span is empty, "dangling" is the same as "null".
