Pre-RFC: Struct/union raw pointer field access

Hi, I would like to ask for your opinions on this alternative to the RFC #2582 (&raw operator) proposal.

Summary

Allow accessing fields in raw pointers pointing to structs or unions. Such an operation would create a raw pointer.

Motivation

Rust doesn't provide a way to safely create a pointer to a field of an uninitialized or packed type without going through a reference. This feature would also permit a safe way to implement offset_of outside of the standard library.

Guide-level explanation

Using the member access operator on a raw pointer would create a raw pointer pointing to the given field. This avoids reference indirection, and with it the need to prove guarantees about alignment and dangling references.

struct Hello {
    a: i32,
    b: i64,
}

fn main() {
    let hello = Hello { a: 1, b: 2 };
    let ptr: *const Hello = &hello;
    let b_ptr: *const i64 = ptr.b;
    assert_eq!(unsafe { *b_ptr }, 2);
}

This can be used with packed structures.

use std::ptr;

#[repr(packed)]
struct Hello {
    a: i32,
    b: i64,
}

fn main() {
    let hello = Hello { a: 1, b: 2 };
    let ptr: *const Hello = &hello;
    let b_ptr: *const i64 = ptr.b;
    println!("{}", unsafe { ptr::read_unaligned(b_ptr) });
}

It's also possible to partially initialize uninitialized structures.

use std::mem::MaybeUninit;

#[derive(Debug)]
struct Hello {
    a: i32,
    b: i64,
}

fn main() {
    let mut m: MaybeUninit<Hello> = MaybeUninit::uninit();
    let mp = m.as_mut_ptr();
    unsafe {
        mp.a.write(1);
        mp.b.write(2);
        println!("{:?}", m.assume_init());
    }
}

Reference-level explanation

https://doc.rust-lang.org/reference/expressions/field-expr.html

If the type of the expression on the left is a raw pointer, this operation yields a raw pointer to the location of that field, whose mutability matches that of the original pointer. This doesn't require unsafe, even if the pointer points to a union.
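
The rule that mutability follows the original pointer can be approximated in current Rust. The sketch below is only an illustration: it uses the standard library's addr_of! and addr_of_mut! macros as a stand-in for the proposed ptr.field syntax, and the helper names (b_of_const, a_of_mut, b_of_mut, build_hello) are hypothetical. The helpers are marked unsafe here as a conservative assumption, since whether dangling pointers are allowed is left open below.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

#[derive(Debug, PartialEq)]
struct Hello {
    a: i32,
    b: i64,
}

/// Hypothetical stand-in for `ptr.b` on a `*const Hello`: yields a `*const i64`.
///
/// # Safety
/// `ptr` must point into a live allocation large enough for a `Hello`.
unsafe fn b_of_const(ptr: *const Hello) -> *const i64 {
    // No reference is created and nothing is read from memory.
    unsafe { addr_of!((*ptr).b) }
}

/// Hypothetical stand-in for `ptr.a` on a `*mut Hello`: yields a `*mut i32`.
///
/// # Safety
/// Same requirement as `b_of_const`.
unsafe fn a_of_mut(ptr: *mut Hello) -> *mut i32 {
    unsafe { addr_of_mut!((*ptr).a) }
}

/// Hypothetical stand-in for `ptr.b` on a `*mut Hello`: yields a `*mut i64`.
///
/// # Safety
/// Same requirement as `b_of_const`.
unsafe fn b_of_mut(ptr: *mut Hello) -> *mut i64 {
    unsafe { addr_of_mut!((*ptr).b) }
}

/// Initializes a `Hello` field by field without creating a reference
/// to uninitialized memory.
fn build_hello() -> Hello {
    let mut m = MaybeUninit::<Hello>::uninit();
    let mp = m.as_mut_ptr();
    unsafe {
        a_of_mut(mp).write(1);
        b_of_mut(mp).write(2);
        // Both fields are written, so the value is fully initialized.
        m.assume_init()
    }
}
```

A *mut Hello gives *mut field pointers that can be written through, while a *const Hello gives only *const field pointers.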

Drawbacks

This introduces a special rule for member access on raw pointers, increasing the complexity of the language.

Rationale and alternatives

RFC #2582. However, I believe this design is more powerful overall and has a more intuitive syntax, since it does not introduce a new pseudo-keyword raw.
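
For comparison, here is a sketch of the packed-struct example written in the raw-borrow style of RFC #2582. It assumes a toolchain that provides the std::ptr::addr_of! macro, which stands in here for the &raw const operator; the function name read_b is hypothetical.

```rust
use std::ptr;

#[repr(packed)]
struct Hello {
    a: i32,
    b: i64,
}

/// Reads the possibly unaligned field `b` without ever creating a `&i64`.
fn read_b(hello: &Hello) -> i64 {
    let ptr: *const Hello = hello;
    // Raw-borrow style: name the place `(*ptr).b`, then take its address.
    // The proposal in this post would spell the same thing `ptr.b`.
    let b_ptr: *const i64 = unsafe { ptr::addr_of!((*ptr).b) };
    // The field of a packed struct may be misaligned, so use an unaligned read.
    unsafe { ptr::read_unaligned(b_ptr) }
}
```

The raw-borrow form spells out the dereference of the starting pointer, whereas the field-access form hides it behind the dot.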

Unresolved questions

Should this operator be usable with dangling/made-up pointers? Answering no here means requiring unsafe, which is probably fine, as this feature is useful only in unsafe code.

Future possibilities

This pretty much conflicts with RFC #2582 by introducing another syntax to access packed fields. Likely only one of those RFCs should be stabilized.

Accessing enum fields. That said, I don't think that is particularly important when unions can be used to simulate enums.

Half the point of &raw is to be able to do the pointer offset operation without knowing whether the pointer is valid. If it's not allowed, then it has no extra power over taking real references (except pointer provenance, I suppose) and can't be used to implement offset_of! soundly (EDIT: forgot about using MaybeUninit for this).

This has been suggested before, but unfortunately I don't know exactly where. IIRC it's considered problematic because member access always gives a place currently, and this would change that.

Personally I like it, but as you've admitted, a general solution is still needed even with field-based pointer projection, for more complicated cases. Enums are quite a tricky case, but &raw at least supports them, and allows taking pointers to union fields given a reference to the union.

This would still be allowed, even if raw pointer field access didn't work with dangling pointers.

macro_rules! offset_of {
    ($t:ty, $field:tt) => {{
        let m = ::core::mem::MaybeUninit::<$t>::uninit();
        // Proposed syntax: `.$field` on a raw pointer yields a raw pointer to the field.
        (unsafe { m.as_ptr().$field }) as usize - m.as_ptr() as usize
    }};
}

All accesses are within an uninitialized allocation (but not dangling). Of course, you want to access $crate in practice, to prevent malicious core substitutions.
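
The same idea can be sketched with syntax that compiles today, assuming the standard library's addr_of! macro in place of the proposed field access. The macro name offset_of_sketch and the Pair struct are hypothetical, chosen only for illustration.

```rust
// Hypothetical name, chosen to avoid clashing with any real `offset_of!`.
macro_rules! offset_of_sketch {
    ($t:ty, $field:tt) => {{
        // Uninitialized, but a real (non-dangling) allocation on the stack.
        let m = ::std::mem::MaybeUninit::<$t>::uninit();
        let base: *const $t = m.as_ptr();
        // Only the field's address is computed; no reference is created
        // and the uninitialized memory is never read.
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        (field as usize) - (base as usize)
    }};
}

// `repr(C)` fixes the layout, so the offsets are predictable.
#[repr(C)]
struct Pair {
    first: u32,
    second: u32,
}
```

With this layout, offset_of_sketch!(Pair, first) evaluates to 0 and offset_of_sketch!(Pair, second) to 4.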

Ah, right.

I think the fact that obj.field would be unsafe could potentially be a footgun, but it is important to distinguish between ptr::offset (unsafe, must stay within the allocation) and ptr::wrapping_offset (safe, may cross allocations or even point outside any allocation).

&raw actually has the same problem, but there it reads more naturally as ptr::offset, since you're actually (in the code) dereferencing the starting pointer.

The issue that any "just muck with the pointer, don't look at what's behind it" feature has to fight is the idea that doing so is always safe, when in fact ptr::offset cannot cross allocations, and that restriction is very important for optimization of pointer usage.
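
A minimal sketch of that distinction, using the existing raw pointer methods offset and wrapping_offset; the wrapper names offset_in_bounds and offset_anywhere are hypothetical.

```rust
/// Safe: pure address arithmetic. The result may point outside any
/// allocation, and it is only a problem if it is later dereferenced.
fn offset_anywhere(ptr: *const u32, count: isize) -> *const u32 {
    ptr.wrapping_offset(count)
}

/// Unsafe: the optimizer may assume the result stays in bounds.
///
/// # Safety
/// `ptr` and the resulting pointer must both lie within (or one past the
/// end of) the same allocation.
unsafe fn offset_in_bounds(ptr: *const u32, count: isize) -> *const u32 {
    unsafe { ptr.offset(count) }
}
```

For an in-bounds offset both functions return the same address; only offset_anywhere may be called with a count that leaves the allocation.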

How would this work with regard to zero sized types?

Normally, just like they already work in Rust? There isn't anything special going on with ZSTs as far as this feature is concerned. I suppose you may be asking about something more specific than the example below, which currently prints 0x8 0x8 0x8, however.

#[repr(align(8))]
struct ZST {
    a: (),
    b: (),
}

fn main() {
    let z = Box::new(ZST { a: (), b: () });
    println!("{:p} {:p} {:p}", &*z, &z.a, &z.b);
}

Even if references to ZSTs were to become ZSTs themselves, this shouldn't affect raw pointers (references have validity requirements, raw pointers don't).

I recently merged a PR someone sent me for an offset_of! macro written in 100% safe code. The only catch is that you need to pass in a valid instance of the type (which usually isn't a problem at all).

#[macro_export]
macro_rules! offset_of {
  ($instance:expr, $Type:path, $field:tt) => {{
    // This helps us guard against field access going through a Deref impl.
    #[allow(clippy::unneeded_field_pattern)]
    let $Type { $field: _, .. };
    let reference: &$Type = &$instance;
    let address = reference as *const _ as usize;
    let field_pointer = &reference.$field as *const _ as usize;
    // These asserts/unwraps are compiled away at release, and defend against
    // the case where somehow a deref impl is still invoked.
    let result = field_pointer.checked_sub(address).unwrap();
    assert!(result <= $crate::__core::mem::size_of::<$Type>());
    result
  }};
}

To guarantee that the Deref impl can't be invoked:

#[macro_export]
macro_rules! offset_of {
  ($instance:expr, $Type:path, $field:tt) => {{
    let reference: &$Type = &$instance;
    let address = reference as *const _ as usize;
    // This helps us guard against field access going through a Deref impl.
    let $Type { $field: field_ref, .. } = reference;
    let field_pointer = field_ref as *const _ as usize;

    field_pointer - address
  }};
}

That looks like basically the same lines in a slightly different order?

Yes, I just made it explicit that the field is borrowed from the struct (using match ergonomics). That way you don't have to do the dance with checked_sub and assert.
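
As a non-macro sketch of the same trick, with a hypothetical Pair struct and function name: destructuring a &Pair with a struct pattern binds the field by reference, and the pattern only type-checks against the named struct itself.

```rust
// `repr(C)` is an assumption made here so the offset is predictable.
#[repr(C)]
struct Pair {
    first: u32,
    second: u32,
}

/// Computes the byte offset of `second` using only safe code.
fn offset_of_second(instance: &Pair) -> usize {
    let address = instance as *const Pair as usize;
    // Match ergonomics: matching a `&Pair` against a struct pattern makes
    // `field_ref` a `&u32`. If `instance` were, say, a `&Box<Pair>`, this
    // pattern would be a type error instead of silently going through Deref.
    let Pair { second: field_ref, .. } = instance;
    let field_pointer = field_ref as *const u32 as usize;
    field_pointer - address
}
```

Because the field is borrowed directly from the struct, its address cannot be below the struct's address, so the plain subtraction is enough.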

I am pretty sure this was discussed in https://github.com/rust-lang/rfcs/pull/2582, and the conclusion was that it would be too confusing if x.foo was a field access vs just a pointer offset depending on the type of x. Same syntax doing totally different things like this seems like a bad idea.
