[Pre-RFC] Add a new offset_of macro to core::mem

mjbshaw · January 23, 2019, 5:04am

Feature Name: offset_of
Start Date: 2019-01-13
RFC PR:
Rust Issue:

Summary

Add a new macro named offset_of to core::mem that computes the offset of a type’s field, similar to C’s offsetof.

Motivation

The offset of a field is a regular need in FFI programming. Some examples include:

Computing the offset of an Objective C ivar.
Setting the offset of a vector in a vertex buffer (e.g., SceneKit).
Interacting with an argument parser to store parsed arguments (e.g., FFmpeg’s AVOption).
Initializing an individual field of an uninitialized object.

It’s an open question whether merely creating a reference to uninitialized memory is undefined behavior. There are several crates that define their own offset_of macro that rely on references to uninitialized objects (like memoffset). While this question remains open, it is unknown whether these crates exhibit well-defined behavior. Providing an offset_of macro in Rust’s core library would provide a “blessed” way to compute the offset of a field that users could rely on both now and in the future. If it’s ever decided that a reference to uninitialized memory is undefined behavior, the core library’s offset_of macro will still have well-defined behavior (unlike all user-level implementations).

Providing an offset_of macro in the core library would assist FFI developers in writing correct code.

Guide-level explanation

The offset_of macro takes a type and a field name and expands to a constant expression that evaluates to a usize that gives the offset, in bytes, into a structure for a particular field. Some examples:

// You can use a regular struct with fields:
struct Struct {
    foo: String,
    bar: Vec<u32>,
}

static STRUCT_FOO_OFFSET: usize = offset_of!(Struct, foo);
static STRUCT_BAR_OFFSET: usize = offset_of!(Struct, bar);

// You can use a tuple struct:
struct TupleStruct(u8, u16, u32, u64);

const TUPLE_STRUCT_U8_OFFSET: usize = offset_of!(TupleStruct, 0);
const TUPLE_STRUCT_U16_OFFSET: usize = offset_of!(TupleStruct, 1);
const TUPLE_STRUCT_U32_OFFSET: usize = offset_of!(TupleStruct, 2);
const TUPLE_STRUCT_U64_OFFSET: usize = offset_of!(TupleStruct, 3);

// You can use a tuple:
const TUPLE_CHAR_OFFSET: usize = offset_of!((char, bool), 0);
const TUPLE_BOOL_OFFSET: usize = offset_of!((char, bool), 1);

// You can use a union (but all offsets will be zero):
union Union {
    foo: f32,
    bar: f64,
}

static UNION_FOO_OFFSET: usize = offset_of!(Union, foo);
static UNION_BAR_OFFSET: usize = offset_of!(Union, bar);

An enum cannot be used with offset_of since enums do not have accessible fields.

The offset_of macro respects a type’s and field’s visibility:

mod inner_mod {
    pub struct InnerStruct {
        private_field: usize,
    }

    struct PrivateStruct {
        pub field: usize,
    }
}

// ERROR: field `private_field` of struct `inner_mod::InnerStruct` is private
// const BAD_EXAMPLE_0: usize = offset_of!(inner_mod::InnerStruct, private_field);

// ERROR: struct `PrivateStruct` is private
// const BAD_EXAMPLE_1: usize = offset_of!(inner_mod::PrivateStruct, field);

You also cannot use offset_of to compute the offset of a field’s field (though a future RFC may alter that):

struct Inner {
    inner_field: bool,
}

struct Struct {
    inner: Inner,
}

// ERROR: expected one of `,` or `)`, found `.`
// const BAD_EXAMPLE_2: usize = offset_of!(Struct, inner.inner_field);

Reference-level explanation

In core::mem:

macro_rules! offset_of {
    ($ty:ty, $field:ident $(,)?) => ({ /* compiler built-in */ });
}

The internal implementation of the offset_of macro is generally equivalent to @eddyb’s offset_of (notably, the built-in avoids going through Deref), but differs in that:

The compiler built-in is guaranteed to be safe.
The compiler built-in is const-eval safe.
The compiler built-in supports tuples.
The compiler built-in supports unions.

Drawbacks

This increases the surface area of the core library (albeit in a minor way).

Rationale and alternatives

As touched on in the Motivation section, this is a regularly needed tool for FFI developers. It is used commonly enough that its presence in the core library would, I think, be warranted. Additionally, the implementation in core could be “blessed” in ways the user-level implementations cannot.

User-level implementations of offset_of exist and are presently viable alternatives, but it is unclear whether they exhibit well-defined behavior. They also cannot (with a single macro) both avoid going through Deref and support all of structs, tuples, and unions (since pattern matching is slightly different between them).

The syntax of offset_of is debatable (e.g., offset_of!(Type.field)), but I recommend we follow the historical form of C’s offsetof since Rust FFI developers are likely familiar with it and there aren’t any huge advantages to alternative syntax forms.

The naming is also debatable (e.g., offset_of vs offsetof). Again, I recommend playing off of C’s offsetof (so I will eschew anything crazy like byte_position_of), but separating the words with an underscore feels more idiomatic for Rust (given align_of, size_of, etc.). Using a name similar to offsetof makes searching the internet for the term slightly easier, and using a name with an underscore gives Rust a slight differentiation from C in search results.

Prior art

C’s offsetof
@eddyb’s offset_of.
memoffset crate.
intrusive_collections crate.
field_offset crate.
Discussion of offset_of and MaybeUninit.
My own crate which implements its own offset_of macro, but I won’t link to it because it intentionally invokes undefined behavior and I don’t want @RalfJung to close my loophole until I can implement this in Rust’s core library

Unresolved questions

Should offset_of work with arrays (e.g., offset_of!([u8; 5], [3]))?
Should offset_of work with a field’s field (e.g., offset_of!(Struct, inner.inner_field))?

I’m inclined to say no to these right now. A future RFC could always expand offset_of to support these (which should be backwards compatible).

Future possibilities

The offset_of macro will likely be used in FFI-related code that also uses MaybeUninit in order to compute offsets to fields for initialization. Until the reference-to-uninitialized-memory issue is sorted out (and depending on the conclusion of that issue), offset_of may be the only safe way to initialize individual fields in an uninitialized object. Thus, depending on future discussions, the offset_of might be a prerequisite to writing correct Rust code in certain FFI-related code.
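To make the field-initialization use case concrete, here is a minimal sketch. It assumes a toolchain where the macro is available as `std::mem::offset_of!` (it was eventually stabilized in Rust 1.77, well after this thread). The `Header` type and the `init_by_offset` function are hypothetical names used only for illustration. Each field is written through a byte offset from the base pointer, so no reference to uninitialized memory is ever formed.

```rust
use std::mem::{offset_of, MaybeUninit};

// Hypothetical FFI-style struct; `repr(C)` fixes the field order and layout.
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
}

// Initializes every field through raw byte offsets, never forming `&Header`
// or `&mut Header` while the memory is still uninitialized.
fn init_by_offset(tag: u8, len: u32) -> Header {
    let mut slot = MaybeUninit::<Header>::uninit();
    let base = slot.as_mut_ptr().cast::<u8>();
    unsafe {
        // SAFETY: both offsets lie inside `slot`, and each field pointer is
        // suitably aligned because `Header` itself is aligned for its fields.
        base.add(offset_of!(Header, tag)).write(tag);
        base.add(offset_of!(Header, len)).cast::<u32>().write(len);
        // All fields are now initialized; padding may stay uninitialized.
        slot.assume_init()
    }
}
```

Calling `init_by_offset(7, 42)` yields a `Header` whose `tag` is 7 and whose `len` is 42.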

comex · January 23, 2019, 6:59am

, but:

I think at least inner.inner_field should be supported. There's one obvious syntax, with nothing to bikeshed, so it doesn't add much complexity to this RFC to include it upfront; the only question is whether we want to support it at all. In my experience it's clearly useful.

I'm not sure about offset_of!(Array, [3]). Normally, offset_of!(Foo, FIELD) corresponds to foo.FIELD, so you would expect offset_of!(Array, [3]) to correspond to array.[3], which is not valid syntax. But an array index following a field access, like offset_of!(Struct, foo[3]), doesn't have this problem and, in my experience, is also useful.

For reference, in C, expressions such as offsetof(struct foo, inner.inner_field) and offsetof(struct foo, array[3]) are not allowed by the standard, but are supported as an extension by both GCC-compatible compilers and MSVC, and are commonly used in practice.
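On the Rust side, the nested and indexed forms can be sketched at user level with raw pointers. This is only an illustration, not the proposed built-in: the `raw_offset_of!` macro and the `Inner`/`Outer` types are hypothetical, the code relies on `std::ptr::addr_of!` (the raw-reference feature from RFC 2582, stable since Rust 1.51), and it does not guard against auto-Deref anywhere in the field path.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of;

// Hypothetical user-level macro: accepts a field path such as `inner.flag`
// or `samples[3]` after the type. It does NOT stop auto-Deref in the path.
macro_rules! raw_offset_of {
    ($ty:ty, $($path:tt)+) => {{
        // Real (uninitialized) storage keeps the pointer math in bounds.
        let storage = MaybeUninit::<$ty>::uninit();
        let base: *const $ty = storage.as_ptr();
        // SAFETY: `addr_of!` only computes an address inside `storage`;
        // nothing is read and no reference is created.
        let field = unsafe { addr_of!((*base).$($path)+) };
        (field as usize) - (base as usize)
    }};
}

// The fields below exist only to give the types a layout.
#[repr(C)]
#[allow(dead_code)]
struct Inner {
    id: u32,
    flag: bool,
}

#[repr(C)]
#[allow(dead_code)]
struct Outer {
    head: u8,
    inner: Inner,
    samples: [u16; 8],
}

// Counterpart of C's `offsetof(struct outer, inner.flag)`.
fn nested_flag_offset() -> usize {
    raw_offset_of!(Outer, inner.flag)
}

// Counterpart of C's `offsetof(struct outer, samples[3])`.
fn fourth_sample_offset() -> usize {
    raw_offset_of!(Outer, samples[3])
}
```

With this `repr(C)` layout, and assuming a target where `u32` is 4-byte aligned, the nested offset is 4 + 4 = 8 and the indexed offset is 12 + 3 * 2 = 18. That is the same result as adding the offset of the outer field to the offset within it.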

hanna-kruppe · January 23, 2019, 9:18am

I think it’s really important to provide a safe, universal offset_of in the language or standard library: people need this functionality, none of the ways they can write it themselves can be trusted, and even if it is possible to write a correct user-space implementation it’s too finicky to not put into the standard library.

However, whenever someone proposes a macro for what feels like a new language capability, I have to ask what it’s supposed to expand to. Macros are library code and are supposed to turn one token tree into another token tree that one could in principle write out manually, not a way to introduce entirely new language capabilities. For example, although println! does a whole lot of work, in the end it just expands to a big expression constructing a data structure dictated by the format string and embedding the arguments passed in. (The data structures used by the macro in libstd are unstable and there’s no pressure to stabilize them, but you could still do the same in a third-party library.)

But for offset_of!(type, field), I don’t know what it could expand to short of new syntax (e.g., offsetof $type.$field) or a magic intrinsic that badly emulates such a keyword (e.g., offset_of::<$type>(stringify!($field))). If something like that is needed to implement the macro, then I think we should very seriously consider just designing and stabilizing that thing in the first place. If we’ll need it anyway, hiding it behind a macro is just misleading.

Caveat: it can be useful to start out with a macro to defer dealing with a big bikeshed, as done with await. But even there the macro is only temporary, and in this case I don’t (yet?) see a bikeshed so large that the temporary-macro-strategy is useful.

RalfJung · January 23, 2019, 12:16pm

Note that accepting this RFC is enough to provide an alternative to field initialization in MaybeUninit, and that RFC has just been proposed for merging.

So, almost certainly, a more ergonomic way to initialize fields (i.e., without offset_of) will exist, independent of where the discussion around references ends up.

That said, offset_of is certainly still useful, and I agree it should be in libcore. If you ignore the Deref problem, I think it can even be written as a library (with the above RFC accepted) already -- but not with code that would be accepted in const context.

Well now that you summoned me into this thread, you know what's going to happen.

format_args! does not expand to anything, though -- and that file contains some other examples.

hanna-kruppe · January 23, 2019, 12:35pm

This is just an internal implementation quirk caused by predating the proc macro API. Anyone who cares enough to put in the time could rewrite format_args! as a plain old proc macro (maybe losing some nice-to-have diagnostics). Furthermore, even today you can run cargo expand and see the regular token tree that it expands to, because it is expanded as a tokens -> tokens transform, just the way it's hooked into macro expansion is magical.

ckaran · January 23, 2019, 2:08pm

I want to increase awareness of intrusive collections, which pretty much require an offset_of macro to work correctly. I come from the C-world, and intrusive collections are a core part of my toolbox. They are also something I want to see more of in rust as they make lots of stuff much easier to do. Just my USD$0.02.

Amanieu · January 23, 2019, 2:57pm

Actually format_args! does expand to normal Rust code (the implementation is in libsyntax_ext). A better example would be asm! which expands into a custom AST node that can't be represented with normal Rust code.

mjbshaw · January 23, 2019, 3:24pm

For the record, I have nothing against supporting offset_of!(Struct, inner.inner_field), offset_of!(Struct, inner.inner_field[3]), offset_of!(Struct, field[3]), or even offset_of!([u8; 5], [3]) (though I agree the syntax of that last one is questionable and not necessarily intuitive, and I agree that for now it should not be included in the RFC).

Thanks for pointing out GCC's prior art of allowing them! I wasn't aware of that, and it definitely makes me feel like I can include this in the RFC without feeling like it'll inevitably lead to an endless bikeshed.

mjbshaw · January 23, 2019, 3:49pm

That's... a good question I overlooked. I was originally thinking it would have the full definition of type available (and so could evaluate to a usize literal), but I realize now that macro expansion happens way too early for the macro evaluator to have the full definition of type. Here are a couple alternatives (neither of which are ideal, and neither of which I expect to be the final accepted solution; I'm just trying to get the idea-ball rolling):

Ignoring `Deref`

If we can ignore Deref for now, this could be implemented with RFC 2582 like so:

macro_rules! offset_of {
    ($ty:ty, $field:ident $(,)?) => ({
        let null = 0usize as *const $ty;
        $crate::mem::transmute::<_, usize>(&(*null).$field as *const _)
    });
}

This should be const-eval friendly too, even on current rust. It also is compatible with sub-fields (e.g., offset_of!(Struct, inner.inner_field)). The only downsides I can see are:

It doesn't avoid going through Deref. Avoiding Deref might require new special syntax.
It doesn't work if &field results in a fat pointer. That could be fixed by doing proper pointer subtraction instead of transmuting (I know someone's going to remind me that "pointers aren't just integers", which I'm well aware of). I should edit the post to fix that but it's time to do my $dayjob.

Crazy per-type traits

If a special trait was automatically defined by the compiler for each type (where each type gets its own trait), then it could be:

macro_rules! offset_of {
    ($ty:ty, $field:ident $(,)?) => ({
        <$ty as CrazyBuiltInTraitCustomMadeFor<$ty>>::$field
    });
}

That is, the trait CrazyBuiltInTraitCustomMadeFor is automatically defined by the compiler for each type, and it contains associated consts that share the name of the struct's fields, where each const is the offset of the field within the type. I haven't thought this through much, so there might be some major complications I'm overlooking (e.g., I'm not sure how this would work with sub-fields, nor how it should work with tuples which have integers for field names).

It could also be an actual type that has a custom impl for each type (so instead of doing <$ty as CrazyBuiltInTraitCustomMadeFor<$ty>>::$field in the macro, it would be CrazyBuiltInType<$ty>::$field).
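A hand-written sketch of this second variant (a type with a custom impl per struct) may make the idea easier to picture. Everything here is hypothetical: `OffsetsOf`, `Vertex`, and the constants are written out by hand for a `repr(C)` struct, standing in for what the compiler would have to generate automatically.

```rust
use std::marker::PhantomData;

// Stand-in for the compiler-provided type; it is never instantiated.
#[allow(dead_code)]
struct OffsetsOf<T>(PhantomData<T>);

#[repr(C)]
#[allow(dead_code)]
struct Vertex {
    position: [f32; 3], // bytes 0..12
    color: [u8; 4],     // bytes 12..16
}

// In the idea above the compiler would emit this impl; here it is manual.
// The constants deliberately reuse the field names.
#[allow(non_upper_case_globals, dead_code)]
impl OffsetsOf<Vertex> {
    const position: usize = 0;
    const color: usize = 12;
}

// The macro then expands to an ordinary associated-const path.
macro_rules! offset_of {
    ($ty:ty, $field:ident $(,)?) => {
        <OffsetsOf<$ty>>::$field
    };
}

// Usable in const context because it is just a constant lookup.
const COLOR_OFFSET: usize = offset_of!(Vertex, color);
```

The sketch also makes the tuple concern visible: an associated const cannot be named `0`, so tuple fields would need some other naming scheme.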

Anyway, I'll have to give the expansion of the macro more thought. Thanks for bringing that up.

RalfJung · January 23, 2019, 6:30pm

This is UB: accessing a field asserts that the old and new pointer (computing the offset) are in-bounds of the same allocation. Your pointer is not in-bounds of any allocation.

There are some tricks to avoid Deref, like here and here.

mcy · January 23, 2019, 7:28pm

asm! is not a macro; it is a macro-like syntactic construct with its own AST node.

Re: offset_of!, I strongly believe that handing out actual offsets is a Bad Idea; I think what we want is exactly T::*U (ptr-to-member) and the accompanying operator ->* from C++, though obviously with different syntax. Maybe a one-way usize cast might be ok, but I suspect that most uses of offsets never need to witness the internal value, whatever that might be.

scottmcm · January 23, 2019, 7:31pm

That was even a keyword until recently... https://github.com/rust-lang/rfcs/pull/2421

hanna-kruppe · January 24, 2019, 9:30pm

Yes, asm! is magic rather than a macro and IMO that's one of the many problems preventing its stabilization.

I appreciate that this may offer some additional type safety for many use cases, but pointers-to-member are also a significantly larger feature that's significantly more difficult to design, so strategically I do not think it's a good trade off to make offsetof dependent on it, even assuming pointers-to-members will ever be added to Rust (which seems quite uncertain). The ability to get the offset of a field at all is fundamental to a bunch of systems software and people are already missing it in practice and badly emulating it. We should get a good solution into their hands quickly, rather than escalate to a more perfect solution.

Plus, some or all of the type safety can also be achieved in library code (struct Offset<Base, Field>(usize, PhantomData<Base>, PhantomData<Field>); with an unsafe constructor wrapped by a safe macro and safe functions &Base -> &Field, &mut Base -> &mut Field, etc.)
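A minimal sketch of such a library-level wrapper follows. The names `from_field_ptr`, `typed_offset_of!`, `Packet`, and `payload_offset` are hypothetical, and the macro is only as safe as its handling of Deref: it computes the address with `std::ptr::addr_of!` on uninitialized storage and does nothing to stop auto-Deref on the base type.

```rust
use std::marker::PhantomData;

// A byte offset tagged with the container type and the field type.
struct Offset<Base, Field>(usize, PhantomData<Base>, PhantomData<Field>);

impl<Base, Field> Offset<Base, Field> {
    // SAFETY contract: `field` must point at a `Field` stored directly
    // inside the `Base` that `base` points at.
    unsafe fn from_field_ptr(base: *const Base, field: *const Field) -> Self {
        Offset((field as usize) - (base as usize), PhantomData, PhantomData)
    }

    // Safe projection: &Base -> &Field.
    fn get<'a>(&self, base: &'a Base) -> &'a Field {
        // SAFETY: guaranteed by the contract of `from_field_ptr`.
        unsafe { &*(base as *const Base).cast::<u8>().add(self.0).cast::<Field>() }
    }

    // Safe projection: &mut Base -> &mut Field.
    fn get_mut<'a>(&self, base: &'a mut Base) -> &'a mut Field {
        // SAFETY: guaranteed by the contract of `from_field_ptr`.
        unsafe { &mut *(base as *mut Base).cast::<u8>().add(self.0).cast::<Field>() }
    }
}

// Wraps the unsafe constructor; the field type is inferred from the pointer.
macro_rules! typed_offset_of {
    ($ty:ty, $field:ident) => {{
        let storage = std::mem::MaybeUninit::<$ty>::uninit();
        let base = storage.as_ptr();
        // SAFETY: the address is computed inside `storage` without reading it.
        unsafe { Offset::from_field_ptr(base, std::ptr::addr_of!((*base).$field)) }
    }};
}

#[repr(C)]
struct Packet {
    id: u16,
    payload: u64,
}

fn payload_offset() -> Offset<Packet, u64> {
    typed_offset_of!(Packet, payload)
}
```

Given `let off = payload_offset();`, `off.get(&packet)` borrows the `payload` field and `off.get_mut(&mut packet)` allows writing to it, with the raw byte count never exposed to the caller.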

mjbshaw · January 25, 2019, 4:41am

Hmm, I must have overestimated the guarantees of your RFC (2582). I assumed that &(*null).$field as *const _ would not be seen as a field access (as far as UB is concerned), and instead would be seen as a single atomic expression computing a pointer. If that's not the case, then it would have to use MaybeUninit and pointer subtraction (which would be necessary anyways to support unsized fields).

Yes, but that syntax isn't compatible with tuples or unions. I could drop tuple and union support from this RFC, but I was hoping to find a way to include them. Additionally, they create a reference to uninitialized memory (even with applying RFC 2582).

mjbshaw · January 25, 2019, 6:31am

How objectionable would it be if offset_of! was also a macro-like syntactic construct with its own AST node?

Ultimately I can't think of a good way to implement offset_of! that doesn't rely on something at least equally hacky. Here are all the ways I/others have mentioned here (the code in each bullet point is meant to be the body of the macro, with $Struct being the type and $field being the field):

The following doesn't prevent you from going through auto-Deref:

let uninit = std::mem::MaybeUninit::<$Struct>::uninitialized();
let field = unsafe { &(*uninit.as_ptr()).$field as *const _ };
let offset = (field as *const _ as usize).wrapping_sub(&uninit as *const _ as usize);

This requires inventing some new syntax or mechanism to stop auto-Deref.

The following prevents auto-Deref, but it's not compatible with tuples or unions, and it cannot support field.sub_field offsets:
```
let uninit = std::mem::MaybeUninit::<$Struct>::uninitialized();
let &$Struct { $field: ref field, .. } = unsafe { &*uninit.as_ptr() };
let offset = (field as *const _ as usize).wrapping_sub(&uninit as *const _ as usize);
```
It also creates a reference to uninitialized memory, which is one of the things I'm trying to avoid since it's still an open question whether it's well-defined behavior to do so.
Using an intrinsic that takes the field parameter as a string could work:
```
let offset = internal_offset_of_intrinsic::<$Struct>(stringify!($field));
```
(we could also split the field parameter string by subfields if we want to preserve span information; e.g., field.subfield → ("field", "subfield"), and pass all of them to the intrinsic). The intrinsic would be #[doc(hidden)] so users don't use it directly (with a note that doing so is UB).
Using an auto-generated per-type trait (or type), as I previously mentioned. This seems like a lot of work, and I don't really like it.
Make offset_of! a macro-like syntactic construct with its own AST node. This feels closest to the intrinsic idea, but without the hack of stringifying the fields.
Make a new keyword or syntax for the offset (i.e. revive the offsetof keyword that was killed). This might also need a new AST node (or not). I don't really want to introduce new user-level syntax for this feature. Rust has had a lot of syntax churn over the past year, and I think that's been reflected in some of the Rust 2019 posts that advocate slowing down (in addition to other factors).

While I don't like the idea of having a macro that's not really a macro, it's starting to look appealing...

RalfJung · January 25, 2019, 8:08am

It is a single atomic expression computing a pointer. But it uses getelementptr inbounds for this computation, meaning the computation is UB if it is not within the bounds of the same object. That's just how computing a pointer for field access works in LLVM. It helps a lot with alias analysis.

mcy · January 25, 2019, 2:33pm

Extremely. As @hanna-kruppe mentioned, this is a huge obstacle for inline and global assembly. What you want is for core::offset_of! to be a compiler-evaluated macro, like the file!, line!, and column! macros. These are declared in libcore/macros.rs, but their bodies are ignored by the compiler. It is unclear whether this can be evaluated as a proc macro.

@hanna-kruppe has a point that a library (hopefully, libcore) can abstract over numerical offsets, which is Not Wrong (though I'll mention that it means things like Offset<(u32, u32), u32> can't be byte-sized, which might not be a huge loss in the end).

felix.s · January 25, 2019, 6:27pm

We wouldn't have this problem if the syntax were offset_of!(Foo, .FIELD). I'm not necessarily endorsing it, but it's possible.

To offer a point of reference, in GCC and Clang offsetof(type, field) expands to... __builtin_offsetof(type, field), where __builtin_offsetof is a keyword that user code can invoke directly if it really wants. And it's done (I believe) for pretty much the same reason that @RalfJung mentioned; i.e. the 'traditional' implementation of ((size_t)&((type *)0)->field) being UB.

I don't think we should feel guilty about implementing this feature by adding yet another intrinsic to the language. And I think macro syntax fits it quite well, actually; its use cases are rare enough that it may be not worth the churn of reserving a keyword.

(I'm not distinguishing 'macro-like syntactic constructs' from 'compiler-evaluated macros' here. As far as user code is concerned, it's a distinction without a difference.)

CAD97 · January 25, 2019, 6:34pm

There is a small difference, in how it interacts with name resolution (as mentioned in the await thread).

That said, we can make a real compiler-evaluated-macro that evaluates to a surface-syntax-unexposed construct, eliminating that issue.

comex · January 26, 2019, 4:28am

By the way, the reason offsetof is no longer a keyword, from RFC 2421:

If we are not using a keyword for anything in the language, and we are sure that we have no intention of using the keyword in the future, then it is permissible to unreserve a keyword and it is motivated.

[...]

Rationale for sizeof, alignof, and offsetof

We already have std::mem::size_of and similar which are const fns or can be. In the case of offsetof, we would instead use a macro offset_of!.

In other words, there's already an accepted RFC saying that offset_of! would be a better choice: deciding we need a keyword now would basically be 'changing our mind'. Not that that's necessarily bad.
