[Pre-RFC] Add a new offset_of macro to core::mem

That sounds like a somewhat backward way of introducing fields in traits and I don't see the benefits of such a solution in comparison to a truly builtin one. In fact, this course of action scares me a bit. The power that this intrinsic provides feels far too great, and I fear that it will lead to more and more diverse UB from user crates that use it in subtly wrong ways than the UB it removes from the existing macro crates, which could also be removed without adding a whole new intrinsic.

This is partially due to the inexpressive return type of usize but also the string parameter. When was the last time a stringly-typed api was the best solution? If reflection is wanted it should be done properly, and if we don't want it then the feature should be carefully designed such that it does suggestively offer the opportunity. And sure, one shouldn't rely on the value of a const function for correctness or compilation but reality teaches us otherwise.

To bring in what Swift has in this area for comparison:

Via the MemoryLayout stdlib type, Swift provides the function offset(of key: PartialKeyPath<T>) -> Int? which provides access to the byte offset from a pointer to T to the property referenced by the keypath, or .none if the property is not stored trivially (e.g. computed, indirect, or overlaps other properties).
Note that causing this function to return .none is not considered a semver major change, because whether a property is stored is not otherwise breaking.


Regarding privacy, Swift considers giving out a keypath to a private property to not break privacy because only someone with access to the property can make a keypath to it, and therefore, passing a keypath to someone gives them the right to get/modify the property with the same rights as the one who created the keypath.

3 Likes

If we get some way to do offset_of, then fields in traits could be implemented on top of it. Therefore there will be insufficient motivation to do the built in solution proposed in the RFC.

This intrinsic (if it is implemented as an intrinsic) could be wrapped in a macro that creates the necessary field type that is compatible with the fields in traits RFC. So it is possible to create a safe api around offset_of. This way, there should be less overall UB.

This is likely going to be implemented as an intrinsic, so that string parameter will be checked at compile time. So no big loss there. (bug a stringly typed api does feel icky to me)

I agree here, offset_of seems like a bad way to do reflection and better ways exist (such as the fields in traits RFC).

Just a reminder: though the intrinsic has been mentioned as offset_of<T>(&str), it would require a const-evaluable string as input, like the simd intrinsics that require literal integer arguments.

This could be exposed as either offset_of<T>(const &str) via magic, offset_of<T, const &str>(), or even only available as offset_of!($:type, $:ident). There’s no intent to use a dynamic string for runtime reflection with this intrinsic.

1 Like

Not as far as I can see, or at least no with the clarity and conciseness I would expect from a part of the type system of the language. A major portion of the new trait-kind is for unsafe information the compiler can rely upon and how can a user-crate define a wrapper providing these assertions? Also, there is interaction with borrow checking due to disjoint fields assertion that is simply not possible to provide semantically in current Rust (though I'd be glad to be proven wrong here). From what I could gather, this was one important motivation behind the proposal.

I do not see how this conclusion follows from the claim. There is less UB if it replaces code that is currently UB with code that does not exhibit it, and if it does not facilitate additional new code that is UB. From my point of view none of these are true for an offset_of-intrinsic. The outer interface of offset_of should not be unsafe and hence I do not get what you meant with creating a safe api around offset_of. Code that wrongly uses field offset currently will continue to have UB even with the intrinsic; the current crate solution returns correct results (when the compiler doesn't exploit the UB which it seldomly seems to do atm).

There is UB within the offset_of macro crates, the most popular determines the offset subtracts pointers into uninitialized memory, and relies on pointer projection to fields of struct-pointers. But the main UB here (invalid references) needs to be fixed regardless of offset_of since it is also used for initializing structs. If it is fixed, the macro could be written without UB (at least @RalfJung made it sound like it could) and not require any intrinsic. From this point of view, the new intrinsic wouldn't fix anything that is not already necessarily being fixed and offset_of! can not fix this since it requires changing language semantics of operations on pointers themselves. (For other readers, note that offset_of having a return type other than usize could provide pointer field projection. This but that is basically Generic pointer to field proposal in a nutshell already mentioned above).

So overall this proposes to add a powerful intrinsic with unprecendented interface and unclear effects on the rest of the language, removes only one instance of UB (the internal implementation of offset_of) whose existance is only incidentally related to the motiviation of the proposal. And its interface offers no signifcant ergonomic benefits for its usage of initialization, still requiring multiple unsafe pointer casts and manual unsafe offsetting (all prone to new UB).

To add on to this, there is also the raw reference RFC which would allow offset_of to be written without UB This would also make Generic pointer to field redundant. Now, there will be some changes to the api described by Generic pointer to field, but overall it should remain the same.

example using &raw from the raw reference RFC

macro_rules! offset_of {
    ($parent:ty, $field:ident) => {{
        let $parent { $field: _, .. }; // protection against deref-coercion
        let ptr = 0_usize as *const $parent;
        (&raw ptr.$field) as usize
    }}
}

macro_rules! field_type {
    ($type:ty, $field:ident, $field_type:ty, $field_type_name:ident) => {
        struct $field_type_name;

        unsafe impl Field for $field_type_name {
            type Type = $field_type;
            type Parent = $type;

            const FIELD_DESCRIPTOR: FieldDescriptor = {
                let $parent { $field: _, .. }; // protection against deref-coercion
                let ptr = 0_usize as *const $parent;
                let ptr: *const $field_type = &raw ptr.$field;
                unsafe { FieldDescriptor::from_offset(ptr as usize) }
            };
        }
    };
}

This wasn’t a major reason for writing up Generic pointer to field, but that is a good reason to bake it into the compiler. Only one issue, a user-land Project trait would make it impossible for the compiler to reason about the disjointness of the (smart/raw) pointers returned by project so this is a moot point. Basically you would have to special case every smart pointer that you want to allow disjoint borrows from.

I meant a reason for the other RFC, `Allow fields in traits that map to lvalues in impl'ing type´. Ideas on disjoint borrowing on custom pointer types (if required at all) fit better in a different thread.

I actually find Swift's solution interesting (kind of like it, but would need to try it out for a verdict) but there are some crucial details. In comparison

  • The input of offset is some well-typed object (a KeyPath from what I could gather) and not a string. It even has a unique expression type for its creation. It overall feels similar to a member pointer (less strictly typed than the current Rust proposal, not every member has a unique type).

  • Secondly, offset is a method of a generic other type which captures the base type in a type parameter and not an ad-hoc intrinsic or macro. This suggests MemoryLayout and KeyPath are the core primitives, not offset.

  • It seems that you are not supposed or at least discouraged to use the integer result of offset for manual computation of say pointers to members. Rather, you can do so type-safely with the KeyPath directly:

    // Mutation through the key path
    root[keyPath: key] = value
    

    Introducing offset alone without similar alternatives/abstractions could be considered rash.

2 Likes

I would very much prefer offset_of!($:type, $:ident), which would allow writing offset_of!(Foo, bar) without quoting bar.

That said, we might want to allow a subset of expression syntax rather than just an ident, to allow offset_of!(Foo, bar.baz) or offset_of!(Foo, bar.baz[3].fnord). Debatable, but people do use the C offsetof that way.

I think as a first step, we should just allow fields, and we can expand to more general expressions later

I would prefer that generality, as it seems more likely to be forward-compatible with disjoint borrows. However I agree with @RustyYato that it need not be the first step, as long as the initial approach does not foreclose such finer resolution as a future extension.

Sounds reasonable to me.

AFAIK, disjointedness only matters with references and accesses, so any API that allows the projection to be done purely as math on raw pointers until the wanted borrow is crated, should be fully capable of allowing disjointed borrows.

2 Likes

Correct. A basic offset_of for fields could be implemented UB-free with a solution for RFC for an operator to take a raw reference by RalfJung · Pull Request #2582 · rust-lang/rfcs · GitHub.

However...

...this does have UB even with the RFC! ptr.field perform an inbounds pointer offset operation, but your pointer is not inbounds of any allocation.

I would like this not to be UB, but there are concerns that making raw pointer field offset a safe operation would lose a lot of optimization potential. Likely, making it UB only on overflow (as opposed to the more restrictive "inbounds") would be sufficient to mitigate that, but LLVM does not offer that option currently.

So, until then, the UB-free way to do offset_of with &raw is:

macro_rules! offset_of {
    ($parent:tt, $field:tt) => {{
        // Make sure the field actually exists. This line ensures that a
        // compile-time error is generated if $field is accessed through a
        // Deref impl.
        let $parent { $field: _, .. };

        // Create an instance of the container and calculate the offset to its field.
        // Here we're using an uninitialized instance of $parent. We avoid UB
        // by only using raw pointers that point to real (allocated, albeit uninitialized) memory.
        let val = $crate::mem::MaybeUninit::<$parent>::uninit();
        let base_ptr = val.as_ptr();
        #[allow(unused_unsafe)] // for when the macro is used in an unsafe block
        let field_ptr = unsafe { &raw (*base_ptr).$field };
        let offset = (field_ptr as usize) - (base_ptr as usize);
        offset
    }};
}
5 Likes

Follow-up: the actual reason this has UB is that dereferencing dangling pointers is UB. So It's the *ptr in your code that already causes UB.

At some point I want to pursue relaxing this for raw pointers, but there are more pressing matters. :wink:

1 Like

Now that offset_from is a const fn (behind a feature flag), this can be implemented as a const-friendly macro (I haven't tested this since I nightly hasn't yet been updated with const_ptr_offset_from; I may revise this in the next day or two when I get a chance to actually test it):

#![feature(const_transmute)]
#![feature(const_ptr_offset_from)]
#![feature(ptr_offset_from)]

macro_rules! offset_of {
    ($Struct:path, $($field:tt)+) => ({
        const OFFSET: usize = {
            extern crate core;

            let base_uninit = core::mem::MaybeUninit::<$Struct>::uninit();
            unsafe {
                // This is UB, but it's the best we can do for the time being.
                // If this is in `core`, we can just leave a comment to tell
                // people that this is an unsafe implementation detail that
                // works here in `core` but is UB outside of this macro.
                let base_ref = core::mem::transmute::<_, &$Struct>(&base_uninit);
                let base_u8_ptr = base_ref as *const _ as *const u8;

                let field_u8_ptr = &base_ref.$($field)+ as *const _ as *const u8;
                let offset = field_u8_ptr.offset_from(base_u8_ptr) as usize;
                
                // Make sure the offset computation stayed within the struct's size.
                let assert = [offset; 1];
                assert[(offset <= core::mem::size_of::<$Struct>()) as usize - 1]
            }
        };
        OFFSET
    })
}

Using $($field:tt)+ allows this to support offset_of!(Struct, field.sub_field.sub_array[3]).

You had to drop the deref protection for this, though.

If a deref coercion actually ends up in a different allocation, offset_from will detect that -- you crucially rely on this computation happening at compile-time though, so there should be a comment for that.

But it is also conceivable that a deref coercion stays inside the same struct... and I am curious how weird that can get? I first thought this could be impure, but since you are running this at const-time we'd actually get an error if the deref code wasn't const-compatible. Probably right now being in a const context entirely protects you against deref coercions as those are non-const fn-calls; that check will get relaxed eventually though.

Yeah, I thought about that. But the deref protection syntax isn't compatible with tuple structs, so I consider it non-viable unfortunately.

Then you should mark the macro as unsafe though.

Actually it works just fine with tuple structs.

4 Likes

Indeed it does. I must have screwed something up when I tried it earlier. I stand corrected. Thanks.