[Pre-RFC] Add a new offset_of macro to core::mem

The problem then is that if we wanted to add an API to check whether a field does exist, has_field<T>(...) -> bool wouldn’t do the trick, because even if it would return false, code doing offset_of<T>(...) would get monomorphized unconditionally, unconditionally resulting in a compilation failure.

You could potentially have a try_offset_of, without impacting typical unconditional use cases.

Then if you have a use case for getting the offset of a field that might not exist, add a separate macro or function that returns an Option and never generates a compilation failure.

The common case for this (as stated by numerous people asking for it for years) involves accessing a field known to exist at compile time, and that case shouldn’t require an unwrap.

Wouldn’t try_offset_of<T>(…) provide a solution for both the existence/public-visibility test and the monomorphization issue? The compiler could simply compile false or None (as appropriate) for those cases where the field does not exist or is not sufficiently visible.

1 Like

Yes @Tom-Phinney, that would also work. I just wanted to limit the number of intrinsics exposed by the compiler, and building offset_of! in a library (like libstd) on top of try_offset_of seemed feasible:

macro_rules!  offset_of {
    ($ty:ty, $field:expr) => {{
          const RESULT: usize = try_offset_of::<$ty, $field>().unwrap_or_else(|| {
               panic!("Type \"{}\" does not have visible field \"{}\"", stringify!($ty), stringify!($expr))
          });
          RESULT
    }}
}

I kind of wish the API of try_offset_of would, however, differentiate between whether the field does not exist, or whether the field isn’t visible, e.g., by returning Result<usize, ...SomeError...> instead.

That information is a security vulnerability, since it would allow probing whether such a field exists or not.

1 Like

How is this a security vulnerability?

Also, one can already probe this today so… trying to forbid this is probably not worth the effort.

Presumably there is a reason why fields or structs are made private. One reason is to not disclose internal structure of a library crate to other crates which use it. Enabling user B to probe the internal structure of a library crate created by user A seems like a privacy violation, and potentially a security violation. Among other things, it can disclose to user B the likely internal structural approach used by the library that was attempting to keep such details private.

Presumably there is a reason why fields or structs are made private.

The only goal privacy achieves is preventing accidental misuse of some internal implementation details.

Users make use of internal implementation details all the time, and we provide many APIs that reveal them. While it is currently hard expensive to obtain the offset of a struct field, it is, e.g., trivial, to enumerate all struct fields, and compute whether they are private or public.

The simplest example of users using internal implementation details would be size_of<T>. Given pub struct Public(/* private: */ i32);, size_of<T> reveals that even though Public has no visible fields, it contains 4 bytes of data. Users use this all the time, when putting Public on the stack, or on the heap.

Being able to use a type requires some knowledge about it, and if you make a type public, you are exposing its path and layout, allowing users that want to to exploit their size, alignment, niches, call ABI, the type repr, enumerate the type fields, whether they are private or public, the offsets of all its fields, including private ones, etc. quiet trivially. It just requires sufficient motivation to do so and willing to pay a computational price for it.

I wouldn’t call it a “security vulnerability” in any way, but it does break an API/ABI boundary by exposing internal details. Library crates often use private fields in ways that they expect to freely change without breaking compatibility with users of the library.

1 Like

Presumably any user of a library crate who relies on non-public details of the library crate has already self-violated any API/ABI stability guarantees. In that case referencing a “private” field by name seems not much different than comparing a size_of<T> to an expected value. Thus I am moving toward @gnzlbg’s viewpoint.

FWIW I am not suggesting that writing code that does this is “good practice”, it isn’t. Code that relies on implementation details it shouldn’t be relying on for correctness will break when those details change.

But code needs to be able to query these details for a variety of reasons, and this code could be made simpler and more robust with better APIs to access this information.

Having said this, while align_of/size_of are super basic reflection APIs, and offset_of is slightly more advanced, try_offset_of -> Result<usize, Error> is a much more complicated reflection API. What should Error contain ? Field not found vs field is private ? How private ? pub(crate)? Some other kind ? What do we do when we add a new visibility specifier ?

I think I’d rather not answer any of these questions and just add offset_of -> usize that fails at compile time. With an Option<usize> things were a bit simpler, but if one goes into trying to report any more detail the complexity of the problem explodes. So I guess I’m back to @josh camp of providing the simplest thing that works. Anything more complex belongs in a more fleshed out reflection API.

2 Likes

For the record, I have no objection to try_offset_of -> Option<usize>, if there are use cases for that, as long as there’s an offset_of -> usize that fails at compile time.

1 Like

I think having a try_offset_of that returns None where offset_of fails to compile would be great. So long as offset_of doesn’t allow getting the offset of private field (this seems to be the consensus).

I would also like it if try_offset_of works with type parameters and allow accessing the fields of the monomorphized type. This could allow building abstractions on top of try_offset_of. It isn’t clear how this interacts with privacy.

2 Likes

That sounds like a somewhat backward way of introducing fields in traits and I don't see the benefits of such a solution in comparison to a truly builtin one. In fact, this course of action scares me a bit. The power that this intrinsic provides feels far too great, and I fear that it will lead to more and more diverse UB from user crates that use it in subtly wrong ways than the UB it removes from the existing macro crates, which could also be removed without adding a whole new intrinsic.

This is partially due to the inexpressive return type of usize but also the string parameter. When was the last time a stringly-typed api was the best solution? If reflection is wanted it should be done properly, and if we don't want it then the feature should be carefully designed such that it does suggestively offer the opportunity. And sure, one shouldn't rely on the value of a const function for correctness or compilation but reality teaches us otherwise.

To bring in what Swift has in this area for comparison:

Via the MemoryLayout stdlib type, Swift provides the function offset(of key: PartialKeyPath<T>) -> Int? which provides access to the byte offset from a pointer to T to the property referenced by the keypath, or .none if the property is not stored trivially (e.g. computed, indirect, or overlaps other properties).
Note that causing this function to return .none is not considered a semver major change, because whether a property is stored is not otherwise breaking.


Regarding privacy, Swift considers giving out a keypath to a private property to not break privacy because only someone with access to the property can make a keypath to it, and therefore, passing a keypath to someone gives them the right to get/modify the property with the same rights as the one who created the keypath.

3 Likes

If we get some way to do offset_of, then fields in traits could be implemented on top of it. Therefore there will be insufficient motivation to do the built in solution proposed in the RFC.

This intrinsic (if it is implemented as an intrinsic) could be wrapped in a macro that creates the necessary field type that is compatible with the fields in traits RFC. So it is possible to create a safe api around offset_of. This way, there should be less overall UB.

This is likely going to be implemented as an intrinsic, so that string parameter will be checked at compile time. So no big loss there. (bug a stringly typed api does feel icky to me)

I agree here, offset_of seems like a bad way to do reflection and better ways exist (such as the fields in traits RFC).

Just a reminder: though the intrinsic has been mentioned as offset_of<T>(&str), it would require a const-evaluable string as input, like the simd intrinsics that require literal integer arguments.

This could be exposed as either offset_of<T>(const &str) via magic, offset_of<T, const &str>(), or even only available as offset_of!($:type, $:ident). There’s no intent to use a dynamic string for runtime reflection with this intrinsic.

1 Like

Not as far as I can see, or at least no with the clarity and conciseness I would expect from a part of the type system of the language. A major portion of the new trait-kind is for unsafe information the compiler can rely upon and how can a user-crate define a wrapper providing these assertions? Also, there is interaction with borrow checking due to disjoint fields assertion that is simply not possible to provide semantically in current Rust (though I'd be glad to be proven wrong here). From what I could gather, this was one important motivation behind the proposal.

I do not see how this conclusion follows from the claim. There is less UB if it replaces code that is currently UB with code that does not exhibit it, and if it does not facilitate additional new code that is UB. From my point of view none of these are true for an offset_of-intrinsic. The outer interface of offset_of should not be unsafe and hence I do not get what you meant with creating a safe api around offset_of. Code that wrongly uses field offset currently will continue to have UB even with the intrinsic; the current crate solution returns correct results (when the compiler doesn't exploit the UB which it seldomly seems to do atm).

There is UB within the offset_of macro crates, the most popular determines the offset subtracts pointers into uninitialized memory, and relies on pointer projection to fields of struct-pointers. But the main UB here (invalid references) needs to be fixed regardless of offset_of since it is also used for initializing structs. If it is fixed, the macro could be written without UB (at least @RalfJung made it sound like it could) and not require any intrinsic. From this point of view, the new intrinsic wouldn't fix anything that is not already necessarily being fixed and offset_of! can not fix this since it requires changing language semantics of operations on pointers themselves. (For other readers, note that offset_of having a return type other than usize could provide pointer field projection. This but that is basically Generic pointer to field proposal in a nutshell already mentioned above).

So overall this proposes to add a powerful intrinsic with unprecendented interface and unclear effects on the rest of the language, removes only one instance of UB (the internal implementation of offset_of) whose existance is only incidentally related to the motiviation of the proposal. And its interface offers no signifcant ergonomic benefits for its usage of initialization, still requiring multiple unsafe pointer casts and manual unsafe offsetting (all prone to new UB).

To add on to this, there is also the raw reference RFC which would allow offset_of to be written without UB This would also make Generic pointer to field redundant. Now, there will be some changes to the api described by Generic pointer to field, but overall it should remain the same.

example using &raw from the raw reference RFC

macro_rules! offset_of {
    ($parent:ty, $field:ident) => {{
        let $parent { $field: _, .. }; // protection against deref-coercion
        let ptr = 0_usize as *const $parent;
        (&raw ptr.$field) as usize
    }}
}

macro_rules! field_type {
    ($type:ty, $field:ident, $field_type:ty, $field_type_name:ident) => {
        struct $field_type_name;

        unsafe impl Field for $field_type_name {
            type Type = $field_type;
            type Parent = $type;

            const FIELD_DESCRIPTOR: FieldDescriptor = {
                let $parent { $field: _, .. }; // protection against deref-coercion
                let ptr = 0_usize as *const $parent;
                let ptr: *const $field_type = &raw ptr.$field;
                unsafe { FieldDescriptor::from_offset(ptr as usize) }
            };
        }
    };
}

This wasn't a major reason for writing up Generic pointer to field, but that is a good reason to bake it into the compiler. Only one issue, a user-land Project trait would make it impossible for the compiler to reason about the disjointness of the (smart/raw) pointers returned by project so this is a moot point. Basically you would have to special case every smart pointer that you want to allow disjoint borrows from.