[Pre-RFC] Add a new offset_of macro to core::mem

The C reference is the normative part, but having a version of that that is actually readable seems like a good idea IMO.

Now I am actively worried: are you saying yours is worse than memoffset (GitHub: Gilnaa/memoffset, "offsetof for Rust")? :wink:

IMO it would be a good idea to at least centralize on one hacky way to implement offset_of despite it being UB. The memoffset crate seems to be a good candidate for that -- the maintainer is responsive to our concerns and suggestions.

So, until offset_of is in libstd, I think it would make sense to encourage people to use memoffset instead of their own home-grown solutions. Is anyone up for actively searching through Rust code bases out there, finding instances of an offset_of macro, and suggesting they use this crate instead?

Can’t we just provide a core::intrinsic::offset_of::<T>(field_name: &str) -> usize intrinsic?

The compiler always knows the offsets, so that should work for all types (repr(Rust), repr(C), etc.) and be reliable. It doesn’t need to be a macro, but a macro can be implemented on top.
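For reference, Rust 1.77 later stabilized a compiler-provided core::mem::offset_of! macro that behaves the way described here: the compiler supplies the offset, so it covers repr(Rust) as well as repr(C), and it can be used in const items. A minimal sketch, assuming Rust 1.77 or newer; the struct names are hypothetical.

```rust
use core::mem::offset_of;

// Hypothetical example type with C layout: offsets are fully determined.
#[repr(C)]
pub struct CLayout {
    pub a: u8,
    pub b: u32,
    pub c: u16,
}

// Same fields, default (Rust) representation: the compiler may reorder
// fields, but it still knows where each one ends up.
pub struct RustLayout {
    pub a: u8,
    pub b: u32,
    pub c: u16,
}

// Both are evaluated at compile time.
pub const C_B: usize = offset_of!(CLayout, b);
pub const RUST_B: usize = offset_of!(RustLayout, b);
```

Note that for RustLayout the exact value is an implementation detail; only the fact that it is a valid, properly aligned offset is guaranteed.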

Doesn’t this require the field name to be known at compile time?

Yes, my implementation involves a worse form of UB.

I agree, but I need something that is const-eval friendly (I'm initializing static structures that interact with the Objective-C ABI), and none of these other options or crates are const-friendly. Various issues include:

But hey, if this will help motivate making some kind of const-friendly offset_of, I give to you my implementation:

macro_rules! offset_of {
    ($parent:tt, $field:tt) => {{
        union TransmuteHack<T: Copy, U: Copy> {
            from: T,
            to: U,
        }
        unsafe {
            TransmuteHack::<_, usize> {
                from: &TransmuteHack::<usize, &'static $parent> {
                    from: 0usize,
                }.to.$field,
            }.to
        }
    }}
}

It's terrible, but it's const-friendly and supports repr(Rust) structures (and as a bonus works on stable, but that's only important to me insofar as it means my crates that use it don't have to add #![feature(...)] for whatever nightly features the macro might use). I hate it, and I'd love to move off of it, but so far I don't have any viable alternatives because nothing else works when initializing a static.
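For comparison, here is a sketch of a const-friendly variant that avoids both the fake reference and the union transmute: reserve real (uninitialized) storage with MaybeUninit, compute the field's address with addr_of! (which never creates a &), and subtract with offset_from. This is the "MaybeUninit and pointer subtraction" style of approach; the macro name offset_of_uninit and the Ivars type are hypothetical, and using it in a static initializer assumes Rust 1.65+ (where offset_from became usable in const contexts).

```rust
/// Hypothetical macro: byte offset of `$field` within `$parent`,
/// computed without creating references to uninitialized memory.
macro_rules! offset_of_uninit {
    ($parent:ty, $field:tt) => {{
        // Real, properly aligned storage for one `$parent`; contents stay uninitialized.
        let uninit = ::core::mem::MaybeUninit::<$parent>::uninit();
        let base: *const $parent = uninit.as_ptr();
        // SAFETY: `base` points into a live allocation, and `addr_of!` only
        // computes the field's address: nothing is read and no `&` is created.
        unsafe {
            let field = ::core::ptr::addr_of!((*base).$field);
            (field as *const u8).offset_from(base as *const u8) as usize
        }
    }};
}

// Hypothetical ivar layout.
#[repr(C)]
pub struct Ivars {
    pub flag: bool,
    pub ratio: f32,
    pub count: i32,
}

// Usable directly in a static initializer (evaluated at compile time).
pub static RATIO_OFFSET: usize = offset_of_uninit!(Ivars, ratio);
```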

I'd be perfectly happy with that if the intrinsic was unstable but we had a std macro that wrapped it.

@mjbshaw ah, const fn is a very tough additional constraint. Your code looks a lot like what memoffset had! It also used to support const fn. And transmuting 0 into a reference causes SIGILL on some platforms. :confused:

Could you try to use mem::align_of::<$parent>() instead of 0? That would be slightly less UB. :wink: In fact that would make it basically equivalent to the pre-MaybeUninit version that is currently in memoffset.

And you could also force evaluation to happen at CTFE, thereby avoiding run-time trouble:

macro_rules! offset_of {
    ($parent:tt, $field:tt) => {{
        // Make sure the field actually exists. This line ensures that a
        // compile-time error is generated if $field is accessed through a
        // Deref impl.
        let $parent { $field: _, .. };
        // FORCING code to be const-eval'ed so that it can
        // not cause run-time trouble.
        const OFFSET: usize = {
            union TransmuteHack<T: Copy, U: Copy> {
                from: T,
                to: U,
            }
            unsafe {
                TransmuteHack::<_, usize> {
                    from: &TransmuteHack::<usize, &'static $parent> {
                        // Properly aligned to maintain at least *some*
                        // properties of a valid reference.
                        from: std::mem::align_of::<$parent>(),
                    }.to.$field,
                }.to - std::mem::align_of::<$parent>()
            }
        };
        OFFSET
    }}
}

I also added Deref-coercion-protection.

Here's a small test of this on the playground.
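To make the Deref hazard concrete: field access auto-derefs, so an expression like x.$field compiles even when $field lives on the Deref target rather than on $parent itself, and an offset macro built on field access would then silently compute the wrong thing (or call deref() on a bogus pointer). The let pattern with no initializer only compiles if the field is declared directly on the struct. A small sketch; Inner, Outer, read_x, and field_check! are hypothetical names.

```rust
use core::ops::Deref;

// Hypothetical types: `Outer` has no field `x`, but derefs to `Inner`, which does.
pub struct Inner {
    pub pad: u64,
    pub x: u32,
}

pub struct Outer {
    pub tag: u8,
    pub inner: Box<Inner>,
}

impl Deref for Outer {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        &self.inner
    }
}

// This compiles even though `Outer` declares no `x`: the access goes
// through `Deref::deref`, which is exactly what an offset macro must avoid.
pub fn read_x(outer: &Outer) -> u32 {
    outer.x
}

// The pattern check: compiles only if `$field` is declared directly on
// `$parent`. It needs no `= value`, because the pattern binds nothing.
macro_rules! field_check {
    ($parent:path, $field:tt) => {
        #[allow(unused)]
        let $parent { $field: _, .. };
    };
}

pub fn checks() {
    field_check!(Outer, inner); // OK: declared on `Outer`
    field_check!(Inner, x);     // OK: declared on `Inner`
    // field_check!(Outer, x);  // compile error: `Outer` has no field `x`
}
```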

Oh, and did you think we were not aware of union-transmutes in CTFE and we’d close that loophole if you reported it? Don’t worry, we knew for a while. :wink: It had to stay open due to backwards compatibility.

I have to admit I am slightly irritated at the idea that some people would not report known soundness issues because they are afraid they would get fixed. I hope I did not give the impression that whenever a soundness issue comes up, it will be shut down at all costs, disregarding valid concerns. I am well aware that there are trade-offs here, and while I personally feel very strongly about soundness, I sympathize with developers who just need to “get something done”. I am sorry if I came across as overly dogmatic in that regard.

I do think that developers should be upfront about deliberately causing UB and the reasons for that. I have submitted PRs to several projects that add comments saying “this is UB, but we do it anyway because”. Whenever that happens, that’s a great piece of feedback for things the language should be able to do but currently isn’t. We should probably collect these somewhere!

Sure, that's totally reasonable.

Nice idea. This also fixes calling offset_of! within a const fn, since union in a const fn isn't stable yet (I don't personally need offset_of! in a const fn, but it's nice to have it available).

That's cool, I didn't realize you could write a let like this without = value at the end.

No, it was actually the creation of a null reference. I was surprised CTFE didn't complain about creating a null &T, and that seems like a pretty easy loophole to close.

Regarding the rest of your comment, I don't think you've come across as overly dogmatic or anything negative like that. You've consistently shown yourself to be very pragmatic, and I'm quite grateful for your contributions to various discussions, crates, and Rust itself. Thank you, and I'll be less hesitant about sharing my crappy hacks in the future :slight_smile:

I hope you'll forgive me for calling you out specifically earlier. I meant it as a lighthearted gesture towards all your (wonderful, I might add) soundness work on Rust.

I shamelessly stole that from @Amanieu :wink:

That would basically mean enforcing (some part of) the validity invariant during CTFE. I wouldn't object to that, but (a) during CTFE we have very good control over execution, so we know this won't do something completely crazy, and (b) I think @nnethercote would ban me from the project for the perf regression that would introduce. :wink:

We do check that no bad values "leave CTFE" though, on the interface to the run-time part of the program, as you can see in this example. (Also try other values, such as 1 or 4. You'll get different errors.)

:heart:

Yes, we could make it core::intrinsic::offset_of::<T, const FIELD: &'static str>() if we wanted, but since intrinsics are magical, we don't have to. The intrinsic can just error if you pass it a non-const value.

Honest question: what uses does offset_of! have beyond translating pointer-to-container to pointer-to-member and vice-versa? I honestly feel like pointer projection would solve most use cases, though it would be a larger language change. And if I’m not mistaken, it’d make a userland offset_of! writable.
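For readers less familiar with the pattern being referred to: container-to-member is just a field projection, but the reverse direction (C's container_of, common in intrusive data structures) needs the numeric offset. A minimal sketch using core::mem::offset_of! (Rust 1.77+); Link, Task, link_of, and task_of are hypothetical names. Both functions only do address arithmetic with wrapping_add/wrapping_sub, so they are safe; dereferencing the result is where unsafe comes in.

```rust
use core::mem::offset_of;

// Hypothetical intrusive list node embedded in a larger struct.
#[repr(C)]
pub struct Link {
    pub next: *mut Link,
}

#[repr(C)]
pub struct Task {
    pub id: u32,
    pub link: Link,
}

/// Pointer-to-container -> pointer-to-member.
pub fn link_of(task: *mut Task) -> *mut Link {
    task.cast::<u8>().wrapping_add(offset_of!(Task, link)).cast::<Link>()
}

/// Pointer-to-member -> pointer-to-container (C's `container_of`).
/// Only meaningful if `link` really points at the `link` field of a `Task`.
pub fn task_of(link: *mut Link) -> *mut Task {
    link.cast::<u8>().wrapping_sub(offset_of!(Task, link)).cast::<Task>()
}
```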

I need the actual numerical offset for FFI. Specifically, the Objective-C ABI uses the offsets for instance variables (ivars). For example:

#import <Foundation/Foundation.h>

@interface Foo : NSObject
@end

@implementation Foo {
  BOOL _field0;
  float _field1;
  int32_t _field2;
}

@end

This generates assembly that looks like this (some portions removed for brevity):

  .private_extern  _OBJC_IVAR_$_Foo._field0
  .section  __DATA,__objc_ivar
  .globl  _OBJC_IVAR_$_Foo._field0
  .p2align  3
_OBJC_IVAR_$_Foo._field0:
  .quad  8

  .private_extern  _OBJC_IVAR_$_Foo._field1
  .section  __DATA,__objc_ivar
  .globl  _OBJC_IVAR_$_Foo._field1
  .p2align  3
_OBJC_IVAR_$_Foo._field1:
  .quad  12

  .private_extern  _OBJC_IVAR_$_Foo._field2
  .section  __DATA,__objc_ivar
  .globl  _OBJC_IVAR_$_Foo._field2
  .p2align  3
_OBJC_IVAR_$_Foo._field2:
  .quad  16

All of those _OBJC_IVAR_$_Foo._fieldN symbols are the offsets for each field within the structure. When you access self's _field0 in Objective-C, the compiler translates that into a volatile load of _OBJC_IVAR_$_Foo._field0 to get the offset, and then uses self + offset to compute the address of the ivar, and then loads/stores that address as needed.
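A rough Rust sketch of that mechanism, assuming Rust 1.77+ for core::mem::offset_of!: a repr(C) mirror of Foo's instance layout (isa pointer first), statics holding the offsets, and an accessor that volatile-loads the offset and adds it to self. On a 64-bit target this yields 8, 12 and 16, matching the .quad values above. FooLayout, FOO_IVAR_FIELDn, and load_field1 are hypothetical; actually emitting the _OBJC_IVAR_$_Foo._fieldN symbols into __DATA,__objc_ivar would need extra attributes not shown here.

```rust
use core::mem::offset_of;
use core::ptr;

// Hypothetical mirror of `Foo`'s instance layout: the `isa` pointer
// inherited from NSObject comes first, then the ivars.
#[repr(C)]
pub struct FooLayout {
    pub isa: *const u8,
    pub field0: bool,
    pub field1: f32,
    pub field2: i32,
}

// Stand-ins for the `_OBJC_IVAR_$_Foo._fieldN` symbols, initialized at
// compile time.
pub static FOO_IVAR_FIELD0: usize = offset_of!(FooLayout, field0);
pub static FOO_IVAR_FIELD1: usize = offset_of!(FooLayout, field1);
pub static FOO_IVAR_FIELD2: usize = offset_of!(FooLayout, field2);

/// Emulates `self->_field1`: volatile-load the offset, add it to `self`,
/// then load the ivar.
///
/// # Safety
/// `obj` must point to a live, initialized `FooLayout`.
pub unsafe fn load_field1(obj: *const FooLayout) -> f32 {
    unsafe {
        let offset = ptr::read_volatile(&FOO_IVAR_FIELD1);
        ptr::read(obj.cast::<u8>().add(offset).cast::<f32>())
    }
}
```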

My objrs crate provides macros that transform Rust code so that it matches the Objective-C ABI, allowing Rust and Objective-C to interoperate pretty seamlessly (and even allows you to use Rust types for ivars, e.g. String or Vec). Implementing Foo can be done like so:

#[objrs(class, super = NSObject)]
struct Foo {
  _field0: bool,
  _field1: f32,
  _field2: i32,
}

This will generate assembly that is nearly identical to what the original Objective-C would have generated (though you’ll want to add #[repr(C)] if you’re going for more precision). Doing this requires the objrs macro to compute the offsets of the fields and generate the correct statics that have the right offset value for each ivar.

I personally don’t care how offset_of! is implemented (whether it’s projection, an intrinsic, MaybeUninit and pointer subtraction, etc.). I just need to be able to evaluate it as a const so I can use it to initialize the statics.

I imagine pointers-to-fields would permit casting to usize for this purpose.
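One possible shape for that, sketched under assumptions: a typed "pointer to member" that remembers the parent and field types (so it can't be applied to the wrong struct) but still converts to a plain usize for FFI. FieldPtr, new_unchecked, as_usize, project, get, and POINT_Y are all hypothetical names; the offset itself comes from core::mem::offset_of! (Rust 1.77+).

```rust
use core::marker::PhantomData;
use core::mem::offset_of;

/// Hypothetical typed pointer-to-member.
pub struct FieldPtr<Parent, Field> {
    offset: usize,
    _marker: PhantomData<fn(Parent) -> Field>,
}

impl<Parent, Field> FieldPtr<Parent, Field> {
    /// # Safety
    /// `offset` must be the offset of a field of type `Field` in `Parent`.
    pub const unsafe fn new_unchecked(offset: usize) -> Self {
        FieldPtr { offset, _marker: PhantomData }
    }

    /// The escape hatch for FFI: the raw numeric offset.
    pub const fn as_usize(&self) -> usize {
        self.offset
    }

    /// Safe: only computes an address, never dereferences it.
    pub fn project(&self, parent: *const Parent) -> *const Field {
        parent.cast::<u8>().wrapping_add(self.offset).cast::<Field>()
    }

    pub fn get<'a>(&self, parent: &'a Parent) -> &'a Field {
        // SAFETY: `new_unchecked`'s contract guarantees the offset names a
        // `Field` inside `Parent`, and `parent` is a valid reference.
        unsafe { &*self.project(parent) }
    }
}

#[repr(C)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub const POINT_Y: FieldPtr<Point, i32> =
    unsafe { FieldPtr::new_unchecked(offset_of!(Point, y)) };
```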

If we add an offset_of operation, I think I would like it to be fallible, so that one can use it to query whether a type has a field, and get a recoverable error otherwise, e.g., fn offset_of<...>(...) -> Option<usize>. That could be quite powerful.
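The proposal here is about a compile-time operation, but the shape of a fallible query can be approximated today at the library level. A sketch under assumptions: FieldOffsets, field_offset, has_field, and Header are hypothetical names, the lookup is a runtime string match (not the compile-time operation being proposed), and the impl is written by hand where a derive macro could generate it.

```rust
use core::mem::offset_of;

/// Hypothetical trait: "does this type have a field called `name`,
/// and where is it?" Returns `None` instead of failing to compile.
pub trait FieldOffsets {
    fn field_offset(name: &str) -> Option<usize>;
}

#[repr(C)]
pub struct Header {
    pub magic: u32,
    pub len: u16,
}

// Written by hand here; a derive macro could generate this.
impl FieldOffsets for Header {
    fn field_offset(name: &str) -> Option<usize> {
        match name {
            "magic" => Some(offset_of!(Header, magic)),
            "len" => Some(offset_of!(Header, len)),
            _ => None,
        }
    }
}

/// Generic code can branch on field presence instead of erroring.
pub fn has_field<T: FieldOffsets>(name: &str) -> bool {
    T::field_offset(name).is_some()
}
```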

The only problem I have with an offset_of intrinsic (or macro) is that it will likely play poorly with privacy rules, by allowing accessing private fields. I would like to prevent this because almost all unsafe code depends on the fact that privacy rules are practically unbreakable.

For example, I don’t want to be able to access the buf (RawVec<_>) field of Vec<_> from outside std::vec. Using the intrinsic (or macro) std::intrinsics::offset_of::<Vec<T>>("buf") would allow access to buf from outside of std::vec.

Because of this sort of privacy violation, it will be very hard to build safe APIs around these intrinsics to allow projecting through higher-order abstractions like Pin<&mut T> or similar, because there would be no way to check whether a field is visible.
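For what it's worth, the core::mem::offset_of! that was later stabilized (Rust 1.77) checks field visibility at the call site, so the Vec/buf case above is rejected outside the defining module. The module that owns a private field can still choose to publish its offset deliberately. A sketch with a hypothetical MyVec type:

```rust
mod vec_like {
    use core::mem::offset_of;

    pub struct MyVec {
        #[allow(dead_code)]
        ptr: *mut u8, // private field
        pub len: usize,
    }

    impl MyVec {
        // Inside the defining module the private field is visible, so the
        // author can expose its offset on purpose.
        pub const PTR_OFFSET: usize = offset_of!(MyVec, ptr);
    }
}

use vec_like::MyVec;

// Outside the module, public fields work...
pub const LEN_OFFSET: usize = core::mem::offset_of!(MyVec, len);
// ...but `core::mem::offset_of!(MyVec, ptr)` here is a compile error,
// because `ptr` is private.
```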

Wouldn’t you still have to dereference a raw pointer in order to get at a private field that way?

To access a field, you need to add offset_of<T>(...) to the base pointer of the T value and dereference it. But dereferencing pointers is unsafe. So your argument boils down to: using unsafe (to access private fields) is unsafe. I may be missing something, but I don’t see how adding offset_of changes anything here regarding safe APIs.

The problem is when you want to provide a safe API for projecting through e.g. Pin<&mut F> to some Pin<&mut F.structurally_pinned_field>. With just an intrinsic offset_of<T>(const &str), there’s no way to make a safe API that wraps the offsetting, as you’re relying on the use of a pointer being unsafe to ignore privacy rules soundly.
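For context, the way this is usually done today is that the type's own module writes the unsafe projection, because only that module knows which fields are structurally pinned and can see them (crates such as pin-project generate this kind of code). A minimal hand-written sketch; Wrapper, fut, polls, and project are hypothetical names.

```rust
use core::marker::PhantomPinned;
use core::pin::Pin;

// Hypothetical wrapper around a future-like value.
pub struct Wrapper<F> {
    fut: F,     // structurally pinned
    polls: u32, // not structurally pinned
    _pin: PhantomPinned,
}

impl<F> Wrapper<F> {
    pub fn new(fut: F) -> Self {
        Wrapper { fut, polls: 0, _pin: PhantomPinned }
    }

    /// Safe projection, possible only because this module promises that
    /// `fut` is never moved out of a pinned `Wrapper`.
    pub fn project(self: Pin<&mut Self>) -> (Pin<&mut F>, &mut u32) {
        // SAFETY: `fut` is treated as structurally pinned and is never moved
        // out; `polls` is not pinned, so a plain `&mut u32` is fine.
        unsafe {
            let this = self.get_unchecked_mut();
            (Pin::new_unchecked(&mut this.fut), &mut this.polls)
        }
    }
}
```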

The heart of my issue is trying to generalize beyond raw pointers, so saying that raw pointers are unsafe to use is missing the point.

That said, I think that privacy rules haven’t come up in discussion so I wanted to make sure they weren’t overlooked.


For example, let's say I want to add a Project trait that allows safe projection through pointers, both raw and smart, without dereferencing the pointer. Now, for completeness I would also like to implement Project for &mut T; this should be possible given that we can already project through &mut T. But providing a safe API is hard if you can get the offset of arbitrary fields.

The only way I can think of making this pattern safe is by also providing an unsafe trait Field and making Project go through that.
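A rough sketch of that shape, under assumptions: the signatures below are my guess and not the RFC's actual definitions. An unsafe trait Field, implemented per field by the field's owner (which is how privacy is respected), carries the offset; Project uses it for raw pointers (pure address arithmetic) and for &mut (sound because Field guarantees the offset). Point and PointY are hypothetical; the offset comes from core::mem::offset_of! (Rust 1.77+).

```rust
use core::mem::offset_of;

/// Implementing this promises that `OFFSET` is the offset of a field of
/// type `Type` inside `Parent`. Only the field's owner should implement it.
pub unsafe trait Field {
    type Parent;
    type Type;
    const OFFSET: usize;
}

/// Projection through a pointer-like value to the field named by `F`.
pub trait Project<F: Field> {
    type Output;
    fn project(self) -> Self::Output;
}

impl<P, F: Field<Parent = P>> Project<F> for *mut P {
    type Output = *mut F::Type;
    fn project(self) -> *mut F::Type {
        // Address arithmetic only; no dereference, so this is safe.
        self.cast::<u8>().wrapping_add(F::OFFSET).cast::<F::Type>()
    }
}

impl<'a, P, F: Field<Parent = P>> Project<F> for &'a mut P
where
    F::Type: 'a,
{
    type Output = &'a mut F::Type;
    fn project(self) -> &'a mut F::Type {
        // SAFETY: `Field` guarantees the offset names an `F::Type` inside
        // `P`, and the result borrows from (and is bounded by) `self`.
        unsafe { &mut *<*mut P as Project<F>>::project(self as *mut P) }
    }
}

#[repr(C)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Marker type for the `Point::y` field (a derive could emit this).
pub struct PointY;

unsafe impl Field for PointY {
    type Parent = Point;
    type Type = i32;
    const OFFSET: usize = offset_of!(Point, y);
}
```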

The Project and Field traits are based on my RFC that was linked earlier, but I don’t think that is the only place where privacy may be an issue, so I think that it is important to consider offset_of's interaction with privacy rules.

Partially concerning privacy rules: should this work recursively? In privacy terms, I am allowed to refactor a private member into a member of a private member (whose type is in the same module) while keeping SemVer compatibility. If the return type is not usize (and I think it should be more type-safe and offer a stricter/better abstraction), then it should be possible to do this refactor without affecting the return type.

pub struct Quack {
    foo: Foo,
}

impl Quack {
     pub const FOO_OFFSET: ... = offset_of!(Quack, foo); // Privacy allowed
}

It should be possible to refactor this without affecting the publicly visible item FOO_OFFSET. Reasons might include avoiding duplication internally, dependency injection, …

pub struct Quack {
    inner: InnerQuack,
}

struct InnerQuack {
    foo: Foo,
}

impl Quack {
     pub const FOO_OFFSET: ... = offset_of!(Quack, inner.foo);
}
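Even without nested-path support, the refactored constant can be expressed as the sum of two single-level offsets, which keeps the public value unchanged. A sketch assuming core::mem::offset_of! (Rust 1.77+) and, as a simplification, a plain usize for the elided type of FOO_OFFSET (the more type-safe return type argued for above is not modeled here); the pad and value fields are hypothetical additions.

```rust
use core::mem::offset_of;

pub struct Foo {
    pub value: u64,
}

pub struct Quack {
    inner: InnerQuack,
}

struct InnerQuack {
    #[allow(dead_code)]
    pad: u8,
    foo: Foo,
}

impl Quack {
    // The public constant is the sum of two private offsets, so callers
    // cannot tell that `foo` moved into `InnerQuack`.
    pub const FOO_OFFSET: usize = offset_of!(Quack, inner) + offset_of!(InnerQuack, foo);
}
```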

A note on prior art: in my mind, this should also be allowed, so as not to duplicate an inconsistency in how C++ pointer-to-member works. There, a pointer-to-data-member of a base class can be implicitly converted to a pointer-to-data-member of one's own class, but it is not possible to point to fields of members. This is despite the fact that the base class is laid out within one's own class exactly as if it were a member, so a pointer-to-member of a base class is effectively a pointer to a field of a member. :woman_shrugging: