[Pre-RFC] Add a new offset_of macro to core::mem


#41

I actually don’t like that argument; the problem with it is that it still means that at some point k#ident -> ident, which means that if anyone was using ident for something else, they’d still need to continue using k#ident indefinitely. If we’re going to have to deal with that, we may as well stick to whatever is the accepted naming convention.

So, are we at the point where we’re going to spitball naming conventions for keywords? And if we are, what are the criteria?


#42

They could swap over to using the r#ident syntax in future editions, therefore keeping their names, and using the new stuff. This would be useful if there are some things which couldn’t be backported.
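
A minimal sketch of what that looks like in practice. The names `r#try` and `Token` are made up for illustration; the point is only that a raw identifier lets existing code keep a name that an edition has reserved as a keyword:

```rust
// `try` is reserved as a keyword from the 2018 edition onward, so a function
// with that name has to be spelled `r#try` there. (Hypothetical function.)
fn r#try(x: i32) -> i32 {
    x + 1
}

// `type` is a keyword in every edition; a raw identifier still allows it as
// a field name. (Hypothetical struct.)
struct Token {
    r#type: &'static str,
}

fn main() {
    let token = Token { r#type: "ident" };
    // Call sites use the same r# spelling as the definitions.
    println!("{} {}", r#try(1), token.r#type);
}
```

The cost is the one raised in the next reply: every use site has to be rewritten to the `r#` spelling.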


#43

Yeah, but that still means someone needs to work their way through the code to find all the places where they need to change to r#ident, and fix them up. Hrmm… I’m starting to sense that Rust needs something like 2to3.

Actually, now that I think about it, something like 2to3 might be a really, really good idea. Rust keeps improving, but taking advantage of those improvements requires going back over old code and fixing it. Something like cargo upgrade --old-edition=2015 --new-edition=2021 could be quite helpful.

That said, I think we’ve now officially wandered off the topic by quite a bit. If anyone wants to continue this discussion, say so, and I’ll start a new topic.


#44

It’s quite useful, and called cargo fix --edition :slightly_smiling_face:


#45

Someday, I just may learn to read the docs before I type! :sweat_smile:

But that actually solves the problem, even the original question of an offsetof keyword or an offset_of!() macro. Use whatever works best for now, and if there need to be changes made in the future, let cargo fix handle it.


#46

I think the idea is to do this very very very sparingly. Switching out a keyword for a macro (if it becomes writable as a macro) seems like an incredibly frivolous use case. Remember especially that example code and memory don’t get changed by cargo fix.

I’m in favor of the magic macro form (very mildly) only because it seems probable it will one day be a normal macro, but I agree it’s way more important to get this functionality in in some form.


#47

I think that this would only be used when crate authors are upgrading from one edition to another, as a way of helping with migration. 2to3 is not a tool that you want to use multiple times in a row on the same code base (in particular, it isn’t idempotent; a trivial case is print -> print(); if you run it on the same code base multiple times in a row, you’ll get print -> print() -> print(()) -> etc.). I don’t know if cargo fix has the same issues, but I suspect that anyone who was working on migrating this way would be working on a separate branch intended solely for the migration, testing and checking the output of cargo fix as they go. Once they had done the migration, they wouldn’t use cargo fix again for a long time.


#48

What @catenary is referring to is that the current intent with editions is to avoid breaking changes as much as possible, only using them in cases where the upside vastly outweighs the downside of forcing existing code to change. Having breaking changes be cargo fixable is a requirement to consider them, but having cargo fix available is not an excuse to add breaking changes that provide only a small upside if they result in significant churn in the ecosystem.


#49

We discussed this in the FFI meeting at the Rust All Hands in Berlin, and as a summary: a few folks are planning to work on making sure the operations needed for a macro can be done without invoking undefined behavior, and once that happens we’ll add an offset_of macro using those operations.

Separate from that, there was some discussion about first-class “pointer to member” types, which could also provide an offset. The general consensus among the room of folks interested in FFI was that such a feature might be useful, but should not block offset_of, and people still want the operations needed for offset_of as non-UB for other purposes.


#50

I see. I’m sorry, I misunderstood the original comment.


#51

What are these operations?


#52

I’m pretty close (I think) to having offset_of implemented as a magic macro. Just working on the last bit of MIR that I’m hoping to finish up in the next weekend or two. Though a more experienced rustc developer will probably tell me I did it wrong :slight_smile:

I need offset_of in a const-eval context (specifically, creating static objects with special linker names that describe the data layouts for certain structures via offset_of). Do you know if these new operations will be const-friendly (even if they’re behind a feature gate)?

I still plan on posting my patch for a magic offset_of. I don’t necessarily plan on it being merged, but it’ll be a good talking point and learning opportunity, I think.


#53

After some discussion with various Rust team members, this is the best that we could come up with as a pure macro solution:

/// Macro to get the offset of a struct field in bytes from the address of the
/// struct.
#[macro_export]
#[allow_internal_unstable]
macro_rules! offset_of {
    ($container:path, $field:ident) => {{
        // Create an instance of the container and calculate the offset to its
        // field. Although we are creating references to uninitialized data this
        // is fine since we never read through them.
        let val = $crate::__core::mem::MaybeUninit::<$container>::uninit();
        #[allow(unused_unsafe)]
        let result = unsafe {
            // Dereferencing the raw pointer requires `unsafe`, so the
            // destructuring lives inside the block as well.
            let &$container { $field: ref f, .. } = &*val.as_ptr();
            (f as *const _ as *const u8).offset_from(val.as_ptr() as *const u8)
        };
        result
    }};
}

This code should not have any UB. However it only supports full structs (no tuple structs, no tuples, no arrays) and only supports a single field (no field1.field2).
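
For comparison, here is a rough sketch of a variant that goes through raw pointers only, so it never forms a reference to uninitialized memory. It is not the macro proposed above: `raw_offset_of!`, `Header` and `Wrapper` are hypothetical names, and it relies on `std::ptr::addr_of!`, which did not exist when this thread started. It also does not guard against a container whose field access would go through `Deref`, so treat it as an illustration rather than a finished design:

```rust
/// Hypothetical helper: field offset in bytes, computed via raw pointers.
macro_rules! raw_offset_of {
    ($container:ty, $field:tt) => {{
        // Uninitialized storage with the container's size and alignment.
        let uninit = ::std::mem::MaybeUninit::<$container>::uninit();
        let base: *const $container = uninit.as_ptr();
        // addr_of! yields the field's address without reading the field
        // and without creating an intermediate reference.
        #[allow(unused_unsafe)]
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        (field as *const u8 as usize) - (base as *const u8 as usize)
    }};
}

#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
    id: u16,
}

// Using `tt` for the field also admits tuple-struct indices.
#[repr(C)]
struct Wrapper(u8, u64);

fn main() {
    println!("len at {}", raw_offset_of!(Header, len));
    println!("id at {}", raw_offset_of!(Header, id));
    println!("Wrapper.1 at {}", raw_offset_of!(Wrapper, 1));
}
```

Even in this form the result is a runtime value rather than a constant, and nested fields are not handled, so the remaining restrictions still apply.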

Considering these restrictions, we feel that a built-in compiler feature is the best way forward. This will allow us to support all types of structs and nested fields, as well as producing a constant value that can be used with const-eval (just like mem::size_of).
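
To illustrate the const-eval use case, the sketch below uses `std::mem::offset_of!`, the compiler-backed macro that later shipped in the standard library (stable in recent toolchains, Rust 1.77 and newer as far as I know). `Packet` and the constant names are made up for the example:

```rust
use std::mem::{offset_of, size_of};

// Hypothetical wire-format struct; repr(C) makes the layout predictable.
#[repr(C)]
struct Packet {
    kind: u8,
    length: u32,
    payload: [u8; 8],
}

// Offsets are ordinary constants, usable anywhere size_of is.
const LENGTH_OFFSET: usize = offset_of!(Packet, length);
const PAYLOAD_OFFSET: usize = offset_of!(Packet, payload);

// A static layout descriptor built entirely at compile time:
// [total size, offset of `length`, offset of `payload`].
static PACKET_LAYOUT: [usize; 3] = [size_of::<Packet>(), LENGTH_OFFSET, PAYLOAD_OFFSET];

fn main() {
    println!("{:?}", PACKET_LAYOUT);
}
```

A static table like `PACKET_LAYOUT` is roughly the kind of thing asked about earlier in the thread: layout data emitted into the binary without any runtime computation.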

The remaining question is whether we want this as a keyword or as a macro-like construct. I believe that allowing an offsetof(Struct, field) in an expression context will be confusing since offsetof is not a real function despite looking like one.

This situation is exactly what the macro syntax was intended for. Consider the exclamation mark in println!(...), which clearly indicates that println is not a normal function. Therefore I believe that offset_of!(Struct, field) is the best way to expose this feature.


#54

However, you also said:

That means that offset_of!() won’t be a normal macro either…


#55

@Amanieu I don’t think that macro is the one we want. Specifically, you are creating a reference into an uninitialized MaybeUninit. It is well possible that we declare that UB.

You showed me a different version earlier, what made you move away from that?


#56

I don’t get the complaint about offset_of! being a “magic built-in macro”. It’s still a normal macro. It’s just a compiler-expanded macro, which expands to a single integer literal. The magic of how it procures that integer is an implementation detail.

There are existing macros that don’t expand to code implementing their semantics: compile_error! (item position, produces no items, just a compile error), env! (expands to a string value based on compilation environment), option_env! (expands to an Option<&'static str> literal based on compilation environment), line!, column!, file!, module_path! (literals for the source location), cfg! (expands to a literal bool based on configuration).
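
A small sketch of a few of these in use. The function names and the environment variable `HYPOTHETICAL_BUILD_VAR` are invented for the example; the macros themselves are the standard ones listed above:

```rust
/// Source location of this function body, filled in by the compiler.
fn location() -> (&'static str, u32, &'static str) {
    (file!(), line!(), module_path!())
}

/// Becomes a plain `true` or `false` depending on the build configuration.
fn built_with_debug_assertions() -> bool {
    cfg!(debug_assertions)
}

/// Reads an environment variable at compile time, not at run time.
/// `HYPOTHETICAL_BUILD_VAR` is a made-up name and is normally unset.
fn build_time_var() -> Option<&'static str> {
    option_env!("HYPOTHETICAL_BUILD_VAR")
}

fn main() {
    let (file, line, module) = location();
    println!("{}:{} in {}", file, line, module);
    println!("debug assertions: {}", built_with_debug_assertions());
    println!("build var: {:?}", build_time_var());
}
```

None of these could be written with macro_rules! in user code, yet at the call site they look like any other macro.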

These built-in macros are no different from a built-in intrinsic function, or any other #[lang] item.

[[ Side note: it’d be cool to integrate the stdlib built-in macros via the #[lang] tell-the-compiler-where-it-is to make them slightly less baked in. ]]


#57

For me, it’s personal preference. I know that I can memorize the set of magic macros in just the same way as I can memorize the set of magic words (AKA, keywords) that a language uses, but I personally dislike that there are both magic words and magic macros. I’d like to think that if I was just smart enough, that I could implement all of the macros myself, even if I don’t yet know how to.


#58

My point is that there’s a whole list of intrinsics posing as “normal functions” which you can’t implement in user code either. The same for the various #[lang] types, traits, and impls; none of these can be implemented without compiler support.

There’s nothing special about adding macros to the list of compiler-supported constructs when you need information from the compiler.


#59

It can’t expand to an integer literal. At the time macro expansion happens (for all macros, compiler-provided or not), type information does not even exist yet. Moreover, the integer it should produce will in many cases depend on generic parameters, so often it can’t even be evaluated at any stage before monomorphization, which happens long after type checking.
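
To make the generic case concrete, here is a sketch using `std::mem::offset_of!` as it later shipped in the standard library (recent toolchains only). `Pair` and `second_offset` are hypothetical names. The same source expression yields a different number for each instantiation, so no single literal could stand in for it at expansion time:

```rust
use std::mem::offset_of;

// Hypothetical generic struct; repr(C) places `second` right after `first`
// because `u8` has an alignment of 1.
#[repr(C)]
struct Pair<T> {
    first: T,
    second: u8,
}

// The offset is only known once `T` is chosen.
fn second_offset<T>() -> usize {
    offset_of!(Pair<T>, second)
}

fn main() {
    println!("Pair<u8>:  {}", second_offset::<u8>());
    println!("Pair<u32>: {}", second_offset::<u32>());
    println!("Pair<u64>: {}", second_offset::<u64>());
}
```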


#60

Yes, this detail matters for implementation. But does it really matter from a user perspective?

As a user, there’s no difference between this macro and any other, which is the point that I’m trying to make. It’s just a standard-library-provided macro, that happens to need to hook into the compiler to do its work (due to, in this case, needing post-monomorphization information).