[Pre-RFC] Add a new offset_of macro to core::mem

hanna-kruppe · January 28, 2019, 3:42pm

Note that the keyword unreservation was done under the impression that it can be a completely ordinary macro, rather than a compiler-builtin with macro-like syntax. As @mjbshaw has pointed out at the start, the implementation assumed there is incorrect if one also wants to be able to get offsets of fields in tuples and unions.

Centril · January 28, 2019, 6:38pm

I don't recall actually having such an impression when writing that RFC; I know eddyb noted their dislike for macros expanding to built-ins. Personally I don't have any problem with that in this case.

hanna-kruppe · January 28, 2019, 7:01pm

Fair enough, but at least two people in that discussion did have that concern (me and them) and only didn’t raise it further because an apparently-sufficient library solution was found – and quite a few other people cared enough about that to debate the merits of that implementation. I think it’s clear that RFC PR would have gone very differently if that implementation hadn’t been posted or if we had realized it’s insufficient.

mjbshaw · January 29, 2019, 6:20am

If the offsetof keyword is resurrected (which I personally don’t want to do), I imagine it would be a new AST node. In which case, I don’t really see much difference in terms of implementation between a keyword or a special built-in macro. The only significant difference seems to be surface syntax, and I personally prefer the macro syntax.

Similarly, this could be (safely) implemented as a completely ordinary macro if there was a way to disable auto-Deref, but Rust presently has no way to do that. One alternative is to do the following (only works for sized types):

macro_rules! offset_of {
    ($ty:ty, $field:ident $(,)?) => ({
        const fn offset() -> usize {
            let uninit = $crate::mem::MaybeUninit::<$ty>::uninitialized();
            let ptr = uninit.as_ptr(); // requires as_ptr() to be a const fn;
            let field_ptr = unsafe { &(*ptr).$field as *const _ };
            let offset = (field_ptr as usize).wrapping_sub(ptr as usize);
            // Requires const assert!.
            assert!(offset == 0 || offset < $crate::mem::size_of::<$ty>());
            return offset;
        }
        const OFFSET: usize = offset();
        OFFSET
    });
}

This doesn’t work for unsized types because you have to use size_of_val, which isn’t (and can’t be, I don’t think) a const fn. This doesn’t stop you going through Deref, but the assert ensures that you cannot escape the struct, and the fact that it’s a const should catch any issues at compile time (though with less-than-ideal error messages).
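For comparison, here is a rough sketch of the same pointer-subtraction idea as it can be written on current stable Rust. It is an illustration only, not the macro proposed above: the names `offset_of_sketch!` and `Header` are made up, it runs at run time rather than inside a `const`, and it assumes `std::ptr::addr_of!` (which did not exist when this was posted) in place of `&(*ptr).$field`, so that no reference to uninitialized memory is created. It is still limited to sized types and keeps the same sanity assertion.

```rust
// Hypothetical sketch: the macro and struct names are illustrative only.
macro_rules! offset_of_sketch {
    ($ty:ty, $field:ident $(,)?) => {{
        // Reserve correctly sized and aligned storage without initializing it.
        let uninit = ::std::mem::MaybeUninit::<$ty>::uninit();
        let base: *const $ty = uninit.as_ptr();
        // addr_of! computes the field address without reading the memory
        // and without creating an intermediate reference.
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        let offset = (field as usize).wrapping_sub(base as usize);
        // Same sanity check as above: the field must lie inside the struct.
        assert!(offset == 0 || offset < ::std::mem::size_of::<$ty>());
        offset
    }};
}

// repr(C) fixes the field order so the offsets are predictable.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
    id: u16,
}

fn main() {
    println!("tag: {}", offset_of_sketch!(Header, tag));
    println!("len: {}", offset_of_sketch!(Header, len));
    println!("id:  {}", offset_of_sketch!(Header, id));
}
```

Because the result is an ordinary run-time value here, it cannot be used where a constant is required, which is one of the gaps a built-in would close.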

So far, my conclusions are that offset_of! should be a built-in magic macro with its own AST node. My arguments are:

Implementing this as a pure (user-level) macro with existing Rust features requires crippling the offset_of! macro in one way or another that makes it feel half-baked. I really don’t want to do that.
Creating a new keyword and syntax would require a new AST node anyway. I don’t see any advantage (and it has the disadvantage of not being compatible with both Rust 2015 and 2018). Plus I just like the way the macro syntax looks at the user level.
Changing Rust such that this could be implemented as a pure (user-level) macro (i.e., some way to prevent auto-Deref) would require a new AST node (or similar) anyway. If such a feature is ever implemented, we can change offset_of! to be implemented using it, but I’d really like to avoid chaining the existence and stabilization of offset_of! to some other new language-level RFC and feature. We’ve got enough new language-level stuff being implemented; I don’t want to propose another one.

With that said, I’m going to try implementing this as a built-in magic macro. Even if it’s rejected it’ll be a good learning exercise for me.

ckaran · January 29, 2019, 3:10pm

I understand what @mjbshaw is saying about offsetof looking cleaner as a macro, but I disagree for two reasons:

Magic macros are keywords, they just don’t look like keywords.
It will be very confusing if we reimplement it as a macro.

The former problem makes it difficult for others to implement new compilers (I’m assuming that rust will become popular enough that others will want to). You have to know that some keywords look like macros, while others don’t. With a clean macro syntax it’s a bit easier.

The latter problem is confusing for end users; up until RFC 2421, the documentation told everyone that offsetof (along with a few other keywords) might some day become keywords. If I turn on syntax highlighting for rust, they are colored as keywords. So I’m used to thinking of offsetof as a keyword. Now it isn’t, so I have to mentally remember that I could use it as an identifier. If we adopt a macro of offsetof! that is actually a magic macro (keyword) and not a real macro, I now have to remember that pre-2015, offsetof was reserved; that for some time in 2015-2019 (dates? When did this change make it to stable?), offsetof wasn’t reserved, but there also wasn’t an official offsetof! macro; and that after edition 20XX, there is an offsetof!, which looks like a macro, but is actually a keyword. That has a code smell to it that is difficult for me to describe.

Honestly, if it was up to me, I’d retract RFC 2421 immediately, and re-reserve all those words. Even if they are never used by rust, it will cut down on the confusion. If someone really needs to use a keyword (or reserved word) as an identifier, they can always use raw identifiers instead.
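As a small illustration of the raw-identifier escape hatch mentioned above (the function, struct, and field names are invented for this sketch): the `r#` prefix lets a reserved word be used as an ordinary name.

```rust
// `match` and `type` are keywords, so the r# prefix is needed to use them as names.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

struct Field {
    r#type: &'static str,
}

fn describe(field: &Field) -> String {
    format!("field of type {}", field.r#type)
}

fn main() {
    let f = Field { r#type: "u32" };
    println!("{}", describe(&f));
    println!("{}", r#match("set", "offsetof"));
}
```

The cost is purely visual: every use site has to carry the `r#` prefix.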

mcy · January 29, 2019, 5:01pm

I think you're collapsing two concepts into one. There are macro-like syntactic constructs (asm! and global_asm! can't be real macros in any meaningful sense, since that information needs to survive all the way past codegen, long past macro evaluation). Moreover, these are not stable, and unlikely to become stable with spooky macro syntax (I would hope they get replaced with e.g. intrinsics).

Then, there are compiler implemented macros, which I believe are confined to core::macros::builtin, which, for the most part, replicate non-standard functionality in clang et al, because it's useful functionality. These macros are impossible to implement without compiler support (with the possible exception of assert!).

None of these are keywords in the syntactic sense; they're just part of the quasi-privileged standard library. You can define identifiers with their names all day long. In a similar way, Rust cannot desugar for loops without knowing about Iterator and Option, whose paths are currently hard-coded into the compiler. Reserving their names as keywords is a bit silly.
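To make the for-loop point concrete, here is a rough hand-written equivalent of what a for loop turns into. This is only a sketch (the function names are invented, and the compiler's actual desugaring differs in details), but it shows why the compiler has to know about the iterator traits and Option.

```rust
fn sum_with_for(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

// Approximately what the loop above expands to.
fn sum_desugared(values: &[u32]) -> u32 {
    let mut total = 0;
    // The loop source is first converted into an iterator...
    let mut iter = IntoIterator::into_iter(values);
    loop {
        // ...and each step matches on the Option returned by next().
        match Iterator::next(&mut iter) {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3];
    println!("{} {}", sum_with_for(&data), sum_desugared(&data));
}
```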

ckaran · January 29, 2019, 7:42pm

You're right, I am. I'm thinking about it both from the end-user's perspective, and the perspective of someone who decides to implement a new rust compiler from scratch. In both of those cases, there exist 'macros' that aren't; they have special access that ordinary macros don't have.

From an end-user's point of view, they'll 'know' that something can be done with a macro simply because they see what appears to be a macro doing it already. And they may want to do something similar, which won't work because they don't have the access that the magic macros have.

From the point of view of a compiler writer, they'll either have to read through quite a bit of documentation online to learn which are the magic macros, or they'll learn as they try to implement macros, and realize that they have to hack something into the compiler.

In my opinion, keeping offsetof as a keyword is more ergonomic.

wesleywiser · January 29, 2019, 8:14pm

It's pretty easy to recognize these macros, because their documentation says

Built-in macros to the compiler itself. These macros do not have any corresponding definition with a macro_rules! macro, but are documented here. Their implementations can be found hardcoded into libsyntax itself. For more information, see documentation for std's macros.

and their source is empty:

    macro_rules! compile_error {
        ($msg:expr) => ({ /* compiler built-in */ });
        ($msg:expr,) => ({ /* compiler built-in */ });
    }
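A few of these built-ins in use, as a hedged sketch (the function name is made up): each one depends on something only the compiler has, such as the source location of the call or the ability to produce a new string literal.

```rust
fn built_in_demo() -> (&'static str, &'static str, u32) {
    // concat! glues literals together into a single string literal.
    let joined = concat!("v", 1, ".", 0);
    // stringify! turns its tokens into a string without evaluating them.
    let text = stringify!(a + b);
    // line! expands to the line number of this call site.
    let line = line!();
    (joined, text, line)
}

fn main() {
    let (joined, text, line) = built_in_demo();
    println!("{joined} | {text} | line {line}");
}
```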

mcy · January 29, 2019, 8:56pm

I’ll point out that until Rust has an actual specification (which it has no hope of having without the fruits of the unsafe working group’s mission), we’re stuck in a state of “the spec is whatever stable Rust does”, which, if you’re a compiler vendor, makes you way more sad than whether a certain magic macro does some trivial thing.

ckaran · January 29, 2019, 10:19pm

That's actually part of my point, you have to dig around in the source to find that information, which brings up:

100% true, couldn't agree with you more! That said, I still don't want to make things weird with magic macros, etc. As far as it is possible and practical, I'd like rust to remain a fairly clean language, where there are few, if any, surprises. I want to avoid creating a language which (when fully and formally specified) ends up making the C++ draft standard look small (the latest draft is over 1700 pages long). If you're thinking that the move from offsetof to offset_of! won't make a difference, you're probably right. It just doesn't feel very ergonomic to me though.

That said, if everyone else wants to go with a magic macro, I'll go with the flow.

mcy · January 29, 2019, 11:07pm

Unfortunately, I think that if you want to target real hardware, and have a powerful template and compile-time-evaluation system, you are going to wind up with something of that order of magnitude. (Our syntax isn't turing complete, so we have that going for us...)

I think this is a silly point to make, but I think that it's not worth being afraid, because we have already achieved C++-levels of complexity. =)

ckaran · January 30, 2019, 2:42pm

I'm not sure if that level of complexity is truly necessary. I had a half-baked idea (OK, in all honesty, it was a mostly raw idea) that would simplify templating/runtime evaluation significantly from the end-user perspective. Using it, you could reduce the language size by turning all macros (including the magic ones) into procedural macros, and those would all be ordinary rust (no macro rules). The language itself would shrink, and more stuff would move into the rustc_interface crate. Since that is just a crate, we can do all the usual things of submodules, etc., which would mean that we could have an official core that all compilers support, then each compiler could have its own namespace (rustc_interface::core for everyone, rustc_interface::nightly for experiments, rustc_interface::gnu for gcc, etc.) 99.99% of all code would probably just stick with rustc_interface::prelude::*, which would only depend on rustc_interface::core, but if there is something that you absolutely need out of your compiler, and you don't mind that you can't use a different compiler, or need to do it a different way, then you can import the interfaces for the other compiler types.

My concern with C++ levels of complexity are the corner cases; unfortunately, it's been a number of years since I did C++ often enough to give you good examples, but I do remember skimming through the spec, and realizing that there were some really hairy cases that would require some hard work on the part of the compiler writer to get right. I want to avoid as many weird corner cases as possible.

I think I know what you're saying, but... procedural macros are turing complete, right?

felix.s · January 31, 2019, 8:47am

Why isn't there an analogous problem with intrinsics (i.e. built-in functions and types)? Do you think users convince themselves they can write their own PhantomData or mem::forget~~, with blackjack and hookers~~, simply because they are used with regular function-call or type syntax? Why would anyone infer anything about the implementation of something merely because of its usage syntax? I'm pretty sure nobody thinks include_bytes! can be defined as a declarative macro.

And personally, if I were to design a programming language, I'd very much prefer having a simple, orthogonal syntax with few keywords (preferably none) and consistent name-resolution rules, and re-use them as much as possible -- even for built-ins -- instead of inventing a new syntactic construct every time I need to include an operation that was previously inexpressible. If the segfault keyword proposal is anything to go by, I believe quite a few people share the sentiment, though it may vary to what extent.

In that case, you too should be advocating against adding keywords, because keywords add special cases instead of removing them: they are arbitrarily reserved words that cannot be used as identifiers for user-defined items (despite otherwise conforming to the lexical syntax of identifiers) and aren't subject to ordinary name resolution (including namespacing). Adding syntactic constructs to the language creates more cases to handle for the parser. Meanwhile, offset_of! as proposed here with macro-like syntax can be already parsed into a syntax tree; implementing this feature is just a matter of assigning them some semantics.

ckaran · January 31, 2019, 3:43pm

Good point, logically there are issues with them.

I agree. However, from an ergonomics point of view, something like let c = a + b is much easier for an end user to parse than something like set(new('c'), add(get_value('a'), get_value('b')))[1]. While it is possible to create a complete language using function notation, or some other wholly clean and consistent syntax, at some point we need to start thinking about ergonomics. Programmers are people, not computers; we have opinions, and often have shared opinions on what is 'easy' and what is 'hard'. I will not voluntarily choose to program in Malbolge. I have programmed quite a bit in C, C++, and Python (and some others). From an ergonomics perspective, Rust feels easier to me. However, magic macros feel hard to me; keywords are the magic of the language, so they can do anything, including things that I can't do on my own via writing my own functions or macros. Functions and macros feel like things that I should be able to do myself. That is why I prefer an offsetof keyword over a magic offset_of!() macro.

Finally, we need to consider crates that have already defined offset_of!() (simply because they thought they needed it, and rust didn't yet offer it). How will they deal with a new magic macro? Is the new macro going to be namespaced? Or is it going to be a built-in that stomps on (or gets stomped on by) the crate's prior definition? Since we already had some keywords that were reserved, everyone knew not to try to use them as identifiers, so there shouldn't be any crates in the wild that attempted to use offsetof. This makes the change somewhat less painful, and is part of my thinking for why RFC 2421 should be retracted.

[1] I just made up that syntax, it isn't meant to be rust or any other language

RalfJung · January 31, 2019, 5:11pm

It's going to behave like asm! and include! and any other magic macro: it is going to be properly namespaced and cause none of these issues.
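A sketch of what that namespacing looks like in practice. This assumes a toolchain of Rust 1.77 or later, where `std::mem::offset_of!` is available; the local macro and the `Packet` struct are hypothetical stand-ins for what an existing crate might ship.

```rust
// A crate-local macro of the same name (hypothetical, simplified).
macro_rules! offset_of {
    ($ty:ty, $field:ident) => {{
        let uninit = ::std::mem::MaybeUninit::<$ty>::uninit();
        let base = uninit.as_ptr();
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        (field as usize) - (base as usize)
    }};
}

#[allow(dead_code)]
#[repr(C)]
struct Packet {
    kind: u8,
    length: u32,
}

// The unqualified name resolves to the crate's own macro...
fn local_offset() -> usize {
    offset_of!(Packet, length)
}

// ...while the standard one is reached through its path.
fn std_offset() -> usize {
    ::std::mem::offset_of!(Packet, length)
}

fn main() {
    println!("{} {}", local_offset(), std_offset());
}
```

Neither definition interferes with the other, so a crate can keep its own macro or switch to the standard one at its own pace.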

ckaran · January 31, 2019, 9:32pm

If that's so, then there won't be a problem. Like I said, I personally dislike magic macros from an ergonomics point of view, but if the consensus is to use them, and they won't introduce further problems, I'll accept the consensus and go with it.

hanna-kruppe · February 3, 2019, 1:09pm

Summarily responding to the last ten or so comments: I do not think a strong case can be made that either a magic macro/“macro-like syntactic entity” or a keyword for offset_of is significantly worse than the other option in implementation complexity, specification complexity, or user-facing complexity. The main reason I wish for a keyword rather than a magic macro is because of a design preference to have the syntactic form of macros (PATH ! ( TT* )) be reserved for things that expand to token trees and not also as a grab bag of special language constructs that we couldn’t bother designing real syntax for. That is all.

But given that some people are highly allergic to a keyword and “macros = token based expansion” seems to be a minority position anyway, I really won’t lose much sleep over conceding this fight – it’s much more important to get this capability into stable Rust in some way.

However, I do want to note that we have plenty of contextual keywords already, so offsetof being unreserved is no obstacle wrt backwards compatibility.

scottmcm · February 3, 2019, 9:32pm

offsetof needs to work in expressions, though, which makes it harder. The existing contextual keywords work in items where it's easier. (And types, I guess, with dyn, but even there it worked poorly enough that it became a real keyword in the edition even though that wasn't the original plan.)
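For illustration, `union` is one of those existing contextual keywords: it only acts as a keyword where it introduces an item, and stays usable as an ordinary name elsewhere. The type and function names below are invented for this sketch.

```rust
// Here `union` introduces an item, so it is treated as a keyword.
#[repr(C)]
union Bits {
    int: u32,
    float: f32,
}

// Everywhere else, `union` is an ordinary identifier.
fn union(a: u32, b: u32) -> u32 {
    a | b
}

fn float_bits(x: f32) -> u32 {
    let value = Bits { float: x };
    // Reading a union field is unsafe: the compiler cannot know which field is active.
    unsafe { value.int }
}

fn main() {
    let union = union(0b01, 0b10);
    println!("{} {:#x}", union, float_bits(1.0));
}
```

The item position is unambiguous because `union Name {` cannot be anything else; an expression such as `offsetof(...)` has no such luck, since it already parses as a function call.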

ckaran · February 6, 2019, 2:06pm

Well said. To me, this is ergonomics; making life easier for the end user.

And this may be the real advantage of reserving keywords; since they are reserved, you can do anything with them.

It's too bad that we don't have some kind of known naming convention that was reserved for keywords. E.g., any word that starts with the letter 'k' is reserved as a possible keyword in the future. I'll be the first to admit that there would be a lot of problems with such a plan, but at least you'd know what the keywords were, and you could expand the set of keywords as you see fit!

mcy · February 6, 2019, 5:05pm

This is in a sense Not Wrong; in C, all identifiers starting with _A, for A any capital letter, are reserved. C then asks you to import headers that define the user-facing keyword with the gross "real" keyword. I.e., stdbool.h has the line

#define bool _Bool

Fortunately, Rust's macros are not quite this insane, but it means we can't play this particular game. There has been a suggestion of introducing k#ident syntax to indicate "I want ident as a keyword, not an identifier", originally for being able to use 2018 keywords in 2015; I could imagine you could make an argument like "add k#offsetof foo.bar now and turn it into a real keyword in the next edition". I'm not sure I like this argument.
