Macros that generate statics with special link_sections/export_names often need some way to output a unique identifier along with this data (similar to how, if you write static X: u8 = 2;, Rust mangles the name to prevent it from clashing with other statics of the same name).
Two examples:
Objective-C generates statics that start with special names L_OBJC_METH_VAR_NAME_ and L_OBJC_SELECTOR_REFERENCES_ and then has a unique identifier afterwards - a crate that wants to emulate this needs to do so as well (see objc2's implementation).
The current approach in both of these cases is to create a proc-macro that hashes the Debug output of a proc_macro::Span (since that currently includes line/column information), but this is obviously very brittle!
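For readers unfamiliar with the workaround, here is a minimal sketch of the hashing step, not the code of either crate. Because `proc_macro` is only usable inside a proc-macro crate, the hypothetical helper `symbol_suffix` takes the span's Debug text as a plain string; in a real macro that string would come from something like `format!("{:?}", Span::call_site())`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: turns the Debug text of a span into a
/// fixed-width hex suffix that can be appended to a symbol name.
fn symbol_suffix(span_debug: &str) -> String {
    // `DefaultHasher::new()` always starts from the same state, so the
    // same input text yields the same suffix within one toolchain.
    let mut hasher = DefaultHasher::new();
    span_debug.hash(&mut hasher);
    // 16 hex digits for the 64-bit hash value.
    format!("{:016x}", hasher.finish())
}
```

The result is only as unique as the Debug text is, and neither that text nor the algorithm behind `DefaultHasher` is guaranteed to stay the same across Rust releases, which is exactly the brittleness described above.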
Concretely, it would be nice if there was a way such that the following would work:
Regarding the stable-ness of the identifier, I wouldn't expect any guarantees (e.g. it would be allowed to change between compiler invocations), though it should attempt to at the very least be deterministic.
An impl Hash for proc_macro::Span could solve this issue somewhat nicely, in that users can freely choose the format of the identifier (e.g. if it should be in hexadecimal, only integers, UTF-8 unicode, or whatever).
Another idea would be a unique_int!() -> usize/u128 macro; this would allow macro_rules! macros to use the functionality as well (and allow it to be used within const evaluation).
The easiest solution is probably Ident::fresh() -> Self.
Or alternatively, def-site hygiene makes the items not conflict. My original intuition was that the two def sites are the same and can share symbols, but that's not the case as implemented; the def site is scoped to the defining item, not the defining scope/module/namespace.
This seems like the right answer to me as well. The most challenging bit here seems like bikeshedding the name (and to a slight extent the return type).
There's precedent for this in other languages, as well; for instance, C extensions have __COUNTER__, and Lisps use gensym.
I would propose that we call this unique_id!(), and have it return a unique u64 for each invocation site.
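As a rough illustration of what such a macro would give macro_rules! users, here is an approximation that works today, not the proposed built-in. The names `call_site_id!` and `fnv1a_64` are made up for this sketch. It hashes `file!()`, `line!()` and `column!()` in a const context, so the value differs per invocation site only as far as those three differ and the hash does not collide.

```rust
/// 64-bit FNV-1a, written as a `const fn` so it can run during const evaluation.
const fn fnv1a_64(mut hash: u64, bytes: &[u8]) -> u64 {
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        i += 1;
    }
    hash
}

/// Hypothetical stand-in for `unique_id!()`: a `u64` derived from the
/// location of the outermost macro invocation. Not guaranteed unique.
macro_rules! call_site_id {
    () => {{
        const ID: u64 = {
            // Start from the FNV-1a 64-bit offset basis.
            let h = fnv1a_64(0xcbf2_9ce4_8422_2325, file!().as_bytes());
            let h = fnv1a_64(h, &line!().to_le_bytes());
            fnv1a_64(h, &column!().to_le_bytes())
        };
        ID
    }};
}
```

Two calls to a wrapper macro that expands to several `call_site_id!()` invocations would see the same file, line and column for all of them, which is one reason a compiler-provided version would be preferable to this approximation.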
Why not lean further into this? The Rust mangling (esp. v0) is prepared for handling pretty much everything already; you just don't have a way to combine it with your additional needs.
This is so much easier to implement, since the compiler has access to the symbol name and can interpret the attribute to generate the unique section. It's also much better for debugging, and works perfectly wrt deterministic builds, incremental, etc.
Whereas macros that expand to something different every time create a lot more issues than they solve.
I know we don't have incremental macro expansion yet, but we have to be careful not to make it unnecessarily harder, and stateful macros are dangerous for that.
We're still worried about what fallout might occur once we start being able to use one process (or wasm VM instance) per proc macro invocation, if there is enough global state in existing proc macros (though for thread_local! state, we're only weeks away from starting to test it out).
Thankfully we never guaranteed an order of execution between invocations, or that proc macro invocations share threads/processes, but lack of enforcement for these many years can be misinterpreted as permission.
Not for a macro. Macros need to be expanded first before this hash can be calculated. If on the other hand it is an argument to asm!(), it would be possible as at codegen time the crate hash is known.
I happened to come across a case where this would be useful but with a bit more guarantees:
the ids start at 0 and increase by 1, it's just not guaranteed which invocation gets which id
invocations in distinct places (macros) can share IDs. This must not be relied on for comparisons between different macros and such, but the idea is that if you write a macro with a single unique_int!() invocation and invoke it e.g. 5 times, you're guaranteed to get ids 0, 1, 2, 3, 4 in undefined order. (IOW the counter is not shared among all macros)
Motivation: I'd like the IDs to fit 6 bits. Yes, if the macro is called more than 64 times it should fail to compile. This is not an issue because it's not public, it's just for convenience.
Also {integer} is probably better than any specific type.
Honestly, I feel like this idea is perfect, would solve both of my presented use-cases cleanly, and as you noted, wouldn't run into all the other issues with nondeterministic macros.
I doubt I have the required knowledge, but I could be interested in trying to implement it. Would it make sense for me to write an RFC or something with the idea first?
I'm not sure an RFC is needed to start experimenting (some things go through compiler-team "MCPs" nowadays), but it wouldn't hurt, especially from the perspective of having it stabilized sooner rather than later.
The relevant part of the implementation for #[link_name]/#[export_name] is this:
If attrs indicates anything interesting should happen (that is, CodegenAttrs definition/computation elsewhere would need to be adjusted), you can pretty much just call v0::mangle(tcx, instance, None) in the code above, and combine that with e.g. a prefix.
The section part is sadly more ad-hoc:
So it would likely need to get a query like symbol_name (provided by rustc_symbol_mangling as well, so that it can actually do any mangling), but for the section.
It's not a huge difference, but I would still suggest first prototyping it for symbol names (e.g. adding the ability of having a mangled symbol with a prefix), and only then move onto section names.
I feel both these use cases are actually better served by creating a language equivalent of the linkme crate, instead of writing hacks around linker sections....
While this may be true, proper distributed slice support will be much harder to get into the language due to portability of techniques in the face of separate compilation.
Whereas link_section's behavior is inherently platform specific, and "provide this link name" is an extremely portable bit of vocabulary.
I can second that this would be an MCP for roughly permission[1] to implement, and that stabilizing could probably be done with just a T-lang FCP.
[1] Obviously you can do whatever in your own fork, but talking about possibility of merging.