Get unique identifier inside macro

madsmtm · July 23, 2022, 2:28pm

Macros that generate statics with special link_sections/export_names often need some way to output a unique identifier along with this data (similar to how, if you write static X: u8 = 2;, Rust mangles the name to prevent it from clashing with other statics of the same name).

Two examples:

Objective-C generates statics that start with special names L_OBJC_METH_VAR_NAME_ and L_OBJC_SELECTOR_REFERENCES_ and then has a unique identifier afterwards - a crate that wants to emulate this needs to do so as well (see objc2's implementation).
The popular defmt crate needs to output different statics when given the same input; see knurling-rs/defmt#344 and the entry in their book.

The current approach in both of these cases is to create a proc-macro that hashes the Debug output of a proc_macro::Span (since that currently includes line/column information), but this is obviously very brittle!
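For readers unfamiliar with the workaround, here is a minimal sketch of the idea, not the actual code from objc2 or defmt. It assumes the span's Debug output has already been turned into a string (in a real proc-macro that would be something like format!("{:?}", span)); the function names unique_suffix and section_name are made up for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a textual span description into a 16-digit hex suffix.
/// `span_debug` stands in for the Debug output of a `proc_macro::Span`
/// (assumption: a real proc-macro would pass `format!("{:?}", span)`).
fn unique_suffix(span_debug: &str) -> String {
    // `DefaultHasher::new()` uses fixed keys, so the result is
    // repeatable for the same input within one std version.
    let mut hasher = DefaultHasher::new();
    span_debug.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Builds a section name of the form `<prefix><hex>`.
fn section_name(prefix: &str, span_debug: &str) -> String {
    format!("{}{}", prefix, unique_suffix(span_debug))
}
```

Calling section_name(".foo.", "src/lib.rs:3:1") and section_name(".foo.", "src/lib.rs:4:1") with those placeholder strings gives two different names, but only because the inputs differ. The scheme breaks as soon as two invocations report the same span text, or the Debug format changes.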

Concretely, it would be nice if there was a way such that the following would work:

// Input
foo!();
foo!();

// Outputs
{
  #[link_section = ".foo.[SOME_UNIQUE_HEX_1]"]
  static FOO: u8 = 0;
}
{
  #[link_section = ".foo.[SOME_UNIQUE_HEX_2]"]
  static FOO: u8 = 0;
}

Regarding the stability of the identifier, I wouldn't expect any guarantees (e.g. it would be allowed to change between compiler invocations), though it should at the very least attempt to be deterministic.

madsmtm · July 23, 2022, 2:33pm

An impl Hash for proc_macro::Span could solve this issue somewhat nicely, in that users can freely choose the format of the identifier (e.g. if it should be in hexadecimal, only integers, UTF-8, or whatever).

Another idea would be a unique_int!() -> usize/u128 macro; this would allow macro_rules! macros to use the functionality as well (and allow it to be used within const evaluation).

CAD97 · July 23, 2022, 3:30pm

Hashing spans isn't enough either, unfortunately; consider

macro_rules! twice {
    ($x:item) => ($x $x);
}

twice!(foo!{});
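The same effect can be observed on stable Rust with line!() and column!(), which are derived from source positions. This is only an analogy to span hashing, and twice_expr and positions are names invented for the sketch: both copies of the duplicated expression carry the same source position, so anything computed from that position alone comes out identical.

```rust
/// Duplicates an expression, mirroring `twice!` above but for
/// expressions so the result can be inspected at run time.
macro_rules! twice_expr {
    ($x:expr) => {
        [$x, $x]
    };
}

/// Returns the (line, column) pair seen by each of the two copies.
/// Both copies originate from the same tokens, so the pairs are equal.
fn positions() -> [(u32, u32); 2] {
    twice_expr!((line!(), column!()))
}
```

A hash of such a position would therefore give both expansions the same "unique" identifier.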

The easiest solution is probably Ident::fresh() -> Self.

Or alternatively, def-site hygiene makes the items not conflict. My original intuition was that the two def sites are the same and can share symbols, but that's not the case as implemented; the def site is scoped to the defining item, not the defining scope/module/namespace.

josh · July 23, 2022, 7:17pm

This seems like the right answer to me as well. The most challenging bit here seems like bikeshedding the name (and to a slight extent the return type).

There's precedent for this in other languages, as well; for instance, C extensions have __COUNTER__, and Lisps use gensym.

I would propose that we call this unique_id!(), and have it return a unique u64 for each invocation site.

Nemo157 · July 23, 2022, 7:37pm

I don't see how a macro would help in the OP use case? You can't emit something like #[link_section = concat!(".foo.", unique_int!())].

josh · July 23, 2022, 8:02pm

/tmp$ cat section_test.rs
#[link_section = concat!("unique_section_", stringify!(123))]
#[used]
static FOO: u8 = 123;
/tmp$ rustc --crate-type staticlib --emit obj section_test.rs
/tmp$ objdump -x section_test.o

section_test.o:     file format elf64-x86-64
section_test.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 unique_section_123 00000001  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000041  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
0000000000000000 l    df *ABS*	0000000000000000 section_test.d0add785-cgu.0
0000000000000000 l     O unique_section_123	0000000000000001 _ZN12section_test3FOO17h5adf25b41a35f966E

We just need a unique number to substitute for the 123.

Nemo157 · July 23, 2022, 9:37pm

But concat!(".foo.", stringify!(unique_int!())) will just give .foo.unique_int ! (). That would also need the ability to force early expansion.

EDIT: Also, I don't see any way this could be unique across multiple compilation units, which seems necessary for section names.
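The early-expansion problem can be reproduced today by substituting the existing line!() macro for the hypothetical unique_int!(). This is a sketch with made-up function names; it relies only on stringify! and concat! as they behave for built-in macros such as line!().

```rust
/// `stringify!` turns its argument into text without expanding it,
/// so the nested `line!()` is never evaluated.
fn stringified() -> &'static str {
    concat!(".foo.", stringify!(line!()))
}

/// Passed directly, the nested built-in `line!()` is expanded by
/// `concat!` before concatenation, so the suffix is a decimal number.
fn expanded() -> &'static str {
    concat!(".foo.", line!())
}
```

The first function returns the macro call spelled out as text (the exact spacing depends on the compiler version), while the second returns ".foo." followed by digits. Whether a new macro could get the second behaviour inside an attribute is the open question here.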

mathstuf · July 24, 2022, 1:37am

Could it be derived from the ABI hash that gets mangled into symbols for the crate? That would also help the reproducible builds story I imagine.

eddyb · July 24, 2022, 2:19am

Why not lean further into this? The Rust mangling (esp. v0) is already prepared for handling pretty much everything already, you just don't have a way to combine it with your additional needs.

Plausible bikeshed:

#[link_section(mangled(prefix = ".foo."))]
static ...

This is so much easier to implement, since the compiler has access to the symbol name and can interpret the attribute to generate the unique section, it's much better for debugging, works perfectly wrt deterministic builds, incremental, etc.

Whereas macros that expand to something different every time create a lot more issues than they solve.

I know we don't have incremental macro expansion yet, but we have to be careful not to make it unnecessarily harder, and stateful macros are dangerous for that.

We're still worried about what fallout might occur once we start being able to use one process (or wasm VM instance) per proc macro invocation, if there is enough global state in existing proc macros (though for thread_local! state, we're only weeks away from starting to test it out).

Thankfully we never guaranteed an order of execution between invocations, or that proc macro invocations share threads/processes, but lack of enforcement for these many years can be misinterpreted as permission.

bjorn3 · July 24, 2022, 7:45am

Not for a macro. Macros need to be expanded first before this hash can be calculated. If on the other hand it is an argument to asm!(), it would be possible as at codegen time the crate hash is known.

Kixunil · August 3, 2022, 9:43am

I happened to come across a case where this would be useful but with a bit more guarantees:

the ids start at 0 and increase by 1, it's just not guaranteed which invocation gets which id
invocations in distinct places (macros) can share an ID. This must not be relied on for comparisons between different macros and such, but the idea is that if you write a macro with a single unique_int!() invocation and invoke it e.g. 5 times, you're guaranteed to get ids 0, 1, 2, 3, 4 in undefined order. (IOW the counter is not shared among all macros.)

Motivation: I'd like the IDs to fit 6 bits. Yes, if the macro is called more than 64 times it should fail to compile. This is not an issue because it's not public, it's just for convenience.

Also {integer} is probably better than any specific type.
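The guarantees listed above (dense ids starting at 0, a hard failure past 64) can be modeled at run time as follows. This is only an illustration of the contract, under the assumption of one counter per macro; the real feature would assign ids at compile time, in unspecified order, and fail the build rather than return None. IdAllocator is a made-up name.

```rust
/// Hands out ids 0, 1, 2, ... and refuses once the 6-bit range is used up.
/// One allocator models the counter belonging to one macro.
struct IdAllocator {
    next: u8,
}

impl IdAllocator {
    /// Number of distinct ids that fit in 6 bits.
    const LIMIT: u8 = 64;

    const fn new() -> Self {
        Self { next: 0 }
    }

    /// Returns the next id, or `None` when all 64 ids are taken
    /// (the compile-time version would be a compile error instead).
    fn alloc(&mut self) -> Option<u8> {
        if self.next >= Self::LIMIT {
            return None;
        }
        let id = self.next;
        self.next += 1;
        Some(id)
    }
}
```

Sixty-four calls to alloc yield 0 through 63, and the sixty-fifth yields None.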

madsmtm · August 10, 2022, 2:01pm

Honestly, I feel like this idea is perfect, would solve both of my presented use-cases cleanly, and as you noted, wouldn't run into all the other issues with nondeterministic macros.

I doubt I have the required knowledge, but I could be interested in trying to implement it. Would it make sense for me to write an RFC or something with the idea first?

eddyb · August 11, 2022, 4:17am

I'm not sure an RFC is needed to start experimenting (some things go through compiler-team "MCPs" nowadays), but it wouldn't hurt, especially from the perspective of having it stabilized sooner rather than later.

The relevant part of the implementation for #[link_name]/#[export_name] is this:

github.com/rust-lang/rust/blob/908fc5b26d15fc96d630ab921e70b2db77a532c4/compiler/rustc_symbol_mangling/src/lib.rs#L207-L216

        if let Some(name) = attrs.link_name {
            return name.to_string();
        }
        return tcx.item_name(def_id).to_string();
    }

    if let Some(name) = attrs.export_name {
        // Use provided name
        return name.to_string();
    }

If attrs indicates anything interesting should happen (that is, CodegenAttrs definition/computation elsewhere would need to be adjusted), you can pretty much just call v0::mangle(tcx, instance, None) in the code above, and combine that with e.g. a prefix.
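As a toy model of that decision (not rustc code: NameAttr, its variants, and symbol_name are hypothetical stand-ins, and the mangled name is passed in as a plain string rather than computed), the logic could look roughly like this:

```rust
/// Hypothetical stand-in for the naming-related codegen attributes.
enum NameAttr {
    /// Models `#[export_name = "..."]`: the name is used verbatim.
    Export(String),
    /// Models the proposed "mangled with prefix" form (assumption:
    /// the prefix is simply prepended to the mangled symbol).
    MangledWithPrefix(String),
    /// No naming attribute: the plain mangled symbol is used.
    Unset,
}

/// Picks the final symbol name given the attribute and the
/// already-computed mangled name for the item.
fn symbol_name(attr: &NameAttr, mangled: &str) -> String {
    match attr {
        NameAttr::Export(name) => name.clone(),
        NameAttr::MangledWithPrefix(prefix) => format!("{}{}", prefix, mangled),
        NameAttr::Unset => mangled.to_string(),
    }
}
```

The point of the sketch is that uniqueness comes for free from the mangled name, so the attribute only has to contribute the prefix.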

The section part is sadly more ad-hoc:

github.com/rust-lang/rust/blob/f03ce30962cf1b2a5158667eabae8bf6e8d1cb03/compiler/rustc_codegen_llvm/src/base.rs#L143-L149

pub fn set_link_section(llval: &Value, attrs: &CodegenFnAttrs) {
    let Some(sect) = attrs.link_section else { return };
    unsafe {
        let buf = SmallCStr::new(sect.as_str());
        llvm::LLVMSetSection(llval, buf.as_ptr());
    }
}

So it would likely need to get a query like symbol_name (provided by rustc_symbol_mangling as well, so that it can actually do any mangling), but for the section.

It's not a huge difference, but I would still suggest first prototyping it for symbol names (e.g. adding the ability of having a mangled symbol with a prefix), and only then moving on to section names.

crlf0710 · August 11, 2022, 4:33am

I feel both these use cases are actually better served by creating a language equivalent of the linkme crate, instead of writing hacks around linker sections....

CAD97 · August 11, 2022, 8:21am

While this may be true, proper distributed slice support will be much harder to get into the language due to portability of techniques in the face of separate compilation.

Whereas link_section's behavior is inherently platform specific, and "provide this link name" is an extremely portable bit of vocabulary.

I can second that this would be an MCP for roughly permission[1] to implement, and that stabilizing could probably be done with just a T-lang FCP.

[1] Obviously you can do whatever you like in your own fork; this is about the possibility of merging.

madsmtm · August 12, 2022, 12:01pm

Thanks for the guidance! Busy times rn, but will try to take a stab at it at some point.

system · November 10, 2022, 12:01pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
