From "life before main" to "common life in main"

My understanding is that it makes a linker section for the relevant data and uses a linker script to tell the linker "please put all of these together". The symbol then points to the start of this (now) array. I'm…not sure how ordering works or how it knows where "the end" is without looking into the implementation.

Thank you for the explanation.

If the compiler is able to build a dependency graph, then the ordering would simply be a topological sort of the graph. If no such sort is possible, then there is a cycle, and the compiler can spit out an error. However, I have no idea if that's what would actually happen.

@CAD97, if you get a chance would you be willing to explain how this mechanism works a little more? Web searches turn up entirely unrelated topics, and although I'm searching through the compiler source right now, I suspect that reading the code will take far longer and be far less illuminating than your explaining it to us.


My script just finished, distributed_slice does not appear to be mentioned anywhere in the rust sources, at least up to commit 17dfae79bbc3dabe1427073086acf7f7bd45148c.

But do you have a case that can't be solved by linkme + lazy_static?

All cases I can think of will work either by registering lazy_static objects with linkme, or by wrapping the linkme registry in a lazy_static accessor. In the end, the functionality built on this is used somewhere, and that use can trigger the lazy initialization, with the benefit that the order is defined by dependencies.

1 Like

distributed_slice is from GitHub - dtolnay/linkme: Safe cross-platform linker shenanigans

1 Like

Thank you!

"Linker shenanigans." I'm not the best person to explain it; @dtolnay would be the one who understands best how the linkme implementation works.

But the short version, as I understand it, is:

With linker shenanigans:

  • In creating a distributed slice "registry" called NAME, we set up three linker sections, the exact name and method of which are platform dependent, but we'll call __linkme_NAME, __linkme_start_NAME, and __linkme_end_NAME.
  • We ask the linker to lay these sections out such that __linkme_start_NAME is directly followed by __linkme_NAME which is then directly followed by __linkme_end_NAME.
  • Each item put in the distributed slice is (of known, verified type and) placed as a static in the __linkme_NAME section.
  • Again through platform/linker-specific tricks, we define statics that reside in the __linkme_start_NAME and __linkme_end_NAME sections.
  • At runtime, we use those two statics to create our slice; we effectively have a "just before the start" and a "first past the end" pointer from which to derive our linker-assembled slice.

This is almost certainly actually UB in a strict sense, as the Rust Abstract Machine doesn't have a concept of what we're doing here. In practice, this is closer to platform-defined behavior.

With compiler support, it would work much the same way, except that because the compiler itself knows about it, it wouldn't require platform linker support, just rustc linker support. At a high level,

  • The registry crate defines a distributed slice as a static.
  • Any static which is placed in the distributed slice is marked specially in the rlib as being part of the distributed slice.
  • When rustc is invoked to link the rlibs into an executable, it first finds all of the statics marked as part of the distributed slice and orchestrates the platform-specific operations to put them in a contiguous, statically allocated slice. This may mean linker directives on some platforms (e.g. the ones linkme already supports), or it may mean directly reassembling the individual static sections into one static section (fixing up references from the children back to the parent) before handing it off to the platform linker.
  • Notably, the Abstract Machine is now dealing with an actual slice of linktime determined size, rather than seeing you accessing outside of these statics you've defined, so it's no longer strictly speaking UB, and there's no danger of future optimizations breaking the behavior.

All of this of course only works with static linking.


I only just got pinged so I hadn't seen this thread until now, but since my crates were mentioned: I feel strongly that the linkme form of the API is the right one for Rust, and that executing code at runtime before main is unnecessary and should not be added to Rust. The distributed slice elements in linkme are each a static, so they are mandated to be compile-time constructible (link time, technically, since you can have references to other statics, unlike in const).

Basically these comments got it right:

And yeah, it's gonna need compiler support, along the lines of:

The only extension I'd make to that comment is that on the platforms where we can't count on the linker to handle building the slices, rustc can solve this on its own by propagating the elements through rmeta (as if they were macros) through all the layers of the dependency graph, so that the rustc invocation compiling main can put all the final slices together prior to any linking.

The way this works is equivalent to if every crate implicitly got the following in its root module:

pub(rustc) mod distributed_slice_elements {
    // for each of my direct dependencies:
    pub(rustc) use $the_dependency::distributed_slice_elements::*;
    // (except deduplicated in the case of diamond dependencies)
}

@CAD97, @dtolnay, thank you for your explanations. One further question then; do these methods ensure that cyclic dependencies are caught at compile time? I don't mean 'if we add that support to rustc in the future', I mean can the stable compiler catch and error out on cyclic dependencies today?

Statics are allowed to have cyclic dependencies on one another.

:flushed: I did not know that... thank you!

Whether or not code runs automatically in main is not as important as being able to register things. So linkme gets quite far. In most situations the user can just manually invoke the collected callbacks. The desire to have stuff run early in main mainly comes from the situation surrounding the Rust testing ecosystem, which can probably be addressed in other ways.

That said, linkme currently also doesn't work sufficiently well. It suffers from much the same reliability issues as ctor, unfortunately, so for that approach to work it likely also needs core language support.


Wait, is there any more information on why the current linkme github repo is archived? I don't see any recent commits that say anything about this.

(Edit: OK, and inventory is also archived. I'm guessing this is because of recent issues, but I think it's best practice to at least do a final push describing why the repo is archived, ideally with a link to a discussion area or issue. Without that, people are left trying to find another communication channel outside the repo, like this one.)


Probably because it is currently broken: Data silently disappeared with certain configuration · Issue #31 · dtolnay/linkme · GitHub

Is there any advantage to supporting distributed_slice via the linker compared to hypothetical support via rustc?

Couldn't this be resolved by adding a #[distributed_slice] to the test crate containing functions that will be executed before any tests are run?

Is anyone planning to write an RFC on #[distributed_slice] (or would be interested in collaborating on one)?

pyo3 currently uses inventory for the implementation of its multiple-pymethods feature. I think to get the best support for the implementation of pyo3's proc-macros, what is really needed is:

impl SomePythonClass {
    pub(crate) static PYMETHODS: [PyMethods] = [..];
}

... that is, a crate-private associated static, which each #[pymethods] impl SomePythonClass block would add an element to.

(I believe that would also need an RFC for associated statics :smile:)

If you wanted to propose an initiative I'd be willing to liaise. I have a few other RFCs I need to get done before I can co-author one, though.

We can make some kind of "plugin" system:

The main goal is to make the idea of #[distributed_slice] more general and module-aware

First, a plugin declared by a user must be a static or const whose type const-implements the following trait:

trait Plugin {
    type Peer; // type of statics being registered
    fn init() -> Self; // ctor
    fn register(&mut self, peer: &Self::Peer); // registration routine
}

Registration of a value is done at compile time for statically linked code, and manually after loading a dynamic library. It's done via a #[register] attribute.

static PEER: <ITEM as Plugin>::Peer = ... 

The compiler just produces an artifact with all boundaries erased and that's basically the end.

In the case of a dynamic library, using C constructors is the best option. But how do we know the address of the static we are supposed to register in? A hack is to use a global registry of registries in the executable, but we know the problems with those. Has anyone come up with a better scheme?

Edit: We can make registration in our dynamic libraries rely on a mangled set of sections: we may have one per registry used in the binary, each storing a pointer to the registry and a function pointer to be invoked once the former is initialized. To check whether a library has registered everything it needed to, we may want to employ an additional function that checks whether all pointers to registries are non-null (set).

1 Like

Thanks, I'll take a look at that. Is your thinking that an initiative would be to explore the problem space for general solutions (such as the one proposed just above)?

1 Like

I think a future extension (it certainly doesn't have to be in the first pass IMO) would be to support naming the resulting symbol for cdylib crates (e.g., plugins). This would allow library loader APIs (e.g., libloading) to access these arrays at some stable location (with an attribute naming it).

Although whatever symbol is accessed could also just have a method to return a pointer to the array in the binary, so maybe it's not actually all that useful.

Dynamic library constructors are the same problem as "life before main". I may just want to ask "does this library load" without having to worry about any of its code hooking itself into what I already have¹. I think we should try to avoid them. This leads me, at least, to conclude that explicit loading of static data into lookup structures is better. It also allows one to juggle multiple registries and have one that has the exe+dynlibA instances in it and another with exe+dynlibB in it. Basically, the language should provide the data in the binaries in some structured way, and how it gets used is up to the stdlib (for the executable itself) and crates (like libloading for dynamic loading) instead of happening Like Magic™.

¹ I'll note that unloading is fundamentally impossible because some platforms no-op it (musl), others are conditional (macOS will not unload libraries that use thread-local storage), and may run into destructor order problems. There is no way to know which of A or B can be unloaded first regardless of their initial order. Assuming A was first, B cannot be unloaded first because it may have registered hooks into A APIs. But B may also use and store A's APIs it found when loaded, so it can't be unloaded after either.