From "life before main" to "common life in main"

Just because I thought it was interesting: the weird scoping rules of macro_rules! macros very nearly allow you to implement something inventory-like without ctor; the only thing preventing it from actually working is one (rather silly, IMO) compiler error that complains about ambiguity.

I say the error is silly because the following code does work:

macro_rules! a { () => { compile_error!("") } }
macro_rules! a { () => {} }
a!(); // OK: the later textual definition shadows the earlier one

But this code doesn't:

macro_rules! a { ($($tt:tt)*) => { $($tt)* } }
a!(macro_rules! a { () => {} });
a!(); // error: `a` is ambiguous

So I don't see why macro-generated macros shouldn't be allowed to shadow macros like manually written macros can.

1 Like

Not to derail this thread too much, but this seems to be deliberate: Weirdness with macros trying to redefine each other · Issue #45732 · rust-lang/rust · GitHub

C++ has famously caused itself trouble by leaving execution order of these things unspecified:

https://en.cppreference.com/w/cpp/language/siof

An unordered attribute like #[startup] will likely run into the same thing. For example, you may want to register your plugin automagically, but your plugin may itself need to wait for some of its dependencies to initialize before it can register itself (this can realistically happen: e.g. a video-conferencing plugin may want to wait for audio and video codec plugins to register themselves first, and codec plugins may want to wait for hardware acceleration plugins to register first).

18 Likes

Another ordering issue to consider is sandbox setup and child forks. I have servers that call fork() as the first thing in main, and then set up two different sets of seccomp filters for the roles of the child and parent process (including, specifically, two different ways of reporting errors to Sentry, since one of the processes is intentionally forced offline). Life before main would be terrible for this: I'd be running unsandboxed code, and half of the things would be in the wrong process.

4 Likes

The core issue with the traditional ctor approach is just that stuff runs and nobody has any control over it. By merely registering functions instead, a certain amount of control can be exposed.

A more generic solution than #[startup] would be a #[register(MY_COLLECTION)] attribute, similar to how linkme operates. At that point the user gets all of the collected functions or types of a specific collection and can do with them whatever they want. However, I think an equivalent to #[startup] is still necessary, because of the current lack of control over main.

5 Likes

If we have #[distributed_slice], then I think it's fine to push any framework that takes main away from you to offer a distributed slice of fn() hooks that it runs early in the life of the program. This means that you don't automatically have #[startup] with whatever framework, but it gives the framework control over when and how the startup hooks are run. (So e.g. they can run after logging is initialized, or w/e.)
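
As a rough sketch of what that could look like with the existing linkme API (the STARTUP_HOOKS registry and the framework/plugin function names are invented for illustration):

use linkme::distributed_slice;

// The framework owns the registry of startup hooks.
#[distributed_slice]
pub static STARTUP_HOOKS: [fn()] = [..];

// A downstream crate registers a hook; in a real setup the attribute would
// name the framework's slice by path (e.g. framework::STARTUP_HOOKS).
#[distributed_slice(STARTUP_HOOKS)]
static REGISTER_MY_PLUGIN: fn() = register_my_plugin;

fn register_my_plugin() {
    // register with whatever registry the framework exposes
}

// The framework decides when and how the hooks run, e.g. only after
// logging has been initialized.
fn framework_entry() {
    init_logging();
    for hook in STARTUP_HOOKS {
        hook();
    }
    // ...then hand control to the application
}

fn init_logging() {}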

Ordering is still an interesting question to answer, but it's now answerable in userspace (e.g. the bevy scheduler) rather than needing to be solved in the compiler. (And there definitely are competing answers as to how to solve startup scheduling.)

(Side note: I really want to see a world where bevy can use distributed slices to construct the world. They've considered and rejected using linkme because of the platform limitations, but if it were built into the compiler directly, it definitely would get reconsidered.)

13 Likes

setenv/getenv are so difficult to make thread-safe because the problem spans Rust and C
are constructors not a case like that?

sometimes you have some control over main, sometimes not
sometimes it is more important that your plugin runs before main
sometimes it is more important that main runs before anything
sometimes main is in Rust, sometimes in C

...the last "sometimes" seems the most problematic:
you end up trying to fix the world, not just Rust.

P.S. #[distributed_slice] on all platforms would have been wonderful..

1 Like

Having built plugin systems that need to work in shared and static builds (while supporting "builtin" plugins as well), I think some way to just "label" a piece of static data (or function) that gets put into the binary and then have an API to ask the current executable (and its transitively loaded shared libraries) as well as "this specific loaded library" things like "get me all symbols labeled with foo" would work. Now, this would be an unsafe API since going from symbol name to some concrete type is ripe for…abuse, but going from "ha, good luck, hope the linker doesn't screw you over" to "use unsafe to get what you need" is a vast improvement I think. This would allow for more…reasonable initialization routines to be built up from such a primitive.
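
Purely to make the shape of that API concrete, a hypothetical sketch (none of these names exist in any crate today):

// Entirely hypothetical: this only sketches the shape of the API described above.
pub struct LoadedImage; // the executable, or one specific loaded library

impl LoadedImage {
    // Enumerate every static labeled `label` in this image, interpreting
    // each one as a `T`. Unsafe because nothing checks that a labeled
    // symbol really is a `T`; the caller takes on that contract.
    pub unsafe fn labeled<T>(&self, label: &str) -> Vec<&'static T> {
        let _ = label;
        unimplemented!("platform-specific symbol enumeration")
    }
}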

FWIW, plugins shouldn't require "life before main" and should instead be able to wait for explicit instantiation (both because of the general unorderedness and because plugins may depend on each other). But there may be other use cases as well. Can we gather a list?

FWIW, in Rust I've used inventory, but if there were a "get me the symbols matching X" primitive (probably provided by a crate, given its likely platform availability; looking at linkme, maybe this is it), could similar behavior be implemented? It would be nice for libloading (or similar) to have ways of querying freshly loaded information as well.
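
For reference, the inventory usage being referred to looks roughly like this today (the Plugin type and its fields are invented for the example; under the hood inventory still relies on ctor-style registration):

pub struct Plugin {
    pub name: &'static str,
    pub init: fn(),
}

inventory::collect!(Plugin);

fn init_example() { /* set the plugin up */ }

// Typically this lives in the plugin's own crate; nothing references the
// submission directly, it is discovered purely through the registry.
inventory::submit! {
    Plugin { name: "example", init: init_example }
}

fn main() {
    for plugin in inventory::iter::<Plugin> {
        println!("initializing {}", plugin.name);
        (plugin.init)();
    }
}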

4 Likes

For example, you may want to register your plugin automagically, but your plugin may itself need to wait for some of its dependencies to initialize before it can register itself (this can realistically happen: e.g. a video-conferencing plugin may want to wait for audio and video codec plugins to register themselves first, and codec plugins may want to wait for hardware acceleration plugins to register first).

I'm pretty apprehensive about Rust adding anything which could make life-before-main more common. However, this comment made me think: couldn't the compiler be smart enough to handle this? You could have a restriction that a crate is only allowed to have a single function annotated #[startup], and the compiler could ensure that the #[startup] functions of dependencies run first.

1 Like

I think that "gather a list" is the proper solution to the plugin problem, as proposed above.

We could either embed the dependency graph in the collected items, or require the Rust compiler to order the items in "bottom up dependency order" (for anything statically available).

Implementation weeds for plugins

After you have dependency-tree ordering, it's easy to borrow the five-phase system from Minecraft modding (Initialization: more unsafe than usual, similar to ctor, since your dependents have not loaded yet; Early; Normal; Late; Finalize: make no new changes and instead compact your records of the changes already made).
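
A minimal sketch of the shape of such a phased system (the names are invented here; this is not any existing loader's API):

// Toy sketch: every plugin finishes a phase before any plugin starts the
// next one, and within a phase plugins run in dependency order.
#[derive(Clone, Copy)]
enum Phase { Initialization, Early, Normal, Late, Finalize }

trait Plugin {
    fn run(&self, phase: Phase);
}

fn run_startup(plugins_in_dependency_order: &[&dyn Plugin]) {
    for phase in [
        Phase::Initialization,
        Phase::Early,
        Phase::Normal,
        Phase::Late,
        Phase::Finalize,
    ] {
        for plugin in plugins_in_dependency_order {
            plugin.run(phase);
        }
    }
}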

Please forgive my ignorance, but I've never heard of this mechanism. How would it work? Does the compiler do a pass to determine the space needed for all items in the distributed slice, then statically allocate space for it? Or is it more like a vector, requiring an allocator to work?

My understanding is that it makes a linker section for the relevant data and uses a linker script to tell the linker "please put all of these together". The symbol then points to the start of this (now) array. I'm…not sure how ordering works or how it knows where "the end" is without looking into the implementation.

Thank you for the explanation.

If the compiler is able to build a dependency graph, then the ordering would simply be a topological sort of the graph. If no such sort is possible, then there is a cycle, and the compiler can spit out an error. However, I have no idea if that's what would actually happen.
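
As a toy illustration of that idea (the crate-name map and function here are invented; this is not how rustc models anything), a depth-first topological sort that reports cycles could look like this:

use std::collections::HashMap;

// Order crates so that each one comes after all of its dependencies,
// or report a cycle if that is impossible.
fn startup_order<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, String> {
    fn visit<'a>(
        krate: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        state: &mut HashMap<&'a str, bool>, // false = in progress, true = done
        order: &mut Vec<&'a str>,
    ) -> Result<(), String> {
        match state.get(krate).copied() {
            Some(true) => return Ok(()), // already ordered
            Some(false) => return Err(format!("cyclic dependency through `{}`", krate)),
            None => {}
        }
        state.insert(krate, false);
        for &dep in deps.get(krate).into_iter().flatten() {
            visit(dep, deps, state, order)?; // dependencies first
        }
        state.insert(krate, true);
        order.push(krate);
        Ok(())
    }

    let mut state = HashMap::new();
    let mut order = Vec::new();
    for &krate in deps.keys() {
        visit(krate, deps, &mut state, &mut order)?;
    }
    Ok(order)
}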

@CAD97, if you get a chance would you be willing to explain how this mechanism works a little more? Web searches turn up entirely unrelated topics, and although I'm searching through the compiler source right now, I suspect that reading the code will take far longer and be far less illuminating than your explaining it to us.

EDIT

My script just finished; distributed_slice does not appear to be mentioned anywhere in the Rust sources, at least up to commit 17dfae79bbc3dabe1427073086acf7f7bd45148c.

But do you have a case that can't be solved by linkme + lazy_static?

All the cases I can think of will work with either registering lazy_static objects with linkme, or wrapping the linkme registry in a lazy_static accessor, because in the end the functionality based on this is used somewhere, and that use can trigger the lazy initialization, with the benefit that the order is defined by dependencies.
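
For example (the CODEC_REGISTRARS slice and codec registry here are invented for illustration), the combination described above might look roughly like this:

use std::collections::HashMap;

use lazy_static::lazy_static;
use linkme::distributed_slice;

// Each codec crate contributes a registration function to the slice.
#[distributed_slice]
pub static CODEC_REGISTRARS: [fn(&mut HashMap<&'static str, fn()>)] = [..];

lazy_static! {
    // Built on first access: whichever code path first needs a codec
    // triggers initialization, so ordering follows actual use.
    pub static ref CODECS: HashMap<&'static str, fn()> = {
        let mut map = HashMap::new();
        for register in CODEC_REGISTRARS {
            register(&mut map);
        }
        map
    };
}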

1 Like

distributed_slice is from GitHub - dtolnay/linkme: Safe cross-platform linker shenanigans

1 Like

Thank you!

"Linker shenanigans." I'm not the best person to explain it, @dtolnay would be the one who understands how the linkme implementation works the best.

But the short version, as I understand it, is:

With linker shenanigans:

  • In creating a distributed slice "registry" called NAME, we set up three linker sections, whose exact names and setup are platform-dependent, but which we'll call __linkme_NAME, __linkme_start_NAME, and __linkme_end_NAME.
  • We ask the linker to lay these sections out such that __linkme_start_NAME is directly followed by __linkme_NAME which is then directly followed by __linkme_end_NAME.
  • Each item put in the distributed slice is (of known, verified type and) placed as a static in the __linkme_NAME section.
  • Again through platform/linker-specific tricks, we define statics that reside in the __linkme_start_NAME and __linkme_end_NAME sections.
  • At runtime we use those two statics to create our slice; they effectively give us a "just before the start" pointer and a "one past the end" pointer from which to derive our linker-assembled slice (a sketch of this derivation follows below).

This is almost certainly actually UB in a strict sense, as the Rust Abstract Machine doesn't have a concept of what we're doing here. In practice, this is closer to platform-defined behavior.
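
A grossly simplified sketch of that final derivation step, taking the two anchor addresses for granted and ignoring all of the platform-specific section setup (the real linkme implementation is considerably more careful):

use std::slice;

// `start` points at the first element the linker placed in the section and
// `end` one past the last; the anchor statics give us these addresses.
unsafe fn linker_assembled_slice<T>(start: *const T, end: *const T) -> &'static [T] {
    // SAFETY: the caller guarantees start..end is one contiguous, initialized,
    // 'static run of T produced by the linker.
    unsafe {
        let len = end.offset_from(start) as usize;
        slice::from_raw_parts(start, len)
    }
}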

With compiler support, it would work much the same way, except that because the compiler itself knows about it, it wouldn't require platform linker support, just rustc linker support. At a high level:

  • The registry crate defines a distributed slice as a static.
  • Any static which is placed in the distributed slice is marked specially in the rlib as being part of the distributed slice.
  • When rustc is invoked to link the rlibs together into an executable, it first finds all of the statics marked as part of the distributed slice, and orchestrates the platform-specific operations to put them in a contiguous, statically allocated slice. This may be linker directives on some platforms (e.g. the ones linkme already supports), or it may be directly reassembling the individual static sections into one static section (and the references from the children back to the parent) before handing it off to the platform linker.
  • Notably, the Abstract Machine is now dealing with an actual slice of link-time-determined size, rather than seeing you access memory outside of the statics you've defined, so it's no longer strictly speaking UB, and there's no danger of future optimizations breaking the behavior.

All of this of course only works with static linking.

7 Likes

I only just got pinged, so I hadn't seen this thread until now, but since my crates were mentioned: I feel strongly that the linkme form of the API is the right one for Rust, and that executing code at runtime before main is unnecessary and should not be added to Rust. The distributed slice elements in linkme are each a static, so they are mandated to be compile-time constructible (link-time, technically, since you can have references to other statics, unlike in const).

Basically these comments got it right:

And yeah, it's gonna need compiler support, along the lines of:

The only extension I'd make to that comment is that, on the platforms where we can't count on the linker to handle building the slices, rustc can solve this on its own by propagating the elements through rmeta (as if they were macros) through all the layers of the dependency graph, until the rustc invocation that's compiling main can get all the final slices put together, prior to any linking.

The way this works is equivalent to if every crate implicitly got the following in its root module:

pub(rustc) mod distributed_slice_elements {
    // for each of my direct dependencies:
    pub(rustc) use $the_dependency::distributed_slice_elements::*;
    // (except deduplicated in the case of diamond dependencies)
}
22 Likes

@CAD97, @dtolnay, thank you for your explanations. One further question, then: do these methods ensure that cyclic dependencies are caught at compile time? I don't mean 'if we add that support to rustc in the future'; I mean, can the stable compiler catch and error out on cyclic dependencies today?

Statics are allowed to have cyclic dependencies on one another.
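
For example, something like this compiles on stable today; there is no initialization order to get wrong, because both statics are simply laid out at link time (a minimal sketch):

struct Node {
    id: u32,
    next: &'static Node,
}

static A: Node = Node { id: 0, next: &B };
static B: Node = Node { id: 1, next: &A };

fn main() {
    // A points to B, and B points back to A.
    assert_eq!(A.next.id, 1);
    assert_eq!(A.next.next.id, 0);
}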