Pre-RFC: Add language support for global constructor functions

daboross · April 29, 2019, 10:47am

Hi! I want to thank everyone for this. There's a lot I hadn't considered, and I'm probably not the most qualified person to be writing this.

Regardless, here are some responses to individual points! Hope this mega-post format isn't the worst thing ever to read through...

I'll edit the text in the initial post too, if that's alright.

Thank you for mentioning this! I was thinking that keeping with what rust-ctor does currently might allow for a nicer implementation, but if we can get stdlib to always initialize prior to these, that'd be all the better.

I'm going to optimistically edit the pre-rfc text removing this limitation!

This also brings up the question of whether these actually need to be unsafe, or not. If we can fix stdlib being accessed, there might not be anything inherently unsafe about using global constructors.

I was wanting the order to be explicitly undefined so that code wouldn't rely on anything, but I guess having some control could be good.

My only use case for this was as a backend for the inventory crate, but initializing FFI libraries seems like a good use case for this!

Do you know if current libraries using rust-ctor could use lazy_static! or std::sync::Once instead? I guess I'm wondering if there's anything ctors allow which nothing else does, or if it's for the performance benefit.

Replacing lazy_static isn't something that I was aiming at, but I can see this doing that. I'll add it in... somewhere? Not sure.

It would certainly be interesting if we could implement something to back inventory without global constructors!

I guess that wouldn't necessarily help with FFI initialization, but if we could solve at least one problem here without global constructors I would be for that.

I'll add this to the alternatives - at a cursory look, it seems more different than similar to global constructors, but it could definitely be a good alternative solution.

The main problem now is that either a) only the binary crate would be able to add global constructors, or b) binary crates would be forced to call into all global constructors for all the libraries they use (recursively) and adding a new global constructor to a library would be a breaking change.

My main use case for global constructors is coordinating different libraries which don't know about each other, and which all want to add to some global data store. For example, if library A uses typetag to create a serializable trait, libraries B and C should be able to add data to the "all types implementing this trait" global list without A knowing about them.

Using an explicit solution like this, any binary crate depending on B and C would have to call the global constructors for each of their serializable types manually, even if those types are purely internal and the user shouldn't have to care about them.

The biggest disadvantage that I see, though, is that library C then can't add a new type which uses a global constructor without a new major version. Since adding a global constructor forces all consumer crates to add a new line to their main function, anything involving global constructors becomes a breaking change.
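
To make that concrete, here's a rough sketch of the explicit approach using only std. All of the names (REGISTRY, lib_b_register_types, lib_c_register_types, binary_startup) are made up for illustration, and the two "libraries" are just functions in one file:

```rust
use std::sync::Mutex;

/// Stand-in for library A's global list of types implementing its trait.
/// (Hypothetical: a real crate would store more than a name.)
static REGISTRY: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// Hypothetical hook exported by library B. Without global constructors,
/// somebody has to call this by hand.
pub fn lib_b_register_types() {
    REGISTRY.lock().unwrap().push("lib_b::Foo");
}

/// Hypothetical hook exported by library C. If C adds this in a minor
/// release, every binary that forgets to call it silently loses C's types.
pub fn lib_c_register_types() {
    REGISTRY.lock().unwrap().push("lib_c::Bar");
}

/// What every binary crate's `main` would have to do: one line per library,
/// including libraries it only depends on indirectly.
pub fn binary_startup() -> Vec<&'static str> {
    lib_b_register_types();
    lib_c_register_types();
    REGISTRY.lock().unwrap().clone()
}
```

The point of the sketch is binary_startup: its body has to grow every time any library in the dependency tree adds a hook, which is exactly the breaking-change problem.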

This is true!

I haven't mentioned lazy_static as I don't believe it solves anything similar to the same problem, but I might not have really given a good explanation for that. It's true that global constructors would be able to do some of the same things lazy_static can do, but they can solve one extra case: when the crate using the global data has no idea the crate providing the data exists.

I'm adding more to the pre-RFC text, but here's another demonstration.

The best example I have is typetag. Say I have a logging crate which allows for various logger configurations, and can serialize those configurations into JSON. In my logger crate, I define a trait SerializableLogger, and use typetag on it.

Another crate, say logger-syslog-adapter, can then define a concrete implementation SyslogLogger.

When the consumer uses logger to deserialize their configuration, they want to be able to have it "just work" and deserialize it. With global constructors, typetag registers SyslogLogger into a static list of implementors of SerializableLogger. Then when the configuration is deserialized, if a syslog logger was specified, SyslogLogger is automatically grabbed and used as the logger for the Box<dyn SerializableLogger>, without logger ever mentioning logger-syslog-adapter in its source code.

This would have been impossible to implement with lazy_static as lazy_static requires the code providing the constructor to have been called at least once. But when dealing with cross-crate data like this, it's natural to only ever specifically call logger-syslog-adapter when setting up and serializing the data, not when deserializing it.
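
Here's a minimal std-only sketch of that situation. It is not how typetag is actually implemented; registry, register_syslog_logger, and deserialize_logger are hypothetical names, and the explicit register_syslog_logger call stands in for what a global constructor would do automatically before main:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Trait defined by the hypothetical `logger` crate.
pub trait SerializableLogger {
    fn kind(&self) -> &'static str;
}

type LoggerCtor = fn() -> Box<dyn SerializableLogger>;

/// Lazily created registry of known implementors, keyed by tag.
fn registry() -> &'static Mutex<HashMap<&'static str, LoggerCtor>> {
    static REGISTRY: OnceLock<Mutex<HashMap<&'static str, LoggerCtor>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Concrete type living in the hypothetical `logger-syslog-adapter` crate.
pub struct SyslogLogger;

impl SerializableLogger for SyslogLogger {
    fn kind(&self) -> &'static str {
        "syslog"
    }
}

fn make_syslog_logger() -> Box<dyn SerializableLogger> {
    Box::new(SyslogLogger)
}

/// The body a global constructor would run. Nothing in `logger` calls this,
/// so a lazy static inside the adapter crate would never get triggered.
pub fn register_syslog_logger() {
    registry().lock().unwrap().insert("syslog", make_syslog_logger);
}

/// What `logger` does when it sees a tag in the serialized configuration.
pub fn deserialize_logger(tag: &str) -> Option<Box<dyn SerializableLogger>> {
    let ctor = registry().lock().unwrap().get(tag).copied()?;
    Some(ctor())
}
```

Calling deserialize_logger("syslog") before register_syslog_logger has run returns None, which is the failure mode I'm describing: the lookup only works if something outside logger ran the registration first.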

Just want to say I'm super glad to have a different viewpoint on this. I'm not too experienced with these, and I'm really glad to have your input on this!

Done.

Thoughts on using #[unsafe_global_constructor] instead? I wanted to include the word unsafe in some way since, if we don't fix them being before stdlib initialization, they can break things. But I can see how unsafe fn is the opposite of this.

I... kind of get this, but isn't using libraries at all a security concern?

If an end user depends on a library, then it seems reasonable to assume that they are calling at least one of that library's functions. Sure, it might make debugging more annoying if the library's doing something odd on program initialization, but if we're depending on it and including its code in the end binary, I think we're trusting the library.

When reviewing a library, I would think global constructors should be able to stand out. If nothing else, keeping them unsafe in some form or another should highlight them compared to other (safe) code.

If we expect global constructors to be a niche feature, I would agree with you on this. But I feel like the more libraries use it for small things (like typetag traits), the less this would mean.

What would you propose the behavior be when the user doesn't call run_all_hooks()? If not calling it is an error, new users will probably just stick it in there anyway, and requiring unsafe just to use various libraries will devalue unsafe.

If it's silently allowed, or even with a warning, then suddenly parts of crates people depend on might just not work. If this is used for FFI initialization, we could end up in unsound territory, or if it's just for things like typetag, then deserialization could just fail at runtime.

Unrelated to the above, but I hadn't thought of using LLVM as a downside. I'll add that in.

Sounds reasonable- if this ends up as a full RFC, having it only for one platform would be... bad. I was thinking of this as an alternative for testing, but I guess that's still bad language design.

I usually would too, but I would hope the global aspect of it offsets that. I'm opposed to initializer because of its connotation with initializing a specific value somewhere. Keeping away from initializer would also help differentiate this from C++ static initializers, which are indeed intended to initialize a single static value.

If this initializes a crate, though, it kind of is constructing the crate's global state. I guess that's fairly similar to initializing the global state...

I'll add #[register_main_hook] in as an alternative, but if I found this somewhere in the code I think I'd have even less of an idea of what it does than global_constructor or _initializer.

I mean, I haven't thought through this? Good questions. I will try to add to these sections if I or anyone else comes up with reasons for either side.

This is one use case, but not my primary one. I will elaborate in the edited post.

Thank you for linking this!

I am excited about the possibility of this alternative, and will have to look into it more.

I'm planning on at least expanding this pre-RFC a bit further in its current direction to collect this knowledge and try to explore it? But you're right, the solution here doesn't really match up with the problem. I started approaching this from the point of view of "LLVM has a global_ctors attribute, and we use rust-ctor to solve the problem, so using rustc to take advantage of global_ctors seems like a good idea", not necessarily looking for the best language design solution.

My understanding is that this kind of built-in data store is less charted territory, but that's not necessarily bad. If we can have our cross-crate-coordination cake and (with no runtime cost) eat it too, that would be pretty great.

Adding this to unresolved questions. It seems like this could depend on whether we implement this using LLVM's global_ctors, or as part of the main shim if that proves useful for running after stdlib initialization?

I will be attempting to understand & follow links on @comex's explanation of this. (thanks for that!)

I think this ties into @dtolnay's post above.

My personal use case is directly just "using typetag"...

Just looking at rust-ctor's dependent crates for ideas, there's also

emacs using this to create ctors which will run when a rust plugin's output shared library is loaded
rust-pretty-assertions using it to set up ANSI escape codes on the Windows console

Not sure how useful that is. I can see the ANSI escape code setup being useful, but unless I'm misunderstanding, a std::sync::Once check could probably work too? Maybe it's bad to do that initialization when panicking?
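
Something like the following is what I have in mind for the Once version. This is only a sketch: enable_ansi_support is a hypothetical stand-in that just counts calls rather than touching the real console, and colorize is a made-up entry point, not anything from rust-pretty-assertions:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static ANSI_INIT: Once = Once::new();

/// Counts how often the setup ran, so the sketch can be checked.
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the platform-specific console setup.
fn enable_ansi_support() {
    SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
}

/// Every public entry point that emits escape codes guards itself with the
/// same `Once`, so the setup runs lazily on first use instead of before main.
pub fn colorize(msg: &str) -> String {
    ANSI_INIT.call_once(enable_ansi_support);
    format!("\x1b[31m{msg}\x1b[0m")
}

/// Number of times the setup actually ran.
pub fn setup_runs() -> usize {
    SETUP_RUNS.load(Ordering::SeqCst)
}
```

The cost compared to a constructor is that every entry point needs the guard, and the first call (possibly in the middle of a panic) pays for the setup.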

With the emacs crate, it looks like this would still want to use rust-ctor to get literal ctors even if we get support for global constructors in binary crates...

bill_myers:

The problem of this mechanism is that without run order guarantees code in global constructors will trigger undefined behavior if it calls any module that requires its own global constructors to run first.

This means that effectively global constructors cannot use any code from any other crate, since that crate may be changed to rely on global constructors without an ABI break, which is pretty limiting.

So I think it’s essential to provide a deterministic run order: a crate’s constructors must run before the ones in any crate that depends on it, and within a single crate they must run in the order they are found by a simple parsing that goes inside submodule.

While this guarantees no inter-crate UB, the developer must be careful to not use anything in the current crate that requires a global constructor to be run.

EDIT: it seems like that with this guarantee (and possibly some stdlib changes), unsafe is not required, although in most cases global constructors will be declared unsafe since they will modify static mut data.

I guess my hope here was that by having global constructors be standalone functions rather than explicitly initializing variables, it would make it much harder to create a library which invokes undefined behavior when called before the global constructor is called. Like, we won't ever have any uninitialized statics unless someone explicitly uses MaybeUninit.
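
As a small sketch of what I mean (PLUGINS, plugin_ctor, and plugin_count are hypothetical names): a safe static always has a const initializer, so code that runs before the constructor sees an empty but valid value rather than uninitialized memory.

```rust
use std::sync::Mutex;

/// A safe static must be given a value at compile time, so it is valid
/// even if no constructor has run yet.
static PLUGINS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// The body of a hypothetical global constructor: it only adds to state
/// that already exists.
pub fn plugin_ctor() {
    PLUGINS.lock().unwrap().push("syslog");
}

/// Safe to call at any point; before the constructor it simply reports 0.
pub fn plugin_count() -> usize {
    PLUGINS.lock().unwrap().len()
}
```

Running things in the wrong order here gives a wrong answer (0 plugins), which is a logic bug rather than undefined behavior.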

I'd argue that if we can encourage abstractions over this feature enough, leaving the order undefined could be entirely fine. In my perfect world no constructors would ever depend on one another.... Of course, that world won't exist.

One thing I'm worried about if we define the order, though, is seemingly arbitrary changes changing it.

For example, I'm extremely worried about becoming dependent on runtime order of global constructors within a crate- especially if that depends on something like the names of modules, or what order they're declared in. There is exactly one feature right now which depends on module declaration order, and that is macro declaration. When macro declaration fails though, we will get explicit compile time errors.

If a crate has an implicit dependency of one module's global constructor running before another, this all becomes much more hairy. What if rustfmt reorders the mod declarations? What if in refactoring, one module is renamed, making it sort differently and putting it above the other? These seemingly entirely innocent changes could break code depending on this order, and the error wouldn't be discovered till runtime.

Sure, this could happen with an undefined order too. However, a "defined" order which depends on easily changeable things like module order could lull users into a false sense of security worse than just not knowing any order at all.
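
A toy model of the scenario, with constructors simulated as plain functions run in a list (Globals, config_ctor, plugin_ctor, and run_ctors are all hypothetical; nothing here uses a real constructor mechanism):

```rust
/// Simulated global state touched by two constructors.
#[derive(Default)]
pub struct Globals {
    pub config_ready: bool,
    pub plugin_saw_config: bool,
}

/// Constructor in module `config`: sets up the configuration.
fn config_ctor(g: &mut Globals) {
    g.config_ready = true;
}

/// Constructor in module `plugin`: implicitly assumes `config` already ran.
fn plugin_ctor(g: &mut Globals) {
    g.plugin_saw_config = g.config_ready;
}

/// Runs the constructors in whatever order the "compiler" decided on.
pub fn run_ctors(ctors: &[fn(&mut Globals)]) -> Globals {
    let mut g = Globals::default();
    for ctor in ctors {
        ctor(&mut g);
    }
    g
}

/// Order as originally declared: `mod config;` before `mod plugin;`.
pub fn declared_order() -> Globals {
    let ctors: [fn(&mut Globals); 2] = [config_ctor, plugin_ctor];
    run_ctors(&ctors)
}

/// Order after an innocent-looking reorder or rename of the modules.
pub fn after_reordering() -> Globals {
    let ctors: [fn(&mut Globals); 2] = [plugin_ctor, config_ctor];
    run_ctors(&ctors)
}
```

Both orders compile identically; only the second leaves plugin_saw_config false, and nothing reports that until the program misbehaves at runtime.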

Ehhh, or at least that's my scenario. Maybe that's unrealistic?

There are a few more posts that I haven't responded to here, will try to do that. Glad to have many more ideas in here! Hope this hasn't been too rambly.

As others have mentioned, something like the proposal in "Idea: global static variables extendable at compile-time" (post #2, by dtolnay) might be a better idea. But if it is, I think exploring this one fully still has value.

Again, thanks!
