Pre-RFC: Add language support for global constructor functions

Nit: although C doesn’t automatically assign priorities, you can specify a priority in the attribute syntax, e.g. __attribute__((constructor(0))).

(Oh, and just to reiterate, just because I’ve been making counterarguments to various arguments against this feature doesn’t mean I think the feature is a good idea. I think it may be better to start with some kind of metadata-based, type-safe “put this static into a global list” feature, and see if anyone is still clamoring for constructors after that.)

2 Likes

I think it wouldn’t be too hard to define a partial order in which global constructors are called:

#[global_ctor( ensures = [foo_loaded, bar_started] )]
fn first() {}

#[global_ctor( requires = [foo_loaded, bar_started],
               ensures = baz_created )]
fn second() {}

#[global_ctor( requires = baz_created )]
fn third() {}

This is how rustc can determine the partial order:

foo_loaded, bar_started and baz_created are booleans that are set to false at first. Rustc iterates over the global constructors and selects the first one for which all requirements are true. After that, it sets the variables in its ensures list to true. Rustc continues doing this until no global constructors remain (or returns a compiler error if some requirements can’t be fulfilled).
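To make the selection loop above concrete, here is a minimal sketch of it as ordinary Rust. This is not how rustc would implement it; the `Ctor` struct and `resolve_order` function are hypothetical names made up for illustration, and the flags are modelled as a set of strings rather than booleans.

```rust
use std::collections::HashSet;

/// Hypothetical description of one global constructor and its flags.
#[derive(Debug, Clone)]
pub struct Ctor {
    pub name: &'static str,
    pub requires: Vec<&'static str>,
    pub ensures: Vec<&'static str>,
}

/// Returns the constructor names in a runnable order, or the names of the
/// constructors that could never be scheduled (unmet or cyclic requirements).
pub fn resolve_order(ctors: &[Ctor]) -> Result<Vec<&'static str>, Vec<&'static str>> {
    // A flag is "true" once it is in this set.
    let mut satisfied: HashSet<&'static str> = HashSet::new();
    let mut remaining: Vec<&Ctor> = ctors.iter().collect();
    let mut order = Vec::with_capacity(ctors.len());

    while !remaining.is_empty() {
        // Select the first constructor for which all requirements are true.
        let next = remaining
            .iter()
            .position(|c| c.requires.iter().all(|r| satisfied.contains(r)));
        match next {
            Some(i) => {
                let ctor = remaining.remove(i);
                // Set everything in its ensures list to true.
                satisfied.extend(ctor.ensures.iter().copied());
                order.push(ctor.name);
            }
            // Nothing is runnable: this would be the compiler error.
            None => return Err(remaining.iter().map(|c| c.name).collect()),
        }
    }
    Ok(order)
}
```

Feeding it `first`, `second` and `third` from the attribute example, in any declaration order, should give back `first`, `second`, `third`.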

I dislike global constructors but I love the idea of explicitly calling global constructors! I imagine it could work well with the restrictions:

  • there is at most one function that is called before main (the #[global_constructor] defined in the bin)
  • compiler shows warnings if some crate provides a #[global_constructor] fn but it is not called - not sure if this should work for ‘all’ or ‘any’ execution path

With this restriction, why do you need it before main at all? Can’t the programmer just make this “constructor” the first thing main calls?

edit: Oh I see, @catenary showed it that way, but you two also want a warning/error if that’s forgotten.

Perhaps you could accomplish that at the type level if the constructor actually returns some kind of token, which you then require as a parameter where that initialization is required.
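A rough sketch of what I mean by a token, assuming a made-up library module `ffi_lib` with a made-up `Initialized` type. The only real trick is the private field, which stops callers from building the token without going through `init()`.

```rust
pub mod ffi_lib {
    use std::sync::atomic::{AtomicBool, Ordering};

    // Stands in for whatever global state the "constructor" sets up.
    static READY: AtomicBool = AtomicBool::new(false);

    /// Proof that `init()` has run. The private field means code outside
    /// this module cannot construct one by hand.
    #[derive(Clone, Copy, Debug)]
    pub struct Initialized(());

    /// The explicit "constructor": main is expected to call this first.
    pub fn init() -> Initialized {
        READY.store(true, Ordering::SeqCst);
        Initialized(())
    }

    /// Any API that needs the initialization asks for the token, so
    /// forgetting to call `init()` becomes a compile error, not a warning.
    pub fn do_work(_proof: Initialized, x: u32) -> u32 {
        debug_assert!(READY.load(Ordering::SeqCst));
        x * 2
    }
}
```

The cost is that the token has to be threaded through to every call site that needs it, which is the same trade-off as passing a context object around.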

2 Likes

This pattern can be used now with global or god objects that just have the entire API as methods. I don’t think it can be made to work for typetag though.

Hi! Want to thank everyone for this- there’s a lot I hadn’t considered, and I’m probably not the most qualified person to be writing this.

Regardless, here are some responses to individual points! Hope this mega-post format isn’t the worst thing ever to read through…

I’ll edit the text in the initial post too, if that’s alright.

Thank you for mentioning this! I was thinking keeping with what rust-ctor does currently might allow for a nicer implementation, but if we can get stdlib to always initialize prior to these, that’d be all better.

I’m going to optimistically edit the pre-rfc text removing this limitation!

This also brings up the question of whether these actually need to be unsafe, or not. If we can fix stdlib being accessed, there might not be anything inherently unsafe about using global constructors.

I was wanting the order to be explicitly undefined so that code wouldn’t rely on anything, but I guess having some control could be good.

My only use case for this was as a backend for the inventory crate, but initializing FFI libraries seems like a good use case for this!

Do you know if current libraries using rust-ctor could use lazy_static! or std::sync::Once instead? I guess I’m wondering if there’s anything ctors allow which nothing else does, or if it’s for the performance benefit.

Replacing lazy_static isn’t something that I was aiming at, but I can see this doing that. I’ll add it in… somewhere? Not sure.


It would certainly be interesting if we could implement something to back inventory without global constructors!

I guess that wouldn’t necessarily help with FFI initialization, but if we could solve at least one problem here without global constructors I would be for that.

I’ll add this to the alternatives - at a cursory look, it seems more different than similar to global constructors, but it could definitely be a good alternative solution.


The main problem now is that either a) only the binary crate would be able to add global constructors, or b) binary crates would be forced to call into all global constructors for all the libraries they use (recursively) and adding a new global constructor to a library would be a breaking change.

My main use case for global constructors is coordinating different libraries which don’t know about each other, and want to all add to some global data store. Like if library A uses typetag to create a serializable trait, libraries B and C should be able to add data to the “all types implementing this trait” global list without A knowing about them.

Using an explicit solution like this, any binary crate depending on B and C would have to call the global constructors for each of their serializable types manually, even if those types are purely internal and the user shouldn’t have to care about them.

The biggest disadvantage that I see, though, is that then library C can’t add a new type which uses a global constructor without a new major version. Since adding a global constructor forces all consumer crates to add a new line to their main function, anything involving global constructors becomes a breaking change.

This is true!

I haven’t mentioned lazy_static as I don’t believe it solves anything similar to the same problem, but I might not have really given a good explanation for that. It’s true that global constructors would be able to do some of the same things lazy_static can do, but they can solve one extra case: when the crate using the global data has no idea the crate providing the data exists.

I’m adding more to the pre-RFC text, but here’s another demonstration.

The best example I have is typetag. Say I have a logging crate which allows for various logger configurations, and can serialize those configurations into JSON. In my logger crate, I define a trait SerializableLogger, and use typetag on it.

Another crate, say logger-syslog-adapter, can then define a concrete implementation SyslogLogger.

When the consumer uses logger to deserialize their configuration, they want to be able to have it “just work” and deserialize it. With global constructors, typetag registers SyslogLogger into a static list of implementors of SerializableLogger. Then when the configuration is deserialized, if a syslog logger was specified, SyslogLogger is automatically grabbed and used as the logger for the Box<dyn SerializableLogger>, without logger ever mentioning logger-syslog-adapter in its source code.

This would have been impossible to implement with lazy_static as lazy_static requires the code providing the constructor to have been called at least once. But when dealing with cross-crate data like this, it’s natural to only ever specifically call logger-syslog-adapter when setting up and serializing the data, not when deserializing it.
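Here is a small sketch of that failure mode using only std. `std::sync::Once` stands in for lazy_static, and the names (`LOGGER_KINDS`, `is_known`, `syslog_adapter`) are invented for the example; this is not how typetag works internally.

```rust
use std::sync::Mutex;

// Lives in the `logger` crate: the list of logger kinds it can deserialize.
static LOGGER_KINDS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// What deserialization would consult when it sees a "kind" field.
pub fn is_known(kind: &str) -> bool {
    LOGGER_KINDS.lock().unwrap().iter().any(|k| *k == kind)
}

// Stands in for the separate `logger-syslog-adapter` crate.
pub mod syslog_adapter {
    use super::LOGGER_KINDS;
    use std::sync::Once;

    static REGISTER: Once = Once::new();

    /// Lazy registration: runs at most once, but only if somebody calls it.
    pub fn ensure_registered() {
        REGISTER.call_once(|| LOGGER_KINDS.lock().unwrap().push("syslog"));
    }
}
```

Until something calls `syslog_adapter::ensure_registered()`, `is_known("syslog")` is false, and `logger` has no way of knowing that it should make that call.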


Just want to say I’m super glad to have a different viewpoint on this. I’m not too experienced with these, and I’m really glad to have your input on this!

Done.

Thoughts on using #[unsafe_global_constructor] instead? I wanted to include the word unsafe in some way since, if we don’t fix them being before stdlib initialization, they can break things. But I can see how unsafe fn is the opposite of this.

I… kind of get this, but isn’t using libraries at all a security concern?

If an end user depends on a library, then it seems reasonable to assume that they are calling at least one of that library’s functions. Sure, it might make debugging more annoying if the library’s doing something odd on program initialization, but if we’re depending on it and including its code in the end binary, I think we’re trusting the library.

When reviewing a library, I would think global constructors should be able to stand out. If nothing else, keeping them unsafe in some form or another should highlight them compared to other (safe) code.

If we expect global constructors to be a niche feature, I would agree with you on this. But I feel like the more libraries use it for small things (like typetag traits), the less this would mean.

What would you propose the behavior be when the user doesn’t call run_all_hooks()? If we have not calling it being an error, new users will probably just stick it in there anyways - and requiring use of unsafe just to use various libraries will devalue unsafe.

If it’s silently allowed, or even with a warning, then suddenly parts of crates people depend on might just not work. If this is used for FFI initialization, we could end up in unsound territory, or if it’s just for things like typetag, then deserialization could just fail at runtime.

Unrelated to the above, but I hadn’t thought of using LLVM as a downside. I’ll add that in.

Sounds reasonable- if this ends up as a full RFC, having it only for one platform would be… bad. I was thinking of this as an alternative for testing, but I guess that’s still bad language design.

I usually would too, but I would hope the global aspect of it offsets that. I’m opposed to initializer because of its connotation of initializing a specific value somewhere. Keeping away from initializer would also help differentiate this from C++ static initializers, which are indeed intended to initialize a single static value.

If this initializes a crate, though, it kind of is constructing the crate’s global state. I guess that’s fairly similar to initializing the global state…

I’ll add #[register_main_hook] in as an alternative, but if I found this somewhere in the code I think I’d have even less of an idea of what it does than global_constructor or _initializer.

I mean, I haven’t thought through this? Good questions. I will try to add to these sections if I or anyone else comes up with reasons for either side.

This is one use case, but not my primary one. I will elaborate in the edited post.

Thank you for linking this!

I am excited about the possibility of this alternative, and will have to look into it more.

I’m planning on at least expanding this pre-RFC a bit further in its current direction to collect this knowledge and try to explore it? But you’re right, the solution here doesn’t really match up with the problem. I started approaching this from the point of view of “LLVM has a global_ctors attribute, and we use rust-ctor to solve the problem, so using rustc to take advantage of global_ctors seems like a good idea”, not necessarily looking for the best language design solution.

My understanding is that this kind of built-in data store is less charted territory, but that’s not necessarily bad. If we can have our cross-crate-coordination cake and (with no runtime cost) eat it too, that would be pretty great.

Adding this to unresolved questions. It seems like this could depend on whether we implement this using LLVM’s global_ctors, or as part of the main shim if that proves useful for running after stdlib initialization?

I will be attempting to understand & follow links on @comex’s explanation of this. (thanks for that!)

I think this ties into @dtolnay’s post above.

My personal use case is directly just “using typetag”…

Just looking at rust-ctor’s dependent crates for ideas, there’s also

Not sure how useful that is. I can see the ANSI escape code setup being useful, but unless I’m misunderstanding a std::sync::Once check could probably work too? Maybe bad to do that initialization when panicking?

With the emacs crate, it looks like this would still want to use rust-ctor to get literal ctors even if we get support for global constructors in binary crates…

I guess my hope here was that by having global constructors be standalone functions rather than explicitly initializing variables, it would make it much harder to create a library which invokes undefined behavior when called before the global constructor is called. Like, we won’t ever have any uninitialized statics unless someone explicitly uses MaybeUninit.

I’d argue that if we can encourage abstractions over this feature enough, leaving the order undefined could be entirely fine. In my perfect world no constructors would ever depend on one another… Of course, that world won’t exist.

One thing I’m worried about if we define the order, though, is seemingly arbitrary changes changing it.

For example, I’m extremely worried about becoming dependent on runtime order of global constructors within a crate- especially if that depends on something like the names of modules, or what order they’re declared in. There is exactly one feature right now which depends on module declaration order, and that is macro declaration. When macro declaration fails though, we will get explicit compile time errors.

If a crate has an implicit dependency of one module’s global constructor running before another, this all becomes much more hairy. What if rustfmt reorders the mod declarations? What if in refactoring, one module is renamed, making it sort differently and putting it above the other? These seemingly entirely innocent changes could break code depending on this order, and the error wouldn’t be discovered till runtime.

Sure, this could happen with an undefined order too. However, a “defined” order which depends on easily changeable things like module order could lull users into a false sense of security worse than just not knowing any order at all.

Ehhh, or at least that’s my scenario. Maybe that’s unrealistic?


There are a few more posts that I haven’t responded to here, will try to do that. Glad to have many more ideas in here! Hope this hasn’t been too rambly.

As others have mentioned, something like Idea: global static variables extendable at compile-time might be a better idea. But if it is, I think exploring this one fully still has value.

Again, thanks!

2 Likes

No, typetag needs to create a list of impls of a trait. lazy_static and Once only run when explicitly asked to run, while typetag needs all impls in the whole program. To get that you would need global constructors, as they are the only way to run something without explicitly telling it to run.

1 Like

I too have been burned by C++ global ctors 👴 and because of this, I would also rather see us go after distributed_slice-like features first.

But I also have an evil idea which I can’t resist suggesting: the order of execution of global ctors is specified, but what the spec says is, they will be executed in a different randomly-chosen order on every run of the executable. Thus, if they’re not all independent, you have a good chance of catching it during QA, and nothing comes to depend on some unspecified-but-usually-stable order.
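For the curious, a sketch of what that could look like with no external crates. `run_shuffled` is a made-up name, the seed is borrowed from the randomly keyed `RandomState`, and the xorshift generator is only there to drive the shuffle; none of this is meant as a serious proposal for the runtime.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Runs every constructor exactly once, in an order that is reshuffled
/// each time this function is called.
pub fn run_shuffled(ctors: &mut [fn()]) {
    // RandomState is randomly keyed, so hashing nothing still yields a
    // value that differs between runs. `| 1` keeps the state non-zero.
    let mut state = RandomState::new().build_hasher().finish() | 1;

    // Fisher-Yates shuffle driven by a small xorshift generator.
    for i in (1..ctors.len()).rev() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let j = (state % (i as u64 + 1)) as usize;
        ctors.swap(i, j);
    }

    for ctor in ctors.iter() {
        ctor();
    }
}
```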

4 Likes

Ah, sorry! I was trying to ask about libraries using rust-ctor for FFI initialization here.

I’m wondering if moving forward with distributed_slice instead of this would leave other libraries besides inventory and things depending on it without a cross-platform solution.

1 Like

I would like to provide my feedback on this pre-RFC (well, on the feature in general, rather) since it is very relevant to what we need.

Our main use-case currently is our error handling. In short, we have &dyn failure::Fail (which will be &dyn std::error::Error eventually) and we need to “adapt” it to some other “&dyn ExtraInfo” trait. The crux of the issue here is that we need to know which concrete error types we have so we can check against them (via failure::Fail::downcast_ref). However, at the same time, the size of our system is such that we can no longer rely on manual registration of all these errors: it’s too error-prone.

Another use-case would be our deserialization framework: it’s somewhat similar to typetag crate in its nature. Currently we still rely on “manual” registration, but this is blocking future work on making it more modular (for example, we might try dylibs for pluggable types).

Finally, we are starting to shape parts of the system where we would have “application” developers building “plugins” for our core: again, manual registration becomes too error prone.

Currently we use the ctor crate (I didn’t know about inventory/linkme before). I think linkme would perhaps be the best solution for us, though.

Personally, I would be in favor of this design going in the direction of the linkme crate, for the following reasons (all of them are mentioned in this thread):

  • dodges ordering issue (well, offloads it to the consumer – it’s up to you to figure out the order you actually want / if you care!).
  • avoids running arbitrary code without you noticing (though, I also buy the whole argument of trusting libraries – of course, they can still do sketchy things!)
  • still allows for running “constructors” via Once/lazy_static!, if desired.
  • it generally seems to be something that would be easier to agree on?

I would say that it was a pretty frustrating experience figuring out our current solution, and outside of the ctor/inventory/linkme technique (“linker magic”), it seems like there are no good alternatives. However, all three of these have the same subtle failure mode in edge cases, which @Comex rightfully mentioned in this thread.

4 Likes

It is a good idea for an RFC, but I believe the RFC should state how no_std environments will be able to solve the problem too, or, if it cannot be solved, that should be stated in the RFC. Rather than having the user provide multiple functions marked with special attributes, it might also be better to have a single global initialization hook. Since the idea is to mark free functions rather than to allow non-const static initialization, I don’t believe there is a need for multiple functions in a single crate like that.

but all statics must still have some sane and initialized default.

Not necessarily: since we have no concept of running destructors for statics, a static can actually be allowed to be uninitialized.

Go does this for its hashmaps: iteration (as implemented, not as specified) occurs in a random order per process.

This makes it impossible to write a macro-by-example implementing gflags in Rust. I argue that gflags are the premier use-case for library ctors.

Rust’s HashMap does this too, at least with the default hasher, as it’s randomly seeded.

Ah, I couldn’t remember if Rust did. I vaguely remembered some bug about collision attacks but I couldn’t remember if it was related.

The interaction with no_std is definitely something that would need to be figured out!

If this is implemented using the same infrastructure as C++ static initializers, then the infrastructure they would run on wouldn’t depend on allocation or anything else in std. It could even still run without the rust initialization in #![no_start].

I guess if we’re implementing it as part of the rust start code, then there’s more to figure out here. Thanks for bringing it up!

With the multiple functions, this is necessary in order to allow things like inventory to function. If we were only allowed one function per crate, there wouldn’t be a reasonable way for an attribute-based macro to register multiple things on independent items to run on startup. As this kind of “registration” is one of the main use cases, I would be inclined to say that having multiple functions is needed.

I just mean this as far as it is true in rust in general. The following isn’t valid rust, and has never been valid:

static X: &str;

Global constructors wouldn’t allow you to write this either - all statics would need to be initialized to something, still. Even std::mem::uninitialized() is non-const, so I don’t think it’s possible at all to do this?

Maybe using MaybeUninit, but even then, it’s not really “uninitialized” since unsafe code is required to read it later. This is a concern, but I don’t think it’s a large one.

To be clear, I’m proposing not changing these rules. Global constructors would not introduce the ability to have uninitialized statics, nor take it away (since we don’t currently have it).


With all that said, I think going forward with a distributed_slice or linkme-based method is probably more sane. I’ve not put together a proposal, but I’ll probably be trying to do that later next month or sometime soon.

2 Likes

If this is implemented using the same infrastructure as C++ static initializers, then the infrastructure they would run on wouldn’t depend on allocation or anything else in std. It could even still run without the rust initialization in #![no_start].

If it would work out of the box that would be ideal, but I believe it leaves concerns about depending on std, as some statics may need memory allocations and friends. It has been a while since I looked at the Rust runtime, but most likely panics would be UB in this case, while heap allocations should still be allowed (need to check at which step the global allocator is set up).

Even std::mem::uninitialized() is non-const, so I don’t think it’s possible at all to do this?

I believe better support for global constructors would require it to become const, let’s say as an extra feature. While it is true we can use Option and friends, that is overhead. In short, ideally we need a const way to have an uninitialized static for the global constructor to initialize.

1 Like

mem::uninitialized is on its way to being deprecated, so I wouldn’t count on it. Instead you could use mem::MaybeUninit, which has no overhead, to deal with uninitialized data.

2 Likes

mem::MaybeUninit is not const friendly yet either. It doesn’t really matter which one we’d use, but it should eventually be const friendly.

1 Like

I intentionally excluded this feature from the RFC - as I understand it, allowing access to uninitialized constants when initializing other constants is one of the main bad things about C++'s static initializers. I hoped that by not allowing global constructors to initialize statics (forcing use of a mutable static with Option or similar instead), we could avoid that problem.
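As a sketch of the “Option or similar” shape I have in mind: the static below always holds a valid value, and the constructor only ever swaps `None` for `Some`. I’m using `Mutex<Option<_>>` here instead of `static mut` so the example stays in safe code; `GREETING`, `init_greeting` and `greeting` are made-up names.

```rust
use std::sync::Mutex;

// Always initialized: it just starts out in a "not set yet" state.
static GREETING: Mutex<Option<String>> = Mutex::new(None);

/// What a global constructor (or an explicit init call) would run.
pub fn init_greeting() {
    *GREETING.lock().unwrap() = Some(String::from("hello"));
}

/// Readers have to handle the "constructor has not run" case explicitly,
/// instead of observing an uninitialized value.
pub fn greeting() -> Option<String> {
    GREETING.lock().unwrap().clone()
}
```

The overhead is the extra discriminant and the lock, which is the cost being weighed against the safety here.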

What’s your use case for uninitialized constants in particular?

3 Likes

when initializing other constants is one of the main bad things about C++'s static initializers.

There is nothing bad about it, it is just you cannot depend on initialization order of statics.

I don’t understand why you want to stop the user from intentionally using uninitialized on global statics. How is it different from using it on non-global variables? It is not different: when using unsafe, the user takes responsibility for using the uninitialized variable properly when relying on global constructor functions.

What’s your use case for uninitialized constants in particular?

Initializing a global variable that lacks a const fn initializer (ideally I’d like to avoid mut for global statics that need one-time initialization)