Hi! I've written up a WIP RFC for adding global constructors, otherwise known as ctors or static initializers.
Crates like typetag currently use these constructors, but only support our tier-1 platforms because they depend on rust-ctor. By adding compiler support, we could expand this to all platforms.
If anyone could give feedback, either to the wording or the ideas, it would be greatly appreciated!
I've tried to fill in as much detail as I can, but it's fairly unrefined. I plan on refining this a bit then submitting an RFC pull request, if all goes well.
EDIT: This is the second version of the pre-RFC. I edited it on April 29th following replies; it does not yet incorporate all feedback, but it does incorporate some. There should be a little pencil icon on this post (top right, I think?) which allows seeing the older versions.
- Feature Name: global_constructor_functions
- Start Date: 2019-04-18
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
Summary
Adds an attribute #[unsafe_global_constructor] marking a function to be run as part of program initialization, before main is called.
Global constructors will be run after the standard library has been initialized, but in an otherwise unspecified order. It is unsafe to declare a global constructor, and direct use will be discouraged in favour of wrapping libraries.
Motivation
There are three main motivations for this. The first is to enable collecting items from various crates into one place, enabling higher-level crates like typetag to function on all platforms.
There is currently no cross-platform way to coordinate between crates which do not directly reference each other. The current solution is the inventory crate, the crate backing typetag. It supports first-tier platforms using rust-ctor.
For other kinds of initialization, we might be able to use a crate like lazy_static. But this won't work for cross-crate coordination, because the crate using the data has no idea the crates providing the data exist. The key idea behind inventory is that crate a can define a data store, then crates b and c, depending on a, can add to that store.
Then an end user using both a and b can just know that a fully works for b's data without having to explicitly initialize it. The biggest example of this is, again, typetag. Crate a can define a serializable trait, then end users can deserialize data directly into a trait object. The underlying concrete struct is defined in b or c, but a::Trait::deserialize can still deserialize it.
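Here is a minimal single-file sketch of the a/b/c pattern, assuming the proposed attribute existed. Modules a, b and c stand in for the three crates. simulate_program_start is a hypothetical stand-in for the startup code that would call every marked function before main. The Circle/Square names are made up for illustration. Because #[unsafe_global_constructor] does not exist today, the constructors are plain unsafe fns and the attribute appears only in comments.

```rust
// Stand-in for crate `a`: owns the data store and knows nothing about `b` or `c`.
mod a {
    use std::sync::Mutex;

    // Const-initialized, so it holds a valid (empty) value before any constructor runs.
    static REGISTRY: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

    pub fn register(name: &'static str) {
        REGISTRY.lock().unwrap().push(name);
    }

    // Sorted, because constructor order is unspecified and must not leak out.
    pub fn registered() -> Vec<&'static str> {
        let mut names = REGISTRY.lock().unwrap().clone();
        names.sort();
        names
    }
}

// Stand-in for crate `b`, which depends on `a`.
mod b {
    // Would be marked #[unsafe_global_constructor] under this proposal.
    pub unsafe fn register_circle() {
        crate::a::register("b::Circle");
    }
}

// Stand-in for crate `c`, which also depends on `a`.
mod c {
    // Would be marked #[unsafe_global_constructor] under this proposal.
    pub unsafe fn register_square() {
        crate::a::register("c::Square");
    }
}

// Hypothetical: what compiler-generated startup code would do before `main`.
fn simulate_program_start() {
    let ctors: [unsafe fn(); 2] = [b::register_circle, c::register_square];
    for ctor in ctors {
        unsafe { ctor() };
    }
}

fn main() {
    simulate_program_start();
    println!("{:?}", a::registered()); // ["b::Circle", "c::Square"]
}
```

The key property is that a never names b or c. Only the (simulated) startup code knows about them, and that is the coordination rust-ctor currently provides on a few platforms.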
The currently existing mechanism for creating global constructors is rust-ctor. It works very similarly to this RFC, but with the disadvantages of only supporting Linux, Windows and Mac and running constructors before std is initialized. With compiler support, we could extend this to support all platforms.
One particular motivating factor is the desire for typetag support on the wasm32-unknown-unknown platform. See rustwasm/wasm-bindgen#1216, mmastrac/rust-ctor#14 and dtolnay/typetag#8.
The second motivation is to allow initializing FFI libraries without synchronization, either with std::sync::Once or lazy_static.
This is a less strong motivation, as we can currently
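For comparison, this sketch shows roughly what lazy, synchronized FFI initialization looks like today with std::sync::Once. ffi_library_init, INIT_CALLS and do_library_work are hypothetical stand-ins for a C library's init routine, a call counter and a library entry point. With a global constructor, ffi_library_init could run once at startup instead of every entry point having to guard it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Hypothetical counter recording how often the C library was initialized.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static INIT: Once = Once::new();

// Hypothetical stand-in for an FFI call to a C library's init routine.
fn ffi_library_init() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

// Today: every entry point must go through this synchronized check first.
fn ensure_initialized() {
    INIT.call_once(ffi_library_init);
}

// A hypothetical library entry point; returns how many times init has run.
fn do_library_work() -> usize {
    ensure_initialized();
    INIT_CALLS.load(Ordering::SeqCst)
}

fn main() {
    do_library_work();
    println!("{}", do_library_work()); // 1: initialized exactly once
}
```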
Guide-level explanation
Most Rust code runs "after main": every function that runs was called by some other function, which was in turn called by another, and so on, back to the main function which started it all.
This makes sense for most cases, but in some cases, having separate entry points which can initialize global structures before main can be useful. Global constructors fill this niche.
Consider this code:
use std::sync::atomic::{AtomicU8, Ordering};
static my_atomic_var: AtomicU8 = AtomicU8::new(31);
#[unsafe_global_constructor]
unsafe fn run_this_first() {
my_atomic_var.store(32, Ordering::SeqCst);
}
fn main() {
println!("Hello, world! My variable is {}", my_atomic_var.load(Ordering::SeqCst));
}
This is an example of declaring a global constructor. run_this_first will be called during program initialization, separately from main. When multiple global constructors exist, it's unspecified which runs first.
At compile time, we initialize my_atomic_var to 31. Also at compile time, the list of global constructors is created, and it includes our function run_this_first.
When our program runs, the global constructors are loaded first, and run in some order. Among these is run_this_first, and my_atomic_var is set to 32.
Finally, we enter main, and observe the variable as 32.
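The sequence above can be simulated in today's Rust. In this sketch, GLOBAL_CTORS stands in for the list the compiler would build (for example, LLVM's @llvm.global_ctors). run_global_ctors and user_main are hypothetical stand-ins for the startup code and for main. The static is renamed MY_ATOMIC_VAR to follow Rust's naming convention for statics.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Initialized at compile time to 31, as in the example above.
static MY_ATOMIC_VAR: AtomicU8 = AtomicU8::new(31);

// Would carry #[unsafe_global_constructor] under this proposal.
unsafe fn run_this_first() {
    MY_ATOMIC_VAR.store(32, Ordering::SeqCst);
}

// Hypothetical: the list the compiler would build at compile time.
static GLOBAL_CTORS: &[unsafe fn()] = &[run_this_first];

// Hypothetical: startup code that runs every listed constructor, in some order.
fn run_global_ctors() {
    for &ctor in GLOBAL_CTORS {
        unsafe { ctor() };
    }
}

// Stands in for the user's `main`.
fn user_main() -> u8 {
    MY_ATOMIC_VAR.load(Ordering::SeqCst)
}

fn main() {
    run_global_ctors();
    println!("Hello, world! My variable is {}", user_main()); // ... is 32
}
```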
Great, right? It is, but we have to be careful.
Global constructors run before main, in an unspecified order relative to each other. They should never use std::io, and should never panic. To recognize this unsafety, all #[unsafe_global_constructor] functions must be declared unsafe.
Finally, note that creating global constructors should be avoided whenever possible. Any nontrivial computation can slow down program startup, and the utmost care must be taken to ensure all code is sound.
Most Rust code, outside of a few support libraries, is expected to be completely free of global constructors. If this kind of initialization is needed, it's recommended to use a higher-level library like inventory. Other crates might use this without ever realizing it, when they use crates like typetag.
Since using #[unsafe_global_constructor] directly is discouraged, I don't anticipate this feature being taught to new Rust programmers.
Reference-level explanation
Any otherwise-plain unsafe Rust function may be marked with the #[unsafe_global_constructor] attribute. When marked, it will be added to a list internal to the compiler, and included as a global constructor to be run before main on program launch.
For example,
#[unsafe_global_constructor]
unsafe fn my_constructor() {}
Internally, this will add the function to LLVM's @llvm.global_ctors list.
Global constructor functions will be allowed in all crates, and in all modules (including inside other functions). The privacy of a global constructor function will not affect whether it is a global constructor.
Global constructor functions must:
- be declared unsafe
- take zero arguments
- be monomorphic
- return ()
Marking a function #[unsafe_global_constructor] without satisfying these requirements is a compile-time error.
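One way to read the signature requirements, offered as an interpretation rather than as part of the proposal: a function that meets them can be named without generic arguments and coerce to the pointer type unsafe fn(). That is the no-argument, no-return shape of an entry in a platform constructor list. GlobalCtor, accepts_as_ctor, RAN and the example functions below are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical: the pointer type every global constructor must coerce to.
type GlobalCtor = unsafe fn();

static RAN: AtomicBool = AtomicBool::new(false);

// Unsafe, zero arguments, not generic, returns (): coerces to `GlobalCtor`.
unsafe fn ok_ctor() {
    RAN.store(true, Ordering::SeqCst);
}

// Each of these would be rejected if marked #[unsafe_global_constructor].
#[allow(dead_code)]
unsafe fn takes_args(_x: u32) {} // takes an argument
#[allow(dead_code)]
unsafe fn generic<T>() {} // not monomorphic
#[allow(dead_code)]
unsafe fn returns_value() -> i32 { 0 } // does not return ()

// Hypothetical check: only accepts functions shaped like a constructor.
fn accepts_as_ctor(f: GlobalCtor) -> GlobalCtor {
    f
}

fn main() {
    let ctor = accepts_as_ctor(ok_ctor);
    unsafe { ctor() };
    // accepts_as_ctor(takes_args);    // does not compile: wrong signature
    // accepts_as_ctor(returns_value); // does not compile: wrong return type
    // accepts_as_ctor(generic);       // does not compile: `T` cannot be inferred
    println!("{}", RAN.load(Ordering::SeqCst)); // true
}
```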
All global constructor functions in all included crates will be called, but the order is explicitly unspecified. There are no guarantees, even for functions declared directly adjacent in the same module.
Global constructor functions may be under a #[cfg] flag, and this will behave as expected. The following constructor will be called if the code is compiled with the construct_things feature, and won't be if it isn't.
#[cfg(feature = "construct_things")]
#[unsafe_global_constructor]
unsafe fn my_constructor() {}
Global constructor functions must not be under #[target_feature]. Global constructor functions are always called, so it wouldn't make sense to have code which can't run on some of the target machines. For example, both of the following will result in compile-time errors:
#[target_feature(enable = "sse3")]
#[unsafe_global_constructor]
unsafe fn bad_ctor() {
}
#[target_feature(enable = "sse3")]
mod my_sse3_module {
    #[unsafe_global_constructor]
    unsafe fn second_bad_ctor() {
    }
}
Drawbacks
This introduces life-before-main into rust, something we explicitly avoided in the early days. See this quote from the old website's FAQ (inlined for posterity):
Does Rust allow non-constant-expression values for globals?
No. Globals cannot have a non-constant-expression constructor and cannot have a destructor at all. Static constructors are undesirable because portably ensuring a static initialization order is difficult. Life before main is often considered a misfeature, so Rust does not allow it.
See the C++ FQA about the "static initialization order fiasco", and Eric Lippert's blog for the challenges in C#, which also has this feature.
This brings up a problem which is still relevant today:
- Like C++'s static initializers, global constructors will have an unspecified run order. Since we can't initialize (only change) statics in global constructors, the danger is somewhat mitigated, but not entirely. Two global constructors could still rely on run order and introduce subtle bugs.
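Here is a minimal sketch of that hazard. It simulates two would-be constructors by calling them in both orders by hand. BASE, DERIVED, init_base, init_derived and run_in_order are hypothetical names. Under the proposal either order is allowed, so init_derived's hidden assumption is a latent bug.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static BASE: AtomicU32 = AtomicU32::new(0);
static DERIVED: AtomicU32 = AtomicU32::new(0);

// Would-be constructor 1: sets up BASE.
unsafe fn init_base() {
    BASE.store(10, Ordering::SeqCst);
}

// Would-be constructor 2: silently assumes init_base has already run.
unsafe fn init_derived() {
    DERIVED.store(BASE.load(Ordering::SeqCst) * 2, Ordering::SeqCst);
}

// Resets both statics to their defaults, runs the constructors in the given
// order, and reports the final value of DERIVED.
fn run_in_order(ctors: &[unsafe fn()]) -> u32 {
    BASE.store(0, Ordering::SeqCst);
    DERIVED.store(0, Ordering::SeqCst);
    for &ctor in ctors {
        unsafe { ctor() };
    }
    DERIVED.load(Ordering::SeqCst)
}

fn main() {
    println!("{}", run_in_order(&[init_base, init_derived])); // 20
    println!("{}", run_in_order(&[init_derived, init_base])); // 0
}
```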
Rationale and alternatives
- Simply don't implement this.

  As mentioned in the drawbacks section, this is a dangerous addition. If we don't implement this, we could then further shame using rust-ctor, or let it be and simply not "grace" the feature with compiler support.

  On the other hand, I believe the advantages do outweigh the disadvantages. We currently don't have a good way to implement cross-crate initialization (like inventory) without global constructors. If we explicitly discourage direct use and ensure all users are aware of the unsafety, we should be able to minimize the danger.
- Somehow define the order in which global constructors are called.

  For example, in C++, it's guaranteed that static initializers declared in the same file execute in the order in which they are declared. It could be prudent to define a partial order, or a complete order, for the way in which Rust's global constructors run.

  This could be useful, but it could also allow people to start depending on fickle orders. Leaving the order unspecified keeps it all up in the air, and ensures that users know they can't depend on anything they don't synchronize themselves.

  Guaranteeing the order of global constructors between crates might be reasonable, but I'd be very wary of having any sort of implicit ordering within the same crate.

  If, for example, we were to order them as encountered in the parse tree, this can all become more hairy. It'd be fairly easy to have one module implicitly depend on another module's global constructor running first. But then, what if rustfmt reorders the mod declarations? What if, in refactoring, one module is renamed, making it sort differently and putting it above the other? These seemingly innocent changes could break code depending on this order.

  Even worse, though, the breakage wouldn't be discovered until runtime. Global constructors have a real potential to give us unpredictable runtime errors, and I think keeping the order fully unspecified should help us avoid that.
- Implement something closer to the inventory crate, allowing separate crates to coordinate data without running code on app startup.

  If this becomes a viable alternative, it would be able to address one of the motivating factors for this RFC without many of the disadvantages of global constructors, such as initialization order and runtime cost on application startup.

  As of right now, this is looking more appealing.

  There are two disadvantages I know of:
  - This may require a more complicated implementation, whereas LLVM already supports code generation for C++ static initializers. [TODO: research differences?]
  - This would not help FFI initialization, another use case for global constructors.

  See the distributed slice crate for an example implementation of something like this for some platforms.
And, we have some smaller changes which could be made:
- Rename the attribute.

  C++ calls this kind of thing a "static initializer", and the existing rust-ctor crate simply uses the #[ctor] attribute. We could rename #[unsafe_global_constructor] to #[unsafe_global_ctor], #[unsafe_ctor], #[unsafe_initializer], #[unsafe_static_initializer], or another combination. A different alternative would be to emphasize the registering nature of this, and go with something like #[register_main_hook].

  I chose #[unsafe_global_constructor] because it's reasonably descriptive, conveys that it is program-wide, and doesn't imply it's necessarily initializing a particular value. It likely is initializing something, but this differentiates the feature from C++ static initializers, which always initialize a single static which was previously undefined (TODO: citation needed).

- Apply #[unsafe_global_constructor] to fn()-typed statics, in addition to or instead of to functions.

  This strictly increases the possible uses. I've not included it for the sake of being minimalistic, no other reason.
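As an illustration of what the statics form could add (an example not taken from the proposal as written): a static of type unsafe fn() can name a single instantiation of a generic function, which the "must be monomorphic" rule for functions would otherwise rule out. REGISTERED, register, REGISTER_U32, REGISTER_STRING and run_marked_statics are hypothetical. The attribute is shown only in comments.

```rust
use std::sync::Mutex;

static REGISTERED: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

// Generic helper: cannot itself be a global constructor under the
// "must be monomorphic" rule.
unsafe fn register<T>() {
    REGISTERED.lock().unwrap().push(std::any::type_name::<T>());
}

// Under this alternative, these statics would carry #[unsafe_global_constructor];
// each one names a single concrete instantiation of `register`.
static REGISTER_U32: unsafe fn() = register::<u32>;
static REGISTER_STRING: unsafe fn() = register::<String>;

// Hypothetical: startup code calling every marked static.
fn run_marked_statics() {
    for ctor in [REGISTER_U32, REGISTER_STRING] {
        unsafe { ctor() };
    }
}

fn main() {
    run_marked_statics();
    println!("{:?}", REGISTERED.lock().unwrap());
}
```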
Prior art
This is heavily inspired by the rust-ctor crate, which implements this feature in "user-land" for Linux, Windows and Mac.
One main basis for this is "static initializers" in C++. A number of blogs describe using this feature, and why not to:
- What's the "static initialization order 'fiasco' (problem)"? (isocpp.org)
- C++ Static initialization is powerful but dangerous
- Static initializers will murder your family (somewhat satirical)
C++ static initializers allow a program to initialize static variables to some value, possibly calling functions which will be run at runtime. Problems occur when one static variable depends on calling methods on another: the initialization order is unspecified, so programs written this way may work or crash depending on which initializer happens to run first.
This could definitely be a problem in Rust as well, but it's mitigated by one key difference: #[unsafe_global_constructor] will never initialize a static variable.
The constructors proposed can change the value of statics, but all statics must still have some sane, initialized default.
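Here is a sketch of what "a sane and initialized default" can look like for a value that is only known at runtime. It uses std::sync::OnceLock, which is stable since Rust 1.70. CONFIG, load_config and current_config are hypothetical, and load_config stands in for a global constructor.

```rust
use std::sync::OnceLock;

// Valid from the start: an empty OnceLock is the "sane default".
static CONFIG: OnceLock<String> = OnceLock::new();

// Would be a global constructor under this proposal. It only changes CONFIG
// from "empty" to "set"; it never brings an uninitialized static to life.
unsafe fn load_config() {
    // Stands in for a value that can only be computed at runtime.
    let _ = CONFIG.set(String::from("threads=4"));
}

// Readers still handle the default, e.g. if another constructor calls this
// before load_config has run.
fn current_config() -> &'static str {
    CONFIG.get().map(String::as_str).unwrap_or("<not configured>")
}

fn main() {
    println!("{}", current_config()); // <not configured>
    unsafe { load_config() };
    println!("{}", current_config()); // threads=4
}
```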
A second, smaller difference is that use of this feature will be heavily discouraged outside of specific abstraction libraries. If end users never explicitly use global constructors, then
TODO: read through & add Object Pascal (Pre-RFC: Add language support for global constructor functions - #26)
TODO: read through & add Ada (Pre-RFC: Add language support for global constructor functions - #15 by mjw)
Unresolved questions
- Is it feasible to allow panicking within global constructors?
  I personally don't know enough about landing pads and such to know if we could do this, or even if we could say something like "all panics in global constructors are guaranteed segfaults", which would be better than leaving this up in the air.
- Should static methods (associated functions) be allowed as global constructors? For example,
  impl MyStruct { #[unsafe_global_constructor] unsafe fn my_gctor() { ... } }
  There doesn't seem to be much downside besides "it looks odd", and the upside is allowing something that we don't really have any reason to deny.
- Should extern functions be allowed as global constructors? For example,
  #[unsafe_global_constructor] unsafe extern "C" fn my_gctor() { ... }
  I'm unsure what our interactions with extern "C" will be, or what someone might expect from this. Unless we can say for sure that this is fine, disallowing it seems like a good conservative position.
- What platforms is it feasible to support?
  I make the assumption that because C++ has static initializers, LLVM will support @llvm.global_ctors on all of its supported platforms. This could be a naive assumption.
- How should this interact with #[no_main]?
  This is an unresolved question purely due to lack of research on my part. Since this is a pre-RFC, I hope it's alright to leave this in here until I actually do that research.
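For the first question above (panicking in global constructors), here is one possible shape, sketched as an assumption rather than a known answer. The compiler could wrap each constructor so that std::panic::catch_unwind catches a panic before it unwinds into the platform's startup code. The runtime could then abort. call_ctor_guarded, fine_ctor and panicking_ctor are hypothetical. This only matters when building with panic = "unwind", since with panic = "abort" a panic already aborts.

```rust
use std::panic;

// Hypothetical wrapper the compiler could generate around each constructor,
// so that a panic never unwinds into the platform's (C) startup code.
fn call_ctor_guarded(ctor: unsafe fn()) -> Result<(), ()> {
    panic::catch_unwind(move || unsafe { ctor() }).map_err(|_| ())
}

unsafe fn fine_ctor() {}

unsafe fn panicking_ctor() {
    panic!("constructor failed");
}

fn main() {
    let ctors = [
        ("fine_ctor", fine_ctor as unsafe fn()),
        ("panicking_ctor", panicking_ctor as unsafe fn()),
    ];
    for (name, ctor) in ctors {
        match call_ctor_guarded(ctor) {
            Ok(()) => println!("{}: ok", name),
            // A real runtime would most likely call std::process::abort() here.
            Err(()) => println!("{}: panicked", name),
        }
    }
}
```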
Future possibilities
- #[unsafe_global_constructor] on fn()-typed statics could be added at a later date.
- The order in which global constructors are run could be partially or fully defined at a later date.