Idea: limited life-before-main for statics runtime initialization

Note: I know about this FAQ entry in the old docs, but I think this problem still may be worth revisiting.

The most common advice when runtime initialization of statics is needed is to use crates like lazy_static. Such solutions work well enough in most cases, but in some rare cases they are quite sub-optimal. Under the hood they use synchronization primitives (usually atomics) to track whether initialization has been performed. This means that each use of such a static has to include an initialization check branch. Not only can this be relatively expensive in hot loops, but more importantly it prevents the compiler from applying some important optimizations (e.g. see this post; it's about atomics, but the same applies to lazy statics as well). Rust's std itself suffers from the same issue in its is_x86_feature_detected macro.
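To make the per-access check concrete, here is a minimal sketch using std's OnceLock in place of lazy_static. The names compute_value, value and sum_in_loop are made up for illustration, and whether the optimizer can hoist the check out of the loop depends on the compiler version and surrounding code.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive computation that can only run at runtime.
fn compute_value() -> u32 {
    42
}

static VALUE: OnceLock<u32> = OnceLock::new();

// Every call goes through `get_or_init`, which has to check (via an atomic
// load) whether the value has already been initialized.
fn value() -> u32 {
    *VALUE.get_or_init(compute_value)
}

// In a hot loop the initialization check is part of every iteration unless
// the optimizer manages to prove it redundant.
fn sum_in_loop(n: u32) -> u32 {
    let mut acc = 0u32;
    for _ in 0..n {
        acc = acc.wrapping_add(value());
    }
    acc
}
```

Calling sum_in_loop(3) initializes VALUE on the first iteration and re-checks it on the following ones.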

Runtime static initialization may look roughly like this:

// the memory behind this static will be initially filled with zeros
#[init]
static FOO: u32 = || { .. };

// in the case of error main will return an error code by
// using the `Termination` trait
#[try_init]
pub static Bar: Vec<u32> = || {
    if flag {
        Ok(val)
    } else {
        Err(Error)
    }
};

IIUC the "static initialization order fiasco" can be mostly solved by forbidding initialization functions. In theory we could extend this feature to allow explicit specification of dependencies between statics initialized at runtime.

Instead of relying on the linker, it may be worthwhile for Rust to handle this initialization itself, i.e. it could collect all such statics from dependencies and implicitly insert calls to their initialization functions at the beginning of the project's main.
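As a rough illustration of the idea, this is what such compiler-inserted initialization could look like if written by hand today. It is only a sketch under assumptions: init_foo, init_runtime_statics and foo are hypothetical names, and the real feature would presumably generate something equivalent without user-visible unsafe.

```rust
// The memory behind this static starts out zeroed, as in the proposal.
static mut FOO: u32 = 0;

// Hypothetical initialization closure body for FOO.
fn init_foo() -> u32 {
    7
}

/// Hypothetical function the compiler would call at the very start of `main`.
///
/// # Safety
/// Must be called exactly once, before any other thread exists and before
/// any read of the runtime-initialized statics.
unsafe fn init_runtime_statics() {
    unsafe {
        FOO = init_foo();
    }
}

// After initialization, reads are plain loads with no check or atomic access.
fn foo() -> u32 {
    // SAFETY: relies on `init_runtime_statics` having completed and on FOO
    // never being written again.
    unsafe { FOO }
}
```

A hand-written main would begin with unsafe { init_runtime_statics() } and could then call foo() freely.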

The disadvantages of adding such a feature are clear: additional compiler complexity and implicitly running code, which may cause unpleasant surprises (e.g. if abused, it may significantly increase program startup time). Also, in contrast with the "lazy" statics approach, the static initialization code will be executed even if its value is not used anywhere.

But in some performance-sensitive cases we need to know that a static was properly initialized without checking it ourselves, which I think warrants the addition of such a feature. In some cases this can be worked around by caching the value on the stack or by using initialization tokens, but that does not always work well enough (e.g. if the static is too big or if we cross crate boundaries, which prevents API changes).

While my main concern is performance-sensitive code, such statics may be useful in other cases as well.

What do you think about the possibility of introducing life-before-main in the proposed form?


(no expressed opinion on the proposed feature)

(generally I prefer OnceCell to lazy_static nowadays)

The general solution to the problem for libs where using a global matters for performance is to either

  • have the caller initialize a Ctx and pass references to that, or
  • have the caller create a zero-sized InitToken that proves they've called an init() -> InitToken, which is truly zero-cost to pass as an argument, or
  • if it really needs to be global data, have an unsafe requirement to call some initialization function
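A minimal sketch of the InitToken option from the list above. TABLE, init and lookup are illustrative names, not an existing API, and this safe version still performs the cell's initialization check in lookup; removing that check would need unsafe code that trusts the token.

```rust
use std::sync::OnceLock;

// Global data that can only be computed at runtime (hypothetical example).
static TABLE: OnceLock<Vec<u32>> = OnceLock::new();

// Zero-sized proof that `init` has been called. The private field means code
// outside this module cannot construct one by hand.
#[derive(Clone, Copy)]
pub struct InitToken(());

// Initializes the global (at most once) and hands out the proof token.
pub fn init() -> InitToken {
    TABLE.get_or_init(|| (0..4u32).map(|i| i * i).collect());
    InitToken(())
}

// Requiring the token at the API boundary guarantees TABLE is initialized,
// so the `expect` below can never fail.
pub fn lookup(_token: InitToken, i: usize) -> u32 {
    TABLE.get().expect("InitToken implies TABLE is initialized")[i]
}
```

Because InitToken is zero-sized and Copy, passing it around costs nothing at runtime; the caller obtains it once with init() and threads it through calls such as lookup(token, 3).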

It's interesting that you're already using an InitToken to prove that the global cell has been initialized and won't be written again. It should be possible to use unsafe to convince the optimizer to do a non-atomic read (and optimize as such). It's clearly a bug that this currently doesn't happen (and likely due to underutilization of noalias, which I believe is still enabled (finally)).

I'd even argue that a Relaxed ordering allows the desired optimization. I don't think life-before-main is required to get the result you're looking for in the linked urlo thread.


One idea I have along these lines is to put the data into an unmapped page and let the init function map the page. That way you get a page fault when trying to access uninitialized data.


This won't work if the main executable is not written in Rust, if you are loading a dynamically linked library at runtime using dlopen, or if you are avoiding the main shim because, for example, you are running on an embedded system. In those cases only the same mechanism as C++ initializers (the linker support you mention) will work, but that runs, for example, before libstd can store the command-line arguments in a static, causing portability issues when trying to use std::env::args() from inside such an initializer.


If the main shim doesn't run, the SIGSEGV handler won't be registered. Also, not every platform has an MMU to allow for this trick.


Does std support systems without an MMU? If this is part of the runtime, it doesn't need to be supported on core-only targets.

What about future platforms? Is it possible to create a supported platform that doesn't have an MMU and yet still supports std?

+1

+1 to this as well. This almost universally results in better quality (and slightly faster) code than a lazy global. Most of the time people claim they "need" a global, they could do just fine without one, and the corner cases are rare enough not to warrant breaking the important assumption (especially w.r.t. unsafe!) of no life-before-main.


The Fortanix SGX target supports libstd and I would expect it to not allow catching SIGSEGV.

The compiler can apply different approaches when compiling Rust applications and shared libraries. It may also depend on the target, e.g. in WASM we would have to use the start function capability.

I think the WASM targets are one example of such systems.

Can you provide examples of such breakage? AFAIK it's already possible to use ctors today in Rust (e.g. see rust-ctor), but it's a platform-dependent and somewhat fragile solution.

The start function runs before libc (e.g. wasi's libc) is initialized, which is wildly unsafe, as basically any call into libc is UB at that point. Also, using the start function won't work when, e.g., linking a rustc-generated wasm staticlib into a program that uses clang as the main compiler.

For shared libraries the only option is C++ global initializers. As it isn't known in advance whether a crate will be linked into an executable or a shared library, executables will have to use C++ global initializers too. Even if --crate-type bin is used, an additional --crate-type cdylib is allowed and will use the exact same object files. Only linking will be done twice.

A dependency may wrap a C++ library which needs to have its global initializers run before it is safe to call. Currently unsafe code doesn't have to worry about this, as life-before-main isn't allowed. If life-before-main becomes possible, it will be possible to call into said C++ library before it is initialized, whenever a global initializer implemented in Rust happens to run before that of the C++ library.

From the readme:

Rust's philosophy is that nothing happens before or after main and this library explicitly subverts that. The code that runs in the ctor and dtor functions should be careful to limit itself to libc functions and code that does not rely on Rust's stdlib services.

I would go even further and say that rust-ctor is unsound, as it doesn't require you to use the unsafe keyword.


Isn't libc initialized using the same start function? I think it should be possible for the compiler to insert statics initialization after such environment setup code.

Ah, yes, you are right. I guess it should be fine to require unsafe for runtime static initialization, but I must say it feels somewhat backwards. From an idealistic point of view, it's wrapping C++ code reliant on global initializers that should be considered unsound, rather than limiting Rust in its use of system capabilities, but I understand it's reasonable from a practical point of view.

No, the _start export is called by wasi after the start function. This function is implemented by wasi-libc and first initializes libc, then calls all C++ static initializers and finally calls the user main function.

And that's not an accident. It's like dynamic library unloading (which also came up a couple of days ago): it's surprisingly tricky to do it correctly, and naïve ideas/implementations are regularly full of soundness holes.

Well, let me turn the question inside out: can you prove that it won't cause any soundness (or other) issues? I think the burden of proof should be on whoever proposes such a fundamental change.

Anyway, if life-before-main is disallowed, then one can assume two things:

  1. Statements will be executed in the order deduced from the source code inside functions. For instance, many real-life libraries contain initialization routines which, unfortunately, rely on global or thread-local variables. If two such C libraries are used in a way that the Rust code calling them makes them depend on each other, suddenly initialization order becomes important. This is not in general true or solvable in the presence of life-before-main.
  2. Any required functionality provided by std or core or libc (or other system resources) will be properly set up. I/O comes to mind as the primary example.

I want to amplify what @H2CO3 is saying here, because it's (IMHO) important. I spend maybe 50% of my time writing code, and the other 50% debugging it. This often involves point 1 that @H2CO3 made above, where I have to make logical inferences about what happened based on what code could or couldn't have been executed. If you introduce a life before or after main, then you need a way of debugging it as well, or you end up with a lot of issues that are very, very hard to resolve.

So, as @H2CO3 has asked, can you give an example where the methods that @CAD97 provided in his earlier post won't work? That is, something that truly requires a life before main, or a life before some init function is called?


SGX is in fact capable of handling processor exceptions. The Intel SDK catches OS signals, detects that they originated from a processor exception inside an enclave, and reenters the enclave, which can then tell what exception happened with what exact processor context and resume execution however it likes. Fortanix is certainly capable of doing the same.

Won't this become a non-issue with improvements to const-eval?

As far as I know, at least the first version of SGX doesn't allow dynamically changing which parts of the address space are mapped once the enclave has been sealed.


Const-eval can't handle interaction with the OS or any kind of non-determinism.
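A small sketch of where that line falls, assuming a made-up lookup table and a hypothetical APP_THREADS environment variable: the table is deterministic and can be built by a const fn, while anything read from the environment has to wait until runtime.

```rust
// Deterministic data: can be computed entirely at compile time.
const fn build_squares() -> [u32; 8] {
    let mut out = [0u32; 8];
    let mut i = 0;
    while i < 8 {
        out[i] = (i * i) as u32;
        i += 1;
    }
    out
}

// Evaluated by const-eval; no runtime initialization or check is needed.
static SQUARES: [u32; 8] = build_squares();

// This cannot be a `const fn`: it asks the OS for the process environment,
// so its result is only known once the program is running.
// APP_THREADS is a hypothetical variable name used for illustration.
fn thread_count_from_env() -> usize {
    std::env::var("APP_THREADS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(1)
}
```

A static holding the result of thread_count_from_env would still need either lazy initialization or one of the explicit init approaches discussed earlier in the thread.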