Globalization optimizations

binarycat · June 6, 2024, 10:34pm

this is an idea i've had for a while, and i think it fits very well with rust's idea of zero-cost abstractions.

the basic idea is turning local (stack) variables into global (static) variables. this is mostly useful for embedded platforms that may have limited stack space and/or want to avoid the extra pointer indirection, but it could also give marginal performance improvements on non-embedded platforms. this would let programmers wrap their global state in a struct without worrying about runtime performance. it may also allow such variables to be statically initialized instead of running their constructor at runtime.

i think the most sound way of implementing this would be through generic instantiation. basically, the compiler could turn a &mut T into a Globalized<T>, marking that the address of T is known at compile time. every function that accepts &T or &mut T would then be split into two implementations: one that takes its argument normally (this is the version accessible through function pointers and ffi exports), and one that receives its argument through the generated global variable. the regular version would call the regular version of other functions/methods, and the globalized version would call the globalized versions of other functions/methods.
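As a rough illustration of the generic-instantiation idea, the split can be written by hand in today's Rust. This is only a sketch. `CounterRef`, `GlobalCounter`, and `GLOBAL_COUNTER` are hypothetical names standing in for the compiler-generated `Globalized<T>`, and the state uses an atomic so it can live in an ordinary `static` without `unsafe`. Monomorphization then yields two copies of `record_hit`: one that receives a pointer, and one whose target address is a link-time constant.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// The state that would normally live on `main`'s stack.
struct Counter {
    hits: AtomicU32,
}

/// Hypothetical stand-in for "something that can reach a `Counter`".
trait CounterRef: Copy {
    fn counter(&self) -> &Counter;
}

/// Regular version: the address travels at runtime as a pointer argument.
impl CounterRef for &Counter {
    fn counter(&self) -> &Counter {
        *self
    }
}

/// Hypothetical `Globalized<Counter>`: a zero-sized handle whose target
/// address is fixed at link time.
#[derive(Clone, Copy)]
struct GlobalCounter;

static GLOBAL_COUNTER: Counter = Counter { hits: AtomicU32::new(0) };

impl CounterRef for GlobalCounter {
    fn counter(&self) -> &Counter {
        &GLOBAL_COUNTER
    }
}

/// Written once; monomorphized into a pointer-taking copy and a globalized copy.
fn record_hit<C: CounterRef>(c: C) -> u32 {
    c.counter().hits.fetch_add(1, Ordering::Relaxed) + 1
}

fn main() {
    let local = Counter { hits: AtomicU32::new(0) };
    record_hit(&local); // instantiates record_hit::<&Counter>
    println!("local: {}", record_hit(&local)); // prints "local: 2"
    println!("global: {}", record_hit(GlobalCounter)); // record_hit::<GlobalCounter>, prints "global: 1"
    println!("handle size: {}", std::mem::size_of::<GlobalCounter>()); // prints "handle size: 0"
}
```

Whether LLVM actually folds the global's address into every access in the `GlobalCounter` copy is up to the optimizer; the sketch only shows that both instantiations exist and that the globalized handle carries no data.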

later on in codegen, some of the (cold) globalized functions could be rewritten to trivially wrap the non-globalized counterparts, simply by taking the address of the global. this same method would be used for calling ffi functions on the global.
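A minimal sketch of that fallback, again with hypothetical names: the cold globalized variant simply forwards the global's address to the regular version, and a raw-pointer function (standing in for an FFI call; a real one would be declared in an `extern "C"` block) is called on the global the same way.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct Counter {
    hits: AtomicU32,
}

static GLOBAL_COUNTER: Counter = Counter { hits: AtomicU32::new(0) };

/// Regular version: receives the state through a pointer.
fn record_hit(c: &Counter) -> u32 {
    c.hits.fetch_add(1, Ordering::Relaxed) + 1
}

/// A cold globalized variant need not get its own specialized body:
/// it forwards to the regular version by taking the address of the global.
#[cold]
#[inline(never)]
fn record_hit_globalized() -> u32 {
    record_hit(&GLOBAL_COUNTER)
}

/// Hypothetical stand-in for an FFI function that only accepts a raw pointer.
///
/// # Safety
/// `c` must point to a live `Counter`.
unsafe fn reset_raw(c: *const Counter) {
    // SAFETY: guaranteed by the caller.
    unsafe { (*c).hits.store(0, Ordering::Relaxed) }
}

fn main() {
    record_hit_globalized();
    println!("after two hits: {}", record_hit_globalized()); // prints "after two hits: 2"
    // Calling the "FFI" function on the global: just pass its address.
    unsafe { reset_raw(&GLOBAL_COUNTER as *const Counter) };
    println!("after reset: {}", GLOBAL_COUNTER.hits.load(Ordering::Relaxed)); // prints "after reset: 0"
}
```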

the criteria for this optimization triggering would be:

- a local variable is defined in a function that is only called once (usually main() or a function that has been inlined into main())
- opt-level=3

another simpler option would be just to globalize variables defined in main(), and rely on other functions getting inlined into main() in order to get most of the benefits. this would at least still allow the variable to be statically initialized.

binarycat · June 6, 2024, 10:42pm

an even simpler possibility i found that would allow static initialization: translate let x = BigStruct{ ... }; func(&x) into let x = &BigStruct{ ... }; func(x).

this will make BigStruct live in the static data region instead of on the stack, provided the initializer is a constant expression that rustc can promote to a 'static.

here's an example of what i'm talking about on godbolt.
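A small sketch of the difference, using a made-up `BigStruct` and `func`. In the second function the struct literal is a constant expression, so rustc's rvalue static promotion can place it in static data; the explicit `&'static` annotation only compiles if promotion actually happens. This applies only when the value is never mutated and has no destructor or interior mutability.

```rust
struct BigStruct {
    table: [u32; 256],
    scale: u32,
}

fn func(s: &BigStruct) -> u32 {
    s.table.iter().sum::<u32>() * s.scale
}

fn on_stack() -> u32 {
    // The struct is materialized in this function's stack frame on every call.
    let x = BigStruct { table: [1; 256], scale: 2 };
    func(&x)
}

fn promoted() -> u32 {
    // `&<constant expression>` is promoted to a 'static allocation,
    // so nothing has to be built on the stack at runtime.
    let x: &'static BigStruct = &BigStruct { table: [1; 256], scale: 2 };
    func(x)
}

fn main() {
    println!("{} {}", on_stack(), promoted()); // prints "512 512"
}
```

With optimizations on, LLVM may constant-fold the first version as well; the difference matters mostly when `func` is not inlined.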

quinedot · June 6, 2024, 11:24pm

You can call and return from main more than once.

Related topic.
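For instance, this compiles today, and `main` enters itself twice before returning (the `CALLS` counter exists only to stop the recursion):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Counts how many times `main` has been entered.
static CALLS: AtomicU32 = AtomicU32::new(0);

fn main() {
    let n = CALLS.fetch_add(1, Ordering::Relaxed) + 1;
    if n < 3 {
        main(); // `main` is an ordinary function and may call itself
    }
    println!("returning from main call #{n}");
}
```

Run on its own, this prints "returning from main call #3", then "#2", then "#1".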

binarycat · June 6, 2024, 11:34pm

yes, but you can also analyze the call graph to ensure that doesn't happen, at least in the absence of ffi.

quinedot · June 6, 2024, 11:38pm

Not just the call graph, but every use of main more generally. (Interrupt handlers, creating a function pointer, type-erasing the function item, ...)
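A sketch of two such paths, using only std: `main` is re-entered once through a function pointer stored in a static and once through a `&dyn Fn()`, and neither call appears as a direct edge to `main` in the call graph.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static CALLS: AtomicU32 = AtomicU32::new(0);

// `main` coerced to a function pointer and stashed in a static.
static ENTRY: fn() = main;

// Type-erased: the callee is only known as "some `Fn()`".
fn call_erased(f: &dyn Fn()) {
    f()
}

fn main() {
    let n = CALLS.fetch_add(1, Ordering::Relaxed) + 1;
    match n {
        1 => ENTRY(),            // second entry, through a fn pointer
        2 => call_erased(&main), // third entry, through a trait object
        _ => {}
    }
    println!("leaving main call #{n}");
}
```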

kornel · June 7, 2024, 12:50am

Recursive main seems like an unnecessary flex that could be banned in the language.

Taking a pointer to main could give a pointer to abort instead. FFI and linking aren't an issue, because fn main is a mangled symbol and not the unsafe C main.

quinedot · June 7, 2024, 1:59am

Aside from being a breaking change generally, the direction seems to be making main less magical, not more. main being an import (RFC 1260) is in beta (stable next week). After that, crate-wide analysis is no longer always enough to determine "only called once" for main (without going back on that RFC/FCP too).

Since this is a (hypothetical) optimization anyway, I don't think there's enough motivation to make a breaking change. Additionally, or alternatively, enforcement of an only-called-once property could be an opt-in attribute rather than being main-specific, if it ends up having enough motivation to get implemented.
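For reference, a sketch of what imported `main` allows, assuming a toolchain where the feature is stable (Rust 1.79 or later). Here `app` is a module for brevity; in the case described above it could be another crate, which is free to call the same function itself, so the binary crate alone cannot prove that `main` runs once.

```rust
mod app {
    use std::sync::atomic::{AtomicU32, Ordering};

    // Counts calls regardless of which name was used to make them.
    pub static RUNS: AtomicU32 = AtomicU32::new(0);

    pub fn run() {
        let n = RUNS.fetch_add(1, Ordering::Relaxed) + 1;
        println!("app::run call #{n}");
    }
}

// The program's entry point is an imported function (RFC 1260).
use app::run as main;
```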

binarycat · June 7, 2024, 2:02am

yeah, i think something like a #[non_reentrant] attribute would be more generally applicable for optimizations. it could even be checked in debug builds using a local mutable static.
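One possible reading of that debug check, as a sketch: the hypothetical `#[non_reentrant]` is expanded by hand into a function-local static flag, treating the property as "called at most once". An `AtomicBool` stands in for the local mutable static so the check needs no `unsafe`, and `if cfg!(debug_assertions)` turns it into a constant-false branch in release builds. `init_hardware` is a made-up name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn init_hardware() {
    // Hypothetical expansion of `#[non_reentrant]`.
    if cfg!(debug_assertions) {
        // Function-local static: one flag for the whole program.
        static CALLED: AtomicBool = AtomicBool::new(false);
        assert!(
            !CALLED.swap(true, Ordering::Relaxed),
            "init_hardware() called more than once"
        );
    }
    // ... the actual one-time setup would go here ...
}

fn main() {
    init_hardware();
    println!("initialized");
}
```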

jrose · June 7, 2024, 2:03am

I think it would still be fair to say “if main isn’t pub, it can’t be called from other crates”.

The idea of conservatively correct general stack-to-static promotion seems reasonable as a constrained optimization. Probably not one to turn on all the time, because static data is often worse than the stack in terms of cache use, code size, etc., but a valid transformation to have in the toolbox!

binarycat · June 7, 2024, 2:09am

one important thing is that stack data can't be statically initialized.

also, some microprocessors may not even have a cache, so saving a cycle of pointer indirection (and also a bunch of cycles to zero-initialize a stack array) would be more valuable.

robertbastian · June 7, 2024, 6:18am

Maybe we should apply all this magic to #[entry] instead of main? main would stay a normal function, and #[entry], which you have to use in embedded, is already special anyway.

bjorn3 · June 7, 2024, 7:09am

Rustc has no knowledge of #[entry]. It is a macro defined in the cortex-m-rt crate, which lowers to defining a #[no_mangle] symbol. There is no way for rustc to know that it won't be called twice.

ryanavella · June 7, 2024, 4:20pm

At minimum, non-recursive main provides the most information to an optimizing compiler. And you can use a non-recursive main to simulate a recursive main, so nothing is lost.

Demonstration

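```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

fn main() {
    recursive_main()
}

// The body that would otherwise have lived in a recursive `main`.
fn recursive_main() {
    if rand() {
        println!("recursed");
        recursive_main();
    }
}

// Stand-in for a random bool (std has no RNG): hash nothing with a randomly seeded hasher.
fn rand() -> bool {
    RandomState::new().build_hasher().finish() % 2 == 0
}
```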

The other way around (simulating non-recursive main with a recursive main) is also possible, but not in a way that the optimizing compiler readily understands. You'd need a hefty static analysis tool to demonstrate that main is only called once, isn't stored in a function pointer, isn't an interrupt hook, etc.

Even if you can import main from another crate, that doesn't preclude rustc from special-casing the codegen of main when compiling the final binary. It could be codegen'd to call abort at the very start, with the Rust runtime not calling main directly but rather main+16 or so to jump past a ud2-sled.

toc · June 7, 2024, 7:21pm

Another (weird?) option here would be to desugar main into two mains, only one being the entry point.

```rust
pub fn main() {
    println!("hello!");
}

// desugars to:

#[entry] // hypothetical marker for the real entry point, not an actual rustc attribute
fn secretly_main_XxX() {
    println!("hello!");
}

pub fn main() {
    println!("hello!");
}
```

Or just have secretly_main_XxX call main directly. But at least, as written above, secretly_main_XxX is definitely non-reentrant (nobody knows the name), and it can be optimized with statics or anything else.

Edit: somebody's going to pass in the address of secret main as an argument or something and call it, aren't they.

Vorpal · June 7, 2024, 8:31pm

Isn't something like that already happening, in std's rt.rs source? I'm not exactly sure how #[entry] differs from #[lang = "start"], and how any of this works on no_std.

EDIT: Hm, the docs for the module seem to be wrong: there is no mention of the heap anywhere in the file except the module docs.

dlight · June 8, 2024, 2:21am

But rustc could have a #[something] attribute that says a function isn't called twice. However, this only makes sense if it's unsafe to call the function, and unfortunately main is safe; it doesn't matter if it's not public, because one can store a function pointer to it somewhere.

For an unsafe function, it would be totally reasonable to annotate it so that it's UB to call it twice.
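A sketch of such an unsafe contract with a safe wrapper on top (all names hypothetical): `take_buffer_unchecked` is UB to call twice because each call hands out a `&'static mut` to the same `static mut` buffer, while `take_buffer` enforces the at-most-once rule at runtime and returns `None` afterwards.

```rust
use std::ptr::addr_of_mut;
use std::sync::atomic::{AtomicBool, Ordering};

static mut BUF: [u8; 64] = [0; 64];
static TAKEN: AtomicBool = AtomicBool::new(false);

/// # Safety
/// Must be called at most once in the whole program: every call returns a
/// `&'static mut` to the same buffer, so a second call would create
/// aliasing mutable references (undefined behavior).
unsafe fn take_buffer_unchecked() -> &'static mut [u8; 64] {
    // SAFETY: the caller promises no other reference to BUF exists.
    unsafe { &mut *addr_of_mut!(BUF) }
}

/// Safe wrapper: the flag lets exactly one caller through.
fn take_buffer() -> Option<&'static mut [u8; 64]> {
    if TAKEN.swap(true, Ordering::AcqRel) {
        None
    } else {
        // SAFETY: the swap above succeeds for exactly one caller.
        Some(unsafe { take_buffer_unchecked() })
    }
}

fn main() {
    let buf = take_buffer().expect("first call succeeds");
    buf[0] = 42;
    println!("buf[0] = {}", buf[0]);
    assert!(take_buffer().is_none());
}
```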

robertbastian · June 8, 2024, 3:35pm

Quoting bjorn3: "Rustc has no knowledge of #[entry]."

Sorry, I was thinking of #[lang = "start"] and the #[start] attribute: see the tracking issue for the `start` feature (rust-lang/rust#29633).

binarycat · June 29, 2024, 11:59pm

a non_reentrant function could safely use a static mut variable. this could be handy if you have some data you want to be stored at a fixed address (configurable with a linker script), but you don't want it to leak into global scope.

zirconium-n · June 30, 2024, 8:20am

It's still unsound in the presence of multi-threading.

And it's unsound even without multi-threading if you hand out a reference to the static mut.

binarycat · June 30, 2024, 2:56pm

hmm, i guess the current static mut rules are way too lax, but a safe construct that provides the same functionality could be used.

i think it just needs a few more rules:

- the borrow checker treats it as a regular variable, so no references to it can outlive the scope it was defined in
- it cannot be captured by a closure
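Those rules can be approximated in a library today, as a sketch: `ScopedStatic` (a hypothetical type, not a std API) only exposes its contents through a closure, and the closure's higher-ranked `&mut T` argument stops references from escaping the call. Re-entry and concurrent use, the problems raised above, are caught at runtime by a flag and turned into a panic, where the proposal would have the borrow checker reject them at compile time.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// A static whose contents are only reachable through a scoped `&mut` borrow.
struct ScopedStatic<T> {
    in_use: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: `in_use` guarantees at most one `&mut T` exists at a time on any
// thread, so sharing the wrapper is fine as long as `T` can move between threads.
unsafe impl<T: Send> Sync for ScopedStatic<T> {}

impl<T> ScopedStatic<T> {
    const fn new(value: T) -> Self {
        ScopedStatic {
            in_use: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access; panics on re-entry or concurrent use
    /// instead of creating a second `&mut T`.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        assert!(
            !self.in_use.swap(true, Ordering::Acquire),
            "ScopedStatic re-entered or used concurrently"
        );
        // Clears the flag when `with` returns or unwinds.
        struct Release<'a>(&'a AtomicBool);
        impl Drop for Release<'_> {
            fn drop(&mut self) {
                self.0.store(false, Ordering::Release);
            }
        }
        let _release = Release(&self.in_use);
        // SAFETY: the flag makes this the only live `&mut T`, and the
        // closure signature stops it from outliving this call.
        f(unsafe { &mut *self.value.get() })
    }
}

fn next_id() -> u32 {
    // Fixed address, but only nameable inside this function.
    static COUNTER: ScopedStatic<u32> = ScopedStatic::new(0);
    COUNTER.with(|n| {
        *n += 1;
        *n
    })
}

fn main() {
    println!("{} {}", next_id(), next_id()); // prints "1 2"
}
```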
