Globalization optimizations

this is an idea i've had for a while, and i think it fits very well with rust's idea of zero-cost abstractions.

the basic idea is turning local (stack) variables into global (static) variables. this is mostly useful for embedded platforms, which may have limited stack space and/or want to avoid the extra pointer indirection, but it would also give marginal performance improvements on non-embedded platforms. it lets programmers wrap their global state in a struct without worrying about runtime performance. it may also allow that state to be statically initialized instead of running its constructor at runtime.

i think the most sound way of implementing this would be through generic instantiation. basically, the compiler could turn a &mut T into a Globalized<T>, marking that the address of T is known at compile time. every function that accepts &T or &mut T would then be split into two implementations: one that takes its argument normally (this is the version accessible through function pointers and ffi exports), and one that receives its argument through the generated global variable. the regular version would call the regular version of other functions/methods, and the globalized version would call the globalized versions of other functions/methods.

later on in codegen, some of the (cold) globalized functions could be rewritten to trivially wrap the non-globalized counterparts, simply by taking the address of the global. this same method would be used for calling ffi functions on the global.
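
to make the split concrete, here's a rough hand-written sketch of what the two versions might look like. nothing here is real compiler output: Counter, COUNTER_GLOBAL, bump_globalized and bump_globalized_cold are names made up for illustration, and the unsafe blocks stand in for guarantees the compiler would have to prove itself.

```rust
use std::ptr::addr_of_mut;

pub struct Counter {
    pub hits: u32,
    pub step: u32,
}

// what the programmer writes: the state arrives through a pointer.
pub fn bump(c: &mut Counter) -> u32 {
    c.hits += c.step;
    c.hits
}

// hypothetical compiler-generated global standing in for a local in main().
static mut COUNTER_GLOBAL: Counter = Counter { hits: 0, step: 3 };

// hypothetical globalized clone of `bump`: no pointer argument, because the
// address of the state is known at link time.
//
// safety: the caller must ensure nothing else is accessing COUNTER_GLOBAL,
// which is what the "only called once" criterion is meant to establish.
pub unsafe fn bump_globalized() -> u32 {
    unsafe {
        let c = addr_of_mut!(COUNTER_GLOBAL);
        (*c).hits += (*c).step;
        (*c).hits
    }
}

// cold-path variant: trivially wrap the regular version by taking the
// address of the global (same safety requirement as above).
pub unsafe fn bump_globalized_cold() -> u32 {
    unsafe { bump(&mut *addr_of_mut!(COUNTER_GLOBAL)) }
}
```

the regular bump is the one that function pointers and ffi exports would see; the globalized variants would only ever be reached from other globalized code.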

the criteria for this optimization triggering would be:

  1. a local variable is defined in a function that is only called once (usually main() or a function that has been inlined into main())
  2. opt-level=3

another simpler option would be just to globalize variables defined in main(), and rely on other functions getting inlined into main() in order to get most of the benefits. this would at least still allow the variable to be statically initialized.

an even simpler possibility i found that would allow static initialization: translate let x = BigStruct{ ... }; func(&x) into let x = &BigStruct{ ... }; func(x).

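this will make BigStruct live in the static data region instead of on the stack, provided the initializer is a constant expression that qualifies for rvalue static promotion (no Drop impl, no interior mutability).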
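
a small sketch of that rewrite, assuming a BigStruct whose fields are all constants (the fields, checksum, with_local and with_promoted are made up for illustration). the &'static annotation is only there as a check: it compiles only if the borrowed value really was promoted to static memory.

```rust
pub struct BigStruct {
    pub table: [u32; 256],
    pub scale: u32,
}

pub fn checksum(s: &BigStruct) -> u32 {
    s.table.iter().sum::<u32>() * s.scale
}

// original form: the value is a local, so the compiler may build it on the
// stack at runtime before taking its address.
pub fn with_local() -> u32 {
    let x = BigStruct { table: [1; 256], scale: 2 };
    checksum(&x)
}

// rewritten form: borrow the constant expression directly. the `'static`
// lifetime is accepted only because the value is promoted to a static.
pub fn with_promoted() -> u32 {
    let x: &'static BigStruct = &BigStruct { table: [1; 256], scale: 2 };
    checksum(x)
}
```

both functions compute the same result; the difference is only where the 1 KiB table lives.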

here's an example of what i'm talking about on godbolt.

You can call and return from main more than once.

Related topic.

yes, but you can also analyze the call graph to ensure that doesn't happen, at least in the absence of ffi.

Not just the call graph, but any examination of main more generally. (Interrupt handlers, creating a function pointer, type erasing the function item, ...)

Recursive main seems like an unnecessary flex that could be banned in the language.

Taking a pointer to main could give a pointer to abort instead. FFI and linking aren't an issue, because fn main is a mangled symbol and not the unsafe C main.

Aside from being a breaking change generally, the direction seems to be making main less magical, not more. main being an import (RFC 1260) is in beta (stable next week). After that, crate-wide analysis is no longer always enough to determine "only called once" for main (without going back on that RFC/FCP too).

Since this is a (hypothetical) optimization anyway, I don't think there's enough motivation to make a breaking change. Additionally/instead, enforcement of an only-called-once property could be an opt-in attribute instead of being main specific, if it ends up having enough motivation to get implemented.

yeah, i think something like a #[non_reentrant] attribute would be more generally applicable for optimizations. it could even be checked in debug builds using a local mutable static.
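
roughly what i imagine the debug check expanding to. #[non_reentrant] doesn't exist, so this is a hand-written stand-in: first_entry and setup are made-up names, and i'm using an AtomicBool instead of a literal static mut to keep the sketch safe. note it checks "called at most once", which is the property the optimization needs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// returns true only for the first call made with a given flag.
pub fn first_entry(flag: &AtomicBool) -> bool {
    !flag.swap(true, Ordering::Relaxed)
}

// hypothetical expansion of `#[non_reentrant] fn setup() -> u32 { 7 }`.
pub fn setup() -> u32 {
    // function-local static: one flag per annotated function.
    static ENTERED: AtomicBool = AtomicBool::new(false);
    // only evaluated when debug assertions are enabled.
    debug_assert!(first_entry(&ENTERED), "setup() was called more than once");
    7
}
```

in release builds the debug_assert! condition is never evaluated, so the flag is not touched at runtime.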

I think it would still be fair to say “if main isn’t pub, it can’t be called from other crates”.

The idea of conservatively-correct general stack-to-static promotion seems reasonable as a constrained optimization. Probably not one to turn on all the time, because static data is often worse than the stack in terms of cache use, code size, etc, but a valid transformation to have in the toolbox!

one important thing is that stack data can't be statically initialized.

also, some microprocessors may not even have a cache, so saving a cycle of pointer indirection (and also a bunch of cycles to zero-initialize a stack array) would be more valuable.

Maybe we should apply all this magic to #[entry] instead of main? main would stay a normal function, and #[entry], which you have to use in embedded, is already special anyway.

Rustc has no knowledge of #[entry]. It is a macro defined in the cortex-m-rt crate, which lowers to defining a #[no_mangle] symbol. There is no way for rustc to know that it won't be called twice.

At minimum, non-recursive main provides the most information to an optimizing compiler. And you can use a non-recursive main to simulate a recursive main, so nothing is lost.

Demonstration
fn main() {
  recursive_main()
}

fn recursive_main() {
  // rand() stands in for any runtime condition; it is not defined here.
  if rand() {
    println!("recursed");
    recursive_main();
  }
}

The other way around (simulating non-recursive main with recursive-main) is also possible, but not in a way that the optimizing compiler readily understands. You'd need a hefty static analysis tool to demonstrate that main is only called once, isn't stored in a function pointer, isn't an interrupt hook, etc...

Even if you can import main from another crate, that doesn't preclude rustc from special-casing the codegen of main when compiling the final binary. It could be codegen'd to call abort at the very start, with the Rust runtime not calling main directly but rather main+16 or so to jump past a ud2-sled.

Another (weird?) option here would be to desugar main into two mains, only one being the entry point.

pub fn main() {
    println!("hello!");
}

/// Desugars to

#[entry]
fn secretly_main_XxX() {
    println!("hello!");
}

pub fn main() {
    println!("hello!");
}

Or just have secretly_main_XxX call main directly. But at least as written above secretly_main_XxX is definitely non-reentrant (nobody knows the name), and it can be optimized with statics or anything else.

Edit: somebody's going to pass in the address of secret main as an argument or something and call it, aren't they.

Isn't something like that already happening, in rt.rs - source? I'm not exactly sure how #[entry] differs from #[lang = "start"], and how any of this works on no-std.

EDIT: Hm the docs for the module seem to be wrong, there is no mention of the heap anywhere in the file except the module docs.

But rustc could have #[something] that says that a function isn't called twice. However, this only makes sense if it's unsafe to call the function. Unfortunately main is safe; it doesn't matter that it's not public, because one can store a function pointer to it somewhere.

For an unsafe function, it would be totally reasonable to annotate it so that it's UB to call it twice.
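
As a sketch of what such a contract could look like today, written by hand rather than via any attribute: the function below documents "call at most once" as a safety requirement, and in exchange can hand out a &'static mut to a static. State and claim_state are hypothetical names used only for illustration.

```rust
use std::ptr::addr_of_mut;

pub struct State {
    pub ticks: u32,
}

/// Hands out exclusive access to the program's single `State`.
///
/// # Safety
/// Must be called at most once in the whole program. A second call would
/// create two live `&mut` references to the same static, which is UB.
pub unsafe fn claim_state() -> &'static mut State {
    // Statically initialized: no constructor runs at runtime.
    static mut STATE: State = State { ticks: 0 };
    // SAFETY: the caller promises this is the only call, so the returned
    // reference is the only one that ever exists.
    unsafe { &mut *addr_of_mut!(STATE) }
}
```

The call-once promise is what makes the returned reference sound, which is the same fact a globalization pass would need about the enclosing function.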

Rustc has no knowledge of #[entry].

Sorry, I was thinking of #[lang = "start"] and the #[start] attribute: Tracking issue for the `start` feature · Issue #29633 · rust-lang/rust · GitHub
