Reintroduce `main` Functions to `no_std` Targets

Recently, the ability to have a main function without the standard library was removed, since the previous implementation never worked. However, having a main function on no_std targets would improve the programming experience and allow language features to be more unified. This Pre-RFC proposes a solution, aiming for a working implementation of a customized "actual" program entry point that calls the main function.

Changing How The main Function Is Called

Currently, the program's "actual" entry point (the symbol usually named main, for example on Linux and most other Unix platforms) is generated by rustc_codegen_ssa. The generated entry point knows about both the Rust main function and the start language item, and passes things like the main function pointer to the start implementation. Since the compiler cannot generate a customized entry point, this design prevents us from implementing the start language item ourselves and making it work in an environment the compiler knows nothing about.
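A rough, hosted model of the current arrangement: the start implementation is generic over main's return type and receives main as a plain function pointer from the generated entry point. The trait Report and the functions lang_start_model and generated_entry are hypothetical stand-ins (std's real Termination::report returns an opaque ExitCode), and argument handling and the sigpipe parameter are left out.

```rust
// Hypothetical stand-in for `std::process::Termination`, returning a plain
// integer instead of an opaque `ExitCode`.
trait Report {
    fn report(self) -> i32;
}

impl Report for () {
    fn report(self) -> i32 {
        0
    }
}

impl<E: std::fmt::Debug> Report for Result<(), E> {
    fn report(self) -> i32 {
        match self {
            Ok(()) => 0,
            Err(e) => {
                eprintln!("Error: {e:?}");
                1
            }
        }
    }
}

/// Model of the `start` language item: generic over the return type of the
/// user's `main`, which arrives as a function pointer.
fn lang_start_model<T: Report>(main: fn() -> T, _argc: isize, _argv: *const *const u8) -> isize {
    // A real implementation would set up args, stack guards, etc. first.
    main().report() as isize
}

/// What the compiler-generated C `main` conceptually does: forward
/// `argc`/`argv` together with a pointer to the user's `main`.
fn generated_entry(argc: i32, argv: *const *const u8) -> i32 {
    fn user_main() -> Result<(), String> {
        Ok(())
    }
    lang_start_model(user_main, argc as isize, argv) as i32
}
```

Because the entry point is emitted by the compiler, only the compiler can decide which function gets called with main's pointer, which is the coupling this proposal wants to remove.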

This Pre-RFC changes the way the Rust main function is called. After the change, the "actual" program entry point will be implemented entirely in the standard library. We will introduce a new compiler-generated function with the symbol name rust_main. Written in Rust, the generated function would look like:

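use core::cell::SyncUnsafeCell; // unstable
use core::mem::MaybeUninit;

// type RealTermination = (return type of the Rust main function);
// extern "Rust" fn main(); (the Rust main function)
#[unsafe(no_mangle)]
unsafe extern "Rust" fn rust_main() -> &'static mut dyn Termination {
    // Static storage for main's return value, written exactly once.
    static RET: SyncUnsafeCell<MaybeUninit<RealTermination>> = /* ... */;
    unsafe {
        // Run main and store its result in RET.
        (*RET.get()).as_mut_ptr().write(main());
        // Hand out a 'static reference to the stored value.
        (&mut *(*RET.get()).as_mut_ptr()) as &mut dyn Termination
    }
}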

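To keep this sound, rust_main may only be invoked once, at program startup: every call writes into the same static and returns a &'static mut to it, so a second call would create aliasing mutable references.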
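If a runtime check were wanted in addition to the documented "call once" rule, one option is an atomic flag, as in this hosted sketch. user_main, the concrete RealTermination type, Slot and rust_main_once are hypothetical; Slot stands in for the unstable SyncUnsafeCell.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-ins: the user's `main` and its return type.
type RealTermination = Result<(), String>;
fn user_main() -> RealTermination {
    Ok(())
}

// Stable substitute for the unstable `SyncUnsafeCell`.
struct Slot(UnsafeCell<MaybeUninit<RealTermination>>);
// SAFETY: `RET` is only written by the first caller of `rust_main_once`.
unsafe impl Sync for Slot {}

static RET: Slot = Slot(UnsafeCell::new(MaybeUninit::uninit()));
static CALLED: AtomicBool = AtomicBool::new(false);

/// Runs `user_main` on the first call and returns a `'static` handle to its
/// result; later calls get `None`, since a second `&'static mut` into `RET`
/// would alias the first.
fn rust_main_once() -> Option<&'static mut RealTermination> {
    if CALLED.swap(true, Ordering::AcqRel) {
        return None;
    }
    // SAFETY: only the first caller gets here, so no other reference to the
    // contents of `RET` exists.
    let slot: &'static mut MaybeUninit<RealTermination> = unsafe { &mut *RET.0.get() };
    Some(slot.write(user_main()))
}
```

The cost is one atomic swap at startup; whether that is worth it over a purely documented contract is a design choice left open here.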

The new implementation of the start language item will not have a fixed signature. Instead, the #[start] attribute will only act as a flag that marks the entry point and enables conditional compilation, so that it causes no linkage errors when we are not compiling a bin target. For example, on Linux, we can write:

use core::ffi::{c_char, c_int};

#[start]
#[unsafe(no_mangle)]
unsafe extern "C" fn main(argc: c_int, argv: *const *const c_char) -> c_int {
    unsafe extern "Rust" {
        fn rust_main() -> &'static mut dyn Termination;
    }
    unsafe {
        let _termination = rust_main();
        // handle termination...
    }
    0
}

To let the entry point drop the returned Termination value, we may change the return type from &'static mut dyn Termination to &'static mut ManuallyDrop<dyn Termination>.
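A hosted sketch of the ManuallyDrop variant: the entry point reads what it needs through the trait object, then runs the destructor exactly once with ManuallyDrop::drop. Report is a hypothetical stand-in for Termination (whose report takes self by value and so cannot be called through &mut dyn Termination), Outcome and the function names are made up, and Box::leak replaces the static slot for brevity.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts destructor runs so the effect of `ManuallyDrop::drop` is visible.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical return type of the user's `main`, with an observable `Drop`.
struct Outcome(i32);

impl Drop for Outcome {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical stand-in for `Termination` that works through a reference.
trait Report {
    fn code(&self) -> i32;
}

impl Report for Outcome {
    fn code(&self) -> i32 {
        self.0
    }
}

// Stand-in for `rust_main`; the unsizing coercion to `dyn Report` happens on return.
fn rust_main_model(value: Outcome) -> &'static mut ManuallyDrop<dyn Report> {
    let leaked: &'static mut ManuallyDrop<Outcome> = Box::leak(Box::new(ManuallyDrop::new(value)));
    leaked
}

// Entry-point side: read the exit code, then run the destructor exactly once.
fn finish(ret: &'static mut ManuallyDrop<dyn Report>) -> i32 {
    let code = ret.code();
    // SAFETY: `ret` is not used again after this call.
    unsafe { ManuallyDrop::drop(ret) };
    code
}
```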


I was going to say “main relies on having an OS to describe how the program is run at all” but (a) it might have something less than an OS that’s nonetheless still covered by the target, and (b) cfgs exist, so carry on.

You can absolutely have a main function on no_std targets. You just have to mark it #[no_mangle] and use the correct function signature. Sure, this will skip a bunch of initialization that the standard library normally does, but all of that initialization depends on there being an OS, so it doesn't make sense for no_std targets anyway. And if you do this on an std target, libstd will still function. The only adverse effects you may notice are that std::env::args() may not work, that a stack overflow will produce a SIGSEGV (guaranteed on all tier 1 targets) rather than a nice abort message, and that if for some reason stdin, stdout or stderr is not open at program startup, accessing any of them will misbehave rather than access /dev/null.

A crate can only be an executable or library, not both at the same time, so there is no reason why you would need conditional compilation.

Rather than having a generated entry point, I think it makes significantly more sense for #[start] to directly be the linker entry point. For example, on a *-unknown-none-elf target, #[unsafe(start)] would set the e_entry ELF field as the address of the annotated function, without any sort of further sugar. Everything is manual; the function might even be required to be #[naked].
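For reference, e_entry is the ELF file header field holding the address where execution begins. The hypothetical helper below reads it from raw header bytes, using the header layout from the ELF specification (e_ident is 16 bytes, EI_CLASS is byte 4, EI_DATA is byte 5, and e_entry starts at offset 0x18 in both 32- and 64-bit files).

```rust
/// Reads `e_entry` from an ELF header; returns `None` for non-ELF input.
fn elf_entry(header: &[u8]) -> Option<u64> {
    if header.len() < 0x20 || header[..4] != [0x7f, b'E', b'L', b'F'] {
        return None;
    }
    // EI_DATA: 1 = little-endian, 2 = big-endian.
    let little = match header[5] {
        1 => true,
        2 => false,
        _ => return None,
    };
    // EI_CLASS: 1 = 32-bit (4-byte e_entry), 2 = 64-bit (8-byte e_entry).
    match header[4] {
        1 => {
            let b: [u8; 4] = header[0x18..0x1c].try_into().ok()?;
            let v = if little { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) };
            Some(u64::from(v))
        }
        2 => {
            let b: [u8; 8] = header[0x18..0x20].try_into().ok()?;
            Some(if little { u64::from_le_bytes(b) } else { u64::from_be_bytes(b) })
        }
        _ => None,
    }
}
```

Under the suggestion above, the linker would write the address of the #[unsafe(start)] function into exactly this field, and the loader (or reset logic) jumps there with no Rust-provided setup in between.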

But I think what the OP actually wants is "progressive std": instead of std support being all or nothing, targets could (via whatever mechanism) support parts of std functionality (such as a proper main entry point) independently of other, orthogonal functionality (such as a global general-purpose dynamic allocator).


I don't know what you mean by this, but it is entirely possible to write a binary for some arbitrary environment without using the compiler/std machinery and without implementing any language items:

#![no_main]

#[unsafe(no_mangle)]
extern "C" fn main(_argc: std::ffi::c_int, _argv: *const *const u8) -> std::ffi::c_int {
    println!("Hello, World!");
    0
}

This also works with #![no_std], once the std-only pieces (println! and the std::ffi paths) are replaced.

You "just" have to use a different function name and signature for each OS / embedded runtime. Rust can't know every single runtime and start convention out there, after all.

So, I don't understand which problem you are trying to solve.

@bjorn3 @RalfJung

However, that approach requires #![no_main], and either the programmer has to get the entry point's signature right by hand, or the library has to provide an attribute macro. With a procedural macro, a std-like library for no_std environments that needs to run initialization exactly once at program startup has to expose that initialization function as #[doc(hidden)] pub unsafe, so that the macro-generated code can call it. Without #![no_main], and with the entry point implemented in the library itself, program initialization could be done more cleanly and more safely.
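A small model of the pattern described above, assuming a macro_rules! macro in place of the procedural attribute macro: the library's one-time initializer must be public because the macro expands into the user's crate. The module platform, __init, entry! and __entry are hypothetical names.

```rust
// Hypothetical std-like support library for a `no_std` platform.
pub mod platform {
    use std::sync::atomic::{AtomicBool, Ordering};

    static INITIALIZED: AtomicBool = AtomicBool::new(false);

    /// Platform setup that must run exactly once, before user code. It is
    /// `pub` only so that code expanded in the user's crate can call it, and
    /// `#[doc(hidden)]` so it does not show up as public API.
    #[doc(hidden)]
    pub unsafe fn __init() {
        INITIALIZED.store(true, Ordering::SeqCst);
    }

    pub fn is_initialized() -> bool {
        INITIALIZED.load(Ordering::SeqCst)
    }
}

// Stand-in for the attribute macro: expands to an entry point that runs the
// hidden initializer and then the user's `main`. `platform::__init` stays
// reachable (as an `unsafe fn`) from any other user code as well.
macro_rules! entry {
    ($user_main:ident) => {
        fn __entry() -> i32 {
            // SAFETY: this generated entry point runs once, at startup.
            unsafe { $crate::platform::__init() };
            $user_main();
            0
        }
    };
}

// What a user crate would write (kept in the same file for brevity).
fn user_main() {
    assert!(platform::is_initialized());
}

entry!(user_main);
```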


So, IIUC, the goal of this request is to be able to write a library which (isn't std and) provides a program #[start] entry point that then calls the binary crate's fn main like std's entry point does.

The current design is that std provides #[lang = "start"] fn lang_start<T: Termination>(main: fn() -> T, argc: isize, argv: *const *const u8, sigpipe: u8) -> isize, which the compiler is responsible for hooking up to the target object format's entry point.

The desired design is one where #[start] fn is directly plumbed to the object format's entry point, and the main: fn() -> impl Termination is somehow made available to call from the library providing the entry point into the downstream binary crate.
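One way a library can reach the binary's main today is a link-time symbol, roughly what #![no_main] plus #[no_mangle] allows. The sketch below shows that hookup in a single file, with a made-up symbol name __platform_user_main and an i32 in place of impl Termination; it assumes Rust 1.82 or later for unsafe extern blocks and #[unsafe(no_mangle)].

```rust
// Hypothetical platform library: provides the entry point and finds the
// binary's main through a link-time symbol.
mod platform {
    unsafe extern "Rust" {
        fn __platform_user_main() -> i32;
    }

    /// Library-side entry point; a real one would be the object file's entry.
    pub fn start() -> i32 {
        // Platform initialization would go here, before any user code.
        // SAFETY: the binary crate is expected to define this symbol once,
        // with exactly this signature.
        unsafe { __platform_user_main() }
    }
}

// Binary-crate side: exporting `main` under the agreed-upon symbol name.
// Nothing checks this signature against the declaration above.
#[unsafe(no_mangle)]
fn __platform_user_main() -> i32 {
    42
}
```

The unchecked signature is the weak point: a mismatch is undefined behavior rather than a compile error, which is what compiler-mediated plumbing of main would avoid.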

This seems reasonable enough, falling under the same desired functionality class of injected global resources à la #[global_allocator].
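For comparison, this is how the #[global_allocator] injection works on a hosted target: the trait is defined upstream, a downstream crate marks one static with the attribute, and the whole program, std included, routes allocations through it. The counting wrapper around System is only an illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts allocations routed through the injected allocator.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

// `GlobalAlloc` is defined upstream (in `core::alloc`); this crate only
// supplies an implementation.
unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: forwarded unchanged to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was allocated by `System` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The attribute "injects" this static: every `Box`, `Vec`, etc. in the
// program now allocates through it.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```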


We only support this for our main OSes though, which do have std.

So I don't understand the point of doing that without std. If the goal is to have some global init code called exactly once on startup (as indicated above), that sounds like a "life before main" request.

To resolve the "crate is either a library or binary" problem, you could make the attribute usable on a use statement?

#[start]
use platform_support::start;

fn main() {}

Requiring that start be #[naked], plus documentation reminding users that no Rust code may be called before all of the loader's tasks (such as zeroing .bss and copying over .data) are done, seems like a viable path forward to prevent life-before-main problems.
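The two loader tasks named above can be modeled on the host as plain slice operations. init_ram and its parameters are hypothetical; on hardware the regions are delimited by linker-script symbols, and this work runs before any statics are valid, which is why it is often written in assembly or inside a #[naked] function.

```rust
/// Host-side model of the RAM setup a bare-metal `start` performs.
/// Panics if `data_load` and `data` differ in length.
fn init_ram(data_load: &[u32], data: &mut [u32], bss: &mut [u32]) {
    // `.data`: statics with non-zero initial values, copied from their load
    // address (typically flash) to their run address in RAM.
    data.copy_from_slice(data_load);
    // `.bss`: zero-initialized statics, which occupy no space in the image.
    bss.fill(0);
}
```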

EDIT: Just realized the above paragraph contains a description of the use case for this feature: it should primarily target platforms where your start symbol is the very first instruction the CPU executes.

Instead of no_mangle, main should be a sym resolvable only from inside the #[start] function. (Note: magical Christmas land in this paragraph, no clue how implementable that is.)