Which std functions are available/work without calling main()

Given Rust code that is called as an extension from another language’s runtime, which standard library functions can be expected to work as documented? That is, main is written in some other language, and the first call into Rust happens well after process startup.

Specifically, it looks like std::env::args returns nothing if Rust’s main hasn’t been called. Is that expected? Is there any documentation to that effect?

What other functions change behaviour when used this way?


I would look at Rust’s startup code to see what happens before main().

So the stack guard is another: on Linux at least you’ll still have the OS guard page, but an overflow will show up as a raw SIGSEGV. You’ll also miss the thread initialization, which mostly just means that std::thread::current().name() will be None. I think both of these are also true of threads created outside of std::thread, e.g. with a direct pthread_create.


Windows doesn’t actually use main() to gather its arguments, so it’ll work fine there.

Unfortunately, Linux literally provides your command-line parameters as arguments to your main function. I’m not sure if it’s possible to get them any other way short of trawling /proc, which would be way slower than Windows’s single syscall.


Would it be reasonable to add public functions to initialize and tear down the standard library, for use when main is not written in Rust? The initializer could take the C argv as an argument.
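A minimal sketch of what such an initializer could look like from the library side, assuming the host passes its C `argc`/`argv` through. `mylib_init` and `host_args` are hypothetical names, not std APIs; because std offers no way to change what `std::env::args` returns, the arguments are stored separately and read back through the accessor. A matching teardown function is omitted.

```rust
use std::ffi::{CStr, OsString};
use std::os::raw::{c_char, c_int};
use std::sync::OnceLock;

// Hypothetical storage for arguments handed over by a non-Rust `main`.
static HOST_ARGS: OnceLock<Vec<OsString>> = OnceLock::new();

#[cfg(unix)]
fn to_os_string(s: &CStr) -> OsString {
    // On Unix, arguments are arbitrary bytes, so keep them unchanged.
    use std::os::unix::ffi::OsStringExt;
    OsString::from_vec(s.to_bytes().to_vec())
}

#[cfg(not(unix))]
fn to_os_string(s: &CStr) -> OsString {
    // Elsewhere, fall back to a lossy UTF-8 conversion.
    OsString::from(s.to_string_lossy().into_owned())
}

/// Hypothetical initializer the host calls with its own `argc`/`argv`.
///
/// # Safety
/// `argv` must be null or point to at least `argc` valid, NUL-terminated
/// C strings, as a C `main` receives them.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn mylib_init(argc: c_int, argv: *const *const c_char) {
    let mut args = Vec::new();
    if !argv.is_null() {
        for i in 0..argc.max(0) as usize {
            let ptr = unsafe { *argv.add(i) };
            if ptr.is_null() {
                break;
            }
            args.push(to_os_string(unsafe { CStr::from_ptr(ptr) }));
        }
    }
    // Only the first call takes effect; later calls are ignored.
    let _ = HOST_ARGS.set(args);
}

/// Arguments recorded by `mylib_init`, or an empty slice if it was never called.
pub fn host_args() -> &'static [OsString] {
    HOST_ARGS.get().map(Vec::as_slice).unwrap_or(&[])
}
```

The host’s C `main` would call `mylib_init(argc, argv)` once, before any other call into the library. The `#[unsafe(no_mangle)]` attribute form needs Rust 1.82 or later.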

When reading /proc, e.g.:

$ cat /proc/self/cmdline

the kernel will read the arguments from the location on the stack where they are normally placed.
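A sketch of the /proc route on Linux: /proc/self/cmdline holds each argument terminated by a NUL byte, so it can be split back into a list. `parse_cmdline` and `args_from_proc` are illustrative names; this assumes /proc is mounted, and the result reflects whatever currently sits in the process’s argument area.

```rust
use std::ffi::OsString;
use std::io;
use std::os::unix::ffi::OsStringExt;

/// Splits the raw contents of /proc/<pid>/cmdline into arguments.
/// Every argument, including the last, is followed by a NUL byte.
fn parse_cmdline(raw: &[u8]) -> Vec<OsString> {
    // Drop the final terminator so it does not produce a spurious empty entry.
    let raw = raw.strip_suffix(&[0u8]).unwrap_or(raw);
    if raw.is_empty() {
        return Vec::new();
    }
    // Empty arguments in the middle (two NULs in a row) are preserved.
    raw.split(|&b| b == 0)
        .map(|arg| OsString::from_vec(arg.to_vec()))
        .collect()
}

/// Reads this process's arguments from procfs (Linux only; needs /proc mounted).
fn args_from_proc() -> io::Result<Vec<OsString>> {
    let raw = std::fs::read("/proc/self/cmdline")?;
    Ok(parse_cmdline(&raw))
}
```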

It’s also possible to find argc/argv via the “_dl_argc”/“_dl_argv” symbols.

Rust’s standard library should not require “special” initialization in order to be happy with getting called as a library; down that path lies languages with runtimes.

The thread name doesn’t seem like a problem at all; even threads spawned by Rust don’t have names by default.
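Since a thread’s name can be absent (std threads spawned without one, and threads created outside std::thread), code running on a host’s threads should treat it as optional. A small sketch; `current_thread_label` and the `<unnamed>` placeholder are hypothetical:

```rust
use std::thread;

/// Returns the current thread's name, or a placeholder when it has none
/// (e.g. an unnamed std thread or a thread not created via std::thread).
fn current_thread_label() -> String {
    thread::current()
        .name()
        .unwrap_or("<unnamed>")
        .to_string()
}
```

A std thread only gets a name when one is set explicitly through `std::thread::Builder::name`.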

On some platforms we can potentially get arguments after the program starts, and on others we can’t; as long as we don’t crash on any platform, that seems fine. Having a function to set the arguments won’t work on all platforms (if you want the concept of changed arguments to apply to code not written in Rust). This doesn’t seem like a problem to me; if you want to handle arguments, you need to get them from the code implementing main(), even if written in another language. Perhaps we should have a convenient way of constructing an Args that way.

Regarding the stack guard initialization: as far as I can tell, that only exists to set up a signal handler to handle stack overflow and produce a friendlier error message. That’s a rather unexpected behavior, and programs that expect to handle SIGSEGV (or platform equivalent) themselves may find it surprising. I’d argue that we should document that better than we do, and ideally not do that by default. Could we have a compile-time option to select that behavior?

Finally, sys::init() itself does very little: on unix-like platforms it sets SIGPIPE to ignore (also not documented anywhere as far as I can tell), and on every other platform it does nothing.
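One practical consequence: when SIGPIPE is ignored (as under a Rust main on Unix-like platforms), a write to a pipe whose reader has gone away returns an error of kind `BrokenPipe`; when the host leaves SIGPIPE at its default disposition, the same write terminates the process instead. A sketch of a helper (the name `write_or_stop` is hypothetical) that handles the ignored case explicitly rather than treating it as a generic failure:

```rust
use std::io::{self, Write};

/// Writes all of `data`. Returns Ok(true) on success, Ok(false) if the
/// reader has gone away (BrokenPipe), and any other error unchanged.
fn write_or_stop<W: Write>(out: &mut W, data: &[u8]) -> io::Result<bool> {
    match out.write_all(data) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}
```

This only helps when SIGPIPE is actually ignored; with the default disposition the process is killed before the error can be observed.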

So, overall, I don’t think there’s anything necessary in that initialization, and what is there we should either document or consider carefully removing or making optional.


That’s a glibc-private symbol that we shouldn’t use. And it can become inaccurate if your program changes where the argument area points to.

In an ideal world, I’d love to have a prctl to get the values settable by PR_SET_MM_ARG_START and PR_SET_MM_ARG_END. I don’t know of any means of doing so (other than reading /proc/self/cmdline, which depends on a mounted /proc, which we can’t count on either).

I just found out that __attribute__((constructor)) functions get passed argc, argv and environ, so using that seems by far the best option.


Where do you see that documented? I don’t see that anywhere in the documentation of __attribute__((constructor)). (Using that also seems likely to produce surprising behavior in some use cases and with some toolchains/linking.)


The argv crate relies on static constructors being passed argc/argv/envp on Linux/macOS. It doesn’t seem to be specifically defined by an ABI; glibc does pass these explicitly, but other environments may not (discussion: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223752).
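A sketch of the constructor approach in Rust, assuming glibc on Linux: a function pointer placed in the `.init_array` section is called with `(argc, argv, envp)` before `main`. This is in the spirit of what the argv crate is described as doing above, not that crate’s code; `capture_args` and `captured_args` are hypothetical names. On other targets the constructor is not registered, so the accessor returns `None`. The `#[unsafe(link_section)]` form needs Rust 1.82 or later.

```rust
use std::ffi::{CStr, OsString};
use std::os::raw::{c_char, c_int};
use std::sync::atomic::{AtomicIsize, AtomicPtr, Ordering};

static ARGC: AtomicIsize = AtomicIsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(std::ptr::null_mut());

// Registered only where glibc is known to pass (argc, argv, envp) to
// .init_array entries; other libcs may pass nothing at all.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
#[used]
#[unsafe(link_section = ".init_array")]
static CAPTURE_ARGS: extern "C" fn(c_int, *const *const c_char, *const *const c_char) =
    capture_args;

#[allow(dead_code)]
extern "C" fn capture_args(argc: c_int, argv: *const *const c_char, _envp: *const *const c_char) {
    ARGC.store(argc as isize, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Arguments seen by the constructor, or None if it never ran.
/// Non-UTF-8 bytes are replaced (lossy conversion) to keep the sketch portable.
fn captured_args() -> Option<Vec<OsString>> {
    let argc = ARGC.load(Ordering::Relaxed);
    let argv = ARGV.load(Ordering::Relaxed);
    if argv.is_null() {
        return None;
    }
    let mut out = Vec::new();
    for i in 0..argc.max(0) as usize {
        // Assumes the argument area has not been rewritten since startup.
        let ptr = unsafe { *argv.add(i) };
        if ptr.is_null() {
            break;
        }
        let s = unsafe { CStr::from_ptr(ptr) };
        out.push(OsString::from(s.to_string_lossy().into_owned()));
    }
    Some(out)
}
```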


I just posted https://github.com/rust-lang/rust/issues/62569 about Rust ignoring SIGPIPE on startup.
