Summary
This proposal would make it possible, using only std, to access the original NUL-terminated UNIX process arguments without having to convert to and from OsString, which reallocates them, removes the NUL terminators, and on Windows changes their encoding.
Today std does not provide a way to access process args or env vars without these allocations and reencodings. This means that in use cases such as FFI calls, further allocations and reencodings are needed to get them back to the original (currently inaccessible) representation.
This proposal makes the original representations available through std::os::unix::env::argc() and argv(), new functions which follow the pattern of OS-specific std::os functions like std::os::unix::fs::chown.
A separate case could be made for doing this on operating systems other than UNIX, and for environment variables as well. This proposal leaves both out of scope because direct FFI workarounds exist for those use cases, whereas no such workaround is available for process arguments on every UNIX target.
Motivation
When making FFI calls from Rust on UNIX targets, it's common to need NUL-terminated byte strings (typically UTF-8). The same is true of NUL-terminated UTF-16 strings for Windows FFI calls. If these strings are obtained from environment variables or process arguments, on both UNIX and Windows targets, they already exist in memory in the required format.
Today in std it's only possible to access these values via VarsOs and ArgsOs, both of which are iterators over OsString values. These strings are not in the original format; they have been reallocated and had their NUL terminators dropped, meaning that further allocations and conversions are necessary to get them back into their original form.
On Windows, these allocations and conversions can be avoided through a direct FFI call to GetCommandLineW. Some UNIX systems (e.g. macOS) have an equivalent, but on others there is no direct FFI call which exposes these values.
This proposal would make all of these unnecessary allocations and conversions avoidable on UNIX using only std and no FFI.
Guide-level explanation
When writing FFI code that targets a particular OS, you may find that the function you're calling requires strings in a NUL-terminated format. Rust's String and OsString are not NUL-terminated, so if you have one of these, you'll need to do some conversions to use them in these FFI calls.
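To make that cost concrete, here is a minimal sketch of the conversion that is typically needed today on UNIX before arguments obtained as OsString values can be passed to a C function. The helper names to_c_strings and to_argv are hypothetical and exist only for illustration; the std calls they use (OsStringExt::into_vec, CString::new) are existing APIs.

```rust
use std::ffi::{CString, OsString};
use std::os::raw::c_char;
use std::os::unix::ffi::OsStringExt;

/// Hypothetical helper: turns each argument into an owned, NUL-terminated
/// CString. Appending the terminator may reallocate the buffer.
/// Returns None if any argument contains an interior NUL byte.
fn to_c_strings(args: impl IntoIterator<Item = OsString>) -> Option<Vec<CString>> {
    args.into_iter()
        .map(|arg| CString::new(arg.into_vec()).ok())
        .collect()
}

/// Hypothetical helper: builds the null-terminated pointer array that C
/// functions such as execvp expect. The pointers are only valid while
/// `owned` is alive.
fn to_argv(owned: &[CString]) -> Vec<*const c_char> {
    owned
        .iter()
        .map(|s| s.as_ptr())
        .chain(std::iter::once(std::ptr::null()))
        .collect()
}
```

Calling to_c_strings(std::env::args_os()) followed by to_argv rebuilds, with fresh allocations, roughly the same layout the OS already handed to the process at startup.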
Whenever the strings you're passing happen to come directly from environment variables or process arguments, you can potentially avoid these conversions. For example, UNIX stores both env vars and process arguments in NUL-terminated strings, so you can avoid reencoding them to and from OsString or String by accessing pointers to the original strings. For process arguments, this proposal exposes those pointers through the UNIX-specific std::os::unix::env::argc() and argv() functions.
Here's an example on UNIX of using argc() and argv() to avoid reencoding and allocations when making an FFI call to execvp:
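    use std::os::raw::{c_char, c_int};
    use std::os::unix::env; // the module proposed here

    extern "C" {
        fn execvp(file: *const c_char, argv: *const *const c_char) -> c_int;
    }

    fn main() {
        let (argc, argv) = (env::argc(), env::argv());
        // We need a non-null argv holding at least this executable's
        // path and the path of the program to run.
        if argv.is_null() || argc < 2 {
            return;
        }
        let args: &[*const c_char] = unsafe {
            std::slice::from_raw_parts(argv, argc)
        };
        // Skip the first argument (it's usually the path
        // to this executable), and treat the second one
        // as the path. By convention the new program's argv[0]
        // is its own path, so forward everything from args[1] on.
        // The original argv array already ends with a null pointer.
        unsafe {
            execvp(args[1], args[1..].as_ptr());
        }
    }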
Keep in mind that these are raw pointers to mutable data. Both environment variables and process arguments can be mutated, and any of these pointers may be null.
Proposed Design
Introduce these functions in a new module, std::os::unix::env:
    fn argc() -> usize;
    fn argv() -> *const *const c_char;
These functions would read from the atomics in which std's UNIX args implementation already stores argc and argv, which is why they do not need to take &self.
Today, these atomics are not exposed, and there is no direct FFI-based workaround to access the values they hold. That's in part because std populates them using non-standard link_section extensions.
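As a rough illustration of this design, the sketch below models a pair of process-wide atomics that startup code fills in once and that free functions read back. The static names, their exact types, and the record_args function are assumptions made for this example; they are not copied from std, and the link_section mechanism that would actually populate them is omitted.

```rust
use std::os::raw::c_char;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

// Illustrative stand-ins for the internal statics (names are assumptions).
static ARGC: AtomicUsize = AtomicUsize::new(0);
static ARGV: AtomicPtr<*const c_char> = AtomicPtr::new(ptr::null_mut());

/// Hypothetical startup hook: records the values the OS passed to the process.
fn record_args(argc: usize, argv: *const *const c_char) {
    ARGC.store(argc, Ordering::Relaxed);
    ARGV.store(argv as *mut *const c_char, Ordering::Relaxed);
}

/// Reads the recorded argument count. No `&self` is needed because the
/// state lives in a process-wide static.
fn argc() -> usize {
    ARGC.load(Ordering::Relaxed)
}

/// Reads the recorded argv pointer; null if nothing was recorded.
fn argv() -> *const *const c_char {
    ARGV.load(Ordering::Relaxed) as *const *const c_char
}
```

Under these assumptions, both accessors are plain atomic loads, so calling them neither allocates nor copies any argument data.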
Alternate Designs
These functions could use CStr over *const c_char, but then they would have to be unsafe because CStr requires that the pointers be non-null, which is not a guarantee in this case. Additionally, since the motivation for this is FFI, the CStr values would likely need to be converted back into *const c_char pointers anyway, so overall CStr seems both unsafe and unhelpful here.
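For callers who do want a CStr, the null check can be done on their side. The following is a minimal sketch under the assumption that the caller can vouch for the pointer's validity; arg_as_cstr is a hypothetical helper, not part of this proposal.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: borrows a raw argument pointer as a CStr,
/// returning None for a null pointer.
///
/// # Safety
/// If non-null, `ptr` must point to a NUL-terminated string that stays
/// valid and unmodified for the lifetime `'a`.
unsafe fn arg_as_cstr<'a>(ptr: *const c_char) -> Option<&'a CStr> {
    if ptr.is_null() {
        None
    } else {
        // SAFETY: non-null was checked above; the rest is the caller's contract.
        Some(unsafe { CStr::from_ptr(ptr) })
    }
}
```

The helper still has to be unsafe, because nothing can statically guarantee that the underlying argument data is not mutated while the borrow is alive.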
It might sound reasonable to have a function which returns a slice instead of separate functions for argc and argv. However, as a comment in the current UNIX args implementation notes, argc is not necessarily an accurate length for argv, meaning that building a safe slice would require traversing argv until a null pointer is encountered, which would be undesirable given that the motivation for this use case is to avoid overhead.
As an alternative, it could make sense to have an Iterator which iterates over argv until it encounters a null, and uses argc for a size_hint only. That said, as shown in the guide-level explanation example, there are certain FFI use cases where having access to the raw pointers is more helpful than an iterator. So it seems like the minimal proposal here would be to expose the pointers, and then optionally an iterator convenience method could be discussed on top of that.
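Such an iterator could be layered on top of the raw pointers roughly as follows. This is only a sketch: ArgvIter is a hypothetical type, and reporting argc as the lower bound of size_hint is one possible design choice, made here on the assumption that argc is usually accurate.

```rust
use std::os::raw::c_char;

/// Hypothetical iterator over a null-terminated, argv-style pointer array.
struct ArgvIter {
    next: *const *const c_char,
    hint: usize,
}

impl ArgvIter {
    /// # Safety
    /// `argv` must be null, or point to an array of pointers that ends with
    /// a null pointer and stays valid while the iterator is in use.
    unsafe fn new(argv: *const *const c_char, argc: usize) -> Self {
        ArgvIter { next: argv, hint: argc }
    }
}

impl Iterator for ArgvIter {
    type Item = *const c_char;

    fn next(&mut self) -> Option<Self::Item> {
        if self.next.is_null() {
            return None;
        }
        // SAFETY: `new` requires a valid, null-terminated array.
        let current = unsafe { *self.next };
        if current.is_null() {
            // Reached the terminator; stay exhausted from now on.
            self.next = std::ptr::null();
            return None;
        }
        // SAFETY: the terminator has not been reached, so the next slot exists.
        self.next = unsafe { self.next.add(1) };
        self.hint = self.hint.saturating_sub(1);
        Some(current)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // argc is advisory only: it pre-sizes collections but never decides
        // where iteration stops. If it could over-report, (0, None) would be
        // the conservative choice.
        (self.hint, None)
    }
}
```

Because termination depends only on the null pointer, an inaccurate argc affects at most how much capacity a collecting caller reserves up front.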
Prior Art
There are various OS-specific functions in std::os already, like std::os::unix::fs::chown.
Future Additions
Even though there are already FFI workarounds for them, it could be worthwhile to offer equivalent functions for other target OSes, such as Windows and WASI.
Doing something similar for environment variables could be worthwhile, as they have the same characteristic today of always needing to be converted to OsString even if the desirable format is the one the OS already has in memory. However, there are already direct FFI workarounds to access this on all OSes, which is why this proposal leaves env vars out of scope.