env::set_var seems extremely dangerous: it might technically be “safe” in some restrictive sense, but it isn’t really safe in practice if you’re dealing with non-Rust code… there isn’t any reasonable way to ensure that environment-dependent functions from the C library aren’t called at the same time as set_var. (Even the standard library itself screws this up: it calls execvp without taking the internal lock.)
Given that there isn’t really any good reason to modify the environment, I think env::set_var should just be deprecated. Is there something I’m missing?
Good proposal, but to be strictly safe it needs to be more than deprecated(*). glibc, which Rust uses on linux, has the attitude that setenv must not be called in a multithreaded program (Rust issue link).
(*) set_var could be changed to do nothing, or it could panic if called in a multithreaded program.
I don't think it would be that tenable to deprecate without replacement, it's an operation expected of many standard libraries (especially when interoperating with other code as you've mentioned). We should of course attempt to make it as safe as possible and fixup any holes, but there are just inherent caveats with environment variables which doesn't necessarily mean that Rust shouldn't expose it.
This sounds like a bug! Not necessarily a reason to remove the function outright...
Well, it's special in the sense that the access to the environment doesn't happen in the original process... but that doesn't really help in terms of thread safety.
Okay... maybe we can come up with some sort of proposal to deprecate it, and replace it with a function marked unsafe?
Also, any mutex which happened to be locked by any thread when the fork was called, will remain locked forever in the new process thanks to COW (unless it was in shared memory). So if you wanted ENV_LOCK just in case, you’d have to lock it across the whole fork. But I think it’s totally unnecessary.
There's a newly forked process that's about to exec... and that process's environment is corrupted because it was forked in the middle of a setenv call.
Right, we need to lock it across the fork. (The new process will never try to access ENV_LOCK, so it doesn't matter what state it's in when the process is forked.)
This has been discussed before and I personally haven't changed opinions since then. These functions are safe to call from Rust, and they're documented as being a hazard if you're mixing Rust and C, and I think that's a pretty ok compromise to make. I don't think having an unsafe set_var is to palatable and I don't think it would prevent anything like this deal with set_var + fork.
Sort of... the problem is that being "safe" is infectious; everything marked safe has to be safe to use together. If I'm publishing a crate, I want to be able to say "my crate plus the standard library is safe: you can't break the safety rules without an unsafe block", and we want to ensure that this is reasonable.
For example, suppose I'm the author of the time crate, and I want it to be safe combined with the standard library. https://doc.rust-lang.org/time/time/fn.at.html uses localtime_r to convert a time to the local timezone. I have to mark time::at unsafe; otherwise, you can write "safe" programs which segfautt.
Or suppose I want to write a safe crate which wraps GTK; the initialization function has to be unsafe because the user has to promise they won't call set_var at the same time.
Basically, I think making set_var safe draws the line for safety at the wrong place; being able to write safe wrappers for C code seems more valuable than being able to modify the C environment in safe code.
(Another possibility to get around this mess would be to build a copy of the environment in the Rust standard library: we could have a set_var function which is threadsafe, but doesn't modify the C environment. Probably more confusing than useful, though.)
Oh, and error_string in the standard library also needs to lock the environment. The number of places we screw this up in the standard library is a good indication that other libraries will run into trouble, especially given that it's impossible to fix code outside the standard library to take the Rust environment lock.
Given that there isn't really any good reason to modify the environment,
Eh, wait, that sounds wrong to me. When interoperating with foreign libraries, or executables (that are fork'ed and exec'ed), it is sometimes necessary to be able to modify the environment. At least on unixoid systems, the environment is a crucial part of the interface provided by the operating system. It's API may be very badly designed, but still Rust has to expose it. So, marking set_env unsafe would be fine, but deprecation without replacement would not.
My feeling is that saying that set_var is unsafe isn’t necessarily the right line either, however. We can just include as part of the “unsafe contract” that you carefully touch environment variables, but in terms of foreign code there’s no reason that set_var is more unsafe than var (as foreign code could be both getting and setting environment variables).
I’d be fine exposing a way to lock the environment manually, it seems like a reasonable thing that may be required to expose robust bindings in some cases.
Hmm... what would that look like? Maybe something like this?
/// Lock the C environment.
///
/// Safe Rust code cannot write to the C environment for the lifetime of
/// the returned EnvLockReadGuard.
///
/// Multiple EnvLockReadGuard instances can be active at the same time.
///
/// Any code which calls a function which might read the C environment
/// (including a lot of functions in the C standard library which might
/// not be obvious) should hold this guard while calling those
/// functions.
///
/// The guard can be held for an indefinite period of time, which might
/// be necessary for complicated libraries (like GUI libraries) which
/// spawn threads.
fn lock_environment() -> EnvLockReadGuard;
/// Guard the environment from writes.
///
/// EnvLockReadGuard meets the bounds `'static + Send`.
struct EnvLockReadGuard(...);
EnvGuard needs to be something like a RwLockReadGuard: if, for example, someone has a program which uses both some GTK and the time crate, both crates can correctly lock the environment without causing a deadlock. Not sure if Send is necessary; it would be convenient in some situations, but probably painful to implement.
Not sure if the corresponding write API is necessary; it's probably rare to run into C libraries which unavoidably mutate the environment.
If EnvGuard doesn't correspond to a real lock on WIndows, there might be subtle portability issues, but there's probably no good way around that.