Pre-RFC: deprecate env::set_var

env::set_var seems extremely dangerous: it might technically be “safe” in some restrictive sense, but it isn’t really safe in practice if you’re dealing with non-Rust code… there isn’t any reasonable way to ensure that environment-dependent functions from the C library aren’t called at the same time as set_var. (Even the standard library itself screws this up: it calls execvp without taking the internal lock.)

See https://doc.rust-lang.org/nightly/std/env/fn.set_var.html for more details on why set_var is unsafe.

Given that there isn’t really any good reason to modify the environment, I think env::set_var should just be deprecated. Is there something I’m missing?

3 Likes

Good proposal, but to be strictly safe it needs to be more than deprecated(*). glibc, which Rust uses on linux, has the attitude that setenv must not be called in a multithreaded program (Rust issue link).

(*) set_var could be changed to do nothing, or it could panic if called in a multithreaded program.

Surveying Servo and all of its dependencies, set_env is used for the following things:

  • Build scripts, to set up vars for subsequent tools
  • Unit tests, to change config params in libraries (e.g., timezone info)
  • Turn on RUST_LOG (is there an API for this?)

I have a std::env::set_var("RUST_BACKTRACE", "1"); at the start of all my programs :-/

I don't think it would be that tenable to deprecate without replacement, it's an operation expected of many standard libraries (especially when interoperating with other code as you've mentioned). We should of course attempt to make it as safe as possible and fixup any holes, but there are just inherent caveats with environment variables which doesn't necessarily mean that Rust shouldn't expose it.

This sounds like a bug! Not necessarily a reason to remove the function outright...

1 Like

Modifying the environment seems to be fundamentally incompatible with multithreading, though, we can’t just will this to be safe.

Whatever is decided, env::remove_var should follow suit since unsetenv is equally unsafe.

That's in fn child_after_fork(), which is a rather special situation, no?

Well, it's special in the sense that the access to the environment doesn't happen in the original process... but that doesn't really help in terms of thread safety.

Okay... maybe we can come up with some sort of proposal to deprecate it, and replace it with a function marked unsafe?

What threads? This is a newly forked process that's about to exec -- there's nothing else going on.

Also, any mutex which happened to be locked by any thread when the fork was called, will remain locked forever in the new process thanks to COW (unless it was in shared memory). So if you wanted ENV_LOCK just in case, you’d have to lock it across the whole fork. But I think it’s totally unnecessary.

It sounds like the env lock can cause a deadlock on fork + set_var, unless it’s set up to unlock itself on fork (pthread_atfork).

There's a newly forked process that's about to exec... and that process's environment is corrupted because it was forked in the middle of a setenv call.

Right, we need to lock it across the fork. (The new process will never try to access ENV_LOCK, so it doesn't matter what state it's in when the process is forked.)

Ok, point taken. The exec is sort of an afterthought to the problem, that fork may snapshot the environment during another thread's modification.

This has been discussed before and I personally haven't changed opinions since then. These functions are safe to call from Rust, and they're documented as being a hazard if you're mixing Rust and C, and I think that's a pretty ok compromise to make. I don't think having an unsafe set_var is to palatable and I don't think it would prevent anything like this deal with set_var + fork.

Sort of... the problem is that being "safe" is infectious; everything marked safe has to be safe to use together. If I'm publishing a crate, I want to be able to say "my crate plus the standard library is safe: you can't break the safety rules without an unsafe block", and we want to ensure that this is reasonable.

For example, suppose I'm the author of the time crate, and I want it to be safe combined with the standard library. https://doc.rust-lang.org/time/time/fn.at.html uses localtime_r to convert a time to the local timezone. I have to mark time::at unsafe; otherwise, you can write "safe" programs which segfautt.

Or suppose I want to write a safe crate which wraps GTK; the initialization function has to be unsafe because the user has to promise they won't call set_var at the same time.

Basically, I think making set_var safe draws the line for safety at the wrong place; being able to write safe wrappers for C code seems more valuable than being able to modify the C environment in safe code.

(Another possibility to get around this mess would be to build a copy of the environment in the Rust standard library: we could have a set_var function which is threadsafe, but doesn't modify the C environment. Probably more confusing than useful, though.)

Oh, and error_string in the standard library also needs to lock the environment. The number of places we screw this up in the standard library is a good indication that other libraries will run into trouble, especially given that it's impossible to fix code outside the standard library to take the Rust environment lock.

1 Like

Given that there isn't really any good reason to modify the environment,

Eh, wait, that sounds wrong to me. When interoperating with foreign libraries, or executables (that are fork'ed and exec'ed), it is sometimes necessary to be able to modify the environment. At least on unixoid systems, the environment is a crucial part of the interface provided by the operating system. It's API may be very badly designed, but still Rust has to expose it. So, marking set_env unsafe would be fine, but deprecation without replacement would not.

You can use execve to create an executable with a new environment.

Ah, good point. That doesn’t help for shared libraries, though.

@eefriedman

My feeling is that saying that set_var is unsafe isn’t necessarily the right line either, however. We can just include as part of the “unsafe contract” that you carefully touch environment variables, but in terms of foreign code there’s no reason that set_var is more unsafe than var (as foreign code could be both getting and setting environment variables).

I’d be fine exposing a way to lock the environment manually, it seems like a reasonable thing that may be required to expose robust bindings in some cases.

Hmm... what would that look like? Maybe something like this?

/// Lock the C environment.
///
/// Safe Rust code cannot write to the C environment for the lifetime of
/// the returned EnvLockReadGuard.
///
/// Multiple EnvLockReadGuard instances can be active at the same time.
///
/// Any code which calls a function which might read the C environment
/// (including a lot of functions in the C standard library which might
/// not be obvious) should hold this guard while calling those
/// functions.
///
/// The guard can be held for an indefinite period of time, which might
/// be necessary for complicated libraries (like GUI libraries) which
/// spawn threads.
fn lock_environment() -> EnvLockReadGuard;

/// Guard the environment from writes.
///
/// EnvLockReadGuard meets the bounds `'static + Send`.
struct EnvLockReadGuard(...);

EnvGuard needs to be something like a RwLockReadGuard: if, for example, someone has a program which uses both some GTK and the time crate, both crates can correctly lock the environment without causing a deadlock. Not sure if Send is necessary; it would be convenient in some situations, but probably painful to implement.

Not sure if the corresponding write API is necessary; it's probably rare to run into C libraries which unavoidably mutate the environment.

If EnvGuard doesn't correspond to a real lock on WIndows, there might be subtle portability issues, but there's probably no good way around that.