Synchronized FFI access to POSIX environment variable functions

For clarity, if #[deprecated_safe] ends up panning out, I will close the pull request. I have requested it not be merged until a final decision is made on that front because of the (in my opinion) better alternative.

It is what I needed 100% of the cases where I so far used set_var/remove_var (which is mostly inside cargo-miri). Refactoring that code to remain safe when those functions are unsafe will be annoying -- not terrible, and if they had been unsafe to begin with that's what I would have done, but given that we had (and have) safe environment functions, of course I wrote my code to match.

What do you think are people going to do in that situation? I think a significant fraction of them will just add the unsafe block and be done with it. For many of them the unsafe premise (no other threads) might even be correct, for some it won't. So the result of your proposed plan is, in the optimistic case, a bunch of existing safe environment-touching code becoming unsafe. In the pessimistic case, that code becomes unsound.

I understand your objection to a Rust shadow environment, and I don't like it either. What I am missing is a realistic alternative. What is your plan to deal with this problem I am predicting? I am astonished that many people here seem to think that all of the existing users of set_var/remove_var will carefully ensure that there are no other threads before adding unsafe blocks around their existing, working, code. I am equally astonished about the proposal to declare a non-trivial fraction of existing, working, code unsound without any easy migration path. This is even more astonishing given that you are are usually very reluctant about subtle forms of UB.

I am fully in favor of deprecated_safe. But given that we have 7 years of code that were written assuming those functions are safe, I think it's not enough.

4 Likes

How about:

/// Sets the **shadow** environment variable `key` to the value `value` for the currently running process.
///
/// ... rest of the docs
#[deprecated = "Confusing, use set_shadow_var or set_system_var instead"]
pub fn set_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V) {
    set_shadow_var(key, value)
}


/// Sets the **shadow** environment variable `key` to the value `value` for the currently running process.
///
/// ... rest of the docs
pub fn set_shadow_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V) {
    // ...
}

/// Sets the **system** environment variable `key` to the value `value` for the currently running process.
///
/// ...
/// ## Safety
/// ...
pub unsafe fn set_system_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V) {
}
  • Old code gets warned of changed behavior
  • Old code that doesn't interact with system can just rename to shadow version and be done with it
  • Old code that intended system will have to be audited, but it was probably already broken
  • New code chooses one or the other from beginning and perhaps nudges people to avoid writing env vars entirely.

Or maybe even, in case of set_var write from shadow to system if we can undeniably prove that there's only one thread. That way most of the existing sound code intending to use system will continue to work, unsound code intending to use system will just break instead of UB-ing.

Behavior differences that aren't reflected in the return value sounds terrible. How is someone to know that writing to the system env happened in that case?

I don't think we want set_var to be the one that affects the shadow environment. That is not what it currently does. A shadow environment is sufficiently surprising that it needs opt-in from crate authors, so they can ensure that indeed they don't need non-Rust code in the current process to read this environment.

3 Likes

If there is a single thread then it did happen. If there isn't you actually don't know if "it happened" - it may have been overwritten in the other thread right after you wrote (even with "safe" locking), so the behavior already is terrible. I'm not saying my suggestion is great but seems to be least disruptive.

@RalfJung It certainly is not a great outcome. I reminds me of situation with mem::uninit which some crates used as a (subjectively horrible) hack to implement unreachable_unchecked. Turning it to panic was technically breaking change even for arguably questionable but technically valid code (I believe; correct me if I'm wrong), yet the decision was to do exactly that. And I think it was the least bad decision.

set_var affecting shadow environment is bad but it may be least bad decision. I'm personally not sure but would like to see some comparison with historic cases (were there other?) and some nice argument why it's different this time.

But if the decision is to keep set_var as-is and add shadow environment I propose to make shadow environment thread local for the exact reason I explained above. Or perhaps allow explicit locking. Actually, maybe not being able to hold a lock while you run Command is a good argument that even if there was no UB around setting global env, it's still very impractical.

TOCTOU has always been a problem with environment manipulation in the presence of threads. Even the shadow environment doesn't have this guarantee because you could have been suspended between the unlock and the return while another thread has its fun with the environment.

No that was not a breaking change. Panicking is a correct implementation of unreachable_unchecked.

However, the much more problematic code was the one that used mem::uninitialized to create values they actually wanted to then initialize and worth with. There we had an alternative available when mem::uninit got deprecated, and a lint to find definitely bad uses of mem::uninit, and a lot of other effort, and still you can find a bunch of MaybeUninit::uninit().assume_init() out there where people couldn't be bothered to actually fix their code. And that was from one unsafe API to another, here we are talking about code that is currently nominally safe!

Porting code away from set_var to not using the environment is, I think, less local and hence more tricky than porting from mem::uninit to MaybeUninit. So if the mem::uninit story teaches us anything, I think it is that we do need to provide an easy target to port to, like a Rust shadow environment.

No, that would make it entirely unsuited for its one and only purpose -- to port existing code.

The shadow environment is a thread-safe global HashMap<OsString, OsString>, there is no need to expose its lock. We don't need transactional access to that global map. Even if there are multiple threads, as long as they all work on different env vars, there is no race condition.

2 Likes

Indeed, that's why I suggested making it thread-local. However as @RalfJung said:

its one and only purpose -- to port existing code.

I didn't think of it in this way before but now realized that fixing these issues is best done with direct env configuration of Command or analogous constructs in FFIed libs anyway. So maybe it's not really the best either. I guess the doc should say it out loud.

The reason I suggested thread local is I came across a PR that wanted to use external command-calling library together with set_var. Ofc I immediately pointed out that this was bad in multithreaded code. I got told that library-native approach didn't work. In cases like this it'd be really helpful to have thread-local environment. It would be possible to have both (thread local having priority over global) but maybe it's too much and the library needs to be fixed anyway (if it's really broken and not some usage mistake). :man_shrugging:

@RalfJung

Panicking is a correct implementation of unreachable_unchecked.

It is UB-correct but often people use unreachable_unchecked because they don't want the branch there (enum matching). By reintroducing the branch the code is "broken" in the sense it no longer works as they intended and explicitly wrote.

Agree with the rest of your points, transition to a sound solution should be as easy as possible.

Yes that is buggy indeed. But it is buggy already with the current API as you said, so when we provide an API to migrate existing users to, it's okay for this to remain buggy. :wink:

That would only help if the external library also used that thread-local environment. Presumably this is not a Rust library, so a thread-local Rust shadow environment would be of no use.

Also even if this is a Rust command-calling library, I don't think the Rust standard library should have a thread-local environment just to work around the fact that the library-native approach of env passing in some crate is buggy. :joy: That's really a bug in that command-calling library and needs to be fixed there. There is very little the people calling the command-calling library can do about this, they need to get their dependency fixed or use another one.

It is. :slight_smile:

Yeah, seems like if we want to help people not to make the mistake in the first place this problem should be documented in set_var function and its future replacement(s).

I'll +1 @josh re: somewhat strong opposition to a "shadow environment". It is not a zero cost abstraction, and from a security perspective might give the impression that secrets (which really shouldn't be kept in environment variables, but that ship has sailed and I digress) have been removed from the environment when they have only been removed from the shadow environment, not the process environment. This has turned into something of a debacle with Java.

Based on some informal surveys done earlier in this thread of in-the-wild Rust code, it really seems like the main motivations for set_var are effectively missing APIs in Rust itself, where instead of having a first-class API to do something, environment variables are used instead.

I agree and think we do need to give people something to migrate to, but I think a better solution is mapping out those required APIs and providing a migration document.

4 Likes

I wasn't aware of this being a problem in practice, do you have a reference for that?

That doesn't capture my uses of these functions. Maybe those are a bit niche, since cargo-miri is essentially doing nothing but setting up the environment for invoking cargo. It's more like a giant shell script. And in a shell I can mutate the environment easily and safely (well, "safely", it's still a shell script).

I'm also absolutely in favor of having those APIs. I am just not sure it's enough.

Here's an example of the hoops people have jumped through to unset system environment variables in Java:

https://blog.sebastian-daschner.com/entries/changing_env_java

env::remove_var already doesn't remove environment variables from memory:

$ cat > env_test.rs <<EOF
extern "C" { fn getenv(s: *const u8) -> *mut u8; }
fn main() {
    let secret_addr = unsafe { getenv("SECRET\0" as *const str as *const u8) };
    std::env::remove_var("SECRET");
    loop {}
}
EOF
$ rustc env_test.rs -g
$ SECRET=my_secret_var gdb ./env_test
(gdb) run
^C
(gdb) p secret_addr 
$1 = (*mut u8) 0x7fffffffefc7
(gdb) p 0x7fffffffefc7 as &[u8; 13]
$6 = (*mut [u8; 13]) 0x7fffffffefc7 b"my_secret_var"

Could cargo-miri's use case be covered by a crate? If so a shadow-env crate could be published and the migration guide could point people at the new APIs for the common cases and to the crate for cases more like cargo-miri.

I think having the shadow environment be opt in via a separate dependency would alleviate at least some of the concerns here about the shadow environment being the default.

I think the important part is that they're removed before forking so the forked process can't read them after execing.

1 Like

Note that env::remove_var will still be available, you'll just need to make sure it is safe to call.

And env::remove_var_rust (or whatever the name would end up being) should hopefully make it clear from the name that this is not your usual remove_var operation (unlike the Java API which hides the fact that it acts on a shadow environment).

Not very well... you'll also have to replace all env::var and Command by the versions from that crate, which you might not be able to do if they are not in your code.

I am not suggesting to make it the default. I am suggesting to offering it as a way to port existing code, with documentation clearly calling out that using this is bad style.

It doesn't matter if env vars are removed before or after forking as exec replaces the environment anyway. If we introduce a rust shadow environment, this shadow environment would be passed to exec, not the original C environment. The only issue is with C accessing the original environment.

@bjorn3 but the main problem here is FFI use cases. A C library wrapped via an FFI binding can call fork/exec, in which case it might use the system environment, not the shadow environment.

And if it's implemented in a separate set of parallel APIs like env::remove_var_rust like @RalfJung was suggesting, and the shadow environment becomes the "default" for fork/exec, isn't that a breaking change? That means env::remove_var will no longer modify the environment passed to fork/exec, which means anyone currently relying on it to clear secrets will no longer have their secrets cleared from the environment.

IMO having a mutable shadow environment which can diverge from the system environment is going to make things a lot more confusing.

1 Like