Function to hold lock on execution environment (for FFI)

Having a free function in std::env that allows a user to obtain Rust's internal lock on the execution environment would allow for significantly safer usage of FFI. Currently, the documentation for std::env::set_var says

Note that while concurrent access to environment variables is safe in Rust, some platforms only expose inherently unsafe non-threadsafe APIs for inspecting the environment. As a result, extra care needs to be taken when auditing calls to unsafe external FFI functions to ensure that any external environment accesses are properly synchronized with accesses in Rust.

To my understanding, Rust's std::env::var and similar currently use an internal lock to ensure synchronization. This works great, so long as you stay within Rust. The second FFI is introduced, "extra care needs to be taken". Realistically, this isn't feasible currently, and this has led to soundness issues like time-rs/time#293; there are certainly other issues like this that exist in the ecosystem.

As a solution, it would make sense to have either a function that returns a lock (which releases on drop) or a function that accepts a closure, which is executed while the lock is held. This was briefly discussed in rust-lang/rust#27970. @RalfJung pointed out that the current use of the environment lock is incorrect, though I honestly don't understand how.

Speaking personally, I have seen a few crates pin time to 0.2.22, explicitly including unsoundness despite the fact that they could not possibly know whether the crate's consumers are upholding safety invariants (which are global in this case).

1 Like

(Random aside: as a programming language trivia question, "what did Python, Rust, Go etc. get wrong but Java get right"? has setenv() as an answer)

See also https://github.com/rust-lang/rust/pull/24741 in particular my big comment.

Anyways there's really no solution other than avoiding setenv() in a running potentially-multithreaded process. The only safe times to call it are process startup or between fork() and execve() - the latter is exposed via https://doc.rust-lang.org/std/process/struct.Command.html#method.env.

1 Like

It's certainly not foolproof, but I believe that having an API like this would be an improvement over status quo. FFI is inherently unsafe, but by having the ability to use a lock like this, we can still reduce the amount of unsafety.

Imo I think we should just deprecate set_env and replace it with an unsafe function. And I think any unsafety we get with better APIs in Rust is simply an illusion, as arbitrary C code can add getenv calls at any time, and that's an intractable problem to solve.

2 Likes

Well, yes. C code can have getenv calls at any time. If the person using the FFI wraps that call in a std::env::with_lock method, that would make it as safe to use as std::env::var. That's the whole point of me writing this :slightly_smiling_face:

So what does Java do?

A web search today only pulls up e.g. Stackoverflow posts on workarounds, but I remember coming across a document somewhere in the JDK docs that explained it wasn't threadsafe, so it was omitted.

Or actually...I may be thinking of chdir which is another classic Unix function that manipulates process-global state that predated multi-threading. Which Python, Rust, Go, etc. also happily expose. I don't even see any warnings about it in the docs for any of those languages! Maybe I'll try a PR for Rust at least.

My understanding/recollection of the history here is that when Java was designed from the start it was all about threading (a notable difference from e.g. Python), so they carefully thought about each API from that perspective. My guess is another factor is that the JVM has a significant runtime which links to various C libraries too that might invoke getenv(). On the flip side I think there is/was a sentiment that rather than use FFI you'd rewrite everything in Java.

Probably in Rust indeed they should both be unsafe, but we had that debate in the PR above. Can't quite deprecate them because you can safely invoke them early in process startup.

One heavy-handed option I can think of is a wrapper like setenv_early() and std::env::set_current_dir_early() that use an operating-system specific method to check whether the process has multiple threads, and if so returns an error or so.

3 Likes

I gave a few examples in the PR above of when using a function like that would require nontrivial modifications to the underlying C/C++ library. Even still today the compiler rustc links to LLVM which has a bunch of hits for getenv() in the source - a lot looks to be in tests or other language frontends, but I don't know LLVM enough to say for sure there are no getenv() in paths invoked via rustc. Even more complex I bet there will soon be real-world cases where someone is using libcurl with the hyper backend in a Rust app - in this scenario there are two Rust libstd in the process and hence two locks...

Don't get me wrong, there are definitely simple cases that could be solved if libstd exposed the mutex, but nontrivial cases get very messy with that. Moreover in general people are most often only going to figure out they need to do this the hard way (having their code race at runtime).

Anyways we're really just re-iterating here what's been discussed repeatedly in issues and PRs you linked to originally. I think the "just don't call setenv()" approach has been the consensus so far.

Any thoughts on my half-baked idea for setenv_early() above as a 3rd option? At least most of the time it would fail reliably if one e.g. changed code to add a Rayon par_iter() on some early code before setenv() was invoked.

1 Like

But then you lose all the benefits of threading, as you end up serializing all your FFI calls.

1 Like

I suspect most case are what you're referring to as the "trivial" case. Without full control over the backend, we can never be perfect. But that's okay; something is better than nothing.

Moreover in general people are most often only going to figure out they need to do this the hard way (having their code race at runtime).

Right now the way people find out is via a segfault. There must be something to avoid this. The status quo is stdlib tells us "do this", but doesn't tell us how (and as far as I can tell it's not possible).

Anyways we're really just re-iterating here what's been discussed repeatedly in issues and PRs you linked to originally. I think the "just don't call setenv() " approach has been the consensus so far.

I'm writing a library. It is unacceptable to me to set such a seemingly unrelated requirement when someone is trying to obtain the current time of their system in their UTC offset. This is a soundness bug. Telling people "don't do that" is not a solution — if it were, why do we use Rust over C/C++ at all?

Any thoughts on my half-baked idea for setenv_early() above as a 3rd option? At least most of the time it would fail reliably if one e.g. changed code to add a Rayon par_iter() on some early code before setenv() was invoked.

setenv() is not the issue. Those are already synchronized within Rust code. To repeat what I quoted in the first post:

extra care needs to be taken when auditing calls to unsafe external FFI functions to ensure that any external environment accesses are properly synchronized with accesses in Rust.

What on earth is "extra care" here supposed to mean? There is no solution to this currently. We need a way to synchronize FFI calls that access the environment. There is no other way that me or anyone else knows of to "ensure that any external environment accesses are properly synchronized" as the documentation suggests we do.

The internal lock can be switched to a RwLock. That would allow concurrent access while disallowing mutation at that moment.

A bit of a wild idea, and I don't know what the possibilities are here, but would it be possible to still link to libc and then override all of setenv, getenv, unsetenv LD_PRELOAD style, with other implementations? Considering the glibc/dynamic linking case primarily. Again, we could look for a practical solution and limited to the most common libcs - even go as far as investigate if it's possible to improve getenv/setenv in glibc (I don't have high expectations here).

2 Likes

Notice that getenv() returns const char *; this is why Rust's libstd copies it while holding the mutex. In C every concurrent setenv() call can invalidate all outstanding pointers returned from getenv() and cause them to read freed memory with potentially arbitrary consequences.

I think the idea of having something like getenv_locked_dup() and setenv_locked() (i.e. make the API calls look like what Rust's libstd does) and changing all C/C++ libraries to use them was brought up but...again the standards committee settled on the "don't do it" approach. It's not a problem you can solve with LD_PRELOAD.

By far the most common case for "I want to call setenv()" is changing the environment for a child process - that's already covered with a safe Rust API. I looked at the linked chrono issue and it seems the reporter is saying "calling getenv() is unsafe". No - it's more that calling setenv() is potentially unsafe.

OK and following issues around it looks like activity picked up here too.

Having thought a bit more about this, the idea thrown out there on #27970 is growing on me: make set_var panic if multiple threads are running. Given that Rust allows breaking changes to fix soundness issues, this worked be allowed. What are your thoughts on this?

It's worth considering.

I went to github code search just to look around (searching for set_var language:rust) and this example code showed up immediately:

/// #[tokio::main]
/// async fn main() {
///     env::set_var("AWS_ACCESS_KEY_ID", "ANTN35UAENTS5UIAEATD");
///     env::set_var("AWS_SECRET_ACCESS_KEY", "TtnuieannGt2rGuie2t8Tt7urarg5nauedRndrur");

Doesn't tokio::main start the thread pool? So multiple threads are already running when you enter main, here there would be a problem with that rule(?) It's still worth exploring how it could be done.

(And another point - lots of testing code that uses set_var. And the test runner is multithreaded. So it's a hurdle in a common use case - and this common case of testing might be a place where it's common to run into the soundness issues?).

Further though, isn't the problem just as much getenv and setenv in ffi and dependencies? Just changing set_var seems insufficient.

1 Like

To a point yes, but setvar is MT-unsafe even in C, so you're not really supposed to call it in the presence of threads (as glibc and musl will no doubt claim), but getvar is not considered to unsafe and there are no restrictions/guidelines on calling it.

1 Like

as glibc and musl will no doubt claim

Indeed, glibc's (and POSIX's) position is clear on this matter:

As soon as multiple threads are in use a program must not modify the environment and therefore calls to getenv are safe.

1 Like

Yeah and all that code is wrong because set_var in one test "leaks" into other tests. Been there, done that.^^

Well, I'd phrase this as "Ulrich Drepper's position is clear on this matter". :wink: Just a few responses down, someone says

With that said, I agree that Ulrich Drepper is wrong to claim that modifying the environment in a multi-threaded program is not allowed.

His views are hardly unique and the important point is that it's the glibc view:

Modifications of environment variables are not allowed in multi-threaded programs. The getenv and secure_getenv functions can be safely used in multi-threaded programs.

I'm not making an argument for or against any particular solution but I'm trying to be clear on how this is viewed by the people who make the libraries we're talking about.

That's the same Ulrich Drepper speaking there, though. :wink: (I am sure there are other people agreeing with him, but your link doesn't prove that.)

Thanks, that is definitely interesting.

I guess the discussion in the previous bugreport was about whether that is POSIX-compliant. The proposal here doesn't sound like there is broad consensus for the position that the environment must be immutable.

The fact that several decades after POSIX threads were introduced, this still seems to be an open problem, doesn't exactly inspire confidence either...^^

That text from the glibc manual was also written by Ulrich Drepper, for what it's worth. (He was the glibc maintainer at the time.) Interestingly, the manual for previous versions of glibc stated:

All of the following functions can be safely used in multi-threaded programs. It is made sure that concurrent modifications to the environment do not lead to errors.