Synchronized FFI access to POSIX environment variable functions

Regarding the names, I'd recommend

  • std::os::raw::{get,set}_var, which would be unsafe, and interact with the environment in a way FFI can see (and mess with, hence the unsafe).

  • and env::bikeshed_name::{get,set_var} (maybe env::rust::…?) for the safe Rust-shadowed environment: the env would be cloned at init by the runtime, and then these functions would be working with this rust-specific clone / masquerade rather than with the FFI-corruptible env.

    • To reduce friction, I would imagine that Commands ought to be tweaked to load from this shadowed env.

The current non-unsafe functions would be deprecated (maybe in a warn-by-default fashion for the current editions, and even denied later on with cargo fix suggesting to replace their usage with rust::'s?)

That way, the non-FFI users would still have access to a non-unsafe non-deprecated API; and the ones knowingly tweaking the environment in a way FFI can see but prior to FFI calls, from within the main thread, could unsafe-ly assert so by using the unsafe API.

5 Likes

How would this apply to platforms that already have a safe get/set? Would they have to maintain Rust-only shadowed copy too? Or would they continue to use the platform functions under the hood?

1 Like

What platforms would that be? How is safe access to environ in a int main(int argc, char* argv[], char** environ[])-using program mediated on that platform?

In particular I'm thinking of Windows where Rust uses SetEnvironmentVariableW and does not access the environment directly.

1 Like

I'm not sure about that. The search results include a lot of vendored copies of pkg-config and cc, which use it only in tests. Couple of other libraries I've seen there seem to only read the variables from Rust.

I think any kind of Rust-private environment would be a footgun; people set environment variables because they expect those variables to affect the environment, and if that works in Rust but suddenly stops working when accessed from C or when spawning a process by some means other than Command, that feels error-prone.

I'd prefer to just have an unsafe std::env::set and std::env::get. (And if we're going to make them unsafe anyway, perhaps we should drop our internal locking entirely.)

3 Likes

Suppose you call into some FFI library that mutates the global environment, and then you run a Command. Does it notice? Why or why not?

I've been loosely following this conversation, so apologies if this has already come up, but I have a question: how do other environments handle this? I imagine for some programming languages, like e.g., Javascript or Python, common implementations might not allow direct control over in-process parallelism, and so mutating the environment isn't unsafe. But surely there are some programming languages that both permit modifying the environment and direct control over parallelism? e.g., Java , Go and Haskell come to mind. Perhaps there are others. Do these languages have procedures to prevent UB for environment mutation, or do they just not address it?

2 Likes

I think Go makes a private copy of the environment and manipulates this copy: https://cs.opensource.google/go/go/+/master:src/syscall/env_unix.go?q=Setenv&ss=go%2Fgo

Haskell uses SetEnvironmentVariableW on Windows and unsynchronized putenv on Unix: System/Environment.hs

Java only allows setting environment variables using hacks I think: How do I set environment variables from Java? - Stack Overflow

1 Like

So are you saying both should be unsafe? (Your earlier post only mentions set being unsafe, and in principle making the setter unsafe is sufficient.)

Well, Rust calling get while C calls setenv is just as dangerous as Rust calling set while C calls getenv. But I suppose you could say that well-behaved C code shouldn't be calling setenv in multithreaded code, and if it does, the memory unsafety is its fault. It does seem that making get unsafe would be much more disruptive than making set unsafe, unless we followed the 'shadowed environment' approach.

4 Likes

This thread seems to have returned to "make setenv unsafe" but I think we should weigh that against the alternative of something like:

fn try_setenv(key, value) -> io::Result<()>

which would fail if multiple threads were active.

It is safe (and reasonable) to invoke setenv():

  • Early in process startup
  • Between fork() and exec() - and remember that fork() inherently returns to a single thread

Broadly speaking most people would do s/setenv/try_setenv().unwrap()/.

5 Likes

Python simply passes through os.environ["FOO"] = "BAR" into a call libc putenv(). A while ago, this came up in e.g. Anaconda which is a Python program that has slowly grown more multithreaded, but also links to a lot of C libraries some of which make threads. And hard to debug random crashes during operating system installation were the result.

5 Likes

True. But again, the link Ralf posted:

https://sourceware.org/bugzilla/show_bug.cgi?id=15607#c3

shows how easy it is to accidentally do something that spins up some threads.

I know Darwin goes to certain lengths to have some system APIs avoid creating threads in single-threaded programs, but it's not perfect.

Even in cases where it's possible, or indeed easy, to ensure env::set calls are performed early enough that there's only one thread, that doesn't mean that existing code does so. So there's still a major backwards compatibility hazard.

2 Likes

Yes, but doing so with try_setenv() would safely panic - and do so deterministically, because if there are no other threads there can't be anything racing to spin up a new thread. (Although perhaps it'd be more strict to verify if any other threads have ever existed)

(BTW, if you click through the link in that sourceware BZ to the original GNOME BZ - that's me. I'm the "setenv() is unsafe with threads, ask me how I know" guy in my circle of coworkers :wink: )

I'm not clear what you're arguing there. Is that an argument for doing nothing, or for making the existing setenv unsafe versus adding a new API?

1 Like

How woud you detect if multiple threads are running in the first place? std::thread::spawn having been called once is not the answer. The thread may have been created using pthread_create or similar, be it by rust code or by other libraries. Maybe even by a C application using a Rust library. Heck the thread may be created while try_setenv is running. For example on Windows where a process can inject a thread into another process. This is commonly done by both malware and I believe antivirus software. In the case of antivirus software how would you ignore it's injected thread to allow try_setenv to even work directlt at the start of main?

4 Likes

FWIW an unsafe std::env::set does not preclude a safe std::env::try_set.

1 Like

That one thing cannot happen, since if there is no thread yet then there is no other code running concurrently with try_setenv.

But I agree with the rest of what you said -- is there even a realistic implementation strategy for try_setenv that works on all affected (i.e., POSIX) platforms that Rust supports?

I would say that is like someone writing to /proc/PID/mem on Linux -- inherently cursed OS-level operations that we assume the environment does not perform. Rust safety inevitably is contingent on some environment assumptions when running on an OS with such cursed operations.

Windows doesn't have the problem we are discussing here (AFAIK), so no check is even needed there.

Indeed, and try_set might not even have to live in the standard library.

1 Like

On Linux at least, std::fs::read_dir("/proc/self/task")?.count().

Does something similar exist for Unix in general? If so that should be all that's necessary, I'd think. TOCTOU might have something to say about that, though.