Synchronized FFI access to POSIX environment variable functions

Not necessarily. Another approach would be to provide replacements for both getenv and environ that are thread-safe by guaranteeing internal synchronization; the environ replacement could be a listenv function or similar. setenv could either just become thread-safe or get a similar thread-safe replacement function. (And putenv would just not have a replacement at all.)

None of that would require exposing the internal lock.

2 Likes

I don't think many people run containers that don't expose /proc/self. A bit more realistic scenario is someone writing Rust code as /bin/init that would need to take care of mounting /proc.

But the result in either case is just that we assume the process is single threaded and allow setenv(). I would hope that any threading issues across most of the crates ecosystem has been flushed out already in "normal" usage where /proc is mounted. Even for the "writing /bin/init" case, it's likely they have some CI setup that at least runs cargo test unit tests in a Linux container with /proc that might catch relevant setenv() calls.

(Another angle here is we can easily add a Rust hook to std::thread::spawn() that sets an internal flag as others have suggested; the "writing /bin/init in Rust" case is much less likely to rely on a C library spawning a thread)

The thing is, set_var() is already safe for pure Rust code. The problem comes more into play when one has threads created by C (or another language, potentially e.g. a Rust library embedded in a Go process, etc.). So any hooks that only operate at the Rust level aren't going to catch those, and I think "Rust + $other language" is almost the entirety of the problem space here. Even rustc links to LLVM, a large C++ application.

1 Like

Interestingly I am not sure that is true, currently. Command::spawn() only captures the environment in the parent process, if we need a changed environment for the child process. (See capture_env()). If another Rust thread is doing a env::set_var() and holding the lock, while the fork() happens, the environment in the forked child process might be broken.

We can always capture the environment in the parent process to fix this, at some performance cost.

But if come back to initial problem. For example chrono uses std::env::set_var and std::env::var_os every time in local time to utc conversation. Like this:

 use std::env::{remove_var, set_var, var_os};
    extern "C" {
        fn tzset();
    }

    let ret;

    let current_tz = var_os("TZ");
    set_var("TZ", "UTC");
    tzset();

    ret = libc::mktime(tm);

    if let Some(tz) = current_tz {
        set_var("TZ", tz);
    } else {
        remove_var("TZ");
    }
    tzset();

and suppose there are several other cases when libc function controlled via global environment variable, like LC_ALL for example.

Obviously this should work in multi-threaded program, when threads are running. Not only at the begin of the program before any threads are spawned.

Unfortunately this isn't true. This pure Rust program will segfault due to this problem

2 Likes

OK right. The getaddrinfo man page is quite clear that this function is MT-Safe env with the important bit being the env conditional modifier.

So...yes "pure Rust" is fine except for the fact that we link to a C library by default that calls getenv() in a number of less than obvious but important places :wink:

Anyways I think while you are correct to point this out, my real point was really around how capturing Rust-specific threading state isn't going to cover enough of the problem space, e.g. cases like rustc linking to LLVM, or (my world) rpm-ostree linking to libdnf, or any Rust code using the curl or openssl crates etc.

I don't think we do.

We can make the existing functions unsafe, and only offer safe functions on systems that make the operations safe. The only way to make such functions pervasive is to first make them exist at all and then let time pass.

2 Likes

That IMO makes this a non-solution. Saying "it works for pure Rust code" in such a situation is the equivalent of putting our head into the sand and saying "I don't see a problem".

(So even if std wouldn't call getaddrinfo, that would not change my stance here.)

The latter IMO is not an option, so this leaves the former as the only option.

This code looks fundamentally broken. Imagine calling this function twice from two concurrent threads; the 2nd thread might reset TZ to its old value between the first thread doing set_var("TZ", "UTC") and tzset().

This is the equivalent of using global variables to communicate with a library, and there is no salvaging such an interface in today's multithreaded world. In fact it's worse than that since for a global variable one could much easier impose a locking discipline...


@cgwalters thanks for trying to organize this thread!

Do you mean this literally (which would be a breaking change) or in the sense of deprecating them and introduce new unsafe replacements? (Also not sure why get_var would be made unsafe.)

I think there also is a potential hybrid solutions here: Deprecate set_env and leave its behavior unchanged. Introduce 2 new functions: a safe one that changes a Rust shadow environment, and an unsafe one that changes the process environment.

Pros:

  • Does not break existing working code.
  • Can be implemented without special platform capabilities (such as "check if there are other threads running") or introducing new requirements for the Rust runtime as a whole (such as informing it when C code spawns a thread).
  • Gives the ecosystem the opportunity to slowly migrate each use of set_env to one of the two new functions.
  • If the two new operations are not sufficient for some code, we will hear about that and can consider adding more options.

Cons:

  • A lot of churn since every use of set_env needs to be changed. But honestly I don't think this can be avoided, those all need to be audited.
  • Having such a shadow environment could cause confusion. But Rust is not alone here, and the shadow-env-mutating variant of set_env could have a suitable name to make this clear (set_env_rust, for example, with set_env_process for the unsafe one).
  • Provides no safe way to mutate the process environment. But I am not sure if we have consensus that this is even possible on all POSIX platforms, so possibly it makes sense to delegate a safe try_set_env_process to a user crate for those platforms where checking the existence of threads is possible.
12 Likes

I was thinking that there might be too much of an installed base of applications that mess around with the environment block itself, but actually the POSIX spec for getenv says "Conforming applications are required not to directly modify the pointers to which environ points." They wouldn't have put that into the spec if they thought there were any such applications.

The same page does sanction changing the pointer itself, but that's quite a bit less troublesome.

If I can get some time in the next week or so I will try to draft a language-independent spec for such replacements.

4 Likes

With regard to a way to determine if a process is single-threaded, I've just published the num_threads crate. It has support for Linux and FreeBSD currently, but other operating systems are of course welcome. The API is simple:

fn num_threads() -> Option<NonZeroUsize>;
fn is_single_threaded() -> Option<bool>;

Hopefully this can lead to improvements in stdlib, as a number of potential solutions involve knowing if the process is single threaded.

1 Like

You should stat /proc/self/task and subtract 2 from the number of links, not call read_dir() on /proc/self/task and count (that takes O(n) time in the number of threads...).

On Windows, it seems you can check whether a process is single-threaded by calling with NtQueryInformationThread with ThreadAmILastThread, while unfortunately it's not clear whether there is any way to find the number of threads without getting information for all threads in the system (!) with NtQuerySystemInformation for SystemProcessInformation.

How do you deal with the race condition here? I presume you're doing something like, "check for a single threaded process, and if so, do Foo" where "Foo" requires single threadedness. But there is presumably time between when you do the check and when you do Foo, such that a new thread may have been spun up and perhaps even started executing code...

As long as you're not calling back to user-controlled code between the Time of Check and then Time of Use you're good.

let num_threads = check();
if num_threads == 1 {
    use()
}

If num_threads == 1 then you are the only code running! It's similar to how if you've got a reference to something and it's got a ref count of 1 then there can't be anyone else around to increment the reference.

7 Likes

Ah nice! Good point.

Don't worry, I had to have it pointed out to me at first as well!

Thanks @bill_myers; I'll check that out. Windows seems far from ideal.

I have now updated my post on options to include your updated list of Pros/Cons - with an important addition that was mentioned previously there.

To reiterate, "shadow environment" approach will actively break anything which is using set_var (and relying on that propagating to child processes) but directly using fork()/execve(). It's hard to know how big this set of software is, but I feel quite confident it is non-empty. One of my projects currently depends on the subprocess crate, which does directly fork(). Glancing at the reverse dependencies...that looks like a large set to audit, and that's not even accounting for unpublished/proprietary software. Suddenly trying to change to a shadow environment seems like too large of an implicit "contract change" between Rust and its systems programming audience to me.

Indeed, that is why my proposal does not involve changing the behavior of set_var. Every current user of set_var has to determine if they can somehow be sure that there is only a single thread, or if they are okay with only mutating the Rust-specific environment. If neither of these two options can be used, I don't think it is possible for this code to work -- it will need structural changes. (This is true for any of the proposed solutions, I think.)

I am confused by your summary -- several proposals (try_setenv, Rust shadow env) list "Does not break existing working code" but have a "Cons" that explains that this would break existing code. Your summary of the Rust shadow env also does not quite match what I said: I suggested to provide both a safe Rust env and an unsafe way to mutate the 'real' process environment. This is important because it keeps code working that does need to mutate the process environment.

Of course, one could also offer try_setenv in combination with an unsafe function mutating the process environment. But that leaves no option for code that has no way to guarantee being single-threaded, but can accept only affecting other Rust code (such as the test suite example your PR found).

So I feel like maybe the post needs to be restructured a bit? "Change POSIX" is one option, and the other one to only do things on the Rust side. Here we have a range of solutions that all do not involve changing the behavior or signature of set_var, just deprecating it -- so all existing code does keep working at least as well/badly as it works today. As replacement for the deprecated method, we can basically provide any subset of the following APIs:

  • try_set_var
  • set_var_rust
  • set_var_unchecked (unsafe)

These proposals differ (among other things) in how much of the existing code can be ported to the non-deprecated APIs. I think a shadow environment will be required for Rust code that does not care about C code seeing its environment changes, but also has no way of ensuring that it is single-threaded -- code like what your PR found. If we only have try_set_var and set_var_unchecked, there is not really anything that code can be ported to.

Is there any proposal that would actually involve changing the existing set_var? I do not think that will fly (marking it unsafe will lead to massive compilation failure, and just changing it to mutate a Rust shadow environment will break existing code as you said), but for some of the proposals you list I cannot tell if they would do this or not.

4 Likes

If a shadow environment is used, then the fork() problem can be solved by calling pthread_atfork() and providing a function that copies the shadow to the real environment after fork.

But there is still the problem of programs calling set_env and expect C libraries calling getenv to read the changes.

2 Likes

Note that on Windows external programs with enough permission can arbitrarily create threads in your process via the CreateRemoteThread family of functions. See especially the "Remarks" section in the docs for caveats.

While this is often quite sketchy, some system and debugging tools use this technique to gather information in a foreign process.

Edit: Another legitimate source of new threads is signal handling in console programs. When a signal handler is supposed to run (like for Ctrl+C presses), Windows creates a thread in the process for the handler as per HandlerRoutine callback function - Windows Console | Microsoft Docs

3 Likes