Synchronized FFI access to POSIX environment variable functions

This proposal looks great to me. Thank you very much for working on it.

FWIW, while I appreciate the "don't make an extra copy" aspect of take_env and release_env, I also feel like it wouldn't be the end of the world if environment access had a little overhead.

The iteration bits, and the approach of making existing functions thread-safe to the extent possible rather than making new ones, looks great to me.

Would you consider adding something to the effect of "if the function X exists, programs can assume that functions Y and Z are thread-safe"? That way it's clear that this API comes as a unit, and if you detect the new functions you can assume the changes to existing functions as well.

We already have at least two real-world examples of this:

  • code that sets RUST_BACKTRACE to affect its own Rust-side backtrace generation
  • the test case discovered in this PR, where not even rustc's own test suite still passes when set_var is changed to panic in case there are multiple threads
  • looking at code I wrote myself, I don't think I ever cared about set_var effects outside Rust -- all the uses of set_var in Miri (1, 2) only care about Rust consumers
5 Likes

I commented in that PR.

That code looks like it could be fixed by:

  • wrapping setup() in std::sync::Once or so
  • changing the set_var to command.env

right?

Another case is https://crates.io/crates/env_logger (which has 36M downloads). While there are APIs exposed that allow changing the default without directly writing to the environment, they aren't always in a convenient spot to use. Similarly, failure and eyre use RUST_BACKTRACE in a similar manner to the stdlib .

There are a good number of pure-Rust APIs that are controlled directly or indirectly via the environment. I don't think they're an edge case at all.

This seems unfortunately platform specific. What is portable code supposed to do here?

1 Like

We'd have to somehow feed the sysroot dir to all the other places in this file that run a Command, either by passing it around explicitly or through a global variable. We'd also have to remember to set MIRI_SYSROOT for any new Command that we add in the future.

And for the other use of set_var, that is configuring the log crate; I don't think we can use any other API here since the actual logging initialization is happening inside rustc_driver. (So this is like the RUST_BACKTRACE cases mentioned before.)

All Rust code using set_var will need to fixed anyway, so it might be best to have whoever works on this fix all affected crates on crates.io themselves, and then one can use the most natural approach, which is designing and creating new std APIs as the need for them becomes apparent while fixing the crates.

Then questions like whether a shadow environment is needed, whether APIs affecting getenv() on Windows are needed, etc. can be naturally answered by whether the need for such an API ever comes up while fixing the affected crates.

This seems like a rather large amount of work push on whoever fixes this.

https://research.owlfolio.org/scratchpad/threadsafe-env-v0.md

Thanks for this. A few comments:

(The difference between this and passing environ to execve is that execve is a kernel primitive that is specified to destroy all existing threads, so modifications to environment variables concurrent with an execve are not possible.)

FWIW, this would be nontrivial to implement. From a quick check, both Linux and xnu copy environment variables out of userspace before stopping all other threads. This is logical, since execve is documented in the POSIX man page as failing with E2BIG if the environment variables are too long, and is also documented in both of those kernels' corresponding man pages (though not the POSIX man page) as failing with EFAULT if the specified envp is an invalid pointer. If execve fails, other threads would be expected to still be alive.

Nontrivial, but not impossible, of course. One approach would be for libc to implement a wrapper for execve that checks whether environ was passed as envp, and adds appropriate locking if so.

Indeed, I think this special case would be a good idea, as there are plenty of programs that pass environ to execve and stand to be broken if it's not implemented. (Such programs really ought to just call execv instead, but many don't, and for posix_spawn there is no similar alternative.)

Passing a string to putenv that is neither a fresh allocation by malloc nor a string returned by take_env or env_next (see below) triggers undefined behavior. Modifying a string that was passed to putenv also triggers undefined behavior. putenv takes ownership of the string it is given; one should not call free nor release_env on it.

This is a lot of UB, and it may be difficult to determine whether a C program is following the rules or not. What about having putenv just make a copy of the input string? As @josh said, the cost of an extra copy is not the end of the world. This approach wouldn't avoid breakage if programs call putenv and then modify the string, but it would make the breakage less dangerous: instead of causing UB, the changes would just be ignored.

const char *take_env(const char *name);
void release_env(const char *val);
void replace_env(const char **newenv);

Bikeshed: I would omit underscores to make the naming convention more consistent with the existing getenv, setenv, clearenv, etc. Also, "take" sounds like it's taking something away from the environment, though I don't have any alternative suggestions.

Is it intended that replace_env is infallible? If the implementation is going to maintain ancillary data for each variable, it needs to allocate, so it seems like replace_env should be able to fail with ENOMEM.

That said, if the goal of replace_env is to provide an easy migration path for code that assigns to environ, having it be fallible would make migration that much more difficult. But then, it's already a nontrivial migration in any case where the new environment variables were (in the existing code) allocated with something other than individual mallocs.

And regardless of how trivial the migration is, I suspect that in practice every C library will have to maintain some type of compatibility with legacy code that still assigns to environ. In that case, I'm not sure there's much reason for replace_env to exist at all…

The C library also maintains a table of ancillary data for each environment variable. [..] The ancillary data includes, perhaps among other things, a reference count which is incremented by take_env and env_next and decremented by release_env.

How would release_env translate from a string pointer to the corresponding ancillary data? A hash table? That seems less than ideal.

1 Like

I've seen several programs that do this (in fact, I remember being told it's the reason to use putenv over setenv, although clearly there are reasons this isn't exclusively true). That said, I don't know that this impacts your larger point.

Right, making the global mutable state explicit seems easiest here. That said, there is a 3rd option which is to re-exec. At least on Linux it's like:

let mut cmd = Command::new("/proc/self/exe");
cmd.env("MIRI_SYSROOT", sysroot);
cmd.exec();

That will ensure the variable is set for the current (new) process as well as inherited by children, and avoids mutating global state. There's a larger cost to re-exec'ing, but if you're running a few child processes anyways, another exec isn't a big deal.

1 Like

Here some more use cases that are annoying to support well without a separate rust-private environment which with safe reads/writes (note that I fully admit that there are probably non-trivial details that need to be worked out WRT how this should work).

Note that I'm assuming that we cannot rely on the proposed thread-safe env API for quite some time — I can't see how we could rely on it given that it does not yet exist, and we want to support a wide range of OS versions (we also can't reasonably have an API that's only safe on some OS versions, especially when in practice that gets determined at runtime).

Sorry in advance for writing so much, if you don't care about the details, I get to the point at the end.

Configuring a process spawned by a library

If call a library function, which does some thinking before invoking a process via Command. This is very handy, especially in code that orchestrates builds (such as build.rs's, the xtask pattern, ...)

That is, at the moment (unless the library explicitly prevents it) I can configure the spawned process's environment by writing to the env prior to calling whatever function of the library that spawns the process.

Of course, this isn't completely ideal[1], but it has a significant benefit: it more-or-less just works without any additional effort required by the library developer — they don't have exposed functionality for configuring the Command they're going to spawn.

Without some safe and sound way for Rust code to write to the env in a way that is visible to Command (and ideally other rust code too), we'll either lose this ability[2] or every library that uses Command in some fashion would essentially have to grow support for configuring it.

Environment configuration from files

The dotenv crate finds and parses .env files (generally shortly after startup), and populates the global environment with the values it reads. These files generally are used to store sensitive data, and are kept out of version control.

It's considered best practice to put this type of data in the environment, to the point where taking it as parameters tends to be strongly discouraged outside of test code. I believe, the logic here is: if the SDK for some API accepts sensitive data as a function parameter, it might encourage naive developers to hardcode this information in the source[3]. However, if it forces them to pass in sensitive info via the environment, then the path of least resistance becomes to do the right thing.

While typical dotenv usage will probably happen very near startup, it could easily be after threads are spawned, such as in #[tokio::main] (and async is common in webapps, the place where this pattern is the most common)... and while dotenv doesn't do this, it's not hard to imagine similar code which uses async IO (or even just spawn_blocking) to read the files.

Perhaps in an ideal world, populating the environment would happen prior to process launch, but this can be hard to orchestrate (especially across multiple prod/dev/test/...), and so things like this are not uncommon.

My point

More generally, I believe the Rust-writer/Rust-reader use-case is way too widespread to ignore, and often will not have a straightforward migration path[4].

The Rust/C use case is of course important too, but... it's probably unfixable without making setenv and friends fully thread-safe (via something like what @zackw proposed). While that'd be great, it seems like we won't be able to rely on it for a long time.

However, if we have two environments (one being Rust-private and safe to read/write, and the other being the C environment, which is unsafe to write to), we solve several problems:

  1. Rust/Rust usage has a clear migration path to a safe API, even if they have no control over thread usage used by the application.

    Without this, I worry a lot of code will either #[allow(deprecated)] or use the unsafe function[5] without correctly verifying usage is single-threaded, which seems very hard to verify.

  2. Writes to the Rust env cannot cause memory safety problems for C code using getenv. Of course, C code won't see the write, which may be surprising (hopefully less so if we name the APIs well).

    While this is a little unfortunate, the upshot is that FFI libraries can continue to be sound, even if the C code it binds to has a getenv somewhere, which is... a bigger concern[6] of mine than it should be.

  3. Writes to the env from Rust that do need to be visible to C (for example, at the start of main and such) can still be done, it just requires calling an unsafe function.

And of course, I'm not proposing changing the existing API beyond deprecation, so existing working code won't break.


  1. Write-global-then-call APIs like this do have issues around multithreading ofc, but currently for Rust/Rust usage this would be a correctness issue, and not a soundness hole. ↩︎

  2. Concretely, I'd expect loss of functionality like this to result in code using the unsafe or deprecated API even if it is not actually sound to do so. ↩︎

  3. Which is undesirable for several reasons, such as it being in version control, developers having access to production credentials, etc. ↩︎

  4. It will of course always be possible to come up with alternative architectures that avoid writing to the environment, but this will frequently not be practical without major rewrites or forking dependencies. ↩︎

  5. Given that we made this mistake in the stdlib, it does not sound far-fetched to imagine code that justifies the unsafe by adding a lock around environment access. ↩︎

  6. To the point where not wanting to deal with people filing issues (or advisories) related to this has stopped me from open sourcing some of the FFI crates I wrote over the past year. ↩︎

1 Like

Though storing secrets in environment variables is a somewhat common practice, I personally wouldn't consider it a "best practice". And I'm not alone in that position.

Placing secrets in environment variables unnecessarily exposes them to spawned subprocesses, and they can be mistakenly logged by things like crash/exception reporters.

Better options include passing secrets via pipes (e.g. via standard input) or files (e.g. tmpfs mount or FUSE).

2 Likes