Synchronized FFI access to POSIX environment variable functions

RalfJung · October 26, 2021, 2:51pm

Not on Windows. There needs to be some cross-platform replacement. (I know Rust's env functions are actually fine on Windows -- for once, Windows got something right that POSIX screwed up -- but we can't expect people to migrate from the portable function in std to something platform-specific.)

Also, wouldn't libc::setenv be a step backwards in terms of soundness? We would lose the lock that protects at least pure Rust programs from unsoundness today.

That's fair, but there was a portable out-of-std replacement (albeit one that has become controversial since then), and I would assume that function is used much less than set_var.

Does the libs team have something like the lang team "initiatives"? I think it is time to get into a discussion with the libs team to see what they think about this problem and which solutions could be acceptable to them. Cc @josh

bascule · October 26, 2021, 3:08pm

That was my thought opening this thread, but it seems that lock can't be relied on for soundness anyway, at least in multithreaded programs.

The only sound usage of libc::setenv is in a single-threaded program.
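
A minimal sketch of that single-threaded pattern, assuming a Unix target. The C prototypes for setenv and getenv are declared by hand instead of using the libc crate, and setenv_single_threaded and read_env are hypothetical helper names. The environment is written before any thread is spawned and treated as read-only afterwards.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};
use std::thread;

// POSIX prototypes from <stdlib.h>, declared by hand (assumption: Unix libc).
unsafe extern "C" {
    fn setenv(name: *const c_char, value: *const c_char, overwrite: c_int) -> c_int;
    fn getenv(name: *const c_char) -> *mut c_char;
}

/// Sets `name=value` in the C environment, overwriting any old value.
///
/// # Safety
/// No other thread may be running: glibc marks setenv `MT-Unsafe const:env`.
#[allow(unused_unsafe)]
unsafe fn setenv_single_threaded(name: &str, value: &str) {
    let name = CString::new(name).expect("name contains a NUL byte");
    let value = CString::new(value).expect("value contains a NUL byte");
    let rc = unsafe { setenv(name.as_ptr(), value.as_ptr(), 1) };
    assert_eq!(rc, 0, "setenv failed");
}

/// Copies a variable out of the C environment. Only sound while nothing
/// mutates the environment concurrently.
fn read_env(name: &str) -> Option<String> {
    let name = CString::new(name).ok()?;
    let ptr = unsafe { getenv(name.as_ptr()) };
    if ptr.is_null() {
        None
    } else {
        Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
    }
}

fn main() {
    // Still single-threaded here: the only window in which setenv is allowed.
    unsafe { setenv_single_threaded("DEMO_SETENV", "1") };
    // From now on the environment is only read, never written.
    let reader = thread::spawn(|| read_env("DEMO_SETENV"));
    println!("{:?}", reader.join().unwrap());
}
```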

RalfJung · October 26, 2021, 3:30pm

Well, the lock is good enough if no C code is being called (or only C code that does not interact with the environment at all). I am not saying that's great, but it's something.

So, there are sound multi-threaded usages of env::set_var.
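
For illustration, a sketch of a pure-Rust program that writes and reads the environment from several threads. It assumes, as above, that no C code in the process touches the environment, and it relies on std's internal environment lock, which is an implementation detail rather than a documented guarantee. race_readers_against_writer is a hypothetical name.

```rust
use std::env;
use std::thread;

const KEY: &str = "DEMO_PURE_RUST";

// set_var is an `unsafe fn` since the 2024 edition; the unsafe block keeps
// this compiling there, and the allow silences the lint on older editions.
#[allow(unused_unsafe)]
fn set_demo(value: &str) {
    unsafe { env::set_var(KEY, value) };
}

/// Starts reader threads, then updates the variable while they may still run.
fn race_readers_against_writer() -> Vec<Option<String>> {
    set_demo("start");
    let readers: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| env::var(KEY).ok()))
        .collect();
    set_demo("updated");
    // std takes its own lock around every env access and copies the value out,
    // so each reader sees one complete value, never a torn or freed string.
    readers.into_iter().map(|r| r.join().unwrap()).collect()
}

fn main() {
    for seen in race_readers_against_writer() {
        println!("{:?}", seen);
    }
}
```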

bascule · October 26, 2021, 3:46pm

The extent to which it's sound today seems to be an implementation detail and definitely an off-label usage.

glibc describes it as:

Function: int setenv (const char *name, const char *value, int replace)
Preliminary: | MT-Unsafe const:env | AS-Unsafe heap lock | AC-Unsafe corrupt lock mem

MT-Unsafe means the function is explicitly not safe to call in the presence of other threads.

zackw · October 27, 2021, 12:49pm

This thread tempts me to write a patch for glibc that makes setenv and putenv crash the program if there are multiple threads, unconditionally -- no opt-in nor opt-out -- and then removes the existing lock, because clearly its existence is giving people the wrong impression.

RalfJung · October 27, 2021, 1:51pm

Rust does not rely on that lock, since it does its own locking.

So, I understand that the spec for an MT-Unsafe operation does not guarantee anything in the presence of concurrency (EDIT: actually see my next post, the spec probably does cover what Rust does), but almost all conceivable implementations will be fine if locking is done consistently. The issue is that once you start mixing code written against different language runtimes, it becomes essentially impossible to do locking consistently.

RalfJung · October 27, 2021, 2:09pm

I think the glibc docs even explicitly allow the kind of locking Rust does:

Functions marked with const as an MT-Safety issue non-atomically modify internal objects that are better regarded as constant, because a substantial portion of the GNU C Library accesses them without synchronization. Unlike race, that causes both readers and writers of internal objects to be regarded as MT-Unsafe and AS-Unsafe, this mark is applied to writers only. Writers remain equally MT- and AS-Unsafe to call, but the then-mandatory constness of objects they modify enables readers to be regarded as MT-Safe and AS-Safe (as long as no other reasons for them to be unsafe remain), since the lack of synchronization is not a problem when the objects are effectively constant.

The identifier that follows the const mark will appear by itself as a safety note in readers. Programs that wish to work around this safety issue, so as to call writers, may use a non-recursive rwlock associated with the identifier, and guard all calls to functions marked with const followed by the identifier with a write lock, and all calls to functions marked with the identifier by itself with a read lock. The non-recursive locking removes the MT-Safety problem, but it trades one AS-Safety problem for another, so use in asynchronous signals remains undefined.

So, MT-Unsafe const:env means that if all calls to functions with a const:env/env designation are guarded by a non-recursive rwlock (write lock for const:env, read lock for env), that removes the MT-Safety problem.
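
The locking discipline the glibc manual describes can be sketched as follows, assuming a Unix target with hand-declared prototypes; ENV_RWLOCK, locked_setenv and locked_getenv are hypothetical names. The writer (setenv, marked const:env) takes the write lock and the reader (getenv, marked env) takes the read lock. This lock is distinct from the one std::env uses internally, and any code that calls getenv without taking it defeats the scheme.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};
use std::sync::RwLock;
use std::thread;

unsafe extern "C" {
    fn setenv(name: *const c_char, value: *const c_char, overwrite: c_int) -> c_int;
    fn getenv(name: *const c_char) -> *mut c_char;
}

// Hypothetical program-wide lock for glibc's `env` safety identifier. It only
// helps if every environment reader and writer in the process takes it.
static ENV_RWLOCK: RwLock<()> = RwLock::new(());

/// Writer (`MT-Unsafe const:env`): called with the write lock held.
fn locked_setenv(name: &str, value: &str) -> bool {
    let (name, value) = match (CString::new(name), CString::new(value)) {
        (Ok(n), Ok(v)) => (n, v),
        _ => return false, // interior NUL byte
    };
    let _guard = ENV_RWLOCK.write().unwrap();
    unsafe { setenv(name.as_ptr(), value.as_ptr(), 1) == 0 }
}

/// Reader (marked `env`): called with the read lock held. The value is copied
/// out before the guard drops, since a later writer may replace or free it.
fn locked_getenv(name: &str) -> Option<String> {
    let name = CString::new(name).ok()?;
    let _guard = ENV_RWLOCK.read().unwrap();
    let ptr = unsafe { getenv(name.as_ptr()) };
    if ptr.is_null() {
        None
    } else {
        Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
    }
}

fn main() {
    let writer = thread::spawn(|| {
        for i in 0..100 {
            locked_setenv("DEMO_RWLOCK", &i.to_string());
        }
    });
    let reader = thread::spawn(|| (0..100).filter_map(|_| locked_getenv("DEMO_RWLOCK")).count());
    writer.join().unwrap();
    println!("reader saw the variable {} times", reader.join().unwrap());
}
```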

It seems to me like what Rust does is sound according to the glibc spec -- but it is extremely fragile and non-compositional; the moment anything else in the program calls an env function without getting the Rust-specific lock, we are in UB territory. There also is no compositional fix that would still let us mutate the global environment.

So this becomes a tradeoff between:

1. Letting pure Rust programs (i.e., not linking in any non-Rust code that might access the environment) mutate the environment.
2. Making Rust safety compositional so that it still applies even when linking in non-Rust code that accesses the environment.

We cannot have both. If we were pre-1.0 I personally would strongly argue for 2; non-compositional safety is a disaster. But is there a good way we can get there now?

Note that if we were to make Rust's set_var only mutate a Rust-specific environment (akin to Java), then programs in the first category would not change behavior, and programs in the second category would be safe (but programmers might be surprised that non-Rust code does not see the environment changes). I think the only programs negatively affected by this (in the strict sense that programs with UB cannot become any worse, even though they might of course happen to work in practice) are programs that use set_var from Rust before spawning any threads and later have non-Rust code access the environment. So maybe there is a way forward along these lines?
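
A rough sketch of what such a Rust-only environment could look like, assuming a snapshot of the OS environment taken on first use; rust_set_var, rust_var_os and command_with_overlay are hypothetical names, not std APIs. Changes never reach the C-level environment, so non-Rust code in the same process does not see them, but child processes spawned through the helper still inherit them.

```rust
use std::collections::HashMap;
use std::ffi::{OsStr, OsString};
use std::process::Command;
use std::sync::{Mutex, OnceLock};

/// Hypothetical Rust-only environment, snapshotted from the real one on first
/// use. Writes go only to this map; the C-level environment is never touched.
fn overlay() -> &'static Mutex<HashMap<OsString, OsString>> {
    static ENV: OnceLock<Mutex<HashMap<OsString, OsString>>> = OnceLock::new();
    ENV.get_or_init(|| Mutex::new(std::env::vars_os().collect()))
}

fn rust_set_var(key: impl AsRef<OsStr>, value: impl AsRef<OsStr>) {
    let mut env = overlay().lock().unwrap();
    env.insert(key.as_ref().to_owned(), value.as_ref().to_owned());
}

fn rust_var_os(key: impl AsRef<OsStr>) -> Option<OsString> {
    overlay().lock().unwrap().get(key.as_ref()).cloned()
}

/// Builds a Command whose environment is exactly the overlay, so child
/// processes still inherit Rust-side changes.
fn command_with_overlay(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear();
    for (k, v) in overlay().lock().unwrap().iter() {
        cmd.env(k, v);
    }
    cmd
}

fn main() {
    rust_set_var("DEMO_OVERLAY", "visible to Rust and children");
    println!("{:?}", rust_var_os("DEMO_OVERLAY"));
    // The process environment seen by non-Rust code was not modified.
    println!("{:?}", std::env::var_os("DEMO_OVERLAY"));
}
```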

mjbshaw · October 27, 2021, 4:01pm

I don't think this can be relied upon unless you're building no-std binaries, which is kinda moot since env isn't available in no-std. std links to non-Rust system bindings, and the underlying implementations could access the environment.

zackw · October 27, 2021, 5:25pm

"Functions" is inaccurate here -- there's also environ and the third argument to C main. But I think you are correct that if the program-as-a-whole can guarantee that every access to the environment state is consistently guarded by an application-provided reader-writer lock, then it becomes safe to mutate the environment in a multithreaded program.

The problem is "only" the zillions of places where getenv() is called, without any locking at all, from deep within library routines that don't seem like they need to do any such thing, because getenv("MY_LIBRARY_DEBUG") is such a convenient way to plumb in some debugging hooks.

Because of those zillions of places, I think any attempt to set up application locking around environment access is doomed to failure and the least-bad available solution is, in fact, to forbid environment modification outright whenever multiple threads are active. I get the impression this is not a popular solution, but I don't really understand why. What do you see as motivating a need for environment modification after multiple threads are active?

I think that is a class of programs we really shouldn't break, along with free mutation of the environment visible to non-Rust code by programs that will never have more than one thread.

RalfJung · October 27, 2021, 5:54pm

If we had a time machine, my preferred solution would be to implement the Java approach pre-1.0, plus possibly exposing the current set_var without its lock as an unsafe function for cases where someone needs to do some setup and can be sure no threads have been spawned yet.

But we are not pre-1.0, so any solution needs some kind of credible migration strategy.

- If we change set_var to panic / abort / no-op when more than one thread exists, then Rust-only programs that use set_var with concurrency break. Granted, they possibly have subtle bugs already anyway (I recall using set_var in a #[test] function, and it took me a while to realize that multiple concurrently running tests were interfering with each other), but those other threads might be entirely harmless, so this will probably break some working code. Also, is there even a good way to implement this check for whether threads exist on all POSIX platforms Rust supports? Maybe someone should implement this, make it panic, and do a crater run, so we can see how widespread set_var in concurrent Rust programs is.
- If we change set_var to only mutate the environment visible to Rust, then any program that relies on these changes being visible to non-Rust environment accesses suddenly breaks.
- Or we could leave set_var unchanged, add some other environment-changing API(s), and deprecate set_var in favor of them. Which APIs would we want/need?
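
On Linux, one way to approximate the thread check is to count the entries of /proc/self/task. This is a Linux-specific assumption (procfs mounted) and does not answer the question for other POSIX platforms; set_var_if_single_threaded is a hypothetical API that returns an error where a std version might panic. If the count is 1, the calling thread is the only one that could spawn another, so no thread can appear between the check and the write.

```rust
use std::env;
use std::fs;
use std::io;

/// Number of threads in this process, from procfs (Linux-only assumption).
fn thread_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/task")?.count())
}

/// Hypothetical guarded setter: refuses to touch the environment unless the
/// calling thread is the only thread in the process.
#[allow(unused_unsafe)]
fn set_var_if_single_threaded(key: &str, value: &str) -> Result<(), String> {
    match thread_count() {
        Ok(1) => {
            // No other thread exists to race with this write.
            unsafe { env::set_var(key, value) };
            Ok(())
        }
        Ok(n) => Err(format!("refusing to modify the environment: {} threads running", n)),
        Err(e) => Err(format!("cannot determine thread count: {}", e)),
    }
}

fn main() {
    // Before any thread is spawned this is allowed; afterwards it is rejected.
    println!("{:?}", set_var_if_single_threaded("DEMO_GUARD", "1"));
    let (tx, rx) = std::sync::mpsc::channel::<()>();
    let helper = std::thread::spawn(move || {
        let _ = rx.recv();
    });
    println!("{:?}", set_var_if_single_threaded("DEMO_GUARD", "2"));
    drop(tx);
    helper.join().unwrap();
}
```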

kornel · October 27, 2021, 6:06pm

Could getenv ever be safe in libc? Theoretically it could be done by leaking memory in setenv(), so that getenv() returns the equivalent of a &'static str. If setenv used a string-interning approach, I think the cost of the leak could be reasonable.
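
A Rust model of that idea, purely as an illustration; leaky_setenv, leaky_getenv and interned are hypothetical names. Values are leaked (and interned, so a repeated value is leaked only once), so a returned &'static str stays valid after later writes. The variable table itself still needs a Mutex: leaking fixes the lifetime of returned strings, not concurrent mutation of the table.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Mutex, OnceLock};

/// Returns a leaked copy of `s`, reusing an earlier copy when one exists.
fn interned(s: &str) -> &'static str {
    static POOL: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    let mut pool = POOL.get_or_init(Default::default).lock().unwrap();
    if let Some(&existing) = pool.get(s) {
        return existing;
    }
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    pool.insert(leaked);
    leaked
}

// Variable table: name -> interned value. Mutating it still needs the lock.
fn table() -> &'static Mutex<HashMap<String, &'static str>> {
    static TABLE: OnceLock<Mutex<HashMap<String, &'static str>>> = OnceLock::new();
    TABLE.get_or_init(Default::default)
}

fn leaky_setenv(name: &str, value: &str) {
    let value = interned(value);
    table().lock().unwrap().insert(name.to_owned(), value);
}

fn leaky_getenv(name: &str) -> Option<&'static str> {
    table().lock().unwrap().get(name).copied()
}

fn main() {
    leaky_setenv("MODE", "fast");
    let old = leaky_getenv("MODE").unwrap();
    leaky_setenv("MODE", "safe");
    // `old` is still a valid string even though the variable was overwritten.
    println!("{} -> {:?}", old, leaky_getenv("MODE"));
}
```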

RalfJung · October 27, 2021, 6:26pm

Not quite sure what this has to do with the concurrency problems we are discussing here? Of course getenv users have to be aware of the lifetime of the data, but if there is no concurrent mutation then that is not very hard.

Nemo157 · October 27, 2021, 10:13pm

It's explicitly noted in the putenv docs that you may modify the string you put into the environment to change the variable without calling setenv/putenv again (and that some older versions of glibc used a leaked copy instead, which violates SUSv2).
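
That putenv behaviour can be observed directly in a small sketch, assuming a Unix libc with POSIX putenv semantics (the caller's buffer itself becomes the environment entry); demo_putenv_aliasing is a hypothetical name and the prototypes are declared by hand. The buffer is leaked on purpose because the environment keeps pointing into it.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

unsafe extern "C" {
    fn putenv(string: *mut c_char) -> c_int;
    fn getenv(name: *const c_char) -> *mut c_char;
}

/// Registers a buffer with putenv, edits the buffer in place, then reads the
/// variable back through getenv. Must not run concurrently with other env access.
fn demo_putenv_aliasing() -> Option<String> {
    // into_raw leaks the CString: after putenv the environment points into it,
    // so freeing it would leave a dangling entry.
    let entry: *mut c_char = CString::new("DEMO_PUTENV=aaa").unwrap().into_raw();
    let name = CString::new("DEMO_PUTENV").unwrap();
    unsafe {
        if putenv(entry) != 0 {
            return None;
        }
        // "DEMO_PUTENV=" is 12 bytes, so offset 12 is the first byte of the value.
        *entry.add(12) = b'b' as c_char;
        let value = getenv(name.as_ptr());
        if value.is_null() {
            None
        } else {
            Some(CStr::from_ptr(value).to_string_lossy().into_owned())
        }
    }
}

fn main() {
    // With POSIX putenv semantics the edit is visible without another putenv call.
    println!("{:?}", demo_putenv_aliasing());
}
```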

comex · October 27, 2021, 11:48pm

Well… sort of right. From what I can tell…

A Windows program using MSVC's CRT has three different copies of the environment, all stored in userland. There's the environment stored in the PEB and accessible with GetEnvironmentVariable and SetEnvironmentVariable, which is what Rust uses. There's the environment tracked by the CRT and accessible with _environ, getenv, and putenv, which presumably exists for C standard compatibility. And there's the wide-char environment tracked by the CRT and accessible with _wenviron, _wgetenv, and _wputenv, which presumably exists as a result of Microsoft's old attempt to Unicode all the things.

The CRT's two environments are both copied from the PEB environment at startup. Calling putenv or _wputenv will update all three. But any C program that tries to mutate _environ directly, or mutate the return value of getenv, will get them out of sync.

More importantly, calling SetEnvironmentVariable will not update either of the CRT's environments. So if you call set_var from Rust, and then some C library calls getenv, it won't see the modification.

…Under the circumstances, that might be for the best.

comex · October 28, 2021, 12:05am

Boo. I forgot about that. Then perhaps Drepper was right.

That's one way to resolve the issue. But my impression is that such a thing would probably result in an enormous amount of practical breakage. Do you not think so?

hyeonu · October 28, 2021, 5:22am

Not sure if this makes sense, but what if we hooked the getenv/setenv functions and replaced them with wrappers that take the Rust stdlib's lock, at the first call of std::env::set_var()?

mathstuf · October 28, 2021, 11:33am

That is far too late. Rust could have been loaded long after some C code had run (and potentially stored the function pointers on its own).

Additionally, is this even possible other than through LD_PRELOAD-like mechanisms?

josh · October 28, 2021, 12:52pm

That seems like a majority use case; the primary reason to set things in the environment is to inherit them in other code.

RalfJung · October 28, 2021, 3:25pm

Including other code in the same process? (Inheriting to new processes created via Command is unaffected.) This is basically a static mut, and the other code is making no attempt at synchronizing its accesses. So quite clearly it cannot reasonably support changing this global state at runtime.

But I take it this means you consider changing set_var to only affect Rust code (and new processes spawned via Rust code) not to be acceptable. What about the other alternatives?

In an ideal world where we had this discussion pre-1.0, which API(s) do you think Rust should provide? Many people in this thread agree that the one we do have is a mistake, but I am curious what you and other members of the libs team think.

zackw · October 28, 2021, 3:43pm

Yes. The situation I've seen most commonly is when there's a library that takes configuration from environment variables when an initialization function is called. The application will set all the environment variables in main() and then call the initialization function (possibly much later, in worker threads).

In my experience, this is more common than using set_var or equivalent to set environment variables for subprocesses, because environment variables for subprocesses can more easily be handled with the third argument to execve.
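
In Rust, the counterpart of passing variables through execve's third argument is std::process::Command::env, which sets a variable only in the child's environment. A small sketch, assuming a Unix system with sh on the PATH; child_sees_var is a hypothetical name.

```rust
use std::io;
use std::process::Command;

/// Runs a child shell with DEMO_CHILD_ONLY set in its environment only and
/// returns what the child printed for that variable.
fn child_sees_var() -> io::Result<String> {
    let out = Command::new("sh")
        .arg("-c")
        .arg("printf %s \"$DEMO_CHILD_ONLY\"")
        .env("DEMO_CHILD_ONLY", "hello")
        .output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() -> io::Result<()> {
    println!("child saw: {:?}", child_sees_var()?);
    // The parent's own environment was never modified.
    println!("parent has: {:?}", std::env::var_os("DEMO_CHILD_ONLY"));
    Ok(())
}
```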

I would go so far as to say that the Java treatment of the environment is a defect in the Java specification.
