Synchronized FFI access to POSIX environment variable functions

...an alternative point of view may be that this thread is about convincing libc maintainers to introduce a thread-safe API for both reading and writing the environment.

Once they do - in 10 years from now - Python, Rust, C, ... will be able to happily co-exist in the same process. Even if due to circumstances out of your control by the time you get a chance to set environment some other threads are already running.

I also got an impression that to a lesser extent this thread was about providing some stop-gap measures that will allow Rust programmers/programs to desperately hold onto safety by fingernails until that happens.

P.S. honestly in my view libc/posix maintainers deserve little credit for not having introduced such API-s tens of years ago..

1 Like

True, in some glorious future we will hopefully have a safely mutable global environment. There's clearly a need for that. Sadly, that will take a while.

But in the present time, it seems pretty clear that the environment is effectively read-only (with some exceptions). As such, it makes sense to treat it like that in Rust: reading is safe, and writing is unsafe and exists for those exceptions. This is different from static mut. (It is somewhat comparable to a library with an internal static mut that exposes a safe getter and an unsafe setter and documents when the setter is safe to use -- that would be a totally reasonable abstraction to have.)

7 Likes

This is not true. "ftrylockfile" from glibc on Linux (I don't know whether it is standardized somewhere) is exactly such lock. As well as I understand it is exactly lock internally held by "printf"

It is a POSIX API actually.

That's really not the same thing. The (POSIX) API lets you lock individual FILEs, but the lock object embedded in each FILE is not exposed โ€” you don't have any way of knowing whether it's a pthread_mutex_t or a pthread_rwlock_t or a C2011 mtx_t or what, and you don't get as much manual control over the lock as you would if you did. This means the library has more flexibility in how it implements the locking.

I came up with a potential solution to this I didn't see proposed yet (many comments?) and did some initial work to determine its feasibility. The idea is to just fork the affected library (localtime but could be repeated for others) and patch fetching of env vars to use Rust code. This is similar in effect to RIIR except it's much easier.

Here's my library: GitHub - Kixunil/rl_localtime: Rust-locked localtime - a sound localtime implementation

Please take a look if you're interested in fixing chrono. Note that I've also posted this to the chrono issue.

1 Like

I haven't looked into this much, but keep in mind that the core issue is that std::env::set_var is simply unsound. Theoretically I, as the author of the time crate, shouldn't have to worry about localtime_r leading to a data race because set_var shouldn't ever be called when multiple threads are running.

So, this has largely fallen off the thread, and while people were broadly in agreement that something needs to be done here (and we even had some agreement on some of the steps that need to be taken, although certainly not all), it's not 100% what next steps here would be.

I would... quite prefer this not be completely forgotten though.

I don't think people have forgotten about this, it just needs steps that aren't happening in this thread.

@zackw is driving the long-term plan of making safe environment functions. That'll take a long time before we can use it, but it seems like the only path towards a future safe interface.

I've seen a couple of different efforts towards making aspects of Rust libraries easily configurable by means other than environment variables, to reduce the need for set_var in Rust code.

And there's a proposal in the works to make unsafe environment functions and possibly deprecate the safe ones.

2 Likes

Just want to confirm that I haven't forgotten about the long-term plan. My day job has been keeping me very busy, but when I have something to report I will post it here (probably in a new thread).

5 Likes

On linux that would be fairly cheap via unshare with CLONE_VM which either has no effect (single-threaded) or returns an error (multi-threaded).

       In addition, CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM can be
       specified in flags if the caller is single threaded (i.e., it is
       not sharing its address space with another process or thread).
       In this case, these flags have no effect.  (Note also that
       specifying CLONE_THREAD automatically implies CLONE_VM, and
       specifying CLONE_VM automatically implies CLONE_SIGHAND.)  If the
       process is multithreaded, then the use of these flags results in
       an error.

Exciting! I assume it will be posted here whenever it's online?

I've had a PR open for a while introducing the new unsafe functions. It's an "either or" situation, though. We either want the new methods or mark the existing ones as "deprecated safe", which to my knowledge is still WIP.

1 Like

I don't believe so. If you write pure Rust without unsafe it can't cause UB. But it being safe makes writing sound FFI code impractical. FFI code being impractical to write seems to be a good reason to make it unsafe though.

Rust's own locking probably rules out UB, on Unix, as long as there's no FFI, but I'm not 100% sure; anyway, the risk is if another thread is in the middle of std::env::var() for the same variable, has exited the internal call to libc::getenv and is in the process of copying the value into a String, there could be a data race between that thread and set_var โ†’ libc::setenv โ†’ free on the memory block holding the old contents of the variable. Given the way std is set up, I think this would be a plain old bug in the library as long as there's no FFI.

Unfortunately this has been demonstrated to not be the case. Pure Rust code can segfault. Does getaddrinfo need to take the ENV lock? ยท Issue #27970 ยท rust-lang/rust ยท GitHub

Even if this weren't the case, std::env::set_var is still unsound, as getting an environment variable is thread-safe while setting it is not. By not using the same lock everywhere (which is currently impossible), there is no way to synchronize gets and sets when using FFI. This is why I had to resort to the num_threads crate โ€” if multiple threads exist, there's no way to guarantee anything.

That's an internal library bug then that's fixable by changing non-public code in the library. Contrast this with things like unsoundness in Pin which couldn't be resolved without breaking change.

Conceivably, yes. But it's still unsound because Rust provides no way to use the same lock when performing FFI. Said another way, there is no way for me to call getenv or an equivalent method in FFI without causing a race condition. This is despite getenv explicitly being thread-safe. As such the Rust method setting the environment is unsound, as that is explicitly not thread safe.

2 Likes

That's why I still think we should have env::{rust_var,set_rust_var,remove_rust_var} which modifies a Rust-only 'shadow' environment (also honored by Command). This will allow easy porting of existing sound Rust code that does not want or need to interact with the underlying system environment of the current process (but might still want to properly set up the system environment of new processes it spawns).

In an ideal world we wouldn't have such a shadow environment. If I we could un-do having these safe methods in Rust 1.0 we should do it. But we cannot, and given the current situation at hand, I think we do need to give programmers something to migrate to, or this will be a disaster transition.

I feel like we probably need an RFC to hash out and decide about the plan here. Just local discussions in a PR won't do, precisely because there are multiple alternatives.

3 Likes

I really don't think we should offer a "shadow environment"; that seems both like a trap for users (often not what they're actually looking for) and a suboptimal interface for things that are exclusively Rust. Working with Command but not the current process will work in some cases while surprising people in others, and it would also add complexity (what happens if the shadow environment and the real environment conflict).

As soon as #[deprecated_safe] is available, I think we should use that on the existing functions, and continue having that be our interface to the environment.

1 Like