That's multithreaded thinking!
But if it returns 1 there are no other threads that could be spawning threads.
That's multithreaded thinking!
But if it returns 1 there are no other threads that could be spawning threads.
It's an argument against doing anything that means people's existing code (which may well be correct or at least unexpoitable, even if it calls setenv
when multiple threads exist) breaks when they upgrade Rust and they have to drop everything to fix it. That would include both making env::set
unsafe
, and making panic at runtime if multiple threads exist.
But it sounds good to me to add try_set
as the new recommended API, and deprecate env::set
in favor of try_set
plus a new unsafe
version that replicates the old behavior (maybe call it unchecked_set
).
In this case, it would be best not to go for the "soft deprecation" suggested by @skippy, since that would encourage users to just tack on an unsafe
block to continue using the existing method, while the recommended approach would be to switch to try_set
.
On Darwin, there's pthread_is_threaded_np()
.
I'm not a fan of try_set
. It feels more for making it not-Rust's-problem, rather than helping developers set env vars.
It's going to be inherently fragile, and offer no useful alternative when it fails. It can be broken by a shared global state that I don't quite control (e.g. a dynamic library with a global static constructor could spawn thread even before fn main
). It can suddenly start failing hard, and when I have functionality that depends on env vars being set, it's program-breaking. It errs on the side of not working, and fails even when putenv
could be safe.
I'm also skeptical of making set_var
simply unsafe
, because the most obvious use of it is to keep it exactly as is, wrap it in unsafe {}
, and hope for the best. It depends on a program-global assumption which is easy to optimistically make when the code is written (e.g. I'm not using libc directly, I'm not using threads yet), and nearly impossible to enforce as time goes on (anything anywhere can start doing these things).
I think a workable approach would be to:
unsafe env::set_libc_var
that is explicitly for communicating with non-Rust, with all the caveats.env::set_var
modify only Rust's environment.env::set_var
that picks either env::set_var_rust_only
or env::set_var_libc_crashy
depending on a flag/edition. It was possible to fudge method lookup for arrays' into_iter
, so maybe a similar hack could be made for set_env
too.You're right, I don't actually think we need to make get
unsafe.
A few cases of many where this would fail:
#[async_runtime_of_choice] fn main() { ... }
, where the runtime starts threads, but those threads don't do anything until they're given work to do.I personally would favor a two-step approach:
setenv
/unsetenv
/getenv
(and deprecate environ
in favor of a safe iteration function). That'd be a long transition, but a worthwhile one.As far as I can tell, the glibc implementation never frees environment strings, so would actually be fully thread safe if getenv were to take the same lock that setenv takes (although the string contents may be modified due to putenv or setenv changing a value, but that's not a problem for Rust since get_env copies the string anyway, although of course it must do so with memcpy and not Rust slice copies) and assuming no program calls putenv and then frees the string after calling unsetenv or directly mucks with environ.
Perhaps the solution could be to runtime patch (or override the symbol, but not sure if that's guaranteed to work) glibc's environment functions to take locks.
Since glibc leaks strings, it's probably safe to assume that no POSIX program fails catastrophically on a leaking implementation, and so any other libc can be patched to use glibc's algorithm.
There are a number of “systems programming” things which are safe (or at least better-behaved) iff there are no other threads, such as setting the environment as discussed here, fork()
, some uses of signals, use of !Sync
mutable globals, and probably lots of other things I'm not thinking of.
Therefore, I'm wondering if it would be useful to add to the Rust standard library a mechanism which allows safely invoking some of these things given the unsafe
creation of an explicit promise that the process is currently single-threaded, that would be passed as a parameter to the future version of set_var
and such. The advantage of this over marking individual calls unsafe
is that “not multithreaded” is a global property of the process related to the big picture of the program's control flow, so associating the “here is a property you can't necessarily statically guarantee that I promise is true” with some single high-level scope is more accurate and (slightly) more auditable than sprinkling it about every call site — and that it documents what kind of unsafety is present.
Also, such a mechanism could be used fully safely on targets which inherently have no thread support. (This is not the same as “why not make this construct/function safe if we're on a single-threaded target”, which I've heard rejected before, because it would affect only the site where the promise is created, which is supposed to be rare, not every “leaf” that uses it.)
Disadvantage: this one-size-fits-all mechanism would not address the case mentioned above of threads which exist but aren't doing anything yet (which would be safe for environment manipulation but not for fork()
, for example).
I am opposed to any approach that involves the introduction of a Rust-private version of the environment.
Nope, because, to repeat myself, environ
exists. (Later you say "assuming no program directly mucks with environ" but that is not a legitimate assumption.)
Also, even with Rust's locking, there is a race between the copy and a non-Rust thread mutating the value.
I can forsee something like this:
{
// RAII struct that puts the token back on `Drop`.
let st_token = std::thread::is_single_threaded_token().scoped();
std::env::set_env(var, value, &st_token);
// some_api_that_wants_st_token(); // Will panic just like nested `Rc` mutable locking would
}
some_api_that_wants_st_token(); // Ok here
Anything which either starts threads or notices threads could then do:
// We're breaking the single-threaded assumption, so take it and drop it on the floor.
let _ = std::thread::is_single_threaded_token();
std::thread::spawn(…);
IMO this sort of API would be better suited to being implemented in a crate which can pull in some sort of cross-platform crate dependency (e.g. palaver
) for querying the number of running threads (or Just Work on Windows).
It could use an underlying unsafe
API provided by std
.
OK I created github.com/cgwalters/try-setenv-rs just to have some actual code here. I could publish to crates.io if someone cares, but I think given the fundamental nature of this it kind of needs to be in the stdlib to be effective.
(Although, it would be interesting to gather some data on actually how many library crates call setenv outside of their own test code etc.)
This is a concern. I did validate that wrapping the simple example code in the above repo with #[tokio::main]
with the threaded tokio reactor panics. IOW, the tokio runtime does eagerly create threads. That said, the macro-main is just a nice to have - one can still create the runtime "manually". Also, perhaps it wouldn't be too hard for these runtimes to become "lazy" and only spin up threads when a future is spawned?
Like the original GNOME/dconf example? Isn't that the motivating example for this? The rust code calling try_setenv()
should deterministically panic. The fix is just to move the try_setenv()
calls before the C library init.
OK but let's try to make this less abstract. Here's a real world example: procspawn::init() which calls remove_var()
. (Hmm, need to support that too)
If we had try_setenv()
, the potential conflict between procspawn and things like #[tokio::main]
(or in general creating threads before calling it) would be more clear.
Did you have another case in mind where it would somehow be difficult to refactor to calling try_setenv()
early before creating threads?
I suggest using more explicit iteration than just count() > 1
, both for error handling and to avoid full iteration when not needed. Something like:
if let Ok(mut r) = std::fs::read_dir("/proc/self/task") {
// See if there are at least two entries (threads)
if let Some(Ok(_)) = r.next() {
return matches!(r.next(), Some(Ok(_)));
}
}
If you do see any Err
, it would probably be safer to assume it is threaded, and return true
.
It might also make sense to amortize this with a function for batch updates, setting multiple vars.
What if it is a C++ library spawning a thread inside a global constructor? If the C++ library guarantees that it won't access any env vars from this thread it would be UB free, but try_setenv
will still fail no matter how early inside of main
you call it.
Is there a real world example of this? I would agree that try_setenv()
wouldn't work here but...the other alternatives don't help either right?
They do help, because the C/C++ library need not be touching the environment in it's threads.
If making set_var
unsafe is resisted, then could we simply do what we did for /proc/mem/ and declare this out of scope thing?
Just ignoring the issue and calling it out of scope seems like not the best way forward.
I'd say I'm someone pushing for this change, to deprecate set_var
and make it unsafe, but I recognize that sometimes inertia means the best solutions aren't necessarily the ones that are done.
I haven't seen this mentioned yet but can we make something unsafe over an edition boundary?
Namely, in the current and former editions it is not marked as unsafe
but in, say edition 2024, it would be?