Synchronized FFI access to POSIX environment variable functions

No, I don't. I think that in practice most multithreaded programs are already obeying the "only set environment variables before creating threads" rule. It is a risk, and I wouldn't push such a change into a glibc release without first trying to measure the impact, but I really don't think it'd be that bad.

I am not sure that is true -- there is at least some evidence to the contrary

This is very early in running gnome-settings-daemon (not after minutes, or even seconds, it's the first thing we do), but we read the stored configuration using GSettings, which uses D-Bus. That D-Bus implementation (the one in glib) uses threads, thus with a single call to get the configuration value, we're using threads.

A program will initialize a bunch of things early on in its main, and whether any of those spawn threads can be an implementation detail that is not apparent from the outside.

So, harmless-looking code like this is already violating the rule of "only call setenv before spawning threads":

        region = g_settings_get_string (locale_settings, "region");
        if (region[0]) {
                g_setenv ("LC_TIME", region, TRUE);

If "the Rust functions only effect Rust code" is not a outright unacceptable breakage concern, wouldn't it be possible to then add a new function which synchronizes the changes made to the Rust environment back to the native one? This function could be unsafe, and documented with all the caveats, without changing anything about the pure-Rust behavior of the existing functions.

4 Likes

This is my primary concern. Many libraries create threads internally, and that shouldn't break code touching the environment, as long as the two aren't mixed. (Also consider things like async-std or tokio, which people sometimes wrap main with, and which may start threads.)

1 Like

As far as I can see, there is neither a static nor a dynamic way to actually check this though. So given the state of the ecosystem as a whole, I don't think it is possible to provide a safe API that achieves this.

While reading add env var to override sys::Instant::actually_monotonic() for windows and unix by the8472 ยท Pull Request #84448 ยท rust-lang/rust ยท GitHub, I realized that there's a problem with "leave setenv to an unsafe libc function". We have parts of Rust runtime which are controlled via environment variables, specifically, RUST_BACKTRACE.

It's important to allow applications to be able to force-enable backtraces themselves. For example, rust-analyzer currently does

    if env::var("RUST_BACKTRACE").is_err() {
        env::set_var("RUST_BACKTRACE", "short");
    }

If we follow the path of deprecating set_env, we should provide env-less way to tweak such API, like (strawman) std::panic::enable_backtrace(yes: bool).

6 Likes

IMO, (proper) APIs are always better than ambient global state manipulation :slight_smile: . In this case, that argument really should be a #[non_exhaustive] enum of potential states, not just a bool.

9 Likes

There are myriad other APIs in other libraries with similar environment usage, for which Rust programs may also want to set values like this. Changing one within the Rust standard library doesn't make environment functions unnecessary.

1 Like

Is there any reason we couldn't

  1. deprecate the current fn std::env::set_var and
  2. introduce a new unsafe fn std::env::set (name bikeshedding aside)

? It seems like this would be the simplest way, would permit existing uses to continue, and would not break existing code. Speaking as the maintainer of the time crate, this change would prima facie give me reason to re-expose the otherwise unsound API. The only reason I nuked it in the first place is because safe Rust could cause the unsoundness. If unsafe is required to alter the environment, the safety is now placed properly โ€” at the person setting the variable.

3 Likes

At this point, that seems like the simplest and most straightforward approach. People can use it safely, and most uses are safe, but we can't reasonably enforce that safety without breaking many of the perfectly valid uses. Making it unsafe seems preferable to the various possible ways to attempt to make it safer (none of which can make it completely safe).

If a function needs to be deprecated for no other reason than not being marked unsafe, would it make sense to deprecate the "safeness" of the function instead? I think I've seen that suggested in the past before in other situations, but I can't find them now.

4 Likes

That's what I suggested here: Consider deprecating and/or modifying behavior of std::env::set_var ยท Issue #90308 ยท rust-lang/rust ยท GitHub

That was suggested for CommandExt::before_exec:

However it's a breaking change. They ended up deprecating before_exec and adding a new unsafe fn pre_exec.

Likewise there are already many users of set_env:

https://github.com/search?l=Rust&type=Code&q=set_var

1 Like

Thanks, I think this comment is the closest to what I was thinking: std::os::unix::process::CommandExt::before_exec should be unsafe ยท Issue #39575 ยท rust-lang/rust ยท GitHub

If the function is going to be deprecated you'd receive a warning anyways, would it be a breaking change for that deprecation warning to say "wrap this in unsafe" instead of "change the method name and wrap it in unsafe"?

1 Like

If you add unsafe to the function signature, it will no longer compile outside of unsafe contexts, and that would be a breaking change. Deprecations are warnings, but adding unsafe to a function which previously didn't have it means the callers will have a compile error.

It seems like the only way you could get away with something like that is if there is no evidence the function was ever used, e.g. you can change it and a crater run shows no breakages.

Right, I was imagining a new "safeness deprecation" capability for the compiler. As in, everywhere you use the (now) unsafe function as a safe one, you'd get a deprecation warning instead of an error.

I don't know if that'd actually be worth the effort (or even feasible), but there's been a couple of times now where something has had to go from safe to unsafe.

10 Likes

It's more than invocation, you're changing the type of something, which is a "split the ecosystem" type change as far as I'm aware.

mod std_env {
    use std::ffi::OsStr;
    pub unsafe fn set_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V) {}
}

fn main() {
    let f: /* unsafe */ fn(String, String) = std_env::set_var;
}

:arrow_right:


 --> src/main.rs:7:46
  |
7 |     let f: /* unsafe */ fn(String, String) = std_env::set_var;
  |                         ------------------   ^^^^^^^^^^^^^^^^ expected normal fn, found unsafe fn
  |                         |
  |                         expected due to this
  |
  = note: expected fn pointer `fn(String, String)`
                found fn item `unsafe fn(_, _) {std_env::set_var::<_, _>}`

I believe what @skippy is saying is to permit either safe of unsafe, but the former would be deprecated and would emit a warning anywhere it's used (whether it's at call-site, type inference, type annotation, etc.)

2 Likes

Okay, maybe I understand now -- a deprecation lint, but one that is context sensitive and doesn't fire in unsafe context? And the type remains unchanged, a normal (safe) function type. (So the statement "(now) unsafe function" is inaccurate; it's only limited via lint, and never actually becomes unsafe.)

That does mean that code already using set_var in an unsafe context will get no warning. Unless it also warns in unsafe until you disable the lint, which means that every use of the function needs an attribute. Or maybe for a "limited" time, like...

default lint level in safe context unsafe context
Edition <= 2021 warn warn
Edition 2024 deny warn
Edition 2027 deny allow

To flip the question around, if you're going to get a deprecation warning anyways, is it really better to be told to keep the same function name but wrap it unsafe and/or add an attribute, rather than to use a new, properly-unsafe-typed function? After all, the goal is to have everyone using set_var actually take notice and make sure they're using it properly (right?). In 5 years I think I'd rather have set_var be a flag for "hmm, long since deprecated; this code needs some further updating and review" than ambiguous.

2 Likes

I hope I'm not derailing the thread, but I was imagining that the function would actually change to unsafe. The method itself isn't deprecated in any sense, only treating it as a safe function is deprecated.

So in your function pointer example you would now get a deprecation warning instead of an error due to the missing unsafe. The warning wouldn't indicate the function is deprecated, as it isn't, but only indicate that omitting the unsafe is deprecated.

Assuming this is even possible (I have no expertise to say), it wouldn't have to be tied to an editions at all. I don't know compiler internals, but I was thinking that anywhere the compiler finds a missing unsafe related to a function and emits an error, it would check to see if the function was marked "safeness deprecated".