Synchronized FFI access to POSIX environment variable functions

zackw · October 28, 2021, 3:47pm

No, I don't. I think that in practice most multithreaded programs are already obeying the "only set environment variables before creating threads" rule. It is a risk, and I wouldn't push such a change into a glibc release without first trying to measure the impact, but I really don't think it'd be that bad.

RalfJung · October 28, 2021, 4:29pm

I am not sure that is true -- there is at least some evidence to the contrary

This is very early in running gnome-settings-daemon (not after minutes, or even seconds, it's the first thing we do), but we read the stored configuration using GSettings, which uses D-Bus. That D-Bus implementation (the one in glib) uses threads, thus with a single call to get the configuration value, we're using threads.

A program will initialize a bunch of things early on in its main, and whether any of those spawn threads can be an implementation detail that is not apparent from the outside.

So, harmless-looking code like this is already violating the rule of "only call setenv before spawning threads":

        region = g_settings_get_string (locale_settings, "region");
        if (region[0]) {
                g_setenv ("LC_TIME", region, TRUE);

geeklint · October 28, 2021, 4:33pm

If "the Rust functions only effect Rust code" is not a outright unacceptable breakage concern, wouldn't it be possible to then add a new function which synchronizes the changes made to the Rust environment back to the native one? This function could be unsafe, and documented with all the caveats, without changing anything about the pure-Rust behavior of the existing functions.

josh · October 28, 2021, 4:48pm

This is my primary concern. Many libraries create threads internally, and that shouldn't break code touching the environment, as long as the two aren't mixed. (Also consider things like async-std or tokio, which people sometimes wrap main with, and which may start threads.)

RalfJung · October 28, 2021, 5:04pm

As far as I can see, there is neither a static nor a dynamic way to actually check this though. So given the state of the ecosystem as a whole, I don't think it is possible to provide a safe API that achieves this.

matklad · October 30, 2021, 12:50pm

While reading add env var to override sys::Instant::actually_monotonic() for windows and unix by the8472 · Pull Request #84448 · rust-lang/rust · GitHub, I realized that there's a problem with "leave setenv to an unsafe libc function". We have parts of Rust runtime which are controlled via environment variables, specifically, RUST_BACKTRACE.

It's important to allow applications to be able to force-enable backtraces themselves. For example, rust-analyzer currently does

    if env::var("RUST_BACKTRACE").is_err() {
        env::set_var("RUST_BACKTRACE", "short");
    }

If we follow the path of deprecating set_env, we should provide env-less way to tweak such API, like (strawman) std::panic::enable_backtrace(yes: bool).

mathstuf · October 30, 2021, 9:08pm

IMO, (proper) APIs are always better than ambient global state manipulation . In this case, that argument really should be a #[non_exhaustive] enum of potential states, not just a bool.

josh · October 31, 2021, 8:11am

There are myriad other APIs in other libraries with similar environment usage, for which Rust programs may also want to set values like this. Changing one within the Rust standard library doesn't make environment functions unnecessary.

jhpratt · October 31, 2021, 8:22am

Is there any reason we couldn't

deprecate the current fn std::env::set_var and
introduce a new unsafe fn std::env::set (name bikeshedding aside)

? It seems like this would be the simplest way, would permit existing uses to continue, and would not break existing code. Speaking as the maintainer of the time crate, this change would prima facie give me reason to re-expose the otherwise unsound API. The only reason I nuked it in the first place is because safe Rust could cause the unsoundness. If unsafe is required to alter the environment, the safety is now placed properly — at the person setting the variable.

josh · October 31, 2021, 10:48am

At this point, that seems like the simplest and most straightforward approach. People can use it safely, and most uses are safe, but we can't reasonably enforce that safety without breaking many of the perfectly valid uses. Making it unsafe seems preferable to the various possible ways to attempt to make it safer (none of which can make it completely safe).

skippy · October 31, 2021, 1:45pm

If a function needs to be deprecated for no other reason than not being marked unsafe, would it make sense to deprecate the "safeness" of the function instead? I think I've seen that suggested in the past before in other situations, but I can't find them now.

bascule · October 31, 2021, 2:45pm

That's what I suggested here: Consider deprecating and/or modifying behavior of std::env::set_var · Issue #90308 · rust-lang/rust · GitHub

bascule · October 31, 2021, 2:47pm

That was suggested for CommandExt::before_exec:

github.com/rust-lang/rust

std::os::unix::process::CommandExt::before_exec should be unsafe

opened 07:34AM - 06 Feb 17 UTC

closed 12:15AM - 23 Feb 19 UTC

fweimer

P-medium T-lang T-libs-api I-unsound C-bug

The [before_exec](https://doc.rust-lang.org/std/os/unix/process/trait.CommandExt….html#tymethod.before_exec) method should be marked unsafe because after a fork from a multi-threaded process, POSIX allows only allows async-signal-safe functions to be called. Otherwise, the result is undefined behavior. The only way to enforce this in Rust is to mark the function as unsafe. So to be clear, this should not compile: ```rust use std::process::Command; use std::os::unix::process::CommandExt; fn main() { let status = Command::new("sh") .arg("-c") .arg("echo world") .before_exec(|| { println!("hello"); Ok(()) }) .status().unwrap(); println!("status: {}", status); } ```

However it's a breaking change. They ended up deprecating before_exec and adding a new unsafe fn pre_exec.

Likewise there are already many users of set_env:

https://github.com/search?l=Rust&type=Code&q=set_var

skippy · October 31, 2021, 2:58pm

Thanks, I think this comment is the closest to what I was thinking: std::os::unix::process::CommandExt::before_exec should be unsafe · Issue #39575 · rust-lang/rust · GitHub

If the function is going to be deprecated you'd receive a warning anyways, would it be a breaking change for that deprecation warning to say "wrap this in unsafe" instead of "change the method name and wrap it in unsafe"?

bascule · October 31, 2021, 3:15pm

If you add unsafe to the function signature, it will no longer compile outside of unsafe contexts, and that would be a breaking change. Deprecations are warnings, but adding unsafe to a function which previously didn't have it means the callers will have a compile error.

It seems like the only way you could get away with something like that is if there is no evidence the function was ever used, e.g. you can change it and a crater run shows no breakages.

skippy · October 31, 2021, 3:19pm

Right, I was imagining a new "safeness deprecation" capability for the compiler. As in, everywhere you use the (now) unsafe function as a safe one, you'd get a deprecation warning instead of an error.

I don't know if that'd actually be worth the effort (or even feasible), but there's been a couple of times now where something has had to go from safe to unsafe.

quinedot · November 1, 2021, 1:41am

It's more than invocation, you're changing the type of something, which is a "split the ecosystem" type change as far as I'm aware.

mod std_env {
    use std::ffi::OsStr;
    pub unsafe fn set_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V) {}
}

fn main() {
    let f: /* unsafe */ fn(String, String) = std_env::set_var;
}


 --> src/main.rs:7:46
  |
7 |     let f: /* unsafe */ fn(String, String) = std_env::set_var;
  |                         ------------------   ^^^^^^^^^^^^^^^^ expected normal fn, found unsafe fn
  |                         |
  |                         expected due to this
  |
  = note: expected fn pointer `fn(String, String)`
                found fn item `unsafe fn(_, _) {std_env::set_var::<_, _>}`

jhpratt · November 1, 2021, 2:29am

I believe what @skippy is saying is to permit either safe of unsafe, but the former would be deprecated and would emit a warning anywhere it's used (whether it's at call-site, type inference, type annotation, etc.)

quinedot · November 1, 2021, 4:18am

Okay, maybe I understand now -- a deprecation lint, but one that is context sensitive and doesn't fire in unsafe context? And the type remains unchanged, a normal (safe) function type. (So the statement "(now) unsafe function" is inaccurate; it's only limited via lint, and never actually becomes unsafe.)

That does mean that code already using set_var in an unsafe context will get no warning. Unless it also warns in unsafe until you disable the lint, which means that every use of the function needs an attribute. Or maybe for a "limited" time, like...

default lint level in	safe context	`unsafe` context
Edition <= 2021	warn	warn
Edition 2024	deny	warn
Edition 2027	deny	allow

To flip the question around, if you're going to get a deprecation warning anyways, is it really better to be told to keep the same function name but wrap it unsafe and/or add an attribute, rather than to use a new, properly-unsafe-typed function? After all, the goal is to have everyone using set_var actually take notice and make sure they're using it properly (right?). In 5 years I think I'd rather have set_var be a flag for "hmm, long since deprecated; this code needs some further updating and review" than ambiguous.

skippy · November 1, 2021, 11:26am

I hope I'm not derailing the thread, but I was imagining that the function would actually change to unsafe. The method itself isn't deprecated in any sense, only treating it as a safe function is deprecated.

So in your function pointer example you would now get a deprecation warning instead of an error due to the missing unsafe. The warning wouldn't indicate the function is deprecated, as it isn't, but only indicate that omitting the unsafe is deprecated.

Assuming this is even possible (I have no expertise to say), it wouldn't have to be tied to an editions at all. I don't know compiler internals, but I was thinking that anywhere the compiler finds a missing unsafe related to a function and emits an error, it would check to see if the function was marked "safeness deprecated".

Topic		Replies	Views
Function to hold lock on execution environment (for FFI) libs	26	2762	April 4, 2021
Thread-safe environment variable mutation	2	1283	February 20, 2023
Pre-RFC: deprecate env::set_var libs	21	4092	March 25, 2019
Standard library, synchronization primitives, and undefined behavior libs	24	9922	April 7, 2019
[Discussion] Safety marks for non thread-safe ffi interfaces Unsafe Code Guidelines	2	206	July 1, 2024

Synchronized FFI access to POSIX environment variable functions

Related Topics