Thread-safe environment variable mutation

Almost a year ago (yikes!) in the "Synchronized FFI access to POSIX environment variable functions" thread, I promised to write a proposal for changes to the POSIX specification that would allow thread-safe read-write access to environment variables. I have finally gotten around to this, and I thought I would give IRLO a chance to kibitz it before I start showing it to C library implementors.

https://www.owlfolio.org/development/thread-safe-environment-variable-mutation-working-draft-2022-15/

My blog doesn't have comments, so for now, feedback here is fine. (Once I do start showing it to C library people, feedback will probably need to move to libc-alpha@sourceware.org or some related mailing list.)


Looks great!

A few bits of feedback on corner cases and rationale:

  • env_next should document what happens if you call it again after it returns NULL. That way, users and language bindings know whether they need to special-case this themselves. Ideally, it should return NULL repeatedly rather than invoking UB in this case.
  • The rationale for env_lookup returning a string that includes the NAME= seems implicit. On the surface, this seems error-prone for callers that primarily want getenv behavior; if you just passed the NAME, you already know what it is. And it'd be possible for the data structure to index on the pointer to the value rather than the pointer to NAME=value. I get the impression the rationale is to allow passing the result back to putenv and to simplify the implementation, but it seems worth documenting that rationale, along with the best alternative and its upsides and downsides.
    • In particular, optimizing for the common case of retrieving a value rather than the uncommon case of subsequently passing it to putenv, env_lookup could return a pointer to the value but guarantee that NAME= appears before it, so that if you want to call putenv you could subtract that offset.
    • Alternatively, this proposal could hook setenv rather than putenv, and then do lookups based on the value.
  • It doesn't seem inherent that a read lock held by an ENV_ITER can't be passed from one thread to another. Many threading systems have locks that can be taken by one thread and released by another, and there are also alternate implementations that don't require a read lock (see the next bullet). It'd be nice if the Rust wrapper around ENV_ITER could be Send. How awful would it be for the spec to require that an ENV_ITER be usable from multiple threads (though only one at a time)?
  • How important is the requirement that env_iter snapshot the environment such that new changes aren't visible, rather than just requiring that it iterate without crashing and return pointers that won't be modified, even in the face of concurrent modifications? That would allow for QSBR-based (quiescent-state-based reclamation) implementations, such as storing the environment in a concurrent hash table. And it'd eliminate the requirement for a full read lock and the associated deadlock risk.
  • Consider using "must not" rather than "may not", for clarity.
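
To make two of the suggestions above concrete, here's a rough C sketch. None of this is the proposal's actual API: demo_env, demo_lookup_value, demo_entry_from_value, struct demo_iter and demo_next are all hypothetical names over a fixed, read-only table, with no locking at all. It only illustrates (a) a lookup that returns a pointer to the value while guaranteeing NAME= sits immediately before it, so the full entry can be recovered by subtraction, and (b) an iterator that keeps returning NULL once it's exhausted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the environment: a NULL-terminated array of
   "NAME=value" strings. A real implementation would need synchronization. */
static const char *const demo_env[] = {
    "HOME=/home/demo",
    "LANG=C.UTF-8",
    NULL,
};

/* Hypothetical value-returning lookup. The returned pointer addresses the
   value, and "NAME=" is guaranteed to sit immediately before it in memory. */
static const char *demo_lookup_value(const char *name)
{
    size_t n = strlen(name);
    for (size_t i = 0; demo_env[i] != NULL; i++) {
        /* Match the whole name, not just a prefix of a longer name. */
        if (strncmp(demo_env[i], name, n) == 0 && demo_env[i][n] == '=')
            return demo_env[i] + n + 1;
    }
    return NULL;
}

/* Recover the full "NAME=value" entry from a value pointer that came from
   demo_lookup_value(name), by stepping back over "NAME=". */
static const char *demo_entry_from_value(const char *name, const char *value)
{
    return value - (strlen(name) + 1);
}

/* Hypothetical iterator whose end state is sticky. */
struct demo_iter {
    size_t pos;
};

static const char *demo_next(struct demo_iter *it)
{
    const char *entry = demo_env[it->pos];
    if (entry != NULL)
        it->pos++;      /* advance only while entries remain */
    return entry;       /* once exhausted, every further call returns NULL */
}
```

With this shape, a getenv-style caller uses the result of demo_lookup_value("HOME") directly, a caller that wants the NAME=value form passes it through demo_entry_from_value("HOME", v), and a binding can call demo_next in a loop without special-casing a second call after the end.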