It does not make sense. An Option
is being returned, but if the input contains '\0'
/'='
then it will panic (on some systems). This causes ICE for rustc. I do not think there are public APIs exposed that safely handle these kinds of invalid input, so should it be added?
What error could be returned in the instance of \0
? The std::env::VarError
enum is exhaustive, so a new variant cannot be added. There are only two options: NotUnicode(OsString)
, which is impossible to do correctly, and NotPresent
, which is objectively incorrect.
The current functions should be kept I guess. But I do not think there should not be something people can use to prevent it from panicking.
The reason it panics is that most of the time the key specified is a constant, so the conditions which would cause it to panic are "programmer error" and should just be fixed in the code. If you are calling it with a user-specified variable, then you could write a wrapper function like
pub fn safe_var<K: AsRef<OsStr>>(key: K)
-> Result<Result<String, VarError>, KeyError>
(or with an outer error enum). Maybe something like this is worth adding to std
, I'm not sure how common non-constant environment variables really are .
Is it? On linux anyway, any environment variable name with =
or \0
is denied, and so definitely is not present.
This would be more convincing if it was a compile time error.
It would be good if there was a public Key
type. Then such "programmer errors" could be handled separately
offtopic
A Key
type would also be useful on Windows for reasons that are beyond the scope of this topic.
A compile time error was not possible when Rust 1.0 released. It's almost possible now that const fn
have advanced quite a bit, but still pretty unergonomic
std::env::var({
const KEY: std::env::Key = std::env::Key::new("FOO").unwrap();
KEY
})
in the future it might be ergonomic enough to actually use
std::env::var(const { std::env::Key::new("FOO").unwrap() })
but still comparatively verbose compared to just accepting a runtime panic if you mess up.
After thinking a little more, I realized that it would actually be more consistent with things like vars
and args
if var
returned an Option
like var_os
and panicked on non-UTF8 values in addition to =
and \0
. (Even though I think that was an unfortunate choice.) But for some reason, perhaps because "the value would already be wrapped in something (an Option
) anyway", it was decided to have a Result
for var
with two variants: NotPresent
for the None
case of var_os
and another variant for non-UTF8 values. The panics of var
just mirror the panics of var_os
.
set_var
also panics on =
and \0
, so at least that part is consistent. Just unfortunate.
Would a new lint be favorable in this case?
For the built-in env!
/option_env!
macro ICE, I don't think there can be any feasible solution other than adding some "safe" function(s) in std
, since the error with NUL bytes and equal signs is OS dependent. If an OS does allow these characters, then it would not cause an error. Simply adding checks in the macro won't work since it creates false positives.
It's impossible for \0
to appear in the name of an environment variable, because execve
's third argument is a pointer to an array of C strings, and putenv
takes a single C string.
It is possible for =
to appear in the name of an environment variable, but this has bizarre side effects, or from another perspective, most likely arises as a bizarre side effect of setting an environment variable to a value containing =
. The environment variable block has no quoting mechanism, and getenv
blindly scans it for the argument string plus a trailing '='. Thus, if you do export A=B=C
from the shell, printenv
will report a single variable named A whose value is B=C, and getenv("A")
will give you back B=C
, but at the same time getenv("A=B")
will give you back C
.
There is no way to disambiguate this, but the odds are, I think, that anyone who actually does export A=B=C
means to set a variable named A
to the value B=C
, not to set a variable named A=B
to the value C
, and certainly not to create both of them at the same time.
So my vote is for having std::env::var
(and var_os
) return NotPresent
when its argument contains either \0
or =
.
Looking at the docs, there are two other panic cases:
This function may panic if
key
is empty, contains an ASCII equals sign'='
or the NUL character'\0'
, or when the value contains the NUL character.
It is possible to set an environment variable whose name is the empty string, e.g.
$ python3 -c 'import subprocess; subprocess.run("printenv", env={"":"bar", "A":"X"})'
=bar
A=X
but getenv
will fail if called with argument ""
, so again I think std::env::var
should return NotPresent
in this case.
It is not possible to set an environment variable whose value contains a NUL byte, because, again, the OS primitives work on C strings. Moreover, Rust's String
is perfectly capable of representing values containing U+0000. So I don't understand why that possibility is even considered.
I would prefer std::end::var
not to decide for me whether =
is permissible or not. If I have an environment variable with =
in the name then std::env::var
should return it. It shouldn't artificially return NotPresent
.
IMO var
/set_var
should never panic and instead should return appropriate errors when required. Unfortunately set_var
has no return value. But var
does have a return value and it should be utilized. Since \0
can't be in a var's name then it should just return NotPresent
instead of panicing.
I don't exactly disagree with this, and it is what the C getenv
does, but the problem is that environment strings are ambiguous: if environ[0]
is the C string A=B=C
, that can be interpreted as either an environment variable named A
whose value is B=C
, or an environment variable named A=B
whose value is C
.
This seems akin to the situation where Unix doesn't enforce any particular encoding on argv
strings, and lots of C programs don't pay attention to what encoding they're in, but Rust makes you pay attention. What if env::var
were to enforce a portable name (shell identifier, i.e. /[A-Za-z_][A-Za-z0-9_]*/
) by returning NotPresent for anything that isn't, but env::var_os
just forwarded whatever you gave it to the C library's getenv
? (Or GetEnvironmentVariableW
on Windows, as I see we are already doing.)
Your interpretation of multiple =
agrees with setenv
, which returns an error if the name contains =
(or is empty), but not when the value contains =
.
putenv
accepts entire environment strings ("NAME=value"), and doesn't error on empty strings, strings with no =
, or strings with multiple =
. But the first two don't actually add anything to the environment. It also accepts strings that start with =
, and these do get added to the environment.
(I think the only way to get putenv to error is to run out of space (modulo giving it a bad pointer)).
Your interpretation also agrees with putenv
in that if you pass FOO=bar
and then pass FOO=BAR=bar
, the latter will overwrite the former.
(Linux libc6 2.32)
Well, they're C strings, but perhaps that's a semantic argument. You can just treat them as OsString
s (and if your program works with files, IMO definitely should.)
That would lead to cases where NotPresent
is objectively incorrect. (And be backwards incompatible.)
From man 7 environ
:
By convention, the strings in environ have the form "name=value". The name is case-sensitive and may not contain the character "=". The value can be anything that can be represented as a string. The name and the value may not contain an embedded null byte ('\0'), since this is assumed to terminate the string.
Notably:
- "name=value" is a convention, not necessarily a strict requirement.
- The value is expressly permitted to contain
=
. - The string
A=B=C
is not ambiguous because the name cannot contain=
, thus the name isA
and the value isB=C
.
Even if =
cannot be in the name I would still greatly prefer for Rust to not panic. Just return NotPresent
. Panicking seems overly dramatic for such a generic API.
Why restrict to C functions and *NIX conventions? All of this depends on the platform. A system where environment variables are defined as HashMap<String, String>
or something similar would allow both \0
and =
to appear in the name, which is why I think rejecting strings containing \0
or =
would be undesirable and sometimes incorrect.
I agree with both points, but I disagree that our current limited API ought to be that flexible; and I agree with the OP that a panic!
is not great.
Thus, to me, it looks like two things ought to happen:
-
our current APIs ought to return
NotPresent
or something along those line for invalid keys and whatnot; -
a more precise / fine-grained API could be added:
try_{set_}var
, with that outer layer ofResult
wrapping that @Nemo157 mentioned (or an outer layer ofError
wrapping).
Is anyone here suggesting that it can't be a platform-specific check? Unix certainly doesn't support =
or \0
. Windows probably has other rules too.
Windows is almost the same except the first character of the key name can be =
. However, this is very strongly discouraged (indeed the docs make no mention of it) and it's only used by cmd.exe
as part of a legacy hack to implement per-drive current directories. E.g. =C:=C:\Users\Me
is parsed as ["=C:", "C:\Users\Me"]
.
So it's technically an implementation detail to support the legacy command-line interpreter, albeit one that's unlikely to change due to Windows' strong userspace guarantees.
The other bit of Windows weirdness is that the key name HELLO
is equal to hello
. Or to be more specific, Windows environment keys are case insensitive but case preserving. However, I do not think this affects the topic at hand.