Why should std::env::var panic?

It does not make sense. An Option is being returned, but if the input contains '\0'/'=' then it will panic (on some systems). This causes ICE for rustc. I do not think there are public APIs exposed that safely handle these kinds of invalid input, so should it be added?

2 Likes

What error could be returned in the instance of \0? The std::env::VarError enum is exhaustive, so a new variant cannot be added. There are only two options: NotUnicode(OsString), which is impossible to do correctly, and NotPresent, which is objectively incorrect.

The current functions should be kept I guess. But I do not think there should not be something people can use to prevent it from panicking.

The reason it panics is that most of the time the key specified is a constant, so the conditions which would cause it to panic are "programmer error" and should just be fixed in the code. If you are calling it with a user-specified variable, then you could write a wrapper function like

pub fn safe_var<K: AsRef<OsStr>>(key: K)
    -> Result<Result<String, VarError>, KeyError>

(or with an outer error enum). Maybe something like this is worth adding to std, I'm not sure how common non-constant environment variables really are :man_shrugging:.

Is it? On linux anyway, any environment variable name with = or \0 is denied, and so definitely is not present.

This would be more convincing if it was a compile time error.

1 Like

It would be good if there was a public Key type. Then such "programmer errors" could be handled separately

offtopic

A Key type would also be useful on Windows for reasons that are beyond the scope of this topic.

3 Likes

A compile time error was not possible when Rust 1.0 released. It's almost possible now that const fn have advanced quite a bit, but still pretty unergonomic

std::env::var({
  const KEY: std::env::Key = std::env::Key::new("FOO").unwrap();
  KEY
})

in the future it might be ergonomic enough to actually use

std::env::var(const { std::env::Key::new("FOO").unwrap() })

but still comparatively verbose compared to just accepting a runtime panic if you mess up.

After thinking a little more, I realized that it would actually be more consistent with things like vars and args if var returned an Option like var_os and panicked on non-UTF8 values in addition to = and \0. (Even though I think that was an unfortunate choice.) But for some reason, perhaps because "the value would already be wrapped in something (an Option) anyway", it was decided to have a Result for var with two variants: NotPresent for the None case of var_os and another variant for non-UTF8 values. The panics of var just mirror the panics of var_os.

set_var also panics on = and \0, so at least that part is consistent. Just unfortunate.

Would a new lint be favorable in this case?

For the built-in env!/option_env! macro ICE, I don't think there can be any feasible solution other than adding some "safe" function(s) in std, since the error with NUL bytes and equal signs is OS dependent. If an OS does allow these characters, then it would not cause an error. Simply adding checks in the macro won't work since it creates false positives.

It's impossible for \0 to appear in the name of an environment variable, because execve's third argument is a pointer to an array of C strings, and putenv takes a single C string.

It is possible for = to appear in the name of an environment variable, but this has bizarre side effects, or from another perspective, most likely arises as a bizarre side effect of setting an environment variable to a value containing =. The environment variable block has no quoting mechanism, and getenv blindly scans it for the argument string plus a trailing '='. Thus, if you do export A=B=C from the shell, printenv will report a single variable named A whose value is B=C, and getenv("A") will give you back B=C, but at the same time getenv("A=B") will give you back C.

There is no way to disambiguate this, but the odds are, I think, that anyone who actually does export A=B=C means to set a variable named A to the value B=C, not to set a variable named A=B to the value C, and certainly not to create both of them at the same time.

So my vote is for having std::env::var (and var_os) return NotPresent when its argument contains either \0 or =.


Looking at the docs, there are two other panic cases:

This function may panic if key is empty, contains an ASCII equals sign '=' or the NUL character '\0' , or when the value contains the NUL character.

It is possible to set an environment variable whose name is the empty string, e.g.

$ python3 -c 'import subprocess; subprocess.run("printenv", env={"":"bar", "A":"X"})'
=bar
A=X

but getenv will fail if called with argument "", so again I think std::env::var should return NotPresent in this case.

It is not possible to set an environment variable whose value contains a NUL byte, because, again, the OS primitives work on C strings. Moreover, Rust's String is perfectly capable of representing values containing U+0000. So I don't understand why that possibility is even considered.

5 Likes

I would prefer std::end::var not to decide for me whether = is permissible or not. If I have an environment variable with = in the name then std::env::var should return it. It shouldn't artificially return NotPresent.

IMO var/set_var should never panic and instead should return appropriate errors when required. Unfortunately set_var has no return value. But var does have a return value and it should be utilized. Since \0 can't be in a var's name then it should just return NotPresent instead of panicing.

4 Likes

I don't exactly disagree with this, and it is what the C getenv does, but the problem is that environment strings are ambiguous: if environ[0] is the C string A=B=C, that can be interpreted as either an environment variable named A whose value is B=C, or an environment variable named A=B whose value is C.

This seems akin to the situation where Unix doesn't enforce any particular encoding on argv strings, and lots of C programs don't pay attention to what encoding they're in, but Rust makes you pay attention. What if env::var were to enforce a portable name (shell identifier, i.e. /[A-Za-z_][A-Za-z0-9_]*/) by returning NotPresent for anything that isn't, but env::var_os just forwarded whatever you gave it to the C library's getenv? (Or GetEnvironmentVariableW on Windows, as I see we are already doing.)

Your interpretation of multiple = agrees with setenv, which returns an error if the name contains = (or is empty), but not when the value contains =.

putenv accepts entire environment strings ("NAME=value"), and doesn't error on empty strings, strings with no =, or strings with multiple =. But the first two don't actually add anything to the environment. It also accepts strings that start with =, and these do get added to the environment.

(I think the only way to get putenv to error is to run out of space (modulo giving it a bad pointer)).

Your interpretation also agrees with putenv in that if you pass FOO=bar and then pass FOO=BAR=bar, the latter will overwrite the former.

(Linux libc6 2.32)

1 Like

Well, they're C strings, but perhaps that's a semantic argument. You can just treat them as OsStrings (and if your program works with files, IMO definitely should.)

That would lead to cases where NotPresent is objectively incorrect. (And be backwards incompatible.)

From man 7 environ:

By convention, the strings in environ have the form "name=value". The name is case-sensitive and may not contain the character "=". The value can be anything that can be represented as a string. The name and the value may not contain an embedded null byte ('\0'), since this is assumed to terminate the string.

Notably:

  1. "name=value" is a convention, not necessarily a strict requirement.
  2. The value is expressly permitted to contain =.
  3. The string A=B=C is not ambiguous because the name cannot contain =, thus the name is A and the value is B=C.

Even if = cannot be in the name I would still greatly prefer for Rust to not panic. Just return NotPresent. Panicking seems overly dramatic for such a generic API.

5 Likes

Why restrict to C functions and *NIX conventions? All of this depends on the platform. A system where environment variables are defined as HashMap<String, String> or something similar would allow both \0 and = to appear in the name, which is why I think rejecting strings containing \0 or = would be undesirable and sometimes incorrect.

I agree with both points, but I disagree that our current limited API ought to be that flexible; and I agree with the OP that a panic! is not great.

Thus, to me, it looks like two things ought to happen:

  • our current APIs ought to return NotPresent or something along those line for invalid keys and whatnot;

  • a more precise / fine-grained API could be added: try_{set_}var, with that outer layer of Result wrapping that @Nemo157 mentioned (or an outer layer of Error wrapping).

3 Likes

Is anyone here suggesting that it can't be a platform-specific check? Unix certainly doesn't support = or \0. Windows probably has other rules too.

Windows is almost the same except the first character of the key name can be =. However, this is very strongly discouraged (indeed the docs make no mention of it) and it's only used by cmd.exe as part of a legacy hack to implement per-drive current directories. E.g. =C:=C:\Users\Me is parsed as ["=C:", "C:\Users\Me"].

So it's technically an implementation detail to support the legacy command-line interpreter, albeit one that's unlikely to change due to Windows' strong userspace guarantees.

The other bit of Windows weirdness is that the key name HELLO is equal to hello. Or to be more specific, Windows environment keys are case insensitive but case preserving. However, I do not think this affects the topic at hand.