Lack of API mutating args at std::process::Command

std::process::Command has no way to mutate the supplied arguments when one wants to execute a process again. This is sometimes required, especially in test suites.

The other attributes (environment variables, current_dir) have APIs for mutating them.

Therefore I'd like to propose and discuss what could be done for the command's args.

Traditionally, program arguments are an array of strings where argv[0] is slightly special because it (usually) contains the program name. It can still be changed, but it is not part of the arguments handled by the argument parser. std::process::Command already initializes it for you.

For mutating argv, as a bare minimum I'd propose:

pub fn args_reset(&mut self)

Shall reset the argument vector to the state it was in when Command::new() created it. That is: setting its length to 1, keeping argv[0] intact.

Further APIs (to be discussed) may be:

pub fn arg_set(&mut self, index: usize, value: &str) -> Result<(), Error>

Replaces the argument at the given index and returns an error when the index is out of range. This can be used to change argv[0].

pub fn arg_pop(&mut self) -> Result<(), Error>
// and/or
pub fn args_pop(&mut self, n: usize) -> Result<(), Error>

Drops the last element / the last n elements and returns an Error when fewer elements are available. (Discussion: shall argv[0] be preserved?)

pub fn args_new<I, S>(&mut self, args: I) -> &mut Command where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,

Sets a new list of arguments, like args_reset() followed by args(). Leaves argv[0] untouched.

Can you recreate the Command from scratch?

It seems like if std::process::Command impl'd Clone, you could clone it just prior to adding the arguments, and then clone it again if you want to change them. Unfortunately it doesn't.

Sure, the Command can be recreated from scratch; there are workarounds for this issue. Implementing Clone would do the job too.
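For readers who want to see the "recreate from scratch" workaround spelled out, here is a minimal sketch. The helper names base_command, command_for and args_of are made up for illustration, and "echo" is only a placeholder program; the only std calls used are Command::new, env, arg, and get_args.

```rust
use std::ffi::OsStr;
use std::process::Command;

// Hypothetical helper: rebuilds the shared part of the command every time,
// because Command offers no way to drop arguments once they are added.
fn base_command() -> Command {
    let mut cmd = Command::new("echo"); // placeholder program
    cmd.env("LC_ALL", "C"); // shared settings go here
    cmd.arg("-n"); // shared leading argument
    cmd
}

// Returns a fresh Command for one input file.
fn command_for(file: &str) -> Command {
    let mut cmd = base_command();
    cmd.arg(file);
    cmd
}

// Collects the arguments (everything after the program name) for inspection.
fn args_of(cmd: &Command) -> Vec<&OsStr> {
    cmd.get_args().collect()
}
```

Each call to command_for starts from a clean argument list, at the cost of repeating the shared setup for every run.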

The point here is that Command is already a builder and can be used for multiple executions through one of the execution-triggering methods. The other attributes (oops, as I am writing this I haven't checked Stdio) do have APIs to refine them before firing another execution; only args does not. I'd call this an inconsistency in the API.

IMO it would be valuable to add this. To my understanding it won't cost much and won't break anything; it just makes things more consistent and easier to use.

clear_args() seems like the simplest thing that could work. More than that goes onto a slippery slope of "why not < insert any random Vec method > too?"

That's why I put the other APIs up for discussion. A 'reset' function is the bare minimum required (I'd suggest reset instead of clear, because it should preserve argv[0]).

Additionally, I'd opt for the 'set' function that replaces an existing component by index (dropping or returning the old one), because it is common to prepare a command and then run it over different filenames, for example.

argv[0] is specified using Command::new(), and can't be set using .arg(), so I think it's fine to present an API as if the "args" meant argv[1..] only, so clear() clears [1..].
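As a small illustration of that view, the existing accessors already treat the program and the arguments separately. The sketch below uses Command::get_program and Command::get_args from std; split_argv is a hypothetical helper name.

```rust
use std::ffi::OsString;
use std::process::Command;

// Hypothetical helper: returns (program, arguments) as Command reports them.
// get_args() only yields what was added via arg()/args(), i.e. argv[1..].
fn split_argv(cmd: &Command) -> (OsString, Vec<OsString>) {
    let program = cmd.get_program().to_os_string();
    let args = cmd.get_args().map(|a| a.to_os_string()).collect();
    (program, args)
}
```

Under this reading, a clear operation on the args would leave whatever get_program() returns untouched.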

Set by index would raise the question of why not implement Index and write command[n] = arg. And if there's set, why not remove, and if there's remove, why not insert, retain, and so on.

That question is answered by the fact that 'args' is not implemented as a single Vec, nor are all the methods Vec offers needed in this case.

My aim is a usable, ergonomic API. Take a look at 'env', which has env_clear() and env_remove().

Having an 'args_clear()' that clears [1..] is OK. Sometimes one wants to change argv[0], but that is unrelated to this topic; if the need arises, it can be discussed again under another topic.

Looking at env, values can be updated with the env() and envs() methods, since that is an associative store. For args it is very common to want to update an argument at a certain position. I think this really deserves an API, not for efficiency (which is irrelevant when we are talking about spawning processes) but for ergonomic reasons, with something like:

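let mut command = Command::new("dosomething");
command.arg("FILE");
// arg_set is the API proposed in this thread (not in std); index 1 is the first argument after argv[0]
for file in &somefiles { command.arg_set(1, file)?.spawn()?; }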

There are certain use cases where one wants to remove or insert more arguments, but supporting that may only fatten the API; in those cases it is OK to clear and rebuild the argv from scratch.
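Until something like this exists in std, the same ergonomics can be approximated outside of it. The following is only a sketch under assumptions: CommandTemplate and its methods are hypothetical names, the index here counts from the first argument after the program name (so 0, not 1), and the error type is simply the rejected index.

```rust
use std::ffi::{OsStr, OsString};
use std::process::Command;

// Hypothetical wrapper, not part of std: keeps the arguments in a Vec
// and builds a fresh Command on demand.
struct CommandTemplate {
    program: OsString,
    args: Vec<OsString>,
}

impl CommandTemplate {
    fn new(program: impl AsRef<OsStr>) -> Self {
        CommandTemplate {
            program: program.as_ref().to_os_string(),
            args: Vec::new(),
        }
    }

    // Appends one argument, like Command::arg.
    fn arg(&mut self, value: impl AsRef<OsStr>) -> &mut Self {
        self.args.push(value.as_ref().to_os_string());
        self
    }

    // Replaces the argument at `index` (0 is the first argument after the
    // program name). Returns the rejected index as the error if out of range.
    fn arg_set(&mut self, index: usize, value: impl AsRef<OsStr>) -> Result<&mut Self, usize> {
        if index >= self.args.len() {
            return Err(index);
        }
        self.args[index] = value.as_ref().to_os_string();
        Ok(self)
    }

    // Removes all arguments; the program name is kept.
    fn args_clear(&mut self) -> &mut Self {
        self.args.clear();
        self
    }

    // Builds a real std::process::Command from the current state.
    fn build(&self) -> Command {
        let mut cmd = Command::new(&self.program);
        cmd.args(&self.args);
        cmd
    }
}
```

A loop over files would then call arg_set(0, file) followed by build() and spawn(). Note that this wrapper only carries the program and arguments; environment and current_dir settings would have to be applied to the Command returned by build().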

Just an observation:

arg_set() should return a Result<Command, Error>

This surprises me now, since the env() functions don't return a Result but a Command. Yet there are certain ways in which setting an environment variable could fail (illegal characters in its name). Is this intentional or a defect?

Any errors will be returned when you call spawn, etc.

I assume they aren't tested any earlier because it would have made the builder less ergonomic back before we had the ? operator.

Actually, errors are not handled in that case (testing on Linux here):

fn main() {
    use std::process::Command;

    let out = Command::new("bash")
        .env("FOO=BAR", "BAZ")
        .args(["-c", "/bin/echo $FOO"])
        .output()
        .unwrap()
        .stdout;
    println!("Surprise: {}", String::from_utf8(out).unwrap());
}

This results in "BAR=BAZ", as if one did .env("FOO", "BAR=BAZ"), which isn't unexpected when you know what the underlying representation in libc looks like.

Ah, interesting! I assumed it would do some testing when building the variables but only a null check is done.

Now I am puzzled about how to fix that. Other OSes would certainly put other restrictions on the characters an env var name can hold (and possibly different rules on the value as well). Changing the return type into a Result<> would be a breaking change, a no-go for the stdlib. Validating the env before spawning seems to be costly (YMMV). Possibly one could store the env validation state in Command, mark it dirty every time the env is mutated, and validate it at spawn time when it is dirty, setting it to Ok or Error and perhaps returning the Error.
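To make the "validate before spawn" idea concrete, here is a rough sketch that runs outside of std, using the existing Command::get_envs accessor. The rule set (no empty name, no '=' and no NUL in the name) is an assumption chosen for illustration, not what any platform or the stdlib actually enforces, and both function names are hypothetical.

```rust
use std::ffi::OsStr;
use std::process::Command;

// Assumed rule set for illustration only: reject empty names and names
// containing '=' or NUL. Real platforms may impose different restrictions.
fn env_key_is_valid(key: &OsStr) -> bool {
    // Lossy conversion is enough here: the characters checked are ASCII.
    let name = key.to_string_lossy();
    !name.is_empty() && !name.contains('=') && !name.contains('\0')
}

// Returns the names of explicitly set variables that fail the check.
// get_envs() only reports variables set or removed on this Command,
// not the inherited environment.
fn invalid_env_keys(cmd: &Command) -> Vec<String> {
    cmd.get_envs()
        .filter(|(key, _)| !env_key_is_valid(key))
        .map(|(key, _)| key.to_string_lossy().into_owned())
        .collect()
}
```

A caller could run invalid_env_keys(&cmd) right before spawn() and bail out if the result is non-empty, which keeps the builder methods infallible.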

See this thread for more on env validation.

Huh, Rust on Unix seems to use a saw_nul variable as a way to signal an error.

I think the Command::env() functions should follow whatever gets finalized there (in the PR). But I have no idea how to deal with a breaking change in Rust's stdlib (a feature?). Anyway, this is going off-topic here (the 'args' thing). Shall we open a new thread on this forum about the 'env' issue, or file a ticket straight away about a possible defect? I am the new kid on the block here and don't know the Rust process for these things.

I think just mentioning in that thread that Command::env is also affected should be enough. That is easier than splitting up the tasks associated with whatever comes out of it.

The Command env methods are implemented internally using the private construct CommandEnv (rust/process.rs at 456a03227e3c81a51631f87ec80cac301e5fa6d7 · rust-lang/rust · GitHub), which has methods like clear, set, remove. An alternative would be exposing CommandEnv and introducing an equivalent CommandArg (possibly with a clearer name like CommandEnvStore?).

For environment variables, remove makes sense because the variables are like a map, with key and value, and the variables do not rely on order. But arguments may depend on order: an argument can have an effect on the next arguments, so removing a single argument can be much more complicated than removing an environment variable.

In summary, while there are both env_clear() and env_remove(key), and it looks harmless to have a similar arg_clear(), a method arg_remove(index) looks very tricky and error prone.

Also, env_clear and env_remove are necessary because Command::new inherits the current process environment, while there is no such thing for arguments. So even adding arg_clear is not really equivalent to env_clear.