Pre-RFC overhaul Command for 2021 ed

Hi. I decided to try to do something about Command and wrote this pre-rfc.

(edit 2020-01-05 17:48Z: moved github link to the bottom to make reading easier)

The existing Command api is pretty poor, and the 2021 edition seems to be a good opportuntity to improve it. Is something in this general direction likely to be worth me pursuing?

Summary

std::process::Command is in need of serious overhaul to improve error handling ergonomics and protect programmers from error handling mistakes.

The principal changes proposed are:

  1. Add #[must_use] to several critical types in std::process. [2021]
  2. Provide a cooked method Command::run() which combines spawn, wait, stderr collection, and error checks.
  3. Provide .ok() to turn ExitStatus into a std::result::Result and impl std::error::Error for process::ExitStatus. [2021]
  4. Change the return value of Command::output to a suitable Result type which can represent failure wait status, stderr output, or both. [2021]

No new functionality is proposed, nor any removals of functionality - just API changes to make the existing functionality less error-prone and more ergonomic.

Motivation (overall)

Currently, correct use of Command involves a lot of tiresome boilerplate to deal with ExitStatus and so on. The compiler fails to spot many obvious programming mistakes, including failing to check the subprocess's exit status.

Running subprocesses is inherently complex, and provides many opportunites for errors to occur - so there are some questions about how to best represent this complexity in a convenient API.

Because of the need to make incompatible changes, and the complexity of the process handling problem space and the resulting questions about what our API should look like, an RFC is appropriate.

There are a number of distinct but interlocking problems, many tracked at rust-lang/rust#73131. It seems convenient to deal with them in a single RFC, but there is a separate instance of the RFC template questions for each:

1. Diagnose failure to run, or wait, or check exit status

Motivation

The following incorrect program fragments are all accepted today and run to completion without returning any error:

    Command::new("touch")
        .args(&["/dev/enoent/touch-1"]);
    // ^ programmer surely wanted to actually run the command

    Command::new("touch")
        .args(&["/dev/enoent/touch-2"])
        .spawn()?;
    // ^ accidentally failed to wait, if programmer wanted to make a daemon
    //   or zombie or something they should have to write let _ =.

    Command::new("touch")
        .args(&["/dev/enoent/touch-3"])
        .spawn()?
        .wait()?;
    // ^ accidentally failed to check exit status

Proposed change

The compiler will report accidental failure to: actually run the command; wait for it to complete; or to check its exit status.

This will be done by adding #[must_use] to std::process::Command, std::process::Child and std::process::ExitStatus.

This is not a compatible change. It will be done in the 2021 edition.

Drawbacks

Some code that deliberately wants to leak a child process or ignore an exit status will have to use let _ = or some such. This requirement is in line with general Rust practice.

Alternatives

We could invent an alternative less error-prone API which still gives the same level of detailed control. It would have to have a different name.

Making this change only in the 2021 edition will not help fix existing programs in the ecosystem which have these error handling bugs. We could make the breaking change immediately. But that doesn't seem very friendly.

Prior art and references

Many existing types are #[must_use]. This is generally true of builder types and of types representing possible errors. For another example, consider futures, which only do anything if you await them.

Two of these changes were requested (one by the author of this RFC) in rust-lang/rust#70186 rust-lang/rust#73127.

Unresolved questions

None

2. Provide Command::run

Motivation

Today simply running a subprogram in a minimally correct way looks something like this:

    let status = Command::new("touch")
        .args(&["/dev/enoent/touch-4"])
        .status()?;
    if !status.success() {
        Err(anyhow!("touch failed {}", status))
            ?;
    }

Even this clumsy formulation has the problem that the stderr output from the command goes to our own stderr. That is indeed sometimes correct, but it is not a good default.

Guide level explanation

Provide a new method on Command, called run, which runs the command, and returns a good and useful return value. Program failures, including (by default) any stderr output, produce suitable Errs.

Reference level explanation

    impl Command {
        /// self.stderr is piped by default
        /// If stderr piped, collects the stderr and treats any as an error.
        /// Otherwise, stderr is not collected and not treated as an error.
        ///
        /// Returns `Err(io::Error)` if we failed to spawn or wait for the child, or failed to obtain its output for some reason not caused by the child.
        /// Returns `Ok(Err(CommandError))` if we obtained the child's status and (if applicable) stderr output as expected, but one of those indicated an error.
        pub fn run(&mut self) -> Result<Result<(), CommandError>, io::Error> {
    ...

    struct CommandErrorImpl { status: ExitStatus, stderr: Vec<u8>, }
    pub struct CommandError(Box<CommandErrorImpl>);
    impl CommandError {
        /// Returns `Ok<ExitStatus>` if the only reason for the failure was a nonzero exit status.  Otherwise returns `self`.
        ////
        /// Use this in prefernce to `get_status` if you want to to tolerate some exit statuses, but still fail if there was stderr output.
        /// Eg, someone running `diff`.
        pub fn just_status(self) -> Result<ExitStatus, CommandError> {
            if self.stderr_bytes().is_mepty() { Ok(self.status) }
            else { Err(self) }
        }
        pub fn get_status(&self) -> ExitStatus { ... }
        pub fn stderr(&self) -> Option<Cow<'_, str>> { ... } // uses from_*_lossy
        pub fn stderr_bytes(&self) -> &[u8] { ... }
        pub fn into_raw_parts(self) -> (ExitStstus, Vec<u8>) { ... }
    }

Drawbacks

The nested Result is unusual.

The CommandError type is rather complicated.

Rationale and alternatives

run seems an obviously useful convenience. It should be the usual way to run a synchronous subprocess. The error return from run must be able to contain the exit status and the stderr output; or it might be just an io::Error if things went badly wrong.

Perhaps the io::Error (comms error) should be subsumed into CommandError. Another way to look at that is that it would be hiding the nested Result inside CommandError. The problem is that the accessor methods then all need to be able to report "we failed to actually figure out what the command did". Also, io::Error is not Clone, but CommandError could be if it doesn't contain an io::Error.

My view is that failures within our process, of system calls we are using to do the spawning and IPC, are qualitatively different to failures which occur in the child. A caller who doesn't care about this distinction and has suitable error conversions in scope can just write .run()??.

stderr providing the result of from_*_lossy is needed because the format (*) is not known by the programmer writing portable code. On Unix it is UTF-8. On Windows, I believe it is UTF-16 at least some of the time.

The dichotomy between just_status and get_status is to make it hard to accidentally forget to check for stderr output while checking for an expectected nonzero exit status. Perhaps this is overkill, in which case just status might do.

Prior art

In modern Python 3 the recommended way to run a process is a run function which can capture the stderr (and stdout) output, but does not do so by default.

tcl's exec proc treats stderr output as an error, unless it is explictly dealt with another way.

Nested Result types occur occasionally in Rust code especially when doing complex things.

Unresolved questions

Is it right for run to call stderr output an error by default? I think this is safer, but there is room for reasonable disagreement.

Is it better to have Result<Result<(),_>,_> or to have accessor methods with complex interlocking conditions ?

Is it right to treat stderr as UTF-16 on Windows? Hiding the platform's stderr encoding in the portable API seems right, but we still need to know what the Windows implementation must do to implement .stderr(&self) -> Cow<u8>.

3. Enhance ExitStatus

Motivation

A caller who does not want to capture or check stderr must write un-ergonomic code to handle ExitStatus, calling sucess() and converting the result to an error:

    let status = Command::new(...")
        ...
        .status()?;
    if !status.success() {
        Err(anyhow!("command failed {}", status))
            ?;
    }

Instead we would like to be able to write something shorter, e.g.:

    let status = Command::new(...")
        ...
        .status()?.ok()?;
    }

Proposed change

    impl std::error::Error for ExitStatus { .... }

    impl ExitStatus {
        fn ok(self) -> Result<(), ExitStatus> {
            if self.success() { Ok(()) }
            else { Err(self) }
        }
    }

The new trait impl would probably have to be in the 2021 edition (since it's a very common type and this would perhaps be an inference break) and that really means that the ok method has to as well.

Drawbacks

It is unusual to have an Error which might represent some kind of success code. However, this is appropriate: sometimes a program is expected to give a particular nonzero exit status and unexpectedly exits 0; in which case being able to use the ExitStatus as an error directly is convenient.

Rationale and alternatives

The ok method for converting ExitStatus into a Result is obviously convenient.

If you have determined that things are wrong you can fehler::throw! (or return Err()) an ExitStatus directly.

One obvious alternative to having ExitStatus be an error itself would be to introduce a new ExitStatusError type which would be a newtype for ExitStatus (maybe only for nonzero ones).

Prior art

Many things in Rust already have an ok() method.

I'm not aware of prior art for fn ok(self: Thing) -> Result<(),Thing>.

4. Change the return value of Command::output and abolish Output

Motivation

std::process::Output is an accident waiting to happen. It is a transparent struct which combines desired output with error information in a way that requires programme/reviewer vigilance to see that all the fields are checked.

Worse, the fact that .output() returns a Result gives the impression that errors have already been checked and converted to an appropriate Err.

Actually doing things right is quite hard work.

Today, this is accepted:

    let output = Command::new("ls")
        .arg("/dev/enoent/ls-1")
        .output()?;
    let stdout = String::from_utf8(output.stdout)?;
    eprintln!("got = {}", &stdout);
    // ^ two mistakes:
    //   1. failed to check exit status
    //   2. stderr is discarded

The minimally correct version looks something like this:

    let output = Command::new("ls")
        .arg("/dev/enoent/ls-2")
        .output()?;
    let stderr = String::from_utf8_lossy(&output.stderr);
    if !output.status.success() {
        Err(anyhow!("ls failed {}: {}",
                    output.status,
                    &stderr))
            ?;
    } else if !output.stderr.is_empty() {
        Err(anyhow!("ls exited 0 but produced stderr output: {}",
                    &stderr))
            ?;
    }
    let stdout = String::from_utf8(output.stdout)?;
    eprintln!("got = {}", &stdout);

Proposed change

Abolish std::process::Output and change the return type of .output():

    impl Command {
        /// Like run() but stdout is piped and returned.
        /// Replaces old `.command()` method from 2021 ed. onwards
        pub fn output() -> Result<Result<Vec<u8>, CommandError>, io::Error> { ... }

    // add a field to CommandError:
    struct CommandErrorImpl { status: ExitStatus, stdout: Vec<u8>, stderr: Vec<u8>, }
    impl CommandError {
        pub fn stdout_bytes(&self) -> &[u8] { ... }

Drawbacks

Again, we have the nested Result as for run.

This is not at all forward compatible so must be done in a new edition.

Rationale and alternatives

To be ergonomic to use, and not invite mistakes, .output() must entangle the returned data with the error using Result. If the nested Result is right for run() it is right here too.

The user may want to get the stdout output even if there is an error, so the relevant error variant must contain the stdout. Rather than making a new error type, adding a field to CommandError seems appropriate.

The combination of .status(), .run() and .output() effectively encode, in the choice of which method is called, the decision about whether to buffer and capture stderr and/or stdout. An alternative would be to provide something like stdio::captured(), but that would complicates the API of Child considerably.

With the .run() and .output() API proposed here, it is complex to treat only certain stderr output as an error: one has to examine the CommandError in detail. I think that is more appropriate than the flat approach of the existing process::Output.

Prior art for Command affordances/improvements, in the Rust ecosystem

I searched crates.io for Command and looked for crates providing extra facilities, or replacements for, Command.

I found only two: command-run and execute.

command-run seems to be aimed at some of the problems I discuss here. It too has a run() method which checks the exit status and a different API for collecting output. It does not provide a facility for treating stderr output as an error.

execute seems mostly focused on pipe plumbing and offers very little assistance to help the programmer avoid error halnding bugs.

Future possibilities

This proposal is intended to make the Command API workable, ergonomic, self-contained and yet flexible.

More sophisticated tasks, like setting up process pipelines, are probably best left to the crate ecosystem for the foreseeable future.

github:

https://github.com/ijackson/rust-rfcs/blob/command/text/0000-command-overhaul.md

3 Likes

Editions cannot make breaking changes to the standard library. The same standard library is used by all editions, and a single program may include crates with different editions.

Fortunately, I think all of your changes except #4 are non-breaking changes which could be made at any time.

For change #1, #[must_use] is only a warning by default, so adding it is considered a non-breaking change. For example, it was recently added to Vec::split_off. (Crates that explicitly turn warnings into errors have opted in to potential breakage on toolchain upgrades.)

For change #3, you note that implementing Error for ExitStatus could cause inference errors in existing code. Could you elaborate on this?

For change #4, instead of changing the return type of Command::output, you would need to deprecate the existing method and add a new one with a different name.

13 Likes

Why do you say this? I have written a lot of programs that run subprocesses, and sending subprocess stderr to the same place as parent stderr has been the correct behavior in at least 2/3 of those programs, so it seems like a sensible default to me. In fact, your example of the "right thing"...

... is probably wrong. Stderr output from a tool that exited 0 is probably a progress bar, or other informative chatter intended for the human user of the tool, not error messages.

7 Likes

Other notes: I like the idea of catching as many usage errors as possible at compile time. I am also definitely in favor of making it difficult to ignore unsuccessful exit statuses from child processes. I don't have a strong opinion about exactly how the result of the operation is represented. Finally, have you put any thought into how the API could discourage people from writing code that can deadlock? (For instance, if you pipe output from child to parent, the parent must not call wait until after it has read all the way to EOF on the pipe, or the child might fill up the pipe buffer and get stuck. I don't know how hard it is to make this mistake with the existing API.)

I think the first proposed change makes sense, and doesn't require an edition: let's just mark the relevant methods #[must_use].

The second and fourth changes assume a very specific type of Command usage, where stderr represents an error. That isn't always the case. Sometimes stderr is log output from the underlying command (which should be presented but isn't necessarily a failure). If you're invoking a compiler, for instance, you want to present the compiler warnings but still proceed if the compiler produced the expected output.

As for the third change, the concept of "value-carrying errors" is one that has broader usefulness, and interacts with things like Termination. That's worth solving, but I don't think it's as simple as one Error impl.

But I absolutely think we should make the #[must_use] changes. That seems worth doing as soon as someone sends a PR; I don't think it needs an RFC.

10 Likes

My biggest problem with the API is that it's easy to misuse when you're piping data from the current process to the subprocess or vice versa. These mistakes are usually found when trying it out, but it would be better if the type system enforced that it's used correctly.

I agree with most of this except the impl Error for ExitStatus part. Indeed, if the exit status is a success, it shouldn't be an error.

I find the "sometimes 0 is unexpected" argument to be weak – it is true in other situations as well that sometimes, e.g. some unit tests for handling failure, getting a non-Error value is in fact an error. This does not, however, mean that all types should impl Error.

1 Like

Thanks to everyone for your comments. This has been really helpful.

@mbrubeck thanks particularly to your comments about the stability rules for the stdlib which I wasn't aware of.

It seems everyone is happy with the #[must_use] idea from both a taste/utility point of view, and a stability one. So I have put that on my todo list.

I definitely still think it's worth doing something about Output and providing a Result affordance for Error. @H2CO3, did you prefer the alternative of a separate ExitStatusError type then?

@mrbrubeck I'm not sure why I thought adding a trait impl would be a possible compat or inference break. Something I osmosed somewhere I think. But thinking about it a bit more, it would add trait methods to ExitStatus. If someone had an extension trait for ExitStatus which had methods with the same names as those on Error, I think this would become ambiguous?

Reading through the comments I saw @Aloso and @zackw mentioning deadlock risks. Can you point me to any examples of bugs caused by this? I'm aware of these issues myself but I want to see what mistakes people make :-). And also to see what they were trying to do, which might give me an idea how to provide an API that leads naturally to correct programs.

I was quite surprised to see a majority here thinking inherited stderr is the right default. My view is that treating it as an error is the safer default even if it is correct in less than 50% of cases. But there is definitely ample room for different views here and I'm not inclined to push this particular point against any opposing views. So thanks for the feedback there.

This is considered a “minor breakage” because it can be fixed by annotating the broken code to remove the ambiguity. Such changes are rejected if they are likely to impact a significant amount of existing code. But they are allowed if breakage is sufficiently rare or unlikely in practice.

I would prefer any new designs to be completely type safe. If there is a possibly-zero ExitStatus, that shouldn't impl Error. If there is a statically-know-to-be-non-zero exit code, that could and probably should.

Additionally, there should probably be a conversion from ExitStatus to Result<(), NonZeroExitStatus> with the obvious semantics. It can then be .expect_err()'d in order to cater for the use case you mentioned.

9 Likes

This reminds me that I suggested a redesign of ExitStatus back in 2017: Mini-pre-RFC: Redesigning `process::ExitStatus` It doesn't put a distinction between successful and unsuccessful exit into the type system, but I do like that idea. Perhaps start with

pub enum ExitStatus {
    SuccessfulExit,
    UnsuccessfulExit(NonZeroI32),
    FatalSignal(i32)
}

where the FatalSignal case never gets used on Windows. It's supposed to be possible to pass an arbitrary i32 value through exit and recover it in the parent with waitid, so abstractly we can't pack it down any further, even though I'm not aware of any Unix that actually implements that particular POSIX requirement (see the old thread for gory details).

Agreed.

As another possible option, we could consider allowing ? on the general ExitStatus if there's the potential for it to be successful -- though it's not obvious to me what semantics that should have or type(s) it should produce.

1 Like

There are at least four separate reasons here that I can think of:

  1. Whether the program being invoked only emits stderr output for issues that should be treated as a failure (yet doesn't actually return a failure status)
  2. Whether the default behavior of Command should take such programs into account, at the expense of other cases.
  3. Whether the attached program acts differently if stderr refers to a TTY (emitting useful progress messages, color, etc).
  4. Whether it's reasonable to buffer all of the stderr output.

I think the combination of those factors argues against changing the defaults of Command here.

  1. Programs emitting errors that should be considered fatal should also return a non-zero exit status; if a program emits stderr output but returns a zero exit status, that should indicate something like log output, progress output, or in the worst case perhaps warnings. (And there's a reason -Werror isn't the default; the same arguments apply here, notably the future-incompatibility issues.) It should be possible to trust the exit status of programs.

  2. For Rust's Command to default to "any stderr output is fatal", that would mean Rust was second-guessing the exit status of every program invoked; that would be a substantial overstep for the Rust standard library.

  3. Many programs provide more functionality on stderr if it points to a TTY; this seems particularly true of non-fatal stderr output. For instance, progress bars, or color log output. Redirecting stderr by default would prevent this.

  4. stderr output may be large, and may not be reasonable to buffer in memory. It may be much more than just the classic description of an errno value plus a filename or so, for instance. It might be the warning output from a substantial compilation, or the verbose logging of a network protocol, or similar things that are not proportional to the size of the command line.

16 Likes

Regarding the fire and forget process case, how would people feel about having a fn detach(self) method on the return type from Command::new(...).spawn()?.detach() instead of having to use let _ = ...?

I'd personally love to see a different way of handling piped IO. Currently it uses Option for what could have been statically analyzed.

It has actually bitten me recently when I accidentally tried to move stdout out twice - something that idiomatic code would prevent.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.