Mini-pre-RFC: Redesigning `process::ExitStatus`


#1

The RFC for ? in main has gotten hung up on design issues related to process::ExitStatus that I think need to be pulled out to their own discussion.

For context, process::ExitStatus is currently only used (in the stdlib) to report a subprocess’s exit status in a Rust parent process. The ?-in-main RFC tries to reuse it as a type you can return from main if you want to report failure to your parent without calling process::exit or printing any error messages. (I did it this way because I think we shouldn’t allow people to return a bare i32 from main. I think that will make certain types of beginner mistakes, like getting confused about when you should have a ; on the last expression in your function, harder to catch.)

The problem with this idea is that ExitStatus as-is has no constructors intended for public consumption, because right now the only code that has any reason to create ExitStatus values is the guts of process::Child::wait. Designing those constructors is where we run into a problem, because boy howdy are there a lot of portability gotchas relating to process exit statuses:

  • The C standard says that main returns an int but only defines the effect of three possible return values: 0, EXIT_SUCCESS, and EXIT_FAILURE. The first two report “successful termination” to “the host environment”, and the third reports “unsuccessful termination”. All other values have an implementation-defined effect. (The same behavior is defined for the exit library function.)

  • Unix divides exit statuses into two classes: “exited”, which means that the subprocess called the _exit primitive, and “signaled”, which means that the subprocess received a fatal signal. This is a hard distinction; it is not possible to generate a “signaled” exit status by passing any value to _exit. (Technically there are two other classes of exit status, but only debuggers and shells need to care about them.)

  • It is supposed to be possible (according to POSIX.1-2008) to pass an arbitrary i32 quantity through _exit to waitid, but none of the Unixes I can conveniently test (Linux, OSX, FreeBSD, and NetBSD) implements this. They either don’t have waitid at all, or only the low 8 or 24 bits of the exit status survive. The much more commonly used waitpid interface can only report the low 8 bits of the exit status.

  • The Bourne shell further confuses the issue by mapping “signal” statuses onto 128 + signal number (i.e. if you observe the value 139 in $?, that could mean either exit code 139 or signal 11; there’s no way to tell). For this reason, good practice on Unix is to avoid using exit statuses above 127.

  • On Windows, unusually, things are simpler. Any DWORD quantity (except 259, which is reserved to mean “that process is still running”) will pass unmolested through ExitProcess to GetExitCodeProcess. There is no such thing as a signal exit status; the catastrophic failure conditions that produce signal exit statuses on Unix instead cause GetExitCodeProcess to return an appropriate NTSTATUS code (e.g. 0xC000_0005 (STATUS_ACCESS_VIOLATION) is more-or-less equivalent to a reported SIGSEGV on Unix). The only gotchas are that DWORD is u32 (whereas process::exit takes an i32), and that there are hundreds of NTSTATUS codes scattered all over the number space, and (as far as I know) it’s not documented which ones the system might generate in response to a catastrophic failure condition.
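To make the Unix gotchas above concrete, here is a minimal sketch of the 8-bit truncation and the shell's 128 + signal mapping. Both function names are made up for illustration, and the truncation models only the waitpid behavior described above, not any particular kernel.

```rust
/// Models what a parent using waitpid typically observes after a child
/// calls _exit(code): only the low 8 bits survive.
/// (Hypothetical helper, not a libstd or libc API.)
fn code_seen_via_waitpid(code: i32) -> i32 {
    code & 0xff
}

/// Models the Bourne shell convention for `$?` when a child was killed
/// by a signal. (Hypothetical helper.)
fn shell_status_for_signal(signal: i32) -> i32 {
    128 + signal
}
```

With these, code_seen_via_waitpid(139) and shell_status_for_signal(11) are the same number, which is exactly the ambiguity that makes exit codes above 127 a bad idea on Unix.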

Now, in the ?-in-main RFC we want to make it easy for programs to signal generic success and failure by returning ExitStatus values from main, and that can be handled with zero-argument ExitStatus constructors corresponding to C’s EXIT_SUCCESS and EXIT_FAILURE. There is a minor naming problem because ExitStatus::success() is already taken for the “is this a successful exit status?” predicate, but that’s not the important design problem I want to talk about.

The important design problem is that we also want a constructor that takes an arbitrary i32 and guarantees to pass that along to process::exit, and the existing ExitStatusExt::from_raw (which is the only documented constructor for ExitStatus at the moment) does not do that job. Well, on Windows it does, because the internal representation for ExitStatus on Windows is just the DWORD returned by GetExitCodeProcess, which (except for the value 259) is the same as “a value you can pass to ExitProcess”. But on Unix, the internal representation of an ExitStatus is the status reported by waitpid, which can encode both “exited” and “signaled” statuses. Only “exited” statuses can be converted into a value that can be supplied to _exit, and let s = ExitStatusExt::from_raw(2) produces a signaled status on most Unixes. (The exact encoding of a waitpid status is unspecified in POSIX, but the convention used by both Linux and the BSDs is that an “exited” status has the low byte all-bits-zero and the exit code in bits 8 through 15.)
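As a rough illustration of why from_raw(2) comes out “signaled”, here is a sketch of a decoder for the Linux/BSD convention. It assumes the usual layout (termination signal in the low 7 bits, exit code in bits 8 through 15); POSIX does not promise this, and the type and function names are invented for the example.

```rust
#[derive(Debug, PartialEq)]
enum Decoded {
    Exited(i32),
    Signaled(i32),
    /// Stopped/continued statuses, which only debuggers and shells care about.
    Other,
}

/// Decodes a raw wait status, ASSUMING the conventional Linux/BSD layout.
fn decode_wait_status(status: i32) -> Decoded {
    let low7 = status & 0x7f;
    if low7 == 0 {
        // Low bits zero: the child called _exit; the code is in bits 8..=15.
        Decoded::Exited((status >> 8) & 0xff)
    } else if low7 != 0x7f {
        // Nonzero low bits (other than the "stopped" marker): a signal number.
        Decoded::Signaled(low7)
    } else {
        Decoded::Other
    }
}

/// Builds the raw status that would correspond to _exit(code) under the
/// same assumed layout.
fn encode_exited(code: i32) -> i32 {
    (code & 0xff) << 8
}
```

Under that assumption, decode_wait_status(2) is a signaled status, whereas an “exit code 2” raw status has to be built as encode_exited(2), i.e. 0x200.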


Given all of that, if backward compatibility were not a concern, I think ExitStatus ought to look like this:

pub enum ExitStatus {
    Exited(i32),
    Signaled(i32),
}
impl ExitStatus {
    fn success() -> Self { ExitStatus::Exited(libc::EXIT_SUCCESS) }
    fn failure() -> Self { ExitStatus::Exited(libc::EXIT_FAILURE) }
    fn from_code(code: i32) -> Self { ExitStatus::Exited(code) }
    fn from_signal(sig: i32) -> Self { ExitStatus::Signaled(sig) }
    fn is_successful(&self) -> bool {
        if let ExitStatus::Exited(n) = *self { n == 0 }
        else                                 { false  }
    }
    // can produce either Exited() or Signaled(); OS-specific implementation
    // (body omitted in this sketch)
    fn from_wait_status(status: i32) -> Self { unimplemented!() }
}

The code and signal methods have been removed; if you want more detail than is_successful, you match on Exited versus Signaled, which is better ergonomics than calling code and then signal and having to know that it’s impossible for both of them to return None for the same input. Signaled statuses will never arise from subprocesses on Windows, but portable code needs to consider both anyway.
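For illustration, caller code under that design might look something like the sketch below. The enum is redeclared locally so the snippet stands alone, and describe is just a made-up helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitStatus {
    Exited(i32),
    Signaled(i32),
}

/// Turns a status into a human-readable message by matching on the variants.
/// There is no "both None" case to reason about, unlike code()/signal().
fn describe(status: ExitStatus) -> String {
    match status {
        ExitStatus::Exited(0) => "succeeded".to_string(),
        ExitStatus::Exited(code) => format!("failed with exit code {}", code),
        ExitStatus::Signaled(sig) => format!("killed by signal {}", sig),
    }
}
```

The compiler's exhaustiveness check is what forces portable code to consider the Signaled case even on platforms where it never occurs.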

One way to make this backward compatible would be to make the enum an internal data carrier, with no methods, but still exposed, and give some things different names:

#[derive(Clone, Copy)]
pub enum DecodedExitStatus { Exited(i32), Signaled(i32) }
pub struct ExitStatus { inner: DecodedExitStatus }
use DecodedExitStatus::{Exited, Signaled};
impl ExitStatus {
    // strawman names using the Result convention
    fn ok() -> Self { ExitStatus { inner: Exited(libc::EXIT_SUCCESS) } }
    fn err() -> Self { ExitStatus { inner: Exited(libc::EXIT_FAILURE) } }
    fn from_code(code: i32) -> Self { ExitStatus { inner: Exited(code) } }
    fn from_signal(sig: i32) -> Self { ExitStatus { inner: Signaled(sig) } }
    fn success(&self) -> bool {
        if let Exited(n) = self.inner { n == 0 }
        else                          { false  }
    }
    fn decode(&self) -> DecodedExitStatus { self.inner }
    fn code(&self) -> Option<i32> {
        match self.inner { Exited(n) => Some(n), _ => None }
    }
    fn signal(&self) -> Option<i32> {
        match self.inner { Signaled(n) => Some(n), _ => None }
    }
    // can produce either Exited() or Signaled(); OS-specific implementation
    // (body omitted in this sketch)
    fn from_raw(status: i32) -> Self { unimplemented!() }
}

That’s not perfect but it seems acceptable to me. What do y’all think?

None of this addresses the question of what to do if a “signaled” ExitStatus is returned from main, but there are some plausible options (reraise the signal, for instance). If we try to define a type that can only represent “exited” statuses we immediately run into another argument, over whether that should be able to represent any i32 or just the ones that will pass unmolested through the APIs on the current platform, and since we can’t represent that restriction in the type system, does from_code now return an Option, and what is application code expected to do if it gets None… Better to avoid, I think.


#2

I don’t have any strong opinions on the details, but given that I’d really like ? in main, I wanted to say thanks for sticking with this even as the rabbit hole goes deeper and deeper.


#3

I’m half braindead right now, so I didn’t read this whole thing but…

impl Into<ExitStatus> for ()

could address any backwards compatibility concerns, no?

Makes me want scoped trait implementations though… ugh.


#4

> impl Into<ExitStatus> for ()

I’m sorry, I don’t understand how that would help. At all. I don’t even fully understand what it does, but to the extent I do understand, its effects seem to be unrelated to any of the problems I was talking about.


#5

Note that on Windows having an exit code of 259 is perfectly fine and legal and Windows will happily pass it on through. It just makes calling GetExitCodeProcess ambiguous but only if you don’t know whether the process has terminated. Fortunately it is easy to tell whether a process has terminated by just waiting on a handle to it, which is what libstd does when waiting on a child process.

As for the process returning NTSTATUS codes, it’s not that the process exits because of a catastrophic failure condition but because of an exception which was merely unhandled (although there are exceptions which cannot be handled) and the system causes the process to exit with the exception’s integer code. It’s fairly well defined what exception is raised by a given illegal operation. Because NTSTATUS codes for failure always have their two highest bits set, it is fairly easy to avoid exiting with a value that could potentially conflict.
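A quick sketch of that “two highest bits” check, in case it helps. The function name is made up; the only assumption is the NTSTATUS layout where the top two bits encode severity and 0b11 means error.

```rust
/// Returns true if `code` has both of its two highest bits set, i.e. it
/// falls in the range used by NTSTATUS error codes such as
/// STATUS_ACCESS_VIOLATION (0xC000_0005). (Hypothetical helper.)
fn has_ntstatus_error_severity(code: u32) -> bool {
    code >> 30 == 0b11
}
```

Small positive exit codes never match, but note that an i32 of -1 passed to process::exit becomes 0xFFFF_FFFF as a DWORD, which does land in that range.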


#6

Good to know, thanks. MSDN is not terribly clear on this point.

I was simplifying a bit; I was aware that Windows reacts to CPU-level illegal operations (memory protection violation, invalid instruction, etc) by firing SEH exceptions, and that you get the exception’s code as the process’s exit code when the exception is unhandled. What is still unclear to me, though, is whether this can happen with any documented NTSTATUS code (or, in fact, any DWORD value; one could in principle call RtlDispatchException directly, with an arbitrary exception code?) … and to what extent Rust should bother to care. My current thinking is that on Windows it makes most sense not to try to “decode” exit codes – always produce Exited(n) with whatever we got, never Signaled(n).


#7

There’s certainly no general way to know what caused a specific exit code. You can raise an exception with any code you want, and you can also exit with any code you want.


#8

Fully agreed on that.

But calling std::process::exit(2) is shorter than even naming the std::process::ExitStatus type. I’m not sure why anybody would want to use it.

Can you elaborate on use-cases for that?


#9

If it’s being used for something as common as returning from main, wouldn’t it make sense to add to the prelude?


#10

ExitStatus would only be used in a subset of binary projects, and even there probably only in one file, so I don’t think it belongs in the prelude.

[details=Long-winded text you should probably not bother reading.]There are two dimensions to “common”, though. The nomenclature I first saw:

  • Prevalence: What fraction use it at all?
  • Frequency: Of those that do use it, how many times is it used?

Certainly most binaries return from main, so there’s a reasonable argument for prevalence. (Though even there, libs don’t, and many main functions will be fine returning () or Result<(), _>, and thus not mentioning ExitStatus, so my guess is that overall prevalence would actually still be fairly low.)

But I think for things in the prelude frequency is particularly important. Things like None or .into() are used many, many times. ExitStatus, however, plausibly has a frequency of only two: one mention in the signature, one mention for ExitStatus::from_code.[/details]


#11

I’m tempted to avoid derailing this discussion more, but all I’m saying is that today main expects a () as a return type, right? So something that converted from () into an ExitStatus would leave old programs correct, while allowing for a new return type of main.

I guess what I’ve missed is that there would need to be an implicit .into call.
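Roughly what I have in mind, as a sketch only. The ExitStatus here is a local stand-in rather than the real std type, and status_of_main is a made-up name for whatever the runtime would call on main's return value.

```rust
/// Local stand-in for illustration; NOT std::process::ExitStatus.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ExitStatus(i32);

/// Implementing From<()> also provides Into<ExitStatus> for ().
impl From<()> for ExitStatus {
    fn from(_: ()) -> Self {
        ExitStatus(0)
    }
}

/// Hypothetical hook: the runtime would do the conversion that user code
/// never writes, so a main returning () keeps meaning "success".
fn status_of_main<T: Into<ExitStatus>>(ret: T) -> ExitStatus {
    ret.into()
}
```

Old programs whose main returns () would map to status 0, and new ones could return an ExitStatus directly.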


#12

Question: Did this discussion reach any form of consensus? (I’ve not had time to read it through, though the last comment by @nixpulvis uses the word “derail”, so that’s not promising.)


#13

> Question: Did this discussion reach any form of consensus?

It didn’t really make progress in any direction, but I didn’t see any strong objections to the narrowly-scoped process::ExitStatus change proposal, either. I reread the thread just now, and everything that came up seemed to be more properly about ? in main. (I’m not clear on where that discussion is, but it’s not relevant right now.)

FYI, I am locked out of my Github and irlo accounts until next week due to a 2FA botch plus travel, and because of that I’m not reading either regularly and won’t be doing any serious Rust-related work; e.g. if you wanted me to write up a formal RFC for the changes to process::ExitStatus, I couldn’t do that until this coming Wednesday at the earliest.


#14

… correction, there is one thing: @kornel said

> calling std::process::exit(2) is shorter than even naming the std::process::ExitStatus type. I’m not sure why anybody would want to use it.

One reason is you may want to make sure that all the destructors get run. Rust doesn’t have any equivalent of atexit as far as I know (which is arguably a good thing) and process::exit is not even guaranteed to clean up the stdlib correctly – the documentation makes it sound like stdout might not get flushed, for instance. (Maybe that’s a mistake in the documentation? I dunno.)
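To sketch the destructor point: one common workaround today is to do all the work in a helper that returns the code, so every local is dropped before the process actually exits. The names below (TempResource, run, CLEANED_UP) are made up for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Records whether the destructor below has run.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// Stand-in for anything with meaningful cleanup (temp file, lock, ...).
struct TempResource;

impl Drop for TempResource {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

/// Does the real work and returns the desired exit code. `_guard` is
/// dropped when this function returns, before any exit call is made.
/// Calling std::process::exit(2) in here instead would skip that Drop.
fn run() -> i32 {
    let _guard = TempResource;
    2
}
```

main then shrinks to calling std::process::exit(run()), which is precisely the boilerplate that returning a status from main would make unnecessary.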

zw


#15

If anyone is still interested in this, there’s a placeholder ExitCode struct that could use some love. (I tossed it together in place of i32 so it could be feature-gated separately from ?-in-main, but it has plenty of opportunities for improvement, and might not even need to exist if ExitStatus can pick up the task.)


#16

For the record, I still care about this but I don’t see myself having any time to work on it till September at the earliest. If someone else picks up the ball I would be happy to kibitz.