Inherited C-badness in std::io::Error

I often feel we have inherited some of the C errno problems with std::io::Error, specifically when it comes to the lack of sensible arguments to the operation that failed.

My latest example of this is that I got a

Error: Os { code: 2, kind: NotFound, message: "No such file or directory" }

out of a Command::new(...).current_dir(...).arg(...) call of some external command.

The error does not indicate which name was not found or which operation was attempted on it (e.g. many C errors of this type get confusing when you lack permissions on one of the parent directories), and in general it is very unhelpful compared to our usual Rust error reporting standards.

In this specific case it turned out that I had forgotten to create a sub-directory of the temporary directory I wanted to use as current_dir, but I first spent quite some time looking for problems with how the Command call was finding the program to run.
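For illustration, the ambiguity is easy to reproduce with a short sketch (the directory name is made up): the same bare NotFound comes back whether the program or the working directory is missing.

```rust
use std::process::Command;

fn main() {
    // Hypothetical: a current_dir that does not exist. On Linux the
    // failed chdir in the child surfaces as the same bare ENOENT a
    // missing executable would produce.
    let err = Command::new("ls")
        .current_dir("/tmp/this-dir-does-not-exist")
        .arg("-l")
        .spawn()
        .unwrap_err();

    // The debug output names neither the program nor the directory.
    println!("{err:?}");
    assert_eq!(err.kind(), std::io::ErrorKind::NotFound);
}
```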

This does not just affect "No such file or directory" but quite a few other strerror(errno)-style error messages as well.

Are there any plans to improve this situation in the standard library?

What was the reason to use std::io::Error all over the place there in the first place instead of more specific errors for specific operations?

6 Likes

Adding this would require an allocation we don't want to force on everyone.

If libc doesn't distinguish between the file itself and the parent directory not being found, then there is no race-free way for Rust to return different errors either. And the racy way requires extra syscalls that would slow things down a bit.

There is nothing stopping you from writing wrappers that provide more helpful error messages at the cost of allocations and race conditions.
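As a sketch of such a wrapper (the function name is invented), one can attach the path at the call site, paying for an allocation only on the error path:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical wrapper: reads a file, attaching the path to any error.
/// The allocation for the message happens only when an error occurs.
fn read_with_context(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("failed to read {}: {}", path.display(), e),
        )
    })
}

fn main() {
    let err = read_with_context(Path::new("/no/such/file")).unwrap_err();
    // The kind is preserved, but the message now names the path.
    assert_eq!(err.kind(), io::ErrorKind::NotFound);
    assert!(err.to_string().contains("/no/such/file"));
    println!("{err}");
}
```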

The OS can return any kind of error for any operation. For example, it can return a network-unreachable error when accessing a file (networked filesystem), and thanks to FUSE, literally any error code can be returned by filesystem operations or process spawning.

12 Likes

Where does that allocation even matter? Errors are only created in exceptional circumstances.

Right now that lack of memory allocation comes at the cost of a not insignificant time allocation every time I encounter an error like this.

Across all languages that use this format of parameter-less errors ("Permission denied", "No such file or directory", "Operation not permitted", ...), I have probably wasted weeks of my life over the last two decades as a sysadmin trying to track down information that was readily available as local variables at the point the error was thrown, or in one of the stack frames that just passed it on. That situation is just frustrating.

Worse, that information cannot even be recovered once you only have the error: its information content is effectively diluted as soon as it passes through a stack frame that performs two or more operations that could return the same error.

1 Like

No, not at all. If you have a bot scanning your application with various nonsense requests, then the majority of those requests will likely result in errors. If that includes file serving or hitting other internal services, those in turn may return errors too.

And even simple things like spawning a process can have more erroring syscalls than succeeding ones. E.g. when scanning PATH for an executable it may just call exec with all possible paths until one succeeds.

Across all languages that use this format of parameter-less errors like "Permission denied", "No such file or directory", "Operation not permitted",... I have probably wasted weeks worth of my life in the last two decades as a sysadmin on trying to track down information that was readily available as local variables

strace often is a faster way to get at that than modifying an application. But if modifying the application is an option then error handling crates like anyhow's .context() can be used to decorate low-level errors.
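For cases where pulling in anyhow is not wanted, the same pattern can be sketched with std alone (the trait name is invented): an extension trait that rewraps the io::Error with a caller-supplied message.

```rust
use std::fmt::Display;
use std::io;

/// Minimal std-only sketch of the pattern anyhow's `.context()` provides:
/// an extension trait that wraps an io::Error with an extra message.
trait IoContext<T> {
    fn context(self, msg: impl Display) -> io::Result<T>;
}

impl<T> IoContext<T> for io::Result<T> {
    fn context(self, msg: impl Display) -> io::Result<T> {
        // Allocate a richer message only on the error path.
        self.map_err(|e| io::Error::new(e.kind(), format!("{msg}: {e}")))
    }
}

fn main() {
    let err = std::fs::read("/no/such/file")
        .context("while loading the config")
        .unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::NotFound);
    assert!(err.to_string().contains("while loading the config"));
    println!("{err}");
}
```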

The standard library only provides a thin abstraction over OS primitives, which means it tries to avoid imposing additional costs that may not be wanted. Opting out of such things is generally much harder than opting in.

10 Likes

Opting in is actually much harder when the majority of applications you run are not written by you. Out of the 1000 packages installed on a typical Linux system, even if I am one of the most productive people I might have written 10 of them, so defaults matter a lot.

That argument doesn't fly: how is it any easier to opt out of the cost of those errors in applications not written by you?

The difficulty only applies to writing the code, not to using it.

2 Likes

A sensible default would be an application that gives proper errors with all the details needed for debugging them; if someone then uses some optimization feature flag or compiler flag, the error gets stripped down to what we have now, for people who want the extra 1% of performance that provides. It is always much easier to automatically throw away information that is there than to retroactively recover information that wasn't included.

1 Like

No: first someone has to add the functionality to collect and retain more details, then someone has to add additional machinery to conditionally remove it. Those are two steps that each add complexity on top of the baseline.

And throwing away information once it has been gathered does not recover the cost of gathering it.

As I wrote earlier, the Rust standard library provides thin abstractions over OS interfaces, and the OS interfaces provide lightweight errors that do not contain the information you want. Doing extra work to enrich those errors is a one-way choice, one the standard library does not make for people, so that Rust remains a low-overhead systems programming language.

But library and application developers can still choose to do that enrichment if they consider the ergonomics more important than the performance hit. std does not have that context and cannot make that decision on a call-by-call basis.

Maybe we could do something to make it easier to opt in, but the range of possible API designs is huge, and there are already a handful of opinionated error handling crates in the ecosystem. Some even capture stack traces on each error, which is usually even more expensive than just allocating an error string. People would definitely complain if every failed syscall incurred stack-trace costs.

3 Likes

Not quite. It has been tried before; there is a proof-of-concept PR here:

I think it just requires someone motivated to convince others of the case for it (the hardest part imo) and resurrect Mara's work.

1 Like

/// A heap-allocated C-string for doing syscalls with.

Since then, std has gained a way to avoid allocating for short paths by using a stack buffer, so that would add back an allocation.

2 Likes

Huh, TIL, do you have a link to that change? I think Mara's work could still apply but depending on the size of that stack buffer (hopefully not large enough that I need to worry about stack probes?) it may not be useful in many cases.

2 Likes

A trick was used there, and that trick is no longer possible with the newer optimizations that have been added.

Adding this would require an allocation we don't want to force on everyone.

And by not adding this we force everyone to have very poor error quality. And no, crates like fs-err are not an answer, because you can't propagate that down your dependency chain. What you could propagate, however, is an io::Error with a generic parameter for extra human-readable data, which could default to () for the current behaviour.
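A rough sketch of what such a generic might look like (names invented, not an actual std proposal): the default () parameter keeps today's allocation-free behaviour, while callers can opt in to richer context.

```rust
use std::io::ErrorKind;

/// Hypothetical error type generic over extra context, defaulting to ()
/// so the zero-cost case stays equivalent to today's io::Error.
#[derive(Debug)]
struct IoError<C = ()> {
    kind: ErrorKind,
    context: C,
}

impl IoError<()> {
    fn new(kind: ErrorKind) -> Self {
        IoError { kind, context: () }
    }
}

impl<C> IoError<C> {
    /// Attach richer context, changing the type parameter.
    fn with_context<D>(self, context: D) -> IoError<D> {
        IoError { kind: self.kind, context }
    }
}

fn main() {
    // Default: no extra data, no allocation.
    let plain = IoError::new(ErrorKind::NotFound);
    // Opted in: carries the path that failed.
    let rich = plain.with_context(std::path::PathBuf::from("/tmp/missing"));
    assert_eq!(rich.kind, ErrorKind::NotFound);
    assert_eq!(rich.context, std::path::PathBuf::from("/tmp/missing"));
}
```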

If only the stdlib could have feature flags and know which sub-crate (e.g. alloc) is active (an oversimplified view, I know). [1]


  1. Though that might still result in people wanting those errors for parts of their requests and not having them for others. ↩︎

That doesn't help at all. This is not about allocations being impossible (std::io::Error is only available when allocations are possible anyway), but about not wanting the performance overhead of doing the allocation.

1 Like

I did not mean using the cfg(feature = "alloc") equivalent here (though there recently was a discussion about moving most/parts of std::io to core or alloc, which is why I added the "e.g. alloc").

I was thinking more about custom feature flags like cfg(feature = "io_verbose_errors"), which would help with the "not everyone wants that" problem (at least as long as it's the same for the entire binary).

I think what is being proposed is feature flags for the stdlib, as an extension of the build-std feature (the downside being that you, uh, need to build std).

Then we could have std = { features = ["alloc-fs-err"] } or something like that, where applications could opt in to better errors across the whole program at the expense of an allocation. Libraries wouldn't need to be changed, so this wouldn't generate churn; you would enable the feature only in your binary crates.

6 Likes

In theory you don't need feature flags. Maybe it is possible to pass a generic parameter to indicate which error type the user wants to use. By default it would be std::io::Error, constructed by passing it a &Path; std::io::Error would do nothing with this parameter, but if the user replaces the generic error parameter with something else, that something else can allocate a PathBuf and copy the Path into it.

Can you show an example of what you mean? Rust doesn't have default type parameters in function signatures, so the only workarounds I can think of would be breaking changes.