The API of `Path::exists()` encourages broken code

Consider a situation in which a user wants to check existence of a file. However, the user doesn't have access to the directory that would contain the file. As a result, permission denied error happens internally but gets silently converted to false.

I don't believe that "false" is the right answer to "does this file exist?" if a program is not allowed to know.

I suggest to replace exists with try_exists, which returns io::Result<bool> and deprecate exists. I suspect an RFC is needed for this, right?

I guess that some people might have meant accessible when writing exists, so perhaps add accessible method that internally calls access on Unixes (IDK if there's something similar on Windows). However there's an interesting difference between UID and EUID, which should be documented.

Other discussion in Add `fs::exists` function by hamza1311 · Pull Request #3375 · tokio-rs/tokio · GitHub

25 Likes

To be honest, probably a lot of code is subtly broken by exists() in all forms, because of time-of-check-time-of-use error.

33 Likes

I would love to see that happen. I don't like that exists squelches other errors, either.

10 Likes

To my surprise, there's no warning about race conditions in the docs but the error behavior is documented well. That should be easy to improve, even without RFC.

One more real-world use case for some kind of exists to exist (:D) is for flag files - sometimes it's useful to store one bit of information by having a file existing or not. I actually use this, but not in Rust code. The main advantage is the flag can be trivially set/cleared by a single command.

I find it troubling that the doc says:

If you want to check errors, call fs::metadata .

Sadly, the most concise correct code to do that that I came up with is this:

use std::io::ErrorKind;
fs::metadata(&path)
    .map(|_| true)
    .or_else(|error| if error.kind() == ErrorKind::NotFound { Ok(false) } else { Err(error) })?

My guess is many people would rather ignore the error than write such a long error-checking code. Especially newbies who don't have enough experience to come up with this concise form.

3 Likes

I've created a PR to improve the documentation of Path::exists. Should be good first step I think.

8 Likes

My personal experience isn't inline with the claim. There are a bunch of calls to Path::exists in rust-analyzer codebase:

I've briefly looked at them, and they all seem correct to me? Any API other than Path::exists would make these calls clunkier. I might be missing something of course -- all those occurrences are pretty shallow, so, if you can spot a bug, please send a PR with improvements :slight_smile:

The usages fall into thee categories:

Probing: try several different paths to find which one exists (eg, discover Cargo.toml in one of the parent directories). Returning an IO error here would result a user-visible bug, as the paths after the erroring one won't be probed at all.

Freshness Check: the application itself creates a path, and uses .exists to skip the work. Bailing with an io::Error out of .exists in this case wouldn't be outright wrong, but would suffer from TOCTOU. It's better to fail when you try to do a specific thing after .exists returned false, rather than during a pre-flight check.

Sanity Check: the path should exist, and, if it doesn't, the app bails. Here, there's no meaningful difference between io::Error and the thing not actually existing.

7 Likes

I think the standard library really needs

impl File {
    fn open_optional(&Path) -> Result<Option<File>> { ... }
}

to encourage people to do the right thing by default.

(Most of my Rust code is Linux specific so I use fd-relative paths a lot and use my crate's equivalent helper openat_ext::OpenatDirExt - Rust )

I've been meaning to write up a longer blog about this but IMO it's almost always a better pattern to use an "outer result" for "fatal errors" instead of having API users inspect error types.

If we added this API then we'd need to update this section of the book to use a different example. Personally I much prefer the pattern of having an "outer Result<>" for errors which should always be fatal, and using an inner enum/result (or Option<> like above) for cases we reasonably expect to happen, like a nonexistent file.

Thanks for providing some real-life examples!

Probing: does it actually make sense to continue probing if the directory is inaccessible? This could mean messed up FS (permissions or whatever), so continuing may lead to confusing/invalid results. I believe it'd make sense to at least print a warning if such a problem occurs if you want to continue.

Freshness Check: yes, this is the case I mentioned - flag file. Not sure what exactly is being checked there (shouldn't it check timestamps too?). I assume it checks if the file was already created. I believe in this case if it's impossible to access the directory the creation is likely to fail as well so continuing might be waste of time. Also perhaps printing it for debugging reasons is valuable.

Sanity Check: printing io::Error instead of "does not exist" makes debugging easier. Otherwise the message can be incorrect and confuse the user. I was myself victim of such sloppy error reporting and wasted some time on figuring out a problem not long ago (not Rust app though). I'd prefer applications to report these errors more clearly.

I'm willing to look into sending a PR if you agree the changes would improve user experience.

1 Like

That's an interesting idea although I'm not convinced it should be in std yet. It'd be just convenience function for some kinds of errors which can be implemented more generally:

impl io::Result<()> {
    fn ok_if_not_found(self) -> Self { ... }
    fn is_not_found(self) -> io::Result<bool> { ... }
    fn ok_if_exists(self) -> Self { ... }
    fn is_exists(self) -> io::Result<bool> { ... }
}

This can be composed nicely and is not restricted to File. It also enables this cool pattern:

fn generate_file() -> io::Result<()> {
    let mut file = File::create_new(self)?;
    file.write_all("foo")?;
    Ok(())
}

fn generate_file_if_not_exists() -> io::Result<()> {
    generate_file().ok_if_exists()
}

(Yes, I know this is not perfect example but I actually had to use something similar before.)

2 Likes

it actually make sense to continue probing if the directory is inaccessible?

I do think so.

Do you believe that reporting that there's something suspicious about the FS is valuable enough to justify a bit more complex code that'd report it?

I am not sure, my gut feeling is probably no -- I don't remember getting any bug reports about this. More generally, I don't like trying to anticipate some specific edge-case error (like fs being inaccessible), as there are a lot of edge cases, and you will hit the one you didn't consider.

My preferred error handling strategy for rust-analyzer is to be transparent about what the program is doing (eg, printing the command before executing it), such that the user can retry the steps manually and see the error themselves.

That's interesting because from my experience poor error reporting is probably number one reason for lot of time being spent on debugging. Retrying manually often doesn't work because the environment user uses for retrying is often slightly different. (You can't imagine what kind of frustration and hair pulling I experienced with exactly this when apparmor was silently involved.)

But maybe this doesn't apply to kinds of applications you usually develop.

That many dedicated methods for this usage would strongly suggest that the return value could be its own type. Combinations like writer.flush().is_exists() make no sense.

That's an inherent limitation of io::Error, which comes from a syscall which could in theory return any error code, even if unlikely. I don't think making unexpected error codes UB would be a good idea.

I'm not a fan of io::Error, it's just that no error type can solve this problem.

It can very well make sense. Imagine a FUSE filesystem that opens a backing file every time an operation is done. If the backing file is removed, the write syscall should likely return EEXIST. Or a bit more sensible: What if a FUSE filesystem uses HTTP to perform operations. HTTP doesn't have a way to keep a file open, so every write would likely be a separate PUT request (ignoring buffering). If the response is a 404, the write should likely return EEXIST.

5 Likes

A couple years back, Node.js had a similar set of discussions, where fs.exists() was largely deprecated in favor of either fs.access(), or just error-checking the actual file system call when you make the call.

Node.js did keep fs.existsSync(), mostly for CLI scripts that want to check for e.g. config files.

4 Likes

Do you go on to open Cargo.toml in order to read it? What happens if the file disappears between the exists call and when you open it? Why not combine the existence check and the open?

As for handling the error… stopping the probe might not be the right thing to do, but is silently ignoring it the right thing to do? If the error is "Permission denied", then sure, if you're iterating over parent directories, it probably just means you've gone too far. But what about "Input/output error" (as caused by e.g. disk faults)? "Cannot allocate memory" (meaning the kernel is out of memory)? "Bad address" (meaning you somehow passed an invalid pointer)? In most of those cases it would make sense to at least log the error so the user is aware of it. You can't do that if you don't have the error.

Here too - does something else in the application open the path?

And there is a meaningful difference between io::Error and the thing not existing, when it comes to error messages. I assume that when the app bails, you log some sort of message. If the message says the file doesn't exist, but the real error is "Permission denied", you've misled the user. You could avoid that by changing your message to something vague like "failed to access", but then it's unnecessarily confusing in the common case where the file just doesn't exist. This problem would be avoided if the message included the specific OS error that was received.

9 Likes

There several different probing schemes in different places, so the best way to figure this out would be to look at every specific case. For Cargo.tomls, we are calling an external process (cargo metadata), we do not read them ourselves.

More generally, rust-analyzer happens to be a long-lived process, so point-in-time APIs are not suitable. Sadly, to get a watching API with guarantees you need something like watchman.

In most of those cases it would make sense to at least log the error so the user is aware of it. You can't do that if you don't have the error.

I do think that selectively propagating error or continuing based on the ErrorKind would create more problems. I agree that logging errors is a good idea. But that is true of all fs errors, so it probably makes sense to create an application specific app::fs module which just wraps the whole of std::fs with logging? We should do this for rust-analyzers main code (we already effectively have such wrapper for internal build system) ... This reasoning about app-specific wrappers around fs make me even more sympathetic towards @sunfishcode 's proposal to deprecate IOing Path methods.

Here too - does something else in the application open the path?

Again, this needs looking into every specific example. One that I remember is that we by default skip slow tests. However, we run them on CI, and we also have a dedicated test, which runs only on CI, which verifies that slow tests were not skipped. It works by checking if a special cookie file, created by slow tests, .exists(). Error message is irrelevant in this case.

I concede there's no need for proper error handling here, since (a) you're in a controlled environment where any error other than "No such file or directory" is extremely unlikely, and (b) any such error would trigger a test failure that could be followed up by manual investigation.

So this can be counted as an example where deprecating exists() would cause needless churn in existing code.

But imagining this were new code being written, I'd argue that try_exists().unwrap() would be very slightly preferable to exists() in this situation. That way, if an error somehow did come up, you'd see it in the CI logs rather than having to reproduce it.

Also, it's easy to imagine a similar test where you want to verify that some path doesn't exist. In that case, with exists(), an error would trigger a spurious test success. Still not remotely a big deal, since errors are unlikely, but try_exists().unwrap() would make for one less thing to worry about.

4 Likes