The API of `Path::exists()` encourages broken code

Consider a situation in which a user wants to check existence of a file. However, the user doesn't have access to the directory that would contain the file. As a result, permission denied error happens internally but gets silently converted to false.

I don't believe that "false" is the right answer to "does this file exist?" if a program is not allowed to know.

I suggest to replace exists with try_exists, which returns io::Result<bool> and deprecate exists. I suspect an RFC is needed for this, right?

I guess that some people might have meant accessible when writing exists, so perhaps add accessible method that internally calls access on Unixes (IDK if there's something similar on Windows). However there's an interesting difference between UID and EUID, which should be documented.

Other discussion in Add `fs::exists` function by hamza1311 · Pull Request #3375 · tokio-rs/tokio · GitHub

13 Likes

To be honest, probably a lot of code is subtly broken by exists() in all forms, because of time-of-check-time-of-use error.

17 Likes

I would love to see that happen. I don't like that exists squelches other errors, either.

6 Likes

To my surprise, there's no warning about race conditions in the docs but the error behavior is documented well. That should be easy to improve, even without RFC.

One more real-world use case for some kind of exists to exist (:D) is for flag files - sometimes it's useful to store one bit of information by having a file existing or not. I actually use this, but not in Rust code. The main advantage is the flag can be trivially set/cleared by a single command.

I find it troubling that the doc says:

If you want to check errors, call fs::metadata .

Sadly, the most concise correct code to do that that I came up with is this:

use std::io::ErrorKind;
fs::metadata(&path)
    .map(|_| true)
    .or_else(|error| if error.kind() == ErrorKind::NotFound { Ok(false) } else { Err(error) })?

My guess is many people would rather ignore the error than write such a long error-checking code. Especially newbies who don't have enough experience to come up with this concise form.

3 Likes

I've created a PR to improve the documentation of Path::exists. Should be good first step I think.

4 Likes

My personal experience isn't inline with the claim. There are a bunch of calls to Path::exists in rust-analyzer codebase:

I've briefly looked at them, and they all seem correct to me? Any API other than Path::exists would make these calls clunkier. I might be missing something of course -- all those occurrences are pretty shallow, so, if you can spot a bug, please send a PR with improvements :slight_smile:

The usages fall into thee categories:

Probing: try several different paths to find which one exists (eg, discover Cargo.toml in one of the parent directories). Returning an IO error here would result a user-visible bug, as the paths after the erroring one won't be probed at all.

Freshness Check: the application itself creates a path, and uses .exists to skip the work. Bailing with an io::Error out of .exists in this case wouldn't be outright wrong, but would suffer from TOCTOU. It's better to fail when you try to do a specific thing after .exists returned false, rather than during a pre-flight check.

Sanity Check: the path should exist, and, if it doesn't, the app bails. Here, there's no meaningful difference between io::Error and the thing not actually existing.

2 Likes

I think the standard library really needs

impl File {
    fn open_optional(&Path) -> Result<Option<File>> { ... }
}

to encourage people to do the right thing by default.

(Most of my Rust code is Linux specific so I use fd-relative paths a lot and use my crate's equivalent helper openat_ext::OpenatDirExt - Rust )

I've been meaning to write up a longer blog about this but IMO it's almost always a better pattern to use an "outer result" for "fatal errors" instead of having API users inspect error types.

If we added this API then we'd need to update this section of the book to use a different example. Personally I much prefer the pattern of having an "outer Result<>" for errors which should always be fatal, and using an inner enum/result (or Option<> like above) for cases we reasonably expect to happen, like a nonexistent file.

Thanks for providing some real-life examples!

Probing: does it actually make sense to continue probing if the directory is inaccessible? This could mean messed up FS (permissions or whatever), so continuing may lead to confusing/invalid results. I believe it'd make sense to at least print a warning if such a problem occurs if you want to continue.

Freshness Check: yes, this is the case I mentioned - flag file. Not sure what exactly is being checked there (shouldn't it check timestamps too?). I assume it checks if the file was already created. I believe in this case if it's impossible to access the directory the creation is likely to fail as well so continuing might be waste of time. Also perhaps printing it for debugging reasons is valuable.

Sanity Check: printing io::Error instead of "does not exist" makes debugging easier. Otherwise the message can be incorrect and confuse the user. I was myself victim of such sloppy error reporting and wasted some time on figuring out a problem not long ago (not Rust app though). I'd prefer applications to report these errors more clearly.

I'm willing to look into sending a PR if you agree the changes would improve user experience.

That's an interesting idea although I'm not convinced it should be in std yet. It'd be just convenience function for some kinds of errors which can be implemented more generally:

impl io::Result<()> {
    fn ok_if_not_found(self) -> Self { ... }
    fn is_not_found(self) -> io::Result<bool> { ... }
    fn ok_if_exists(self) -> Self { ... }
    fn is_exists(self) -> io::Result<bool> { ... }
}

This can be composed nicely and is not restricted to File. It also enables this cool pattern:

fn generate_file() -> io::Result<()> {
    let mut file = File::create_new(self)?;
    file.write_all("foo")?;
    Ok(())
}

fn generate_file_if_not_exists() -> io::Result<()> {
    generate_file().ok_if_exists()
}

(Yes, I know this is not perfect example but I actually had to use something similar before.)

it actually make sense to continue probing if the directory is inaccessible?

I do think so.

Do you believe that reporting that there's something suspicious about the FS is valuable enough to justify a bit more complex code that'd report it?

I am not sure, my gut feeling is probably no -- I don't remember getting any bug reports about this. More generally, I don't like trying to anticipate some specific edge-case error (like fs being inaccessible), as there are a lot of edge cases, and you will hit the one you didn't consider.

My preferred error handling strategy for rust-analyzer is to be transparent about what the program is doing (eg, printing the command before executing it), such that the user can retry the steps manually and see the error themselves.

That's interesting because from my experience poor error reporting is probably number one reason for lot of time being spent on debugging. Retrying manually often doesn't work because the environment user uses for retrying is often slightly different. (You can't imagine what kind of frustration and hair pulling I experienced with exactly this when apparmor was silently involved.)

But maybe this doesn't apply to kinds of applications you usually develop.

That many dedicated methods for this usage would strongly suggest that the return value could be its own type. Combinations like writer.flush().is_exists() make no sense.

That's an inherent limitation of io::Error, which comes from a syscall which could in theory return any error code, even if unlikely. I don't think making unexpected error codes UB would be a good idea.

I'm not a fan of io::Error, it's just that no error type can solve this problem.

It can very well make sense. Imagine a FUSE filesystem that opens a backing file every time an operation is done. If the backing file is removed, the write syscall should likely return EEXIST. Or a bit more sensible: What if a FUSE filesystem uses HTTP to perform operations. HTTP doesn't have a way to keep a file open, so every write would likely be a separate PUT request (ignoring buffering). If the response is a 404, the write should likely return EEXIST.

5 Likes