The API of `Path::exists()` encourages broken code

Agree!

I don't think that "we test this in CI" is a good argument for not logging the error. Firstly If my Ci discovers an error I want to know about it without complicated analysis. Misleading error messages can waste a lot of time as happened to me many times.

Secondly, you test it in CI with controlled environment but then distribute it to users who may have different environment: broken HW/FS, funny permissions or mounts. To those users "Not found" or weird behavior can be very confusing. It can waste their time or cause them to report bugs - they may see the file is there (e.g. if they mistakenly run under a different user) but the application acts as if it wasn't.

This is not a made-up case, it actually happened to me. I ran an application under a separate user for security reasons and it said "file /foo/bar doesn't exist" Debugging with sudo ls is quicker than sudo -u other-user ls so I didn't catch the real problem immediately and it seemed the file is there. Fortunately I realized programmers sometimes write shitty code, so I figured out it's permissions. Otherwise, the authors would have a bug report to deal with.

Is it really that much to ask people to just rewrite .exists() to .try_exists().unwrap() at those few places specifically pinpointed by the compiler? It shouldn't take more than 5 minutes including git commit, especially with cargo fix.

1 Like

I mean, the call to .exists() is inside a #[test] fn, which dose an early exit unless the CI var it set.

Ah, OK, that makes more sense. I would personally still prefer having a correct message.

If this is frequent maybe somehow deprecating it in non-test code only might be interesting?

Deprecation aside, I think this thread has made a compelling argument for a try_exists method. Anyone want to submit a PR for that method?

14 Likes

I will try to do it in the near future.

Is .try_exists().unwrap() desirable when one can already write .metadata().unwrap()?

Edit: Oops; I misread what try_exists() was doing.

I have opened Added `try_exists()` method to `std::path::Path` by Kixunil · Pull Request #81822 · rust-lang/rust · GitHub

I think a better name for exists would have been is_metadata_readable or perhaps the slightly less accurate is_accessible for short. At least I think it would make the short description more accurate:

is_accessible: Returns true if the entity's metadata is readable.

Of course it's too late to change the name now but I think the documentation should make mention of how it's implemented since the expected behaviour of exists appears so strongly tied to the fs::metadata function.

is_accessible would seem like if file is accessible - too confusing. The documentation already has correct information in longer description. I suggested to note it in summary too to be more visible but so far there doesn't seem to be strong agreement.

I think that maybe we should have is_accessible(mode) method too which calls access() on Unix (IDK what on Windows). But we should somehow figure out if this would be used. No point adding methods that one person will use.

In almost 25 years of Unix system programming, I have never encountered a situation where access was the correct choice. In most programs, it can't do anything stat can't do better. In setuid or setgid programs, it's theoretically handy because it does access-control checks using the "real" IDs instead of the "effective" IDs, but any such use is a bug because you can't avoid a TOCTOU race.

1 Like

I'm quite surprised by that. stat doesn't check the same thing as far as I know. I actually consider using real ID a bit surprising if the intended use case is sanity checking it will behave differently than open()

TOCTOU race is not really a problem if you use these things as intended. For instance if there's a risk that a user had access to a file that was revoked one microsecond later, that same user could have read that file sooner using just e.g. cat and not having to bother with tricking SUID programs. No programs should rely on a file to be accessible for a short while because someone will just write an infinite loop with open().

In normal programs (with neither the set-uid nor the set-gid bit set on the executable), the real and effective ID sets are always the same, and therefore access(path, F_OK) returns zero when and only when stat(path, &sbuf) would return zero. R_OK and W_OK are best emulated using open instead, according to the usual "try to do the thing and see if it failed, don't try to check whether it would succeed first" principle. This would also be true for X_OK if O_EXEC were universally supported, but it isn't; in its absence, access(path, X_OK) does do something that there's no other good way to do.

Au contraire. The canonical use case for access in a setuid program is e.g.

    if (access(pathname_supplied_by_user, R_OK) == 0) {
        do_something_with(open(pathname_supplied_by_user, O_RDONLY));
    }

Imagine that an attacker creates a file that they own, invokes a setuid-root program that does this on it, and races to replace that file with a symlink to /etc/shadow. If they manage to replace the file in the window between access succeeding and open being called, the program will read /etc/shadow. This is exactly the form of several privilege-escalation vulnerabilities over the years.

What Unix ought to give set-id programs for this use case is open(path, O_USE_REAL_IDS | ...); in the absence of that feature, you have to drop privileges with setuid and setgid first, which is tricky to get right, and process-global. Alas.

3 Likes

If you only care whether a file exists, why use stat instead of access when you don't need the extra information?

I can think of one possible answer: because access's behavior of using the real UID instead of the effective one is unlike any other API and is surprising. Indeed, I didn't know it did that, and now that I do, I will avoid using it in library code that might be used from a setuid/setgid program. That includes try_exists as proposed here.

But are there other reasons you think stat is preferable?

1 Like

You raised some good points.

R_OK and W_OK are best emulated using open instead

That involves getting a file descriptor and then maybe closing it immediately. One syscall is better than two syscalls but maybe not an issue in most cases.

It does make sense to perform checks first in situations when these are true:

  • Program is going to perform some kind of expensive operation - either computationally or it needs to process lot of user inputs (and failing to do so would lead to wasting time of the user)
  • Some files must be (presumably write) accessible for the operation to finish
  • The files changing in the meantime is theoretically possible but unlikely in practice. Later errors are not ignored.

Checking the accessibility of the files beforehand is a sensible thing to do. Keeping file descriptors around may be OK sometimes, but could be an issue in some scenarios.

Imagine that an attacker creates a file that they own, invokes a setuid-root program that does this on it, and races to replace that file with a symlink to /etc/shadow .

My understanding was that this is the exact kind of thing you're not supposed to do but if too many people do it that way, that's definitely a problem.

I agree completely with the rest. The API sucks.

That is a very reasonable approach. I'd just add two things. Some systems provide a version of access that uses effective UID, which would be consistent with open(). Consider using it when relevant.

Secondly, you're right about libraries, just don't hesitate to use access() in programs - each program either is or isn't designed as SUID. If it is, then you need to have pretty much everything perfect and auditing access would be mandatory. If it isn't then no sane person will ever set SUID bit on such program. Attempting to prevent insane people from harming themselves is pointless - they will find another way.

It's very rare in my experience not to need the extra information. You're going to do something with the file or you wouldn't care whether it exists, and the information provided by stat is probably going to come in handy one way or another. I regret I can't think of any good examples at the moment, though.

I would argue that in almost all cases the Right Thing is to keep the file descriptor around. Yes, the number-of-open-files-per-process limit is set pointlessly low by default on the current generation of Unixes, but that's never going to change as long as people keep working around it instead of complaining.

This includes the case where you want to know whether the file exists/is writable because you're going to invoke some other program that will actually access the file, and you don't want to start it up if it's just going to fail. Doing the check up front is reasonable, but TOCTOU discipline says pass down the file descriptor, not the filename. If the other program isn't set up to allow you do that, that's a bug in the other program.

I can get pretty dogmatic about TOCTOU issues because I have seen so damn many CVEs that boiled down to "surely my program cannot be security critical, so I don't have to care about races, right? Right?" (Wrong.)

1 Like

Wouldn't most programs work assuming you pass them /proc/self/fd/_?

One drawback of that is that it can easily lead to the other program giving less-helpful error messages like "syntax error in /proc/self/fd/3" rather than "syntax error in foo.toml".