Out-of-band crate evaluation for 2017-07-18: same-file

@jmst Thanks for your feedback. It's clear that the documentation could use more clarification. That said, I find some of your framing a bit exaggerated. Saying the crate is "broken" on Windows, for example, seems off the mark.

Note that the design of this crate is based on how other popular software does this type of detection. It is at the heart of Java's recursive directory iterator, for example, when checking for file system loops (last time I looked).

Your first proposal seems reasonable to me, but I'd be a little wary of saying much more than it does already. Nebulousness is a feature here, since I found it very hard to get specific guarantees on this matter. Therefore, it doesn't seem right for this crate to make specific guarantees.

This is pretty far from a critical issue given the probability of one actually observing a bug because of it. Moreover, the existence of trial versions does not imply that everyone is aware of said systems.

Great idea!

It's not useless at all. It's used to detect loops in the file system created by symbolic links. It works. Therefore it's not useless.

That's false. The information provided is that, with some high confidence, if is_same_file returns true, then the two given paths refer to the same file. If you have zero tolerance for false positives, then you have enough information to know that you can't use is-same-file. If you can abide some false positives in rare circumstances, then you have enough information to use it. For example, if a false positive occurs during file system loop detection in a recursive directory traversal, then the traversal will skip some file paths that it should have visited. This is unfortunate.
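To make the loop-detection case concrete, here is a rough Unix-only sketch using only the standard library. It is not the crate's implementation or walkdir's: `walk` and `visit` are hypothetical names, and the identity check is a plain (device, inode) comparison against the current chain of ancestors. It does show where a false positive would bite: the `return Ok(())` on a match is exactly the point where a subtree would be silently skipped.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Hypothetical walker: follows symbolic links and returns every
/// non-directory path under `root`, skipping directories that are
/// identical to one of their own ancestors.
fn walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut ancestors = Vec::new();
    visit(root, &mut ancestors, &mut files)?;
    Ok(files)
}

fn visit(
    dir: &Path,
    ancestors: &mut Vec<(u64, u64)>,
    files: &mut Vec<PathBuf>,
) -> io::Result<()> {
    // fs::metadata follows symbolic links, like stat(2).
    let md = fs::metadata(dir)?;
    let id = (md.dev(), md.ino());
    if ancestors.contains(&id) {
        // Loop detected: this directory is one of its own ancestors.
        // A false positive here would skip a legitimate subtree.
        return Ok(());
    }
    ancestors.push(id);
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        match fs::metadata(&path) {
            Ok(m) if m.is_dir() => visit(&path, ancestors, files)?,
            // Regular files and unresolvable links are reported as-is.
            _ => files.push(path),
        }
    }
    ancestors.pop();
    Ok(())
}
```

With a layout like `root/sub/f` plus a symbolic link `root/sub/loop` pointing back at `root`, this terminates and reports only `root/sub/f`.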

I believe it is the Windows 128-bit issue that caused me to write that statement. It is very low priority to fix because it isn't a problem in practice. (Or more precisely, I'm not aware of it being a problem in practice. If someone was actually experiencing it, then it would certainly raise the priority level!)

No, it's not bad. It's necessary. Otherwise the underlying file system may reuse whatever identifier one is using. Keeping the handle open prevents that from happening.
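A minimal sketch of that idea, assuming a Unix platform and using only the standard library. `OpenIdentity` is a hypothetical type for illustration, not the crate's actual API. The point is that the open `File` travels with the identifier, so the file system cannot hand the same inode number to a different file while the value is alive.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical type: a file identity that stays valid for as long
/// as the value lives, because the file is kept open alongside it.
struct OpenIdentity {
    // Never read, only held: dropping it would allow the inode
    // number to be reused by an unrelated file.
    _file: File,
    dev: u64,
    ino: u64,
}

impl OpenIdentity {
    fn from_path<P: AsRef<Path>>(path: P) -> io::Result<OpenIdentity> {
        let file = File::open(path)?;
        // Metadata is read from the open descriptor, not the path.
        let md = file.metadata()?;
        Ok(OpenIdentity { dev: md.dev(), ino: md.ino(), _file: file })
    }

    fn same_as(&self, other: &OpenIdentity) -> bool {
        self.dev == other.dev && self.ino == other.ino
    }
}
```

Even if the original path is deleted and a new file is created in its place, an `OpenIdentity` obtained before the deletion will not compare equal to one obtained for the new file.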

With that said, the is_same_file function could probably be implemented more efficiently by using stat calls as you say!
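For a one-shot comparison of two paths, something like the following might be enough on Unix. This is only a sketch under that assumption: `same_file_by_stat` is a hypothetical helper, not something the crate exposes, and it compares the device and inode numbers reported by two stat calls without opening either file. Nothing is held open between the two calls, so it only makes sense when both lookups happen back to back.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: returns true if both paths resolve to the
/// same (device, inode) pair. Symbolic links are followed.
fn same_file_by_stat<P, Q>(a: P, b: Q) -> io::Result<bool>
where
    P: AsRef<Path>,
    Q: AsRef<Path>,
{
    // fs::metadata corresponds to stat(2); no file handle is opened.
    let ma = fs::metadata(a)?;
    let mb = fs::metadata(b)?;
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```

Two hard links to the same file compare equal here, two distinct files with identical contents do not, and a path that cannot be resolved surfaces as an `Err`.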

Where is this documented?

The original motivation for is-same-file was as a means for detecting file system loops. Such detection doesn't require handling unresolvable symbolic links. Additional APIs seem OK; but I'd like to hear use cases first.
