Proposal: std::path::resolve

Create a normalized, absolute path from any path.

This function is like std::fs::canonicalize but does not touch the filesystem, so it works with paths that may not exist (or for which canonicalize would fail for other reasons). It still needs the environment to get the current directory. Also, like canonicalize (and unlike components), the implementation is platform-specific.

Proposed POSIX implementation (Playground):

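use std::os::unix::ffi::OsStrExt; // for `OsStr::as_bytes` (Unix only)
use std::path::{Path, PathBuf};
use std::{env, io};

/// Create a normalized absolute path without accessing the filesystem.
fn resolve(path: &Path) -> io::Result<PathBuf> {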
    // This is mostly a wrapper around collecting `Path::components`, with
    // exceptions made where this conflicts with the POSIX specification.
    // See 4.13 Pathname Resolution, IEEE Std 1003.1-2017
    // https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13
    
    let mut components = path.components();
    let path_os = path.as_os_str().as_bytes();
    // "A null pathname shall not be successfully resolved."
    if path_os.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "cannot normalize an empty path",
        ));
    }

    let mut resolved = if path.is_absolute() {
        // "If a pathname begins with two successive <slash> characters, the
        // first component following the leading <slash> characters may be
        // interpreted in an implementation-defined manner, although more than
        // two leading <slash> characters shall be treated as a single <slash>
        // character."
        if path_os.starts_with(b"//") && !path_os.starts_with(b"///") {
            components.next();
            PathBuf::from("//")
        } else {
            PathBuf::new()
        }
    } else {
        env::current_dir()?
    };
    resolved.extend(components);

    // "Interfaces using pathname resolution may specify additional constraints
    // when a pathname that does not name an existing directory contains at
    // least one non- <slash> character and contains one or more trailing
    // <slash> characters".
    // A trailing <slash> is also meaningful if "a symbolic link is
    // encountered during pathname resolution".
    if path_os.ends_with(b"/") {
        resolved.push("");
    }

    Ok(resolved)
}

The name could probably do with a few rounds of bikeshedding, but I think it gets the idea across well enough for an initial proposal. It lives in the path module (not fs) and resolves the path to an absolute one much as canonicalize does, except that it does not attempt to resolve symlinks (so any .. components are preserved).

A major criticism of an earlier proposal was that it could change the semantics of paths. This one attempts to stick to the POSIX specification so that the output path is equivalent to the input according to the rules for pathname resolution.


How about something like absolute_path (or make_absolute, or some other variant of absolute)? This seems consistent with Path's existing is_absolute method. After all, that's effectively what this is doing: taking a path and making it absolute.

Yeah, I was worried that might be misleading.

This function takes a path, makes it absolute... and it normalizes it. So is_absolute is only testing for some of what this function is doing. I guess something like normalized_absolute might be more descriptive. Though perhaps a bit of a mouthful.

I think resolve is also bad since that makes me think it's actually going to do some kind of lookup (in the vein of DNS resolution and such).

absolute makes some sense, though the fact that we don't know whether the path actually exists makes me wary. How about absolute_lexical?

Doesn't "normalized" also imply removal of . and .. components (it does to me at least)? I'd also expect Unicode normalization on macOS and case normalization on Windows (though case-sensitivity now being a per-directory setting, not just per-mount anymore, complicates this a lot). I don't think that word is appropriate here.


Originally I too thought the code in the OP didn't do any notable normalization, but then I realized it uses the components method, which does some normalization.

IMO that's totally fine for this proposed method to do, and I don't think it's necessary to explicitly call it out in the function's name.
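
For anyone wondering what that normalization amounts to, here is a small sketch of the lexical clean-up Path::components does on Unix. The rebuild helper is a name made up for this illustration and is not part of the proposal; the results in the comments assume / as the separator.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: split a path into components and glue them back
/// together, returning the result as a string. Purely lexical; the
/// filesystem is never consulted.
///
/// On Unix:
///   rebuild("/a//b/./c/") gives "/a/b/c"   (repeated separators, interior
///                                           `.` and the trailing slash go)
///   rebuild("/a/../b")    gives "/a/../b"  (`..` is left alone)
///   rebuild("./a")        gives "./a"      (a leading `.` is kept)
fn rebuild(path: &str) -> String {
    let rebuilt: PathBuf = Path::new(path).components().collect();
    rebuilt.to_string_lossy().into_owned()
}
```

So the clean-up is limited to separators and . components; nothing that could change which file the path names in the presence of symlinks.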

Sorry, I should have made that clearer in the text. For anyone else reading along, this function is pretty much the same as doing:

let mut resolved = env::current_dir()?;
resolved.extend(path.components());

The places where it diverges from this are called out in the code with references to the POSIX specification. I'd also note that the Windows implementation is slightly more involved but fortunately the Windows API provides a function to do it for us (see GetFullPathNameW).
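
As a rough, runnable version of that shorthand (a sketch, not the proposed code): naive_absolute is a hypothetical name, and its base parameter stands in for env::current_dir() so the results do not depend on where it is run. Unix separators are assumed.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the "current directory plus components"
/// approximation. `base` replaces `env::current_dir()?` purely to keep this
/// sketch deterministic.
fn naive_absolute(base: &Path, path: &Path) -> PathBuf {
    let mut out = base.to_path_buf();
    // Pushing a root component replaces `out`, so absolute inputs ignore
    // `base`, just as they would ignore the current directory.
    out.extend(path.components());
    out
}
```

On Unix, naive_absolute(Path::new("/cwd"), Path::new("a/")) gives /cwd/a (trailing slash lost), "//special/x" gives /special/x (leading // collapsed), and the empty path gives /cwd instead of an error. Those are the three cases the POSIX-quoting branches in the OP code handle differently.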

Hmm. What happens with //special/../new_special? I guess some testing with CIFS behavior would be of interest here.

What happens with //special/../new_special?

Essentially nothing. Both a leading // component and .. components are left as is in the OP code. You can test this using the playground link.
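
For those who would rather not click through, here is a condensed copy of the OP function together with a small helper to try inputs. The show helper is made up for this sketch; everything is Unix-only because it relies on OsStrExt::as_bytes.

```rust
use std::env;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix-only: provides `OsStr::as_bytes`
use std::path::{Path, PathBuf};

/// Condensed copy of the function from the original post (POSIX only).
fn resolve(path: &Path) -> io::Result<PathBuf> {
    let bytes = path.as_os_str().as_bytes();
    if bytes.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "cannot normalize an empty path",
        ));
    }
    let mut components = path.components();
    let mut resolved = if path.is_absolute() {
        if bytes.starts_with(b"//") && !bytes.starts_with(b"///") {
            // Skip the root component and keep the leading "//" verbatim.
            components.next();
            PathBuf::from("//")
        } else {
            PathBuf::new()
        }
    } else {
        env::current_dir()?
    };
    resolved.extend(components);
    if bytes.ends_with(b"/") {
        // Pushing an empty path re-adds the trailing separator.
        resolved.push("");
    }
    Ok(resolved)
}

/// Hypothetical helper for this sketch: resolve and render as a string.
/// Panics on error, so do not pass it an empty path.
fn show(input: &str) -> String {
    resolve(Path::new(input))
        .expect("input must not be empty")
        .to_string_lossy()
        .into_owned()
}
```

With this, show("//special/../new_special") comes back unchanged, while show("///special/../new_special") becomes /special/../new_special, since three or more leading slashes are treated as one. In both cases the .. stays put.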