Path::to_lexical_absolute

It has long been desired to have a function that is like fs::canonicalize but doesn't need the full path to exist. To this end, I propose a to_lexical_absolute method (final name to be decided).

On *nix this would resolve . and .. path components without following symlinks. If the path is relative then the current directory is used to resolve the path.

On Windows it would call GetFullPathName.

Sample implementation

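use std::path::{Component, PathBuf};

// Proposed as a method on Path, hence the &self receiver.
fn to_lexical_absolute(&self) -> std::io::Result<PathBuf> {
    // Relative paths are resolved against the current directory.
    let mut absolute = if self.is_absolute() {
        PathBuf::new()
    } else {
        std::env::current_dir()?
    };
    for component in self.components() {
        match component {
            // `.` never changes the result.
            Component::CurDir => {}
            // `..` drops the last component lexically; symlinks are not followed.
            Component::ParentDir => {
                absolute.pop();
            }
            // Prefix, root and normal components are appended as-is.
            component => absolute.push(component.as_os_str()),
        }
    }
    Ok(absolute)
}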
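
Until something like this lands in std, one way to try the sample is to wrap it in an extension trait. This is only a sketch: the trait name LexicalAbsolute is made up for illustration and is not part of std, and the body is the sample implementation unchanged.

```rust
use std::env;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical extension trait; the name is an assumption, not a std API.
trait LexicalAbsolute {
    fn to_lexical_absolute(&self) -> io::Result<PathBuf>;
}

impl LexicalAbsolute for Path {
    fn to_lexical_absolute(&self) -> io::Result<PathBuf> {
        // Relative paths start from the current directory.
        let mut absolute = if self.is_absolute() {
            PathBuf::new()
        } else {
            env::current_dir()?
        };
        for component in self.components() {
            match component {
                Component::CurDir => {}
                // Purely lexical: the filesystem is never consulted.
                Component::ParentDir => {
                    absolute.pop();
                }
                other => absolute.push(other.as_os_str()),
            }
        }
        Ok(absolute)
    }
}
```

With the trait in scope, Path::new("/a/b/../c/./d").to_lexical_absolute() yields /a/c/d on Unix, whether or not any of those directories exist.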

I have written this function many times, I would love it in std!

How often is removal of .. components necessary, as opposed to just getting from a relative to an absolute path? The latter can be done using std::env::current_dir()?.join(path). If the argument is absolute, join returns it unchanged and ignores self (in this case std::env::current_dir()?).
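
A minimal sketch of that join-based approach, assuming a Unix-style path in the usage note; the function name absolute_via_join is made up for illustration. Note that . and .. components are kept verbatim.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Makes `path` absolute by joining it onto the current directory.
/// An absolute `path` replaces the current directory entirely;
/// `.` and `..` components are left untouched.
fn absolute_via_join(path: &Path) -> io::Result<PathBuf> {
    Ok(env::current_dir()?.join(path))
}
```

On Unix, passing /etc/hosts returns /etc/hosts, while passing a/../b returns the current directory followed by a/../b, with the .. still in place.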

Good question! I admit my use case is mainly for Windows where .. is evaluated lexically and getting an absolute path, without using fs::canonicalize, is very useful (join is not an adequate alternative).

For generalizing this to all platforms, one of my inspirations was the lexically_* C++ functions. Also people have expressed an interest in a "normal form" for paths. But I can't really speak for them.

So I'm less set on always removing .. components (unless this is wanted) but I would like some way to say "get me an absolute path, in a platform specific way, without the full path needing to exist".

It's useful for normalization where you might want to support something like /config/public/entry-1/../entry-2/, but guard against /config/category/entry-1/../../off-limits/entry-1/ by checking for the prefix /config/public/. Or just have a consistent form you can compare and hash.
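
A rough sketch of such a prefix check, assuming absolute Unix-style input. The helper names normalize and is_under are made up for illustration. The check is purely lexical, so a symlink inside the allowed directory can still point elsewhere.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalizes `path`: no filesystem access, symlinks are not followed.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Returns true when `candidate`, once normalized, still lies under `base`.
/// `Path::starts_with` compares whole components, not raw string prefixes.
fn is_under(base: &Path, candidate: &Path) -> bool {
    normalize(candidate).starts_with(base)
}
```

With base /config/public, the path /config/public/entry-1/../entry-2/ is accepted, while /config/category/entry-1/../../off-limits/entry-1/ normalizes to /config/off-limits/entry-1 and is rejected.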

I feel like you're better off here with OS-level primitives such as O_BENEATH or the like (at least if it's for security purposes).

Really, a combination of both is best. Best-effort from the program for better error messages, OS guarding for security.

OS handling of folder symlinks can be surprising. On SuSE 11 Enterprise Server:

mkdir -p 1/1sub 2/2sub
echo text > 2/file.txt
ln -sf ../../2/2sub 1/1sub/link
cat 1/1sub/link/../file.txt # prints 'text'

cd 1/1sub/link
pwd # prints '.../1/1sub/link'
ls .. # prints '2sub  file.txt'
  • Seeing this, I sometimes wish to normalize all .. myself, to make the situation more predictable.
  • DIY resolution of .. could help keep code behavior consistent across OSes.
  • DIY resolution of .. is faster: no need to call out to the OS.
  • DIY resolution of .. can be useful if your app config says something like baseDir=/a/b relativePath=../d.txt but the folder a/b does not exist.
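
The shell session above can be reproduced from Rust to see the two answers side by side. This is a Unix-only sketch: the function name os_vs_lexical is made up, and root is assumed to be a fresh, writable scratch directory.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink; // Unix-only API
use std::path::{Path, PathBuf};

/// Recreates the layout from the shell session under `root` and returns
/// (OS resolution, lexical resolution) of `1/1sub/link/../file.txt`.
fn os_vs_lexical(root: &Path) -> io::Result<(PathBuf, PathBuf)> {
    fs::create_dir_all(root.join("1/1sub"))?;
    fs::create_dir_all(root.join("2/2sub"))?;
    fs::write(root.join("2/file.txt"), "text\n")?;

    let link = root.join("1/1sub/link");
    symlink("../../2/2sub", &link)?;

    // The OS follows the symlink first, so `..` is the parent of `2/2sub`.
    let os_view = fs::canonicalize(link.join("../file.txt"))?;

    // Lexical resolution cancels `link/..` without touching the disk;
    // `Path::parent` only strips the last component.
    let lexical_view = link
        .parent()
        .expect("link path has a parent")
        .join("file.txt");

    Ok((os_view, lexical_view))
}
```

The OS view ends in 2/file.txt, which exists, while the lexical view ends in 1/1sub/file.txt, which does not.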

A bonus anecdote: suppose you've done ln -sf .. 1/2; it appears Java has changed what Files.readSymbolicLink returns for 1/2 between version 8 and 11 under Linux: 8 returns 1/2 and 11 returns an empty string.

This is not surprising at all. ls is going to ask "what is .. in the current directory" which is always the actual parent path[1].

The only consistent behavior is to always ask the disk what the parent path is. Resolution of a symlink name is only really done through the $PWD environment variable, which is only maintained by shells and may even be wrong; chdir won't update it. So one needs to resolve $PWD fully and see if it matches getcwd(); only then can you even hope to think you have a valid symlink name the user might want to use for that path. For all other paths there is no way to know how one wants them resolved, and the only sensible and consistent thing to do is use absolute paths.
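
The check described above could be sketched roughly as follows. The function name logical_current_dir is made up for illustration; the sketch trusts $PWD only when it resolves to the same directory the OS reports through std::env::current_dir.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Returns `$PWD` only if it fully resolves to the directory the OS reports;
/// otherwise falls back to the physical current directory.
fn logical_current_dir() -> io::Result<PathBuf> {
    let physical = env::current_dir()?;
    if let Some(pwd) = env::var_os("PWD").map(PathBuf::from) {
        if pwd.is_absolute() {
            // A stale or bogus `$PWD` may not resolve at all; ignore it then.
            if let Ok(resolved) = fs::canonicalize(&pwd) {
                if resolved == fs::canonicalize(&physical)? {
                    return Ok(pwd);
                }
            }
        }
    }
    Ok(physical)
}
```

Either way, the returned path names the real working directory; the only difference is whether the user's symlink spelling survives.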

Just for the record, CMake tries to do these symlink-name shenanigans and resolves .. against $PWD rather than getcwd(), which makes other tools very confused: CMake produces relative paths that assume $PWD is accurate even when the disk says you're off in the weeds. For anyone curious, this issue is the task to actually fix this behavior while still supporting the use cases that actually need $PWD resolution[2].

I don't think "smart fixing" of broken configuration setups is something users should get beyond "yeah, that doesn't work, try again?".

[1] Well, mount points can relocate, but at that point, you're going to get a completely different name for such a thing if you want that resolved.

[2] There are HPC setups where the user's home directory is /home/uname -> /mnt/sanXYZ/users/uname where XYZ can change between reboots because $reasons. Here, the /home/uname is the stable symlink one should use for such paths and CMake tries its darndest to do so. The problem is that "rewrite realpath name X to symlink name Y" is done way too low-level and means asking CMake for an absolute path of something ends up giving back a symlink name instead. It is also too low-level for CMake to know that resolving a .. component across these symlink names should instead start looking up the absolute path in order to compute a relative path that actually works when handing it off to another tool.

And, just for the record, I'm not saying this method isn't useful or shouldn't be added, but if you're thinking "this will solve my symlink woes!", it's…not that simple and doesn't save you from actually having to deal with them in a real way.

DIY resolution certainly sacrifices alignment with other tools
but does sound reasonable for some uses.
You certainly need to know what you're doing :)

I meant consistency in operation of a given piece of code between
Windows, Linux, macOS and other more exotic OSes.

For that, I think something like this issue is better. Rust's stdlib can expose non-native paths as purely data structure manipulations (since there's obviously no one to ask "what is C:\?" on Linux). It can also be useful to represent paths on remote machines since one wants to ensure that local state doesn't interfere. Of course, symlinks make all kinds of problems with this, but if you're operating on purely logical paths, that's going to be the situation with case folding and Windows/MS-DOS 8.3 transformations too.

+1

One of my most popular crates exists just to un-break the broken fs::canonicalize.
