It has long been desired to have a function that is like fs::canonicalize but doesn't need the full path to exist. To this end, I propose a to_lexical_absolute method (final name to be decided).
On *nix this would resolve . and .. path components without following symlinks. If the path is relative then the current directory is used to resolve the path.
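To make the *nix behavior concrete, here is a minimal sketch of what such lexical resolution could look like. The name `lexical_absolute` and the explicit `base` parameter are assumptions for illustration, not the proposed API; in real use `base` would come from `std::env::current_dir()?`. The example assumes Unix-style paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: make `path` absolute against `base` and resolve
/// `.` and `..` purely lexically. No filesystem access happens, so
/// symlinks are never followed. `base` is assumed to be absolute.
fn lexical_absolute(base: &Path, path: &Path) -> PathBuf {
    // `join` returns `path` unchanged when it is already absolute.
    let joined = base.join(path);
    let mut out = PathBuf::new();
    for component in joined.components() {
        match component {
            // `.` contributes nothing.
            Component::CurDir => {}
            // `..` drops the last component; `pop` is a no-op at the root,
            // so `/..` stays `/`.
            Component::ParentDir => {
                out.pop();
            }
            // Root, prefix, and normal components are kept as-is.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

With a base of `/home/user`, the relative path `./a/../b` comes out as `/home/user/b`, whether or not any of those directories exist.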
How often is removal of .. components necessary, as opposed to just getting from a relative to an absolute path? The latter can be done using std::env::current_dir()?.join(path). If the argument is absolute, join returns it unchanged and ignores self (in this case std::env::current_dir()?).
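For reference, the join-based approach could be wrapped like this. The function name `to_absolute` is made up for the example; note that `..` components are left untouched.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: make `path` absolute without resolving `..`.
fn to_absolute(path: &Path) -> io::Result<PathBuf> {
    // If `path` is already absolute, `join` discards the current
    // directory and returns `path` as given.
    Ok(env::current_dir()?.join(path))
}
```

An input such as `a/../b` becomes `<current dir>/a/../b`, so the result is absolute but not normalized.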
Good question! I admit my use case is mainly for Windows where .. is evaluated lexically and getting an absolute path, without using fs::canonicalize, is very useful (join is not an adequate alternative).
For generalizing this to all platforms, one of my inspirations was the lexically_* C++ functions. Also people have expressed an interest in a "normal form" for paths. But I can't really speak for them.
So I'm less set on always removing .. components (unless this is wanted) but I would like some way to say "get me an absolute path, in a platform specific way, without the full path needing to exist".
It's useful for normalization where you might want to support something like /config/public/entry-1/../entry-2/, but guard against /config/category/entry-1/../../off-limits/entry-1/ by checking for the prefix /config/public/. Or just have a consistent form you can compare and hash.
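A rough sketch of that guard, assuming Unix-style absolute inputs: normalize lexically, then compare against the allowed prefix. Both `normalize` and `is_allowed` are illustrative names, not existing std functions. Since the check is purely lexical, a symlink inside /config/public/ could still point elsewhere.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: resolve `.` and `..` lexically, without
/// consulting the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            // `pop` does nothing at the root, so the result never
            // climbs above `/`.
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Hypothetical guard: accept only paths that stay under /config/public
/// after lexical normalization.
fn is_allowed(requested: &Path) -> bool {
    // `starts_with` compares whole components, so `/config/publicity`
    // does not match the `/config/public` prefix.
    normalize(requested).starts_with("/config/public")
}
```

Here `/config/public/entry-1/../entry-2/` is accepted, while `/config/category/entry-1/../../off-limits/entry-1/` normalizes to `/config/off-limits/entry-1` and is rejected.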
OS handling of folder symlinks can be surprising. On SuSE 11 Enterprise Server:
mkdir -p 1/1sub 2/2sub
echo text > 2/file.txt
ln -sf ../../2/2sub 1/1sub/link
cat 1/1sub/link/../file.txt # prints 'text'
cd 1/1sub/link
pwd # prints '.../1/1sub/link'
ls .. # prints '2sub file.txt'
Seeing this, I sometimes wish to normalize all .. components myself, to make the situation more predictable.
DIY resolution of .. could help keep code behavior consistent between OSes
DIY resolution of .. is faster: no need to call out to the OS
DIY resolution of .. might be thought of as useful if you have written app config such as baseDir=/a/b relativePath=../d.txt but folder a/b does not exist
A bonus anecdote: suppose you've done ln -sf .. 1/2; it appears Java has changed what Files.readSymbolicLink returns for 1/2 between version 8 and 11 under Linux: 8 returns 1/2 and 11 returns an empty string.
This is not surprising at all. ls is going to ask "what is .. in the current directory" which is always the actual parent path[1].
The only consistent behavior is to always ask the disk what the parent path is. Resolution of a symlink name is only really done through the $PWD environment variable, which is only available in shells and may even be wrong; chdir won't update it. So one needs to resolve $PWD fully and see whether it matches getcwd(); only then can you even hope to have a valid symlink name the user might want to use for that path. For all other paths it is unknown how one wants them resolved, and the only sensible and consistent thing to do is use absolute paths.
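The check described here (resolve $PWD fully, then compare it with what the OS reports) might look roughly like this in Rust. The helper name `trusted_logical_cwd` is invented for the sketch, and it assumes `std::env::current_dir()` reflects the real working directory.

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Hypothetical helper: return `$PWD` only if it still names the real
/// working directory; otherwise return `None`.
fn trusted_logical_cwd() -> Option<PathBuf> {
    // `$PWD` may be unset, relative, or stale.
    let logical = PathBuf::from(env::var_os("PWD")?);
    if !logical.is_absolute() {
        return None;
    }
    // Resolve every symlink in `$PWD`; this fails if the path is gone.
    let resolved = fs::canonicalize(&logical).ok()?;
    // Ask the OS where we actually are and resolve that as well.
    let physical = fs::canonicalize(env::current_dir().ok()?).ok()?;
    // Only when both agree is the symlink name usable.
    (resolved == physical).then_some(logical)
}
```

When this returns `None`, the safe fallback is the absolute path reported by the OS.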
Just for the record, CMake tries to do these symlink-name shenanigans and resolves .. against $PWD rather than getcwd(), which makes other tools very confused: CMake produces relative paths that assume $PWD actually works when the disk says you're off in the weeds. For anyone curious, this issue is the task to actually fix this behavior while still supporting the use cases that actually need $PWD resolution[2].
I don't think "smart fixing" of broken configuration setups is something users should get beyond "yeah, that doesn't work, try again?".
[1]Well, mount points can relocate, but at that point, you're going to get a completely different name for such a thing if you want that resolved.
[2]There are HPC setups where the user's home directory is /home/uname -> /mnt/sanXYZ/users/uname where XYZ can change between reboots because $reasons. Here, /home/uname is the stable symlink one should use for such paths, and CMake tries its darndest to do so. The problem is that "rewrite realpath name X to symlink name Y" is done way too low-level, which means asking CMake for an absolute path of something ends up giving back a symlink name instead. It is also too low-level for CMake to know that resolving a .. component across these symlink names should instead start looking up the absolute path in order to compute a relative path that actually works when handing it off to another tool.
And, just for the record, I'm not saying this method isn't useful or shouldn't be added, but if you're thinking "this will solve my symlink woes!", it's…not that simple and doesn't save you from actually having to deal with them in a real way.
For that, I think something like this issue is better. Rust's stdlib can expose non-native paths as pure data structure manipulations (since there's obviously no one to ask "what is C:\?" on Linux). It can also be useful for representing paths on remote machines, since one wants to ensure that local state doesn't interfere. Of course, symlinks cause all kinds of problems with this, but if you're operating on purely logical paths, that's going to be the situation with case folding and Windows/MS-DOS 8.3 transformations too.