Add zero-cost builder methods to PathBuf

The problem

If a user wants to create a path using a builder pattern (path.join(a).join(b).join(c)), they inadvertently copy the string buffer on every call to join. This means the easiest way to write the code is also the least efficient. The efficient version takes more effort and requires the developer to read the source code or the docs to learn that join allocates memory, so that push should be used instead:

{
  let mut tmp = path.join(a); // must be `mut` for the pushes below
  tmp.push(b);
  tmp.push(c);
  tmp
}

Underlying issue

join actually comes from Path and allocates a new PathBuf. Calling PathBuf::join will deref coerce down to a Path and then come back with a newly allocated PathBuf.
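A minimal sketch of the difference, using only the existing std API. The helper names build_with_join and build_with_push are made up for illustration:

```rust
use std::path::{Path, PathBuf};

/// Builds `base/a/b` with chained `join` calls; each call returns a new `PathBuf`.
fn build_with_join(base: &Path) -> PathBuf {
    base.join("a").join("b")
}

/// Builds the same path with one `join` followed by an in-place `push`.
fn build_with_push(base: &Path) -> PathBuf {
    let mut tmp = base.join("a"); // one new PathBuf
    tmp.push("b"); // extends that same buffer
    tmp
}

fn main() {
    let base = PathBuf::from("root");
    // `join` is defined on `Path`; calling it on a `PathBuf` goes through `Deref`.
    let joined = base.join("a");
    assert_eq!(base, PathBuf::from("root")); // the receiver is left untouched
    assert_eq!(joined, Path::new("root").join("a"));
    assert_eq!(build_with_join(&base), build_with_push(&base));
}
```

Both helpers produce the same path; they differ only in how many intermediate PathBuf values are created along the way.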

Potential solutions

New methods

Add new methods to PathBuf such as PathBuf::append, PathBuf::using_extension, and PathBuf::using_file_name.

I don't like this because it still means you have to understand the difference between join/append and why one might be better than the other.
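For illustration, this option can be approximated today with an extension trait. This is only a sketch: the trait name PathBufExt is an assumption, and none of these methods exist in std.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical extension trait; these methods are NOT part of std.
trait PathBufExt: Sized {
    fn append<P: AsRef<Path>>(self, segment: P) -> Self;
    fn using_extension<S: AsRef<OsStr>>(self, ext: S) -> Self;
    fn using_file_name<S: AsRef<OsStr>>(self, name: S) -> Self;
}

impl PathBufExt for PathBuf {
    // Each method takes the buffer by value, mutates it, and hands it back,
    // so a chain of calls keeps reusing the same PathBuf.
    fn append<P: AsRef<Path>>(mut self, segment: P) -> Self {
        self.push(segment);
        self
    }
    fn using_extension<S: AsRef<OsStr>>(mut self, ext: S) -> Self {
        self.set_extension(ext);
        self
    }
    fn using_file_name<S: AsRef<OsStr>>(mut self, name: S) -> Self {
        self.set_file_name(name);
        self
    }
}

fn main() {
    let p = PathBuf::from("a").append("b").append("c.txt").using_extension("md");
    assert_eq!(p, PathBuf::from("a/b/c.md"));
    assert_eq!(p.using_file_name("d.rs"), PathBuf::from("a/b/d.rs"));
}
```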

Break people's code (maybe in the next edition?)

Add new methods to PathBuf called PathBuf::join, PathBuf::with_extension, and PathBuf::with_file_name. This will break people's code if they depend on PathBuf::join creating a new owned instance. Going back to old behavior is very easy: (*path).join instead of path.join.

Is a stdlib change like this possible? How would it work?

Original discussion


PathBuf implements FromIterator, so PathBuf::from_iter([a, b, c]) should work.
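A short sketch of that suggestion. The function name build is made up; the explicit import is only needed on editions before 2021, where FromIterator is not in the prelude:

```rust
use std::iter::FromIterator; // in the prelude from the 2021 edition onwards
use std::path::PathBuf;

/// Builds a path from three segments of the same type.
fn build(a: &str, b: &str, c: &str) -> PathBuf {
    PathBuf::from_iter([a, b, c])
}

fn main() {
    let p = build("usr", "local", "bin");
    assert_eq!(p, PathBuf::from("usr/local/bin"));

    // `collect` goes through the same FromIterator impl.
    let q: PathBuf = ["usr", "local", "bin"].iter().collect();
    assert_eq!(p, q);
}
```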


Finally, a practical reason to implement Div for PathBuf, so we can join paths with a / b / c.

:wink:

(Only slightly less-controversially, we could implement Add for PathBuf and write this as a + b + c, as we already do for String.)

Note that in many cases the efficiency benefit of repeated push over join is small or even zero, since each push might also trigger a reallocation and copy. I expect the difference is rarely important in practice except when the number of join calls is fairly high.
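If reallocation on push is a concern, one option is to reserve the space up front. The sketch below assumes the segments are known in advance; join_all is a hypothetical helper, while PathBuf::with_capacity is the existing std constructor:

```rust
use std::path::{Path, PathBuf};

/// Joins `segments` onto `base`, reserving the estimated space once.
fn join_all(base: &Path, segments: &[&str]) -> PathBuf {
    // Base length plus, for every segment, its length and one separator.
    let needed = base.as_os_str().len()
        + segments.iter().map(|s| s.len() + 1).sum::<usize>();
    let mut out = PathBuf::with_capacity(needed);
    out.push(base);
    for s in segments {
        out.push(s);
    }
    out
}

fn main() {
    let p = join_all(Path::new("root"), &["a", "b", "c"]);
    assert_eq!(p, Path::new("root").join("a").join("b").join("c"));
}
```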


Could we perhaps impl Concat (from std::slice) for Paths? Or maybe for Components?

After all, [a, b, c].concat() is arguably the "best" way to do a.to_owned() + b + c for strings. (See "Best way to do string concatenation in 2019 (Status quo)", post #5 by scottmcm, on The Rust Programming Language Forum.)


Leading to such beauties as a / "/" / c.

:wink:
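To make the joke concrete, here is a sketch of the / operator on a wrapper type. A crate outside std cannot implement Div for PathBuf directly because of the orphan rule, so the newtype Slashed is a made-up stand-in:

```rust
use std::ops::Div;
use std::path::{Path, PathBuf};

/// Hypothetical wrapper around `PathBuf`, used only so that `Div` can be implemented here.
#[derive(Debug, Clone, PartialEq)]
struct Slashed(PathBuf);

impl<P: AsRef<Path>> Div<P> for Slashed {
    type Output = Slashed;

    // `/` pushes the right-hand side onto the wrapped buffer.
    fn div(mut self, rhs: P) -> Slashed {
        self.0.push(rhs);
        self
    }
}

fn main() {
    let p = Slashed(PathBuf::from("a")) / "b" / "c";
    assert_eq!(p.0, PathBuf::from("a/b/c"));

    // Pushing a rooted path replaces what was there, so `a / "/" / c` drops `a`.
    let q = Slashed(PathBuf::from("a")) / "/" / "c";
    assert_eq!(q.0, PathBuf::from("/c"));
}
```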


I believe it's possible to write e.g. path.extend([a, b, c]) if you have PathBuf to push multiple path fragments in one go, which is another option to use.
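A minimal sketch of the extend approach, relying on the existing Extend impl for PathBuf. The helper add_segments is hypothetical and simply wraps the call so the buffer can be returned from a single expression:

```rust
use std::path::{Path, PathBuf};

/// Takes an owned base and pushes two borrowed segments onto it.
fn add_segments(mut base: PathBuf, b: &Path, c: &Path) -> PathBuf {
    base.extend([b, c]); // same effect as base.push(b); base.push(c);
    base
}

fn main() {
    let p = add_segments(PathBuf::from("a"), Path::new("b"), Path::new("c"));
    assert_eq!(p, PathBuf::from("a/b/c"));

    // The items only need to share one type that implements AsRef<Path>.
    let mut q = PathBuf::from("a");
    q.extend(["b", "c"]);
    assert_eq!(p, q);
}
```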


(Only slightly less-controversially, we could implement Add for PathBuf and write this as a + b + c, as we already do for String.)

This is probably much more controversial :grinning_face_with_smiling_eyes:


This is not ideal in the case a is already a PathBuf but b and c are just &Paths or &strs

But that's not much better than repeated pushes, since it doesn't return the PathBuf, so you still have to use the tmp variable.

A Kotlin-style "with" function would be a nice alternative to the builder pattern, even without Kotlin's fancy syntactic support.

fn with<T, F: FnOnce(&mut T)>(mut base: T, modify: F) -> T {
    modify(&mut base);
    base
}

let buf = with(PathBuf::new(), |buf| {
    buf.extend([a, b, c]);
});

Or you could implement both Div and Add, with the former using push and the latter just appending a string. (BTW, I do not see how to do the latter conveniently today, and I happened to need exactly that a week ago: appending an extension to a file name that might already have one.)
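One way to do the plain string append with today's API is to go through OsString. This is a sketch rather than a recommendation, and append_suffix is a made-up helper name:

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Appends `suffix` to the raw path string without inserting a separator
/// and without replacing an existing extension.
fn append_suffix<S: AsRef<OsStr>>(path: PathBuf, suffix: S) -> PathBuf {
    let mut raw = path.into_os_string(); // reuses the existing buffer
    raw.push(suffix);
    PathBuf::from(raw)
}

fn main() {
    let p = append_suffix(PathBuf::from("dir/archive.tar"), ".gz");
    assert_eq!(p, PathBuf::from("dir/archive.tar.gz"));

    // By contrast, with_extension replaces the existing extension.
    let q = PathBuf::from("dir/archive.tar").with_extension("gz");
    assert_eq!(q, PathBuf::from("dir/archive.gz"));
}
```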

Maybe join, etc. could be added to PathBuf and hidden in pre-2021 editions?

Is that possible? Are there docs I can look at to understand how conditionally adding code by edition would work?

I think that's the best option given that the array syntax requires each path segment to be of the same type. And while the tap library looks pretty sweet (thanks for sharing!), it requires knowing how the path APIs are implemented to seek something like that out.

But the truly beautiful application of Div is ..:

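use std::ops::{Div, RangeFull};
use std::path::PathBuf;

// Only compiles inside std: the orphan rule forbids this impl in other crates.
impl Div<RangeFull> for PathBuf {
    type Output = PathBuf;
    // `mut self` is required because `pop` takes `&mut self`.
    fn div(mut self, _: RangeFull) -> PathBuf {
        self.pop();
        self
    }
}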

:wink:


Maybe it's a dumb idea, but why not create a path! macro, just like vec!? That would allow something like:

let path = path![ "dir1", "dir2", "file" ];

Here is how it could be implemented:

macro_rules! path {
    [ $($segment:expr),+ ] => {{
        let mut path = ::std::path::PathBuf::new();
        $(path.push($segment);)*
        path
    }}
}

That's pretty neat. How about this to allow minimizing allocations:

macro_rules! path {
    ( $($segment:expr),+ ) => {{
        let mut path = ::std::path::PathBuf::new();
        $(path.push($segment);)*
        path
    }};
    ( $($segment:expr),+; capacity = $n:expr ) => {{
        let mut path = ::std::path::PathBuf::with_capacity($n);
        $(path.push($segment);)*
        path
    }};
}

Usage:

    let root: PathBuf = foo();
    let dir = "forks";
    path!(root, dir; capacity = root.as_os_str().len() + dir.len())

    // OR

    path!(root, dir)

I would not say that just push is always the expected result for path / ..: /foo/bar/.. may very well resolve to /baz, because /foo/bar is actually a symlink to /baz/bar. Probably better to leave this out.

I believe {Path, PathBuf} don't currently try to resolve either, favoring the verbatim path indicated by the code and leaving the resolution to other code.

If I'm correct in that assertion, then I don't see the issue with the path! macro as defined above.


Path and PathBuf simply treat paths as segments, without resolution, since resolving on every push (or any change in general) would require checking the filesystem for symlinks etc.

Normalization is performed with std::fs::canonicalize for that matter.
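A small sketch of that split between lexical handling and filesystem resolution. The helper names are made up, and the example assumes the current directory exists and is readable when it runs:

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Returns true if the path still contains a literal `..` component.
fn has_parent_dir(p: &Path) -> bool {
    p.components().any(|c| c == Component::ParentDir)
}

/// Asks the OS to resolve the path; fails if the path does not exist.
fn resolve(p: &Path) -> io::Result<PathBuf> {
    std::fs::canonicalize(p)
}

fn main() -> io::Result<()> {
    // Purely lexical: the `..` is kept as written, no filesystem access happens.
    let lexical = Path::new("/foo/bar").join("..");
    assert!(has_parent_dir(&lexical));

    // Resolution goes through the filesystem and yields an absolute path.
    let resolved = resolve(Path::new("."))?;
    assert!(resolved.is_absolute());
    Ok(())
}
```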

Path and PathBuf treat paths as an opaque blob of bytes until you call a specific function (which should document its behaviour). Most of these functions are in some way based on iterating components but things like push essentially just glue bytes together. It's why \0 in paths is valid. Or why you can push . filenames or have trailing slashes even though components ignores these.
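The gap between the stored bytes and the component view can be seen in a short sketch (component_count is a made-up helper):

```rust
use std::path::{Path, PathBuf};

/// Counts the components that iteration reports for a path.
fn component_count(p: &Path) -> usize {
    p.components().count()
}

fn main() {
    let mut p = PathBuf::from("a");
    p.push("."); // glued on verbatim

    // The stored string differs from plain "a"...
    assert_ne!(p.as_os_str(), Path::new("a").as_os_str());
    // ...but component iteration skips the non-leading `.`,
    // and path equality is defined in terms of components.
    assert_eq!(component_count(&p), 1);
    assert_eq!(p.as_path(), Path::new("a"));

    // A trailing slash is ignored by `components` in the same way.
    assert_eq!(component_count(Path::new("a/b/")), 2);
    assert_eq!(Path::new("a/b/"), Path::new("a/b"));
}
```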

Filesystem functions (some of which are implemented on Path) can return a new PathBuf but that is just whatever the OS returns after we call the relevant function.


Personally, I feel there's an opportunity with a path! macro to do some normalization or to raise a compile error on "weird" paths. Though making it suitable for cross-platform code might be an interesting problem.
