Add zero-cost builder methods to PathBuf

SUPERCILEX · September 10, 2021, 11:22pm

The problem

If a user wants to create a path using a builder pattern (path.join(a).join(b).join(c)), they are inadvertently copying the string buffer on every call to join. This means the easiest way to write the code is also the least efficient. The efficient version takes more effort and requires the developer to either read the source code or look at the docs to learn that join allocates memory (so they have to use push instead):

{
  // tmp must be mutable because push modifies the buffer in place
  let mut tmp = path.join(a);
  tmp.push(b);
  tmp.push(c);
  tmp
}

Underlying issue

join actually comes from Path and allocates a new PathBuf. Calling PathBuf::join will deref coerce down to a Path and then come back with a newly allocated PathBuf.
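As a rough illustration of the difference, here is a minimal sketch comparing the two styles. The function names build_with_join and build_with_push are made up for this example; only Path::join and PathBuf::push come from std.

```rust
use std::path::{Path, PathBuf};

/// Builder-style chain: every call resolves to `Path::join(&self, ..)`,
/// which returns a newly allocated `PathBuf` each time.
fn build_with_join(base: &Path) -> PathBuf {
    base.join("a").join("b").join("c")
}

/// One allocation for the first `join`, then in-place pushes.
/// (A push can still reallocate if the buffer runs out of capacity.)
fn build_with_push(base: &Path) -> PathBuf {
    let mut tmp = base.join("a");
    tmp.push("b");
    tmp.push("c");
    tmp
}
```

Both functions produce the same path; they differ only in how many intermediate buffers get created along the way.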

Potential solutions

New methods

Add new methods to PathBuf such as PathBuf::append, PathBuf::using_extension, and PathBuf::using_file_name.

I don't like this because it still means you have to understand the difference between join/append and why one might be better than the other.
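For illustration, the three proposed methods can be approximated today with an extension trait. This is only a sketch: the trait name PathBufBuilderExt is hypothetical, and the method names are the ones suggested above, not anything that exists in std.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical extension trait with consuming builder methods.
trait PathBufBuilderExt: Sized {
    fn append<P: AsRef<Path>>(self, segment: P) -> Self;
    fn using_extension<S: AsRef<OsStr>>(self, extension: S) -> Self;
    fn using_file_name<S: AsRef<OsStr>>(self, file_name: S) -> Self;
}

impl PathBufBuilderExt for PathBuf {
    fn append<P: AsRef<Path>>(mut self, segment: P) -> Self {
        // Reuses the existing buffer instead of allocating a new PathBuf.
        self.push(segment);
        self
    }

    fn using_extension<S: AsRef<OsStr>>(mut self, extension: S) -> Self {
        // set_extension returns false when there is no file name; ignored here.
        self.set_extension(extension);
        self
    }

    fn using_file_name<S: AsRef<OsStr>>(mut self, file_name: S) -> Self {
        self.set_file_name(file_name);
        self
    }
}
```

With the trait in scope, PathBuf::from("dir").append("file.txt").using_extension("md") takes the buffer by value and hands it back, so no tmp variable is needed.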

Break people's code (maybe in the next edition?)

Add new methods to PathBuf called PathBuf::join, PathBuf::with_extension, and PathBuf::with_file_name. This will break people's code if they depend on PathBuf::join creating a new owned instance. Going back to old behavior is very easy: (*path).join instead of path.join.

Is a stdlib change like this possible? How would it work?

Original discussion

cuviper · September 10, 2021, 11:40pm

PathBuf implements FromIterator, so PathBuf::from_iter([a, b, c]) should work.
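A minimal sketch of that suggestion, assuming all segments have the same type (here &str). The helper name from_segments is made up for the example.

```rust
// FromIterator is in the prelude from the 2021 edition on; the explicit
// import keeps the snippet working on earlier editions too.
use std::iter::FromIterator;
use std::path::PathBuf;

/// Builds a path from three segments via PathBuf's FromIterator impl.
fn from_segments(a: &str, b: &str, c: &str) -> PathBuf {
    PathBuf::from_iter([a, b, c])
}
```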

mbrubeck · September 11, 2021, 12:02am

Finally, a practical reason to implement Div for PathBuf, so we can join paths with a / b / c.

(Only slightly less-controversially, we could implement Add for PathBuf and write this as a + b + c, as we already do for String.)

Note that in many cases the efficiency benefit of repeated push over join is small or even zero, since each push might also trigger a reallocation and copy. I expect the difference is rarely important in practice except when the number of join calls is fairly high.
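When the segments are known up front, the reallocation concern can be sidestepped by reserving space first. The following is a sketch under the assumption that every segment is relative; join_preallocated is a hypothetical helper, while PathBuf::with_capacity is a real std method.

```rust
use std::path::{Path, PathBuf};

/// Reserves enough space up front so the later pushes fit in the buffer.
fn join_preallocated(base: &Path, segments: &[&str]) -> PathBuf {
    // Base length, plus one separator byte and the segment text per push.
    let needed = base.as_os_str().len()
        + segments.iter().map(|s| s.len() + 1).sum::<usize>();
    let mut buf = PathBuf::with_capacity(needed);
    buf.push(base);
    for segment in segments {
        buf.push(segment);
    }
    buf
}
```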

scottmcm · September 11, 2021, 12:08am

Could we perhaps impl Concat (from std::slice) for Paths? Or maybe for Components?

After all, [a, b, c].concat() is arguably the "best" way to do a.to_owned() + b + c for strings. (See "Best way to do string concatenation in 2019 (Status quo)", post #5 by scottmcm, on The Rust Programming Language Forum.)

Tom-Phinney · September 11, 2021, 12:29am

Leading to such beauties as a / "/" / c.
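To experiment with the operator idea outside of std, a newtype is needed, because the orphan rule forbids implementing Div for PathBuf in user code. The wrapper name Slash below is hypothetical; the sketch simply forwards to PathBuf::push.

```rust
use std::ops::Div;
use std::path::{Path, PathBuf};

/// Hypothetical wrapper so that `/` can be implemented in user code.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Slash(PathBuf);

impl<P: AsRef<Path>> Div<P> for Slash {
    type Output = Slash;

    fn div(mut self, rhs: P) -> Slash {
        // push semantics apply: an absolute rhs replaces the whole path.
        self.0.push(rhs);
        self
    }
}
```

Because this is push underneath, Slash(PathBuf::from("a")) / "/" / "c" on a Unix-like system ends up as /c: pushing the absolute path "/" replaces everything accumulated so far.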

CAD97 · September 11, 2021, 2:27am

I believe it's possible to write e.g. path.extend([a, b, c]) if you have PathBuf to push multiple path fragments in one go, which is another option to use.
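A small sketch of what that looks like, wrapped in a function so the buffer is handed back to the caller. The name extend_owned is made up; Extend is implemented for PathBuf in std.

```rust
use std::path::PathBuf;

/// Pushes every segment onto an owned PathBuf and returns it.
fn extend_owned(mut base: PathBuf, segments: &[&str]) -> PathBuf {
    // Extend::extend pushes each item in turn, reusing base's buffer.
    base.extend(segments);
    base
}
```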

ibraheemdev · September 11, 2021, 3:28am

(Only slightly less-controversially, we could implement Add for PathBuf and write this as a + b + c, as we already do for String.)

This is probably much more controversial

SkiFire13 · September 12, 2021, 8:44am

This is not ideal in the case where a is already a PathBuf but b and c are just &Paths or &strs.

But that's not much better than repeated pushes, since it doesn't return the PathBuf, so you still have to use the tmp variable.

hyeonu · September 12, 2021, 2:48pm

A Kotlin-style "with" function would be a nice alternative to the builder pattern, even without Kotlin's fancy syntactic support.

fn with<T, F: FnOnce(&mut T)>(mut base: T, modify: F) -> T {
    modify(&mut base);
    base
}

let buf = with(PathBuf::new(), |buf| {
    buf.extend([a, b, c]);
});

CAD97 · September 12, 2021, 5:38pm

ZyX-I · September 13, 2021, 12:00am

Or you could implement both Div and Add, with the former using push and the latter just appending to the string. (BTW, I don't see how to do the latter conveniently today, and I happened to need exactly that a week ago: appending an extension to a file name that may already have one.)
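One way to do the plain string append with today's std is to go through OsString. This is a sketch, and append_suffix is a hypothetical helper name; into_os_string, OsString::push and PathBuf::from are existing std APIs.

```rust
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;

/// Appends raw text to the end of a path without inserting a separator
/// and without replacing an existing extension.
fn append_suffix<S: AsRef<OsStr>>(path: PathBuf, suffix: S) -> PathBuf {
    // Takes over the path's buffer, so no copy is made here.
    let mut raw: OsString = path.into_os_string();
    raw.push(suffix);
    PathBuf::from(raw)
}
```

For example, append_suffix(PathBuf::from("archive.tar"), ".gz") gives archive.tar.gz, whereas set_extension("gz") would replace .tar.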

bill_myers · September 14, 2021, 4:40pm

Maybe join, etc. could be added to PathBuf and hidden in pre-2021 editions?

SUPERCILEX · September 16, 2021, 4:46am

Is that possible? Are there docs I can look at to understand how conditionally adding code by edition would work?

I think that's the best option given that the array syntax requires each path segment to be of the same type. And while the tap library looks pretty sweet (thanks for sharing!), it requires knowing how the path APIs are implemented to seek something like that out.
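To make the same-type restriction concrete: with mixed segment types, each element has to be converted to a common type such as &Path before it can go into an array. A sketch, with mixed_segments as a made-up helper name:

```rust
use std::path::{Path, PathBuf};

/// Joins segments of three different types by viewing each one as &Path.
fn mixed_segments(root: PathBuf, dir: &str, file: String) -> PathBuf {
    // Note: this collects into a new PathBuf; root's buffer is not reused.
    [root.as_path(), Path::new(dir), Path::new(&file)]
        .iter()
        .collect()
}
```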

chris-morgan · September 17, 2021, 1:01am

But the truly beautiful application of Div is ..:

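use std::ops::{Div, RangeFull};
use std::path::PathBuf;

// Would have to live in std: neither Div nor PathBuf is local to user crates.
impl Div<RangeFull> for PathBuf {
    type Output = PathBuf;
    // `mut self` is required because pop modifies the buffer in place
    fn div(mut self, _: RangeFull) -> PathBuf {
        self.pop();
        self
    }
}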

ClementNerma · September 17, 2021, 10:37am

Maybe it's a dumb idea, but why not create a path! macro, just like how vec! works? That would allow something like:

let path = path![ "dir1", "dir2", "file" ];

Here is how it could be implemented:

macro_rules! path {
    [ $($segment:expr),+ ] => {{
        let mut path = ::std::path::PathBuf::new();
        $(path.push($segment);)*
        path
    }}
}

SUPERCILEX · September 19, 2021, 12:36am

That's pretty neat. How about this to allow minimizing allocations:

macro_rules! path {
    ( $($segment:expr),+ ) => {{
        let mut path = ::std::path::PathBuf::new();
        $(path.push($segment);)*
        path
    }};
    ( $($segment:expr),+; capacity = $n:expr ) => {{
        let mut path = ::std::path::PathBuf::with_capacity($n);
        $(path.push($segment);)*
        path
    }};
}

Usage:

    let root: PathBuf = foo();
    let dir = "forks";
    // + 1 leaves room for the separator that push inserts between root and dir
    path!(root, dir; capacity = root.as_os_str().len() + 1 + dir.len())

    // OR

    path!(root, dir)

ZyX-I · September 19, 2021, 12:50am

I would not say that a plain push is always the expected result for path / ..: /foo/bar/.. may very well resolve to /baz, because /foo/bar is actually a symlink to /baz/bar. It's probably better to leave this out.

jjpe · September 19, 2021, 7:46am

I believe {Path, PathBuf} don't currently try to resolve either, favoring the verbatim path indicated by the code and leaving the resolution to other code.

If I'm correct in that assertion, then I don't see the issue with the path! macro as defined above.

ClementNerma · September 19, 2021, 8:29am

Path and PathBuf simply treat paths as sequences of segments, without resolution, since every push (or any change in general) would otherwise require checking the filesystem for symlinks etc.

Normalization is performed with std::fs::canonicalize for that matter.
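A quick sketch of the purely lexical behaviour: pushing ".." just adds another component and never consults the filesystem. The helper names push_parent and ends_with_parent_dir are invented for the example.

```rust
use std::path::{Component, Path, PathBuf};

/// Pushes ".." without touching the filesystem.
fn push_parent(mut path: PathBuf) -> PathBuf {
    path.push("..");
    path
}

/// Reports whether the final component is a literal "..".
fn ends_with_parent_dir(path: &Path) -> bool {
    matches!(path.components().next_back(), Some(Component::ParentDir))
}
```

After push_parent, a path like foo/bar becomes foo/bar/.. with three components; collapsing it against the real directory tree is left to std::fs::canonicalize.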

chrisd · September 19, 2021, 9:21am

Path and PathBuf treat paths as an opaque blob of bytes until you call a specific function (which should document its behaviour). Most of these functions are in some way based on iterating components but things like push essentially just glue bytes together. It's why \0 in paths is valid. Or why you can push . filenames or have trailing slashes even though components ignores these.
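The gap between the stored bytes and the component view can be seen in a short sketch (raw_and_component_count is a hypothetical helper):

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Returns the path's stored text together with its component count.
fn raw_and_component_count(path: &str) -> (&OsStr, usize) {
    let p = Path::new(path);
    // as_os_str gives back the text exactly as stored;
    // components() skips interior "." entries and a trailing separator.
    (p.as_os_str(), p.components().count())
}
```

For "a/./b/" the stored text is unchanged while only two components (a and b) are reported, and since Path equality is defined over components, it compares equal to "a/b".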

Filesystem functions (some of which are implemented on Path) can return a new PathBuf but that is just whatever the OS returns after we call the relevant function.

Personally I feel there's an opportunity with path! macro to do some normalization or to have a compile error on "weird" paths. Though making it suitable for cross-platform code might be an interesting problem.
