Idea: `clone_with_capacity`

Cloning a `Vec` trims its capacity to save space, but sometimes you need to push more items onto the clone afterwards. For example, when I build a `Vec<PathBuf>` to copy thousands of files to another directory and rename each one (mostly to a longer name than the original), reallocation is almost inevitable and wastes a lot of time. I'm not sure it's worth adding, since it's not that common after all. The same idea applies to many other containers; `Vec` is just a convenient example.

You can call `Vec::with_capacity` and then `vec.extend(&other)`, but a vector of thousands isn't very big in the first place. If anything, I suspect you'd hit more reallocation activity while changing each `PathBuf`, but even that will be dwarfed by the syscalls to make the actual filesystem changes.

For a type like `PathBuf`, you’d need `vec.extend_from_slice(&other)` because it’s not `Copy`. I’m now noticing that the documentation of `Vec::extend_from_slice` doesn't mention this difference.
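A minimal sketch of that combination, for illustration only: the helper name `clone_with_extra` and its `extra` parameter are made up here and are not an existing or proposed std API.

```rust
use std::path::PathBuf;

/// Hypothetical helper: clone `other` into a new Vec that has room for
/// `extra` more elements before it needs to reallocate.
fn clone_with_extra(other: &[PathBuf], extra: usize) -> Vec<PathBuf> {
    // Reserve space for the existing elements plus the expected growth.
    let mut vec = Vec::with_capacity(other.len() + extra);
    // `extend_from_slice` clones each element, so it works for non-Copy types.
    vec.extend_from_slice(other);
    vec
}
```

Calling something like `clone_with_extra(&old_paths, 16)` would leave room for at least 16 further pushes without growing the vector.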

Yes, I meant the reallocation of each `PathBuf`, not of the `Vec<PathBuf>`. Thanks for the reminder.

You can also use something like `let mut new_paths = Vec::with_capacity(old_paths.capacity()); new_paths.clone_from(&old_paths);` to clone `old_paths` into `new_paths` while keeping the same capacity. `clone_from` is a method on `Clone`, so it works on all `Clone` types. However, most types use the default implementation, `*self = other.clone();`, which won't reuse any memory. `Vec` does have its own implementation of `clone_from` that reuses memory, though.
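Wrapped in a function, that might look like the sketch below. The name `clone_keeping_capacity` is hypothetical, and the retained capacity comes from how `Vec::clone_from` reuses the destination's allocation rather than from anything this helper does itself.

```rust
use std::path::PathBuf;

/// Hypothetical helper: clone `old_paths` into a Vec whose capacity is at
/// least as large as the original's.
fn clone_keeping_capacity(old_paths: &Vec<PathBuf>) -> Vec<PathBuf> {
    // Allocate the destination up front with the source's full capacity.
    let mut new_paths = Vec::with_capacity(old_paths.capacity());
    // `Vec::clone_from` fills the existing allocation instead of replacing it.
    new_paths.clone_from(old_paths);
    new_paths
}
```

It takes `&Vec<PathBuf>` rather than a slice because a slice no longer knows the capacity of the vector it came from.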

In addition to its own capacity, `Vec::clone_from` uses `clone_from` on each item. `PathBuf` also optimizes `clone_from`, forwarding multiple layers down to its inner `Vec`. So you could do something like:

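use std::{iter, path::PathBuf};
// `new_path_capacity` is a placeholder for the expected length of a renamed path.
let buffers = iter::repeat_with(|| PathBuf::with_capacity(new_path_capacity));
let mut new_paths = Vec::from_iter(buffers.take(old_paths.len()));
// Each pre-sized PathBuf is overwritten via its own clone_from.
new_paths.clone_from(&old_paths);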

Or you can deal with each rename on the fly:

old_paths.iter().map(|path| {
    let new_capacity = todo!("calculate from path");
    let mut rename = PathBuf::with_capacity(new_capacity);
    rename.clone_from(path);
    // modify further...
    rename
}).collect::<Vec<_>>();
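To make the on-the-fly version concrete, here is a sketch with one possible rename rule filled in. The function `rename_all` and the "append a suffix" rule are assumptions for illustration; the real capacity calculation depends on how the names are actually built.

```rust
use std::path::PathBuf;

/// Hypothetical rename rule: append `suffix` to every path.
fn rename_all(old_paths: &[PathBuf], suffix: &str) -> Vec<PathBuf> {
    old_paths
        .iter()
        .map(|path| {
            // Size the buffer for the final name before copying anything in.
            let new_capacity = path.as_os_str().len() + suffix.len();
            let mut rename = PathBuf::with_capacity(new_capacity);
            rename.clone_from(path);
            // The append should fit in the space reserved above.
            let mut name = rename.into_os_string();
            name.push(suffix);
            PathBuf::from(name)
        })
        // `collect` sizes the outer Vec from the iterator's length hint.
        .collect()
}
```

Since the iterator over a slice reports an exact length, the outer vector should be allocated once, and each inner buffer is sized before its path is copied in.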

This makes me wonder if it's better done as an iterator-like thing instead, since collect can get the capacity right.

This is like how `my_str.to_owned() + other_str` has the same problem -- but the fix isn't a `.to_owned_with_capacity()`; it's to use `[my_str, other_str].concat()` instead, which allocates the correct amount of space up front.
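As a small sketch of the two spellings side by side (the function names `join_with_add` and `join_with_concat` are made up for the comparison):

```rust
/// Clones `my_str` first, then appends; the append may need to grow the buffer.
fn join_with_add(my_str: &str, other_str: &str) -> String {
    my_str.to_owned() + other_str
}

/// Builds the result in one step from both pieces, so the total length is
/// known before anything is allocated.
fn join_with_concat(my_str: &str, other_str: &str) -> String {
    [my_str, other_str].concat()
}
```

Both return the same string; the difference is only in how many allocations it can take to get there.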

Thanks for the alternatives. By the way, I now realize that the function I proposed relies on the original object already having spare capacity, which the original itself often has no need for.