PathBuf has set_extension but no add_extension. Cannot cleanly turn .tar to .tar.gz

As discussed in "Append an additional extension?" (post #11 by dpc, help category, The Rust Programming Language Forum), dandyvica wrote:

There's a set_extension, which replaces an extension, but no add_extension. For example:

    use std::path::PathBuf;

    fn main() {
        let mut path = PathBuf::new();

        path.push("/var/log");
        path.push("kern.log");
        // set_extension replaces the existing "log" extension instead of appending.
        path.set_extension("gz");

        println!("{:?}", path);
    }

prints out /var/log/kern.gz and not /var/log/kern.log.gz.

As far as I can see, there is no way to do it with PathBuf. Does it make sense to add PathBuf::add_extension?

I don't think this needs to be extension-specific. It'd be nice to have a general method to append to the filename, and then you could just append .gz. (Rust doesn't run on any targets where filename extensions use anything other than . as the extension separator, and no new such targets seem likely to arise, so abstracting the . doesn't seem necessary.)


There have also been file systems where you could have no more than one dot in a filename, but all of the examples I know of are thoroughly obsolete. I doubt we need to worry about them, with the possible exception of ISO 9660 level 1 (used as an archival format by some medical records standards).

I agree with a general method to append to the file name. It would also be nice to be able to strip trailing directory separators (/).
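A minimal sketch of what such a general "append to the file name" helper could look like, assuming it should also ignore trailing separators. The name `append_to_file_name` is hypothetical and not part of std; the sketch makes no attempt to handle empty paths or a bare root specially.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical helper (not in std): appends `suffix` verbatim to the last
/// component of `path`.
pub fn append_to_file_name(path: &Path, suffix: impl AsRef<OsStr>) -> PathBuf {
    // Re-collecting the components drops a trailing separator. It also
    // collapses repeated separators and drops interior `.` components.
    let trimmed: PathBuf = path.components().collect();
    let mut raw: OsString = trimmed.into_os_string();
    // Plain string-like append; the caller supplies the leading dot.
    raw.push(suffix.as_ref());
    PathBuf::from(raw)
}
```

With this, `append_to_file_name(Path::new("logs/kern.log/"), ".gz")` yields `logs/kern.log.gz` rather than `logs/kern.log/.gz`.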

set_extension already assumes . as the separator. It also doesn't care how many . you have or add, or whether you have /s embedded, for that matter. Also, setting the extension to "" means "remove one extension", not "make an empty extension", though this is undocumented. (Actually making an empty extension is still possible through some silly combinations.)

The point being, PathBuf isn't really strict on logic as-is. The current methods are pretty simplistic string-like manipulations. Something more strict on logic would be a larger effort (perhaps better suited to a crate).

I would say that if a standard library is lacking consistency, it's a reason to fix the standard library, not to create a crate for that. Especially if the issue is more or less trivial.


The standard library is consistent in assuming files have only one extension.

If there was an add_extension (which, for consistency with other types, should be push_extension?), it would imply that there's a vector of extensions. Wouldn't it be inconsistent that you can set the last extension, but not the one before it? Wouldn't it be inconsistent that you can add an extension at the end, but not insert one before other extensions?

I think the trouble is assuming either. Rust cannot know the meaning of a . in the filename. What it can do is expose functions that allow handling filenames in common ways. So to my mind that should include treating extensions as if they were a vector as well as methods for handling only the last one.

Incidentally, I think of PathBuf as mostly a "dumb" wrapper around a string-like structure which doesn't really do path normalization. So each function has its own rules for operating on an arbitrary (ASCII-compatible) string, but these rules should be more or less consistent with each other.

I'd say it attempts to be consistent in defining an extension to be the part following the final ., modulo some exceptions around names that start with a dot. I wouldn't call it an assumption per se; it was likely deliberate because knowing a file ends in .gz is more useful than knowing it ends in .could.be.anything.gz. You can also remove multiple extensions with multiple calls to set_extension("").

The overloading of set_extension("") to mean "remove an extension" leads to inconsistencies with extension().
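To make the two points above concrete, here is a small sketch that removes every extension by calling set_extension("") repeatedly. The helper name `strip_all_extensions` is made up for illustration. The inconsistency shows up with a name like `foo.`: extension() reports an empty extension for it, yet set_extension("") never produces such a name, since it removes the dot as well.

```rust
use std::path::PathBuf;

/// Hypothetical helper: strips extensions one at a time until none is left.
pub fn strip_all_extensions(path: &mut PathBuf) {
    // Each call to set_extension("") removes the final dot and whatever
    // follows it, so the file name shrinks on every iteration.
    while path.extension().is_some() {
        path.set_extension("");
    }
}
```

For example, `backup/archive.tar.gz` becomes `backup/archive`, while a dotfile such as `.bashrc` is left untouched because std does not treat its leading dot as starting an extension.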

Anyway, I think it's all moot in the context of this request if PathBuf gets either an append method or perhaps .as_mut_os_string() instead.
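For readers on a recent toolchain: as far as I know, `PathBuf::as_mut_os_string` has been available on stable since Rust 1.70, so an in-place append can be sketched as below. The function name `push_extension` is only an illustration borrowed from the naming discussion above, not a std method.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper: appends "." followed by `ext` to the raw path string.
/// It does no checking, so a path ending in a separator would gain a new
/// hidden-file component rather than a new extension.
pub fn push_extension(path: &mut PathBuf, ext: impl AsRef<OsStr>) {
    // Borrow the underlying OsString and extend it in place.
    let raw = path.as_mut_os_string();
    raw.push(".");
    raw.push(ext.as_ref());
}
```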


This works:

    path.set_extension("log.gz");

The problem is usually that I have a file with an unknown name and extension (provided as an input or from the FS) and I want to create a derivative name with an additional extension. Usually .tmp or .gz or some other compression/container format name.

It's common enough that it would be great to have it in stdlib in some form (personally I don't care about the bikeshedded details, as long as I can put it into my code and move on).
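One way to cover the unknown-extension case with today's API is to generalize the set_extension("log.gz") trick: read the current extension, join the new one onto it, and set the result. This is only a sketch; `append_extension` is a hypothetical name, and it inherits whatever set_extension does with odd inputs.

```rust
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;

/// Hypothetical helper: adds `ext` after any existing extension.
/// Returns false (leaving the path unchanged) when there is no file name.
pub fn append_extension(path: &mut PathBuf, ext: impl AsRef<OsStr>) -> bool {
    let new_ext: OsString = match path.extension() {
        // "log" + "gz" becomes "log.gz", which then replaces "log".
        Some(old) => {
            let mut joined = old.to_os_string();
            joined.push(".");
            joined.push(ext.as_ref());
            joined
        }
        // No extension yet: the new one is simply set.
        None => ext.as_ref().to_os_string(),
    };
    path.set_extension(new_ext)
}
```

Because it goes through set_extension, this variant works on the file name component, so a trailing separator on the input does not end up in the middle of the result.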


Just ran into this. I don't have a strong opinion about whether this should be in the std library, but I would like to save 30-90 minutes for the people who have this need and find this page by searching.

First of all the easy one, appending:

    use std::ffi::{OsStr, OsString};
    use std::path::{PathBuf, Path};

    /// Returns a path with a new dotted extension component appended to the end.
    /// Note: does not check if the path is a file or directory; you should do that.
    /// # Example
    /// ```
    /// use pathext::append_ext;
    /// use std::path::PathBuf;
    /// let path = PathBuf::from("foo/bar/baz.txt");
    /// if !path.is_dir() {
    ///    assert_eq!(append_ext("app", path), PathBuf::from("foo/bar/baz.txt.app"));
    /// }
    /// ```
    pub fn append_ext(ext: impl AsRef<OsStr>, path: PathBuf) -> PathBuf {
        let mut os_string: OsString = path.into();
        os_string.push(".");
        os_string.push(ext.as_ref());
        os_string.into()
    }

Prepending is significantly harder, and I couldn't find a portable way to split an OsString, so I gave up making a function that inserts something just after the first component of a dotted filename, and made something that can insert just before the last component of a dotted filename.

    /// Inserts a new extension just in front of the very last dot component of the file name.
    /// If the file name has no extension, the new one is simply appended.
    /// # Panics
    /// Panics if the path has no file name (for example `/` or `..`).
    /// # Example
    /// ```
    /// use pathext::prepend_ext;
    /// use std::path::PathBuf;
    /// let path = PathBuf::from("foo/bar/baz.txt.gz");
    /// if !path.is_dir() {
    ///    assert_eq!(prepend_ext("tmp", &path), PathBuf::from("foo/bar/baz.txt.tmp.gz"));
    /// }
    /// let no_ext = PathBuf::from("foo/bar");
    /// assert_eq!(prepend_ext("tmp", &no_ext), PathBuf::from("foo/bar.tmp"));
    /// ```
    pub fn prepend_ext(ext: impl AsRef<OsStr>, path: &Path) -> PathBuf {
        // file_stem() is None only when the path has no file name at all.
        let stem = path.file_stem().expect("path has no file name");
        // Build the new file name: stem, new extension, then the original extension (if any).
        let mut file_name: OsString = stem.to_os_string();
        file_name.push(".");
        file_name.push(ext.as_ref());
        if let Some(orig_ext) = path.extension() {
            file_name.push(".");
            file_name.push(orig_ext);
        }
        // with_file_name keeps the parent directory and swaps in the new name.
        path.with_file_name(file_name)
    }
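For the case the post gave up on (inserting right after the first component of a dotted file name), a sketch is possible if you accept a restriction to UTF-8 file names, which sidesteps the problem of splitting an OsString. The function name `insert_after_first_component` is hypothetical, and returning None for non-UTF-8 names or paths without a file name is an assumption of this sketch, not something from the thread.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: inserts `ext` after the first dotted component of the
/// file name. Returns None if there is no file name or it is not valid UTF-8.
pub fn insert_after_first_component(ext: &str, path: &Path) -> Option<PathBuf> {
    let name = path.file_name()?.to_str()?;
    // Skip a leading dot so that dotfiles like `.bashrc` count as one component.
    let search_from = if name.starts_with('.') { 1 } else { 0 };
    let new_name = match name[search_from..].find('.') {
        Some(offset) => {
            let split = search_from + offset;
            // Keep everything from the first dot onward after the new extension.
            format!("{}.{}{}", &name[..split], ext, &name[split..])
        }
        // No dot to split on: append the new extension at the end.
        None => format!("{name}.{ext}"),
    };
    Some(path.with_file_name(new_name))
}
```

With this, `foo/bar/baz.txt.gz` and `"tmp"` give `foo/bar/baz.tmp.txt.gz`.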