It's just annoying to have to convert a PathBuf to OsString before converting it to String.
You can use:
- PathBuf::to_str() to access a PathBuf as an Option<&str>
  - It's an Option because a system path can contain non-UTF-8 characters and &str is UTF-8.
  - If you want to handle non-UTF-8 strings, use OsString directly.
- PathBuf::to_string_lossy()
  - If you want to access the path for e.g. printing, where you don't care that the &str you get out is an exact match (invalid characters won't be printed properly anyway).
Take a look at the OsString documentation because it explains this in more detail.
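A minimal sketch of the two accessors listed above. The helper names `describe` and `display_name` are made up for illustration; both methods are defined on `Path` and are reachable from a `PathBuf` through deref.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Hypothetical helper: borrow the path as `&str`, or fall back to a
/// placeholder when the path is not valid UTF-8.
fn describe(path: &Path) -> &str {
    match path.to_str() {
        Some(s) => s,
        None => "<non-UTF-8 path>",
    }
}

/// Hypothetical helper: always yields text. Invalid sequences are replaced
/// with U+FFFD, so the result may not match the real path exactly.
fn display_name(path: &Path) -> Cow<'_, str> {
    path.to_string_lossy()
}

fn main() {
    let path = PathBuf::from("/hello/world");
    println!("{}", describe(&path));
    println!("{}", display_name(&path));
}
```

`to_string_lossy` only allocates when it actually has to replace something; for valid UTF-8 it hands back a borrowed `Cow`.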
PathBuf::into_string() wouldn't be any more useful, because with those two methods you still need two method calls. OsString::into_string() returns a Result<String, OsString> where Err is for the non-UTF-8 case anyway. It's much more cost-effective to check the encoding (with to_str) before copying an invalid String you won't be able to use later, which is, I guess, why into_string doesn't exist.
Not a real solution, I'm afraid.
I need into_string because it must be into_string:
- I need an owned string.
- I don't want to clone; both methods you suggest will clone.
- I need to error immediately on non-UTF-8, with no unnecessary work.
And given that the internal data of a PathBuf is just an OsString, a PathBuf::into_string should be zero-cost.
For small additions like this, you can submit an API change proposal.
I don't think there would be any major objections to
impl PathBuf {
    pub fn into_string(self) -> Result<String, PathBuf> {
        self.into_os_string().into_string().map_err(Into::into)
    }
}
as it's just a small helper to make already possible functionality easier to access. There may be some objection that this isn't an operation you "should" be doing (or that you "should" be using camino for known-UTF-8 paths) and that the longer spelling pushes you towards doing the "right" thing more often, but it's already trivial to panic on invalid Unicode paths without one more Result-returning method.
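Until something like that lands in std, the same chain can be wrapped in an extension trait in user code, since an inherent impl on PathBuf can't be written outside the standard library. A sketch only: the trait name `PathBufExt` is hypothetical and not part of std.

```rust
use std::path::PathBuf;

/// Hypothetical extension trait; not part of the standard library.
trait PathBufExt {
    /// Consumes the path. On non-UTF-8 data the original path is handed back.
    fn into_string(self) -> Result<String, PathBuf>;
}

impl PathBufExt for PathBuf {
    fn into_string(self) -> Result<String, PathBuf> {
        // Both steps take `self` by value, so no `clone()` is called here.
        self.into_os_string().into_string().map_err(PathBuf::from)
    }
}

fn main() {
    let path = PathBuf::from("/hello/world");
    match path.into_string() {
        Ok(s) => println!("owned string: {}", s),
        Err(p) => eprintln!("not UTF-8: {}", p.display()),
    }
}
```

Returning the PathBuf in the Err case mirrors what OsString::into_string does with OsString, so the caller can still use the path after a failed conversion.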
Thank you for pointing me to camino, I will consider using it in my project. But for simple code with minimal dependencies, an into_string method is tremendously helpful.
I wouldn't buy the argument that it isn't what we should be doing, as more expensive APIs (such as to_str or to_string_lossy) are currently more convenient to call.
a more expensive APIs (such as to_str or to_string_lossy) are currently more convenient to call.
OsString::into_string() does clone: playground.
All 3 paths do basically the same amount of allocation and copying and just allow handling the error at different points. OsString and String have different guarantees about stored data and internal representation, which means the data has to be moved to turn one into another. The only difference is that into_* explicitly drops self.
Use path_buf.to_str().unwrap().to_string() and create a utility trait extending PathBuf if you do it often enough that you'd save a lot of time by calling just a single method. If you're not going to keep the OsString when the operation fails (i.e. you unwrap), I believe to_str might be clearer.
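For reference, the to_str route spelled out, next to a non-panicking variant. This is a sketch; `to_owned_string` is a hypothetical helper name. Note that this route borrows the path and then copies the bytes into a fresh String.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: copies the path into a new `String`,
/// or returns `None` if the path is not valid UTF-8.
fn to_owned_string(path: &Path) -> Option<String> {
    path.to_str().map(str::to_owned)
}

fn main() {
    let path_buf = PathBuf::from("/hello/world");
    // Panicking form suggested above.
    let s = path_buf.to_str().unwrap().to_string();
    // Non-panicking form; `path_buf` is only borrowed, so it stays usable.
    let t = to_owned_string(&path_buf);
    assert_eq!(Some(s), t);
    println!("{}", path_buf.display());
}
```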
This playground is wrong. addr_of!(a) is the address of the variable holding the PathBuf or Result<String, …>, respectively, not the address of the data.
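use std::path::PathBuf;

pub fn main() {
    let a = PathBuf::from("/hello/world");
    // Address of the heap data behind the PathBuf.
    println!("a: {:p}", a.as_os_str());
    let a = a.into_os_string().into_string();
    // Address of the heap data behind the resulting String.
    println!("b: {:p}", a.as_ref().unwrap().as_str());
}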
a: 0x5609d34bf9d0
b: 0x5609d34bf9d0
OsString/OsStr (and PathBuf/Path) do have guarantees of internal data representation: if their data is a valid Unicode string, then it's represented in UTF-8, just like String/str, as evidenced by the existence of the to_str methods (for Path[Buf], for OsStr[ing]) returning a borrowed str, i.e. pointing to data that already existed in this format.
pub fn to_str(&self) -> Option<&str>
Though the existence of this method does not by itself prove conclusively that converting between owned String and OsString/PathBuf can happen without cloning, it is the case that it can, as the fixed playground demonstrates. For OsString: From<String>, this fact is even documented. Of course the other way does still involve a linear scan in order to validate all the data, which is also why the method is called to_str, not as_str, as it's not a cheap constant-time operation. So converting OsString or PathBuf to String is not a lot cheaper than it would be to copy the data to a new allocation, but still, it doesn't copy. (Converting the other way is cheap though.)
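The same pointer check can be done for the full round trip, String to PathBuf and back. A sketch; `round_trip_ptrs` is a hypothetical helper, and the unchanged pointer reflects how the current std implementation behaves rather than something every step spells out in its docs.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: sends a `String` through `OsString` and `PathBuf`
/// and back, returning the heap pointer before and after the trip.
fn round_trip_ptrs(s: String) -> (*const u8, *const u8) {
    let before = s.as_ptr();
    // Documented as not allocating or copying.
    let os: OsString = s.into();
    let path = PathBuf::from(os);
    // `into_string` validates UTF-8 with a linear scan over the bytes.
    let back = path
        .into_os_string()
        .into_string()
        .expect("input was valid UTF-8");
    (before, back.as_ptr())
}

fn main() {
    let (before, after) = round_trip_ptrs(String::from("/hello/world"));
    println!("before: {:p}", before);
    println!("after:  {:p}", after);
    println!("same buffer: {}", before == after);
}
```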
The byte encoding is an unspecified, platform-specific, self-synchronizing superset of UTF-8. By being a self-synchronizing superset of UTF-8, this encoding is also a superset of 7-bit ASCII.
In practice, this unspecified encoding is currently WTF-8.
To be super clear, that's only true for Windows (and, to reiterate, it's not currently a stable guarantee). Other platforms may or may not have their own UTF-8 superset encoding. On POSIX systems this is usually arbitrary bytes which, for the sake of simplicity, are assumed to be UTF-8-like by default. You can do a manual encoding/decoding using (for example) the C locale or whatever.
Btw, Windows itself only really guarantees valid Unicode for paths. That paths may not be validated is an implementation detail.
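To make the "arbitrary bytes" case concrete, here is a Unix-only sketch that builds a path containing a byte that can never occur in UTF-8 and runs it through the three conversions discussed in this thread. `non_utf8_path` is a hypothetical helper; on other platforms the example compiles to an empty main.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: a path whose bytes are not valid UTF-8 (Unix only).
#[cfg(unix)]
fn non_utf8_path() -> PathBuf {
    use std::os::unix::ffi::OsStringExt;
    // 0xFF never appears in well-formed UTF-8.
    PathBuf::from(OsString::from_vec(vec![b'/', b'a', 0xFF]))
}

#[cfg(unix)]
fn main() {
    let path = non_utf8_path();
    // Strict borrow: fails on the invalid byte.
    assert!(path.to_str().is_none());
    // Lossy view: the invalid byte is replaced with U+FFFD.
    println!("{}", path.to_string_lossy());
    // Owned conversion: the original OsString comes back in `Err`.
    let err = path.into_os_string().into_string().unwrap_err();
    println!("{:?}", err);
}

#[cfg(not(unix))]
fn main() {}
```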