I was hoping for an unsafe fn(&OsStr) -> &str API (with the invariant that the contents be valid UTF-8). It came up in discussion on Twitter with kennytm that part of the reason this doesn't exist is the argument that it would expose the underlying WTF-8 encoding of OsStr, which is meant to be an implementation detail. This is also why OsStr::as_bytes is not available in a cross-platform way (only on Unix).
I have to admit I don't follow this reasoning, because we already have OsStr::to_str(&self) -> Option<&str>. Because this method returns a borrowed &str, it requires that whenever the contents are valid Unicode, the OsStr already stores them as UTF-8; in other words, the format of OsStr on every platform must be a superset of UTF-8. It seems to me that being a superset of UTF-8 (rather than, say, a superset of UTF-16 like UCS-2) is the pertinent fact about WTF-8 that we might have wanted to keep private.
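To make that point concrete, here is a minimal sketch that checks whether the &str handed back by to_str points at the OsStr's own storage. The helper name to_str_borrows_in_place is mine, purely for illustration, and the pointer comparison is only a way of observing the behavior, not something std documents.

```rust
use std::ffi::OsStr;

/// Hypothetical helper (not a std API): returns true when `to_str` yields a
/// `&str` that starts at the same address as the `OsStr` data, i.e. the
/// string is borrowed in place rather than converted into a new buffer.
fn to_str_borrows_in_place(os: &OsStr) -> bool {
    match os.to_str() {
        // Compare the start address of the OsStr with the start of the &str.
        Some(s) => std::ptr::eq(os as *const OsStr as *const u8, s.as_ptr()),
        // Not valid Unicode: there is no &str to compare against.
        None => false,
    }
}
```

If this returns true for a non-ASCII string such as "héllo", the bytes inside the OsStr must already be the UTF-8 encoding of that text, which is exactly the property the signature of to_str forces on every platform.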
So, my belief is that we already expose the only thing we would really care about hiding about WTF-8. I also note that AFAIK, we've been using WTF-8 unchanged since before 1.0 and have never had any desire to change the encoding.
So I'd propose accepting that WTF-8 is our representation on Windows, and adding methods to OsStr and OsString that convert them to their byte representation, plus methods that unsafely convert them to strings without checking that they are well-formed UTF-8. (I think the unchecked conversion should be added regardless of whether we add access to the byte representation, because an invariant of using that API safely is that you never convert a non-UTF-8 OsStr.)
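As a rough sketch of what the unchecked conversion could look like: the function name os_str_to_str_unchecked is hypothetical, and the body assumes a toolchain that provides OsStr::as_encoded_bytes (available in newer Rust releases, 1.74 and later as far as I know), which postdates this discussion. It is an illustration of the shape of the API, not a proposal for the actual implementation.

```rust
use std::ffi::OsStr;

/// Hypothetical helper, not part of std.
///
/// # Safety
///
/// The caller must guarantee that `os` contains only valid UTF-8, i.e. that
/// `os.to_str()` would have returned `Some`.
unsafe fn os_str_to_str_unchecked(os: &OsStr) -> &str {
    // SAFETY: the caller promises the contents are valid UTF-8, and the
    // encoded bytes of an OsStr are documented to be a superset of UTF-8.
    unsafe { std::str::from_utf8_unchecked(os.as_encoded_bytes()) }
}
```

Note that the safety contract never mentions WTF-8: a caller who upholds it only ever sees UTF-8, which is the same thing to_str already shows them.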
I guess one concern is that as_bytes would return different values on different platforms. One option would be to add a Windows version of OsStrExt that contains as_bytes, like the Unix version does. But a cross-platform as_bytes would be far from the only std API with platform-divergent behavior (e.g. the native-endianness methods, the specific error codes returned by IO operations, etc).
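For illustration, the per-platform extension trait idea might be sketched roughly like this. The trait OsStrBytes and its method platform_bytes are names I made up; the Unix branch uses the existing std::os::unix::ffi::OsStrExt::as_bytes, while the non-Unix branch assumes OsStr::as_encoded_bytes is available, standing in for whatever a Windows-specific as_bytes would return.

```rust
use std::ffi::OsStr;

/// Hypothetical extension trait, not part of std: exposes the
/// platform-specific byte representation of an `OsStr`.
trait OsStrBytes {
    fn platform_bytes(&self) -> &[u8];
}

#[cfg(unix)]
impl OsStrBytes for OsStr {
    fn platform_bytes(&self) -> &[u8] {
        // On Unix an OsStr is arbitrary bytes, exposed by the existing trait.
        use std::os::unix::ffi::OsStrExt;
        self.as_bytes()
    }
}

#[cfg(not(unix))]
impl OsStrBytes for OsStr {
    fn platform_bytes(&self) -> &[u8] {
        // Assumption: stand-in for a Windows `as_bytes`; the encoding here
        // is platform-specific but still a superset of UTF-8.
        self.as_encoded_bytes()
    }
}
```

For input that is valid Unicode the result is the same UTF-8 on every platform; the divergence only shows up for data that cannot be represented as a str, much like to_ne_bytes only differs across targets of different endianness.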