And TryFrom<&[u8]> for &str, TryFrom<Box<[u8]>> for Box<str>.
I first noticed that there is no direct way to convert Box<[u8]> to Box<str>, although there are String::from_utf8 and core::str::from_utf8. Then I looked at TryFrom and found that none of these conversions are implemented.
There is alloc::str::from_boxed_utf8_unchecked, but no safe version.
(Just to make sure you considered it, you can losslessly go Box<[u8]> -> Vec<u8> -> String -> Box<str>. But that certainly isn’t “direct”.)
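A minimal sketch of that indirect route using only stable std APIs (the helper name boxed_bytes_to_str is hypothetical). It already has the shape of the proposed safe function and reuses the allocation: into_vec converts without copying, and into_boxed_str should not need to reallocate because the capacity already equals the length.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: Box<[u8]> -> Vec<u8> -> String -> Box<str>.
/// On invalid UTF-8 the bytes come back inside FromUtf8Error (as a Vec<u8>).
fn boxed_bytes_to_str(bytes: Box<[u8]>) -> Result<Box<str>, FromUtf8Error> {
    // `into_vec` reuses the boxed allocation; capacity equals length.
    let vec: Vec<u8> = bytes.into_vec();
    // Validate; on failure the original Vec is kept inside the error.
    let string = String::from_utf8(vec)?;
    // Capacity already equals length, so shrinking should be a no-op.
    Ok(string.into_boxed_str())
}

fn main() {
    let ok: Box<[u8]> = Box::from(&b"hello"[..]);
    assert_eq!(&*boxed_bytes_to_str(ok).unwrap(), "hello");

    let bad: Box<[u8]> = vec![0xFF, 0xFE].into_boxed_slice();
    let err = boxed_bytes_to_str(bad).unwrap_err();
    // The caller can recover the rejected bytes.
    assert_eq!(err.into_bytes(), vec![0xFF, 0xFE]);
}
```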
Everything in the universe can be implemented with TryFrom, but that doesn't mean it should be.
Just like how &str -> &[u8] is called .as_bytes(), not .into(), it's intentional that this isn't done via TryFrom. In particular, it's important to distinguish it from "look for a BOM and try to decode" and various other such conversions.
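To make the point concrete: both functions below are plausible ways to turn bytes into a &str, yet they disagree on input that starts with a UTF-8 byte-order mark. decode_utf8_strip_bom is a hypothetical helper written for this illustration, not a std API.

```rust
use std::str::Utf8Error;

/// Hypothetical BOM-aware decoder, for contrast with `str::from_utf8`.
/// It strips a leading UTF-8 byte-order mark (EF BB BF) before validating.
fn decode_utf8_strip_bom(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let without_bom = bytes.strip_prefix(b"\xEF\xBB\xBF").unwrap_or(bytes);
    std::str::from_utf8(without_bom)
}

fn main() {
    let bytes = b"\xEF\xBB\xBFhi";
    // Strict validation keeps the BOM as U+FEFF at the start of the string.
    assert_eq!(std::str::from_utf8(bytes).unwrap(), "\u{FEFF}hi");
    // The BOM-aware decoder drops it.
    assert_eq!(decode_utf8_strip_bom(bytes).unwrap(), "hi");
}
```

Neither answer is wrong; they are different conversions, which is why a single TryFrom impl would have to pick one silently.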
There doesn't seem to be a TryFrom impl for Vec<u8> -> String either, so that would be a bit more of an API addition than just another function.
I would think a PR to add a non-unchecked version of std::str::from_boxed_utf8_unchecked should be an easy accept, and could be stabilized after riding the trains for a cycle or two.
... Actually, the relevant difference is that a Box-converting version wants to return the box on validation failure, whereas the borrowing versions don't need to do this. FromUtf8Error returns the bytes as a Vec, which wouldn't require an extra copy (a Box<[u8]> converts to a Vec<u8> in place), but it's an extra axis of API complexity to decide on.
I'd probably argue for just reusing the existing FromUtf8Error, e.g.:
pub fn from_boxed_utf8(v: Box<[u8]>) -> Result<Box<str>, FromUtf8Error> {
    match run_utf8_validation(&v) {
        Ok(_) => {
            // SAFETY: validation succeeded.
            Ok(unsafe { from_boxed_utf8_unchecked(v) })
        }
        // Hand the original allocation back as a Vec<u8> (no copy).
        Err(error) => Err(FromUtf8Error { bytes: v.into(), error }),
    }
}
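If the box-returning variant were preferred instead, it might look like the sketch below, written outside std with stable APIs only (std::str::from_utf8 plus the stable std::str::from_boxed_utf8_unchecked). FromBoxedUtf8Error and its methods are hypothetical names, not part of std; the cost is exactly the extra API axis mentioned above, a second error type that mostly duplicates FromUtf8Error.

```rust
use std::fmt;
use std::str::Utf8Error;

/// Hypothetical error type (not in std): returns the original Box<[u8]>
/// on failure instead of converting it into a Vec<u8>.
#[derive(Debug)]
pub struct FromBoxedUtf8Error {
    bytes: Box<[u8]>,
    error: Utf8Error,
}

impl FromBoxedUtf8Error {
    /// Recover the rejected bytes without reallocating.
    pub fn into_boxed_bytes(self) -> Box<[u8]> {
        self.bytes
    }
    /// Details about where validation failed.
    pub fn utf8_error(&self) -> Utf8Error {
        self.error
    }
}

impl fmt::Display for FromBoxedUtf8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.error, f)
    }
}

impl std::error::Error for FromBoxedUtf8Error {}

/// Safe wrapper over `std::str::from_boxed_utf8_unchecked`.
pub fn from_boxed_utf8(bytes: Box<[u8]>) -> Result<Box<str>, FromBoxedUtf8Error> {
    // `.err()` yields Option<Utf8Error>, which does not borrow `bytes`.
    match std::str::from_utf8(&bytes).err() {
        // SAFETY: `from_utf8` reported no error, so `bytes` is valid UTF-8.
        None => Ok(unsafe { std::str::from_boxed_utf8_unchecked(bytes) }),
        Some(error) => Err(FromBoxedUtf8Error { bytes, error }),
    }
}

fn main() {
    let good: Box<[u8]> = Box::from(&b"ok"[..]);
    assert_eq!(&*from_boxed_utf8(good).unwrap(), "ok");

    let bad: Box<[u8]> = Box::from(&b"a\xFFb"[..]);
    let err = from_boxed_utf8(bad).unwrap_err();
    assert_eq!(err.utf8_error().valid_up_to(), 1);
    assert_eq!(&*err.into_boxed_bytes(), &b"a\xFFb"[..]);
}
```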
It's also called .as_ref(). I think that in an ideal world, AsRef impls should imply the corresponding From. (Recent Zulip discussion)
I think that From impls should imply the other-direction TryFrom, and thus, contrapositively, since I'm not a fan of TryFrom<&[u8]> for &str, I don't think that the other direction should be a From either.
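The standard library already pairs conversions this way for some primitives, which may help illustrate the claim: a total, lossless From in one direction alongside a fallible TryFrom in the other. The examples below use only existing std impls (From<char> for u32, TryFrom<u32> for char, From<u8> for u32, TryFrom<u32> for u8).

```rust
use std::convert::TryFrom; // already in the 2021 prelude; needed on older editions

fn main() {
    // Total, lossless direction: every char is a u32 scalar value.
    assert_eq!(u32::from('A'), 65);
    // Partial inverse: surrogates and values above 0x10FFFF are not chars.
    assert_eq!(char::try_from(65u32), Ok('A'));
    assert!(char::try_from(0xD800u32).is_err());

    // Same shape for integers: From<u8> for u32, TryFrom<u32> for u8.
    assert_eq!(u32::from(200u8), 200);
    assert!(u8::try_from(300u32).is_err());
}
```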
The key observation is that while there's only one way to go &str -> &[u8] (inspect the bytes), there are multiple ways to go &[u8] -> String[1] (what encoding are the bytes in?). A From implementation should be total and ideally information-preserving (injective), but while that implies that a (non-total) inverse function exists, it doesn't necessarily imply that the inverse is a natural mapping the way the forward function is. It certainly suggests that it probably would be, but that's not always the case.
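A small illustration of both halves of that observation, using only std plus one hypothetical helper (decode_latin1): the &str -> &[u8] direction round-trips, while the same bytes admit several reasonable String decodings.

```rust
/// Hypothetical ISO-8859-1 (Latin-1) decoder: each byte maps to the
/// code point with the same value, so it never fails.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    let s = "café";
    // &str -> &[u8]: one obvious, total, injective mapping.
    let bytes: &[u8] = s.as_bytes();
    // Its inverse is partial, but on these bytes it round-trips.
    assert_eq!(std::str::from_utf8(bytes), Ok(s));

    // "café" encoded as Latin-1 is a different byte sequence...
    let latin1: &[u8] = b"caf\xE9";
    // ...which strict UTF-8 rejects,
    assert!(String::from_utf8(latin1.to_vec()).is_err());
    // which lossy UTF-8 decodes with U+FFFD in place of the bad byte,
    assert_eq!(String::from_utf8_lossy(latin1), "caf\u{FFFD}");
    // and which a Latin-1 decoder turns back into the original text.
    assert_eq!(decode_latin1(latin1), "café");
}
```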
This kind of case, where you're converting to/from a canonical underlying encoding or some other interchange format, is basically the only one I can think of where one direction is "inherently natural" but the other isn't. Most other cases (e.g. the scalar part of a measure vector) are sufficiently ambiguous that a [Try]From implementation would be questionable in either direction. If &str weren't so primitive in exposing its representation directly, and instead offered it through something like encode_utf8, it'd clearly be the same way, but &str is at a fundamental level "just" a fancy &[u8].
[1] There's only one way to go &[u8] -> &str, due to the "already present data" requirement of borrowing. If you allow creating an owned string, however, it could decode from any encoding, be lossy or not, etc.
An additional question: why is str a primitive type rather than a simple struct str { bytes: [u8] }? It doesn't seem to be anything special or magic, not even as special as the types under "Special types and traits" in the Reference, which can be defined directly in std, and the UTF-8 guarantee is enforced at the API level (same as String).
That has been tried multiple times; the latest is
The primary blocker afaiu is compiler performance.
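For comparison, the struct str { bytes: [u8] } idea from the question can be approximated in library code today, which supports the point that the UTF-8 invariant itself needs no compiler support. This is a minimal sketch, not how std defines str; MyStr is a hypothetical name, and the pointer cast follows the same repr(transparent) pattern std uses for wrappers like Path over OsStr. What it does not cover is literals: b"hello" below still goes through the checked constructor at runtime.

```rust
use std::fmt;

/// Hypothetical library-level string slice: a transparent wrapper
/// around `[u8]` whose constructor enforces UTF-8, like `str` does.
#[repr(transparent)]
pub struct MyStr {
    bytes: [u8],
}

impl MyStr {
    /// Checked constructor; the only way to obtain a `&MyStr`.
    pub fn from_utf8(bytes: &[u8]) -> Result<&MyStr, std::str::Utf8Error> {
        std::str::from_utf8(bytes)?;
        // SAFETY: `MyStr` is `repr(transparent)` over `[u8]`, so the
        // fat-pointer cast keeps the same address and length metadata.
        Ok(unsafe { &*(bytes as *const [u8] as *const MyStr) })
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: `from_utf8` is the only constructor and it validated UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}

impl fmt::Display for MyStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

fn main() {
    let s = MyStr::from_utf8(b"hello").unwrap();
    assert_eq!(s.as_bytes().len(), 5);
    assert_eq!(s.to_string(), "hello");
    assert!(MyStr::from_utf8(b"\xFF").is_err());
}
```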