Why not impl TryFrom<Vec<u8>> for String?

And TryFrom<&[u8]> for &str, TryFrom<Box<[u8]>> for Box<str>.

At first I noticed that there is no direct way to convert Box<[u8]> to Box<str>, even though String::from_utf8 and core::str::from_utf8 exist. Then I looked at TryFrom and found that none of these conversions are implemented.

There is alloc::str::from_boxed_utf8_unchecked, but no safe version.

(Just to make sure you considered it, you can losslessly go Box<[u8]> -> Vec<u8> -> String -> Box<str>. But that certainly isn’t “direct”.)
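That roundtrip can be spelled out as a small helper. This is just a sketch; the function name is mine, but into_vec, String::from_utf8, and into_boxed_str are the real std APIs, and none of the steps copies the data on success:

```rust
use std::string::FromUtf8Error;

// Hypothetical helper: Box<[u8]> -> Vec<u8> -> String -> Box<str>.
fn boxed_bytes_to_boxed_str(b: Box<[u8]>) -> Result<Box<str>, FromUtf8Error> {
    let v: Vec<u8> = b.into_vec(); // reuses the allocation, no copy
    let s = String::from_utf8(v)?; // UTF-8 validation
    Ok(s.into_boxed_str()) // capacity == len here, so no reallocation
}
```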


Everything in the universe can be implemented with TryFrom, but that doesn't mean it should be.

Just like how &str -> &[u8] is called .as_bytes() rather than .into(), it's intentional that this isn't done via TryFrom. In particular, it's important to distinguish plain UTF-8 validation from things like "look for a BOM and try to decode", or various other interpretations of the bytes.
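A minimal sketch of that ambiguity, using only stable std (the helper names are mine; the BOM-stripping variant is just one of several defensible decoders):

```rust
// Two defensible &[u8] -> &str readings of the same bytes.
fn strict_utf8(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // Plain validation keeps a leading BOM as U+FEFF.
    std::str::from_utf8(bytes)
}

fn bom_stripped_utf8(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // A BOM-aware decoder gives a different, equally reasonable answer.
    std::str::from_utf8(bytes).map(|s| s.strip_prefix('\u{feff}').unwrap_or(s))
}
```

Since both conversions are reasonable, neither has a claim to being *the* TryFrom impl.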


There doesn't seem to be a TryFrom impl for Vec<u8> -> String, either, so that'd be a bit more of an API addition than just another function.

I would think a PR to add a non-unchecked version of std::str::from_boxed_utf8_unchecked should be an easy accept and stabilize after riding the trains for a cycle or two.

... Actually, the relevant difference is that a Box-converting version wants to return the box on validation failure, whereas the borrowing versions don't need to do this. FromUtf8Error already hands the bytes back as a Vec, and Box<[u8]> -> Vec<u8> is a no-copy conversion, so reusing it wouldn't cost anything extra; it's still an extra axis of API complexity to decide on, though.
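For comparison, String::from_utf8 already follows this "hand the buffer back on failure" pattern via FromUtf8Error::into_bytes; the wrapper function below is hypothetical, but the error API it exercises is real:

```rust
// On failure, the error owns the original Vec<u8>; no copy is made.
fn try_into_string(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(bytes).map_err(|e| e.into_bytes())
}
```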

I'd probably argue to just use the existing FromUtf8Error, e.g.

pub fn from_boxed_utf8(v: Box<[u8]>) -> Result<Box<str>, FromUtf8Error> {
    match run_utf8_validation(&v) {
        Ok(_) => {
            // SAFETY: validation succeeded.
            Ok(unsafe { from_boxed_utf8_unchecked(v) })
        }
        Err(error) => Err(FromUtf8Error { bytes: v.into(), error }),
    }
}

It's also called .as_ref(). I think that in an ideal world, AsRef impls should imply the corresponding From. (Recent Zulip discussion)


I think that From impls should imply the other-direction TryFrom; contrapositively, since I'm not a fan of TryFrom<&[u8]> for &str, I don't think the other direction should be a From either.


The key observation is that while there's only one way to go &str -> &[u8] (inspect the bytes), there are multiple ways to go &[u8] -> String[1] (what encoding are the bytes in?). A From implementation should be total and ideally information-preserving (injective), but while that implies that a (non-total) inverse function exists, it doesn't necessarily imply that the inverse is a natural mapping the way the forward function is. It certainly suggests that it probably would be, but that's not always the case.

Though this kind of case, where you're converting to/from a canonical underlying encoding or various other interchange formats, is basically the only one I can think of where one direction is "inherently natural" but the other isn't. Most other cases (e.g. the scalar part of a measure vector) are sufficiently ambiguous that a [Try]From implementation would be questionable in either direction. If &str weren't so primitive in exposing its representation directly, and instead offered e.g. encode_utf8, it would clearly be the same way; but at a fundamental level &str is "just" a fancy &[u8].


  1. There's only one way to go &[u8] -> &str due to the "already present data" requirement of borrowing. If you allow creating an owned string, however, it could decode from any encoding, be lossy or not, etc. ↩︎
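To make the footnote concrete, here are three owned-string readings of the same bytes (helper names are mine; the Latin-1 reading is a sketch, since std only ships the UTF-8 decoders):

```rust
// Strict UTF-8: fails on invalid bytes.
fn utf8_strict(bytes: &[u8]) -> Option<String> {
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}

// Lossy UTF-8: replaces invalid sequences with U+FFFD.
fn utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

// Latin-1: every byte maps directly to U+0000..=U+00FF, so it never fails.
fn latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

The same input yields three different (and differently shaped) results, which is exactly why no single owned conversion is canonical.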


An additional question: why is str a primitive type rather than a simple struct str { bytes: [u8] }? It doesn't seem to involve anything special or magic, not even as much as the special types and traits which can be defined directly in std, and the UTF-8 guarantee lives at the API level (same as String).


That has been tried multiple times; the latest is

The primary blocker afaiu is compiler performance.
