Yes, but would probably only be practical with a SmallString-like type in alloc somewhere. And that couldn't be used in core, which might make a bunch of things awkward.
I believe @matklad was referring to the fact that a single-code-point or single-grapheme-cluster &str could be used instead, which represents no additional burden w.r.t. alloc.
That's empirically not true – people (especially beginners) wrongly default to operations over code points instead of grapheme clusters, which are usually what they would need, but they get misled by the existence of char.
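To make the pitfall concrete, here is a small sketch using only the standard library. The helper names scalar_count and naive_reverse are made up for this example; the point is just that char-based operations work on Unicode scalar values, not on what a reader perceives as one character.

```rust
/// Counts Unicode scalar values (what `char` represents),
/// not user-perceived characters (grapheme clusters).
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Reverses a string `char` by `char`, the operation people tend to reach
/// for first. Combining marks end up detached from their base character.
pub fn naive_reverse(s: &str) -> String {
    s.chars().rev().collect()
}
```

For the input "e\u{301}" (an "e" followed by a combining acute accent, displayed as a single "é"), scalar_count returns 2, and naive_reverse puts the accent before the "e". Segmenting by grapheme cluster needs Unicode data that the standard library does not ship, so it is left out here.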
You can produce char by value though, whereas &str is borrowed.
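A rough illustration of that difference, again with std only. The three function names are hypothetical and exist just for this sketch:

```rust
/// Returns an owned `char`; the result does not borrow from `s`.
pub fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Returns a one-character sub-slice; the result borrows from `s`
/// and cannot outlive it.
pub fn first_char_str(s: &str) -> Option<&str> {
    let c = s.chars().next()?;
    Some(&s[..c.len_utf8()])
}

/// Producing a fresh one-character `&str` from a `char` needs storage
/// supplied by the caller, here a 4-byte buffer.
pub fn encode<'a>(c: char, buf: &'a mut [u8; 4]) -> &'a str {
    c.encode_utf8(buf)
}
```

The char version can be returned, stored, and copied freely, while both &str versions tie the result to the lifetime of some other piece of memory.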
Halfway between "only &str" and "char and &str" would be to represent char as its UTF-8 bytes: char retains a fixed maximum size, but can also be cheaply converted to/from &str.
What is the size in bytes of this new utf8_char? Is it a fixed-length 3-byte string? Does it have 1-byte or 4-byte alignment (since the latter would be more cache-friendly)? If it's not a fixed-length string, how do you avoid it needing to be a DST?
Alignment 1 Stride 4
It'd be a newtype over [u8; 4]. Something like this:
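A minimal sketch of such a newtype, using only the standard library. The method names and the zero-padding convention are assumptions made for illustration, not an existing or proposed API:

```rust
/// Hypothetical `Utf8Char`: one Unicode scalar value stored as its UTF-8
/// encoding, zero-padded to 4 bytes. Size 4, alignment 1.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Utf8Char([u8; 4]);

impl Utf8Char {
    /// Encodes `c` into a zero-initialized 4-byte buffer.
    pub fn from_char(c: char) -> Self {
        let mut buf = [0u8; 4];
        c.encode_utf8(&mut buf);
        Utf8Char(buf)
    }

    /// Length of the encoding, derived from the leading byte.
    pub fn len_utf8(&self) -> usize {
        match self.0[0] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        }
    }

    /// Borrows the encoded bytes (without padding) as a `&str`.
    pub fn as_str(&self) -> &str {
        // The only constructor stores valid UTF-8, so this cannot fail.
        std::str::from_utf8(&self.0[..self.len_utf8()])
            .expect("valid UTF-8 by construction")
    }

    /// Decodes back to a `char`.
    pub fn to_char(&self) -> char {
        self.as_str().chars().next().expect("exactly one scalar value")
    }
}
```

With this layout, conversion to &str is a length lookup plus a slice, and the type stays Copy with a fixed size of 4 bytes.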
Whether it should be aligned I don't know. Probably not, because that way it's sound to cast a &[u8; 4] to &Utf8Char (after appropriate UTF-8 safety checking). EDIT: 2e71828's post below makes a good point that this is not generally doable.
You can't reliably get an &[u8; 4] pointing at an arbitrary code point inside a str, though: if the last one is shorter than 4 bytes, the array overruns the string's allocation. One way around this would be to have a dynamically-sized Utf8CharRef that's a newtype wrapper for a str that contains exactly one code point. It can then have a ToOwned implementation that pads it out to 4 bytes.
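Roughly, that could look like the sketch below. Everything here is hypothetical: Utf8CharRef and Utf8Char are illustrative names, and the pointer cast relies on the same repr(transparent) pattern that std uses for wrappers such as Path.

```rust
use std::borrow::Borrow;

/// Hypothetical borrowed form: a `str` holding exactly one scalar value.
#[repr(transparent)]
pub struct Utf8CharRef(str);

/// Hypothetical owned form: the same bytes, zero-padded to 4.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Utf8Char([u8; 4]);

impl Utf8CharRef {
    /// Returns `Some` only if `s` contains exactly one Unicode scalar value.
    pub fn new(s: &str) -> Option<&Utf8CharRef> {
        let mut chars = s.chars();
        match (chars.next(), chars.next()) {
            // SAFETY: `Utf8CharRef` is `repr(transparent)` over `str`, so
            // both reference types share the same layout and metadata.
            (Some(_), None) => Some(unsafe { &*(s as *const str as *const Utf8CharRef) }),
            _ => None,
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl Borrow<Utf8CharRef> for Utf8Char {
    fn borrow(&self) -> &Utf8CharRef {
        // Recover the encoded length from the leading byte, then drop padding.
        let len = match self.0[0] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        };
        let s = std::str::from_utf8(&self.0[..len]).expect("valid UTF-8 by construction");
        Utf8CharRef::new(s).expect("exactly one scalar value")
    }
}

impl ToOwned for Utf8CharRef {
    type Owned = Utf8Char;

    /// Copies the 1 to 4 encoded bytes and pads the rest with zeros.
    fn to_owned(&self) -> Utf8Char {
        let mut buf = [0u8; 4];
        buf[..self.0.len()].copy_from_slice(self.0.as_bytes());
        Utf8Char(buf)
    }
}
```

Because the borrowed form carries its own length, it can point at the final character of a string without reading past the end of the allocation; the padding only exists in the owned copy.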