Implement Index<usize> for String and &str

Yes, but would probably only be practical with a SmallString-like type in alloc somewhere. And that couldn't be used in core, which might make a bunch of things awkward.

I believe @matklad was referring to the fact that a single-code-point or single-grapheme-cluster &str could be used instead, which represents no additional burden w.r.t. alloc.
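A rough sketch of what "index and get a &str back" could look like, limited to code points because std has no grapheme segmentation (grapheme clusters would need an external crate such as unicode-segmentation, not shown here). The helper name nth_char_str is hypothetical. It returns a borrowed sub-slice, so nothing is allocated, but it still has to walk the string from the start.

```rust
/// Hypothetical helper: returns the `n`th code point of `s` as a borrowed
/// `&str` sub-slice (no allocation), or `None` if `s` is too short.
/// It walks the string from the start, so this is O(n), not O(1).
fn nth_char_str(s: &str, n: usize) -> Option<&str> {
    let (start, c) = s.char_indices().nth(n)?;
    Some(&s[start..start + c.len_utf8()])
}

fn main() {
    let s = "añb";
    // 'ñ' occupies bytes 1..3 of `s`; the result borrows exactly those bytes.
    assert_eq!(nth_char_str(s, 1), Some("ñ"));
    assert_eq!(nth_char_str(s, 3), None);
    println!("{:?}", nth_char_str(s, 1));
}
```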

That's empirically not true: people (especially beginners) wrongly default to operating on code points instead of grapheme clusters, which are usually what they actually need, because the existence of char misleads them.
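A small illustration of the gap, using only std: a decomposed "é" (a plain 'e' followed by U+0301 COMBINING ACUTE ACCENT) is one user-perceived character but two chars, and naive char-wise processing splits it.

```rust
fn main() {
    // One grapheme cluster on screen, built from two code points.
    let decomposed = "e\u{301}";
    assert_eq!(decomposed.chars().count(), 2);
    assert_eq!(decomposed.len(), 3); // 'e' is 1 byte, U+0301 is 2 bytes

    // Reversing by `char` detaches the accent from its base letter.
    let reversed: String = decomposed.chars().rev().collect();
    assert_eq!(reversed, "\u{301}e");

    // The precomposed form U+00E9 is a single code point (2 UTF-8 bytes).
    let precomposed = "\u{e9}";
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(precomposed.len(), 2);
}
```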

You can produce char by value though, whereas &str is borrowed.

Halfway between "only &str" and "char and &str" would be to represent char as its UTF-8 bytes: char retains a fixed maximum size, but can also be cheaply converted to and from &str.
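The conversion half of this already exists in std: char::encode_utf8 writes a code point into a caller-provided buffer, and 4 bytes always suffice. The helper utf8_bytes below is hypothetical; it just shows the zero-padded [u8; 4] form and the allocation-free trip back to &str.

```rust
/// Hypothetical helper: the UTF-8 encoding of `c`, zero-padded to 4 bytes,
/// plus how many of those bytes are meaningful.
fn utf8_bytes(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    // Writes 1..=4 bytes into `buf` and returns them as a `&mut str`.
    let len = c.encode_utf8(&mut buf).len();
    (buf, len)
}

fn main() {
    for c in ['a', 'ñ', '€', '🦀'] {
        let (buf, len) = utf8_bytes(c);
        // Back to `&str`: only validates `len` bytes, no heap allocation.
        let s: &str = std::str::from_utf8(&buf[..len]).expect("encode_utf8 output is valid UTF-8");
        assert_eq!(s.chars().next(), Some(c));
        println!("{c:?}: {:?} ({len} bytes)", &buf[..len]);
    }
}
```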


What is the size in bytes of this new utf8_char? Is it a fixed-length 4-byte string? Does it have 1-byte or 4-byte alignment (since the latter would be more cache-friendly)? If it's not a fixed-length string, how do you avoid it needing to be a DST?

Alignment 1 Stride 4 :yum:
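To make "alignment 1, stride 4" concrete, a quick check with std::mem, using a hypothetical Utf8Char newtype over [u8; 4]. The char alignment of 4 is what you get on common targets.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical newtype: a code point stored as its UTF-8 bytes.
#[allow(dead_code)]
struct Utf8Char([u8; 4]);

fn main() {
    // `char` is a 4-byte scalar with 4-byte alignment (on common targets).
    assert_eq!((size_of::<char>(), align_of::<char>()), (4, 4));
    // The byte-array newtype keeps the 4-byte size (stride 4) but needs
    // only 1-byte alignment.
    assert_eq!((size_of::<Utf8Char>(), align_of::<Utf8Char>()), (4, 1));
    // Either way, an array of 8 takes 32 bytes.
    assert_eq!(size_of::<[Utf8Char; 8]>(), 32);
    assert_eq!(size_of::<[char; 8]>(), 32);
}
```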

It'd be a newtype over [u8; 4]. Something like this:
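The snippet that originally followed isn't preserved, so here is a rough reconstruction of such a newtype. Only the name Utf8Char comes from the thread; the zero-padding invariant and the methods (len_utf8, as_str, to_char) are assumptions for illustration.

```rust
use std::fmt;

/// Hypothetical sketch: one code point stored as its UTF-8 bytes.
/// Assumed invariant: the first `len_utf8()` bytes encode exactly one
/// code point and the remaining bytes are zero (so derived Eq/Hash work).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct Utf8Char([u8; 4]);

impl Utf8Char {
    /// Number of meaningful bytes, read off the UTF-8 leading byte.
    pub fn len_utf8(self) -> usize {
        match self.0[0] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        }
    }

    /// Borrow as `&str` without allocating. A real implementation could
    /// skip re-validation with `from_utf8_unchecked`, given the invariant.
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.0[..self.len_utf8()]).expect("Utf8Char invariant")
    }

    pub fn to_char(self) -> char {
        self.as_str().chars().next().expect("Utf8Char is never empty")
    }
}

impl From<char> for Utf8Char {
    fn from(c: char) -> Self {
        let mut bytes = [0u8; 4];
        c.encode_utf8(&mut bytes); // unused tail stays zero
        Utf8Char(bytes)
    }
}

impl fmt::Debug for Utf8Char {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Debug::fmt(self.as_str(), f)
    }
}

fn main() {
    let u = Utf8Char::from('€');
    assert_eq!(u.len_utf8(), 3);
    assert_eq!(u.as_str(), "€");
    assert_eq!(u.to_char(), '€');
    println!("{u:?}");
}
```

Zero-padding is one possible choice; it keeps the type Copy and fixed-size while making byte-wise equality agree with code-point equality.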

Whether it should be aligned I don't know. Probably not, because that way it's sound to cast a &[u8; 4] to &Utf8Char (after appropriate UTF-8 safety checking). EDIT: 2e71828's post below makes a good point that this is not generally doable.
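A sketch of what such a checked &[u8; 4] to &Utf8Char cast might look like, assuming a #[repr(transparent)] newtype and the same zero-padding invariant as above (both assumptions, not from the thread). It only helps when you already hold a complete 4-byte array; the next reply explains why that is generally not the case inside a str.

```rust
/// Hypothetical newtype; `repr(transparent)` guarantees the same layout as
/// `[u8; 4]`, which is what makes the reference cast below sound.
#[repr(transparent)]
pub struct Utf8Char([u8; 4]);

impl Utf8Char {
    /// Checked cast: succeeds only if `bytes` starts with exactly one valid
    /// UTF-8 code point and any remaining bytes are zero padding.
    pub fn from_bytes_ref(bytes: &[u8; 4]) -> Option<&Utf8Char> {
        let len = match bytes[0] {
            0x00..=0x7F => 1,
            0xC2..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF4 => 4,
            _ => return None, // continuation byte or never-valid leading byte
        };
        // Full validation (rejects overlong forms, surrogates, truncation).
        std::str::from_utf8(&bytes[..len]).ok()?;
        if bytes[len..].iter().any(|&b| b != 0) {
            return None;
        }
        // SAFETY: `Utf8Char` is `repr(transparent)` over `[u8; 4]`, and the
        // returned reference keeps the lifetime of `bytes`.
        Some(unsafe { &*(bytes as *const [u8; 4] as *const Utf8Char) })
    }

    pub fn as_bytes(&self) -> &[u8; 4] {
        &self.0
    }
}

fn main() {
    let euro: [u8; 4] = [0xE2, 0x82, 0xAC, 0]; // "€" plus one padding byte
    let c = Utf8Char::from_bytes_ref(&euro).expect("valid");
    assert_eq!(c.as_bytes(), &euro);

    let lone_continuation: [u8; 4] = [0x80, 0, 0, 0];
    assert!(Utf8Char::from_bytes_ref(&lone_continuation).is_none());

    let two_chars: [u8; 4] = [b'a', b'b', 0, 0];
    assert!(Utf8Char::from_bytes_ref(&two_chars).is_none());
}
```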

You can't reliably get an &[u8; 4] pointing at an arbitrary code point inside a str, though: if the last code point is shorter than 4 bytes, the array overruns the string's allocation. One way around this would be to have a dynamically-sized Utf8CharRef that's a newtype wrapper for a str that contains exactly one code point. It can then have a ToOwned implementation that pads it out to 4 bytes.
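A sketch of that idea, under our own naming: Utf8CharRef is the unsized borrowed type, Utf8Char the padded owned type, and the helper nth is hypothetical. The cast from &str relies on #[repr(transparent)], the same pattern std uses for types like Path over OsStr.

```rust
use std::borrow::Borrow;

/// Hypothetical unsized type: a `str` holding exactly one code point.
#[repr(transparent)]
pub struct Utf8CharRef(str);

/// Hypothetical owned counterpart: the same bytes zero-padded to 4.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Utf8Char([u8; 4]);

/// Length of a UTF-8 sequence from its leading byte (input assumed valid).
fn utf8_len(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    }
}

impl Utf8CharRef {
    /// Returns `None` unless `s` contains exactly one code point.
    pub fn new(s: &str) -> Option<&Utf8CharRef> {
        let mut chars = s.chars();
        match (chars.next(), chars.next()) {
            // SAFETY: `Utf8CharRef` is `repr(transparent)` over `str`.
            (Some(_), None) => Some(unsafe { &*(s as *const str as *const Utf8CharRef) }),
            _ => None,
        }
    }

    /// Borrows the `n`th code point of `s` in place; the slice never
    /// extends past the end of `s`, unlike a `&[u8; 4]` would.
    pub fn nth(s: &str, n: usize) -> Option<&Utf8CharRef> {
        let (start, c) = s.char_indices().nth(n)?;
        Utf8CharRef::new(&s[start..start + c.len_utf8()])
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl ToOwned for Utf8CharRef {
    type Owned = Utf8Char;
    /// Copies the 1..=4 bytes and pads the rest with zeros.
    fn to_owned(&self) -> Utf8Char {
        let mut bytes = [0u8; 4];
        bytes[..self.0.len()].copy_from_slice(self.0.as_bytes());
        Utf8Char(bytes)
    }
}

impl Borrow<Utf8CharRef> for Utf8Char {
    fn borrow(&self) -> &Utf8CharRef {
        let s = std::str::from_utf8(&self.0[..utf8_len(self.0[0])]).expect("valid by construction");
        Utf8CharRef::new(s).expect("one code point by construction")
    }
}

fn main() {
    let text = "a€";
    // "€" is the last code point and only 3 bytes long.
    let euro: &Utf8CharRef = Utf8CharRef::nth(text, 1).unwrap();
    assert_eq!(euro.as_str(), "€");
    let owned: Utf8Char = euro.to_owned();
    assert_eq!(owned.0, [0xE2, 0x82, 0xAC, 0]);
    let back: &Utf8CharRef = owned.borrow();
    assert_eq!(back.as_str(), "€");
}
```

Making the owned type Borrow<Utf8CharRef> is what ToOwned requires, and it gives the same owned/borrowed pairing as String and str.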

