The implementation of `Pattern` for `char` stores the UTF-8 encoding of the `char` needle into a `[u8; 4]` buffer and uses it for searching within the UTF-8 haystack. So cases like this do not require any UTF-8 to UTF-32 decoding:
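(The original example here was lost; the following is an illustrative stand-in.) Searching for a `char` needle compares the needle's pre-encoded UTF-8 bytes directly against the haystack's bytes:

```rust
fn main() {
    let haystack = "größer";
    // The needle 'ö' is encoded once into its UTF-8 form ([0xC3, 0xB6])
    // and matched against the haystack's raw bytes; no character of the
    // haystack has to be decoded to a `char` (UTF-32) value.
    assert_eq!(haystack.find('ö'), Some(2)); // byte index of 'ö'
}
```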
But cases like this do require decoding each character of the `str` from UTF-8 to UTF-32 in order to pass it to the provided `fn(char) -> bool`:
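(Again, the original example was lost; this is an illustrative stand-in.) A predicate search must hand each haystack character to the function as a `char`, which means decoding it from UTF-8 first:

```rust
fn main() {
    let haystack = "abc123";
    // Every character of the haystack is decoded from UTF-8 into a
    // `char` (a UTF-32 scalar value) just so it can be passed to the
    // `fn(char) -> bool` predicate.
    assert_eq!(haystack.find(char::is_numeric), Some(3));
}
```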
If `char` were internally represented as UTF-8 bytes stored in a `[u8; 4]` (with padding as necessary), then the latter case could perform better, because it would save the cycles spent on UTF-8 decoding. (However, it would also impact the implementation of methods like `char::is_numeric`.) This could also improve the performance of some string iterators.
Is this an accurate summary of the problem?
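A minimal sketch of the representation described above, using a hypothetical `Utf8Char` type (not Rust's actual `char`): storing the character as zero-padded UTF-8 bytes lets a search compare bytes directly, with no decoding of the haystack.

```rust
/// Hypothetical character type holding UTF-8 bytes (zero-padded)
/// instead of a UTF-32 scalar value.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Utf8Char {
    bytes: [u8; 4],
}

impl Utf8Char {
    fn from_char(c: char) -> Self {
        let mut bytes = [0u8; 4];
        c.encode_utf8(&mut bytes);
        Utf8Char { bytes }
    }

    /// Length of the encoded character in bytes (1..=4),
    /// derived from the UTF-8 leading byte.
    fn len(&self) -> usize {
        match self.bytes[0] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        }
    }

    /// Byte-wise search: the haystack is never decoded to UTF-32.
    /// (UTF-8 is self-synchronizing, so a match cannot start in the
    /// middle of another character.)
    fn find_in(&self, haystack: &str) -> Option<usize> {
        let needle = &self.bytes[..self.len()];
        haystack
            .as_bytes()
            .windows(needle.len())
            .position(|w| w == needle)
    }
}

fn main() {
    let needle = Utf8Char::from_char('ö');
    assert_eq!(needle.find_in("größer"), Some(2));
}
```

The trade-off shows up exactly where the post says it would: a classification method like `is_numeric` on this type would first have to decode `bytes` back to a scalar value, or work on the encoded form directly.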
The `encode_unicode` crate includes a UTF-8 character type, `Utf8Char`. It would be interesting to implement the (unstable) `Pattern` trait for types like `fn(encode_unicode::Utf8Char) -> bool` and run some benchmark comparisons.