Pre-RFC - allow usage of str::utf8_char_width


While writing a decoder for small binary messages on an embedded device, I needed to decode a single UTF-8-encoded character at the beginning of a u8 buffer. I could not decode the entire buffer as a UTF-8 str, since only the first bytes (the first character, up to 4 bytes) were known to be valid UTF-8. One way to solve this is to run UTF-8 decoding on the first character only, by first extracting a sub-slice containing just its bytes. To do that, however, I needed to know the length of the first character. I saw that a function to get this length exists in the standard library (str::utf8_char_width), but I couldn't use it because it is gated behind the str_internals feature, which is unstable and meant for internal use only. I ended up copying the code from the function, and everything worked.
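
For illustration, the workaround described above might look roughly like the sketch below. This is not the standard library's code: utf8_char_width here is a hand-written stand-in that maps a leading byte to the expected sequence length (0 for a byte that cannot start a sequence), and decode_first_char is a hypothetical helper name.

```rust
use std::str;

/// Stand-in for the internal width lookup (an assumption, not the std
/// source): expected length in bytes of a UTF-8 sequence, judged from
/// its first byte alone. Returns 0 for bytes that cannot start one.
fn utf8_char_width(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0, // continuation bytes and invalid leading bytes
    }
}

/// Hypothetical helper: decode only the first character of `buf`,
/// returning it with the number of bytes it occupies. The rest of the
/// buffer is never inspected, so it may hold arbitrary binary data.
fn decode_first_char(buf: &[u8]) -> Option<(char, usize)> {
    let width = utf8_char_width(*buf.first()?);
    if width == 0 {
        return None;
    }
    // `get` returns None if the buffer is shorter than the sequence.
    let bytes = buf.get(..width)?;
    // Full validation still happens here; the width is only a hint.
    let s = str::from_utf8(bytes).ok()?;
    Some((s.chars().next()?, width))
}
```

The width lookup only sizes the sub-slice. str::from_utf8 still rejects truncated, overlong, or otherwise malformed sequences.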

I know this use case is uncommon, but I think it would be a good idea to make this function available. After all, its counterpart char::len_utf8 (which gives the number of bytes needed to encode a character as UTF-8) is already available and stable.

For the time being, you can use decode_utf8 from the bstr crate, or just copy the short definition of utf8_char_width from the standard library.
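
A further option, not mentioned in this thread and offered only as a sketch, avoids the width lookup entirely by relying on stable APIs: validate at most the first four bytes with str::from_utf8, and on failure fall back to the valid prefix reported by Utf8Error::valid_up_to. The function name first_char_via_from_utf8 is hypothetical.

```rust
use std::str;

/// Hypothetical helper: decode the first character of `buf` using only
/// stable standard-library APIs. A UTF-8 sequence is at most 4 bytes
/// long, so only that many bytes are ever examined.
fn first_char_via_from_utf8(buf: &[u8]) -> Option<(char, usize)> {
    let head = &buf[..buf.len().min(4)];
    let valid = match str::from_utf8(head) {
        Ok(s) => s,
        // Bytes after the first character may be arbitrary data, so
        // retry on the prefix that was reported as valid.
        Err(e) => str::from_utf8(&head[..e.valid_up_to()]).ok()?,
    };
    // An empty valid prefix means the buffer does not start with a
    // complete, well-formed character.
    let c = valid.chars().next()?;
    Some((c, c.len_utf8()))
}
```

Here char::len_utf8 recovers the encoded length after the fact, which is the stable counterpart mentioned in the original post. The same calls are available from core::str, so this should also be usable without std, though that has not been checked against any particular embedded target.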

