Pre-RFC - allow usage of str::utf8_char_width

SadiinsoSnowfall · December 8, 2021, 12:56pm

Hi!

While writing a decoder for small binary messages on an embedded device, I needed at some point to decode a single UTF-8-encoded character at the beginning of a u8 buffer. I ran into a problem: I could not decode the entire buffer as a UTF-8 str, since only the first bytes (the first character, up to 4 bytes) were known to be valid UTF-8. One way to solve this is to run UTF-8 decoding on the first character only, by first extracting a sub-slice that contains just its bytes. However, that requires knowing the length of the first character. I saw that the standard library has a function that returns exactly this length (str::utf8_char_width), but I couldn't use it because it is gated behind the str_internals feature, which is unstable and intended only for use inside the standard library. I ended up copying the function's code, and everything worked.
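A minimal sketch of the workaround described above, for illustration only. Here utf8_char_width is a hand-written equivalent of the standard library's internal function, written as a match rather than copied from its 256-entry lookup table, and decode_first_char is a hypothetical helper name, not a standard API. The width is read from the leading byte, and only that many bytes are then passed to std::str::from_utf8.

```rust
/// Width in bytes of a UTF-8 sequence, judged from its leading byte.
/// Returns 0 for bytes that can never start a valid sequence:
/// continuation bytes 0x80..=0xBF, overlong leads 0xC0/0xC1, and 0xF5..=0xFF.
fn utf8_char_width(b: u8) -> usize {
    match b {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    }
}

/// Hypothetical helper: decode only the first character of `buf` and
/// return it together with its length in bytes.
/// Bytes after the first character are never inspected.
fn decode_first_char(buf: &[u8]) -> Option<(char, usize)> {
    let width = utf8_char_width(*buf.first()?);
    if width == 0 || buf.len() < width {
        return None; // invalid leading byte or truncated sequence
    }
    // from_utf8 still validates the continuation bytes and rejects
    // surrogates and overlong 3- and 4-byte encodings.
    let s = std::str::from_utf8(&buf[..width]).ok()?;
    s.chars().next().map(|c| (c, width))
}
```

For example, decode_first_char(&[0xC3, 0xA9, 0xFF, 0x00]) yields Some(('é', 2)), and the remaining binary payload starts at index 2.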

I know this use case is quite uncommon, but I think it would be a good idea to make this function available. After all, the inverse function, char::len_utf8 (which gives the number of bytes needed to encode a character in UTF-8), is already available and stable.

steffahn · December 8, 2021, 1:16pm

For the time being, you can use decode_utf8 from the bstr crate, or just copy the short definition of utf8_char_width from the standard library.
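Another option, not suggested in the thread, avoids both the extra crate and the copied table, assuming only the leading character matters: validate at most the first four bytes with std::str::from_utf8 and, if validation fails, keep only the valid prefix reported by Utf8Error::valid_up_to. The character's length then comes from the stable char::len_utf8. The name first_char is hypothetical; this is a sketch, not an established idiom.

```rust
use std::str;

/// Hypothetical helper: decode the first character of `buf` using only
/// stable std APIs, returning the character and its UTF-8 length.
fn first_char(buf: &[u8]) -> Option<(char, usize)> {
    // A single char is at most 4 bytes, so never look further than that.
    let prefix = &buf[..buf.len().min(4)];
    let valid = match str::from_utf8(prefix) {
        Ok(s) => s,
        // Keep only the leading bytes that form complete, valid UTF-8.
        Err(e) => str::from_utf8(&prefix[..e.valid_up_to()]).ok()?,
    };
    // An empty valid prefix means the first character is invalid or truncated.
    let c = valid.chars().next()?;
    Some((c, c.len_utf8()))
}
```

The trade-off is that this runs a general validator over up to four bytes, whereas a width lookup needs only the leading byte; for small messages the difference is probably negligible.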

system · March 8, 2022, 1:17pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
