To_upper speed

newpavlov · January 25, 2021, 1:05pm

Note that an optimization similar to the ASCII one can also be applied to Cyrillic characters:

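/// Uppercases a char from the Cyrillic block (U+0400..=U+04FF);
/// any other char is returned unchanged.
fn to_cyrillic_uppercase(c: char) -> char {
    let c = c as u32;
    let res = match c {
        0x0430..=0x044F => c - 0x20, // most often used characters
        0x0450..=0x045F => c - 0x50,
        // uppercase sits on the even code point, lowercase on the next odd one
        0x0460..=0x0481 | 0x048A..=0x04BF | 0x04D0..=0x04FF => c & !1,
        0x04CF => 0x04C0, // palochka
        // parity is flipped here: lowercase sits on the even code points
        0x04C1..=0x04CE if c & 1 == 0 => c - 1,
        _ => c,
    };
    // SAFETY: every arm yields either a code point inside the Cyrillic block
    // or the unchanged value of a valid `char`, so `res` is a valid scalar value.
    unsafe {
        std::char::from_u32_unchecked(res)
    }
}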
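A hand-written mapping like this is easy to get subtly wrong, so it may be worth checking it against the standard library. The sketch below repeats the function (with a safe `char::from_u32` fallback instead of `unsafe`, which is my substitution) and compares it with `char::to_uppercase` over U+0400..=U+04FF. The helper name `cyrillic_mismatches` is hypothetical, and the check assumes the function is only meant to cover that one block.

```rust
/// Same mapping as in the post, using a safe conversion for this sketch.
fn to_cyrillic_uppercase(c: char) -> char {
    let cp = c as u32;
    let res = match cp {
        0x0430..=0x044F => cp - 0x20,
        0x0450..=0x045F => cp - 0x50,
        0x0460..=0x0481 | 0x048A..=0x04BF | 0x04D0..=0x04FF => cp & !1,
        0x04CF => 0x04C0,
        0x04C1..=0x04CE if cp & 1 == 0 => cp - 1,
        _ => cp,
    };
    // Fall back to the input if the result were ever not a valid scalar value.
    char::from_u32(res).unwrap_or(c)
}

/// Hypothetical helper: collects every char in U+0400..=U+04FF for which
/// the hand-written mapping disagrees with `char::to_uppercase`.
fn cyrillic_mismatches() -> Vec<char> {
    ('\u{0400}'..='\u{04FF}')
        .filter(|&c| {
            let mut std_upper = c.to_uppercase();
            let first = std_upper.next();
            // A mismatch is either a different char or a multi-char expansion.
            first != Some(to_cyrillic_uppercase(c)) || std_upper.next().is_some()
        })
        .collect()
}
```

Calling `cyrillic_mismatches()` and asserting that the result is empty makes a cheap regression test whenever the ranges are edited.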


(BTW: it's quite annoying that character codes are not selected to simplify such conversions...)

I guess this can be done for most European languages as well. Maybe it's worth talking about a more general compression of the conversion table instead of only special-casing ASCII?
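To make the idea a bit more concrete, here is a minimal sketch of what such a compressed table could look like: a sorted list of ranges, each tagged with a rule, looked up by binary search. This is only an illustration and not how std lays out its tables. The names `Rule`, `TABLE`, and `to_upper_compressed` are made up, and the table holds just ASCII plus the Cyrillic ranges from above.

```rust
use std::cmp::Ordering;

/// Hypothetical encoding of how one range maps to uppercase.
#[derive(Clone, Copy)]
enum Rule {
    /// Subtract a fixed offset from every code point in the range.
    Sub(u32),
    /// Alternating pairs with uppercase on even code points.
    EvenUpper,
    /// Alternating pairs with uppercase on odd code points.
    OddUpper,
}

/// Sorted, non-overlapping `(first, last, rule)` entries.
const TABLE: &[(u32, u32, Rule)] = &[
    (0x0061, 0x007A, Rule::Sub(0x20)), // ASCII a..z
    (0x0430, 0x044F, Rule::Sub(0x20)),
    (0x0450, 0x045F, Rule::Sub(0x50)),
    (0x0460, 0x0481, Rule::EvenUpper),
    (0x048A, 0x04BF, Rule::EvenUpper),
    (0x04C1, 0x04CE, Rule::OddUpper),
    (0x04CF, 0x04CF, Rule::Sub(0x0F)), // palochka
    (0x04D0, 0x04FF, Rule::EvenUpper),
];

fn to_upper_compressed(c: char) -> char {
    let cp = c as u32;
    // Find the range containing `cp`, if there is one.
    let hit = TABLE.binary_search_by(|&(first, last, _)| {
        if last < cp {
            Ordering::Less
        } else if first > cp {
            Ordering::Greater
        } else {
            Ordering::Equal
        }
    });
    let res = match hit {
        Ok(i) => match TABLE[i].2 {
            Rule::Sub(offset) => cp - offset,
            Rule::EvenUpper => cp & !1,
            Rule::OddUpper => {
                if cp & 1 == 0 {
                    cp - 1
                } else {
                    cp
                }
            }
        },
        // Not covered by the table: leave the char as it is.
        Err(_) => cp,
    };
    char::from_u32(res).unwrap_or(c)
}
```

Eight entries cover both alphabets here. Whether a binary search over such a table beats a hard-coded `match` for the hot ranges is something that would need benchmarking.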
