To_upper speed

It's also worth noting that an optimization similar to the ASCII one can be applied to Cyrillic characters:

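fn to_cyrillic_uppercase(c: char) -> char {
    let c = c as u32;
    let res = match c {
        0x0430..=0x044F => c - 0x20, // а..я -> А..Я, the most frequently used letters
        0x0450..=0x045F => c - 0x50, // ѐ..џ -> Ѐ..Џ
        // Alternating pairs (uppercase even, lowercase odd): clear the low bit.
        0x0460..=0x0481 | 0x048A..=0x04BF | 0x04D0..=0x04FF => c & !1,
        0x04CF => 0x04C0, // small palochka ӏ -> Ӏ
        // Alternating pairs (uppercase odd, lowercase even): step down by one.
        0x04C1..=0x04CE if c & 1 == 0 => c - 1,
        _ => c,
    };
    // SAFETY: `res` is either the unchanged input or a code point in
    // U+0400..=U+04FF, which contains no surrogates, so it is a valid `char`.
    unsafe {
        std::char::from_u32_unchecked(res)
    }
}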


(BTW: it's quite annoying that code points were not assigned so as to simplify such conversions; even this one block needs two different offsets, two pairing conventions, and a special case...)

I guess the same can be done for most European languages. Maybe it's worth discussing a more general compression of the conversion table instead of special-casing only ASCII?
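One possible shape for such a generalized scheme, sketched here under assumptions: store the mapping as a sorted list of code point ranges, each tagged with the rule the `match` above uses (a fixed offset, or alternating pairs with the uppercase letter at the even or the odd code point), and binary-search it. The `Rule` enum, the `TABLE` layout and `to_upper_table` are hypothetical names. The table below covers only ASCII and the Cyrillic ranges from the function above; a real table would have to be generated from the Unicode data files and would still need a fallback for characters whose uppercase form is more than one `char` (such as `ß`).

```rust
/// How every code point in a table range maps to its uppercase form.
#[derive(Clone, Copy)]
enum Rule {
    /// Subtract a fixed offset (e.g. ASCII a..z, Cyrillic а..я).
    Offset(u32),
    /// Alternating pairs: uppercase at the even code point, lowercase at the odd one.
    EvenUpper,
    /// Alternating pairs: uppercase at the odd code point, lowercase at the even one.
    OddUpper,
}

/// Hypothetical compressed table of (first, last, rule), sorted by `first`
/// and non-overlapping. Only ASCII and the Cyrillic ranges from the post.
const TABLE: &[(u32, u32, Rule)] = &[
    (0x0061, 0x007A, Rule::Offset(0x20)), // a..z -> A..Z
    (0x0430, 0x044F, Rule::Offset(0x20)), // а..я -> А..Я
    (0x0450, 0x045F, Rule::Offset(0x50)), // ѐ..џ -> Ѐ..Џ
    (0x0460, 0x0481, Rule::EvenUpper),
    (0x048A, 0x04BF, Rule::EvenUpper),
    (0x04C1, 0x04CE, Rule::OddUpper),
    (0x04CF, 0x04CF, Rule::Offset(0x0F)), // ӏ -> Ӏ (U+04C0)
    (0x04D0, 0x04FF, Rule::EvenUpper),
];

/// Uppercases `c` using `TABLE`; code points outside every range are returned unchanged.
fn to_upper_table(c: char) -> char {
    let cp = c as u32;
    // Number of ranges starting at or before `cp`; the candidate is the last of them.
    let idx = TABLE.partition_point(|&(first, _, _)| first <= cp);
    if idx == 0 {
        return c;
    }
    let (_, last, rule) = TABLE[idx - 1];
    if cp > last {
        return c; // `cp` falls in a gap between ranges
    }
    let res = match rule {
        Rule::Offset(d) => cp - d,
        Rule::EvenUpper => cp & !1, // odd (lowercase) -> even; even stays
        Rule::OddUpper if cp & 1 == 0 => cp - 1, // even (lowercase) -> odd
        Rule::OddUpper => cp,
    };
    // Every rule lands on a valid scalar value for the ranges in TABLE.
    char::from_u32(res).expect("table produced an invalid code point")
}

#[cfg(test)]
mod tests {
    use super::to_upper_table;

    #[test]
    fn matches_std_on_covered_blocks() {
        for cp in (0x0000u32..=0x007F).chain(0x0400..=0x04FF) {
            let c = char::from_u32(cp).unwrap();
            assert_eq!(c.to_uppercase().collect::<String>(), to_upper_table(c).to_string());
        }
    }
}
```

For example, `"привет, world".chars().map(to_upper_table).collect::<String>()` gives `"ПРИВЕТ, WORLD"`. The test only cross-checks against `char::to_uppercase` for U+0000..=U+007F and U+0400..=U+04FF; whether a binary search over such ranges actually beats the standard library's lookup would need benchmarking.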
