Arithmetic operations on char

char is an unsigned 32-bit integer with a restricted range. However, to do any sort of arithmetic on it you have to cast it to u32 first. I think this restriction should be lifted.

If you do arithmetic on a char, you still have a 32-bit value afterwards; it's just no longer (necessarily) in the approved range, so it is now a plain u32. The standard library could therefore usefully implement Add<u32>, Sub<u32>, BitOr<u32>, etc. on char, as below.

use std::ops::Add;

// Proposed addition to the standard library; the result is a plain u32.
impl Add<u32> for char {
    type Output = u32;
    fn add(self, other: u32) -> u32 { self as u32 + other }
}

Currently you have to do all the casting yourself, which is needlessly verbose and also doesn't constrain you to u32: you can accidentally cast to u16 or u8 (etc.), which silently truncates and is likely to lead to incorrect results. Implementing the standard arithmetic traits to allow char + u32 = u32 would both condense code and avoid that possible error.
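To make the truncation risk concrete, here is a small sketch. The helper names offset_u32 and offset_u8 are made up for illustration and are not part of any proposal; the second one shows what happens if the cast target is accidentally too narrow.

```rust
// Hypothetical helpers, for illustration only.

// The intended computation: widen the char to u32, then add.
fn offset_u32(c: char, n: u32) -> u32 {
    c as u32 + n
}

// The accidental version: `as u8` compiles for any char and silently
// keeps only the low 8 bits of the code point.
fn offset_u8(c: char, n: u8) -> u8 {
    (c as u8).wrapping_add(n)
}

fn main() {
    // U+0141 (LATIN CAPITAL LETTER L WITH STROKE)
    let c = '\u{141}';
    assert_eq!(offset_u32(c, 1), 0x142);
    // Truncated to 0x41 ('A') before the addition, so the result is b'B'.
    assert_eq!(offset_u8(c, 1), b'B');
}
```

Both versions compile without warnings, which is the problem being described.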

For similar reasons, I would implement PartialOrd<u32> for char.

Can you say why you want to do arithmetic on a char? What are you doing where this is common enough to be worth supporting without the cast?

Note that chars are now iterable in ranges, so 'a'..='z' works without ever needing to type + 1.
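A quick sketch of what range iteration over char looks like. The function name letters is just an illustrative helper.

```rust
// Hypothetical helper: collect an inclusive char range into a String.
fn letters(first: char, last: char) -> String {
    (first..=last).collect()
}

fn main() {
    assert_eq!(letters('a', 'e'), "abcde");

    // The range only yields valid chars, so it steps over the
    // surrogate gap (U+D800..=U+DFFF) instead of stopping there.
    let around_gap: Vec<char> = ('\u{D7FF}'..='\u{E000}').collect();
    assert_eq!(around_gap, vec!['\u{D7FF}', '\u{E000}']);
}
```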

There's no automatic widening, though.

So if you do c as u16 + 10, you'll then need another cast to widen the result before from_u32 can turn it back into a char, which is a great opportunity to realize that you didn't want a u16 in the first place.
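As a rough illustration of that round trip, the sketch below uses two made-up helpers: shift stays in u32 throughout, while shift_via_u16 takes the detour through u16 and needs the extra widening step before char::from_u32 accepts the value.

```rust
// Hypothetical helpers, for illustration only.

// Stay in u32 the whole way; from_u32 rejects values that are not
// Unicode scalar values (surrogates, or anything above U+10FFFF).
fn shift(c: char, n: u32) -> Option<char> {
    char::from_u32(u32::from(c).checked_add(n)?)
}

// Detour through u16: the result has to be widened again explicitly,
// because char::from_u32 only takes a u32.
fn shift_via_u16(c: char, n: u16) -> Option<char> {
    let narrow: u16 = (c as u16).checked_add(n)?;
    char::from_u32(u32::from(narrow))
}

fn main() {
    assert_eq!(shift('a', 10), Some('k'));
    assert_eq!(shift('\u{D7FF}', 1), None); // would land on a surrogate
    assert_eq!(shift('\u{10FFFF}', 1), None); // past the last scalar value
    assert_eq!(shift_via_u16('a', 10), Some('k'));
}
```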

This was prompted by the pull request "Make is_ascii_hexdigit branchless" (rust-lang/rust #103024, by GKFX), where I bitwise-or a character with 0x20 to convert ASCII uppercase to lowercase. In that example, writing u16 in place of u32 would give a function that compiles but returns incorrect results.
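The sketch below is not the code from that pull request; it is a simplified stand-in, with made-up function names, that uses the same | 0x20 trick to show how the u16 variant goes wrong.

```rust
// Simplified illustration only; not the actual implementation from the PR.

// Correct: work on the full 32-bit code point.
fn is_hex_u32(c: char) -> bool {
    let v = c as u32;
    let lower = v | 0x20; // folds 'A'..='F' onto 'a'..='f'
    (v.wrapping_sub('0' as u32) < 10) | (lower.wrapping_sub('a' as u32) < 6)
}

// Incorrect but compiling: the cast to u16 drops the high bits first.
fn is_hex_u16(c: char) -> bool {
    let v = c as u16;
    let lower = v | 0x20;
    (v.wrapping_sub('0' as u16) < 10) | (lower.wrapping_sub('a' as u16) < 6)
}

fn main() {
    assert!(is_hex_u32('7') && is_hex_u32('c') && is_hex_u32('F'));
    assert!(!is_hex_u32('g') && !is_hex_u32('@'));

    // U+10041 truncates to 0x0041 ('A') as a u16, so the narrow
    // version wrongly reports it as a hex digit.
    assert!(!is_hex_u32('\u{10041}'));
    assert!(is_hex_u16('\u{10041}'));
}
```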

Instead of as u32, you can use u32::from(char). From<char> is only implemented for u32 and larger, so you're protected from accidentally using a smaller type.
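For example, a minimal sketch (the helper name ascii_lower_bits is made up for illustration):

```rust
// Hypothetical helper: the conversion cannot pick a type that is too narrow.
fn ascii_lower_bits(c: char) -> u32 {
    u32::from(c) | 0x20
}

fn main() {
    assert_eq!(ascii_lower_bits('A'), 'a' as u32);

    // Wider targets are fine too.
    assert_eq!(u64::from('A'), 0x41);

    // let narrow = u16::from('A'); // does not compile: no From<char> for u16

    // A narrowing conversion has to be spelled as fallible, e.g. TryFrom for u8.
    assert_eq!(u8::try_from('A'), Ok(0x41));
    assert!(u8::try_from('\u{141}').is_err());
}
```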


I feel like this definition is wrong. char is a datatype describing a Unicode Scalar Value. These just happen to have a bijective mapping to a subset of unsigned 32-bit integers, which is used to store them in memory. char is not meant as a general integer type. (Similar to very large fieldless enums.)

