Arithmetic operations on char

GKFX · October 14, 2022, 8:39pm

char is an unsigned 32-bit integer with a restricted range. However, to actually do any sort of arithmetic on it, you have to cast it to u32. I think this restriction should be lifted.

If you do arithmetic on a char, you still then have a 32-bit value, it's just no longer (necessarily) in the approved range: it is now a plain u32. So the standard library could usefully implement Add<u32>, Sub<u32>, BitOr<u32>, etc on char as below.

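use std::ops::Add;

// Proposed for the standard library; coherence rules prevent writing this impl in user code.
impl Add<u32> for char {
    type Output = u32;
    fn add(self, other: u32) -> u32 { self as u32 + other }
}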

Currently you have to do all the casting yourself, which is needlessly verbose and also doesn't constrain you to u32: you can accidentally cast to u16 or u8 (etc.), which is likely to lead to incorrect results. Implementing the standard arithmetic traits to allow char + u32 = u32 would both condense code and avoid that possible error.
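To make the narrowing hazard concrete, here is a minimal sketch of what the manual casts look like today. The helper names offset_wide and offset_narrow are hypothetical, chosen only for illustration; both compile, but the second silently truncates the code point.

```rust
/// Hypothetical helper: offsets a char's code point after widening to u32.
fn offset_wide(c: char, delta: u32) -> u32 {
    c as u32 + delta
}

/// Hypothetical helper: the same idea with an accidental cast to u8.
/// `c as u8` keeps only the low 8 bits of the code point.
fn offset_narrow(c: char, delta: u8) -> u8 {
    (c as u8).wrapping_add(delta)
}
```

For 'λ' (U+03BB), offset_wide('λ', 1) is 0x3BC, while offset_narrow('λ', 1) is 0xBC because the upper bits were dropped by the cast.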

For similar reasons, I would implement PartialOrd<u32> for char.

scottmcm · October 14, 2022, 8:46pm

Can you say why you want to do arithmetic on a char? What are you doing where this is common enough to be worth supporting without the cast?

Note that they are iterable in ranges, now, so 'a'..='z' works for that without ever needing to type + 1.
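As a small sketch of the range approach: the functions below are hypothetical names for illustration, and they rely only on inclusive char ranges being iterators.

```rust
/// Collects the lowercase ASCII letters by iterating a char range directly.
fn lowercase_letters() -> String {
    ('a'..='z').collect()
}

/// Hypothetical helper: the character after `c`, if there is one,
/// obtained from a range instead of `+ 1` on the code point.
fn next_char(c: char) -> Option<char> {
    (c..=char::MAX).nth(1)
}
```

Here next_char('a') yields Some('b'), and next_char(char::MAX) yields None because the range has no second element.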

There's no automatic widening, though.

So if you do (c as u16) + 10, you'll then need another cast in order to from_u32 it back into a char, which is a great opportunity to realize that you didn't want a u16 in the first place.
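A rough sketch of that round trip, with hypothetical helper names. char::from_u32 takes a u32 and returns an Option, so a u16 intermediate needs an explicit widening step before it can be converted back.

```rust
/// Hypothetical helper: shifts `c` by `delta` code points. Returns None on
/// overflow or when the result is not a Unicode scalar value.
fn shift_char(c: char, delta: u32) -> Option<char> {
    let code = (c as u32).checked_add(delta)?;
    char::from_u32(code)
}

/// Hypothetical helper showing the u16 detour.
fn shift_char_via_u16(c: char, delta: u16) -> Option<char> {
    // Truncates any code point above U+FFFF.
    let narrow = (c as u16).wrapping_add(delta);
    // `char::from_u32(narrow)` would not compile (expected u32, found u16),
    // so the value has to be widened again explicitly.
    char::from_u32(u32::from(narrow))
}
```

With these definitions, shift_char('a', 10) is Some('k'), while for '😀' (U+1F600) the u16 version produces a different character than the u32 version because the cast discarded the high bits.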

GKFX · October 14, 2022, 8:52pm

This was prompted by my pull request "Make is_ascii_hexdigit branchless" (rust-lang/rust #103024), where I bitwise-or a character with 0x20 to convert ASCII uppercase to lowercase. In that example, mistakenly specifying u16 in place of u32 would give you a function that compiles but returns incorrect results.
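The following is only an illustrative sketch of the 0x20 trick, not the code from the pull request; the function names are hypothetical. It shows how a u16 cast still compiles but misclassifies characters outside the Basic Multilingual Plane.

```rust
/// Sketch of a hex-digit check that folds ASCII case with `| 0x20`.
fn is_hex_digit_u32(c: char) -> bool {
    let code = c as u32;
    let lower = code | 0x20; // maps 'A'..='F' onto 'a'..='f'
    code.wrapping_sub('0' as u32) < 10 || lower.wrapping_sub('a' as u32) < 6
}

/// The same logic with an accidental u16 cast. It compiles, but the cast
/// drops the high bits, so U+10041 is treated like 'A' (U+0041).
fn is_hex_digit_u16(c: char) -> bool {
    let code = c as u16;
    let lower = code | 0x20;
    code.wrapping_sub('0' as u16) < 10 || lower.wrapping_sub('a' as u16) < 6
}
```

Both versions agree on ASCII input, but is_hex_digit_u16('\u{10041}') returns true while is_hex_digit_u32('\u{10041}') returns false.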

cuviper · October 14, 2022, 9:48pm

Instead of as u32, you can use u32::from(char). From<char> is only implemented for u32 and larger, so you're protected from accidentally using a smaller type.
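A short sketch of this suggestion, with hypothetical function names. The commented-out line marks the conversion that the compiler rejects.

```rust
/// Hypothetical helper: sets bit 0x20 on the code point, using `u32::from`
/// instead of an `as` cast.
fn with_case_bit(c: char) -> u32 {
    u32::from(c) | 0x20
}

/// Wider targets are also supported through `From<char>`.
fn code_point_u64(c: char) -> u64 {
    u64::from(c)
}

// let narrow = u16::from('a'); // does not compile: no `From<char>` for u16
```

For example, with_case_bit('A') equals u32::from('a'), and code_point_u64('😀') is 0x1F600.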

nacaclanga · October 19, 2022, 2:51pm

I feel like this definition is wrong. char is a datatype describing a Unicode Scalar Value. These just happen to have a bijective mapping to a subset of unsigned 32-bit integers, which is used to store them in memory. char is not meant as a general integer type. (Similar to very large fieldless enums.)
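One way to picture the enum analogy, as a sketch only: Suit and suit_from_u32 are hypothetical, invented for illustration. Both a fieldless enum and char convert to an integer with a cast, but the reverse direction has to be checked because not every integer is a valid value.

```rust
/// Hypothetical fieldless enum stored as a u32.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u32)]
enum Suit {
    Clubs = 0,
    Diamonds = 1,
    Hearts = 2,
    Spades = 3,
}

/// Hypothetical checked conversion, analogous to `char::from_u32`.
fn suit_from_u32(n: u32) -> Option<Suit> {
    match n {
        0 => Some(Suit::Clubs),
        1 => Some(Suit::Diamonds),
        2 => Some(Suit::Hearts),
        3 => Some(Suit::Spades),
        _ => None,
    }
}

/// Returns true when `n` is a Unicode scalar value, i.e. a valid char.
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}
```

Suit::Hearts as u32 is 2 and suit_from_u32(4) is None; likewise 'A' as u32 is 0x41, but is_scalar_value(0xD800) is false because surrogate code points are excluded from char.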

