Casting integers when bit-twiddling


#1

Something I’ve found myself doing recently is ((x >> 8) as u8 & MASK), where x: u32 and MASK: u8. The casts feel unnecessary and are an annoying papercut when bit-twiddling; after all, &-ing any unsigned integer with a u8 always yields a value that fits into a u8. I have a few thoughts on how to alleviate this papercut, and on whether it’s worth it.
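For reference, here is a minimal sketch of the papercut as it has to be written today; MASK's value and both function names are hypothetical. Since `as` binds more tightly than `&`, (x >> 8) as u8 & MASK already parses as ((x >> 8) as u8) & MASK. The only alternative is to widen the mask instead, which still needs a cast back at the end.

```rust
/// Hypothetical 8-bit mask, for illustration only.
const MASK: u8 = 0b0101_0101;

/// Truncate first, then mask. `as` binds tighter than `&`,
/// so this is `((x >> 8) as u8) & MASK`.
fn byte1_truncate_first(x: u32) -> u8 {
    (x >> 8) as u8 & MASK
}

/// Widen the mask first, then truncate the (already small) result.
fn byte1_widen_mask(x: u32) -> u8 {
    ((x >> 8) & u32::from(MASK)) as u8
}
```

Both spellings compute the same value; the question in this thread is whether either cast should be required at all.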

One solution is to allow & and | between integer types of mismatched size. Currently, all of the binary integral operators x op y require x and y to be of exactly the same type. In general this is a good thing; I’m not a fan of surprise widening conversions. However, I question whether this is strictly necessary for BitAnd and BitOr. When bit-twiddling, a cast feels like an unnecessary extra assertion that “yes, I want to zero-extend this mask to OR it with something” or “yes, I want to truncate so I can AND what’s left with a mask”. Thus, I’d like to see impls like the following in core:

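// Sketch only: these would have to live in core, since the orphan rules
// forbid implementing a foreign trait for a foreign type anywhere else.
impl BitAnd<u8> for u32 {
    type Output = u8;
    // Truncate first: only the low 8 bits can survive an 8-bit mask.
    fn bitand(self, other: u8) -> u8 { (self as u8) & other }
}
impl BitOr<u8> for u32 {
    type Output = u32;
    // Zero-extend the mask; the upper bits of self are kept.
    fn bitor(self, other: u8) -> u32 { self | (other as u32) }
}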

That said, I’m not entirely comfortable with BitAnd and BitOr not having the same strict behavior as the other ops, including BitXor, for which I don’t feel such behavior is anywhere near as useful.

A somewhat more radical alternative is to create a “literal integral/float” pseudotype for constants, which we already know as {integer} and {float} from compiler errors. This imitates how Go and C (via #define) handle constants: they are untyped until used. I consider this a failing of both languages, since untyped literals can cause a lot of strange errors when combined with inference, though I can see where such a thing could be useful.

Currently, consts require a fixed type, which can then be cast from, but I think being able to opt out of typing consts, e.g. const MASK: ulit = 0b0101_0101;, could make the above impls unnecessary (except for the automatic narrowing done by &, which I don’t feel is fully justifiable anyway). I’m not sure what the semantics of such a pseudotype would be: in what situations may it be used as an expression? Does it still type as i32/u32/f64 if no type is inferred?

This is not so much a proposal, but I’d like to see what thoughts are on this casting papercut.


#2

This problem deserves a good general solution that’s useful in other situations too. See this post of mine; it’s also about value range analysis and integrating it with into():


#3

Wow, I hadn’t noticed that. Amazing!


#4

As usual, the problem with adding impls is inference.

Today you get 100_u8 | 1 => 101_u8. But with a full set of impls, you’d get 100_u8 | 1 => 101_i32, since the 1 falls back to i32 like it does in things like println!("{}", 1).
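A small sketch of today's inference behavior described above (the function names are hypothetical): in 100_u8 | 1 the literal is inferred as u8 because u8 is the only integer type a u8 can currently be |-ed with, while a literal with no other constraint falls back to i32.

```rust
use std::mem::size_of_val;

/// Today: `1` is inferred as `u8`, since `u8 | u8` is the only
/// integer combination `BitOr` allows for a `u8` left-hand side.
fn or_one() -> u8 {
    100_u8 | 1
}

/// With no other constraint, an integer literal falls back to `i32`.
fn fallback_size() -> usize {
    let n = 1;
    size_of_val(&n)
}
```

Adding mixed-width impls for the primitive types would give the literal in or_one several candidate types, and the i32 fallback would then pick the "wrong" one.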


#5

So, literally today I had a conversation with someone who found it especially appealing that Rust did not allow mixing integer types without an explicit cast. And in particular, they appreciated that Rust’s stricter integer types made it much easier for them to meet broader safety requirements of the systems they wanted to build, requirements which C made much harder. Dropping that behavior, whether by adding impls for mixed operations or by any other means, would remove a major type-safety benefit of Rust.


#6

Alternate (?) library-only option: add a Mask type and make impls that implicitly widen/truncate the other input
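A minimal sketch of that library-only option, assuming a hypothetical Mask8 newtype (not a std type). Because Mask8 is local to the defining crate, the orphan rules permit impl BitAnd<Mask8> for u32, so no change to core is needed.

```rust
use std::ops::{BitAnd, BitOr};

/// Hypothetical 8-bit mask newtype; not part of std.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Mask8(pub u8);

/// Truncating AND: only the low 8 bits of `self` can survive an 8-bit mask.
impl BitAnd<Mask8> for u32 {
    type Output = u8;
    fn bitand(self, m: Mask8) -> u8 {
        (self as u8) & m.0
    }
}

/// Widening OR: the mask is zero-extended and the upper bits of `self` are kept.
impl BitOr<Mask8> for u32 {
    type Output = u32;
    fn bitor(self, m: Mask8) -> u32 {
        self | u32::from(m.0)
    }
}

pub const MASK: Mask8 = Mask8(0b0101_0101);
```

With this, (x >> 8) & MASK is a u8 with no casts at the use site. Since no new impls are added between primitive integer types, the literal inference in 100_u8 | 1 is unaffected.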


#7

I am not sure what the OP has in mind, but what I am suggesting isn’t a step back in the Rust design, it’s not a decrease in strictness, and it’s explicit. It’s meant to make Rust code safer and more correct compared to the current Rust code.

First you need to understand the current situation: in a systems language you often have to juggle different types. Sometimes you mask values, sometimes you take the modulus, and in many cases you have to change the type of expressions. The C language allows this, in many cases without caring about the types you convert between.

Current Rust has a mix of features to help write that code. “as” is short and nice, and it’s often the first tool Rust programmers reach for to convert types, despite being the worst reliability offender among type conversions. I think the design of “as” is one of the few mistakes in Rust’s design.

Rust also offers From/Into and TryFrom/TryInto, which perform lossless conversions and lossy-but-checked conversions respectively. They improve the reliability of Rust code compared to “as” hard casts.

What I am proposing is an improvement to into/from. They are for reliable conversions, so they don’t make the code more buggy. And they are explicit, not implicit.

The idea is to allow more into/from conversions where the compiler can statically prove that a conversion, which would be lossy if no value range were known, is actually lossless:

fn foo07(x: u8) -> i8 { (x - 128).into() }

fn foo14(x: u32) -> u8 { (x & 0xFF).into() }

To do this the compiler needs a value range analysis of the expression.

I don’t know if this is possible, or whether a new trait should be invented to do this.
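For comparison, a sketch of what foo14 has to look like today; foo14_try and foo14_as are hypothetical names. The checked version uses the real TryFrom trait, and the value range is known only to the reader, not to the type checker.

```rust
use std::convert::TryFrom;

/// Checked: `x & 0xFF` is always <= 255, but the compiler does not track
/// that, so the conversion is still fallible as far as the types go.
fn foo14_try(x: u32) -> u8 {
    u8::try_from(x & 0xFF).expect("value was masked to 8 bits")
}

/// Unchecked: `as` silently truncates, so this would still compile (and
/// silently drop bits) if the mask were later widened to, say, 0x1FF.
fn foo14_as(x: u32) -> u8 {
    (x & 0xFF) as u8
}
```

The proposal amounts to letting (x & 0xFF).into() compile without either the runtime check or the unchecked cast.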


#8

I can see how this can be useful in some cases. For example, it is quite annoying to do i32 & u32: since the operation is bitwise, there should be no issues with it, but right now it requires explicit casting, which gets annoying when it is done often (i.e. in math libraries; libm does this a lot).
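As an illustration of the kind of casting this refers to, here is a sketch written in the bit-twiddling style common in libm-like code (illustrative, not copied from libm; unbiased_exponent is a hypothetical name). It extracts the exponent field of a normal f32 and needs a u32 to i32 cast before the bias can be subtracted.

```rust
/// Unbiased exponent of a normal `f32` (subnormals, infinities and NaN
/// are not handled).
fn unbiased_exponent(x: f32) -> i32 {
    // Bits 23..=30 hold the biased exponent; `& 0xff` drops the sign bit.
    let biased = (x.to_bits() >> 23) & 0xff;
    // The `u32` -> `i32` cast is needed before subtracting the bias of 127.
    biased as i32 - 127
}
```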


#9

I decided to think a bit more about my literal types idea, and ended up deciding it wasn’t worth the additional complexity. Here’s a summary of my thoughts, for the curious:

  • Introduce ilit and flit primitive types (bikesheddable names; see also i__ and f__). These would be the types of unsuffixed numerical literals, which compiler errors currently refer to as {integer} and {float}, similar to how string literals type as &'static str.
  • These types automatically coerce as integer literals do today: either to the numeric type expected by inference, or to i32/f64, respectively, if no type is inferred.
  • const bindings may be declared ilit and flit. I imagine the following (imaginary) desugaring:
const MASK: ilit = 0b0101_1010;
let foo = MASK; 
assert_eq!(size_of_val(&foo), 4);
// desugars to
macro_rules! MASK { () => {0b0101_1010} }
let foo = MASK!();
assert_eq!(size_of_val(&foo), 4);
  • These types do not implement Sized, since they are only meant to live in compile-time constants, where their size would be determined at the callsite. The question remains whether we would ban pointers to them, which we certainly do not want running around; they could also be made into a (byte_len, ptr) fat pointer, but I really don’t think this is a good idea. This is especially confusing for flits.

The sole purpose of such a pseudo-type would be to allow constants that have the same inference behavior as literals. This can currently be approximated with macros, and I am unsure it is even a good solution to the original stated problem.
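A runnable sketch of that macro approximation, assuming a hypothetical mask! macro in place of const MASK: ilit. Because the macro expands to a bare literal, every use site infers its own type, and an unconstrained use still falls back to i32.

```rust
/// Hypothetical stand-in for an untyped `const MASK: ilit`:
/// expands to a bare literal, so each use site infers its own type.
macro_rules! mask {
    () => {
        0b0101_1010
    };
}

fn mask_u8(x: u8) -> u8 {
    x & mask!() // literal inferred as u8
}

fn mask_u64(x: u64) -> u64 {
    x & mask!() // same "constant", inferred as u64
}

fn unconstrained_size() -> usize {
    let foo = mask!(); // no other constraint: falls back to i32
    std::mem::size_of_val(&foo)
}
```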

I like this! We ostensibly have things like m8 and friends in std::simd (in the form of mask vectors, which mainly exist as part of SIMD support). We could take this one step further: it might not be an awful idea to add “mask types” which can only be BitAnded, BitOred, and BitXored with each other and with unsigned integers, and which have the extension/truncation behavior you expect. It would neatly solve the inference problem if integer literals were inferred as mask types only when all other integer types (including i32) are not valid inference targets (or, more simply, by requiring const MASK: m8 = 0b0101_0101m8;).

All of this, except for the inference part, can be done in a library already.
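A minimal library sketch of such mask types, assuming a hypothetical M8 type (unrelated to std::simd's mask vectors). Mask-with-mask ops stay in M8; with unsigned integers, AND truncates to u8 while OR and XOR zero-extend the mask. As noted, the inference part is missing, so literals must be wrapped explicitly as M8(...).

```rust
use std::ops::{BitAnd, BitOr, BitXor};

/// Hypothetical library mask type; not part of std.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct M8(pub u8);

// Mask-with-mask operations stay in the mask type.
impl BitAnd for M8 {
    type Output = M8;
    fn bitand(self, o: M8) -> M8 { M8(self.0 & o.0) }
}
impl BitOr for M8 {
    type Output = M8;
    fn bitor(self, o: M8) -> M8 { M8(self.0 | o.0) }
}
impl BitXor for M8 {
    type Output = M8;
    fn bitxor(self, o: M8) -> M8 { M8(self.0 ^ o.0) }
}

// Mask-with-unsigned operations: AND truncates to u8,
// OR and XOR zero-extend the mask and keep the integer's width.
macro_rules! mask_ops {
    ($($t:ty),*) => {$(
        impl BitAnd<M8> for $t {
            type Output = u8;
            fn bitand(self, m: M8) -> u8 { (self as u8) & m.0 }
        }
        impl BitOr<M8> for $t {
            type Output = $t;
            fn bitor(self, m: M8) -> $t { self | <$t>::from(m.0) }
        }
        impl BitXor<M8> for $t {
            type Output = $t;
            fn bitxor(self, m: M8) -> $t { self ^ <$t>::from(m.0) }
        }
    )*};
}

mask_ops!(u8, u16, u32, u64, u128);
```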


[Pre-RFC] Integer/Float literal types
#10

I think these could be really handy in const generics too, as many compile-time calculations would rather have infinite precision, followed by nice checked conversions to real types later. And it’d be a logical type for a custom literal suffix handler to take as input.

I rather like that, since it solves the symmetry problems that would otherwise show up.


#11

If you think this is worth pursuing, I can put together a more thought-out pre-RFC. I definitely hadn’t thought of custom literals, though to be fair I don’t consider them useful for something like custom literals… though I think those have a number of problems, too. (Though they would make things like declaring mask types less painful.)


#12

While I understand what you’re getting at (value-range analysis), one of the examples is concerning:

A u8 minus 128 is still a u8; an into() from u8 to i8 shouldn’t work regardless of range.
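To make that objection concrete, a sketch using real std methods (the function names are hypothetical). Reading foo07 as meant to map 0..=255 onto -128..=127 is an assumption; doing that today requires widening to a signed type before the subtraction.

```rust
use std::convert::TryFrom;

/// `x - 128` on a `u8` is still a `u8`: it underflows whenever `x < 128`
/// (`None` here, a panic with plain `-` in debug builds).
fn u8_minus_128(x: u8) -> Option<u8> {
    x.checked_sub(128)
}

/// Mapping 0..=255 onto -128..=127 today: widen losslessly to i16,
/// subtract, then narrow. The result is always in range, so the
/// `expect` can never fire, but the compiler does not know that.
fn recenter(x: u8) -> i8 {
    i8::try_from(i16::from(x) - 128).expect("always in -128..=127")
}
```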

I can understand this one working:

Even if we allowed that one, we’d have to guarantee that Rust code satisfying the value-range analysis in one version of Rust continues to do so in future versions. We don’t want, for instance, a new compiler to change its analysis algorithm and suddenly have a pile of code stop compiling.

Beyond that, however, you’d need some way to express that in the type system.