Let's deprecate `as` for lossy numeric casts

I agree that a new proposal that leaves out the less-certain "lossy" names is the right way for now. The integer <-> float problem can be left for another day.

I think this hits the problem of assuming no-niche values. For example, what does NonZeroU8::bikeshed_from(256_u16) do? Fail to compile? Do we need a try_ version of it too?

One thing I like about the wrapping name is that I can define it for these cases: Start from some value that's representable in both, then wrapping_add(1) or wrapping_sub(1) peano-style until you reach the value, and what that did to the other one is what you get.

So under that logic, NonZeroU8::wrapping_from(256) is 1, and NonZeroU8::wrapping_from(0) is 255.

Now, we could discuss whether that's useful enough to implement for those type pairs, but it's at least unambiguous, and it matches the truncation behaviour for the simple unsigned-to-smaller-unsigned cases.

I think I thus would go for wrapping_from plus a machine-applicable lint for "replace this wrapping_from call with just from".

Since the main point here is to deprecate as, it's not clear that these wrapping etc should apply to other types. I would consider your example more readable as NonZeroU8::try_from(u8::bikeshed_from(256u16)).unwrap() anyway, then it's clear what's happening. The Peano thing, while tempting mathematically, doesn't seem that useful to me, and 0 converting to 1 is outright confusing. (Edit: you wrote to 255, counting backwards. I think this proves the point). While the word wrapping is probably originally meant in this Peano sense, this is all mostly relevant for cases where it is just a bit truncation.
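For reference, the two-step version can be approximated on stable Rust today, with `as` standing in for the hypothetical `u8::bikeshed_from`. This is only a sketch; the function name `narrow_then_check` is made up for illustration.

```rust
use std::num::NonZeroU8;

/// Hypothetical helper: truncate to u8 first, then check for zero.
/// `as` stands in for the proposed `u8::bikeshed_from`.
fn narrow_then_check(x: u16) -> Option<NonZeroU8> {
    let low_byte = x as u8; // keeps only the low 8 bits
    NonZeroU8::try_from(low_byte).ok()
}
```

Note that 256 truncates to 0, so the second step fails for that input and an `.unwrap()` on it would panic, whereas 257 yields 1.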


I think it's a huge advantage of having this being a trait to be able to do things like u32::wrapping_from(my_big_integer), so I think we should lean into that side.
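A rough sketch of what the trait-based version might look like. Everything here is an assumption: `WrappingFrom` does not exist in std, and `BigUint` is a toy type invented for the example, not the one from any real crate.

```rust
/// Hypothetical trait; nothing like this exists in std today.
trait WrappingFrom<T> {
    fn wrapping_from(value: T) -> Self;
}

impl WrappingFrom<u64> for u32 {
    fn wrapping_from(value: u64) -> Self {
        value as u32 // reduce modulo 2^32
    }
}

/// Toy unsigned big integer: little-endian 32-bit limbs.
struct BigUint {
    limbs: Vec<u32>,
}

impl WrappingFrom<&BigUint> for u32 {
    fn wrapping_from(value: &BigUint) -> Self {
        // Modulo 2^32, only the lowest limb matters; no limbs means zero.
        value.limbs.first().copied().unwrap_or(0)
    }
}
```

With these impls, `u32::wrapping_from(&my_big_integer)` reads the same as the primitive case, which is the appeal of making it a trait.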

Right, because if you start from 1, which is representable in both, then 0 is one wrapping step below that in the input, so you go one wrapping step below it in the output, and the wrapping thing "before" 1 in a NonZeroU8 is 255.

0 going to 1 would just be wrong for wrapping_from, I agree. Of course NonZeroU8::saturating_from(0) would go to 1, though.
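The saturating case is at least easy to write down. A minimal sketch, assuming an `i16` source; the free function `saturating_nonzero_u8` is a made-up stand-in for the hypothetical `NonZeroU8::saturating_from`.

```rust
use std::num::NonZeroU8;

/// Hypothetical `saturating_from`: clamp into 1..=255, then convert.
fn saturating_nonzero_u8(x: i16) -> NonZeroU8 {
    let clamped = x.clamp(1, 255) as u8; // lossless after the clamp
    NonZeroU8::new(clamped).expect("clamped value is at least 1")
}
```

Here 0 and every negative input go to 1, and anything above 255 goes to 255.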

I went 255 steps forward, instead of 1 backwards, to arrive at 0 and 1 respectively. The problem of course is that the size of the value space of NonZeroU8 doesn't divide that of u8 or u16 (unlike that of u8, which does divide that of u16). This means that walking forward or backward can give you different results, which IMO makes for a rather confusing "wrapping" behaviour.
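The disagreement is easy to reproduce. The sketch below implements the Peano-style walk from `u8` into `NonZeroU8`, starting from 1 in both types; all four function names are invented for illustration.

```rust
use std::num::NonZeroU8;

/// Successor in the 255-element cycle 1, 2, ..., 255, 1, ...
fn nz_succ(x: NonZeroU8) -> NonZeroU8 {
    // 255 + 1 wraps to 0 in u8, which `new` rejects, so fall back to 1.
    NonZeroU8::new(x.get().wrapping_add(1)).unwrap_or(NonZeroU8::MIN)
}

/// Predecessor in the same cycle: 1 steps back to 255.
fn nz_pred(x: NonZeroU8) -> NonZeroU8 {
    NonZeroU8::new(x.get() - 1).unwrap_or(NonZeroU8::MAX)
}

/// Walk both types forward from 1 until the source reaches `target`.
fn walk_forward(target: u8) -> NonZeroU8 {
    let (mut src, mut dst) = (1u8, NonZeroU8::MIN);
    while src != target {
        src = src.wrapping_add(1);
        dst = nz_succ(dst);
    }
    dst
}

/// Walk both types backward from 1 until the source reaches `target`.
fn walk_backward(target: u8) -> NonZeroU8 {
    let (mut src, mut dst) = (1u8, NonZeroU8::MIN);
    while src != target {
        src = src.wrapping_sub(1);
        dst = nz_pred(dst);
    }
    dst
}
```

Reaching 0 takes 255 forward steps, which is one full lap of the NonZeroU8 cycle and lands on 1, but only one backward step, which lands on 255.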

So yeah, for the big integer it might make sense, but I think that for range numbers (such as non-zero) it doesn't so much.

Side note, as conditionally const and const traits evolve, it will be super nice to see this just error at compile time.


"Truncate" has been discussed earlier, but I don't think it's at all suitable for narrowing (or other) casts between integers. At least to me, the word specifically means removing (zeroing) least significant digits. I think "wrapping" is a much better word for this operation.

That still sounds very surprising. It would be as if NonZeroU8 were numbers modulo 255, but with the unusual convention of denoting 0 by 255 instead.

Tangent on modular arithmetic and Rust integers

So, with wrapping arithmetic the numeric types uN and iN model integers modulo 2^N. In this sense and with the two's complement representation used by Rust, iN is equivalent to uN: for any x,y: uN, (x + y) as iN == x as iN + y as iN (with any of +,-,* used as a wrapping operator). The signedness is just a selection of which of up to two possible representatives is used for other operations where it does matter.

Considering both widening and narrowing casts (signed or not), x as {u,i}N is just returning the numeric value of x as a number modulo 2^N. As a particular example, -1i8 as u16 == u16::MAX, because u16::MAX is the unique number in the range of u16 that is congruent to -1 (mod 2^16). Admittedly a cast like this (widening cast from signed to unsigned) is the most ambiguous kind since it could plausibly be defined as first casting to the unsigned type of the same width.

With that mental model for primitive integers in Rust, none of the as-casts between integers are of much concern if the types are known.
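Both claims in this tangent can be checked exhaustively for the 8-bit case. The helper names below are made up; the sketch only uses stable integer methods.

```rust
/// Checks, for every pair of u8 values, that casting to i8 commutes with
/// wrapping addition, subtraction and multiplication.
fn cast_commutes_with_wrapping_ops() -> bool {
    (0..=u8::MAX).all(|x| {
        (0..=u8::MAX).all(|y| {
            let (sx, sy) = (x as i8, y as i8);
            x.wrapping_add(y) as i8 == sx.wrapping_add(sy)
                && x.wrapping_sub(y) as i8 == sx.wrapping_sub(sy)
                && x.wrapping_mul(y) as i8 == sx.wrapping_mul(sy)
        })
    })
}

/// The widening signed-to-unsigned example: -1 is congruent to
/// u16::MAX modulo 2^16.
fn widen_minus_one() -> u16 {
    -1i8 as u16
}
```

The same value falls out of `(-1i32).rem_euclid(1 << 16)`, which is the "number modulo 2^16" reading spelled out.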

I wonder which of these are clearer?

let x: u32 = ...;
foo(x as usize);
// OR
foo(x.into());

bar(x as u16);
// OR
bar(x.wrapping_into());

I'm aware it could also be x as _, or into could be turbofished (and would often have to be), but the point here is that I feel with as the type is more likely to be written explicitly. In part precisely because as is capable of many different types of casting.

Clearly programmer intent is what is missing with as-casts. It's unclear if x as u16 is supposed to wrap, or known not to. Then again, that's also the case with wrapping_into, in case the programmer knew it won't wrap, and wanted to avoid the check.
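For comparison, here is one way the two intents can be spelled out on stable Rust today without any new API. The function names are hypothetical.

```rust
/// Wrapping intent: keep the low 16 bits, same result as `x as u16`.
fn narrow_wrapping(x: u32) -> u16 {
    (x & 0xFFFF) as u16
}

/// "Known not to wrap" intent: report when the assumption is wrong
/// instead of silently truncating.
fn narrow_checked(x: u32) -> Option<u16> {
    u16::try_from(x).ok()
}
```

The checked form costs a comparison, which is exactly the check a programmer who "knows" the value fits might want to skip.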

I'm not sure it's really any more unusual than what "truncating" to a signed number does. i8 is "modulo 256, but representing 255 as -1 and 254 as -2 and ...".

Basically, I'm taking this much as obvious:

i16       -1 0 1 2 3 ... 254 255 256 257

           | | | | |      |   |   |   |
           v v v v v      v   v   v   v

NonZeroU8  ? ? 1 2 3 ... 254 255  ?   ?

So the only "wrapping" way that makes sense to me is just to fill those ?s with another copy of all the values, like

i16        -1   0  1 2 3 ... 254 255 256 257

            |   |  | | |      |   |   |   |
            v   v  v v v      v   v   v   v

NonZeroU8  254 255 1 2 3 ... 254 255  1   2

So maybe we say "that's weird", and discourage it from anything where the number of values doesn't divide evenly. But that would mean no NonZeroU8::from(my_non_zero_u16) either, which I'm not sure I could say "that shouldn't exist" to.

You'll notice I've been writing these as u16::wrapping_from(x), not x.wrapping_into().

We should keep the objective in mind when discussing conversions to NonZero* types.

Wrapping arithmetic is very interesting from a mathematical standpoint, but it's not useful in most real-world applications. It does have some useful properties (for example, adding an amount to a number and then subtracting the same amount returns the original number even if it overflowed/underflowed). But the main reason why wrapping arithmetic is widely used by computers is that it is easy and efficient to implement. If checked arithmetic was equally fast, almost nobody would use wrapping arithmetic.

Converting any integer that can be 0 to NonZero* requires a runtime check, because 0 is not a valid bit pattern of a NonZero* type. Implementing this conversion would probably be even slower than a checked conversion. This makes the argument for a truncating/wrapping conversion from {integer} to NonZero* moot.


That's not what happens; 2^8-1 divides 2^16-1 and not 2^16. So when you fill the ?s (increasing from 1) you get

i16        -2  -1   0  1 2 3 ... 254 255 256 257

            |   |   |  | | |      |   |   |   |
            v   v   v  v v v      v   v   v   v

NonZeroU8  254 255 ??? 1 2 3 ... 254 255  1   2

And something else if you decrease instead.

Interesting idea for going from NonZeroI16 to NonZeroU8, etc, though. (2^n-1 always divides 2^(2n)-1 for positive n; (2^n-1)(2^n+1) = 2^(2n)-1.)
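To make the divisibility point concrete, here is a sketch of the "cycle through 1..=255" mapping over mathematical integers, plus a check of whether it is compatible with a source type whose values repeat with a given period. Both function names are invented for this example.

```rust
/// Hypothetical "fill the ?s" mapping: cycle through 1..=255.
fn wrap_into_1_to_255(x: i64) -> u8 {
    ((x - 1).rem_euclid(255) + 1) as u8
}

/// True if inputs that differ by `period` always map to the same output,
/// i.e. the mapping is well defined on a source type with that period.
fn respects_period(period: i64) -> bool {
    (1..=255).all(|x| wrap_into_1_to_255(x) == wrap_into_1_to_255(x + period))
}
```

With a period of 2^16 (a full 16-bit integer) the check fails, because 2^16 leaves remainder 1 modulo 255. With a period of 2^16-1 (the number of nonzero 16-bit values) it passes.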

It is definitely a lot more unusual. In fact, it is IMO entirely broken: x+0 must be equal to x, for all elements of the equivalence class of 0. If we treat 255 and 0 as equivalent, that property fails to hold. In contrast, using -1 as the representative for the equivalence class that also contains 255 is entirely well-behaved -- all expected mathematical equations still hold.

This could be salvaged by saying that NonZeroU8 really represents integers modulo 255, but then the arithmetic operators should also reflect that. For example, 254+2 should be 1. I doubt many people will expect that. Also, NonZeroU8 becomes a misnomer now, because this is not a u8 (i.e., an integer modulo 256) known to be non-zero, it is a totally different algebraic structure: a u_log2(255) of sorts.


The name convertFormat unfortunately doesn't inherently distinguish between lossless and lossy casts though. As a possible alternative, WebAssembly uses the names promote and demote to describe f32-to-f64 and f64-to-f32 casts, respectively.
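As an illustration of the naming only, the two directions could be wrapped like this on stable Rust. The free functions `promote` and `demote` are hypothetical; they borrow the WebAssembly names and are not a proposed std API.

```rust
/// Lossless f32 -> f64, in the spirit of WebAssembly's `promote`.
fn promote(x: f32) -> f64 {
    f64::from(x)
}

/// Lossy f64 -> f32, in the spirit of `demote`: rounds to the nearest f32,
/// and values too large for f32 become infinite.
fn demote(x: f64) -> f32 {
    x as f32
}
```

Promoting and then demoting gives back the original `f32`, while demoting and then promoting generally does not give back the original `f64`.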


A long time ago I started writing an RFC and then never quite finished it: Rust pre-RFC: Add explicitly-named numeric conversion APIs · GitHub


Not sure if someone's raised this issue in the thread before, but f64 to i64 conversions and their ilk are also rather controversial in their behavior.

(credit to u/simon_o on Reddit for pointing this out)

dbg!(1024_f64 as i64); // prints 1024_f64 as i64 = 1024

One could very easily imagine that as is supposed to work as a safe alternative to transmute for the numeric types, considering that:

dbg!(-1_i8 as u8); // prints -1_i8 as u8 = 255

instead of panicking.

While the documentation is clear about the semantics of as in various situations, it's a bit unsettling to see so many different behaviors depending on the types being converted.
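A few more data points on how different the behaviors are, collected in one place. The function names are made up, and the comments describe the documented semantics of `as` on current stable Rust.

```rust
/// Float -> integer casts: truncate toward zero, saturate, NaN becomes 0.
fn float_to_int_samples() -> [i64; 4] {
    [
        -1.9_f64 as i64,          // fractional part is dropped
        1e300_f64 as i64,         // too large: saturates to i64::MAX
        f64::NEG_INFINITY as i64, // saturates to i64::MIN
        f64::NAN as i64,          // NaN becomes 0
    ]
}

/// Integer -> integer casts: wrap modulo 2^N instead of saturating.
fn int_to_int_samples() -> [u8; 2] {
    [-1_i8 as u8, 300_i32 as u8]
}
```

So the same keyword saturates in one family of casts and wraps in another.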

So with all of that said, how do people feel about a long term plan of

  • Ensuring all current uses of as are covered by some reasonable replacement, details about naming aside but hopefully with okay ergonomics.
  • Adding warnings to rustc (or promoting clippy warnings) which suggest these alternatives.
  • vs the status quo of optional clippy warnings.

If support for that is good (and I haven't seen many "no as is great" posts in this thread), maybe @Aloso can throw this idea up in RFC form? I assume "deprecation" is off the table, at least initially.


I think it'd be great if at least all the lossy versions could be deprecated starting in edition2024, because there are reasonable replacements. (Keeping it for things like as &[T] or as Box<dyn Error> might be ok -- maybe it could become the type ascription syntax in 2027 or something.)

Anything faster than that is probably too fast, given how pervasive as currently is.


An important consideration for the short term is const fn, since trait-based conversions don't work there (yet, until const impl) but as does
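A small sketch of the const case that `as` covers today; `low_byte` and `LOW` are names made up for the example.

```rust
/// `as` is usable inside a const fn on stable Rust.
const fn low_byte(x: u32) -> u8 {
    x as u8 // keeps the low 8 bits
}

/// Evaluated at compile time.
const LOW: u8 = low_byte(0x1234);
```

Any trait-based replacement would need to be callable in this position before `as` could be linted against in const code.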


Something that I don't think I saw considered much here is when a user doesn't want to be concerned with the value. Specifically unsigned <-> signed casts reinterpret bits, and casts with a signed starting type will sign extend when needed.

I was working on a design for an API that would allow users to perform the bitcast operation, sign extension operations, zero extension operations, and truncation explicitly and with the knowledge that if the operation cannot be reasonably performed (e.g. "truncating" a u8 into a u32, that doesn't make sense) it will not compile.
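To give a feel for the kind of operations meant, here is a rough sketch using fixed-width free functions on stable Rust. This is not the design described in the post; the names and signatures are assumptions made purely for illustration.

```rust
/// Same-width reinterpretation of the bits.
fn bitcast(x: i8) -> u8 {
    u8::from_ne_bytes(x.to_ne_bytes())
}

/// Widen by filling the new high bits with zeros.
fn zero_extend(x: u8) -> u32 {
    u32::from(x)
}

/// Widen by copying the top bit of `x` into the new high bits.
fn sign_extend(x: u8) -> u32 {
    let wide = i32::from(i8::from_ne_bytes([x]));
    u32::from_ne_bytes(wide.to_ne_bytes())
}

/// Narrow by keeping only the lowest byte.
fn truncate(x: u32) -> u8 {
    x.to_le_bytes()[0]
}
```

Because each function fixes its source and target widths, a nonsensical request such as "truncating" a u8 into a u32 has no function to call and therefore cannot compile.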

Saturating lossy conversions and conversions using floats that are concerned about the value or precision of the conversion would not be considered under my design and I think that would be an improvement. I think that between my proposal and either the linked "lossy conversions" RFC or something like what @Aloso had in mind, almost all uses of conversions between numbers (manipulating bits intentionally without care for the value and lossy conversions that have some specific behavior when there is loss) would be covered.


I don't think of them as reinterpreting bits -- their behavior can be entirely explained on the level of modular integer arithmetic. Namely they pick an element of the same equivalence class that is representable in the other type.

Same as above -- I wouldn't think of them as transmuting. -1 and 255 are the same modulo 256. So this is totally consistent with 1024_f64 as i64 = 1024.


While this may be true, and perhaps should be how it is explained, every document I've seen about this, and most people that want to do this cast explain it as reinterpreting bits without doing any logic. For example from the Reference page about casting semantics it describes the cast as

  • Casting between two integers of the same size (e.g. i32 -> u32) is a no-op (Rust uses 2's complement for negative values of fixed integers)

The relevant Rust By Example page is weird in this regard. It seems to partially explain the casts as modular arithmetic, but only for casting to any unsigned integer. It then says

When casting to a signed type, the (bitwise) result is the same as first casting to the corresponding unsigned type. If the most significant bit of that value is 1, then the value is negative.

Which is effectively talking about casting to signed types as a bit cast.
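For the same-width case the two explanations can be compared directly. The sketch below computes the u8 to i8 conversion once as "pick the representative modulo 256 that fits in i8" and once as "reinterpret the byte", then checks both against `as` for every input. The function names are made up.

```rust
/// Modular view: the representative of x (mod 256) that lies in -128..=127.
fn modular_to_i8(x: u8) -> i8 {
    let v = i16::from(x);
    let rep = if v >= 128 { v - 256 } else { v };
    i8::try_from(rep).expect("representative fits in i8")
}

/// Bitwise view: keep the byte, change only how it is interpreted.
fn bitwise_to_i8(x: u8) -> i8 {
    i8::from_ne_bytes([x])
}

/// True if both views agree with `as` for all 256 inputs.
fn views_agree() -> bool {
    (0..=u8::MAX).all(|x| modular_to_i8(x) == bitwise_to_i8(x) && bitwise_to_i8(x) == x as i8)
}
```

With two's complement the two descriptions give identical results, so the disagreement is about which mental model to teach rather than about behavior.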

I was not able to find any information in The Book about integer casts with as, but if there is any, its position should be considered with even more importance as the first time that users will be exposed to the operator and its function. I will note that it does however talk about arithmetic operations being two's complement wrapping, and talks about the behavior in the default debug and release modes.

I am not well versed enough in the mathematics behind modular arithmetic to be confident in how this should work for signed types if the explanation were to be made consistent using the same explanation. Additionally I don't know if that should be how it's explained. I think there are a lot of users who think of integers in terms of their bits, rather than their value - which is what my set of ideas aimed to expose in a more consistent way.


FWIW I have always thought of unsigned-signed integer cast as reinterpreting bits. Especially in (for example) FFI code when you have a Rust API that uses u8, and a C API that uses char, reinterpreting bits is the most natural/obvious explanation
