Let's deprecate `as` for lossy numeric casts

There is currently a discussion about the dangers of numeric casts with as. This has come up several times before, and I think it's time to fix it.

Casting between numeric types isn't unsafe, but it is very likely to cause bugs, and when used in unsafe code, it can cause UB down the line.

Integer casts are similar to integer arithmetic. By default, +, - and * operations wrap on overflow in release builds, which is equivalent to doing the calculation in a larger integer type and then truncating the result.
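
A small sketch of that equivalence for u8. It uses wrapping_add so that it behaves the same in debug and release builds; add_wide_then_truncate is a made-up helper for illustration only:

```rust
/// Adds two u8 values in a wider type, then truncates back to u8.
/// (Hypothetical helper, only here to illustrate the equivalence.)
fn add_wide_then_truncate(a: u8, b: u8) -> u8 {
    let wide = u16::from(a) + u16::from(b); // cannot overflow: at most 510
    wide as u8 // keeps only the low 8 bits
}

fn demo_wrapping_add() {
    // 200 + 100 = 300 = 0x12C; the low byte is 0x2C = 44.
    assert_eq!(200u8.wrapping_add(100), 44);
    assert_eq!(add_wide_then_truncate(200, 100), 44);
}
```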

For arithmetic operations, this problem has been addressed: Each integer type has wrapping_*, saturating_* and checked_* operations. For numeric casts, options are more limited:

  • The equivalent to checked_* is try_into

  • The equivalent to wrapping_* is as

  • There is no equivalent to saturating_* for integer casts
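
Side by side, for i32 -> u8, this looks roughly as follows. The saturating version has to be written by hand; saturating_to_u8 is a hypothetical helper, not something std provides:

```rust
use std::convert::TryFrom;

/// Hand-rolled saturating cast (hypothetical helper, not in std).
fn saturating_to_u8(x: i32) -> u8 {
    // After clamping to 0..=255 the `as` cast is lossless.
    x.clamp(0, 255) as u8
}

fn demo_cast_flavors() {
    // checked: out-of-range values are reported as an error
    assert!(u8::try_from(300i32).is_err());
    assert_eq!(u8::try_from(200i32), Ok(200u8));
    // wrapping: `as` keeps the low 8 bits
    assert_eq!(300i32 as u8, 44);
    assert_eq!(-1i32 as u8, 255);
    // saturating: clamps to the target range
    assert_eq!(saturating_to_u8(300), 255);
    assert_eq!(saturating_to_u8(-1), 0);
}
```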

Why does this require fixing, you may ask? The as operator is used for many type conversions, both lossless and lossy. It has different behavior (truncating, wrapping, saturating, rounding) depending on the types, explained here.
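
To illustrate, here are a few of those behaviors next to each other. The syntax is identical on every line; only the types decide what happens:

```rust
fn demo_as_behaviors() {
    // lossless widening
    assert_eq!(200u8 as u32, 200);
    // truncating: keeps the low 8 bits of 0x1234
    assert_eq!(0x1234u16 as u8, 0x34);
    // wrapping: same bits, reinterpreted as signed
    assert_eq!(255u8 as i8, -1);
    // saturating: float -> int clamps to the target range, NaN becomes 0
    assert_eq!(300.0f32 as u8, 255);
    assert_eq!(-5.0f32 as u8, 0);
    assert_eq!(f32::NAN as u8, 0);
    // rounding toward zero: float -> int drops the fractional part
    assert_eq!(-1.9f32 as i32, -1);
    // rounding to nearest: int -> float when the value is not representable
    assert_eq!(16_777_217u32 as f32, 16_777_216.0);
}
```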

The main problem is that the name of the as operator is too vague: it doesn't describe what kind of conversion is performed. Furthermore, people often choose as instead of .try_into().unwrap() because it is shorter.

My proposal is to deprecate as for lossy numeric conversions. Instead, four new traits would be added, Truncating{From,Into} and Saturating{From,Into}, which work the same way as From and Into. You can use truncating_into when performance is very important and you know that it won't truncate, or when truncation is actually desired. saturating_into can be used e.g. when converting from a signed to an unsigned integer and negative values should become zero.
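
A minimal sketch of what these traits could look like. None of them exist in std; the names follow the proposal, while the exact signatures and the single i32 -> u8 impl are my assumptions:

```rust
/// Hypothetical trait: keep the low bits, discard the rest.
trait TruncatingFrom<T> {
    fn truncating_from(value: T) -> Self;
}

/// Hypothetical trait: clamp to the target type's range.
trait SaturatingFrom<T> {
    fn saturating_from(value: T) -> Self;
}

/// Hypothetical counterparts, derived from the traits above
/// the same way `Into` is derived from `From`.
trait TruncatingInto<T> {
    fn truncating_into(self) -> T;
}
impl<T, U: TruncatingFrom<T>> TruncatingInto<U> for T {
    fn truncating_into(self) -> U {
        U::truncating_from(self)
    }
}

trait SaturatingInto<T> {
    fn saturating_into(self) -> T;
}
impl<T, U: SaturatingFrom<T>> SaturatingInto<U> for T {
    fn saturating_into(self) -> U {
        U::saturating_from(self)
    }
}

// Only one pair of types is implemented, to keep the sketch short.
impl TruncatingFrom<i32> for u8 {
    fn truncating_from(value: i32) -> u8 {
        value as u8 // this sketch still relies on `as` internally
    }
}
impl SaturatingFrom<i32> for u8 {
    fn saturating_from(value: i32) -> u8 {
        value.clamp(0, 255) as u8
    }
}

fn demo_proposed_traits() {
    let wrapped: u8 = 300i32.truncating_into();
    let clamped: u8 = (-7i32).saturating_into();
    assert_eq!(wrapped, 44);
    assert_eq!(clamped, 0);
    assert_eq!(u8::saturating_from(1000), 255);
}
```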

For lossless conversions, you can continue to use .into() or as. The deprecation warning is only shown when using as for a lossy conversion.

53 Likes

As a nit, I'd just call this truncate(). That aside, I'd love to see this.

10 Likes

What should be used for f64 as f32? Neither saturate nor truncate seems completely appropriate.

It's not truncate because finite f64 values that are too large saturate to f32::INFINITY ([example]).

It's not really saturate because of the loss of precision / rounding; values between two successive f32 round to the closer value rather than saturate (away from zero) (or at least that's my understanding/assumption).
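
Both effects can be checked with a quick sketch (the values are picked by me for illustration):

```rust
fn demo_f64_to_f32() {
    // Too large for f32: becomes infinity rather than f32::MAX.
    assert_eq!(1e300_f64 as f32, f32::INFINITY);
    // In range but not representable: rounds to the nearest f32.
    // Near 2^24 the f32 values are 2 apart, so 16777217.5 lies between
    // 16777216 and 16777218 and goes to the closer one, which is the
    // larger of the two. Truncation toward zero would give 16777216.
    assert_eq!(16_777_217.5_f64 as f32, 16_777_218.0_f32);
}
```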

What does IEEE call the operation exactly?

I suppose it could be ".round()", but then you have the argument of whether values which saturate to inf should round to f32::MAX instead. "Approximate" is the closest verb I can think of that matches informally while not being wrong.

Otherwise I'm on board for getting closer to deprecating as for numerics.

19 Likes

Open RFC: https://github.com/rust-lang/rfcs/pull/2484#discussion_r200892454

I might leave the into version off for now, since using them without saying the type is a bit questionable. It might be fine to say "look, since it's lossy, you have to say u16::wrapping_from or whatever".

I think there's an important choice here. There are at least two obvious possibilities:

  • a wrapping_from method, which can do the full cartesian product of types

  • a truncate, which implies to me that it's not a widening, and thus doesn't include certain things that the other one would. Like I would assume u16 -> i32 wouldn't work with truncate, but more importantly neither would u8 -> i8, as 255 -> -1 isn't what I'd call a truncation.
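
A rough sketch of the first option. WrappingFrom is hypothetical (it is not in std), and the macro only covers three pairs here instead of the full cartesian product:

```rust
/// Hypothetical trait, named after the method discussed above.
trait WrappingFrom<T> {
    fn wrapping_from(value: T) -> Self;
}

// Generates an impl for every listed (source => target) pair via `as`.
macro_rules! impl_wrapping_from {
    ($($src:ty => $dst:ty),*) => {
        $(impl WrappingFrom<$src> for $dst {
            fn wrapping_from(value: $src) -> $dst {
                value as $dst
            }
        })*
    };
}

impl_wrapping_from!(u16 => i32, u8 => i8, u32 => u16);

fn demo_wrapping_from() {
    // widening: nothing is lost, so "truncate" would be an odd name
    assert_eq!(i32::wrapping_from(65_535u16), 65_535);
    // same width, sign reinterpretation: 255 -> -1
    assert_eq!(i8::wrapping_from(255u8), -1);
    // narrowing: the only case that is a truncation in the usual sense
    assert_eq!(u16::wrapping_from(0x0001_2345u32), 0x2345);
}
```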

5 Likes

IEEE 754-2019, section 5.4.2:

3 Likes

Looking through some code I see this niche use for as:

const CONSTANT: isize = (0x80000000_u32 as i32) as isize;

This is for FFI where it can be useful to match the original literal value in the C header because it makes it more obvious that they're the same when reading the code.
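
For reference, this is what that constant evaluates to, plus one way to spell the reinterpretation without a lossy as. CONSTANT_ALT is my own illustration, not something from the code I was looking at:

```rust
// 0x80000000 does not fit in i32, so the literal is written as u32 and
// reinterpreted; the second cast sign-extends to isize.
const CONSTANT: isize = (0x80000000_u32 as i32) as isize;

// Same value: reinterpret the bytes, then widen. The remaining
// `as isize` is lossless on targets where isize has at least 32 bits.
const CONSTANT_ALT: isize = i32::from_ne_bytes(0x80000000_u32.to_ne_bytes()) as isize;

fn demo_ffi_constant() {
    assert_eq!(CONSTANT, -2_147_483_648);
    assert_eq!(CONSTANT, CONSTANT_ALT);
}
```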

3 Likes

That works with just into(), though.

I agree, but we could have a separate method for converting between signed and unsigned integers of the same size.

2 Likes

mem::safe_transmute::<Y>(x) or something similar :wink: ? I'd love it if truncating as could get a warning in some distant bright future.

4 Likes

This, and even more so the floating <-> integer ones, are hard, and that's the primary thing that stalled the RFC I linked earlier.

The good news here is that we don't have to get the full way before we can start deprecating parts of it.

We could start recommending u8::wrapping_from(x) or u8::from(x) instead of x as u8 as soon as that exists, even if we haven't figured out everything about floating-point and such.

9 Likes

One of the annoying things regarding casts, and in particular making them more verbose, is that they're sometimes forced in scenarios where they could have been avoided if APIs were designed differently.

For example, indexing an array with an i32 requires casting to usize first. If as gets deprecated then I hope Rust considers indexing with other integer types (where any value outside the range of usize results in a panic, since that's effectively an "out-of-bounds" access).

14 Likes

AIUI everyone's generally in favor of it working for at least unsigned types[1]. The problem is type inference; we don't have a way for v[0] to keep working. More difficult still is v[ix.try_into().unwrap()]. This technically falls under allowed breakage, but it's way too disruptive in practice.

We need some way for type inference to still treat this as an inference point for usize and yet accept the other acceptable types. Such a system has yet to be designed, and has interesting implications for the type inference algorithm.

macro_rules! ix {
    ($e:expr) => (usize::try_from($e).expect("index out of range"));
}

works acceptably currently. (Choose between TryFrom and TryInto based on personal preference. Given the issues with Index + [$e.try_into().unwrap()] discussed above, I prefer to avoid try_into in the macro, but it's technically more general as long as the explicit type hint isn't required.)
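
A usage sketch, with the macro repeated so the snippet stands on its own. element_at is a hypothetical helper:

```rust
use std::convert::TryFrom;

macro_rules! ix {
    ($e:expr) => {
        usize::try_from($e).expect("index out of range")
    };
}

/// Indexes a slice with an i64 (hypothetical helper).
/// Panics if `i` is negative, does not fit in usize, or is out of bounds.
fn element_at(v: &[u8], i: i64) -> u8 {
    v[ix!(i)]
}

fn demo_ix() {
    let v = [1u8, 2, 3];
    // Works for any integer type that implements TryInto<usize>.
    assert_eq!(v[ix!(2i32)], 3);
    assert_eq!(v[ix!(0u64)], 1);
    assert_eq!(element_at(&v, 2), 3);
    // Plain literals still infer as usize, as before.
    assert_eq!(v[0], 1);
}
```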


  1. The problem with negative indexing is that v[-1] could be assumed to give the last element. A good error-by-default lint here could help, but cases without this issue have broader support, and a static lint doesn't do much for variable indexing.

11 Likes

As mentioned in previous threads where this has come up, I'd prefer to continue requiring indexing with usize only; I find the strict type enforcement helpful for catching errors.

I'd love to see this fixed by making it possible to use .into() to convert a u32 into a usize. We've been talking about ways to do that for a while, but some more recent suggestions from the library team may make that more feasible.

10 Likes

I vaguely remembered this RFC, but I didn't realize how similar it is to the proposal I had in mind. I would really like to see this RFC accepted!

What about having an explicit conversion to an index type that's not usize? This would be similar to, let's say, index_ext, but it runs into some ambiguity problems we possibly would not have with a std-internal definition (because SliceIndex is sealed):

struct Intex<T>(T);

impl<T> From<T> for Intex<T> { … }

// … etc for all integer types and Range, RangeTo, RangeFrom?
impl SliceIndex for Intex<u32> { … }
impl SliceIndex for Intex<i32> { … }

// Usage, this should be unambiguous.
let fine = [0u8; 2][1u32.into()];

This sidesteps the problem: it neither commits to indexing being implicit nor requires adding the unnaturally panicking or platform-specific conversion.

The SliceIndex impl can also properly return None if a negative index is supplied, behaving as indexing does in a mathematical sense. 'Wrapping' negative indices could be a separate type, or this is open to discussion.
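
Since SliceIndex can't be implemented outside std, here is a runnable approximation of the None-on-negative behavior. get_intex is a hypothetical free function standing in for the SliceIndex impl, and only i32 is covered:

```rust
use std::convert::TryFrom;

/// Wrapper marking an integer as "meant to be an index".
struct Intex<T>(T);

impl<T> From<T> for Intex<T> {
    fn from(value: T) -> Self {
        Intex(value)
    }
}

/// Stand-in for `slice.get(..)` with an `Intex<i32>` index.
fn get_intex<T>(slice: &[T], index: Intex<i32>) -> Option<&T> {
    // Negative values fail the conversion and yield None.
    let i = usize::try_from(index.0).ok()?;
    slice.get(i)
}

fn demo_intex() {
    let data = [0u8, 7];
    assert_eq!(get_intex(&data, 1i32.into()), Some(&7));
    assert_eq!(get_intex(&data, Intex(-1)), None);
    assert_eq!(get_intex(&data, Intex(2)), None);
}
```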

Having a working Index<T> type would be fun in that it would have useful niches, too -- for ZSTs it'd probably be the full usize space (sadly), but otherwise it could be just 0 ..= (((isize::MAX as usize) + 1)/sizeof(T)).

It's probably one of the many things that would work better with better literal support, though.

Hmm, for arrays it might be nice to have the index/fencepost distinction, since [..i] and [i] want different validity ranges.

Worse, there is literally no place in the Rust documentation where the cast behaviour is precisely specified. I guess it is implied that it acts the same as in C, but that's not something that a production-grade language can live with. However, that's a documentation issue.

With regard to the kind of conversion, for integers it is well known that it does zero extension or truncation, so I don't see any big deal. The signed-unsigned casts are much less clear and should be properly documented, but I don't see how your proposed traits would fix that.
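
For the record, this is how the integer cases behave today, as far as I understand the current semantics (extension follows the signedness of the source type):

```rust
fn demo_extension_rules() {
    // unsigned source: zero extension
    assert_eq!(0xFFu8 as u16, 0x00FF);
    // signed source: sign extension, even when the target is unsigned
    assert_eq!(-1i8 as u16, 0xFFFF);
    // so the order of casts matters: reinterpret first, then widen
    assert_eq!(-1i8 as u8 as u16, 0x00FF);
    // same width: the bits are kept and only the interpretation changes
    assert_eq!(0x80u8 as i8, -128);
    // narrowing: the high bits are dropped, whatever the signedness
    assert_eq!(-2i32 as u8, 0xFE);
}
```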

Truncating casts are super common when working with FFI or migrating old C/C++ code into Rust. Turning a short operator into a long function call would be a significant degradation of ergonomics and readability in those cases.

Copious casts are also common in Rust code written by newcomers. While I believe that excessive casts are always a sign of a poorly designed API, a redesign may very well be unobvious, too burdensome or too difficult for a newcomer. I don't think the added benefits in correctness are worth the reduction in UX. Casts are very rare in idiomatic Rust code.

It's also unclear how you would even define truncating/saturating_into in the absence of the as operator. Do you mean that they use it internally, but its use in end-user code is discouraged? That won't be a popular approach.

Overall, this sounds like something which is best solved at the level of per-project lints. Clippy already supports both a blanket ban on the as operator and more specific cast lints (around 20 of them, in fact).
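
As a sketch of what opting in can look like on a single item: the two lint names below are existing Clippy lints (the blanket one is clippy::as_conversions), while checked_len is a made-up function. Plain rustc accepts the attribute and ignores it; only cargo clippy enforces it.

```rust
use std::convert::TryFrom;

// Under these lints, writing `len as u32` in this function would be
// rejected by `cargo clippy`.
#[deny(clippy::cast_possible_truncation, clippy::cast_sign_loss)]
fn checked_len(len: i64) -> Option<u32> {
    // The checked conversion passes the lints and reports failure as None.
    u32::try_from(len).ok()
}

fn demo_checked_len() {
    assert_eq!(checked_len(10), Some(10));
    assert_eq!(checked_len(-1), None);
    assert_eq!(checked_len(1 << 40), None);
}
```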

While I would also like to see as casts deprecated someday, I am not terribly fond of that particular proposal, for reasons I have already elaborated on in the comments under that RFC, namely: lack of precision about the specific modes of conversion and the kinds of lossiness they entail. I think the version you propose here is much better in those respects.

Narrowing casts could simply exist as intrinsic functions instead.

1 Like

It would become less ergonomic, yes, but readability would improve. When you see as, you don't know what it does (does it truncate? saturate? round towards zero? lose numeric precision? Or is it lossless? These are all possibilities). A function call such as .truncating_into() makes it much easier for a reader to understand what is happening.

Rust emphasizes ergonomics, but not at the cost of correctness or maintainability.

Worse, there is literally no place in the Rust documentation where the cast behaviour is precisely specified.

The lack of documentation is something that can be fixed. This argument is neither for nor against this proposal.

9 Likes

There is a description in the as documentation of the semantics of all kinds of casts. If there is something missing or unclear, please feel free to open an issue.

2 Likes

I read your comment in the RFC. It raises some good points, but I don't think that it is enough to dismiss the RFC.

In the RFC, "lossy" essentially means that the value loses numeric precision (significant binary digits) in the conversion, but it is not truncated, wrapped, saturated, etc. I think this is a useful definition.

TruncatingFrom is called WrappingFrom in this RFC. I considered both names, but I'm not entirely happy with either of them. What we want is an operation that can do both, but I don't know a word that describes it exactly: Transmute the n least significant bits into the resulting type and discard the rest (n being the number of binary digits in the resulting type).
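
Spelled out without as, for i32 -> i8, the operation I mean looks like this. low_bits_to_i8 is a hypothetical helper written only to illustrate the description:

```rust
/// Keeps the 8 least significant bits of `x` and reinterprets them as i8.
fn low_bits_to_i8(x: i32) -> i8 {
    let low_byte = x.to_le_bytes()[0]; // least significant byte
    i8::from_le_bytes([low_byte])
}

fn demo_low_bits() {
    assert_eq!(low_bits_to_i8(0x1FF), -1); // low byte 0xFF
    assert_eq!(low_bits_to_i8(0x17F), 127); // low byte 0x7F
    assert_eq!(low_bits_to_i8(-129), 127); // 0xFFFFFF7F -> 0x7F
}
```

For every input this should give the same result as x as i8.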

The RFC also includes a TryFromLossy trait for f64 -> f32 and {float} -> {integer} conversions that round like as, but return an error when as would saturate. It would be nice to have a conversion that returns the rounding error as well, but that could still be added later. It might be best to think of the RFC as an MVP.
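
A rough sketch of that behavior for f64 -> i32. The free function and the OutOfRange error type are hypothetical stand-ins for the RFC's trait, and treating NaN as an error is my assumption:

```rust
/// Error for inputs that `as` would have saturated (hypothetical type).
#[derive(Debug, PartialEq)]
struct OutOfRange;

/// Rounds toward zero like `x as i32`, but reports NaN and out-of-range
/// inputs instead of saturating.
fn f64_to_i32_lossy(x: f64) -> Result<i32, OutOfRange> {
    let truncated = x.trunc(); // NaN stays NaN and fails the range check
    if truncated >= -2_147_483_648.0 && truncated <= 2_147_483_647.0 {
        Ok(truncated as i32) // in range, so this cast is exact
    } else {
        Err(OutOfRange)
    }
}

fn demo_try_from_lossy() {
    assert_eq!(f64_to_i32_lossy(3.99), Ok(3));
    assert_eq!(f64_to_i32_lossy(-3.99), Ok(-3));
    assert_eq!(f64_to_i32_lossy(2_147_483_647.9), Ok(2_147_483_647));
    assert_eq!(f64_to_i32_lossy(2_147_483_648.0), Err(OutOfRange));
    assert_eq!(f64_to_i32_lossy(f64::NAN), Err(OutOfRange));
}
```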