[Pre-RFC] Implicit number type widening

Please, don’t. There are languages where implicit integer widening makes sense, but a performance and control oriented language like Rust is not one of them. It was a mistake for C to have implicit integer conversion; it’s an endless source of bugs.

Explicit conversion is already very easy, thanks to the From and Into traits, e.g. usize::from(0_u8). And now with TryFrom and TryInto stable, even fallible conversion is easy, e.g. u32::from(0_usize).unwrap().

8 Likes

I think you mean u32::try_from(0_usize).unwrap()

The primary downside would be that attempting to add a u8 and a u32 would effectively convert to u32 rather than giving a type error. (That’s also the primary upside.) There are absolutely cases where you want that error; the question is whether those are worth forcing the conversion in every case.

Random example: suppose you’re building a virtual machine, and you’re emulating a 64-bit environment, but you’re running on a 32-bit machine. (Or vice versa.) In such cases, you absolutely want explicit errors any time you mix usize, u32, and u64, to maintain the distinction between types of pointers.

1 Like

Note that in any circumstance where the context determines the type, you don’t need something like usize::from(some_u8); you can just write some_u8.into().
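For instance (take_len is just a hypothetical function, to illustrate that the context fixes the target type):

fn take_len(n: usize) { let _ = n; }

fn main() {
    let small: u8 = 5;
    take_len(small.into());        // the parameter type usize fixes the target, so `.into()` resolves
    let _wide: u64 = small.into(); // likewise when the destination slot is annotated
}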

I think conversions to and from usize were ruled out earlier for this exact reason.

1 Like

Right. Which means we need to carefully evaluate whether conversions between (say) u32 and u64 would produce the same problems.

They wouldn’t, because the sizes of u32 and u64 are not platform dependent. Conversions between them will therefore behave the same on every platform (aside from performance differences).

That’s not the concern; the concern is whether you’d want to keep those types carefully separate, or whether it’d suffice to require wrapper types to prevent such mixing.

4 Likes

For what it’s worth, I was only talking about adding additional PartialEq / Eq / PartialOrd / Ord implementations. It’s a lot more straightforward (and less controversial) than mixed integer arithmetic.
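To make that concrete, such an impl would look roughly like this. It could only live in core/std (coherence rules forbid it in user code), so this is purely an illustration of the idea, not something you can write today:

impl PartialEq<u8> for u16 {
    fn eq(&self, other: &u8) -> bool {
        *self == u16::from(*other)
    }
}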

7 Likes

Since discussion has slowed somewhat (thanks for all the feedback!), here’s my current thinking based on comments so far.

Interaction with type inference

First off, there have been a lot of concerns about type inference being affected. In my mind, coercion is by necessity decided after inference: since a coercion has to generate run-time code, it needs to know the concrete source and destination types for a particular operation. Someone please tell me if this understanding doesn’t match the way it’s implemented in rustc. As such, in my mind the proposed number type coercions by definition do not change the behavior of the type inferencer.

Allow-by-default lint

Second, I think it makes sense to have an allow-by-default lint that warns about (or, when set to deny, forbids) these coercions in a particular crate, module or function. This would help in hot loops, and for those people who insist on Rust being a super-explicit language.
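Roughly like this, with a made-up lint name (the lint does not exist; this is only a sketch of the opt-out):

#![deny(implicit_numeric_widening)] // hypothetical lint name, allow-by-default otherwise

#[allow(implicit_numeric_widening)] // or re-enabled per function / module
fn scratch_code() { /* widening coercions accepted here */ }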

Scenarios

There seem to be three different pain points that could be improved. These might have different kinds of implementation methods – more trait implementations vs inserting widening conversions – but I think it still makes sense to discuss them in one RFC to make sure this is a coherent proposal.

Store

Store a value of some integer type into a slot (like a struct field, a local variable with a defined type, a return value or a formal function parameter). In this case I think it would make sense to apply coercion.
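A sketch of what this scenario would accept (every "widened" line is a type error today; Packet and set_len are made-up names):

struct Packet { len: u32 }

fn set_len(p: &mut Packet, n: u16) {
    p.len = n;           // struct field: u16 widened to u32
    let _local: u64 = n; // annotated local: u16 widened to u64
}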

Comparison

When numbers of different types are compared, coerce the smaller type to the larger to enable the comparison.
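For example, this would compile under the comparison scenario, with the u8 operand widened to u32 first (today it is a mismatched-types error):

fn in_range(value: u8, limit: u32) -> bool {
    value < limit // u8 widened to u32 for the comparison
}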

Indexing

Allow indexing with different unsigned number types. The trick here is to make sure we don’t unduly affect portability, since the “native” index usize type could be any of u16, u32 or u64 in practice. The question is how much that matters since the collection length is the more important constraint here, and will often be more limiting than the platform’s size type length.
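A sketch of the intended ergonomics (this does not compile today; lookup is a made-up name):

fn lookup(table: &[u8], i: u16) -> u8 {
    table[i] // today this needs `table[usize::from(i)]` or `table[i as usize]`
}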

Coercions

As for the allowed coercions, I’m now thinking of the following:

  • Signed fixed length integers to larger signed fixed length integers
  • Unsigned fixed length integers to larger unsigned fixed length integers
  • Unsigned fixed length integers to large enough signed fixed length integers
  • Fixed length floating point types to larger floating point types
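To make the boundaries concrete, here is how those rules would classify some assignments. None of the "allowed" lines compiles today, of course; that is exactly what the proposal would change:

let _a: i16 = 1_i8;   // signed -> larger signed: allowed
let _b: u64 = 1_u32;  // unsigned -> larger unsigned: allowed
let _c: i32 = 1_u16;  // unsigned -> large enough signed: allowed
let _d: f64 = 1_f32;  // f32 -> f64: allowed
// let _e: i32 = 1_u32; // u32 does not fit in i32: still an error
// let _f: u32 = 1_i32; // signed to unsigned: still an error
// let _g: u16 = 1_u32; // narrowing: still an error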

Conclusions

We could tweak the proposal along the scenarios axis and along the coercions axis, of course. Personally I’m least sure about the indexing part.

What do people think about this refined version?

2 Likes

But what if it is just a mistake?

struct Foo { x: u16 }
foo.x = expr_u8; // expr_u8 is some expression of type u8

If the compiler reports an error in this case, the developer can see that the slot is 16 bits wide and that the expression could have been calculated with 16-bit arithmetic. Silently allowing this would be bad.

This is close to C++ integer promotion, which is really bad. Consider:

let a: i8 = -1;
let b: i16 = -1;
a == b

In fact a and b have different bit representations: -1 as i8 and -1 as i16 are stored with different bit patterns on most popular CPUs.

And again, what if it is a program bug, and the variable a should really have been i16?

Again, what if this is a bug, and the user does not actually want to index with a u8 (release mode has no overflow checking by default)? Then

let mut i = i_u8; // i is a u8
while i < 256_u16 {
  a[i];
  i += 1; // wraps from 255 back to 0 in release mode
}

would be an infinite loop (given automatic integer conversion for both indexing and comparison).

3 Likes

As just another example where I was really happy that we don’t do implicit widening on function calls: when calling write_bytes, one can easily mix up the “byte” and “count” argument. By writing the “byte” as 0u8, I can be sure I did not make that mistake.
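For concreteness, the call in question looks like this (std::ptr::write_bytes takes the fill byte as u8 and the count as usize):

use std::ptr;

fn main() {
    let mut buf = [0u32; 16];
    unsafe {
        // `0u8` pins the fill byte's type: passing it in the `count` slot by mistake
        // is a type error today, but a u8 -> usize widening coercion would let it through.
        ptr::write_bytes(buf.as_mut_ptr(), 0u8, buf.len());
    }
}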

16 Likes

Unfortunately, the examples that have come up in the thread are making me less confident than I originally was that I'd like widening coercions in Rust.

I think that, overall, the scenarios that have come up make me want to see how well they can be done with more trait implementations before contemplating adding coercions.

Note that the cases where that works aren't the ones that I find interesting, because from works decently. There isn't a "larger" type when comparing i128 to u128, though, so I strongly think additional trait impls are a better way to make comparisons work than coercions would be.

I think this is again better done with additional impls than with coercions, especially looking at the portability point. a[i] always has the possibility of panicking, so if additional impls mean that that ends up really just being a[usize::try_from(i).unwrap()], I'd have no problem with that. (Continuing the "if the right way is longer than as, just do it automatically" idea.)
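As a sketch of the "more impls" direction, here is a wrapper-based version one can already write today (Table is a made-up name; the actual proposal would be impls on slices in the standard library):

use std::convert::TryFrom;
use std::ops::Index;

struct Table<'a, T>(&'a [T]);

impl<'a, T> Index<u32> for Table<'a, T> {
    type Output = T;
    fn index(&self, i: u32) -> &T {
        // effectively `a[usize::try_from(i).unwrap()]`, written once in the impl
        &self.0[usize::try_from(i).expect("index out of range")]
    }
}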


Can you show an example where it's a bug and it's only used in the comparison? (The bit representation is pretty irrelevant for such a comparison, because LLVM will probably have sign-extended them both into i32 registers anyway.)

Two things on this:

  1. I feel like they usually will fix such an error today by a[i.into()] or a[i as usize] --- not by changing the type of a variable that they clearly wanted to be something badly enough to annotate it --- and thus they have the same problem anyway

  2. I would expect literal comparisons like that to get the same "warning: comparison is useless due to type limits" that you get today with things like i <= 255, so having the mixed mode could actually be better for finding errors, since today the way to "just work stupid compiler" is (i as u16) < 256_u16, which doesn't give warnings.

2 Likes

Moreover, allowing the lossless and cheap cases to be implicit would serve to better highlight the opposite cases (where there is a danger of data loss).

This isn't a problem: just offer two different ways to cast the values, a short one for lossless conversions and a longer one for lossy casts. Possibly also a shorter syntax for try_into.

Of course this makes things more implicit, and we’ve seen in the past that a part of the community really appreciates how many things in Rust take explicit syntax.

That sounds quite subjective. The Drawbacks section needs to be expanded a lot.

Allow all integer types to be used for indexing, without modifying the language coercion rules anywhere else.

I'd like array/slice indexing to be allowed for smaller types too, but there are many details to think about.

I’d like to echo @scottmcm’s comment here; multiple responses in this thread seem to like the idea of adding more impls rather than adding widening. How about we take a careful look at adding indexing and comparison impls for certain types, and see how far that gets us?

4 Likes

Comparing or assigning two variables of different types can always be a bug. At first you have a = b; or a == b, and "a" and "b" have the same type. Then you do some refactoring and end up with different types, when in fact you still wanted the same type but forgot to change how "b" is calculated.

Right now Rust works exactly as the technical specifications for C in safety-critical systems demand (they require minimizing integer promotion, comparison and assignment between different types, and so on).

But if you allow widening even for an obvious case:

let mut a: u16 = x;
let mut b: u8 = y;
a = b;

that may prevent the compiler from catching bugs that the developer missed after refactoring.

But today you can find such things by searching for uses of "as" in some cases and "unsafe" in others, while with automatic widening there would not even be a clue in the code, when reading it, that something might be wrong.

1 Like

The objections I’ve read in this thread are to the implicit aspects of the proposal, not to the general idea that there should be less painful ways to widen for comparisons and for indexing. I’ve written before, even in this thread, about my use of an ix!() macro to convert arbitrary uN to usize for indexing. Why not just add a blanket impl for widen() that will support type inference in comparisons while letting such widening be explicit?
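A minimal sketch of such a blanket impl, using the widen() name from above and riding on From (so it actually covers every lossless conversion, not only widening); whether the comparison case infers the target type well enough in practice would need checking:

trait Widen<T> {
    fn widen(self) -> T;
}

impl<T, U: From<T>> Widen<U> for T {
    fn widen(self) -> U {
        U::from(self)
    }
}

// where context pins the target type:
// let wide: u32 = small_u8.widen();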

For indexing, where the target type is usize, I suggest a blanket impl for ix(). Just reusing widen() would be incorrect when applied to uN types larger than usize, even if the actual value range of that type in the application was constrained to be less than usize::MAX. When applied to iN types an assert that the index is a non-negative value would also need to be included.
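And a corresponding sketch for ix(), panicking when the value does not fit in usize (which also covers the non-negative assert for iN types):

use std::convert::TryFrom;

trait Ix {
    fn ix(self) -> usize;
}

impl<T> Ix for T
where
    usize: TryFrom<T>,
    <usize as TryFrom<T>>::Error: std::fmt::Debug,
{
    fn ix(self) -> usize {
        usize::try_from(self).expect("index out of range for usize")
    }
}

// usage: a[i_u64.ix()]; for a negative signed index this panics instead of wrapping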

For symmetry, to address the need to have explicit documentation when wrapping uN types to a smaller uM (where M < N), a similar impl for wrap() could support type inference without requiring that the usually-inferable-from-context output type be explicitly specified. (NB: I sometimes encounter a need for explicit narrowing when writing crypto code.)

If such impls existed, most uses of as for implicit widening and narrowing conversions probably could be deprecated.

This comment specifically does not address whether similar impls should exist for narrowing iN types to a smaller iM, or for reducing the exponent range and precision from a fN type to a smaller fM type, where M < N. (Any consideration of this last issue should be forward-looking to f128 and f16 Edit: and should permit specification of rounding mode.)

Edit 2: Corrected float types from fpN to fN.

Because this is great as it is. You have to mark conversions via macros or with the "as" keyword, so it is easy to see where a conversion happens. Any kind of conversion is a possible bug. With some implicit rules it would be hard to see the places that contain probable errors.

I think this position is overstated.

  1. Any line of anything could be a "possible" bug, including existing invisible things like auto-ref and auto-deref, so that part isn't persuasive. More interesting is whether something is more likely to be an issue than an alternative: if there's no easy way to fix a compiler error it might be more error-prone than just having the original code work, even if there are potential gotchas with the original too.

  2. The second sentence implicitly asserts that all coercions or mixed-type operators are "probable" errors. This seems trivially false to me: shift operators take anything on the RHS today without usually being errors, and it seems obvious that for a: [T; 256] there'd be nothing wrong with indexing by a u8.

So please focus on showing how particular operators or traits or sites would be a problem. Felix had a great example earlier showing how a fine-on-its-own rule caused wrapping issues elsewhere, which I think convinced the thread that coercion there is undesirable. If you have such a thing for other specifics, I'd love to hear them. (This is why I asked specifically about comparison, not about assignment.)

6 Likes

An example of a bug in C code due to implicit cast during struct storage: Cast key length to correct type by jongiddy · Pull Request #161 · vozlt/nginx-module-vts · GitHub

The implicit cast (from 8 to 16 bits) happens after an explicit cast (from pointer-size to 8 bits), and while they are nearby in this case, they could have been further apart.

The equivalent error in Rust would be:

struct ShortThing {
    length: u16,
}

fn main() {
    let vec = vec![0u8; 300];
    // need a cast for usize -> u16, but mistakenly use u8
    let _s = ShortThing { length: vec.len() as u8 };
}

As I understand it, this would be a compile error now, but would compile under this proposal.

1 Like