[Pre-RFC] Implicit number type widening

I was thinking about this when Proposal: integer conversion methods came up, and then this week I was watching @nikomatsakis' PLISS talk (specifically, the part about explicit stuff potentially going implicit over time) and decided to write it up as a pre-RFC. Happy to get feedback about what I might be missing here (there didn’t seem to be any negative feedback when I mentioned it in the previous thread) and what else would need to be in an RFC for this.

  • Feature Name: implicit-number-type-widening
  • Start Date: 2019-06-19


In cases where the conversion can be done losslessly and cheaply, convert between integer and floating point types without explicit syntax (like as, .into(), or u64::from(v)). This would include:

  • i8 -> i32 (all signed integer types into larger signed integer types)
  • u8 -> u64 (all unsigned integer types into larger unsigned integer types)
  • f32 -> f64 (all floating point types into larger floating point types)
  • u8 -> i16 (all unsigned integer types to signed integer types that can fit all their values)
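All four of these conversions already exist today as lossless From impls; for illustration, here is the explicit spelling that the proposal would make unnecessary:

```rust
// Each of the proposed widenings, written out with today's explicit syntax.
fn main() {
    let a: i32 = i32::from(-5i8);   // i8 -> i32, wider signed type
    let b: u64 = u64::from(200u8);  // u8 -> u64, wider unsigned type
    let c: f64 = f64::from(1.5f32); // f32 -> f64, wider float type
    let d: i16 = i16::from(255u8);  // u8 -> i16, signed type that fits all u8 values
    assert_eq!((a, b, d), (-5, 200, 255));
    assert_eq!(c, 1.5);
}
```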


Currently, all of these type conversions must appear explicitly in syntax. However, this feels overly cautious for conversions that can be done losslessly and cheaply. Moreover, allowing the lossless and cheap cases to be implicit would serve to better highlight the opposite cases (where there is a danger of data loss).

Guide-level explanation

In addition to the implicit coercions that we have today (like dereferencing or referencing), the compiler would also perform the conversions listed above implicitly.

@Tom-Phinney in the previous thread also specifically called out the pervasive use of usize for indexing, so that it would be nice if smaller types could trivially be used as indexes.
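For example, indexing with a u16 currently requires a manual widening (the array contents here are just for illustration):

```rust
fn main() {
    let data = [10, 20, 30, 40];
    let i: u16 = 2;
    // Today the index must be widened by hand; under the proposal,
    // `data[i]` could work directly.
    let x = data[usize::from(i)];
    assert_eq!(x, 30);
}
```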

This should be fully backwards compatible, in that it strictly extends the set of code the compiler would accept. We might add lints in clippy to nudge users toward implicit conversions.

The new behavior additionally seems more similar to C++ behavior, so it might lower the barrier to entry from that direction.

Reference-level explanation

Seems reasonably clear how this should work, but please tell me what I might be missing.


Drawbacks

Of course this makes things more implicit, and we’ve seen in the past that a part of the community really appreciates how many things in Rust take explicit syntax.

Rationale and alternatives

  • Do nothing: to some extent this is more like a paper cut than anything else.

  • Some more constrained set of coercions: maybe widening within the signed types and within the unsigned types is less controversial than widening unsigned to signed and/or extending f32 to f64.

Prior art

C++ has similar implicit type conversions; I found this article that explains it a bit. It looks like Java (and Kotlin) don’t have this.

I’d be happy to hear about what Haskell and OCaml do. My 10,000-foot view is that they’re probably not as concerned with having separate types for all the integer widths, so this isn’t as relevant there?

Dynamic languages obviously also do stuff like this, but I’m not sure it is as relevant; on the other hand, for the part of our community that comes from dynamic languages (which includes me, having done a lot of Python), having to explicitly write out lossless & cheap conversions is maybe more annoying.

Unresolved questions

  • What parts of the design do you expect to resolve through the RFC process before this gets merged?
  • What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
  • What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

Future possibilities




I can absolutely see how this would make some code easier to write.

Speaking from personal experience, however, I’ve also had serious issues in large C codebases where I wished the compiler wouldn’t allow implicit conversions between integer types. In particular, when trying to change a type to a different integer type, you may want compiler-error-driven review of every call site / field access.

Given that we have safe coercions for all the safe cases, you can safely write .into() rather than needing as, often without specifying a type name.
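For instance, when the target type is pinned down by a function signature, .into() needs no type annotation at all:

```rust
// A function whose parameter type fixes the target of the conversion.
fn takes_u64(v: u64) -> u64 {
    v
}

fn main() {
    let v: u8 = 42;
    // The target type u64 is inferred from the signature, so the
    // lossless conversion needs no type name at the call site.
    assert_eq!(takes_u64(v.into()), 42);
}
```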

Thus, my initial reaction is that I’d prefer not to have that. However, I wonder if there might be a better solution that doesn’t break people who want the more strict behavior, and I don’t want to assume that no such solution is possible.


Note that while u16 is Into<usize>, u32 isn't, because Rust doesn't want to rule out platforms where usize == u16. Most general-purpose "small index" types use u32, because 64KiB is a bit too restrictive, where 4GiB is at least a reasonable size limit for small data applications. (Assuming bytewise indexing.)
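To illustrate that asymmetry (the values here are arbitrary):

```rust
use std::convert::TryFrom; // needed on editions before 2021

fn main() {
    let small: u16 = 7;
    let _idx: usize = usize::from(small); // ok: usize is at least 16 bits
    // usize::from(7u32) does not exist, because a 16-bit platform could
    // not hold every u32 value; the conversion must be fallible instead:
    let idx = usize::try_from(7u32).unwrap();
    assert_eq!(idx, 7);
}
```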

I could see supporting widening within i__ and within u__, but for conversions to f__ and between i__ and u__ the difference is greater, and requiring opt-in conversion, even when lossless, actually helps catch logic errors.

What about an allow-by-default lint able to catch implicit number type conversions? This way when refactoring one can just warn/deny the lint at the root of the module being refactored.


I think scoping this will be the difficult part.

I would be very happy for something like Foo { some_u32_field: some_u8 } to work, for example. But I get worried about things like x * (y + z) if that ends up being x * (((y as u16) + (z as u16)) as u32), with multiple conversions inside and implications for wrapping.
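For reference, here is the explicit version of that struct-field case today (the names Foo and some_u32_field come from the expression above):

```rust
struct Foo {
    some_u32_field: u32,
}

fn main() {
    let some_u8: u8 = 7;
    // Today the field assignment needs the explicit conversion; the
    // proposal would allow `Foo { some_u32_field: some_u8 }` directly.
    let f = Foo { some_u32_field: u32::from(some_u8) };
    assert_eq!(f.some_u32_field, 7);
}
```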

There might also be weird things about this with literals -- if there's a coercion site, do more literals end up falling back to i32, and thus other things end up being i32 that weren't before? For example,

let mut x = 4; // Today in Rust this is i8 because of the line below
x = 3_i8; // but if this is a coercion site, x could be i32
          // because the line above defaults to i32
          // and then this implicitly widens to i32
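Today the snippet above does infer i8 for x, which can be checked directly:

```rust
fn main() {
    let mut x = 4; // unified with the assignment below, so x: i8 today
    x = 3_i8;
    // i8 is one byte, confirming the literal did not fall back to i32.
    assert_eq!(std::mem::size_of_val(&x), 1);
}
```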

So I'd like to see more details about exactly when this would apply, and particularly how it'd interact with trait resolution.

Note that anything with traits could instead be done by implementing those traits for the other types. The current blocker for that is being able to "default" the inference on them, but I think that's wanted to support other versions of this, like comparisons between i32 and u32 that actually do the correct thing.


Thanks for the great feedback so far!

Can you explain what you would be reviewing for, exactly? My intuition would be that limiting this to lossless & cheap conversions makes the review superfluous, but my experience with C/C++ isn't that deep so I'd like to better understand what kind of pitfalls you might be looking out for.

Okay, so maybe usize and isize should remain out of scope because we wouldn't know across platforms if they're actually large enough to accommodate the value? (Perhaps we could technically still allow u8 -> usize but I feel like that would make the coercion too complex to reason about?)

Can you give some examples of the kind of logic errors that you're worried about here?

I can definitely see that an allow-by-default lint would make sense for this, so you could set it to warn if you're going through a tricky refactoring session, for example.

The way I would describe it in high-level terms is that I would see this happening only "at the top level", when assigning an rvalue into an lvalue of a particular type (I'm mostly thinking of struct fields and return values here). I think that would mitigate your concerns about inferencing and the intermediate value wrapping issues?

Is there an issue where this is being tracked? (I did some searching but couldn't quickly find anything.)

I am a little bit worried about the impact of this on performance ergonomics (making it easy to write the efficient code and hard to write the inefficient code).

Widening generally comes with a negative impact on performance in CPU-intensive computations, which is generally ~2x if the wider type has native hardware support (you double the memory traffic and halve the vectorization potential) but can be more if support is emulated in software/microcode (like 128-bit integer support on most platforms, 64-bit integers on old/embedded chips, and most nontrivial floating-point functions).

For this reason, as a performance engineer, I find it great that Rust makes it easy for me to review use of widening in performance-critical loops, as opposed to, say, C/C++. Losing this property would be sad, though I can see how it could also improve ergonomics in areas other than performance.

While being able to lint use of this feature on a per-module basis would help, it would still add one more thing to think about to the mental footprint of writing performance-critical Rust code, and wouldn’t help when calling functions from external libraries. And if usage of the feature becomes pervasive, globally enabling the lint wouldn’t be a realistic suggestion.

I would also like someone more knowledgeable about the matter to comment on the impact that this would have on type inference. It intuitively seems that it could make it harder to resolve in some cases, i.e. more sad “type annotation required here” error messages.


I think if you want to look for logic errors, you want to pay attention to where the conversion happens and why, and recognize that there are multiple places/justifications for the various decisions, which results in people writing code expecting one thing, while it actually compiles into something else.

let x = u8::MAX;
data[x + x]; // Is this an overflow?
let y = x + x; // Is this an overflow?
data[y]; // Would the presence of this line impact the above?
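To make the overflow concrete (same x as above):

```rust
fn main() {
    let x = u8::MAX;
    // `x + x` overflows u8: it panics in debug builds and wraps to 254
    // in release builds. checked_add makes the overflow explicit:
    assert_eq!(x.checked_add(x), None);
    // Widening *before* adding gives the mathematically expected value:
    assert_eq!(u16::from(x) + u16::from(x), 510);
}
```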

Let’s add another alternative: Allow all integer types to be used for indexing, without modifying the language coercion rules anywhere else.

  • Slice indexing is runtime bounds checked anyway, so it is already on the developer’s head to ensure that the slice’s size remains small enough to be addressed using whichever integer type they’ve chosen.
  • It’s pretty common for slice sizes to be limited by some other factor, such as a hard-coded cap, or the amount of data the human can enter before the universe’s heat death.
  • Unlike range iteration, string conversion, string parsing, and arithmetic, slice indexing seems to be the biggest spot where Rust won’t already allow you to use any integer type you want.
  • Slice indexing requires usize, but integer type inference defaults to i32. Does that seem kind of hostile, or is it just me?!
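As a rough sketch, something like this alternative can already be approximated in library code with an extension trait (the trait and method names At/at are invented here):

```rust
// Hypothetical extension trait allowing a smaller unsigned index type.
trait At<I> {
    type Out;
    fn at(&self, i: I) -> &Self::Out;
}

impl<T> At<u32> for [T] {
    type Out = T;
    fn at(&self, i: u32) -> &T {
        // u32 -> usize is lossless on 32- and 64-bit targets; the usual
        // slice bounds check still applies.
        &self[i as usize]
    }
}

fn main() {
    let v: &[i32] = &[10, 20, 30];
    assert_eq!(*v.at(1u32), 20);
}
```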

Tentative :+1: to this alternative, at least for unsigned types smaller than u64.

I’ve definitely run into codebases where the required conversion to usize for indexing has forced careful thought about types elsewhere. For instance, u64 isn’t necessarily safe to allow (and that could break 32-bit platforms), and any signed type seems deeply problematic.

But allowing u8, u16, and u32 seems potentially reasonable.


As far as I can tell, u64 is the least bad type to allow other than usize itself for slice indexing, as long as the compiler-generated range check is correct. A u64 that's bigger than the address space should report an out-of-bounds panic. It seems like u8 would be worse, since, if the slice is bigger than 256 items, you will probably get an overflow while computing the index, and then you'll wind up getting the wrong item, instead of panicking.
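A sketch of that behavior as a helper function (the name get_u64 is invented here): an out-of-range u64 is simply out of bounds, never truncated.

```rust
use std::convert::TryFrom; // needed on editions before 2021

// Index a slice with a u64, treating "doesn't fit in usize" the same
// as any other out-of-bounds index.
fn get_u64<T>(s: &[T], i: u64) -> Option<&T> {
    usize::try_from(i).ok().and_then(|i| s.get(i))
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(get_u64(&v, 1), Some(&2));
    assert_eq!(get_u64(&v, u64::MAX), None); // no wraparound, just None
}
```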


The case I’m thinking of there is “compiles and works on my machine, fails at runtime when someone uses a larger u64 value”.

Forcing people to think about the difference between u64 and usize at compile time helps catch issues earlier. Allowing that code to compile pushes it to a runtime error that might not always happen.

That said, you make a good point regarding wrapping overflow in smaller types…


That might still be a correctness hazard. Right now, with struct Foo(i32); and x, y, z: u8, code like Foo(x * (y + z)) is a type error, which can prompt the programmer to explicitly convert x, y and z to i32 so that the calculation is performed over i32 and returns the expected result. With implicit widening of the 'top-level' x * (y + z) expression it will compile instead, perform the computation over u8 and potentially trigger incorrect wrapping or panicking.
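Concretely, with the explicit conversions the computation is performed over i32 (the values are chosen so that u8 arithmetic would wrap):

```rust
struct Foo(i32);

fn main() {
    let (x, y, z): (u8, u8, u8) = (100, 200, 100);
    // Foo(x * (y + z)) is a type error today (u8 vs i32), which prompts
    // explicit widening so the whole calculation happens over i32.
    // Over u8, y + z = 300 would already overflow.
    let f = Foo(i32::from(x) * (i32::from(y) + i32::from(z)));
    assert_eq!(f.0, 30_000);
}
```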


For what it's worth I have also had the exact opposite problem, where Rust's current semantics made it really easy to write terrible code. I spent a day moving casts around trying to convince the compiler to do fewer widenings and narrowings in my inner loop, when the equivalent C was just fine from the start.

Though in fairness this may be more related to overflow semantics than integer conversions.

So perhaps what we want is less coercion and more "I don't care how big this thing is, let the optimizer pick."


I found a very old thread that seems relevant:

I also posted about potentially making foo[x.try_into()] just work in Idea: SliceIndex by Options and/or Results

Conceptually that seems plausible (though felix has a good concern about it), but I don't know how it would feel in practice. Would people be surprised that it doesn't work in other places, like function calls?

Maybe one thing that would help here would be if someone did a survey (in the looking and cataloguing sense, not in the ask people questions sense) of the places that .into() gets used between integers, with particular attention on figuring out where it's a complex case and where it's a trivial one, to see if there's a nice syntactic rule that could be used.

(It reminds me of lifetime elision, which came from "gee, it seems like this particular pattern is something like 80% of the uses of this, so maybe it can just be automatic in those". That "it doesn't need to get everything" criteria seems like the right thing here too, since if a solution here still means adding u64::from sometimes I'd be totally good with that if it makes the rules simpler and clearer.)

I don't know if there's an issue for it; it's come up in a few RFCs recently:

I'm not proposing this, but C99 does have types like uint_least32_t for such reasons.

Java definitely does have implicit widening conversions. Additionally it has implicit conversions from integer types to floating point types, even when they're lossy.

True, but note that this should only happen if the slice's count was truncated somewhere beforehand; otherwise, why would the program think the "larger u64 value" was within range of the slice in the first place?

I can think of two main places where this truncation might happen:

  1. In comparisons: if you have foo: u64, you might inadvertently write (foo as usize) < bar.len() instead of foo < (bar.len() as u64).

    To address this, I think the language should support cross-integer-type comparisons in general. After all, if x and y are integers, x < y always has a single mathematically correct answer, even if x and y have different types. In most cases, that answer can be computed by converting both operands to some integer type that is a superset of both, then comparing at that type; e.g. if (x, y): (u32, u64), you convert x to u64. In other cases, like (x, y): (i64, u64) the answer cannot be computed with single machine comparison, but requires two: x < 0 || (x as u64) < y. But a pair of comparisons is still cheap enough that it's reasonable for the compiler to implicitly produce one when translating x < y. (There is precedent in the form of u128/i128; every operation on those expands to a series of operations on smaller-width types.)

  2. When allocating: if you have foo: u64, you might write vec![0; foo as usize] without checking for overflow first.

    To address this, I think APIs like vec![elem; n], Vec::with_capacity, etc. should also allow arbitrary (at least unsigned) integer types, and treat overflow the same way as allocation failure.

    That way, you don't have to distinguish "can't have an array of 2^32 elements because this system is 32-bit" from "can't have an array of 2^32 elements because this system doesn't have enough RAM".
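Rough sketches of both helpers, under invented names (lt_i64_u64 and with_capacity_u64 are mine):

```rust
use std::convert::TryFrom; // needed on editions before 2021

// Point 1: a mixed-sign comparison needs two machine comparisons, not one.
fn lt_i64_u64(x: i64, y: u64) -> bool {
    x < 0 || (x as u64) < y
}

// Point 2: treat "doesn't fit in usize" the same as allocation failure.
fn with_capacity_u64<T>(n: u64) -> Option<Vec<T>> {
    usize::try_from(n).ok().map(Vec::with_capacity)
}

fn main() {
    assert!(lt_i64_u64(-1, 0));
    assert!(lt_i64_u64(5, 6));
    assert!(!lt_i64_u64(5, 5));
    assert!(with_capacity_u64::<u8>(8).is_some());
}
```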

A drawback to all of these proposals – including the base one to allow indexing with different types, or even the original proposal to allow implicit widening – is that they interfere with type inference, which currently benefits from being able to assume that indices (or arguments to with_capacity etc.) are always usize, that operands to a comparison have the same type as each other, and so on. In theory it seems like it should be possible to make the compiler treat these things as defaults rather than requirements. But I've gotten the impression that making defaults work well with type inference is difficult in general...


If you have a type mismatch, that generally means you need to change the types elsewhere, not cast the values.

Of course, there are exceptions, like the “compressed index” pattern:

struct Index(u32);

impl Index {
    fn new(i: usize) -> Index {
        Index(i.try_into().expect("index is too large"))
    }
    fn index(&self) -> usize {
        self.0 as usize
    }
}

let x = v[i.index()];

where Index is logically usize, but needs to be compressed into a smaller type for those pesky performance reasons.
By abstracting that u32 into a type like Index you make it very obvious why the types are different and what you want to achieve.

The fact that Rust makes it easy to avoid that C-style numeric type soup is one of its greatest achievements, at least for lower-level programming.


I think such comparisons derail the discussion, and are unhelpful.

The proposal is not about copying C's behavior. Limiting conversions to only widening, only lossless conversions is substantially different from C's behavior.

Widening has been proposed several times, and every time it has proven impossible to have a reasoned discussion about the proposal itself, rather than about C.