Implicit widening, polymorphic indexing, and similar ideas

I’m curious if we’ve already considered/rejected the idea of parameterizing containers also by the index type. It could be something similar to Haskell, which uses Ix to get a numeric value out of them. Or, it could just be restricted to numeric types.

Some languages (such as Ada) allow arrays over enums. It’s handy, but not devastating to not have.

I believe this is the idea behind the "polymorphic indexing" suggestion.

I guess “polymorphic indexing” could mean two different things. One would be a trait that can convert another index type into the usize or specific type used in the container. The other would be to make the index type one of the type parameters of the container type itself (similar to the keys on the map containers). The difference is subtle, however.

I’m not sure if it matters as much, since Rust doesn’t have ranged types. For example in Ada, it’s possible to have an integer type guaranteed to be in a specific range (e.g. 1 to 10). An array build with that type as the index of the array could forego range checking, since values of that type will be range checked earlier. This is probably less important with Rust having better iterators.

@darinmorrison, @d3zd3z, I believe “polymorphic indexing” refer to the former, i.e. multi-dispatching using the Index* traits.

Parameterizing the containers can be a useful extension. Some comments of RFC PR 544 discusses why uint(usize after renaming) is the best default indexing type and the problems with parameterizing the containers. (That comment thread is long. For a reference point, @AaronFriel on github ignited the related discussions in that thread.)

Here is a graph of the coercions for easier digestion by the way:

(It was too close to DOT syntax for me to resist!)

3 Likes

A random idea: Would you use a syntax that only does manual widening?

Unlike as, it would only compile when it widens or no-ops. (Handling of isize and usize thus depends on the target.)

2 Likes

I do think that having different syntaxes for widening and narrowing casts, whatever they are, should be an uncontroversially good idea. It doesn’t need to be as ambitious as having the widening case involve no syntax at all, i.e. implicit widening (can be, possibly even should be, but doesn’t need to be).

3 Likes

I’m not sure what kind of argument this represents, but I was pointed at a significant gotcha in a simple program using integer type inference, and I figured it belonged in this thread as an example of a behavior to avoid.

There’s a small Twitter thread: https://twitter.com/jamiiecb/status/553698334754742272

I understand that on the most recent version of Rust it no longer builds, and the poster does not clarify what version they’re running, but it was posted yesterday.

EDIT: There’s a closed issue about it here, apparently: https://github.com/rust-lang/rust/issues/14020 . This seems like a great example of exactly the kind of silently-erroneous behavior that the handling of integer types should be designed to prevent.

See also https://github.com/rust-lang/rust/issues/5477 .

What’s the use of allowing polymorphic indexing with types that don’t widen? If I’m indexing with a type that might be out of maximum bounds, I want to have to do an explicit cast.

@gwillen As far as I can tell, If we do this, then that will be caught at compile time, and with https://github.com/rust-lang/rfcs/pull/560, it would be caught at least at runtime, in debug builds.

@fread2281 I don’t understand your first sentence (question). With respect to the second, as arrays (Vecs, etc.) have no statically known minimum length, an index held in any type might be out of bounds.

If we give arrays a length type parameter (that defaults to usize), and arrays can only be indexed by types that are smaller or equal to the length type parameter, I think that would catch more bugs. My question is: if i’m indexing an array with an u64, shouldn’t I have to make an explicit cast? I know that out-of-bounds errors are still possible, but if I’m indexing with an u64 in a usize sized array, likely I should be using a different type somewhere. Even if arrays don’t have a length parameter, I would like for it to be visible in my code that I’m doing so. OTOH, if I only plan for 32 and 64bit support, indexing a usize sized array with u32 is perfectly sensible. (and if I only plan for 64bit support, indexing with a u64 is).

No, it is not an argument for using i64 everywhere. I’ve worked with many codebases that use a lot of lookup tables. The lookup tables are large, so it is important to use an element type that does not waste bits. Hence, a lot of lookup tables are u8 or u16. These lookup tables can provide indices into other lookup tables. So consider an expression like a = foo[bar[i]]. In Rust, currently I have to use a = foo[bar[i] as uint. The as uint is required in order to convert from u8 or u16 to uint, just so I can index into foo[].

This is verbose and annoying. But what’s worse, this actually harms correctness, because it is impossible to distinguish a conversion which loses data (such as converting from u64 to uint, since uint can be 32-bit on 32-bit platforms) from a conversion which cannot lose data (such as converting from u8 to uint).

4 Likes

I disagree that implicit widening is ‘a pure usability win’. Recently I was writing some C code that was (very loosely) along these lines (I think it was actually a vector cross product, but it’s the same idea):

long long foo(int x) {
    return x*x;
}

Here, x is squared (possibly overflowing) and then implicitly (and losslessly) converted to a long long. However, this code gave no warning whatsoever and led to a bug. The correct code in this case would cast x to a long long before squaring it. Another example from @petrochenkov in the safe integral coercions RFC:

use std::iter::AdditiveIterator;
fn silly_hash(string: &str, modulo: u32) -> u32 {
    string.bytes().sum() % modulo
}

This is the same idea, but uses sum() instead of explicitly using the + operator. The problem here is that string.bytes().sum() returns u8 instead of u32. The right code here would be string.bytes().map(|x| x as u32).sum() % modulo.

That said, I really dislike having to explicitly coerce integers all over the place, so it would be nice to somehow get the best of both worlds. One option that fixes the first example above is to somehow (I’m not quite sure how) let arithmetic operations between two integral types coerce their operands to the expected result type (if any) before operating. That is, something like impl Add<i32, i64> for i32 that defaults to returning i32 if the result type cannot be inferred. In other words, instead of coercing the operands to the largest of the two input types when performing arithmetic operations on integers, coerce the operands to the largest of the two input types and the result type.

There's been quite a bit of discussion of these very hazards and this potential solution throughout this thread...

Those two examples of hazards are certainly good examples.

It may be helpful to distinguish between implicit promotion where semantics can be changed (such as these two hazards), and implicit promotion where semantics cannot be changed. The example that I keep coming back to is array indexing. Widening u8 to usize for array indexing cannot change the meaning of the array access. Widening any integer type to usize that is <= the size of usize similarly cannot change the meaning of array access.

I know some people may be uncomfortable with this, but I’m willing to say that usize should have a defined lower bound on its size, and that it should probably be u32 (or even i32, but the signed debate is one I would like to hold at arm’s length for now), instead of u16. So promotion from any of { u8, u16, u32 } to usize should always be safe (cannot change semantics), and so should always be implicit.

1 Like

That’s precisely what polymorphic indexing provides, without having to have a general implicit widening rule.

I’m starting to think that there really should be different syntax for lossless and lossy explicit conversions. as uN spam being visual/typing clutter is one thing, which I think is justifiable, but if that spam makes it hard to identify the much more dangerous lossy conversions, it may hurt more than it helps.

4 Likes

I agree:

+1 for Polymophic indexing -1 for Implicit widening

The bug in the above code has absolutely nothing to do with implicit widening. Here's the same code written in Rust, but using explicit widening for the result of the expression since Rust demands that:

fn foo(x: i32) -> i64 {
  (x * x) as i64
}

fn main() {
  println!("{}", foo(2147483640))  // prints 64
}

And this has the same bug, even with explicit widening.

You are complaining about overflow of intermediate results, not the lossless expansion of a 32bit signed integer to a 64bit signed integer.

You might say that the compiler error about the missing cast in Rust would make you write the expression as x as i64 * x as i64 instead of what I wrote. If so, then that's a complete coincidence and doesn't mean that the explicit widening is somehow safer. Here's the same code with a much lower chance of the user correctly changing it:

fn foo(x: i32) -> i64 {
  let a = x * x;
  ... lots of code ...
  a
}

Upon encountering the type error pointing at the final statement, the user would rewrite it as a as i64 and still have the bug.

Note that even with casting x to i64 the code still has an overflow bug, only for larger numbers (which probably makes the bug worse since it's harder to encounter it). If you don't know what domain your numbers take, use a BigInt type. No amount of explicit casting will help you if you don't.

And again, the problem is overflow and not implicit widening. All of your examples amount to "guns kill people so we should ban straw hats."

I have heard many people complain about lossless, implicit widening and every time they are actually complaining about something else that's completely unrelated, usually overflow.

3 Likes

I think you misrepresent arguments and determine that they aren’t “arguments”. This discussion style is unacceptable.

Please point out the arguments I have misrepresented. If I did so, I certainly didn't do it on purpose.