[Pre-RFC] Implicit number type widening

djc · June 30, 2019, 9:48am

As I wrote in my previous post:

Although the trait implementations maybe don't really implicitly coerce, I think the behavior is sufficiently similar that we can discuss them in a similar way? Or do you see important differences? Also I think my refined proposal above in that case still explains what we're trying to achieve, although I suppose it could be more explicit in pointing to trait implementations as a solution. (Your reasoning about turning Index implementations into try_from() calls makes the tradeoffs in the Indexing scenario much clearer to me -- thanks.)

Also, my motivating use case originally was mostly about the Store scenario, which I don't think is covered by the suggestions of trait implementations.

In my refined proposal, I don't think it would happen like this. Since x, y and z are explicitly typed as u8, the calculation would happen over u8 and only at the point of storing into Foo.0 a coercion to i32 would trigger -- so this should be perfectly safe.

That is a downside. On the other hand, mistaken casts can already easily cause problems without this proposal. Part of the idea here is to make the use of as less necessary in safe cases, so that unsafe cases "stick out" more.

felix.s · June 30, 2019, 10:43am

This ordering surprises me a bit, since the intended semantics of polymorphic indexing (if an index cannot be converted to usize, treat it as out-of-bounds) seem more obvious and unambiguous than that of arbitrary arithmetic, where you have the problem I raised, that widening does not strictly speaking commute with arithmetic operations.

If you're in a situation where you need to compare bit representations of two values, you shouldn't use signed integer types for them. Signed integer types are exactly what they say on the tin: types that represent integers.

I don't think that is a very good argument: if you do mix them up with implicit widening, you'll still get a type error when fitting the 'count' argument into the 'byte' parameter, because it seems unlikely we'll have an implicit usize → u8 conversion.

That is exactly what I said, and that's the problem. The calculation will be performed over a type where it can overflow and therefore panic or give an erroneous result, and only later converted to the target type. What the programmer meant is to convert types first, guaranteeing an exact result without overflow occurring. With 'late' widening as you propose, this will not be caught by the compiler any more.
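
A minimal sketch of the difference between the two orderings, for illustration only. The function names `sum_late` and `sum_early` are made up here, and `checked_add` is used so that the u8 overflow shows up as `None` rather than as a panic or a wrapped value:

```rust
/// "Late" widening: the addition happens in u8 and only the result is
/// widened. If the u8 addition overflows, there is nothing left to widen.
fn sum_late(x: u8, y: u8) -> Option<i32> {
    x.checked_add(y).map(i32::from)
}

/// "Early" widening: each operand is widened first, so the addition
/// happens in i32 and cannot overflow for any pair of u8 inputs.
fn sum_early(x: u8, y: u8) -> i32 {
    i32::from(x) + i32::from(y)
}
```

With inputs 200 and 100, `sum_early` gives 300 while `sum_late` gives `None`, which is the case where the two orderings stop agreeing.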

RalfJung · June 30, 2019, 1:13pm

count is often 1 (remember, that's multiplied by the size of T to obtain the number of bytes that are written). So that will just implicitly become a u8 then. There will be no error with implicit widening if I do ptr.write_bytes(1, 0u8) instead of ptr.write_bytes(0u8, 1).
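
A rough way to imitate this in today's Rust, as a sketch only: `fill` below is a hypothetical safe stand-in for `write_bytes`, and its `count: impl Into<usize>` parameter plays the role of an implicit u8 → usize widening. None of these names come from the proposal.

```rust
/// Hypothetical stand-in for `ptr.write_bytes(val, count)`: writes `val`
/// into the first `count` bytes of `buf` (panics if `count > buf.len()`).
/// Accepting `impl Into<usize>` imitates implicit widening to usize.
fn fill(buf: &mut [u8], val: u8, count: impl Into<usize>) {
    let count = count.into();
    for byte in &mut buf[..count] {
        *byte = val;
    }
}

/// The intended call was `fill(&mut buf, 0u8, 1usize)`: write one zero byte.
/// With the arguments swapped it still compiles, because `0u8` widens to
/// usize, and it silently writes zero bytes of value 1.
fn swapped_call() -> [u8; 4] {
    let mut buf = [0xAA; 4];
    fill(&mut buf, 1, 0u8);
    buf
}
```

Here `swapped_call` returns the buffer untouched, with no complaint from the compiler.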

Ixrec · June 30, 2019, 5:13pm

At the risk of creating the biggest possible tangent, to me the ptr.write_bytes(0u8, 1) case feels more like an argument for named arguments ptr.write_bytes(val: 0x0, count: 1) than an argument against implicit widening.

josh · July 1, 2019, 3:52am

This is something a lint could potentially catch (“you’re casting from a wider type to a narrower type than the type of the field you’re then storing into”), but it’s also an excellent example of why I don’t want fully general widening. I’d love to have a few more impls though.

gnzlbg · July 1, 2019, 6:51am

Since usize can be 16-bit wide, allowing u32 would have the same issues as allowing u64.

@RustyYato

Rust will not support 8-bit architectures; if we were going to, we wouldn’t have made usize: From<u16> , because on 8-bit architectures usize = u8 .

Note that C does not support 8-bit architectures either. Having said that, if someone ever adds an 8-bit target, we could probably #[cfg(target_pointer_width >= 16)] those impls out. Right now those impls being unconditionally available is correct for all targets that we currently support, so there is no point in cfging them out.
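
For reference, a small sketch of what those impls look like from the caller's side. The helper names are invented for illustration; the conversions themselves are the standard `From` and `TryFrom` ones.

```rust
use std::convert::TryFrom;

/// Infallible: std provides `From<u16> for usize`, which is only sound
/// because usize is assumed to be at least 16 bits wide.
fn widen_u16(n: u16) -> usize {
    usize::from(n)
}

/// There is no `From<u32> for usize`, since usize may be narrower than
/// 32 bits on some targets, so this conversion has to be fallible.
fn widen_u32(n: u32) -> Option<usize> {
    usize::try_from(n).ok()
}
```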

djc · July 1, 2019, 7:00am

On further reflection, I think I now understand what you mean about the differences between this and simpler coercion. Will spend some more time thinking about sensible design options.

Thanks for explaining it again. I wonder if there's a way to do early widening then, where the widening is propagated all the way to the origin of the variables (similar to how type inference works).

Nemo157 · July 1, 2019, 8:33am

If it affects the variable type, then is that not just inference? If it doesn't affect the type (e.g. when the variable is explicitly typed), then that seems likely to cause issues similar to C's implicit widening still.

As an example, I had code similar to this:

let now: u8 = ...;
let start: u8 = ...;
let timeout: u8 = ...;

let timedout = (now - start) >= timeout;

This relies on 2's complement wrapping on the subtraction to deal with now overflowing correctly (so will have some protection via having to use the wrapping operators in Rust). So as an example, if we have (now, start, timeout) = (5, 250, 8) then timedout = true because 5 - 250 should wrap around to 11. But, because of C's implicit integer promotion the expression now - start promotes both operands to int type, with the same example values 5 - 250 really becomes -245, similarly the timeout is promoted and now the comparison -245 >= 8 returns false.

Obviously the reliance on wrapping behaviour means this example doesn't directly apply to Rust, and it's not quite the same since it's promoting temporaries, but it seems likely that a similar example could be constructed for any implicit widening behaviour when a user's expectations don't exactly match the actual behaviour.
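
Translated into Rust as a sketch, with the two behaviours written out explicitly (the function names are made up for illustration, and the second function only mimics C's integer promotion by widening by hand):

```rust
/// Comparison carried out in u8 with explicit wrapping, which is what the
/// original code intended.
fn timed_out_wrapping(now: u8, start: u8, timeout: u8) -> bool {
    now.wrapping_sub(start) >= timeout
}

/// The same comparison after widening every operand to i32 first, which
/// mimics what C's integer promotion does to the expression.
fn timed_out_promoted(now: u8, start: u8, timeout: u8) -> bool {
    i32::from(now) - i32::from(start) >= i32::from(timeout)
}
```

For (now, start, timeout) = (5, 250, 8), the wrapping version computes 11 >= 8 and reports a timeout, while the promoted version computes -245 >= 8 and does not.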

kornel · July 4, 2019, 9:45pm

My biggest pain is trying to store lengths as u32.

It makes all code that uses u32 littered with noisy as usize for no good reason: I don’t want usize, and there’s no technical reason to use usize (CPUs support indexing by smaller types, some even have special indexing modes for them). It feels like needless busywork to make the compiler happy.

Using full 64 bits for something like Person {number_of_legs: usize} feels silly.

Implicit widening u32 -> usize would solve that pain. impl Index<u32> would be an 80% solution, also acceptable.

DDOtten · July 4, 2019, 9:58pm

As stated before in this thread, widening u32 to usize is not an option because we support targets where usize has the size of u16. What would be an option, however, is implementing Index for all unsigned integer types.

197g · July 4, 2019, 10:00pm

To add to this, consider also the opposite problem: some code might want to index but support wide lengths as u64 even when the native pointer size, and thus usize, is only 32 bits (such as the Seek::seek API). However, these could still be used for array indexing if small enough. Especially when the actual index ends up as a difference, e.g. relative to a local buffer, it is very likely convertible to an offset regardless, and this assumption feels no different from the in-range one made when code uses a[idx]. Conversely, methods in the style of get(&self) -> Option<_> should also work for similar reasons.
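
One way this could look with what std offers today, as a sketch: `get_u64` is a hypothetical free function, not an existing API, and it treats an index that does not fit in usize as simply out of bounds.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: look up a slice element by a u64 index.
/// An index that does not fit in usize cannot be in bounds, so it is
/// reported as `None`, just like any other out-of-range index.
fn get_u64<T>(slice: &[T], idx: u64) -> Option<&T> {
    let idx = usize::try_from(idx).ok()?;
    slice.get(idx)
}
```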

kornel · July 4, 2019, 11:23pm

I can’t wait for portability lints to kill the defunct support for 16-bit platforms. It’s absurd for my programs that have 2MB+ of code and need hundreds of MB of RAM to function, to pretend to fit in 64KB. In my day to day coding I don’t support 32-bit platforms any more.

16-bit should be a strictly separate ecosystem, like no-std. Imagine if Rust didn’t support Vec at all, for any target, merely because no-std targets exist and some programs wouldn’t compile for them.

scottmcm · July 5, 2019, 4:44am

Can you elaborate on where you want the widening in the other 20%?

H2CO3 · July 5, 2019, 10:06am

You’re confusing two problems here. usize is not only variable-width; it’s supposed to be the “native” integer type, for indexing and measuring lengths within the memory model of the host platform.

If you are trying to index into an array or interact with the memory model otherwise, use usize. Yes, this means storing the number of legs of a person as a usize.

In contrast, u32 and other fixed-width integer types are suited for use in protocols where no immediate interaction with the host memory model is expected, e.g. serialization formats. But in this case, it’s only expected and fair that you need casting. For example, how would you know/ensure that a 32-bit system handles a serialized file longer than several (4) GB? Of course it can’t, so a cast would truncate in this case, and this need for a conversion would remind you of the potential error (that, I would argue, you should generally be handling in a more sophisticated way than just casting away width differences).

I regularly cringe when I read code that limits a quantity to 4 billion even on 64-bit systems by using u32. It also makes false promises on 16-bit systems that won’t be able to deliver more than 64k. This is what is absurd in my eyes, not the very well understandable incompatibility of types that may have not only physically different sizes, but also different semantics.

Yes, interaction between these two worlds (somewhat abstract protocols facing the outside world where fixed-width integers are an indispensable prerequisite for any sort of reliability, and the host memory model) is sometimes inevitable and desirable. But this doesn’t mean that we should be littering the language with careless and highly error-prone implicit conversions all over the place. Instead, we should aim for conversions that handle errors gracefully and explicitly, not just fail silently.

In these kinds of situations I usually find myself writing functions like usize_to_u32() and the inverse direction. Overall, I think TryFrom impls would do much more good to correctness and much less harm to ergonomics.
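
Such helpers might be little more than thin wrappers over the std `TryFrom` impls. This is a guess at their shape, not the actual code referred to above; `u32_to_usize` is an assumed name for the inverse direction.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Fails when `n` does not fit in 32 bits (possible on 64-bit targets).
fn usize_to_u32(n: usize) -> Result<u32, TryFromIntError> {
    u32::try_from(n)
}

/// Fails when `n` does not fit in usize (possible on 16-bit targets).
fn u32_to_usize(n: u32) -> Result<usize, TryFromIntError> {
    usize::try_from(n)
}
```

Returning a `Result` leaves the caller to decide how a value that does not fit should be handled, instead of truncating it silently.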

newpavlov · July 5, 2019, 10:34am

I disagree. If I know that the indexed array cannot have more than 256 elements, why in the world do I have to store the index in usize instead of u8? Why waste 7 bytes on 64-bit systems?

But I believe the correct solution is not implicit widening, but additional Index impls for primitive integer types (even maybe including signed ones). Of course to do it we will need a solution for type inference issues.

Dushistov · July 5, 2019, 10:39am

But is it possible? Right now you have to write:

let idx: u8 = ...;
arr[idx as usize]

As I understand it, you want to use idx without converting it into usize, but does any CPU have an instruction to index with a non-"machine word" type? Maybe even if for some unknown reason Rust starts to support indexing with u8, in reality there would still be a conversion from u8 to usize?

newpavlov · July 5, 2019, 10:55am

Index<u8> impl will simply convert u8 to usize under the hood. I am not sure if it should be a blanket impl though, we probably should start with concrete impls first.
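
As a sketch of what such a concrete impl does under the hood: user code cannot add `Index<u8>` to slices or `Vec` directly, so the example uses a hypothetical wrapper type `Table` to stand in for them.

```rust
use std::ops::Index;

/// Hypothetical wrapper type, used only because impls for std's own
/// slices and Vec would have to live inside std.
struct Table<T>(Vec<T>);

impl<T> Index<u8> for Table<T> {
    type Output = T;

    fn index(&self, idx: u8) -> &T {
        // Lossless widening, then the usual bounds-checked indexing.
        &self.0[usize::from(idx)]
    }
}
```

With this in place, `table[idx]` works for `idx: u8` without an `as usize` at the call site.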

Dushistov · July 5, 2019, 10:57am

What's the point then? Why not just write "as usize" and be done with it?

newpavlov · July 5, 2019, 11:00am

Mainly yes, it will be an ergonomic win without any safety issues. Also it may prevent some errors, e.g. converting u64 to usize via as on 32-bit platforms will use truncation, while Index will be able to do proper checks. Same goes for indexing using signed types.
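
A small sketch of the difference. Here u32 stands in for a 32-bit usize so that the behaviour is the same on every host, and all function names are made up for illustration:

```rust
use std::convert::TryFrom;

/// `as` silently keeps only the low 32 bits.
fn truncating(n: u64) -> u32 {
    n as u32
}

/// A checked conversion reports values that do not fit.
fn checked(n: u64) -> Option<u32> {
    u32::try_from(n).ok()
}

/// The same idea for signed indices: negative values are rejected
/// instead of being reinterpreted as huge unsigned ones.
fn checked_signed(n: i32) -> Option<usize> {
    usize::try_from(n).ok()
}
```

For `(1u64 << 32) + 3`, `truncating` returns 3 while `checked` returns `None`.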

kornel · July 5, 2019, 11:05am

I know it's supposed to be the right type, but it isn't always:

In many situations smaller working set is very important (due to caching, memory bandwidth), and storing sizes in 1/2/4 bytes instead of 8 (+ padding) becomes more important than the details of using it.
Most 64-bit CPUs we have today have a long 32-bit history, and they have tons of 32-bit addressing modes and shorter encodings for instructions with 32-bit operands.

For example, how would you know/ensure that a 32-bit system handles a serialized file longer than several (4) GB?

By having implicit widening! When I can rely on implicit conversions being always lossless, I won't have to use dangerous potentially-truncating as usize casts that could or could not be lossless, depending on context that is non-local.

In withoutboats' nomenclature:

as usize is noisy,
when using u32 for lengths, as usize is burdensome,
as usize is manual,
and the very important information: whether it's a lossless widening, or lossy truncation, is non-local (it depends on the types, which may be locally not present due to inference, and defined elsewhere in declaration of structs or return types).

While implicit widening would make use of u32 for indexing (assuming portability lints land too, or indexing is implemented for u32):

non-noisy
not burdensome
automatic
and local (you know it's widening, because truncation wouldn't compile)
