Unsigned integer type inference

artemkin · December 11, 2015, 4:09pm

Hello all,

I’ve stumbled upon yet another bug in C++ code due to mixing signed and unsigned types, and decided to check how it might be handled in Rust.

Actually, I’m a bit surprised.

fn main() {
  let a = -2;
  let b:u32 = 2;
  let c = a / b;
  println!("{} / {} = {}", a, b, c);
}

Output

<anon>:2:10: 2:12 error: unary negation of unsigned integers may be removed in the future
<anon>:2 	let a = -2;
         	        ^~
error: aborting due to previous error

So far so good, but it looks like a special case. With minor changes, we get a terrible result from mixing signed and unsigned types.

fn main() {
  let a = 2;
  let b = 6;
  let c:u32 = 2;
  let d = (a - b) / c;
  println!("({} - {}) / {} = {}", a, b, c, d);
}

Output

(2 - 6) / 2 = 2147483646

Why is such unsafe code possible in such a safe language?

Also, it is hard to reason about a piece of code like let a = 2, as it is not possible to know whether a is signed or unsigned.

Thank you

glaebhoerl · December 11, 2015, 4:20pm

Compiling that in debug mode results in an overflow assert at runtime.

pepp · December 11, 2015, 4:22pm

If you had compiled your example in debug mode, it would just panic at run time.

thread '<main>' panicked at 'arithmetic operation overflowed'

There was very heated debate about overflow checks in release builds, run-time costs and other stuff some time ago and it was decided that overflows should be checked in debug for testing but not in release because of performance impact.

Compile-time constraints would make every arithmetic expression a mess, because every non-constant expression can possibly overflow.

matklad · December 11, 2015, 4:22pm

I suppose this question is better suited for https://users.rust-lang.org

Actually, your code does not mix signed and unsigned types: a, b, c, and d are all u32. Subtraction of unsigned integers is defined in Rust. Moreover, integer overflow is defined in Rust (unlike signed integer overflow in C++). Overflow will produce a panic! in a debug build and will wrap in a release build.
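For readers who want the overflow behaviour to be the same in every build profile, the integer types have explicit methods for it. The following is only a minimal sketch; checked_sub_div is a hypothetical helper name introduced here for illustration, not something from the thread.

```rust
/// Hypothetical helper: computes (a - b) / c on u32 values.
/// Returns None if the subtraction would underflow or if c is zero.
fn checked_sub_div(a: u32, b: u32, c: u32) -> Option<u32> {
    a.checked_sub(b)?.checked_div(c)
}

fn main() {
    // 2 - 6 does not fit in a u32, so the checked version reports it
    // in debug and release builds alike.
    assert_eq!(checked_sub_div(2, 6, 2), None);
    assert_eq!(checked_sub_div(6, 2, 2), Some(2));

    // overflowing_sub never panics: it returns the wrapped value
    // together with a flag saying whether overflow happened.
    assert_eq!(2u32.overflowing_sub(6), (4_294_967_292, true));
}
```

Using checked_sub makes the failure case part of the return type, so the caller has to decide what to do with it.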

steveklabnik · December 11, 2015, 4:23pm

Unsafe means something very specific in Rust. Over/underflow cannot cause memory unsafety, even though it's obviously not desirable.

artemkin · December 11, 2015, 7:40pm

Ok, let me clarify the question a bit.

I'm aware of overflow checks in debug builds and their absence in release builds, and it is obviously a good solution.

I'm wondering why this code is not rejected by compiler? It is error-prone code, isn't it? I would rather add some explicit type annotation to make it compilable.

Yes, I understand, but even if it is due to type inference rather than integral promotion as in C++, it has exactly the same result: an implicit switch from signed to unsigned arithmetic.

This code leads to an unexpected switch from signed to unsigned arithmetic; it is not about over/underflow.

matklad · December 11, 2015, 8:04pm

As far as I understand, there is no signed arithmetic in your example. All initial values and all intermediate values are u32. Maybe there is a terminology issue here? What is your definition of unsigned and signed arithmetic?

I'm wondering why this code is not rejected by compiler?

It is impossible to predict at compile time whether a certain operation will overflow, hence the run-time checks.

Also, it is hard to reason about a piece of code like let a = 2

You can use a literal suffix to make this obvious:

let a = 2u32;
let b = 2u64;
let c = 2i64;
let d = 2isize;
let e = 2usize;

matklad · December 11, 2015, 8:06pm

@artemkin do you want this code to be rejected?

let a = 6u32;
let b = 2u32;
let c = a - b;

?

artemkin · December 11, 2015, 8:10pm

No. It is absolutely valid and explicit unsigned code.

The same as

let a = 6;
let b = 2;
let c = a - b;

is absolutely valid signed code (you don’t expect unsigned arithmetic by default, do you?).

The problem is that adding

let d = c / some_unsigned_val

implicitly changes signed code to unsigned.

matklad · December 11, 2015, 8:18pm

Hm, I still don’t get this…

2u32 - 6u32 == 4294967292u32 <- this is valid, because we overflow aka wrap aka make calculations in Z/2^32 aka calculate modulo 2^32

4294967292 / 2 == 2147483646 <- this is valid in almost every imaginable sense (although I can imagine a case when it is invalid)
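The two steps above can be checked with a short sketch that does the subtraction in a wider signed type and then reduces it modulo 2^32. The function name reduce_mod_2_32 is an assumption made up for this illustration.

```rust
/// Hypothetical helper: reduces a signed 64-bit value into the range
/// 0..2^32, which is the set of values a u32 can hold.
fn reduce_mod_2_32(x: i64) -> i64 {
    x.rem_euclid(1i64 << 32)
}

fn main() {
    // 2 - 6 is -4 as a mathematical integer; modulo 2^32 that is 4294967292.
    let wrapped = reduce_mod_2_32(2 - 6);
    assert_eq!(wrapped, 4_294_967_292);

    // Explicit wrapping arithmetic on u32 gives the same value.
    assert_eq!(i64::from(2u32.wrapping_sub(6)), wrapped);

    // Dividing the wrapped value by 2 gives the number from the first post.
    assert_eq!(wrapped / 2, 2_147_483_646);

    // For comparison, the same expression in signed arithmetic.
    assert_eq!((2i32 - 6i32) / 2, -2);
}
```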

cuviper · December 11, 2015, 8:25pm

Wrapping is subject to debug asserts. Release mode doesn't check it, but debug builds will panic.

artemkin · December 11, 2015, 8:44pm

OK, read this code. It is pretty straightforward, isn’t it?

fn print_foo(foo: Foo) {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   println!("{}\n", d);
}

fn main() {
   let foo = Foo { count : 2 };
   print_foo(foo);
}

Sure, Foo is defined in another module/file/library, as it usually is in non-trivial code. One morning someone changed the Foo definition from

struct Foo {
   count: i32
}

to

struct Foo {
   count: u32
}

Does that make sense?

matklad · December 11, 2015, 8:45pm

Hm, I've found one more potential source of confusion here. Integer literals without suffixes are polymorphic. Their type is inferred from use, if it is unambiguous, and defaults to i32 (am I correct here?) if it is not constrained.

That is, in the following code

let c = 92;

The c can have any integral type, and you need to see the usages of c to determine its precise type. In your first example,

let a = -2;
let b:u32 = 2;
let c = a / b;

all three variables are typed as u32 because of the explicit annotation for b.

So the claim that this is signed code is not always true. There is not enough information to say whether these are signed or unsigned numbers. If the next line is, say, let d = c / 2u32, then these are unsigned 32-bit numbers. But if the next line is let d = c * 92i64, these are signed 64-bit numbers.

This looks complicated, but in practice it is rarely a problem. Oftentimes you have an explicitly typed variable in the expression, and you can always use suffixes. Please note also that the issue is not with the type of an arithmetic expression, but with the type of a literal.
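The inference rules described in this post can be observed directly with a small sketch. The helper type_of is hypothetical and added only for illustration; it relies on std::any::type_name, whose exact output is documented as best-effort, so the strings compared below are what current compilers are assumed to produce for primitive types.

```rust
use std::any::type_name;

/// Hypothetical helper: reports the type the compiler chose for a value.
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // Nothing constrains these literals, so they fall back to i32.
    let a = 2;
    let b = 6;
    let c = a - b;
    assert_eq!(type_of(&c), "i32");

    // Dividing by a u32 forces the whole chain to u32.
    let x = 6;
    let y = 2;
    let z = x - y;
    let _d = z / 2u32;
    assert_eq!(type_of(&x), "u32");

    // Multiplying by an i64 forces the whole chain to i64.
    let p = 6;
    let q = p - 2;
    let _e = q * 92i64;
    assert_eq!(type_of(&p), "i64");
}
```

The literals are written the same way in all three groups; only the later use decides their type.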

matklad · December 11, 2015, 8:49pm

Yes, this is a potentially problematic example, but only because d is not actually used anywhere.

If it was

fn make_d(foo: Foo) -> i32 {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   d
}

then you’d get a compilation error.

artemkin · December 11, 2015, 8:58pm

This is error-prone even if d is returned from the function:

fn change_foo(foo: Foo) -> Foo {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   Foo { count : d }
}

cuviper · December 11, 2015, 9:02pm

Your types were unspecified, so they get inferred from Foo::count. If someone changes that API, the inference will follow the change too.

What do you wish would happen?

matklad · December 11, 2015, 9:07pm

Yes, in such case the only guarantee is a type annotation on any variable or a suffix on any literal. I think it’s a dilemma of static polymorphism in general:

if you change a type, you don’t need to change all its usages, and that’s good.
if you change a type, the semantics of each usage is silently changed and you are not warned by a compiler error, and that’s bad.
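As a rough sketch of the "type annotation on any variable" option, here is the change_foo example from earlier in the thread with a single annotation added. The Foo definition is reproduced with count: i32 as an assumption so the block compiles on its own.

```rust
// Assumed definition, copied from the earlier post so the sketch is self-contained.
struct Foo {
    count: i32,
}

fn change_foo(foo: Foo) -> Foo {
    // One explicit annotation pins a, b, and c to i32, whatever Foo says.
    let a: i32 = 2;
    let b = 6;
    let c = a - b;
    // If Foo::count were changed to u32, this division would no longer
    // type-check, so the change would surface as a compile error here.
    let d = c / foo.count;
    Foo { count: d }
}

fn main() {
    let foo = change_foo(Foo { count: 2 });
    assert_eq!(foo.count, -2);
}
```

The trade-off is the one described above: the annotation gives up the convenience of the types following the API in exchange for an error when they diverge.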

Here is an example of a similar issue without integers:

struct A;
struct B;

impl Default for A {
    fn default() -> A { A }
}

impl Default for B {
    fn default() -> B { B }
}


trait T {
    fn do_something(&self);
}

impl T for A {
    fn do_something(&self) {
        // Make something good.
    }
}

impl T for B {
    fn do_something(&self) {
        panic!("Destroy the world.")
    }
}


struct Foo {
   field: A
}


fn act(foo: Foo) {
    foo.field.do_something();
}

fn main() {
   let foo = Foo { field: Default::default() };
   act(foo);
}

cuviper · December 11, 2015, 10:07pm

FWIW, there was a period of time when there was no implicit fallback type at all. If an unsuffixed literal couldn’t be inferred, you’d get an error. So your example of “let a = 2; let b = 6; let c = a - b;” would have failed if there was nothing else to infer the exact type.

RFC 212 restored the fallback as i32, which might be interesting reading for you (including PR comments).

Gankra · December 11, 2015, 11:25pm

To be clear: default inference basically only kicks in for examples or unit tests. All other code I’ve ever seen pretty quickly forces the types to be concrete (either by interacting with a struct or a function). For instance, if you index into an array, it’s gotta be a usize.
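The indexing case can be shown with a short sketch. The helper type_of is hypothetical, and it uses std::any::type_name, whose output format is not strictly guaranteed; the string "usize" is what current compilers are assumed to print.

```rust
use std::any::type_name;

/// Hypothetical helper: reports the type the compiler chose for a value.
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let values = [10, 20, 30];

    // No suffix and no annotation on the index.
    let i = 1;

    // Indexing an array requires usize, so i is inferred as usize
    // rather than falling back to i32.
    let picked = values[i];

    assert_eq!(type_of(&i), "usize");
    assert_eq!(picked, 20);
}
```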

The fact that you can get silly things in toy code is not particularly concerning to me.

yigal100 · December 12, 2015, 2:53pm

One minor caveat / question: I haven’t checked this but I’d personally expect some kind of lint/warning when a negative literal is inferred to be unsigned. I know that C programmers think that:

unsigned a = -1;

is perfectly fine and idiomatic but even in C this is a type unsafe way of writing the equivalent type safe:

unsigned a = ~0;
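For comparison, here is a small sketch of how the same "all bits set" value can be written in Rust. The function names are made up for this illustration. Current Rust compilers reject let a: u32 = -1; outright, so the negative-literal spelling needs an explicit cast.

```rust
/// Hypothetical helper: all bits set, written as bitwise NOT of zero.
/// Rust uses `!` where C uses `~`.
fn all_ones_via_not() -> u32 {
    !0
}

/// Hypothetical helper: the same value via an explicit cast from -1.
fn all_ones_via_cast() -> u32 {
    -1i32 as u32
}

fn main() {
    assert_eq!(all_ones_via_not(), u32::MAX);
    assert_eq!(all_ones_via_cast(), u32::MAX);
}
```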
