Unsigned integer type inference


#1

Hello all,

I’ve stumbled upon yet another bug in C++ code due to mixing signed and unsigned types, and decided to check how it might be handled in Rust.

Actually, I’m a bit surprised.

fn main() {
  let a = -2;
  let b:u32 = 2;
  let c = a / b;
  println!("{} / {} = {}", a, b, c);
}

Output

<anon>:2:10: 2:12 error: unary negation of unsigned integers may be removed in the future
<anon>:2 	let a = -2;
         	        ^~
error: aborting due to previous error

So far so good, but it looks like a special case. With a few minor changes, we get the terrible result of mixing signed and unsigned types.

fn main() {
  let a = 2;
  let b = 6;
  let c:u32 = 2;
  let d = (a - b) / c;
  println!("({} - {}) / {} = {}", a, b, c, d);
}

Output

(2 - 6) / 2 = 2147483646

Why is such unsafe code possible in such a safe language?

Also, it is hard to reason about a piece of code like let a = 2, as it is not possible to know whether a is signed or unsigned.

Thank you


#2

Compiling that in debug mode results in an overflow assert at runtime.


#3

If you had compiled your example in debug mode, it would just panic at run time.

thread ‘’ panicked at ‘arithmetic operation overflowed’

There was a very heated debate some time ago about overflow checks in release builds, their run-time cost and related issues, and it was decided that overflow should be checked in debug builds for testing, but not in release builds, because of the performance impact.

Compile-time constraints would make every arithmetic expression a mess, because any non-constant expression can overflow.
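A minimal sketch, added for illustration and not taken from the thread: the standard integer methods checked_sub, wrapping_sub, overflowing_sub and saturating_sub behave the same in debug and release builds, so code that may compute 2 - 6 on u32 can state which behavior it wants instead of depending on the build profile. The function name sub_modes is hypothetical.

```rust
/// Subtracts b from a under each explicit overflow policy in std.
/// These methods behave identically in debug and release builds.
fn sub_modes(a: u32, b: u32) -> (Option<u32>, u32, (u32, bool), u32) {
    (
        a.checked_sub(b),     // None on overflow: no panic, no wrap
        a.wrapping_sub(b),    // always wraps modulo 2^32
        a.overflowing_sub(b), // wrapped value plus an "overflowed" flag
        a.saturating_sub(b),  // clamps at u32::MIN, i.e. 0
    )
}

fn main() {
    let (checked, wrapped, overflowing, saturated) = sub_modes(2, 6);
    println!("checked:     {:?}", checked);
    println!("wrapping:    {}", wrapped);
    println!("overflowing: {:?}", overflowing);
    println!("saturating:  {}", saturated);
}
```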


#4

I suppose this question is better suited for https://users.rust-lang.org 🙂

Actually, your code does not mix signed and unsigned types: a, b, c and d are all u32. Subtraction of unsigned integers is well defined in Rust. Moreover, integer overflow is defined in Rust (unlike signed integer overflow in C++): it panics in debug builds and wraps in release builds.
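To check the claim that every variable in the second example is u32, here is a small sketch (not from the thread) that asks the compiler for the inferred type names through std::any::type_name, whose output std describes as a best-effort diagnostic string rather than a stable identifier. The helper names type_of and inferred_types are hypothetical, and b - a is used instead of a - b so that a debug build does not panic; the last line reproduces the release-build result with an explicit wrapping_sub.

```rust
use std::any::type_name;

/// Returns the compiler's name for the type of the argument.
/// std::any::type_name is a best-effort diagnostic, not a stable identifier.
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// Same inference pattern as the second example in the thread, but with
/// b - a so the subtraction does not overflow. Returns the inferred type
/// names of a, b, c and d.
fn inferred_types() -> [&'static str; 4] {
    let a = 2;
    let b = 6;
    let c: u32 = 2;      // the only annotation in this function
    let d = (b - a) / c; // dividing by c forces a, b and d to u32 as well
    [type_of(&a), type_of(&b), type_of(&c), type_of(&d)]
}

fn main() {
    println!("{:?}", inferred_types());
    // What the release build computed for (2 - 6) / 2 on u32:
    // wrap modulo 2^32 first, then divide.
    println!("{}", 2u32.wrapping_sub(6) / 2);
}
```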


#5

Unsafe means something very specific in Rust. Over/underflow cannot cause memory unsafety, even though it’s obviously not desirable.


#6

Ok, let me clarify the question a bit.

I’m aware of the overflow checks in debug builds and their absence in release builds, and that is obviously a good solution.

I’m wondering why this code is not rejected by the compiler. It is error-prone code, isn’t it? I would rather have to add an explicit type annotation to make it compile.

Yes, I understand, but even though it is due to type inference rather than integral promotion as in C++, the result is exactly the same: an implicit switch from signed to unsigned arithmetic.

This code leads to an unexpected switch from signed to unsigned arithmetic; it is not about over/underflow.


#7

As far as I understand, there is no signed arithmetic in your example: all initial and intermediate values are u32. Maybe there is a terminology issue here? What is your definition of signed and unsigned arithmetic?

I’m wondering why this code is not rejected by the compiler.

In general, it is impossible to predict at compile time whether a certain operation will overflow, hence the run-time checks.

Also, it is hard to reason about a piece of code like let a = 2

You can use a literal suffix to make this obvious:

let a = 2u32;
let b = 2u64;
let c = 2i64;
let d = 2isize;
let e = 2usize;
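A sketch of how a suffix keeps the arithmetic signed, assuming the unsigned value arrives as a u32 parameter (signed_quotient is a hypothetical name). Once a is suffixed as i32, dividing by a u32 no longer compiles, so the conversion has to be spelled out; i32::try_from makes it explicit and checked, and checked_div guards against a zero divisor.

```rust
use std::convert::TryFrom; // already in the prelude in the 2021 edition

/// Computes (2 - 6) / count in signed 32-bit arithmetic.
/// Returns None if count does not fit in an i32 or is zero.
fn signed_quotient(count: u32) -> Option<i32> {
    let a = 2i32; // the suffix fixes the type...
    let b = 6;    // ...and b is inferred as i32 from a
    // (a - b) / count would be a compile error: there is no i32 / u32 impl.
    let count = i32::try_from(count).ok()?; // explicit, checked conversion
    (a - b).checked_div(count)
}

fn main() {
    println!("{:?}", signed_quotient(2));        // signed result
    println!("{:?}", signed_quotient(u32::MAX)); // does not fit in i32
}
```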

#8

@artemkin do you want this code to be rejected?

let a = 6u32;
let b = 2u32;
let c = a - b;

?


#9

No. It is absolutely valid and explicit unsigned code.

The same as

let a = 6;
let b = 2;
let c = a - b;

is absolutely valid signed code (you don’t expect unsigned arithmetic by default, do you?).

The problem is that adding

let d = c / some_unsigned_val

implicitly changes signed code to unsigned.


#10

Hm, I still don’t get this…

2u32 - 6u32 == 4294967292u32 <- this is valid, because we overflow aka wrap aka make calculations in Z/2^32 aka calculate modulo 2^32

4294967292 / 2 == 2147483646 <- this is valid in almost every imaginable sense (although I can imagine a case when it is invalid 🙂)
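The modular arithmetic above can be checked directly. This sketch (hypothetical function name sub_mod_2_32) computes 2 - 6 in Z/2^32 twice: once with u32::wrapping_sub, and once by hand in u64 as (a + 2^32 - b) mod 2^32, which cannot overflow because a + 2^32 stays below 2^33.

```rust
/// Computes a - b in Z/2^32 two ways: with u32::wrapping_sub, and by hand
/// in u64 as (a + 2^32 - b) mod 2^32 (no u64 overflow, since a + 2^32 < 2^33).
fn sub_mod_2_32(a: u32, b: u32) -> (u32, u64) {
    let wrapped = a.wrapping_sub(b);
    let modulus: u64 = 1 << 32;
    let by_hand = (u64::from(a) + modulus - u64::from(b)) % modulus;
    (wrapped, by_hand)
}

fn main() {
    let (wrapped, by_hand) = sub_mod_2_32(2, 6);
    println!("2 - 6 in Z/2^32: {} (wrapping_sub), {} (by hand)", wrapped, by_hand);
    println!("divided by 2: {}", wrapped / 2);
}
```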


#11

Wrapping is subject to debug asserts. Release mode doesn’t check it, but debug builds will panic.


#12

OK, read this code. It is pretty straightforward, isn’t it?

fn print_foo(foo: Foo) {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   println!("{}\n", d);
}

fn main() {
   let foo = Foo { count : 2 };
   print_foo(foo);
}

Sure, Foo is defined in another module/file/library, as it usually is in non-trivial code. One morning, someone changes the Foo definition from

struct Foo {
   count: i32
}

to

struct Foo {
   count: u32
}

Makes sense?
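One way to make print_foo robust against that change, sketched under the assumption that the intended arithmetic is signed 32-bit: suffix one literal and convert the field explicitly. Because std provides a blanket TryFrom impl for any type that implements Into, i32::try_from compiles whether count is i32 (an infallible identity conversion) or u32 (a checked conversion), so the meaning no longer follows the field type. Foo here is a stand-in for the struct in the post, and compute_d is a hypothetical name.

```rust
use std::convert::TryFrom; // already in the prelude in the 2021 edition

// Stand-in for the Foo from the post; changing count to i32 still compiles
// and gives the same result.
struct Foo {
    count: u32,
}

/// Pins the arithmetic to i32 regardless of Foo::count's type.
/// Returns None if count does not fit in an i32 or is zero.
fn compute_d(foo: &Foo) -> Option<i32> {
    let a = 2i32; // suffix: a, b and c are i32 no matter what Foo looks like
    let b = 6;
    let c = a - b;
    let count = i32::try_from(foo.count).ok()?;
    c.checked_div(count)
}

fn main() {
    let foo = Foo { count: 2 };
    match compute_d(&foo) {
        Some(d) => println!("{}", d),
        None => println!("count is zero or does not fit in i32"),
    }
}
```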


#13

Hm, I’ve found one more potential source of confusion here. Integer literals without suffixes are polymorphic. Their type is inferred from use, if it is unambiguous, and defaults to i32 (am I correct here?) if it is not constrained.

That is, in the following code

let c = 92;

Here c can have any integer type, and you need to look at the usages of c to determine its precise type. In your first example,

let a = -2;
let b:u32 = 2;
let c = a / b;

all three variables are typed as u32 because of the explicit annotation for b.

So this

Is not always true. There is not enough information to say whether these are signed or unsigned numbers. If the next line is, say, let d = c / 2u32, then these are unsigned 32-bit numbers. But if the next line is let d = c * 92i64, they are signed 64-bit numbers.

This looks complicated, but in practice it is rarely a problem. Often there is an explicitly typed variable in the expression, and you can always use suffixes. Please also note that the issue is not with the type of an arithmetic expression, but with the type of a literal.
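The two continuations mentioned above can be observed with a short sketch (not from the thread): the same unsuffixed literal 92 becomes u32 or i64 depending only on a later line. The type_of helper is hypothetical and relies on std::any::type_name, which is meant for diagnostics.

```rust
use std::any::type_name;

/// Compiler's name for T (std::any::type_name is diagnostic-only).
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// The same literal, 92, gets a different type depending on a later line.
fn literal_types() -> (&'static str, &'static str) {
    let x = 92;
    let _ = x / 2u32;  // this use makes x a u32
    let y = 92;
    let _ = y * 92i64; // this use makes y an i64
    (type_of(&x), type_of(&y))
}

fn main() {
    let (x_ty, y_ty) = literal_types();
    println!("x: {}, y: {}", x_ty, y_ty);
}
```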


#14

Yes, this is a potentially problematic example, but only because d is not actually used anywhere that pins down its type.

If it was

fn make_d(foo: Foo) -> i32 {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   d
}

then you’d get a compilation error.


#15

This is error-prone even if d is returned from the function:

fn change_foo(foo: Foo) -> Foo {
   let a = 2;
   let b = 6;
   let c = a - b;
   let d = c / foo.count;
   Foo { count : d }
}

#16

Your types were unspecified, so they get inferred from Foo::count. If someone changes that API, the inference will follow the change too.

What do you wish would happen?


#17

Yes, in such a case the only safeguard is a type annotation on some variable or a suffix on some literal. I think it’s a dilemma of static polymorphism in general:

  • if you change a type, you don’t need to change all its usages, which is good.
  • if you change a type, the semantics of each usage may change silently, with no compiler error to warn you, which is bad.

Here is an example of a similar issue without integers:

struct A;
struct B;

impl Default for A {
    fn default() -> A { A }
}

impl Default for B {
    fn default() -> B { B }
}


trait T {
    fn do_something(&self);
}

impl T for A {
    fn do_something(&self) {
        // Do something good.
    }
}

impl T for B {
    fn do_something(&self) {
        panic!("Destroy the world.")
    }
}


struct Foo {
   field: A
}


fn act(foo: Foo) {
    foo.field.do_something();
}

fn main() {
   let foo = Foo { field: Default::default() };
   act(foo);
}
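For contrast, a hypothetical variant of that example in which the construction site names the type (A::default()) instead of letting Default::default() be inferred from the field. This is only a sketch of the type-annotation remedy mentioned above: if Foo::field later changes to B, the A::default() line stops compiling instead of silently selecting B's implementation. The trait method here returns a string so the behavior can be checked.

```rust
// Hypothetical variant: the type is named explicitly at construction.
#[derive(Default)]
struct A;

trait T {
    fn do_something(&self) -> &'static str;
}

impl T for A {
    fn do_something(&self) -> &'static str {
        "something good"
    }
}

struct Foo {
    field: A,
}

fn act(foo: &Foo) -> &'static str {
    foo.field.do_something()
}

fn make_foo() -> Foo {
    // Naming A here means a change of Foo::field to another type is a
    // compile error at this line, not a silent change of behavior.
    Foo { field: A::default() }
}

fn main() {
    println!("{}", act(&make_foo()));
}
```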

#18

FWIW, there was a period of time when there was no implicit fallback type at all. If an unsuffixed literal couldn’t be inferred, you’d get an error. So your example of “let a = 2; let b = 6; let c = a - b;” would have failed if there were nothing else from which to infer the exact type.

RFC 212 restored the i32 fallback; it might be interesting reading for you (including the PR comments).
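A sketch of the fallback in today's compiler, using the example from this post with nothing else to constrain the literals: all three variables become i32 and c is -4. The helper names are hypothetical; c is formatted into a String rather than returned as i32, because an i32 return type would itself constrain c and hide the fallback.

```rust
use std::any::type_name;

/// Compiler's name for T (std::any::type_name is diagnostic-only).
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// "let a = 2; let b = 6; let c = a - b;" with no other constraints.
/// Returns c formatted as a string plus its inferred type name.
fn fallback_example() -> (String, &'static str) {
    let a = 2;
    let b = 6;
    let c = a - b; // nothing pins the type, so the i32 fallback applies
    (format!("{}", c), type_of(&c))
}

fn main() {
    let (value, ty) = fallback_example();
    println!("c = {} ({})", value, ty);
}
```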


#19

To be clear: default inference basically only kicks in for examples or unit tests. All other code I’ve ever seen pretty quickly forces the types to be concrete (either by interacting with a struct or a function). For instance, if you index into an array, it’s gotta be a usize.

The fact that you can get silly things in toy code is not particularly concerning to me.
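The indexing point above can be seen in a small sketch (helper names hypothetical): the unsuffixed literal bound to i is inferred as usize purely because it is used as an array index.

```rust
use std::any::type_name;

/// Compiler's name for T (std::any::type_name is diagnostic-only).
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// Indexes an array with an unsuffixed literal and reports the index type.
fn index_type() -> (&'static str, i32) {
    let v = [10, 20, 30];
    let i = 1;       // no suffix and no annotation...
    let item = v[i]; // ...but array indexing makes i a usize
    (type_of(&i), item)
}

fn main() {
    let (ty, item) = index_type();
    println!("i: {}, v[i] = {}", ty, item);
}
```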


#20

One minor caveat / question: I haven’t checked this, but I’d personally expect some kind of lint/warning when a negative literal is inferred to be unsigned. I know that C programmers think that:

unsigned a = -1; 

is perfectly fine and idiomatic, but even in C this is a type-unsafe way of writing the equivalent, type-safe:

unsigned a = ~0u;
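For comparison, a sketch of the Rust spellings of an all-ones u32 (the function name all_ones is hypothetical). Current Rust compilers reject let a: u32 = -1; outright, since unary negation is not defined for unsigned integers, so the constant has to be written as u32::MAX, as !0u32 (Rust uses ! for bitwise NOT), or as an explicit wrapping_sub.

```rust
/// Three ways to write an all-ones u32 in Rust.
fn all_ones() -> [u32; 3] {
    [
        u32::MAX,             // the named constant
        !0u32,                // bitwise NOT of a suffixed zero (like C's ~0u)
        0u32.wrapping_sub(1), // explicit modular arithmetic
    ]
}

fn main() {
    println!("{:?}", all_ones());
}
```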