A tale of two's complement

There has been a lot of discussion about Rust’s integral types, and in particular a lot of questions about what type to use for integer fallback and what to call the “pointer-sized” integer type. We (the core team) have been reading these threads and have also done a lot of internal experimentation, and we believe we’ve come to a final decision on the fate of integers in Rust. The purpose of this post is to clarify that decision and explain our rationale.

Integral types. As today, Rust will continue to offer a wide variety of fixed-size integral types (i8, i16, u32, etc.) as well as two integral types whose size varies depending on the architecture (int, uint). int and uint are always defined to have the same number of bits as a pointer on the target platform (we assume a flat memory space).
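As a minimal sketch of the pointer-size guarantee described above: the snippet assumes a current Rust toolchain, where the pointer-sized types are spelled isize and usize rather than int and uint as in this post. The function name is hypothetical and only serves the illustration.

```rust
use std::mem::size_of;

/// Returns (bytes in a pointer-sized unsigned integer, bytes in a raw pointer).
/// Assumption: `usize` stands in for the type this post calls `uint`.
fn pointer_sized_widths() -> (usize, usize) {
    (size_of::<usize>(), size_of::<*const u8>())
}
```

On any target the two numbers should agree, while fixed-size types such as u32 keep the same width everywhere.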

Guidelines and fallback. To save memory and ensure consistent behavior across platforms, users are encouraged to use fixed-size types where possible. However, it frequently happens that you have integers that are tied to the size of memory: for example, indices into an array or the size of an allocation. In these cases, uint and int are an excellent choice, though it may make sense to use smaller, fixed-size types if you know that the length of the array (or size of the allocation, etc.) is limited. In accordance with these guidelines, we are accepting RFC 452, which says that integer literals whose type is not otherwise constrained will fall back to i32. As part of the stabilization process, we plan to examine integer types appearing in libstd APIs to ensure conformance with this guideline.
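The fallback rule can be observed directly. The sketch below assumes a current toolchain (so the index type is spelled usize, not uint), and both function names are hypothetical.

```rust
use std::mem::size_of_val;

/// Size in bytes of an integer literal whose type nothing else constrains.
fn unconstrained_literal_bytes() -> usize {
    let x = 1_000; // no suffix and no annotation: inference falls back to i32
    size_of_val(&x)
}

/// Size in bytes of a literal that is constrained by being used as an index.
fn index_literal_bytes() -> usize {
    let data = [10u8, 20, 30];
    let i = 1; // used as an index below, so it is inferred as usize
    let _ = data[i];
    size_of_val(&i)
}
```

The first literal ends up four bytes wide on every platform; the second follows the pointer width of the target.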

Overflow. Our plan for handling overflow is to adopt a variation of RFC 146. Roughly speaking, the idea is that after every integer operation, there is a debug_assert! inserted by the compiler that checks for overflow. Because this is a debug assertion, it will typically be compiled out when performing optimizations. Since overflow cannot cause crashes or data races (outside of unsafe code), we can skip these checks without endangering Rust’s core value proposition of safe systems programming. Whenever checks are disabled, overflow will yield an undefined value (this is distinct from, and much more limited than, the “undefined behavior” you get in C).
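Written out by hand, the inserted check might look roughly like the sketch below. This is not the compiler's actual expansion; the function name is hypothetical, and wrapping is used only as one possible model of the otherwise unspecified result when checks are disabled.

```rust
/// Hand-written stand-in for the check the compiler would insert around `a + b`.
fn add_with_debug_check(a: i32, b: i32) -> i32 {
    // Active in debug builds; compiled out when debug assertions are disabled.
    debug_assert!(a.checked_add(b).is_some(), "i32 addition overflowed");
    // Assumption: wrapping models the unspecified value produced without checks.
    a.wrapping_add(b)
}
```

In a debug build, calling this with i32::MAX and 1 would panic at the assertion; with assertions disabled it would return a value without panicking.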

This design aims to balance the benefits of overflow detection with the performance cost of checking for overflow on every integer operation. Also, by ensuring that every integer operation is checked for overflow in debug builds, we should be able to avoid people relying on the behavior of overflow, giving ourselves room to ratchet up the safety checking in the future (as well as possibly adding other ways to control when checks are compiled in).

For those cases where overflow is actually desired, such as hash computations, we will provide an explicit WrappedInt type that can be used to request wrapping semantics. This has the advantage of clarifying to the reader that overflow is an expected part of the calculation. Finally, the CheckedInt type (which exists today) can be used to guarantee that overflow checks are performed, if desired. In the future, we may provide more nuanced means of enabling overflow checks also for normal integers (such as scoped attributes).
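A sketch of both opt-in styles follows. The post names WrappedInt and CheckedInt; the code instead uses std::num::Wrapping and the checked_add method from today's standard library as stand-ins, which is an assumption about spelling rather than the API this post describes. The function names are hypothetical.

```rust
use std::num::Wrapping;

/// 32-bit FNV-1a hash; the multiplication is expected to overflow and wrap.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash = Wrapping(0x811c_9dc5u32); // FNV offset basis
    for &byte in bytes {
        hash ^= Wrapping(u32::from(byte));
        hash *= Wrapping(0x0100_0193u32); // FNV prime
    }
    hash.0
}

/// Explicitly checked summation: `None` signals overflow in every build mode.
fn checked_total(values: &[u32]) -> Option<u32> {
    values.iter().try_fold(0u32, |acc, &v| acc.checked_add(v))
}
```

The wrapper type documents at the declaration that overflow is intended, while the checked variant forces the caller to decide what an overflow means.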

Frequently Asked Questions

How are these details different from the status quo?

The two changes are to accept two RFCs. RFC 452 changes the choice of type used for integer fallback to i32 and RFC 146 defines overflow semantics as an error.

Why not use the name int for 32-bit values?

Having int be an alias for i32 (and uint an alias for u32) would create two names for the same type; moreover, the names i32 and u32 clearly communicate the size of the type. This does mean that, on 64-bit platforms, we differ from many C compilers, which use the name int to refer to 32-bit values; this can be a hazard when writing FFI declarations, and for this reason we have lints that warn about the use of int or uint types in such cases.
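For FFI signatures, one way to sidestep the mismatch is to name the C type explicitly. The sketch below uses std::os::raw::c_int from the standard library; the callback itself is a hypothetical example, not something from this post.

```rust
use std::os::raw::c_int;

/// A callback with the C ABI; `c_int` matches the C compiler's `int`,
/// whatever the pointer width of the target is.
extern "C" fn clamp_to_byte(value: c_int) -> c_int {
    value.max(0).min(255)
}

/// Width of C's `int` in bytes on the current target.
fn c_int_bytes() -> usize {
    std::mem::size_of::<c_int>()
}
```

On a typical 64-bit platform c_int_bytes() is smaller than the width of a pointer-sized integer, which is exactly the hazard the lints are meant to catch.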

What about renaming int and uint to something more explicit, such as iptr or imem?

There have been numerous requests to rename the int and uint types. The primary concern is that the current names suggest that these types ought to be a user’s “default” choice, when in fact a pointer-sized integer is often larger than is necessary (not to mention that it will cause program semantics to vary depending on the target). We spent quite a lot of time deliberating this point and exploring alternative names.

Ultimately, however, we have chosen to leave things as they are. Given that changing the name of the type int would affect literally every Rust program ever written, the bar for making such a change, particularly at this point in the release cycle, is quite high. There seem to be several strong arguments in favor of the status quo:

  1. We believe that adjusting the guidelines and tutorial can be equally effective in helping people to select the correct type. In addition, using i32 as the choice for integral fallback, in particular, can help to convey the notion that i32 is a good “default” choice for integers.
  2. All of the alternate names also had serious drawbacks. For example, the type iptr strongly suggests that the value is a pointer cast to an integer. Similar concerns apply to isize, index, offset, and other suggestions. Ultimately, pointer-sized integers are used for many purposes (indices, sizes, offsets, pointers), and we don’t want to favor one of those uses over the others in the name. We did produce alternate branches using the names imem/umem as well as isize/usize, but found that the result was fairly unappealing and seemed to create an unnecessary barrier to entry for newcomers. Ultimately, whatever benefits those names might offer, they don’t seem to outweigh the cost of migration and unfamiliarity.

If integers silently overflow in optimized builds, won’t this mask bugs in shipping code?

While integer overflow cannot lead to crashes or data-races, it is of course true that if we don’t check for overflow, it may lead to other sorts of bugs. This is unfortunate but we have to balance the real-world performance factors with the possibility of bugs. Also, we reserve the right to make checking stricter in the future. Finally, if a strange bug is encountered, at least a developer will quickly notice the overflow if they attempt to reproduce on a debug build.

Can’t overflow sometimes cause crashes?

There are corner cases in which incorrect codegen can cause LLVM optimizations to drop bounds checks, leading to segfaults. We consider any such case to be a bug in rustc. Also, one should take extra care in unsafe code that is doing pointer arithmetic or bypassing bounds checks, as overflow can violate invariants that you expected to hold. Of course, the fact that unsafe code can cause crashes is nothing new; that’s why it’s labeled as unsafe.

Aren’t you concerned about de facto lock-in for overflow?

There is certainly a concern that we will not be able to make overflow stricter without breaking code in the wild. We believe that having overflow checking be mandatory in debug builds should at least mitigate that risk, as intentional uses of overflow should be detected long before the code ever comes into widespread use (and redirected to the suitable wrapper type).


@nikomatsakis Have you talked about how you’ll handle integer shifts?


Nice. I think this was the right decision, especially as it will keep the door open for exploring the performance impact on real-world Rust programs of overflow checking in optimized builds.

I’m sure this has all been hashed out, but to point out caveats from a security perspective:

  • Yielding an undefined value is a new type of surprising behavior in safe code. Wouldn’t it be difficult to optimize based on the assumption of an undefined value (but not behavior), since LLVM’s add nsw yields the latter?

  • Any type of crash-on-overflow makes code somewhat more susceptible to DoS, since code that handles user-submitted numbers may be made to produce what would otherwise be harmless overflows. Then again, they might not be harmless…


It won’t crash on overflow except in debug builds, which shouldn’t be run in production anyhow.

Is this change intentional? I thought signed overflow was currently well-defined (in addition to being "safe").


I’m a bit surprised by the reasoning here. You choose not to rename int and uint because it would affect existing Rust code, but then change overflow from being well defined as wrapping to an undefined value, something that existing code may currently rely on. The renaming would show up instantly when someone tries to compile existing source code, while the change from wrapping to an undefined value may cause existing and correct code that relies on wrapping to start failing mysteriously at some unspecified time in the future (because LLVM uses two’s complement overflow, so the undefined value will probably be the correct number, except when certain optimizations trigger).

So given that the change of integer overflow semantics should really result in an audit of all existing Rust code, why not take this opportunity to also do the renaming of int and uint, which is a relatively minor change in comparison?


I’m very disappointed to hear that int and uint are not getting renamed. I think the current names give the wrong idea of the purpose of these types and when you should use them, especially since they mean something different in (almost?) every other language that has similarly-named types. It’s also weird to have a default fallback to i32 (which I think is a good thing) while having a different “default-looking” set of types. While there was a lot of discussion in RFC #464 as to what specific name would be the best, the vast majority seemed to be in agreement that the most important thing was to change them to something other than what they are now, regardless of what that ended up being.

While it’s true that changing the name of int and uint will likely break a lot of code, doing so provides an opportunity to evaluate in each instance whether using them is correct, or whether a fixed-size type would be more appropriate. As it is, most usages of int and uint should probably be audited, anyway, so it doesn’t gain a lot. There have been several examples of places where even the standard library used int or uint where it wasn’t appropriate. If the worry is too much breakage at once, what about leaving in int and uint as deprecated aliases for a couple of weeks?

Also, I’m curious about the objections to imem and umem, since you didn’t list them among the types with similar concerns to iptr. These names seemed the most accurate to me, since the types are supposed to be used in contexts referring to pieces of memory in some way (sizes, indexes, offsets, etc.).

That said, I do agree with all of the other decisions outlined in the post. Thank you for all of your work.


A+ to this whole thing. It’s just about what I wanted!

I’m curious to see how the debug overflow checking slows things down in practice, so I’d really like a way to have them present in optimized/non-debug releases as well, but that’s something that can happen in the future changes once the rest of this is done.

It may be hard to trigger overflows in test cases if your code is non-optimized: Dealing with large enough test sets is probably impractical.


@cgaebel Sorry, I was unclear, was more thinking about writing code with the future in mind. Actually, here’s a question: will there be a non-debug option to enable checking in optimized builds? Security-conscious users (those who prefer DoS to being wrong, anyway) may want to use this option.

Some other details:

  • WrappedInt sounds grammatically iffy to me, as opposed to WrappingInt.
  • What will the name of a fixed-size wrapping type be? WrappingI8, WrappingInt8, WrappingInt<i8>?
  • Any of those is pretty long, especially the last one. Is wrapping really so uncommon that it’s ok to write a long name whenever you want to do it? …Maybe it is.
  • Why not +% etc. like Swift?

We just need to remember we had a ton of overflow bugs in our own std::vec code just a year ago. It’s a very simple pattern – use a function parameter value to add, multiply or subtract from one of the internal len or capacity fields, then plunge into unsafe code assuming that those values are still within bounds.

Rust users will learn to use unsafe code and will eventually also produce the same kind of bugs, if they aren’t careful.
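A minimal sketch of guarding that pattern: compute the new length and byte size with checked arithmetic before any unsafe code trusts them. The function name is hypothetical, and this is not the actual std::vec code.

```rust
use std::mem::size_of;

/// Byte size of a buffer holding `len + additional` elements of `T`,
/// or `None` if any intermediate step would overflow `usize`.
fn grown_byte_size<T>(len: usize, additional: usize) -> Option<usize> {
    let new_len = len.checked_add(additional)?;
    new_len.checked_mul(size_of::<T>())
}
```

A caller would only proceed to allocate or write through raw pointers when this returns Some, so an attacker-controlled `additional` cannot silently wrap the size to a small number.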

I'm really disappointed to hear that the core team made the wrong decision in leaving int/uint as they are.

It is only slightly more troublesome than renaming fail!() to panic!() or removal of coercion from a fixed sized array to a slice.

Rust hasn't released v1.0 yet. Anything can happen.

Perhaps I missed something, but I couldn't find anything informative under the above two links. What's different from the plain vanilla Rust documents?


Yielding an undefined value on every overflow makes me feel so insecure. Will LLVM have any more opportunities for optimization under the new semantics? I don’t quite understand why inserting overflow checks in debug mode cannot coexist with wrapping semantics.

Lastly, I hope the core team can clarify their stance on code like this:

http://www.reddit.com/r/rust/comments/2q40k2/a_tale_of_twos_complement/cn33adg

let a = 5u32;
let b = 7u32;
let c = (a - b) + 10u32;
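To make the question concrete, here is a sketch of the same expression under two explicit policies, using methods that exist on today's integer types (the function names are hypothetical). With wrapping, the intermediate 5 - 7 wraps to a very large value and adding 10 wraps back around to 8; with checks, the subtraction is already reported as an overflow.

```rust
/// `(a - b) + 10` with explicit wrapping at each step.
fn wrapping_version(a: u32, b: u32) -> u32 {
    a.wrapping_sub(b).wrapping_add(10)
}

/// The same expression with every step checked; `None` if any step overflows.
fn checked_version(a: u32, b: u32) -> Option<u32> {
    a.checked_sub(b)?.checked_add(10)
}
```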

Hello, I created this account just to say BRAVO about the new way of handling the integer overflow problem; it looks very good. Just one word of caution: if you don’t make it mandatory to provide an option that preserves the overflow checking instructions in optimised builds, there is a risk that you’ll have the ‘gcc -ftrapv mess’: a non-standard option that sometimes doesn’t even work, because there is no real pressure to make it work… If you specify a standard option and add a corresponding conformance test, this increases the work for compiler writers, but it makes Rust much more interesting to users who look for robustness/safety first (IMHO).

I’m very happy that Rust removed the wrap-on-overflow semantics before 1.0.

CheckedInt seems to be a Mozilla C++ thing, not a Rust thing? I have been experimenting with a Checked<int/u32/…> type. As noted in the RFC this is the wrong approach, but it was still interesting to implement.

I agree with @comex and @nodakai that returning an undefined result instead of the wrapped result doesn’t seem useful, but it can be changed later if necessary.

@tbu: x86 (and thus LLVM) doesn’t have a “shl with overflow” instruction. Instead the carry is the last bit shifted out. You can emulate “shl with overflow”, for example by doing one bit at a time.

@victorvde My problem is that << currently is UB in safe code, using code as simple as this:

let shift = 32;
let undefined_behaviour = 1u32 << shift;
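For comparison, today's integer types offer shift methods with a defined result for an out-of-range shift amount. The sketch below wraps two of them (checked_shl and wrapping_shl); the wrapper names are hypothetical, and this says nothing about what the plain << operator did at the time of this thread.

```rust
/// Shift that reports an out-of-range shift amount instead of guessing.
fn shift_checked(value: u32, shift: u32) -> Option<u32> {
    value.checked_shl(shift)
}

/// Shift that masks the amount to the bit width (shift % 32 for u32).
fn shift_masked(value: u32, shift: u32) -> u32 {
    value.wrapping_shl(shift)
}
```

With a shift of 32, the checked form returns None, while the masked form behaves like a shift by 0.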

In the std docs, you see that uint and int have been replaced by different names.

What about just doubling the precision so that u8+u8 => u16? Then you can choose wrap, saturate, assert, or panic for your program and the default that the language provides would be safe.

Of course u64+u64 still needs to be figured out. Maybe the + operator isn’t defined for u64 and you have to be more explicit about which type of addition you want: add_wrap, add_saturate, add_assert, add_panic. That’s really the problem, there are multiple types of addition but only one + operator.
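A rough sketch of both ideas with methods available on today's integer types. The names add_wrap and add_saturate come from the comment above; add_widening and add_checked are hypothetical additions for the illustration.

```rust
/// Widening add: the sum of two u8 values always fits in a u16.
fn add_widening(a: u8, b: u8) -> u16 {
    u16::from(a) + u16::from(b)
}

/// Explicit flavours of u64 addition, where no wider primitive is assumed.
fn add_wrap(a: u64, b: u64) -> u64 {
    a.wrapping_add(b) // wraps around modulo 2^64
}

fn add_saturate(a: u64, b: u64) -> u64 {
    a.saturating_add(b) // clamps at u64::MAX
}

fn add_checked(a: u64, b: u64) -> Option<u64> {
    a.checked_add(b) // None on overflow, leaving the policy to the caller
}
```

The cost of the widening approach is that results grow a size class with every operation, so long expressions need explicit narrowing somewhere.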

It’s complex to handle this right. RFC 45 was about this, but it was too complex.

+1. I think this is the right direction.

While I like imem/umem a lot (and hadn’t heard that suggestion before), this is a great step forward!