Over the past several months, there have been a large number of discussions about the role of a default “integer” type in Rust. Most recently, the core team posted a decision that was based on a lot of the discussion on Github and here, and based on a lot of discussions that we’ve had on and off throughout 2014.
Unfortunately, members of the core team didn’t participate much in this round of discussions, and some of the considerations that members of the core team had when making the decision weren’t adequately discussed on the RFC thread before we made the decision. Part of the problem was that this long-standing issue wasn’t properly shepherded, despite literally hundreds of comments that have been made on the topic.
In light of that, we want to restart the discussion. Since the latest discussion on Github, Reddit, and here, a bunch of us have spent a number of hours trying to outline all of the possible options, and tried to identify what we consider to be the costs and benefits of each approach. I’ll try to outline my analysis (and opinion) below, based on those discussions.
The goal is to restart the discussion. If you find yourself disagreeing with a cost or benefit that I discuss, or think I’ve missed something, please say so! I tried to incorporate the vast majority of the public feedback I’ve seen so far, plus conversations I’ve had with Aaron, Niko and Huon over the past few days.
- The status quo option (`int` is machine-sized but we encourage people to use `i32` by default), while well-intentioned, is problematic.
- There are two better options (Designs 2 and 3 below), both of which could be achieved without derailing the forthcoming alpha release. While both options are viable, there is a real tradeoff between them. I’ve tried to outline the basic contours here, but hope you’ll help us dig further into the details of the pros/cons in the ensuing discussion.
First, let me start by framing the discussion: it’s about what `int` should mean in Rust (and whether an `int` should even exist in Rust). The reason there is contention about this topic is that people believe (rightly so, in my opinion) that whatever type is called `int` in Rust will often be used as a default choice by people who don’t have a better idea. This is not just about beginners; even seasoned developers sometimes reach for a “reasonably good-enough integer”.
In Rust, integers are commonly used for several kinds of things:
- To represent numbers with a domain-defined upper bound, like the number of seconds since the process started, the discriminant of an enum, or an index into an array with a restricted, program-defined maximum size.
- Math, in which there is no obvious, reasonable upper bound.
- In cases where the upper bound is proportional to the amount of addressable memory, such as vector indexes, node ids, or numbers that represent sizes.
Regardless of which decision we choose, there will be some cases where casts are needed to convert from one type to another. Casts that truncate values can be very harmful and lead to portability issues, and if programmers find themselves needing to cast very often, they are unlikely to have clarity about what the original types were trying to express. This means that we should try to find solutions that keep casts to a minimum, even though we cannot avoid them entirely.
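The truncation hazard is easy to demonstrate. A minimal sketch in today’s (post-1.0) Rust syntax, where an `as` cast to a smaller type simply drops the high bits with no error:

```rust
fn main() {
    // A value that fits comfortably in 64 bits but not in 32.
    let big: i64 = 5_000_000_000;

    // `as` truncates silently: the high bits are dropped and we
    // get a different (but valid-looking) number, with no error.
    let truncated = big as i32;
    println!("{} as i32 = {}", big, truncated); // prints 705032704

    assert_ne!(big, i64::from(truncated));
}
```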
An orthogonal, but related, design decision is whether we should allow implicit coercion from one integer type to a larger integer type. This would reduce the incidence of casting, and should not generally be dangerous, but the precise allowed coercions might differ from platform to platform (e.g. whether a machine-sized integer can be coerced to an `i32` differs between 32-bit and 64-bit systems).
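Rust did not (and still does not) adopt such an implicit coercion; in the Rust that shipped, the lossless widening direction is at least expressible infallibly via `From`, while narrowing is not. A minimal sketch in current Rust:

```rust
fn main() {
    let x: i32 = 1_000;

    // Widening (i32 -> i64) can never lose information, so the
    // standard library provides an infallible `From` impl for it.
    let wide: i64 = i64::from(x);
    assert_eq!(wide, 1_000);

    // The narrowing direction (i64 -> i32) has no `From` impl; it
    // requires a fallible `try_from` or a truncating `as` cast.
}
```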
It is also not controversial to say that math on numbers that safely fit into an `i32` will usually be faster than math using `i64` or machine-sized integers (on 64-bit systems).
Status Quo: The `int` Type is Machine-Sized But We Tell People to Use `i32`
The design proposed in the earlier discuss post aimed to compromise between the current definition of `int` and a widespread desire for `i32` to be the default type by keeping `int` as machine-sized, but recommending `i32` for general (non-indexing) use.
However, given that people are likely to use `int` as a default no matter what the guidelines tell them, it is problematic to use `int` for machine-sized types if they are not the recommended default. That is, if we want `i32` to be the default, that’s what `int` should mean. While such a change would entail churn – a real risk at this stage – it is surmountable via a strategy I will outline in Design 2.
On the other hand, as we’ll see with Design 3, it may turn out to be quite sensible to keep `int` as machine-sized but also recommend it as the default type, sending a consistent message.
Design 1: No `int` Type
One solution to the problem is simply not to have an `int` type at all (but instead have the sized integer types plus something like `isize`). Instead, whenever you want to use an integer, it is up to you to figure out what the best size for that usage is.
In practice, I believe that this will lead to conventional choices about what to reach for when you don’t have a better opinion. Those choices might come from our own design guidelines, conventional wisdom, or popular answers on Stack Overflow. There is a good chance that there would also be multiple conventional answers, depending on who you ask.
In other words, if we decline to provide an explicit default in the language, or guidance about a default, many people will end up with their own “go to” default. We lose the opportunity to help people make a good decision, or to lead with the design of the standard library.
On the other hand, as with the status quo above, if we suggest a default in the style guide there is very little reason not to make that the default integer type as well.
Design 2: The `int` Type is an Alias for `i32`
There are several possible variations on this line of reasoning, but I will focus on the one that preserves `i32` as the default, while minimizing necessary casting.
We (Niko, Aaron and I) have given this proposal a good deal of thought (including a plausible strategy for rolling it out before 1.0), and think it’s a reasonable contender for a solution.
- The `int` type is an alias for `i32`.
- Whenever a Rust method wants to represent an offset or size, it uses `isize`, which is machine-dependent in size. (The name `isize` is a straw-man.)
- Rust has an implicit coercion from smaller integer types to integer types of bigger or equal size, which means that you can use an `int` as an index into a vector on both 32-bit and 64-bit systems.
- This design encourages people to use 32-bit integers when they don’t have a better idea in mind.
- Assuming that their usage can always fit into 32-bits, this will result in faster math operations, and therefore somewhat faster programs.
- This design encourages people to use a fixed-size integer when they don’t have a better idea in mind. This could help people who are targeting both 32- and 64-bit systems with robust test suites (but who don’t run their suites on 32-bit systems) catch some bugs that would otherwise remain hidden until their code was run on 32-bit systems.
- For programs targeting 64-bit deployments, this increases the chance of overflow errors relative to the other defaults we are considering. While some kinds of programs might still overflow even with a 64-bit integer, there are many domains where 32-bits aren’t enough, but 64-bits is far more than enough (e.g. seconds since the epoch, nanoseconds since program start).
- When a program uses an `int` to refer to something proportional to addressable memory (common when working with data structures), this significantly increases the chance of an overflow on a 64-bit system. Because `int` is the default, this situation is likely to arise even in cases where `isize` would have been a more appropriate choice.
- Additionally, it becomes very easy to accidentally write data structures that do not support the full 64-bit range of data on 64-bit systems. For example, Java’s mmap functionality does not support more than 32-bits, because it was accidentally written with 32-bit numbers in mind. This problem would not become apparent until someone tried to use the data structure with a very large number of elements. It would trigger a silent overflow in production, and could not be fixed compatibly (without breaking existing downstream consumers).
- It would break some existing code that stores the result of a method like `len` in an `int` field or variable. This isn’t a primary consideration; if the balance of factors leaned in favor of this option, there’s a way to roll it out without delaying 1.0, which I’ll discuss next.
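The data-structure con above can be sketched concretely. This is a hypothetical illustration in current Rust (`Buffer` is an invented name, not a standard library type) of what happens when a 32-bit field meets a 64-bit-sized request:

```rust
use std::convert::TryFrom;

// A hypothetical container that (mistakenly) tracks its length
// as i32 because i32 was the easy default.
struct Buffer {
    len: i32,
}

fn main() {
    // A size that is perfectly reasonable on a 64-bit machine...
    let requested: i64 = 3_000_000_000;

    // ...cannot be represented by the i32 length field. `try_from`
    // at least surfaces the problem; a bare `as` cast would wrap
    // silently, producing exactly the kind of latent limit
    // described above.
    match i32::try_from(requested) {
        Ok(len) => {
            let buf = Buffer { len };
            println!("allocated buffer of length {}", buf.len);
        }
        Err(_) => println!("too big for an i32 length field"),
    }
}
```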
If Rust went with this approach, the roll-out plan would look like this:
- In Rust 1.0-alpha, we would introduce an `isize` type for machine-sized integers.
- Between 1.0-alpha and 1.0-beta, we would change the `int` type to be an alias for `i32` and change all of the standard library to use `isize` for offsets and sizes.
- For 1.0-beta, we would introduce a temporary deprecation warning for the literal `int` type used where an `isize` is expected. This would help people who are currently using `int` to mean machine-sized to transition to `isize`.
- In 1.0-final, we would remove the deprecation, making `int` a literal alias for `i32`, which would allow people to use the default integer in indexes.
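For context on why the coercion step matters for indexing: in the Rust that actually shipped, indexing takes a `usize`, so any other integer type must be cast explicitly. A small sketch:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Indexing is defined on usize. Without an implicit coercion,
    // an i32 (or a hypothetical `int` alias for i32) index must
    // be converted explicitly before use.
    let i: i32 = 1;
    println!("{}", v[i as usize]); // prints 20
}
```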
Design 3: The `int` Type is Machine-Sized
This is similar to the status quo, except that we would encourage people to use `int` as the default integer type, rather than `i32`. The pros and cons of this choice are largely the flip side of Design 2. I’m not going to repeat all of the details here, so please read Design 2 above.
This is the second option that I think deserves serious consideration. Personally, given the weight of the pros and cons, I am leaning towards this option.
- Decreases the chances of overflows in applications targeting 64-bit systems when using the default `int`.
- When a program uses an `int` to represent something proportional to addressable memory, this drastically reduces the likelihood of overflow on all architectures.
- Virtually eliminates the possibility of accidentally writing data structures that do not work with a very large number of elements on 64-bit architectures.
- Since the current `int` is machine-sized, this is the only option that doesn’t break any existing code.
- If the particular usage of the integer could have fit into 32 bits, this will result in slower math operations with that integer on 64-bit systems. Of course, there is always the recourse of using `i32` explicitly in such cases.
- If a program is also targeting 32-bit systems, but is robustly tested only on 64-bit systems, Design 2 might more readily discover mistakes involving numbers outside the 32-bit range that would result in incorrect programs on 32-bit architectures. This can be mitigated by testing on the platforms you ship for.
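“Machine-sized” here means pointer-sized: the integer is wide enough to index anything in addressable memory. A quick check in current Rust (where the machine-sized types ended up being `isize`/`usize`):

```rust
use std::mem::size_of;

fn main() {
    // A machine-sized integer has the same width as a pointer, so
    // a value proportional to addressable memory always fits.
    println!("isize is {} bytes", size_of::<isize>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
}
```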
Design 4: The `int` Type is an Alias for a Bigger Integer
At first glance, one way to further mitigate overflow problems across all architectures (for integers used without much thought) is to make `int` an alias for an even bigger number, like `i64` or a `BigInt`.
Unfortunately, if the program then attempts to use the default `int` as an index, it will be forced to cast it down to the size of the index. In practice, this would result in more widespread, dangerous (truncating) casting, which would reintroduce another kind of pernicious, platform-specific overflow problem. It would also significantly harm ergonomics.
Additionally, a default of `i64` on 32-bit systems or `BigInt` on any architecture has a large performance tax that is unwarranted, given that it doesn’t even effectively eliminate a large class of overflow problems (the ones introduced by truncation).