Over the past several months, there have been a large number of discussions about the role of a default “integer” type in Rust. Most recently, the core team posted a decision that was based on a lot of the discussion on Github and here, and based on a lot of discussions that we’ve had on and off throughout 2014.
Unfortunately, members of the core team didn’t participate much in this round of discussions, and some of the considerations that members of the core team had when making the decision weren’t adequately discussed on the RFC thread before we made the decision. Part of the problem was that this long-standing issue wasn’t properly shepherded, despite literally hundreds of comments that have been made on the topic.
In light of that, we want to restart the discussion. Since the latest discussion on Github, Reddit, and here, a bunch of us have spent a number of hours trying to outline all of the possible options, and tried to identify what we consider to be the costs and benefits of each approach. I’ll try to outline my analysis (and opinion) below, based on those discussions.
The goal is to restart the discussion. If you find yourself disagreeing with a cost or benefit that I discuss, or think I’ve missed something, please say so! I tried to incorporate the vast majority of the public feedback I’ve seen so far, plus conversations I’ve had with Aaron, Niko and Huon over the past few days.
- The status quo option (`int` is machine-sized but we encourage people to use `i32` by default), while well-intentioned, is problematic.
- There are two better options (Designs 2 and 3 below), both of which could be achieved without derailing the forthcoming alpha release. While both options are viable, there is a real tradeoff between them. I’ve tried to outline the basic contours here, but hope you’ll help us dig further into the details of the pros/cons in the ensuing discussion.
First, let me start by framing the discussion: it’s about what `int` should mean in Rust (and whether an `int` should even exist in Rust). The reason there is contention about this topic is that people believe (rightly so, in my opinion) that whatever type is called `int` in Rust will often be used as a default choice by people who don’t have a better idea. This is not just about beginners; even seasoned developers sometimes reach for a “reasonably good-enough integer”.
In Rust, integers are commonly used for several kinds of things:
- To represent numbers with a domain-defined upper bound, like the number of seconds since the process started, the discriminant of an enum, or an index into an array with a restricted, program-defined maximum size.
- Math, in which there is no obvious, reasonable upper bound.
- In cases where the upper bound is proportional to the amount of addressable memory, such as vector indexes, node ids, or numbers that represent sizes.
Regardless of which decision we choose, there will be some cases where casts are needed to convert from one type to another. Casts that truncate values can be very harmful and lead to portability issues, and if programmers find themselves needing to cast very often, they are unlikely to have clarity about what the original types were trying to express. This means that we should try to find solutions that keep casts to a minimum, even though we cannot avoid them entirely.
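The truncation hazard is easy to demonstrate. A minimal sketch in today’s (post-1.0) Rust syntax, where an `as` cast to a smaller type simply drops the high bits with no error:

```rust
fn main() {
    // A value that fits comfortably in 64 bits but not in 32.
    let big: i64 = 5_000_000_000;

    // `as` truncates silently: the high bits are dropped and we
    // get a different (but valid-looking) number, with no error.
    let truncated = big as i32;
    println!("{} as i32 = {}", big, truncated); // prints 705032704

    assert_ne!(big, i64::from(truncated));
}
```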
An orthogonal, but related, design decision is whether we should allow implicit coercion from one integer type to a larger integer type. This would reduce the incidence of casting, and should not generally be dangerous, but the precise allowed coercions might differ from platform to platform (e.g. whether a machine-sized integer can be coerced to an `i32` differs between 32-bit and 64-bit systems).
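Rust did not (and still does not) adopt such an implicit coercion; in the Rust that shipped, the lossless widening direction is at least expressible infallibly via `From`, while narrowing is not. A minimal sketch in current Rust:

```rust
fn main() {
    let x: i32 = 1_000;

    // Widening (i32 -> i64) can never lose information, so the
    // standard library provides an infallible `From` impl for it.
    let wide: i64 = i64::from(x);
    assert_eq!(wide, 1_000);

    // The narrowing direction (i64 -> i32) has no `From` impl; it
    // requires a fallible `try_from` or a truncating `as` cast.
}
```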
It is also not controversial to say that math on numbers that safely fit into an `i32` will usually be faster than math using `i64` or machine-sized integers (on 64-bit systems).
Status Quo: The `int` Type is Machine-Sized But We Tell People to Use `i32`
The design proposed in the earlier discuss post aimed to compromise between the current definition of `int` and a widespread desire for `i32` to be the default type by keeping `int` as machine-sized, but recommending `i32` for general (non-indexing) use.
However, given that people are likely to use `int` as a default no matter what the guidelines tell them, it is problematic to use `int` for machine-sized types if they are not the recommended default. That is, if we want `i32` to be the default, that’s what `int` should mean. While such a change would entail churn – a real risk at this stage – it is surmountable via a strategy I will outline in Design 2.
On the other hand, as we’ll see with Design 3, it may turn out to be quite sensible to keep `int` as machine-sized but also recommend it as the default type, sending a consistent message.
Design 1: No `int` Type
One solution to the problem is simply not to have an `int` type at all (but instead have the sized integer types plus something like `isize`). Instead, whenever you want to use an integer, it is up to you to figure out what the best size for that usage is.
In practice, I believe that this will lead to conventional choices about what to reach for when you don’t have a better opinion. Those choices might come from our own design guidelines, conventional wisdom, or popular answers on Stack Overflow. There is a good chance that there would also be multiple conventional answers, depending on who you ask.
In other words, if we decline to provide an explicit default in the language, or guidance about a default, many people will end up with their own “go to” default. We lose the opportunity to help people make a good decision, or to lead with the design of the standard library.
On the other hand, as with the status quo above, if we suggest a default in the style guide there is very little reason not to make that the default integer type as well.
Design 2: The `int` Type is an Alias for `i32`
There are several possible variations on this line of reasoning, but I will focus on the one that preserves `i32` as the default, while minimizing necessary casting.
We (Niko, Aaron and I) have given this proposal a good deal of thought (including a plausible strategy for rolling it out before 1.0), and think it’s a reasonable contender for a solution.
- The `int` type is an alias for `i32`.
- Whenever a Rust method wants to represent an offset or size, it uses `isize`, which is machine-dependent in size. (The name `isize` is a straw-man.)
- Rust has an implicit coercion from smaller integer types to integer types of bigger or equal size, which means that you can use an `int` as an index into a vector on both 32-bit and 64-bit systems.
- This design encourages people to use 32-bit integers when they don’t have a better idea in mind.
- Assuming that their usage can always fit into 32-bits, this will result in faster math operations, and therefore somewhat faster programs.
- This design encourages people to use a fixed-size integer when they don’t have a better idea in mind. This could help people who are targeting both 32- and 64-bit systems with robust test suites (but who don’t run their suites on 32-bit systems) catch some bugs that would otherwise remain hidden until their code was run on 32-bit systems.
- For programs targeting 64-bit deployments, this increases the chance of overflow errors relative to the other defaults we are considering. While some kinds of programs might still overflow even with a 64-bit integer, there are many domains where 32-bits aren’t enough, but 64-bits is far more than enough (e.g. seconds since the epoch, nanoseconds since program start).
- When a program uses an `int` to refer to something proportional to addressable memory (common when working with data structures), this significantly increases the chance of an overflow on a 64-bit system. Because `int` is the default, this situation is likely to arise even in cases where `isize` would have been a more appropriate choice.
- Additionally, it becomes very easy to accidentally write data structures that do not support the full 64-bit range of data on 64-bit systems. For example, Java’s mmap functionality does not support more than 32-bits, because it was accidentally written with 32-bit numbers in mind. This problem would not become apparent until someone tried to use the data structure with a very large number of elements. It would trigger a silent overflow in production, and could not be fixed compatibly (without breaking existing downstream consumers).
- It would break some existing code that stores the result of a method like `len` in an `int` field or variable. This isn’t a primary consideration; if the balance of factors leaned in favor of this option, there’s a way to roll it out without delaying 1.0, which I’ll discuss next.
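The data-structure con above can be sketched concretely. This is a hypothetical illustration in current Rust (`Buffer` is an invented name, not a standard library type) of what happens when a 32-bit field meets a 64-bit-sized request:

```rust
use std::convert::TryFrom;

// A hypothetical container that (mistakenly) tracks its length
// as i32 because i32 was the easy default.
struct Buffer {
    len: i32,
}

fn main() {
    // A size that is perfectly reasonable on a 64-bit machine...
    let requested: i64 = 3_000_000_000;

    // ...cannot be represented by the i32 length field. `try_from`
    // at least surfaces the problem; a bare `as` cast would wrap
    // silently, producing exactly the kind of latent limit
    // described above.
    match i32::try_from(requested) {
        Ok(len) => {
            let buf = Buffer { len };
            println!("allocated buffer of length {}", buf.len);
        }
        Err(_) => println!("too big for an i32 length field"),
    }
}
```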
If Rust went with this approach, the roll-out plan would look like this:
- In Rust 1.0-alpha, we would introduce an `isize` type for machine-sized integers.
- Between 1.0-alpha and 1.0-beta, we would change the `int` type to be an alias for `i32` and change all of the standard library to use `isize` for offsets and sizes.
- For 1.0-beta, we would introduce a temporary deprecation warning for the literal `int` type used where an `isize` is expected. This would help people who are currently using `int` to mean machine-sized to transition to `isize`.
- In 1.0-final, we would remove the deprecation, making `int` a literal alias for `i32`, which would allow people to use the default integer in indexes.
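For context on why the coercion step matters for indexing: in the Rust that actually shipped, indexing takes a `usize`, so any other integer type must be cast explicitly. A small sketch:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Indexing is defined on usize. Without an implicit coercion,
    // an i32 (or a hypothetical `int` alias for i32) index must
    // be converted explicitly before use.
    let i: i32 = 1;
    println!("{}", v[i as usize]); // prints 20
}
```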
Design 3: The `int` Type is Machine-Sized
This is similar to the status quo, except that we would encourage people to use `int` as the default integer type, rather than `i32`. The pros and cons of this choice are largely the flip side of Design 2. I’m not going to repeat all of the details here, so please read Design 2 above.
This is the second option that I think deserves serious consideration. Personally, given the weight of the pros and cons, I am leaning towards this option.
- Decreases the chances of overflows in applications targeting 64-bit systems when using the default `int`.
- When a program uses an `int` to represent something proportional to addressable memory, this drastically reduces the likelihood of overflow on all architectures.
- Virtually eliminates the possibility of accidentally writing data structures that do not work with a very large number of elements on 64-bit architectures.
- Since the current `int` is machine-sized, this is the only option that doesn’t break any existing code.
- If the particular usage of the integer could have fit into 32 bits, this will result in slower math operations with that integer on 64-bit systems. Of course, there is always the recourse of using `i32` explicitly in such cases.
- If a program is also targeting 32-bit systems, but is robustly tested only on 64-bit systems, Design 2 might more readily discover mistakes involving numbers outside the 32-bit range that would result in incorrect programs on 32-bit architectures. This can be mitigated by testing on the platforms you ship for.
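“Machine-sized” here means pointer-sized: the integer is wide enough to index anything in addressable memory. A quick check in current Rust (where the machine-sized types ended up being `isize`/`usize`):

```rust
use std::mem::size_of;

fn main() {
    // A machine-sized integer has the same width as a pointer, so
    // a value proportional to addressable memory always fits.
    println!("isize is {} bytes", size_of::<isize>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
}
```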
Design 4: The `int` Type is an Alias for a Bigger Integer
At first glance, one way to further mitigate overflow problems across all architectures (for integers used without much thought) is to make `int` an alias for an even bigger number, like `i64` or a `BigInt`.
Unfortunately, if the program then attempts to use the default `int` as an index, it will be forced to cast it down to the size of the index. In practice, this would result in more widespread, dangerous (truncating) casting, which would reintroduce another kind of pernicious, platform-specific overflow problem. It would also significantly harm ergonomics.
Additionally, a default of `i64` on 32-bit systems or `BigInt` on any architecture has a large performance tax that is unwarranted, given that it doesn’t even effectively eliminate a large class of overflow problems (the ones introduced by truncation).