Restarting the `int/uint` Discussion

I tried to write some things down more axiomatically, with some propositions that folks could agree or disagree on, which might get more to the heart of the matter than discussing possible conclusions. I didn’t do very well.

I’ve put a few down that I support, but they are debatable. In particular, they range farther afield than “memory safe, low level”.

P1. code shouldn’t have [much] different semantics on different platforms.

I am much happier with code not compiling rather than compiling into a program that may unexpectedly produce different results. This is imo an important part of a healthy library ecosystem. I see this as a strike against Designs 2 and 3.
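
To make P1 concrete, here is a minimal sketch of the same summation with a fixed-width and a pointer-sized accumulator. The function names are made up for illustration. The fixed-width version overflows at the same point on every target, while the pointer-sized one succeeds or fails depending on where it is compiled.

```rust
use std::mem::size_of;

/// Sums with a fixed-width accumulator: overflow happens at the same
/// point on every target.
fn sum_fixed(values: &[u32]) -> Option<u32> {
    values.iter().try_fold(0u32, |acc, &v| acc.checked_add(v))
}

/// Sums with a pointer-sized accumulator: whether this overflows depends
/// on the width of `usize` on the target being compiled for.
fn sum_pointer_sized(values: &[u32]) -> Option<usize> {
    values
        .iter()
        .try_fold(0usize, |acc, &v| acc.checked_add(v as usize))
}

fn main() {
    let values = [u32::MAX, 1];
    println!("usize is {} bytes on this target", size_of::<usize>());
    // Always None: u32::MAX + 1 does not fit in a u32.
    println!("fixed-width sum:   {:?}", sum_fixed(&values));
    // Some(..) where usize is 64 bits, None where it is 32 bits.
    println!("pointer-sized sum: {:?}", sum_pointer_sized(&values));
}
```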

P2. people shouldn’t use unknown integer widths for performance.

They can certainly use known, and varying, integer widths for performance, but you should opt in to this. Random day one Rust user isn’t going to get excited because Rust uses one versus another; serious Rust user doesn’t want to guess whether uint means register sized or memory index sized or whatever. I see this as a strike against Design 3 and for Design 1 (where maybe you have imem, umem, ireg, ureg, and whatever other meaningful systems-level types should be).

P2a. people shouldn’t use unknown integer widths.

This probably isn’t defensible for ergonomic reasons, but it’s good to think of the legitimate reasons to use unknown widths in core logic, other than convenience.

P3. explicitness is an important part of Rust.

The reason I got excited about Rust was not that I could type for loops with as few characters as possible. There are better languages for that. It is because I needed to think about things that might not be problems (data races in my single threaded code?), but that made me better once I understood them (ah, iterator invalidation!). Making the “hello world” experience easy is not a great motivation for having an int type, imo (improving ergonomics might be). The “guidance” I think you should give is “you should have to think about integer widths, at least once”. This is again in support of Design 1.

My personal preference is Design 1, where it is non-trivial to put isize or usize or whatever in a data structure, but they come back from things like len(). If you want to use them, hooray, but you are acknowledging that your program may do different things on different platforms.

I’d also be happy with Design 4, as long as it is basically the same as Design 1 otherwise, and at some point new users get told “hey, int isn’t as lean as you thought it was”. I worry this may turn off folks who feel deceived, unless it is mostly fast, in which case (like other fancy language features) they realize “hey, robust code isn’t so bad”.


Yes, I do find the rationale that we have to train new systems programmers about bounds regardless persuasive. I think I still lean toward option 2 slightly, based on my experience with overflow not being a large problem and a desire to reduce friction for workaday programming tasks, but I don't think option 1 (not having an int) would be a large burden to newcomers, given that they will quickly have to interact with the entire integer size zoo in order to do anything interesting with Rust.

Just to elaborate on the collections-related issues:

Taking a default type that is a fixed size may lead to incorrect implementations that overflow.

I’m not concerned by this. The standard interfaces are all based on isize, and should remain as such. Having our collections randomly fail out at 32-bit on 64-bit platforms is dumb in my books. It’s not our business to be telling people that they should be writing domain-specific data structures for “large” data sets; especially at the otherwise arbitrary 32-bit boundary. I also believe there’s real perf to be gained here by using ptr-sized values. If you use fixed-width in these APIs then you need to perform a lot of checked arithmetic and branch to panic/abort right away. If you use ptr-width, you can saturate allocation sizes and have the (already required) branch in the allocator OOM for you. We can also completely avoid many checks/saturations elsewhere by virtue of “the allocator will OOM first”.
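
A rough sketch of the two styles being contrasted, using hypothetical helper names that are not taken from any std collection. The fixed-width version has to check and fail on every size computation. The pointer-width version saturates and relies on the allocation request itself being rejected later.

```rust
/// Fixed-width style: the size computation is checked and the caller
/// must handle (or panic on) failure right away.
fn byte_size_checked(len: u32, elem_size: u32) -> Option<u32> {
    len.checked_mul(elem_size)
}

/// Pointer-width style: saturate instead of branching. A request for
/// `usize::MAX` bytes can never be satisfied, so the failure surfaces
/// at allocation time rather than here.
fn byte_size_saturating(len: usize, elem_size: usize) -> usize {
    len.saturating_mul(elem_size)
}

fn main() {
    println!("{:?}", byte_size_checked(u32::MAX, 2));
    println!("{}", byte_size_saturating(usize::MAX, 2) == usize::MAX);
}
```

The sketch does not perform the allocation. It only shows where the overflow handling lives in each style.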

Similarly this is a virtue of using unsigned types for many of these operations: there are many places that don’t ever need to check if the value is too small, only too big. Of course we then need to guard against underflow, but this is a rarer concern (and there’s often a special behaviour that needs to be performed in that case anyway-- e.g. yield None on pop).

Now this isn’t a perfect strategy. Bitv needs tons of bounds-checks because everything is in bits and not bytes. Which also means Bitv has an “arbitrary” limit of 1/8th of the theoretical maximum memory, literally only because the len() it returns has to fit into a usize. However breaking the API here doesn’t seem particularly worthwhile. In practice it’s “only” 1/4th (kernel gets half), and on 64-bit you’re never getting anywhere near anyway.
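
The 1/8th figure is just arithmetic, sketched below with a made-up helper. If the length in bits has to fit in an integer with maximum value `max_len`, the storage can hold at most `max_len` bits, which is one eighth as many bytes. For a 32-bit length that works out to 512 MiB of a 4 GiB address space.

```rust
/// Upper bound on the bytes a bit vector can occupy when its length in
/// bits must fit in an integer whose maximum value is `max_len`.
fn max_bytes_for_bit_len(max_len: u64) -> u64 {
    // Eight bits per byte, plus one byte for a partially filled tail.
    max_len / 8 + if max_len % 8 != 0 { 1 } else { 0 }
}

fn main() {
    let bytes_32 = max_bytes_for_bit_len(u32::MAX as u64);
    println!("32-bit length: {} MiB", bytes_32 / (1024 * 1024));
}
```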

I am a bit concerned about libraries written on top of std collections using a fixed-width int and needlessly handicapping their code on 64-bit.

Having a default type that is ptr-sized may lead to implementations that incorrectly use them as backing storage or other things

This has historically been a real problem. Bitv, EnumSet, and arguably Trie all were dubiously built on usize. We fixed Bitv and EnumSet. Further, as a result of all this discussion I expect all the collection maintainers for collect and std will be very vigilant towards this sort of thing in the future. However it’s still very tempting to do when implementing your own bit-twiddly tricks.


At this point I lean towards letting the int name die in the name of “keep it simple”. If it must exist I would pick i32 for the reasons outlined above – lesser of two evils.


This is absolutely the key. Anything that brushes that one issue under the rug is wrong, and flies in the face of Rust's explicitness philosophy.

I do believe that #2 fails in that regard.


Re: friction, to me ‘i32’ is as easy to type as ‘int’ without the baggage of having to remember how this whole argument ended up playing out. It also seems like a pointless distraction to have to decide between two non-semantically-differentiated names for the exact same type.


I also work on a system that packages up traces of web requests written in Rust for processing and crunching in Java. Like you, we have routinely hit overflows where someone assumed that 32-bits would be large enough (again, we deal in relative sizes) on both the Rust side and the Java side.

On the Java side, we know we're always deploying to 64-bit systems, and have come to reflexively use 64-bit numbers because we've hit production overflow bugs fairly often when a poorly chosen 32-bit integer was used. On the Rust side, which runs on both 32- and 64-bit systems, we tend towards 64-bit integers, unless we're pretty darn sure the number can fit into a 32-bit slot (relative time in millis, for example).

Like you, that experience doesn't lead me directly towards a particular solution, but it does tend to push me away from Design 2.


I have to admit, I personally find the idea of "no int type" more appealing than "int as alias for i32".


So I just want to drill into this a little bit. I don't think you're arguing against the existence of an isize type, right? There are of course differences between machines, and those differences will make their way into programs that need to work with addressable memory.

I think what you're saying is that you should have to be explicit (and think carefully about) the fact that you're using an unknown-sized integer. I'm very sympathetic to that position :smile:

The thing that troubles me is that if things like Iterator::pos and Iterator::enumerate yield isizes, it becomes very easy to write programs that feel a lot like the axioms you feel strongly about. In other words, if you get an isize through normal code, you'll store it off as an isize, probably without thinking too much about it.

It's just not clear to me whether being explicit about isize in Rust will result in programs that actually satisfy your axioms, and whether blind casting (casting that people do to satisfy the type checker without thinking too hard about it) will erase the gains.

All of that said, I basically agree with your goals, and think I agree that type int = isize is likely to violate them more often than some of the other options.


I don't have much of an opinion about this; i32 is a bit more annoying to type than int, but still pretty easy on a QWERTY keyboard. However, to address one point:

I've seen countless 32-bit integer overflow bugs in the last few years, mainly because I spend a lot of time looking for them in C and C++ code in order to try to exploit them as vulnerabilities. In this context it's a potent class of bugs, because (a) sizes or counts in file formats and protocols are often 32 bits wide, making it possible to specify values that overflow when code multiplies or adds them, and (b) 4GB is just small enough that in some cases it's possible to actually allocate that many bytes, have that large a file (never done that), or otherwise get up to the limit by brute force, if it's required to exploit the bug.
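
The pattern in (a) looks roughly like the sketch below. The image-header scenario and the function names are invented for illustration. Two attacker-controlled 32-bit fields are multiplied, the product wraps, and the code ends up with a tiny size for a huge claimed payload. The checked version rejects the input instead.

```rust
/// Buffer size for a `width` x `height` image at 4 bytes per pixel,
/// computed the way unchecked 32-bit C arithmetic would: by wrapping.
fn pixel_buffer_len_wrapping(width: u32, height: u32) -> u32 {
    width.wrapping_mul(height).wrapping_mul(4)
}

/// Same computation, but any overflow is reported to the caller.
fn pixel_buffer_len_checked(width: u32, height: u32) -> Option<u32> {
    width.checked_mul(height)?.checked_mul(4)
}

fn main() {
    // A hostile header: the true size needs 33 bits.
    let (width, height) = (0x4000_0001u32, 1u32);
    println!("wrapping: {}", pixel_buffer_len_wrapping(width, height));
    println!("checked:  {:?}", pixel_buffer_len_checked(width, height));
}
```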

Of course, exploitability is much less of an issue in Rust where the Rust code itself is mostly memory safe, although there could still be issues with DoS, interaction with other code, unsafe code, etc. I'm just saying that code that could overflow 32 bits given the right input seems to be far from uncommon in practice.


Option 1 (no type named int) all the way. It is certainly possible that new conventions will spring up in its stead, based on documentation, std, stackoverflow.com, or whatever, such as using i32 “most of the time” instead, but at least in this case, the name makes it completely obvious what it means. If someone writes a collection and uses i32 or u32 for it, then it should be blindingly obvious that it’s not going to scale beyond 32-bit sizes. It’s not within our power to prevent every mis-sized integer bug ever, but on the margin, not having int and uint will cause people to think harder and choose the right size more often, and will provide greater clarity even in the remaining cases, and we must count that as a win.

I used to think that having type int = i32; and type uint = u32; might be a good idea, but reading the OP has actually persuaded me otherwise.

Implicit upcasting and/or polymorphic indexing sound like potentially good, but largely orthogonal, ideas.


Right, I've certainly seen that, but I wonder what the impact is. I've seen lots and lots of overflow-related security bugs, but they all seem to boil down to memory safety issues. Those don't seem really relevant to this issue as far as I'm concerned because there are lots of ways bad indexing or bad allocations can happen in C—overflows being just one of them—and Rust has a comprehensive defense via bounds checking. I'd be more curious to know if you have any other examples in which there was a security issue that didn't boil down to memory safety.

I went looking for this a while back and the only security-related overflow exploit I could find that wasn't a memory safety issue was a gold overflow in World of Warcraft :slight_smile:

I’m finding a lot of the arguments in favor of Design 1 persuasive, but personally still mulling it over.

Keep it coming folks. This conversation is awesome.


I’m definitely in favor of something like isize, perhaps broken out a few ways if there are semantically different meanings of it (e.g. reg vs mem). Opposed to people using it unintentionally (which may be an ergo can’t-solve issue).

I’d be delighted, though I don’t know enough to be sure this is possible, if isize was something you got back occasionally from len(), and so have access to the type, but if you want to compare it to your locally defined and typed variables you need to cast it. I’d be supportive of not having isize be supported by Reader and Writer, for example. This is all from my local microcosm of how I use the code, though, and I can easily see it being a horrible mess from a different point of view.

Mostly, I’d like to be able to index vectors with things other than uint, so that I don’t have to put as uint everywhere on my clearly-sized types. Ideally, I won’t have to guess if or how it works on other platforms, at the same time.
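
One way to get this in user code is a wrapper type with its own `Index` impl. The sketch below uses a hypothetical `U32Indexed` wrapper, which is not something std provides. The conversion is fallible, so a target where `u32` does not fit in the index type panics rather than silently truncating.

```rust
use std::convert::TryFrom;
use std::ops::Index;

/// Hypothetical wrapper: a vector indexed by `u32` instead of `usize`.
struct U32Indexed<T>(Vec<T>);

impl<T> Index<u32> for U32Indexed<T> {
    type Output = T;

    fn index(&self, i: u32) -> &T {
        // Lossless where usize is at least 32 bits wide; panics instead
        // of truncating on a narrower target.
        let i = usize::try_from(i).expect("index does not fit in usize");
        &self.0[i]
    }
}

fn main() {
    let v = U32Indexed(vec![10, 20, 30]);
    let i: u32 = 1;
    // No cast at the use site.
    println!("{}", v[i]);
}
```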

As an aside, why should enumerate return an isize? Some enumerators are derived from types that can’t hold so much, but if I do range(0u64, 1u64 << 63).enumerate(), on a 32-bit machine, why should that be 32 bits? I’m not really sure what it should be, of course. :smile:


So, to be clear, I think that #1 is a good, conservative choice. Based on my experience, I think that option #2 would decrease friction and would not present issues in practice over option #1, but lots of users have weighed in stating that in their domain they’re uncomfortable with the possibility of overflows resulting from 32-bit defaults. So count me in for either option.

Along with the question of whether to have int and uint names, there is the parallel one of the i and u suffixes. I think the best option here is also “option 1” - to just remove them. We would have ip/up, ix/ux, is/us, or whatever, suffixes, depending on what int/uint get renamed to.

There have been quite similar gold exploits in other multiplayer games too: Diablo III and Kingdom of Loathing.

There is the DoS angle, although I don’t think it’s a big deal - on the client side it’s not that important, and on the server, if you’re allocating a user-supplied amount of memory, you can be taken down anyway.

Edit: Just remembered this fun one: http://www.bbc.co.uk/news/world-us-canada-23352230


One more point in favor of #1.

I believe that the security arguments against #2 are correct, but as a few people have pointed out, those concerns are reduced by rust's existing safety guarantees.

However, I do not think that only security should be considered here, as all other aspects of "correctness" are also relevant.

So a thought about that... If indexing uses a platform-specific type (say uintp for argument's sake), then when compiling for any platform where uintp is at least as wide as your chosen type (with guidance to choose u32 under most circumstances), the compiler would seamlessly and safely cast your u32s to that arch-specific type. But when compiling for a smaller platform where, for example, u16 is the indexing type, it should require an explicit conversion. (Similarly, it would not downconvert a u64 to u32 on a uintp=u32 platform.)
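
For comparison, the conversion traits in std draw a similar line, though not per platform at compile time as proposed above. `usize: From<u16>` is infallible, while `u32` and `u64` only get `TryFrom`, which checks at run time. The helper names in this sketch are made up.

```rust
use std::convert::TryFrom;

/// u16 -> usize is lossless on every target, so `From` is available.
fn index_from_u16(i: u16) -> usize {
    usize::from(i)
}

/// u32 -> usize would not fit on a 16-bit target, so only the fallible
/// conversion exists.
fn index_from_u32(i: u32) -> Option<usize> {
    usize::try_from(i).ok()
}

/// u64 -> usize can fail on 32-bit targets for large values.
fn index_from_u64(i: u64) -> Option<usize> {
    usize::try_from(i).ok()
}

fn main() {
    println!("{}", index_from_u16(7));
    println!("{:?}", index_from_u32(7));
    // Some(..) on a 64-bit target, None on a 32-bit one.
    println!("{:?}", index_from_u64(u64::MAX));
}
```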

Ramifications of this approach are a high guarantee of safety and correctness when building for any 32-bit or larger platform, and compile-time errors when trying to use that kind of naive approach for small platforms. Verbosity is minimal, and the only downside compared to today is the requirement that somebody choose a specific type, with compile-time errors directing users to make decent choices.

This feels like the best of all worlds to me.

I also support design 1, for the reasons listed above, by @Valloric, @simias, and others. Rust is a language that, in many ways, requires the programmer to think about what makes sense to do within the constraints of real systems. Requiring programmers to think explicitly about integer sizes is consistent with its design.


I agree with this entirely. I said before I'm not against Design 1, but I do think we'll see useless complaints from newcomers about "why is there no int type?". Design 1 does have a really nice property of being conservative and leaving an option for us to move to Design 2 in the future if truly necessary.

I do feel like this approach amounts to telling people to eat their broccoli and exercise every day; sure they should do it, but no amount of finger-wagging will actually get people to do what's best for them. Design 1 says they must think about this right away when learning Rust; I'm afraid it will change people's heuristic from "use an int when you don't care" to "use an i32/i64/u32/u64/whatever when you don't care", with the actual type depending on the person. Unfortunately some of those are less ideal than others.

I'm surprised by people brushing away performance concerns with 64bit integers. Instruction-level perf aside, a 64bit int takes up twice as much cache space as a 32bit int. Now you have a vector of such integers and you'll definitely feel the perf hit outside of microbenchmarks.
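
The footprint half of this claim is easy to check, as in the small sketch below with a made-up helper. It says nothing about how much slower real code gets, only that the same number of elements occupies twice the memory.

```rust
use std::mem::size_of;

/// Bytes of element storage needed for `len` values of type `T`.
fn footprint_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}

fn main() {
    let len = 1_000_000;
    println!("i32: {} bytes", footprint_bytes::<i32>(len));
    println!("i64: {} bytes", footprint_bytes::<i64>(len));
}
```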

My point is, if a person doesn't care about the size of the integer they're going to use, we can't make them care. Attempts to do this will fail. With int = i32, at least we're picking a solid-if-not-perfect choice for them. With nothing for int, newcomers are likely to make a silly choice on their own.

We seem to be dancing around the fact that for the vast majority of integer use-cases, a 32bit signed integer works just fine. Yeah, citation needed, I know. But we recommend int as a default at Google for what is without a doubt the world's largest unified C++ codebase and the world hasn't come crashing down on us yet. The Google style guide has its issues, but this recommendation isn't one of them.

I'll admit I'm not entirely buying my own argument here, but I'm bringing it up because I feel like it should be brought up. Instinctively, I want to tell newcomers "you damn well better think about your integer size! and get off my lawn!" but I'm not convinced this approach maximizes user success, despite my knee-jerk preference for it.

I'll do you one better: here's a commit that fixes an issue like this. I was porting HGE (an open-source game engine written in the '90s) to modern machines since I had an ancient project that used it (it now uses SFML). At various points the engine placed pointers in a 32bit DWORD and while this works on a platform with 32bit pointers, it leads to crashes on a platform with 64bit pointers. It took me ages to track this down. This is just one example.
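
This is not the HGE code, just a minimal sketch of the same class of bug with invented names. An address is modeled as a `u64` so the example behaves identically on every target. Squeezing it into a 32-bit slot drops the high bits, while a checked conversion reports the problem.

```rust
use std::convert::TryFrom;

/// Stores a 64-bit address in a 32-bit slot by truncating, as a C cast
/// to a 32-bit DWORD would.
fn store_in_dword(addr: u64) -> u32 {
    addr as u32
}

/// Same, but refuses addresses that do not fit.
fn store_in_dword_checked(addr: u64) -> Option<u32> {
    u32::try_from(addr).ok()
}

fn main() {
    // An address above the 4 GiB boundary.
    let addr = 0x0000_7FFF_1234_5678u64;
    println!("truncated: {:#x}", store_in_dword(addr));
    println!("checked:   {:?}", store_in_dword_checked(addr));
}
```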


I have gained some sympathy for Design 1 through this discussion, but some things are still troubling me. None of them are particularly dispositive, but I really would appreciate some help working through them.

First off, while the people in this thread certainly don’t find the lack of an int type to be strange, and all have the right kind of experience to think carefully about integer sizes, I think that that may not be representative of the wider community, especially people who will join the Rust community after 1.0.

I’ve taken a straw poll of friends of mine in more HLL circles (but who have written a lot of code in C/C++/Java) who are keeping an eye on Rust. For them, the lack of any int type is indeed pretty weird. They also found the name isize to be weird, but that specific detail is not crucial for this discussion.

I don’t know exactly how to represent this feeling, but I understand it, and I don’t think it’s being addressed that well in this thread. Most people here are very well-versed in the details of this topic, which makes you somewhat different from an everyday user, even one who understands the existence of various integer sizes (but doesn’t want to work at that level of abstraction every time they need an integer).

To be clear, I don’t think anyone disagrees that there are many times when you actually have some idea of the bounds of your integer, and where you want to use a fixed-sized iN. Much of my code uses explicit i32s and i64s, and that doesn’t bother me much at all. The question is more about what happens when you know you want an integer, but haven’t thought about how big it will need to be.

The existence of so many blind casts (to silence type errors) by even experienced developers leads me to believe that this state of mind (“I don’t really know what integer I need, but just make this work”) is more widespread than we might imagine.

Second of all, it seems like the core problem bedeviling us here is the mere existence of an isize type. That means that no matter what we do, if Vec deals in isizes, people will end up writing programs that are different across platforms.

In that light, the key question is how common isizes actually are, and how much they pollute other parts of the program. If isizes are relatively uncommon to be used explicitly (because they can often be inferred when used, for example, via enumerate), maybe having a fixed-sized int type that interacts awkwardly when used with explicit uses of isize isn’t so bad.

Finally, I also think I want to explore the precise problems with type int = i64 a little bit more, despite being initially negative towards it because of the problem of truncation. Nothing is stopping a user from choosing a smaller type if they know that the number they’re using fits into 32-bits, but 64-bits is both big enough for a huge amount of real-world domains and consistent across platforms.

In particular, I’m wondering whether we could improve the truncation problem through improved error messages: when attempting to use an i64 where an isize is expected (like indexing), the user would get an error that says they have the wrong type and explains that i64 is too big for 32-bit systems. It would suggest using an i32 or isize instead, and specifically discourage a cast.
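
As a point of reference for what “discourage a cast” could steer people towards, here is a sketch of indexing with an i64 through a checked conversion instead of `as`. The helper name is invented. A negative or too-large index becomes `None` rather than wrapping around to some other position.

```rust
use std::convert::TryFrom;

/// Looks up `v[i]` for a 64-bit signed index without a blind cast.
/// Returns `None` if `i` is negative, does not fit in the target's
/// index type, or is out of bounds.
fn get_i64<T>(v: &[T], i: i64) -> Option<&T> {
    usize::try_from(i).ok().and_then(|i| v.get(i))
}

fn main() {
    let v = [10, 20, 30];
    println!("{:?}", get_i64(&v, 1));
    println!("{:?}", get_i64(&v, -1));
    println!("{:?}", get_i64(&v, i64::MAX));
}
```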
