There is no implicit knowledge in working with integers. Neither a newcomer to the language nor a systems expert needs to interrogate anything other than the type to know what size an int is, and that, I think, is a win for systems programming and for interfacing with other code.
There is a reduction in the explicit knowledge necessary for working with anything that would be serialized, with indexes, and with pointers. It will be definitely known, a priori, that structs containing intptr cannot be round-tripped between different machines. At the same time, working with pointer and index sizes in an unsafe way would have to go through another type, an intptr or intp or the like[1].
Finally, Design #1 makes it easiest to add lints and warnings to reduce code-smells on non-portable code. I think working with intptr and uintptr outside an unsafe block should be an error or a warning.
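For concreteness, here is a tiny Rust sketch of why a struct holding a pointer-sized integer can't be round-tripped across machines (using `usize`, modern Rust's name for that type; `Header` is an invented example type):

```rust
use std::mem::size_of;

// A struct containing a pointer-sized integer. Its layout depends on the
// target: the `len` field is 8 bytes on a 64-bit machine, 4 on a 32-bit one,
// so bytes written on one machine cannot be read back on the other.
struct Header {
    len: usize,
}

// Report this struct's size on the current target.
fn header_size() -> usize {
    size_of::<Header>()
}
```

Since `usize` is defined to match pointer width, `header_size()` always equals `size_of::<*const u8>()` on the build target, which is exactly what makes the serialized layout non-portable.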
To address @wycats issue with isize, I think the standard container libraries should pick a safe, easy maximum index size, or perhaps be generics instantiated with a safe default. i.e.: Vec<Foo> is actually just a name for GenericVec<u32, Foo>. In practice, none of the std containers will scale well above billions of elements, and the people who work with code that regularly has such requirements will implement their own domain-specific containers anyway.
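A rough sketch of what that generic-container idea could look like; `GenericVec`, `DefaultVec`, and the `Idx` parameter are invented names for illustration, not actual std API:

```rust
use std::marker::PhantomData;

// Hypothetical container parameterized over its index type.
struct GenericVec<Idx, T> {
    items: Vec<T>,
    _marker: PhantomData<Idx>, // only the index *type* varies, no runtime data
}

// Instantiation with a safe, easy default index width: u32.
impl<T> GenericVec<u32, T> {
    fn new() -> Self {
        GenericVec { items: Vec::new(), _marker: PhantomData }
    }
    fn push(&mut self, v: T) {
        self.items.push(v);
    }
    fn get(&self, i: u32) -> Option<&T> {
        // u32 always fits in usize on 32- and 64-bit targets
        self.items.get(i as usize)
    }
}

// The proposal: "Vec<Foo>" would just name the u32-indexed instantiation.
type DefaultVec<T> = GenericVec<u32, T>;
```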
[1] I'm inclined to say that the longer and more unsavory the name, the better. If intp ends up being the pointer-sized int, it might just end up replacing int in usage.
But if compiling to a smaller platform where, for example, u16 is the indexing type, then it should require an explicit conversion. (Similarly, it would not down-convert a u64 to u32 on a uintp=u32 platform.)
I see where you're coming from, but that's going to be a pain point, I'm afraid. Somebody writes a library on a 64-bit machine and puts it on crates.io. You want to use it for your Rust pet project on x86... but it won't build because it lacks the proper casts. Sadness ensues.
As far as I know that would be the first major portability gotcha in safe rust code.
That said, I agree that having to cast to usize for indexing is annoying. Maybe we could just allow any unsigned type as an index? Even a u64 on a 32-bit architecture? It might cause some broken code, but you won't be able to create a collection whose size exceeds a 32-bit integer on a 32-bit architecture anyway, so what's the worst that can happen? At any rate it would be an overflow issue that would trigger an assertion in a debug build.
That seems like a better compromise to me. Another possibility I've seen mentioned elsewhere would be to just make collection sizes u32s on all platforms. I don't really like that solution, but it has some merits.
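As a rough sketch of what "any unsigned type as an index" could look like from user code (`WideIndexVec` is a hypothetical wrapper, not a std type):

```rust
use std::ops::Index;

// Hypothetical wrapper that accepts u64 indices even where usize is narrower,
// trapping on truncation instead of silently wrapping.
struct WideIndexVec<T>(Vec<T>);

impl<T> Index<u64> for WideIndexVec<T> {
    type Output = T;
    fn index(&self, i: u64) -> &T {
        // On a 32-bit target a collection can't exceed u32::MAX elements
        // anyway, so an out-of-range u64 is necessarily out of bounds;
        // assert in debug builds, as the post suggests.
        debug_assert!(i <= usize::MAX as u64, "index overflows usize");
        &self.0[i as usize]
    }
}
```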
Hey, as a beginner programmer who barely even knows a lot of C, I'd like to give my perspective, since wycats mentioned that only experts seem to be discussing the issue. I'd find Design 1 much easier to understand than Design 2 or Design 3. After reading the introductory Rust guide, I came away with the implicit conclusion that int was a good default option, even though after reading this debate I understand that it's not correct. So I definitely like the idea of renaming int to something that makes sure I know it's not for general usage. As for leaving int as an alias for i32, I'd find that strange too, and my first instinct would be to wonder whether I should be trying to use i32 as much as possible for some reason. Perhaps this is not the case for people who are already very experienced in C-like languages, but surely the perspective of someone trying to understand Rust without any serious preconceptions should be taken into account as well?
@Valloric, @wycats My argument is that even if people write i32, or intp, or whichever type, in every case where they would otherwise have written int, that is still an improvement. But they likely won't, not 100% of the time, which is another improvement.
I should drag out this Dijkstra quote again:
... when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.
Hypothetically, imagine you swapped in iunk for int, with the unk suffix implying unknown width. Then you put in imem or ireg or isize with the current architecture-dependent int meaning.
I have a hard time imagining the HLL folks saying "I'd like to use the iunk type, please". I know this isn't what you asked them, but in some sense it is what the compiler is asking them. It feels to me like one of the important experiences with Rust is the compiler saying "let's be clear, now". Writing code that just runs and does a thing, but who really knows what it does cause lol ints... it just feels like a non-goal of Rust. Caveat: I'm not in charge of Rust goals, nor of things that might sink adoption.
If you made the hypothetical change above and then (hypothetically, still) cleaned up rustc and such to use these new types, it seems like you'd get a much clearer picture of where iunk is important versus where folks want the more meaningful isize sort of type. That sounds like a lot of work, but even if one sticks with int, it would (probably?) make sense to have more specific identifiers like isize for clearer semantic reasons. It would then be a goal to have as few iunks as possible, for portable code. My sense is that most remaining uses of int/iunk would be "geez, I don't really know how big this could be..." rather than "this should be machine-sized".
The question about having data structures be polymorphic with respect to int sizes is a great one. I think the right thing is to make people acknowledge that the widths may vary, and have them use isize. It is basically the same as now, I think, except that it is clearer that the type is not "just any old int". It isn't shameful or anything, just a lot clearer. Having both int and isize would mean there is an awkward transition between using the more shameful "I'm not sure what width is about" int and the less shameful "ah, got it now" isize.
In a world without int, people need to decide what integer type they normally use when they need an integer. In my personal coding style, because I have repeatedly been bitten by overflows, I choose i64 unless I have reason to feel confident that what I'm doing fits into i32. I think this is a reasonable thing to tell new Rust developers.
What this means is that the first time they encounter an isize (which is probably the first time they try to index a Vec), they will get an error that tells them that their integer is too big, and that they should think about using a smaller integer.
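For concreteness, this is roughly what that first encounter looks like in today's Rust, where indexing takes usize (the `nth` helper is just for illustration):

```rust
// Indexing a Vec with an i64 does not compile, because the Index trait
// is only implemented for usize -- the new user hits this immediately.
fn nth(v: &Vec<i64>, i: i64) -> i64 {
    // v[i]          // error: the trait `Index<i64>` is not implemented
    v[i as usize]    // explicit cast required (and `as` truncates silently!)
}
```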
In practice, I don't know what this makes the rule for which integer to use. It seems like the only options are somewhat involved heuristics, which is a lot to ask every time someone wants to use an integer in Rust. I know people feel like that's the price you pay for writing Rust, but it just feels like a lot to ask.
It almost feels worth it to say that the built-in data structures are limited in size to 32-bits just to avoid this int/isize interoperability problem.
But this is exactly what you want to happen. If somebody only built and tested their code on a 64 bit platform, and made choices that are known to not be safe, then I'd much rather the code fail to compile rather than compile and then fail at runtime.
Additionally, infrastructure could do a whole lot to address this. Travis CI could be leveraged to do builds on all target platforms, and its output could be used automatically by cargo and crates.io to provide very helpful error messages about why a dependency fails, and to display a useful red/green list of architectures on the crates.io site.
Everything about this approach seems appealing to me. It would greatly encourage pull requests to fix architecture-dependent problems by exposing those issues and making them first-class artifacts of the dependency infrastructure.
@carllerche: File-backed buffers != Vec, and they are certainly not representative of how any of the collections currently work; it would be perfectly reasonable for a file-backed buffer to use u64 (or even u128, given ZFS' mantra). Are you going to propose that 32-bit systems should be limited to 32-bit or 64-bit backing stores?
If the goal is for std to have a Vec that is perfectly scalable between the most massive use cases and the smallest, I think the collections library will have other problems than deciding what integer type to use. Let alone all the other collections in the library.
I personally write mostly software that runs in a controlled (Linux 64-bit) environment. I'm guessing a lot of people are in this boat. The fact is that in this case it is just not worth defaulting to 32-bit numbers. I (and many other people that I have worked with or talked to) use 64-bit numbers unless there is a damn good reason not to. I personally have had to debug overflow bugs that corrupted customer data in the past (due to using 32 bits instead of 64), and it is not fun. Most of the time it goes unnoticed until much later, and by then a whole bunch of dependent data is corrupted. Again, it's just not worth risking this.
This means that the type that I (and many others) will default to in my code will be i64. Now imagine if all libs, like collections and whatnot, take isize. This means that I am going to have to cast each time I call into these libs, since i64 does not coerce to isize. If I am writing code correctly, whenever I downcast I am going to have to check for an overflow. This is a very, very big decrease in usability. Odds are, most people are going to hard-cast without a check and expose themselves to the bugs anyway.
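A minimal sketch of the overflow-checked downcast being described, using `usize::try_from` from today's standard library (standing in for whatever pointer-sized type the API takes):

```rust
use std::convert::TryFrom;

// Convert an i64 index to the pointer-sized type, surfacing overflow
// (or a negative value) instead of silently truncating with `as`.
fn checked_index(i: i64) -> Option<usize> {
    usize::try_from(i).ok()
}
```

This is the boilerplate the post objects to: every call site either carries a check like this or hard-casts and hopes.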
As such, I am arguing in favor of the status quo (keeping int / uint as pointer sized). Anything else is going to be a significant decrease in ergonomics for me.
Design 1 is closest to optimal for me, and closest to the popular RFC. It removes any confusion about int because every type is explicitly named for what it is, and there are no duplicates. It may seem odd to beginners, but "seeming odd" is not actually a problem. An initial oddity is far better than a perpetual fog of aliased types with less-clear names.
Further, Designs 2 and 3 are both backwards-compatible changes to Design 1. If it turns out that not having an int type is some kind of deal-breaker, it's trivial to add a type int = i32, or i64, or isize, or whatever people decide later. It's really a separate discussion.
However, Design 3 also introduces machine-dependent behavior into the built-in types, which would be a disaster.
It seems like you would be ok with making int an alias for i64.
What this would mean in practice:
- On 64-bit systems (the systems you target), you get reasonable performance with the default integer.
- If you care about 32-bit systems, you can keep a reasonable, overflow-resistant default choice that is slow on 32-bit systems, or you can choose a smaller integer if you feel confident that what you're doing will fit into it.
- If you were targeting 32-bit systems, we could provide good error messages around truncation that would help you decide whether you could live with a 32-bit integer when interfacing with an API that takes isize.
- We could potentially have a mode that allowed implicit coercion between isize and i64 for applications that only target 64-bit systems.
Can someone mount a strong argument against this approach?
Performance. People coming from C or C++ will use int because that's what they're used to and it will be slower than the int-using code in C or C++. They'll decry Rust being slower than C and C++. I'll have to tell C++ devs I'm teaching Rust to avoid int because it's a perf gotcha.
I also don't buy that i64 is "overflow-resistant." I have plenty of code that uses 128-bit ints because 64-bit ints are too small.
Since there seems to be lots of disagreement over what int should be aliased to in Rust if it is to be aliased at all, I'm now more in favor of Design 1. Let's not have the debate; int shouldn't exist.
Another way to say it is: if you don't care, you get something that has reasonable overflow-resistance, is portable, is reasonably performant on 64-bit, and is slow on 32-bit.
If you want to put the effort into making it fast on 32-bit, you have to think about whether your integers actually fit into 32-bit space.
I think it's pretty clear that overflows on 32-bit are much more common than overflows on 64-bit. All of the overflows I've hit in the real-world were eliminated by moving to 64-bit.
Am I the only one confused by your continued references to isize being the parameter taken by vecs, et al.?
I don't understand why Vec can't be indexed by something standard like u32 without problem. I certainly don't see a reason for Vec to be indexed by i32 or any signed type, and I don't know why you would want an isize type to infect the standard library everywhere. In fact, it seems to me that it would be most reasonable for isize to be relegated purely to code that munges pointers, and verboten outside of unsafe.
Let's look at the types:
- Vec doesn't scale past billions of elements, so cap that at u32, or, if people get really antsy about that, u64. Or make Vec generic in its index parameter and support any fixed-size integer.
- RingBuf, DList, HashMap: doesn't matter.
- VecMap definitely doesn't scale to large keys; it even says so on the box.
- Set, BitVSet, et al. don't matter either.
- BitVec is the only one where a u64 or larger index parameter makes sense.
So why does isize come into play with collections code? I feel like it should be called intptr and only be used in unsafe code. All intrinsic types outside of unsafe should have a machine-independent, explicit size.
The flip-side of "I have to teach people to avoid int because perf" is "I have to teach people to prefer i64 because overflows". At my company, we always prefer i64 as a default integer, because choosing i32 without carefully verifying that the number we're using fits into 32-bits has caused serious production problems.
We think of 32-bit integers like 8- or 16-bit integers: great if you can use them, but be damn sure you can.