There is no implicit knowledge in working with integers. Neither a newcomer to the language nor a systems expert needs to interrogate anything other than the type to know what size an int is, and that, I think, is a win for systems programming and for interfacing with other code.
There is a reduction in the explicit knowledge necessary for working with anything that would be serialized, with indexes, and with pointers. It will be definitely known, a priori, that structs containing intptr cannot be round-tripped between different machines. At the same time, working with pointer and index sizes in an unsafe way would have to go through another type, an intptr or intp or the like[1].
Finally, Design #1 makes it easiest to add lints and warnings to reduce code-smells on non-portable code. I think working with intptr and uintptr outside an unsafe block should be an error or a warning.
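For concreteness, here is a tiny Rust sketch of why a struct holding a pointer-sized integer can't be round-tripped across machines (using `usize`, modern Rust's name for that type; `Header` is an invented example type):

```rust
use std::mem::size_of;

// A struct containing a pointer-sized integer. Its layout depends on the
// target: the `len` field is 8 bytes on a 64-bit machine, 4 on a 32-bit one,
// so bytes written on one machine cannot be read back on the other.
struct Header {
    len: usize,
}

// Report this struct's size on the current target.
fn header_size() -> usize {
    size_of::<Header>()
}
```

Since `usize` is defined to match pointer width, `header_size()` always equals `size_of::<*const u8>()` on the build target, which is exactly what makes the serialized layout non-portable.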
To address @wycats issue with isize, I think the standard container libraries should pick a safe, easy maximum index size, or perhaps be generics instantiated with a safe default. i.e.: Vec<Foo> is actually just a name for GenericVec<u32, Foo>. In practice, none of the std containers will scale well above billions of elements, and the people who work with code that regularly has such requirements will implement their own domain-specific containers anyway.
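A rough sketch of what that generic-container idea could look like; `GenericVec`, `DefaultVec`, and the `Idx` parameter are invented names for illustration, not actual std API:

```rust
use std::marker::PhantomData;

// Hypothetical container parameterized over its index type.
struct GenericVec<Idx, T> {
    items: Vec<T>,
    _marker: PhantomData<Idx>, // only the index *type* varies, no runtime data
}

// Instantiation with a safe, easy default index width: u32.
impl<T> GenericVec<u32, T> {
    fn new() -> Self {
        GenericVec { items: Vec::new(), _marker: PhantomData }
    }
    fn push(&mut self, v: T) {
        self.items.push(v);
    }
    fn get(&self, i: u32) -> Option<&T> {
        // u32 always fits in usize on 32- and 64-bit targets
        self.items.get(i as usize)
    }
}

// The proposal: "Vec<Foo>" would just name the u32-indexed instantiation.
type DefaultVec<T> = GenericVec<u32, T>;
```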
[1] I'm inclined to say that the longer and more unsavory the name, the better. If intp ends up being the pointer-sized int, it might just end up replacing int in usage.
But if compiling to a smaller platform where, for example, u16 is the indexing type, then it should require an explicit conversion. (Similarly, it would not down-convert a u64 to u32 on a uintp=u32 platform.)
I see where you're coming from, but that's going to be a pain point, I'm afraid. Somebody writes a library on a 64-bit machine and puts it on crates.io. You want to use it for your Rust pet project on x86... but it won't build because it lacks the proper casts. Sadness ensues.
As far as I know that would be the first major portability gotcha in safe rust code.
That said, I agree that having to cast to usize for indexing is annoying. Maybe we could just allow any unsigned type as an index? Even a u64 on a 32-bit architecture? It might cause some broken code, but you won't be able to create a collection whose size exceeds a 32-bit integer on a 32-bit architecture anyway, so what's the worst that can happen? At any rate it would be an overflow issue that would trigger an assertion in a debug build.
That seems like a better compromise to me. Another possibility I've seen mentioned elsewhere would be to just make collection sizes u32s on all platforms. I don't really like that solution, but it has some merits.
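As a rough sketch of what "any unsigned type as an index" could look like from user code (`WideIndexVec` is a hypothetical wrapper, not a std type):

```rust
use std::ops::Index;

// Hypothetical wrapper that accepts u64 indices even where usize is narrower,
// trapping on truncation instead of silently wrapping.
struct WideIndexVec<T>(Vec<T>);

impl<T> Index<u64> for WideIndexVec<T> {
    type Output = T;
    fn index(&self, i: u64) -> &T {
        // On a 32-bit target a collection can't exceed u32::MAX elements
        // anyway, so an out-of-range u64 is necessarily out of bounds;
        // assert in debug builds, as the post suggests.
        debug_assert!(i <= usize::MAX as u64, "index overflows usize");
        &self.0[i as usize]
    }
}
```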
Hey, as a beginner programmer who barely even knows a lot of C, I'd like to give my perspective, since wycats mentioned that only experts seem to be discussing the issue. I'd find Design 1 much easier to understand than Design 2 or Design 3. After reading the introductory Rust guide, I came away with the implicit conclusion that int was a good default option, even though after reading this debate I understand that it's not correct. So I definitely like the idea of renaming int to something that makes sure I know it's not for general usage. As for leaving int as an alias for i32, I'd find that strange too, and my first instinct would be to wonder whether I should be trying to use i32 as much as possible for some reason. Perhaps this is not the case for people who are already very experienced in C-like languages, but surely the perspective of someone trying to understand Rust without any serious preconceptions should be taken into account as well?
@Valloric, @wycats My argument is that even if people write i32, or intp, or whichever type, in every case where they would otherwise have written int, that is still an improvement. But they likely won't, not 100% of the time, which is another improvement.
I should drag out this Dijkstra quote again:
... when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.
Hypothetically, imagine you swapped in iunk for int, with the unk suffix implying unknown width. Then you put in imem or ireg or isize with the current architecture-dependent int meaning.
I have a hard time imagining the HLL folks saying "I'd like to use the iunk type, please". I know this isn't what you asked them, but in some sense it is what the compiler is asking them. It feels to me like one of the important experiences with Rust is the compiler saying "let's be clear, now". Writing code that just runs and does a thing, but who really knows what it does cause lol ints... it just feels like a non-goal of Rust. Caveat: I'm not in charge of Rust goals, nor of things that might sink adoption.
If you made the hypothetical change above and then (hypothetically, still) cleaned up rustc and such to use these new types, it seems like you'd get a much clearer picture of where iunk is important versus where folks want the more meaningful isize sort of type. That sounds like a lot of work, but even if one sticks with int, it would (probably?) make sense to have more specific identifiers like isize for clearer semantic reasons. It would then be a goal to have as few iunks as possible, for portable code. My sense is that most remaining uses of int/iunk would be "geez, I don't really know how big this could be..." rather than "this should be machine-sized".
The question about having data structures be polymorphic with respect to int sizes is a great one. I think the right thing is to make people acknowledge that the widths may vary, and have them use isize. It is basically the same as now, I think, except that it is clearer that the type is not "just any old int". It isn't shameful or anything, just a lot clearer. Having both int and isize would mean there is an awkward transition between using the more shameful "I'm not sure what width is about" int and the less shameful "ah, got it now" isize.
In a world without int, people need to decide what integer type they normally use when they need an integer. In my personal coding style, because I have repeatedly been bitten by overflows, I choose i64 unless I have reason to feel confident that what I'm doing fits into i32. I think this is a reasonable thing to tell new Rust developers.
What this means is that the first time they encounter an isize (which is probably the first time they try to index a Vec), they will get an error that tells them that their integer is too big, and that they should think about using a smaller integer.
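For concreteness, this is roughly what that first encounter looks like in today's Rust, where indexing takes usize (the `nth` helper is just for illustration):

```rust
// Indexing a Vec with an i64 does not compile, because the Index trait
// is only implemented for usize -- the new user hits this immediately.
fn nth(v: &Vec<i64>, i: i64) -> i64 {
    // v[i]          // error: the trait `Index<i64>` is not implemented
    v[i as usize]    // explicit cast required (and `as` truncates silently!)
}
```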
In practice, I don't know what this makes the rule for which integer to use. It seems like the only options are somewhat involved heuristics, which is a lot to ask every time someone wants to use an integer in Rust. I know people feel like that's the price you pay for writing Rust, but it just feels like a lot to ask.
It almost feels worth it to say that the built-in data structures are limited in size to 32-bits just to avoid this int/isize interoperability problem.
But this is exactly what you want to happen. If somebody only built and tested their code on a 64 bit platform, and made choices that are known to not be safe, then I'd much rather the code fail to compile rather than compile and then fail at runtime.
Additionally, infrastructure could do a whole lot to address this. Travis CI could be leveraged to do builds on all target platforms, and its output could be used automatically by cargo and crates.io to provide very helpful error messages about why a dependency fails, and to display a useful red/green list of architectures on the crates.io site.
Everything about this approach seems appealing to me. It would greatly encourage pull requests to fix architecture-dependent problems by exposing those issues and making them first-class artifacts of the dependency infrastructure.
@carllerche: File-backed buffers != Vec, and they are certainly not representative of how any of the collections currently work; it would be perfectly reasonable for a file-backed buffer to use u64 (or even u128, given ZFS' mantra). Are you going to propose that 32-bit systems should be limited to 32-bit or 64-bit backing stores?
If the goal is for std to have a Vec that is perfectly scalable between the most massive use cases and the smallest, I think the collections library will have other problems than deciding what integer type to use. Let alone all the other collections in the library.
I personally write mostly software that runs in a controlled (Linux 64-bit) environment. I'm guessing a lot of people are in this boat. The fact is that in this case it is just not worth defaulting to 32-bit numbers. I (and many other people that I have worked with or talked to) use 64-bit numbers unless there is a damn good reason not to. I personally have had to debug overflow bugs that corrupted customer data in the past (due to using 32 bits instead of 64), and it is not fun. Most of the time it goes unnoticed until much later, and by then a whole bunch of dependent data is corrupted. Again, it's just not worth risking this.
This means that the type that I (and many others) will default to in my code will be i64. Now imagine if all libs, like collections and whatnot, take isize. This means that I am going to have to cast each time I call into these libs, since i64 does not coerce to isize. If I am writing code correctly, whenever I downcast I am going to have to check for an overflow. This is a very, very big decrease in usability. Odds are, most people are going to hard-cast without a check and expose themselves to the bugs anyway.
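A minimal sketch of the overflow-checked downcast being described, using `usize::try_from` from today's standard library (standing in for whatever pointer-sized type the API takes):

```rust
use std::convert::TryFrom;

// Convert an i64 index to the pointer-sized type, surfacing overflow
// (or a negative value) instead of silently truncating with `as`.
fn checked_index(i: i64) -> Option<usize> {
    usize::try_from(i).ok()
}
```

This is the boilerplate the post objects to: every call site either carries a check like this or hard-casts and hopes.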
As such, I am arguing in favor of the status quo (keeping int / uint as pointer sized). Anything else is going to be a significant decrease in ergonomics for me.
Design 1 is closest to optimal for me, and closest to the popular RFC. It removes any confusion about int because every type is explicitly named for what it is, and there are no duplicates. It may seem odd to beginners, but "seeming odd" is not actually a problem. An initial oddity is far better than a perpetual fog of aliased types with less-clear names.
Further, Designs 2 and 3 are both backwards-compatible changes to Design 1. If it turns out that not having an int type is some kind of deal-breaker, it's trivial to add a type int = i32, or i64, or isize, or whatever people decide later. It's really a separate discussion.
However, Design 3 also introduces machine-dependent behavior into the built-in types, which would be a disaster.
It seems like you would be ok with making int an alias for i64.
What this would mean in practice:
- On 64-bit systems (the systems you target), you get reasonable performance with the default integer.
- If you care about 32-bit systems, you can keep a reasonable, overflow-resistant default choice that is slow on 32-bit systems, or you can choose a smaller integer if you feel confident that what you're doing will fit into it.
- If you were targeting 32-bit systems, we could provide good error messages around truncation that would help you decide whether you could live with a 32-bit integer when interfacing with an API that takes isize.
- We could potentially have a mode that allowed implicit coercion between isize and i64 for applications that only target 64-bit systems.
Can someone mount a strong argument against this approach?
Performance. People coming from C or C++ will use int because that's what they're used to and it will be slower than the int-using code in C or C++. They'll decry Rust being slower than C and C++. I'll have to tell C++ devs I'm teaching Rust to avoid int because it's a perf gotcha.
I also don't buy that i64 is "overflow-resistant." I have plenty of code that uses 128-bit ints because 64-bit ints are too small.
Since there seems to be lots of disagreement over what int should be aliased to in Rust if it is to be aliased at all, I'm now more in favor of Design 1. Let's not have the debate; int shouldn't exist.
Another way to say it is: if you don't care, you get something that has reasonable overflow-resistance, is portable, is reasonably performant on 64-bit, and is slow on 32-bit.
If you want to put the effort into making it fast on 32-bit, you have to think about whether your integers actually fit into 32-bit space.
I think it's pretty clear that overflows on 32-bit are much more common than overflows on 64-bit. All of the overflows I've hit in the real-world were eliminated by moving to 64-bit.
Am I the only one confused by your continued references to isize being the parameter taken by vecs, et al.?
I don't understand why Vec can't be indexed by something standard like u32 without problem. I certainly don't see a reason for Vec to be indexed by i32 or any signed type, and I don't know why you would want an isize type to infect the standard library everywhere. In fact, it seems to me that it would be most reasonable for isize to be relegated purely to code that munges pointers, and verboten outside of unsafe.
Let's look at the types:
- Vec doesn't scale past billions of elements, so cap that at u32, or, if people get really antsy about that, u64. Or make Vec generic in its index parameter and support any fixed-size integer.
- RingBuf, DList, HashMap: doesn't matter.
- VecMap definitely doesn't scale to large keys; it even says so on the box.
- Set, BitVSet, et al. don't matter either.
- BitVec is the only one where a u64 or larger index parameter makes sense.
So why does isize come into play with collections code? I feel like it should be called intptr and only be used in unsafe code. All intrinsic types outside of unsafe should have a machine-independent, explicit size.
The flip-side of "I have to teach people to avoid int because perf" is "I have to teach people to prefer i64 because overflows". At my company, we always prefer i64 as a default integer, because choosing i32 without carefully verifying that the number we're using fits into 32-bits has caused serious production problems.
We think of 32-bit integers like 8- or 16-bit integers: great if you can use them, but be damn sure you can.