Restarting the `int/uint` Discussion

The problem is not so much the specific collections you are pointing out as the signatures of traits. I think that limiting Index to i32 would be bad.

I can mmap a 100 GB file; should the returned byte slice only allow you to index the first 4 GB? That is not ideal.
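A minimal sketch of that point, purely illustrative: `bytes` stands in for the mapped slice. Slice indexing takes a `usize`, so on a 64-bit target any in-bounds offset works, whereas an `Index` bound to a 32-bit type couldn't even express offsets past roughly 4 GB.

```rust
// Sketch only: `bytes` stands in for the mmap'ed slice.
fn byte_at(bytes: &[u8], offset: usize) -> u8 {
    bytes[offset] // works for any in-bounds offset, even beyond 4 GB
}

fn main() {
    let bytes = vec![0u8; 16];
    // On a 64-bit target, offsets that don't fit in a u32 are representable:
    let big_offset: usize = 5 << 30; // ~5 GB
    assert!(big_offset > u32::MAX as usize);
    println!("{}", byte_at(&bytes, 3));
}
```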

1 Like

I don't think so. If you go around telling people to prefer i32 for perf, and I go around telling people to prefer i64 for overflow-resistance, nobody is going to know what to do :wink:

I'm hopeful that the error message would tell them to think about casting their number to isize, or whatever, which may be smaller; at which point they would actually have to think about it. I might be wrong, but my experience with Rust hasn't been "start with easy but ill-defined stuff, and opt in to more detail". They're going to hit borrows and lifetimes five minutes later; this specificity might be a good warm-up (or filter; shame on me for typing that, but if the programmer decides they can't be bothered, Rust might not be for them).

In my fantasy world, most of these cases (comparisons with .len(), indexing, etc.) would be taken care of by enriching comparisons and indexing (is something wrong with defining a u32-to-u64 comparison? I don't know). I don't write types very often when I don't care (inference sorts out what I need, and would likely hand me an isize). This is sort of what I was getting at with the iunk vs isize thought experiment: I can't think of a time when I would write iunk in a struct or function definition; I always seem to know what I plan to use it for, and write the appropriate type down.
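For what it's worth, here is the kind of friction the "enriched comparisons" idea would smooth over; the cast below is what you have to write today when the widths don't match.

```rust
fn main() {
    let v = vec![1, 2, 3];
    let i: u32 = 2;

    // `i < v.len()` does not compile: it mixes u32 and usize.
    // Today you make the width change explicit with a cast:
    if (i as usize) < v.len() {
        println!("{}", v[i as usize]);
    }
}
```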

If the question is, as I sense, "what should the first experience be?" rather than "c'mon, we all like int, right?", you could think of having int in the language, bound to i32/i64 or whatever, but triggering a lint that says "ok, time to put on your grown-up pants" / "portability warning".

I think the argument that performance-sensitive folks will write lots of int everywhere and then be confused is misguided. I can see the issues coming up, but it will be like explaining -O or --release, admittedly with more editing in their future. This is probably annoying for them, but way less so than "I ported my C++ OOP architecture to Rust, but now have all these borrow check errors".

1 Like

Numerous arguments have been made that Rust should be opinionated enough to force you to consciously choose a type.

If the "experiment" of not having an int/uint type ends up being considered too big a hurdle for new adopters, then we can always add it in later.

But if we add it now, and we turn out to be right that people will very quickly adapt to this and that there are significant emergent safety, performance, and correctness benefits from not having it, then we will never know.

So my argument is mostly on the reversibility/irreversibility angle.

But my pedagogical argument is that Rust should guide a programmer toward the best choices, not just the safest ones. And the best choice can't always be assumed to be 64 bits.

2 Likes

u64 has the same performance penalty on 64-bit as it does on 32-bit, because it’s not about the actual integer operations. It’s about cache usage.
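To make the cache point concrete: the same element count occupies twice the bytes, and therefore twice the cache lines, with i64 as with i32, regardless of the target's word size.

```rust
fn main() {
    // Same number of elements, double the memory footprint for i64.
    println!("[i32; 1024]: {} bytes", std::mem::size_of::<[i32; 1024]>()); // 4096
    println!("[i64; 1024]: {} bytes", std::mem::size_of::<[i64; 1024]>()); // 8192
}
```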

Getting rid of int (Design 1) solves the issue of perf vs overflow, because now both choices are explicit. It's not like either is a default in existing code: people who default to 64-bit or 32-bit (conventionally speaking) don't have to change anything.

I can’t tell concretely which option you prefer. Can you elaborate?

I don't think so. If you go around telling people to prefer i32 for perf, and I go around telling people to prefer i64 for overflow-resistance, nobody is going to know what to do

I think Valloric's point is that it may not make sense for there to be a global default, but rather that different shops will use different defaults depending on their needs.

The problem with that line of argument, though, is that there's still a shared ecosystem through crates.io, so it's hard to completely localize these concerns. But maybe it just means that only certain crates are appropriate for your shop.

Making a choice explicit doesn't eliminate the problem; it just means that every time someone uses a number they have to make that tradeoff. I'm suggesting that a good default might be "overflow-resistance", letting people opt into "perf" if they're pretty confident that their numbers can handle it.

What I'm saying is that if you're reaching for a default, "slower but more correct" seems better than "fast but less correct" for Rust. Which argues for i64.

They have to make that tradeoff anyway, explicitly or not. We're effectively talking about creating a global default, because any use of int as a default is currently wrong and accidental. But, from what I can see in this thread, it doesn't look like there is a good global choice here: Google uses 32-bit, you use 64-bit. Leave the default to shop-specific convention.

Crates.io doesn't really have to deal with that problem either, since crates tend to be about specific things (time, collections) where (again, from what I can see here) there is a clear global preference.

Seems better to you, but not better to Valloric; the gotchas that you two are worried about seem complementary. I think Valloric is saying that he acknowledges that you might want to make the tradeoff in the other direction, but that that should be your prerogative: every shop chooses its own default.

Actually, IMO the conclusion is the exact opposite of that. Whatever we alias int to (if anything) will be a tacit recommendation from the language to use that integer size. If the language makes no recommendation, the user needs to determine whether their use case fits in an i32 and, if not, go with an i64.

Telling people that i32 is faster but i64 is safer is great advice that should help them determine what type to choose. Can your use case fit in an i32? Go with that; it's faster. Doesn't fit? Go with an i64.

The decision will be made on a case-by-case basis, without the language trying to recommend something when it can't know the use case.

1 Like

IMO this falsely equates the cost of speed with the cost of correctness.

Pervasive usage of i64 in your program when an i32 would work just fine will lead to a perf cost. Pervasive usage of i32 may lead to an overflow issue.

You're painting the costs as 50-50, but they're not. The perf cost is always there, whereas the correctness bug only might be there, and it's certainly rare enough that neither I nor @pcwalton has ever encountered it.

Advocating int = i64 makes the 99.9% use case a slave to the 0.1% use case (and even those odds are generous to the overflow side).

1 Like

I think allowing Vec to be parameterized by the index type is just fine, given that we have default type params. (Though it's possibly problematic to use u64 on a 32-bit system?)

I'm just not sure that it completely solves the problem, since other APIs may choose to work with Vecs of specific index types and expose that fact, and then you still have to handle various mismatches or potential overflows when a local index type doesn't match. It's hard to quantify how common that would be, though.

(More generally, this would push the ecosystem toward parameterized sizes in general. That's not necessarily a bad thing, and can even be added backwards-compatibly given default type params. It does mean that signatures are somewhat more complex.)
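A rough sketch of what that could look like, purely illustrative: IdxVec and AsIndex are invented names, not an actual proposal or std API. The index type defaults to usize, so signatures that don't care about it look unchanged.

```rust
use std::marker::PhantomData;
use std::ops::Index;

/// Made-up helper trait: anything usable as an index for IdxVec.
trait AsIndex: Copy {
    fn as_index(self) -> usize;
}

impl AsIndex for usize {
    fn as_index(self) -> usize { self }
}

impl AsIndex for u32 {
    fn as_index(self) -> usize { self as usize }
}

/// A Vec wrapper parameterized by its index type, defaulting to usize.
struct IdxVec<T, I: AsIndex = usize> {
    data: Vec<T>,
    _index: PhantomData<I>,
}

impl<T, I: AsIndex> IdxVec<T, I> {
    fn new() -> Self {
        IdxVec { data: Vec::new(), _index: PhantomData }
    }
    fn push(&mut self, value: T) {
        self.data.push(value);
    }
}

impl<T, I: AsIndex> Index<I> for IdxVec<T, I> {
    type Output = T;
    fn index(&self, i: I) -> &T {
        &self.data[i.as_index()]
    }
}

fn main() {
    // A vector that is only ever indexed by u32:
    let mut v: IdxVec<&str, u32> = IdxVec::new();
    v.push("hello");
    println!("{}", v[0u32]);
}
```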

1 Like

The perf cost is not always there. There are many cases in which the perf hit (in the context of the entire system) is so minimal it isn't even worth worrying about.

2 Likes

The trouble with preferring 64-bit over 32-bit for safety is that making overflow less common is not actually much of a win: you are still just as vulnerable to security issues, but more likely to miss the overflow in tests. (I used to argue for 64-bit, but was persuaded by this reasoning.)
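A hedged illustration of that point (the alloc_size functions are invented for the example): with wrapping arithmetic, the same flawed size computation overflows at inputs a test might plausibly hit in the 32-bit version, but only at inputs no test will ever exercise in the 64-bit one, so the bug just survives longer.

```rust
// Illustrative only: the same bug in both, the multiplication can silently wrap.
fn alloc_size_32(count: u32, elem_size: u32) -> u32 {
    count.wrapping_mul(elem_size)
}

fn alloc_size_64(count: u64, elem_size: u64) -> u64 {
    count.wrapping_mul(elem_size)
}

fn main() {
    // 2^20 elements of 2^13 bytes = 2^33 bytes total.
    println!("{}", alloc_size_32(1 << 20, 1 << 13)); // wraps to 0
    println!("{}", alloc_size_64(1 << 20, 1 << 13)); // 8589934592, correct
}
```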

Also, using 64-bit integers on a 32-bit target means emulation, which is super slow and exactly the kind of non-transparent performance cliff we try to avoid in Rust; we should do all we can to make this trap explicit.

7 Likes

I think this is fine: every time you use a number you should worry about its size. If you don't want to do that, you shouldn't be using a low-level language.

4 Likes

Entirely true. Then the user investigates the perf problems with that critical section of code and sees it's using int (which, in the world you're proposing, is 64-bit). If they're a C or C++ dev (that's the Rust target market, let's not forget), I'll bet most such converts will not think of the int at all as a possible perf problem (since it's 32-bit in C and C++). It's a hidden, counterintuitive perf hit, and that's the most damning part. A true "gotcha."

People who care little about performance probably aren't using Rust to begin with. They're using Java, C#, or something else where they get the benefit of memory safety with a GC. AFAIK the point of Rust is safety while being fast. We already have many languages for the safe-but-not-super-fast market.

4 Likes

In general, I'm in favour of option 1 (no int type). I believe this is the option which has had the most consensus in the past, and it makes sense to me: to paraphrase Niko, we are all grown-ups and should accept that there is no free lunch with ints; you have to think about and pick a size. The only problem is picking a name; isize/usize seem the least worst to me. Familiarity to C++ programmers seems like a huge bonus here.

I don't strongly object to aliasing int to i32, but it feels messy: I fear that we will have a future where only beginners use int and all the style guides say not to, exactly like C/C++ today.

I do strongly object to design 3: encouraging people to use a pointer-sized int is in most cases plain wrong, and it teaches the very bad habit that overflow doesn't matter (because reasoning that your values will never exceed the pointer size is much harder than reasoning that they never get larger than 2^32 or 2^64).

BigInt has no place as a default in a systems language. I don't believe 64 bits is meaningfully safer than 32, so design 4 has no benefit, but it does have big cons on <64-bit systems.

4 Likes

My gut feeling here is that parametrising by size will lead to just this kind of size-mismatch hell. I have been bitten in this way enough times by collections parametrised by allocators, etc. that I think it would be a problem. By convention and default parameters we could encourage most uses to use usize, but if that is successful, it kind of defeats the purpose of having the flexibility in the first place.