Restarting the `int/uint` Discussion

The shuttle didn’t crash because of the overflow; the integer in question wasn’t even used. It crashed because of the runtime checking. When you start doing overflow checks at runtime, you’re introducing more bugs than you fix.

I tried to follow the discussion, but I ended up a bit confused:

In the case of no int/uint platform-sized types, what would I use to access a Vec/slice? What would its length return? It seems that on 64-bit systems, 32-bit types would not be able to provide access to the full memory space, while the inverse means that an index might be invalid for that platform.

If types were parameterized on the index they use, would that mean I’d have to parameterize every &[T]? How do I give a parameter to that? Would it need to be changed to &Slice<T>? I’ve seen default type parameters mentioned. What would be the default? Wouldn’t that default have the same issues as a platform sized type?

Also, what if I explicitly want to make a library that adapts (as in “uses”) to the user’s platform’s memory space? How much special-casing would I have to do to take a slice and an index into the slice as function arguments?
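As a hypothetical sketch of what parameterized indexing could look like in today’s Rust (the function name is invented, not an existing API; `usize` stands in for whatever the platform-sized type ends up being called):

```rust
// Hypothetical sketch: a function taking a slice plus an index of any
// integer type that converts losslessly into the platform-sized usize.
// `get_at` is an invented name, not a standard API.
fn get_at<T, I: Into<usize>>(slice: &[T], index: I) -> &T {
    &slice[index.into()]
}

fn main() {
    let data = [10, 20, 30];
    let i: u16 = 1; // a narrow index type still works on any target
    println!("{}", get_at(&data, i));
}
```

The bound rules out conversions that could fail on some targets, which is one answer to the “index might be invalid for that platform” worry above.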

Sure, you could turn off overflow checking in release builds, but if the value WAS used it would be wrong anyway. So historically there have been huge problems with overflows, even if anecdotally they don’t happen in some codebases.

It would be changed to some weirder name like ix/ux or something, but int would disappear, because most people expect int to be 32 bits, not pointer-sized.

All of these proposals leave a (strawman) isize type behind with all the current int semantics. This is simply a discussion of whether there should be a type called int, and if so, what it should be.


Excellent. That’s the part I was missing. That does put my mind at ease a bit.

I can’t write a long comment with arguments right now but, I do want to say that I’m in favor of design #1. Let’s not have aliases.

I don’t think people pick int because of its name. I think they pick it because it is used in all documentation and Rust code as a default. If we were more conscious about picking a more appropriate type in the documentation (especially the guide) and the standard library with all its examples, I think people would use int less.

I think overflow resistance is a very weak argument. The only difference between i32 and i64 in that regard is that the latter will overflow later. If you want to protect yourself from overflows, choosing a larger fixed size integer is not enough. If malicious user input is involved, any fixed size integer will overflow.
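A small sketch of this point (the counting functions are illustrative, not from any library): under checked arithmetic, a wider type does not prevent overflow, it only postpones it.

```rust
// How many doublings survive before checked multiplication overflows?
// i64 simply hits the wall later than i32; neither is overflow-proof.
fn i32_doublings() -> u32 {
    let (mut a, mut n) = (1i32, 0);
    while let Some(next) = a.checked_mul(2) {
        a = next;
        n += 1;
    }
    n
}

fn i64_doublings() -> u32 {
    let (mut a, mut n) = (1i64, 0);
    while let Some(next) = a.checked_mul(2) {
        a = next;
        n += 1;
    }
    n
}

fn main() {
    println!("i32 overflows after {} doublings, i64 after {}",
             i32_doublings(), i64_doublings());
}
```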

We could have a bigint with copy semantics, like in Python. This is reasonable, because bigints are usually not that big.

If malicious user input is limited to u32 and u16, then u64 will practically never overflow, while a u32 can be overflowed rather quickly.
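To illustrate (a made-up sum over attacker-supplied u32 values, not code from the thread): a u32 accumulator wraps after just a couple of maximal inputs, while a u64 accumulator has room for roughly 2^32 of them.

```rust
// Summing untrusted u32 inputs: the narrow accumulator wraps almost
// immediately, while the widened one stays exact.
fn wrapping_sum_u32(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

fn widening_sum_u64(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| x as u64).sum()
}

fn main() {
    let hostile = [u32::MAX; 3];
    println!("u32 sum: {}", wrapping_sum_u32(&hostile)); // already wrapped
    println!("u64 sum: {}", widening_sum_u64(&hostile)); // still exact
}
```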

The problem isn’t the size but rather the allocation.

I think bigint libraries like GMP are pretty good about minimizing allocations. Anyway, if Clone semantics are too painful to use for bigint, it makes sense to give them Copy semantics.

Alternatively, we could have a second bigint type that has copy semantics, so it could be swapped in for the native types. There could be a warning about it in production builds mentioning the “hidden” allocations. This could serve as the “I don’t want to think about the size” integer that @wycat wants. It would be a much better choice than i64.

I don’t think special-casing a BigInt type to be Copy, like the old Gc was, is likely to be a good idea at all.

Does anyone have significant experience using BigInt types in Rust? Does anyone have data about how many .clone() calls actually need to be added in practice due to the absence of Copy?

I mean, first we should determine how significant the problem is, before we approve heroic measures to resolve it.

We could also attack it from the opposite direction: make cloning less inconvenient, such as by adding a dedicated operator for it (for instance, prefix ^ and @ are free). I know people have wanted this for Rc as well. (If anyone wants to discuss this further, I think it deserves its own thread.)
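For context, this is the ergonomic cost being discussed: today every extra Rc handle needs an explicit clone call, which is exactly what a dedicated operator (the suggested prefix ^ or @, neither of which exists) would abbreviate.

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(vec![1, 2, 3]);
    // Every additional handle requires an explicit clone; this is the
    // call a cloning operator would shorten.
    let _b = Rc::clone(&a);
    println!("handles: {}", Rc::strong_count(&a));
}
```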

In either case, having int be a BigInt doesn’t sound appealing to me, although I haven’t thought about it very hard. There are so many conflicting expectations surrounding the name that it’s probably best to avoid it entirely.

[Clearly Discuss is more effective than RFC pull requests at bringing out discussion.]

When there were technical bookstores, I saw a shelf devoted to C/C++ pitfalls books like Enough Rope to Shoot Yourself in the Foot, not including "Effective C++", "C++ Puzzlers", and style guides. When will languages simplify all that and the language spec down to something programmers will read?

From the Java Language Spec intro:

It is intended to be a production language, not a research language, and so, as C. A. R. Hoare suggested in his classic paper on language design, the design has avoided including new and untested features.

We're wrestling here with bounded integers in a production language for systems and real-time programming across a wide range of address spaces. This is great, but it's cutting-edge new, so the features beg to be tested (and their pitfalls fixed) before v1.0. Are there quick ways to do the experiments and usability tests?

The features include the integer type choices, non-portable types like isize, auto-widening or not, what you can use for indexing, and possible polymorphism with BigInteger.

To the main question, I vote for Design 1: Rename int/uint to isize/usize (or something), for the reasons mentioned here and in the RFCs.

Also remove the i/u integer literal suffixes. 1i looks like a complex number, and it is in Go.

If it's feasible to obviate the pointer-size integers via techniques like array indexing with all integer types, that sounds better, but then what type does len() return?

Sometimes we want to use a "default" integer type when smaller sizes would do, for simpler APIs. It should be the same as the integer type inference fallback (the language's default). A target-dependent type is a bad default due to portability hazards.

Sometimes we want to defer picking an integer type. That's OK. How about a TO-DO attribute with lint checks that complain about using it in binary I/O and released code? If the name int comes back as a synonym for a large integer type with this TO-DO attribute, that could be OK.

Can BigIntegers be made polymorphic with bounded integers? People have mentioned issues with copying and substitutability.

Although they’re problematic for hard real-time code, they’re much like virtual memory: things slow down rather than fail.

The Dart VM has a clever design for its sole integer type. Internally:

  • A value that fits in i31 or i63 (depending on the size of a pointer) is stored inline with a 1-bit tag.
  • A value that fits in i64 requires allocating an i64.
  • A larger value requires allocating a Bigint.

So values up to 1/4 of the number of bytes in the address space require no memory allocation.
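That scheme can be sketched as a Rust enum (with i32 standing in for the tagged i31, since Rust cannot express a 31-bit inline value directly; the names and the Big representation are placeholders, not the Dart VM's actual layout):

```rust
// Dart-style small-integer optimization, sketched with ordinary Rust
// types. The real VM packs the small case into a tagged pointer word.
#[allow(dead_code)]
#[derive(Debug)]
enum Int {
    Inline(i32),    // fits in the tagged word: no allocation
    Heap(Box<i64>), // fits in 64 bits: one allocation
    Big(Vec<u32>),  // anything larger: arbitrary-precision digits
}

fn classify(n: i64) -> Int {
    match i32::try_from(n) {
        Ok(small) => Int::Inline(small),
        Err(_) => Int::Heap(Box::new(n)),
    }
}

fn main() {
    println!("{:?}", classify(42));          // stays inline
    println!("{:?}", classify(1i64 << 40));  // needs an allocation
}
```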

For the record, my preference would be design #2. In my experience, a 32-bit int is a good default choice for almost anything except system-related stuff (number of bytes, count of nanosecond ticks, etc.). My second preference is design #1, provided it includes polymorphic indexing.

I would generally support 1, although I would be open to widening coercions as long as the availability of coercions to/from isize/et al. do not vary with the compilation target.

One related issue that has not been discussed here is the already-accepted RFC that restores the integer literal inference fallback to i32. I like it, but it is arguably as much of a default as the naming of the type.

Another systems-language novice here; here’s my 2 cents from what I believe is at least part of Rust’s intended audience (by design or just happenstance).

TL;DR: My dream scenario would be this: unsuffixed literals and int are i32, but remove the i and u suffixes entirely. Also, implement a pointer-sized integer named something else entirely, using a suffix such as ip, iz, iw, or ix. Personally I think isize is horribly strange, and I can imagine anyone else coming from a dynamic language would say the same… but that’s another topic.

Full story: Rust attracted me because I’m coming from a dynamic-language background (Python), and it is allowing me to get into the systems-level business safely (otherwise I would have just jumped into C/C++). It is also somewhat familiar territory, with type inference, etc. So I strongly believe I won’t be the only non-systems programmer interested in using Rust (which is a great thing!). Having said all this, coming to a systems language I know there will be a learning curve and things I need to start caring about, but saying “Everyone who uses Rust must care about integer widths all the time!” is ridiculous. You’re dangerously close to taking The Moral High Ground, which is super off-putting. And unless Rust is being imposed on you by your job, that’s the point at which you walk away.

Initially, when starting with Rust, people may not even be aware that there are multiple integer widths. They simply know, “I need a number” (as many dynamic-language users do). Not having an int type may be strange to veteran C/C++ (or even Java/C#) developers, but may not be a huge deal to dynamic-language developers. The downside is that those climbing the learning curve will potentially suffer by not having a good default choice while being told that picking the right integer width is a must. Suggestions like disallowing ints in release code will also turn off newcomers. I totally understand trying to help programmers avoid mistakes, but telling them outright what to do is a no-go.

Using a BigInt just to play it safe and cover those huge numbers isn’t plausible in my book. People coming from languages like Python or Ruby are coming to Rust for a reason: better performance, static typing, etc. They will be aware that there are differences in Rust, and learning that ints can only hold up to X safely is one such step on the learning escalator.

Why this is such a huge topic is strange to me. Those who really care about integer widths will know to look for one that suits them. 32-bit widths are almost certainly large enough for most of the “just give me a number and I’ll sort out the size later” scenarios. If someone knows their number is going to be huge in advance, they already know they want a large (64-bit or higher) number.

The real question to me is: what is the fallback for unsuffixed literals? i32, from what I’ve heard. So in my mind, if we keep the i and u suffixes (which I think should go away, since they look like imaginary numbers and can be confusing when we already have unsuffixed literals), they should denote an int type, which in this scenario is the same as unsuffixed literals (i32). Now, that’s a lot of duplication, and it could confuse newcomers who may be using 5i or let x: int = 5; just to try to start doing things the “Rust way.”
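The duplication looks like this in practice (current Rust shown; the i suffix and the int name are no longer accepted, so they appear only in comments):

```rust
fn main() {
    let a = 5;    // unsuffixed: falls back to the default integer type (i32)
    let b = 5i32; // the explicit-width spelling of the same value
    // let c = 5i;     // the `i` suffix under discussion
    // let d: int = 5; // the `int` alias it would mirror
    println!("{} bytes", std::mem::size_of_val(&a));
    assert_eq!(a, b);
}
```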

/rant off

I thought about that as well, but it’s much more limited. It’s basically up to you to use this feature or not; it won’t contaminate external APIs and the like.

Actually, removing or renaming the u and i suffixes makes having a default integer fallback even more important from a usability perspective, because code like for _ in range(0u, 10) today (which, truth be told, does not bother me much) would become for _ in range(0u32, 10) or something like that, which looks so much worse. Even if you use u8 there, it makes the code significantly harder to parse at a glance.