Restarting the `int/uint` Discussion

It seems to me that the core problem here is this: maximum performance for CPU-bound programs is not the only use case for Rust. As a result, I see a large swath of cases where "I need an integer, but I'm not sure what size yet" is a reasonable state of mind.

In contrast, a number of people here have said that that state of mind is essentially incompatible with the goals of Rust. The idea is that if you're using Rust, you should always be ready and willing to think carefully about whether you really need those 64 bits, because using them unnecessarily isn't free. But when writing IO-bound programs, especially early in a program's lifetime, taking the time to carefully consider the size of every integer is just not worth it in my opinion.

I think that's what this argument is about. I don't like Design 1, because it's a de facto rejection of this state of mind, which I find myself in regularly.

If someone is writing an IO-bound program and doesn't yet know how big their integers will be, Design 1 throws them directly into the middle of this argument, for a small benefit. On the flip side, if someone is writing a CPU-bound program, the answer ("you should think carefully about the size of your integers") is pretty easy to arrive at no matter what we do here.

As someone who writes both kinds of programs, I find myself using smaller, fixed-sized integers in CPU-bound code, even though the current state has an int type.

This is why I (elsewhere) suggested providing int (or whatever name, I don't care) specifically as a type intended for stubbing out code. Shipping with it should be considered a code smell, or possibly even linted against when building in release. This wasn't well-received, though.

You’re saying that int should be a straight alias for i64. If that’s the case, it explicitly isn’t a don’t-care option. It’s 64 bits, and you’re making a decision to use 64 bits every time you use it, just using a less explicit name and introducing the overhead of having to choose one name or the other. The fact that you didn’t think hard about it shouldn’t matter; there’s zero reason you would have to think hard about typing ‘i64’ either.
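To make that concrete, here is a minimal sketch (the lowercase `int` alias is hypothetical, used only to illustrate the point) showing that a straight alias carries exactly the same 64-bit decision under a vaguer name:

```rust
// Hypothetical: suppose `int` were nothing but an alias for i64.
#[allow(non_camel_case_types)]
type int = i64;

fn main() {
    let a: int = 40;
    let b: i64 = 2;
    // Same type under two names: same size, and values mix freely.
    assert_eq!(std::mem::size_of::<int>(), std::mem::size_of::<i64>());
    println!("{}", a + b); // prints 42
}
```

Choosing `int` here is still choosing 64 bits; the name just hides the decision.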

A don’t-care option would be a bigint, because that would actually be correct, or something more exotic like a type that threw up big warnings outside of debug compiles.


Except that it seems like there is disagreement about precisely this point, so attempts to figure out a good default for integers without clearly defined bounds will throw somebody into this kind of debate.

I was not convinced upon reading the original post, but the discussion in this thread leads me to think that Design 1 is the best idea. There is no need to have an int type if we already have i32. If it's not 32 bits, it betrays the expectations of many Java programmers. At the same time, the default type for numbers is a separate issue.

At least if the programmer gets an overflow because they used i32 everywhere, they would not blame the language, since they typed the width themselves! If they typed int and it worked on their machine, but then crashed on 32-bit machines, that would be a surprising result to someone who is used to int being 32 bits.
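This "works on my machine" hazard is easy to demonstrate with usize, which is already pointer-sized in Rust; a pointer-sized int would behave the same way (the 70,000 figures are made up for illustration):

```rust
fn main() {
    // 70_000 * 70_000 = 4_900_000_000: fits in 64 bits, overflows 32.
    let rows: usize = 70_000;
    let cols: usize = 70_000;
    match rows.checked_mul(cols) {
        // Taken on a 64-bit machine: the developer sees no problem.
        Some(total) => println!("total cells: {total}"),
        // Taken on a 32-bit machine: the very same code overflows.
        None => println!("overflow on this target"),
    }
}
```

The developer who only tests on a 64-bit machine ships the first branch and never sees the second.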


The assertion throughout the thread is if you’re using a low-level language, you should probably be thinking about your integer sizes. But “should” is a different statement than “have to”; if you want to be indifferent, go for it. There’s no reason your indifference can’t equally apply to the name ‘i64’ as it did to ‘int’, especially if ‘int’ is just an alias for ‘i64’ (i.e., you actually are still making a concrete size decision).

So the argument against using 64 bits is that someone coming from another programming language might expect 32 bits?

Except for the fact that you're somewhat unlikely to arrive at this, and if you go searching for a good practice, you're likely to find conflicting advice.

Oh I fully understand, I didn’t mean it as a knock to the guides at all. Considering how long this debate has been going on and the potential magnitude of the changes proposed, it’s completely understandable why the guide doesn’t delve into integer widths. I just meant this discussion was the first thing that made me realize that choosing an integer width is something that should be taken into serious consideration. This thread alone has proved useful to me as a beginner, and I agree that Rust itself can lend a hand as well.

The source of the potential conflicting advice has nothing to do with the name, and making ‘int’ a straight alias doesn’t do anything to remove that conflict; it just sweeps it under the rug to no real benefit.

As for the process of being stymied by the lack of something named ‘int’: how do you picture this actually happening? Someone tries to write ‘int’, compiler error, google ‘rust int’, get this thread, head explodes? I promise you that dealing with explicitly-named integer types is going to be one of the smallest roadbumps newcomers hit, well behind the borrow checker, closures (wtf is move), trait objects, Send+Sync+Copy, ref, etc, etc.


I don't think it's as easy to arrive at that conclusion as you say. Certainly if you have experience it's easier, but with a name like int, a lot of people will just keep using int if things work.

This includes people who are just trying out or learning Rust without being fully aware of its integer types, even if they are experienced with this issue in another language.

In any case, you have already been using i64 as a de facto default (and others have probably used i32 the same way where it makes sense for their problem domain). Using the name int for something that's not a good global default is a bad idea.

I think this is a feature, not a bug.

Sorry for being a little bit glib there.

I think it’s clear that at the current time, we don’t have a very clear consensus on any default. I personally feel that there are reasons for a default (to avoid casting hell between libraries, and to signal a reasonable “big” size), but I can also see that we don’t have any consensus on that now.

The good news about Design 1 is that, with the minor exception of some casting hell from early libraries, we can always create a default later. I’m fine with that.

If you have a type called int, it has to be 32 bits; that’s the “consensus” established by C# and Java. So I’d rather not have a type called int, because i32 already exists and u32 is actually shorter than uint. The complicated situation in C is why later languages just made it 32 bits.

Making it pointer-sized is an improvement where you want something pointer-sized, but confusing when you don’t. So having int be pointer-sized is not the wisest choice, because it is surprising to most programmers, including C programmers, who would still expect it to be 32 bits on a 64-bit machine. I feel like it would cause “it works on my machine” issues where the developer has a 64-bit machine and didn’t check for overflow.

So using that logic I would say that the name int should be removed. I have no strong opinion about the fallback for numbers since that’s a separate issue.


Another novice piping in here. Coming from a few years in C/C++, C#/Java, Haskell, etc. Just read this whole thread on-and-off throughout the day.

I support Design 1. From C/C++, int/unsigned have always caused a niggle in my mind every time I've used them, because I'm never quite sure what size they are/will be (and rapidly forget after looking it up). From C#/Java, I've liked int = i32, but had to remember, "Oh! I should consider the bounds."

From Haskell, we have constructs like undefined (generic any-type placeholder; a compiler warning). This can be great: I can stop thinking about something while I'm working on it and then make a decision a bit later, without completely disrupting my workflow by stopping and thinking about what I should be doing. Plus, as a friend pointed out, it's easily greppable. Similarly, I'm a huge fan of @Gankra's idea:

Though it is beyond the topic of this thread if choosing Design 1, I would like to see more discussion along these lines. It allows Rust integers to be good for BOTH prototyping AND production, at a level of granularity where I can prototype a single function and work out the bounds a little later (i.e. before pushing a commit).

In fact, if int/uint were highly discouraged in this way, I think it would be okay to completely ignore performance, e.g. by using i128/u128 (or a bigint, which apparently isn't possible).

If there was another thread about this (since you said it "wasn't well-received"), I would love to read it!

EDIT: Oh, yeah, and: I believe a type like this could technically be a library, yes? Since it wraps a native type but has lints/warnings attached to it. Perhaps not excellent for beginners, but good for developers trying to get over the "what size do I use" block.


@Valloric Something has been nagging me all day that I was hoping you could help me with.

You mentioned that over 12 years of writing C++ code in a large codebase, you have never encountered problematic 32-bit overflows. That is so far from my personal experience that I suspect there might be some style rules in place on your codebases that help to eliminate the possibility.

Can you think of any style or code-review rules in place on these codebases that might have eliminated 32-bit overflows? I can’t help but wonder if we’re missing an opportunity to canonize (either in the language or the style guide) some set of advice that makes 32-bit overflows much less painful than my experience suggests, while still maintaining the performance of 32-bit integers in the common case.

Thanks so much!

Design 1 or 4 are the only ones that make sense.

If there is an “int”, it must be a bigint.

All other types violate basic properties of integers: namely, that addition is always possible, and that for all a, b, c with a < b, we have a + c < b + c.
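The second property is easy to see failing for any fixed-width type; here is a small demonstration, using wrapping_add to make the overflow explicit rather than a debug-mode panic:

```rust
fn main() {
    // Mathematically, a < b implies a + c < b + c for every c.
    let a: i32 = 1;
    let b: i32 = i32::MAX;
    let c: i32 = 1;
    assert!(a < b);
    let lhs = a.wrapping_add(c); // 2
    let rhs = b.wrapping_add(c); // wraps around to i32::MIN
    assert!(lhs > rhs); // ordering inverted, not preserved
    println!("{lhs} > {rhs}"); // prints 2 > -2147483648
}
```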

Indexing is not a problem, since any reasonable design for indexing would support indexing with any integer type (all you have to do is a bounds check, and then cast to machine-sized integer).
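A rough sketch of such an indexing design (get_i64 is a hypothetical helper, not an existing std API): bounds-check in the wider type first, then cast down to the machine-sized usize only after the check has passed:

```rust
// Hypothetical helper: index a slice with an i64 instead of usize.
fn get_i64(slice: &[u8], index: i64) -> Option<u8> {
    // Reject negatives, then compare in u64 so the check itself
    // cannot wrap; only afterwards cast to usize.
    if index < 0 || index as u64 >= slice.len() as u64 {
        None
    } else {
        Some(slice[index as usize])
    }
}

fn main() {
    let data = [10u8, 20, 30];
    assert_eq!(get_i64(&data, 1), Some(20));
    assert_eq!(get_i64(&data, -1), None);
    assert_eq!(get_i64(&data, 99), None);
    println!("all bounds checks passed");
}
```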

12 years of writing C++ but not all of it in one codebase (thank God!). For Google code, the entire C++ style guide is here. There really isn't anything special in there that helps prevent integer overflow. The recommendation is to use plain int unless you have doubts about your use-case fitting in that range. Then use something like int64_t.

That's it. It's the same rule of thumb I've used since I started writing C++. All the compilers I use have int as 32-bit, and if I ever have even the slightest concern about the range of values being adequate, I go with a 64-bit int typedef. These aren't rare cases; at work I probably pick a 64-bit int several times a week.

I really see only two cases: either I work with integer ranges that are laughably below the 4 billion limit, and thus int is fine, or I use a 64-bit int. There's no middle ground.

I've never seen a 32-bit int overflow bug in code I've written, read or maintained (I work with absolutely brilliant people, so there's that). This experience spans many different codebases, from game engines to epub editors to MapReduce code, etc.

Note that @pcwalton has stated that his C++ experience showed a similar lack of exposure to 32-bit int overflows. Like him, I've been bitten by unsigned int underflow many times. It's a common enough problem at Google that using unsigned ints is discouraged across the codebase.

I've also stuffed a uint64_t in an int64_t more than once; on one occasion it made it all the way to production and caused issues in a Product You Certainly Use.
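That failure mode is one `as` cast away in Rust too; the value below is made up, but the silent wrap is exactly what the cast does (try_from is the checked alternative):

```rust
fn main() {
    // An `as` cast from u64 to i64 never fails; it reinterprets the
    // bits, so large unsigned values come out negative.
    let big: u64 = u64::MAX - 41;
    let signed = big as i64;
    assert_eq!(signed, -42); // silently wrapped

    // The checked conversion reports the problem instead.
    assert!(i64::try_from(big).is_err());
    println!("{big} as i64 = {signed}");
}
```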


You may not have seen overflow errors, but a shuttle crashed because of one. I believe the software raised an exception (overflow was checked), but that is useless at runtime: the shuttle is already in the air and can’t recover from an exception mid-flight.

This is pretty great.

> We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits. If you need a 64-bit integer type, use int64_t or uint64_t.
>
> For integers we know can be "big", use int64_t.

In Rust terms, that would be something like:

> We use i32 very often, for integers we know are not going to be too big, e.g., loop counters. For integers we know can be "big", use i64.
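Applied in practice, that guideline might look like this sketch (the byte count is an illustrative made-up figure):

```rust
fn main() {
    // "Laughably small" ranges such as loop counters: i32 is plenty.
    let mut sum: i32 = 0;
    for i in 1..=100 {
        sum += i;
    }
    assert_eq!(sum, 5050);

    // Anything plausibly "big" (byte counts, file sizes, timestamps):
    // use i64 so 4 billion is not a cliff.
    let bytes_processed: i64 = 5_000_000_000;
    assert!(bytes_processed > i64::from(i32::MAX));
    println!("{sum} {bytes_processed}");
}
```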

I liked your "laughably small" phrasing. It helped me a lot. I think guidelines like these, paired with Design 1, would work quite well.

Thanks so much.
