Restarting the `int/uint` Discussion

We do have the FFI lint for this reason.

+1 for Design 2. The size of the integers should not change just because the code is compiled for a different architecture.

The entire discussion here is framed the wrong way. The int and uint types are not convenient “machine size” types; they do not correspond to the general-purpose register size (the word size). These types are pointer-sized, which is different from the register size on some architectures.

There are other platform-dependent integer types that are useful, and the int / uint naming is only sensible for the word size at best, which is not the pointer size.

3 Likes

If we’re keeping the int type as an alias for i32, we might as well do the C# thing and rename the sized integers accordingly instead of having multiple names: that would be most intuitive for people coming from languages with 32-bit int types.

Few languages actually have a 32-bit type called int. That’s not how the type is defined in C: it’s how it happens to be implemented on a subset of platforms, but the only guarantee is that it’s at least 16 bits. It’s the long type that’s guaranteed to be at least 32 bits, not int.

So how are offset and the built-in indexing going to be defined? Rust has clear use cases for pointer-size integer types, and making that a different type on each platform will greatly hurt portability. The default outcome would be code that simply doesn’t work on other architectures.
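
For illustration, here is how that surfaces with today’s usize, where slice indexing is defined over the pointer-sized type (the function is a made-up example):

fn get(xs: &[u8], i: u64) -> u8 {
    // Indexing takes usize, so a fixed-width index needs an explicit
    // conversion - and `as` silently truncates where usize is 32 bits.
    xs[i as usize]
}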

I don’t think any of the 4 proposals here makes sense. The framing draws distinctions between choices that aren’t mutually exclusive and misses the most prominent viewpoints on this issue.

The first design implies that int is a default, but there is no current default in the language. The i32 type is being added as a fallback for inference, which is the closest thing to a default. The design claims there’s an opportunity to provide a good default and guide design decisions, but fails to substantiate that claim. The usual design advice is to use the integer type that’s large enough for the use case; while you can provide guidelines for some common cases, there is no sane “default” choice.

The second proposal simply creates duplication in the language. The listed pros and cons make no sense at all because programmers need to be aware of the definition of the type regardless of the name. Choosing an integer out of a hat and hoping that it doesn’t overflow is ridiculous.

The third proposal makes some of the same mistakes, and entirely misses the point of having pointer-size or register-size integer types (not the same thing!). Again, the integer type needs to be chosen based on the needs of the use case. You’re not going to catch most of these issues simply by running the test suite; that’s not how typical integer overflow bugs surface. They occur in the rare edge cases you didn’t think about when choosing the size, so picking an arbitrary type doesn’t work.

The RFC proposing that the pointer-size integer types - which we do have, and do need - be renamed to something more appropriate was what the community was behind, and this is a poor substitute for it. It doesn’t address all of the concerns and is full of inaccurate claims.

10 Likes

I vote for option #1. I don’t quite see how not having an int type would be a show-stopper for people new to Rust. For instance, there’s no float type, only f32 and f64.

In particular, in the “Pros” for option #2 I read “This design encourages people to use 32-bit integers when they don’t have a better idea in mind.”

Is that really a “pro”, though? As you mention in option #4, unless you’re using bigints you probably can’t ignore the width of the type. So I’d say that might just be encouraging sloppy programming.

Maybe I’m biased because I’m mostly doing low-level programming these days, but are there really cases where some of you write code in C, C++, Rust or whatever and don’t care about the width of the integer type? I don’t have a concrete use case for the “good enough” integer, and what does “good enough” even mean? If you tell users (especially those coming from languages like Python, which use bigints for their basic integer types) that the Rust int is good enough, you’re handing them a gun to shoot themselves in the foot.

I’ve made a quick, unscientific survey of the C and C++ code lying around on my hard drive (both mine and third-party code). The only uses of int I see are:

  • code assuming it’s at least a certain size (C guarantees it’s at least 16 bits, and I’ve seen a bunch of “assert(sizeof(int) == 4)” as well), but in that case you can easily just use a u16 or u32; they’re probably only using int because it looks nicer or avoids including stdint.h or other headers.
  • iterating through an array (and that’s arguably dangerous if you aren’t making sure the size of the array fits in the int).
  • signaling errors in return values, but that’s not really a use case for Rust, and it’s still about assuming a minimal range for int.

When doing maths with loosely constrained ranges I usually end up using floating point, not integers (or bigints if they’re available). And when I need to do arithmetic with integers, I’m very careful about not overflowing. The soon-to-be-added debug checks for overflow would help with that, though.
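
For illustration, here’s the kind of bug those checks catch. In a build with overflow checks enabled, this panics with “attempt to add with overflow” instead of silently wrapping (the snippet is just a made-up example):

fn main() {
    let mut x: u8 = 250;
    for _ in 0..10 {
        x += 1; // panics once this passes 255 (with overflow checks on)
        println!("{}", x);
    }
}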

Perhaps less importantly, making int an alias for i32 would happen to match the C int type on x86 and amd64, but that wouldn’t be true on all other architectures. If support for an architecture where the C int is not 32-bit were added at some point, existing FFI code that uses the Rust int type to match the C int type would break. Again, not really a major concern at this point, but I thought it was worth considering.
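
For what it’s worth, the portable way to match C’s int in a binding is the dedicated c_int alias rather than any particular Rust integer type. A minimal sketch, using std::os::raw::c_int and the C library’s abs as the example:

use std::os::raw::c_int;

extern "C" {
    // c_int tracks whatever the target's C ABI says int is,
    // so this declaration stays correct across architectures.
    fn abs(x: c_int) -> c_int;
}

fn main() {
    assert_eq!(unsafe { abs(-3) }, 3);
}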

However I’m in favour of adding some form of integer coercion to limit the use of casts, although maybe simply allowing slices/arrays to be indexed by any unsigned integer type would be enough? That seems a bit more conservative than allowing coercion in the general case. Implicit type conversion is one thing I really don’t like about C, although I suppose it’s not so bad if you only allow it towards a bigger type.

But then, as you mention, that would make coercing to and from the isize/usize types change from architecture to architecture, and that sounds pretty nasty to me. If I understand correctly, it would make this code build on amd64 but not on 32-bit architectures:

fn foo<T: Copy>(some_slice: &[T], index: u64) -> T {
    some_slice[index] // relies on the hypothetical u64 -> usize coercion
}

I’m not really sure I like the sound of that.

6 Likes

It’s funny that the person claiming my comments are “technically sloppy” is so clueless about an issue where they’re presenting themselves as an authority. I suggest reading the in-depth discussion on this issue and the well-written (unlike this noise) RFC by Jerry Morrison. Starting a whole new discussion thread when you have little grasp of the problem area isn’t helping anything.

1 Like

There is actually a similar suggestion in this comment thread about introducing multidispatch integer indexing for the core data structures, which may fix the ergonomics problem without introducing coercion.
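
To make that concrete, here is a rough sketch of what multidispatch indexing looks like with today’s Index trait (the Buf type and the set of index impls are invented for illustration):

use std::ops::Index;

struct Buf(Vec<u8>);

// One impl per index type the container chooses to accept.
impl Index<usize> for Buf {
    type Output = u8;
    fn index(&self, i: usize) -> &u8 { &self.0[i] }
}

impl Index<u32> for Buf {
    type Output = u8;
    fn index(&self, i: u32) -> &u8 { &self.0[i as usize] }
}

fn main() {
    let b = Buf(vec![10, 20, 30]);
    let i: u32 = 1;
    assert_eq!(b[i], 20); // no cast needed at the call site
}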

Yeah, I agree it is better than adding general coercions, especially when a “u64 -> usize” coercion may or may not work depending on the architecture.

The term “default” here is not, as you say, a technical part of the language. Nonetheless, it is common for people to have a “go to” integer type that they pick first. This is what the default is, and Yehuda’s point (I think) is that people will pick int or uint, whatever we say, so we might as well align int/uint with what we think the best overall choice would be. (I think integral fallback is mostly a red herring, since it really only applies to small one-off programs or the odd random integer floating about, typically 0 or 1.)

Also, a point of clarification: by “register-sized integer types”, I presume you mean “fastest size”? On an x64 system there are at least some registers of every possible width, so the term is not particularly precise. In any case, it is certainly true that none of the listed proposals included variable-size integer types other than pointer-sized ones (and they all assume a flat address space as well). This is no accident. Speaking for myself, I think we need to keep the zoo of integer types to a minimum, and options like “fastest integer size” don’t carry their weight. Every machine-dependent type carries overflow and portability hazards along with it, and if you really, really care about using the “fastest” possible type, it’s easy enough to define your own aliases with a #[cfg] switch, as sketched below.
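
For instance, a user-level sketch of such an alias (the FastInt name is made up, and the widths chosen per target are illustrative rather than benchmark-backed):

// Select a "fast" integer width per target; purely illustrative.
#[cfg(target_pointer_width = "64")]
pub type FastInt = i64;

#[cfg(not(target_pointer_width = "64"))]
pub type FastInt = i32;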

1 Like

While I think the idea of parametrizing the core data structures over their index type has potential, it is not a panacea. The index parameter would have to spread very far – for example, iterators would also need to be parametrized so that calls to enumerate know what type to yield. In general, I think we should be wary of using type parameters to address every problem. I think permitting coercions from small to bigger integer types seems useful and harmless in any case, and might go a long way towards improving ergonomics (though I know it’s only half the problem).
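
To show how far that spreading goes, here is a hypothetical enumerate-style adapter that yields a caller-chosen index type; every name and trait bound here is invented for the sake of the example:

use std::ops::Add;

// The index type Ix leaks into the struct, the impl, and the bounds.
struct EnumerateWith<I, Ix> {
    iter: I,
    count: Ix,
}

impl<I, Ix> Iterator for EnumerateWith<I, Ix>
where
    I: Iterator,
    Ix: Copy + Add<Output = Ix> + From<u8>,
{
    type Item = (Ix, I::Item);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.iter.next()?;
        let i = self.count;
        self.count = i + Ix::from(1u8);
        Some((i, item))
    }
}

fn main() {
    let it = EnumerateWith { iter: "abc".chars(), count: 0u32 };
    for (i, c) in it {
        println!("{}: {}", i, c); // u32 indices, no casts
    }
}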

+1 to this gentleman and his reasoning. (Except for integer coercions; that problem can be solved with the existing multidispatch traits without adding more complexity to the language.) From my C++ experience, int as a "default integer type" is rarely used in professional code, is discouraged by guidelines, and isn't really needed. size_t/ptrdiff_t and the fixed-width types are used instead, and Rust already has them all (modulo renaming). Even adding int as a simple alias for i32 (a feature aimed solely at beginners) creates more problems than it solves, and beginners will have to relearn in the end.

This is exactly the bias that is good and needed; Rust is supposed to be a low-level language, after all : )

In general, this restart looks more like an attempt to disregard the arguments and the consensus from the previous discussions than something constructive.

3 Likes

Nonetheless, it is common for people to have a "go to" integer type that they pick first. This is what the default is, and Yehuda's point (I think) is that people will pick int or uint, whatever we say, so we might as well align int/uint with what we think the best overall choice would be.

I see this repeated by several people but I still don't get it, so I'm going to repeat myself until I get an answer: why would you encourage people not to care about the integer width if you're not using bignums? And even if you do, what's your rationale for choosing 32-bit as the right default? I still don't get the motivation; can we get some concrete examples of what this mythical "good enough" integer would be used for?

I think permitting coercions from small to bigger integer types seems useful and harmless in any case, and might go a long way towards improving ergonomics (though I know it's only half the problem).

I gave an example of a problematic coercion in my post above: coercing u64 to usize wouldn't work on 32-bit architectures, and neither would u32 to usize on 16-bit architectures. That would be an easy way to write non-portable code.

1 Like

This is actually a fairly good point. It limits one of the implicit pros of having int be an alias for i32 -- we'd still want to lint int out of FFI code, though the alias would certainly mitigate the harm of being sloppy in practice.

1 Like

Who said anything about encouraging it? We're just recognizing reality.

I do not feel that 32 bits is a good choice, though I find some of the arguments made in favor of 32 bits somewhat persuasive. My feeling has been that pointer-sized is actually a pretty good choice. Anecdotes are not data, etc., but looking briefly through my code, I see a fair number of uses of uint where I haven't thought deeply about the range of values they will take on. Almost invariably, they are counters, either for recursion depth or indices of some kind. For these cases, the size of the address space is a safe upper bound. So, for the way that I write code, uint is a safe "go to" choice. Now, in practice, I doubt most of those values will exceed 32 bits, so I think one could argue that u32 would have served as well, and given me smaller data structures to boot (though I doubt that this size difference would be measurable in most cases).

Who said anything about encouraging it? We're just recognizing reality. [...] Now, in practice, I doubt most of those values will exceed 32 bits, so I think one could argue that u32 would have served as well, [...]

So, would you say that having uint named that way encouraged you to use it instead of u32?

1 Like

Ah, I see your point. You’re arguing for “no type named uint”. Fair enough. Yes, I think perhaps having uint encouraged me somewhat. On the other hand, in writing C++ code (where using a naked int or unsigned is very gauche), I’ve certainly seen that int32_t and uint32_t become the reflexive “go to” choice instead, and it certainly happens that those types are used where (imo) a wider type might have been a safer choice.

Also, in my comment, I didn’t mean to imply that I think I should have used u32 in those cases (though from your quoting it looks like that’s how it sounded). I think uint/usize/whatever was probably the right call. I’d say it’s good to substitute smaller integral types where the domain allows, but it’s only worthwhile where it will make a big difference in memory usage or performance (premature optimization and all that). I guess that one of the things that is unclear to me is how frequently this is the case. Microbenchmarks are (typically) not very representative. I know that it’s been argued (e.g., by Valloric, on this thread) that larger integer types are a kind of hidden tax that has a bigger effect than we realize.

I think I overlooked the “Rust on 16-bit” use case. With that considered, keeping int/uint would definitely be the wrong way to go.

3 Likes

Existing code is abusing int and uint as “word”-sized types. For example, the pure-Rust big integer types were written this way: a larger hardware word means more work can be done with a single instruction in code like that. The fact that the language itself refers to the pointer size as target_word_size is a strong indication that there’s a lack of understanding here.
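
For concreteness, a minimal sketch of the limb pattern being described (the names are illustrative, not the actual big integer code in question):

type Limb = usize; // the "abused" choice: pointer-sized, not guaranteed word size

// Schoolbook addition of little-endian limb slices: each loop step
// does one word-wide add, so a wider Limb means fewer iterations.
fn add(a: &[Limb], b: &[Limb]) -> Vec<Limb> {
    let n = a.len().max(b.len());
    let mut out = Vec::with_capacity(n + 1);
    let mut carry: Limb = 0;
    for i in 0..n {
        let x = *a.get(i).unwrap_or(&0);
        let y = *b.get(i).unwrap_or(&0);
        let (s1, c1) = x.overflowing_add(y);
        let (s2, c2) = s1.overflowing_add(carry);
        out.push(s2);
        carry = (c1 | c2) as Limb; // at most one carry out per limb
    }
    if carry != 0 {
        out.push(carry);
    }
    out
}

fn main() {
    // usize::MAX + 1 spills into a second limb.
    assert_eq!(add(&[usize::MAX], &[1]), vec![0, 1]);
}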