Default integer type should be safe to work with large arrays


#1

I’d like to comment on integer size and the default fallback for integer literals, which is currently being changed.

AFAIU the main argument for sizeof(int) = sizeof(u32) is that programs should work identically on 32-bit and 64-bit systems.

That’s what folks love Java for. However, there’s a big downside to sizeof(int) = sizeof(u32): programs often fail to work correctly when they create arrays larger than 2 GB (or, for example, mmap files larger than 2 GB).

I’ve seen two real-world examples of such problems.

The first is a C++ project at a company I work for. They use ints everywhere, and their programs silently corrupt data when the data size exceeds 2 GB, because of integer overflow. Their program does work identically on 32-bit and 64-bit systems: it fails to correctly load more than 2 GB of data on both. That problem was hard to locate, and it is now very hard to fix, because the project source is quite large and there are thousands of variable declarations.
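A minimal Rust sketch of that failure mode (the 3 GiB figure is illustrative, not from the project above): a size stored in a 32-bit signed integer wraps to a negative number, and does so identically on 32-bit and 64-bit targets.

```rust
fn main() {
    let size: i64 = 3 * 1024 * 1024 * 1024; // 3 GiB of data
    // Storing the size in a 32-bit int wraps on every platform,
    // so the bug reproduces identically on 32-bit and 64-bit systems.
    let wrapped = size as i32;
    println!("{size} stored as i32 is {wrapped}");
    assert!(wrapped < 0);
}
```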

The other is the lighttpd server. If I recall correctly, in version 1.x it could not process HTTP uploads larger than 2 GB. Fortunately, it didn’t crash; it just rejected such uploads. Again, it worked identically on 32-bit and 64-bit systems.

Of course, the programmers in both situations were wrong and careless; they should have used size_t.

My point is that the “default” integer should be safe from overflow when working with large arrays. The right default integer size can prevent such problems.

There’s a concern that 64-bit integers are slower than 32-bit ones. For me this argument is less important than correctness on 64-bit systems, because such straightforward performance problems are much easier to fix than incorrect behavior on large data sets.

And if the “default” integer size is sizeof(void*), then the types should be called just int and uint (because uptr and iptr are a bit weird and unusual for a default), and integer literals should fall back to int.


#2

I think you’re conflating some ideas here. int/uint are strictly defined as pointer-sized. uint is used everywhere lengths and array indices are needed; int is used mostly for pointer offsets. We currently do not have a “default” integer type, and we also don’t have silent integer promotion. Therefore applications would have to actively go out of their way to index with u32. If you don’t explicitly give a type to a number but use it for indexing, Rust will infer a pointer-sized uint. If you explicitly type a number as u32, then Rust will complain if you index with it without an explicit cast.
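That last point can be sketched in today’s Rust, where the pointer-sized type this thread calls uint is spelled usize:

```rust
fn main() {
    let v = vec![10, 20, 30];
    let i: u32 = 1;
    // let x = v[i]; // rejected: `Vec<i32>` cannot be indexed by `u32`
    let x = v[i as usize]; // an explicit cast is required to index with u32
    assert_eq!(x, 20);
}
```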

People are dissatisfied with int/uint being named int/uint, because it’s confusing. They also want integer fallback back for programming “in the small”, when the numbers are used in a way where their type is ambiguous. The fallback can be any type. These are technically distinct proposals, although they are related and commonly conflated.


#3

We currently do not have a “default” integer type. We also don’t have silent integer promotion. Therefore applications would have to actively go out of their way to index with u32.

If

  • int is renamed to iptr
  • new uint size is 32-bit
  • 0 falls back to int (which is 32-bit)

then int and uint (which are 32-bit) would be the “default choice” of integer type for programmers (like int in Java, which also has long). And that is dangerous for the reasons I explained at the start of this topic.

For example, some HTTP client library might fail to load a response larger than 2 GB because it parsed the Content-Length header into a uint.
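A hedged sketch of that hypothetical bug (the header value is made up): a 3 GiB Content-Length does not fit a 32-bit signed integer, while a 64-bit type handles it without trouble.

```rust
fn main() {
    // Hypothetical Content-Length header for a 3 GiB response body.
    let content_length = "3221225472";
    // Parsing into a 32-bit signed integer fails outright...
    assert!(content_length.parse::<i32>().is_err());
    // ...while a 64-bit type holds the value without trouble.
    let len: u64 = content_length.parse().unwrap();
    assert_eq!(len, 3 * 1024 * 1024 * 1024);
}
```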


#4

Most renaming proposals kill the int/uint names altogether.

All memory allocation APIs expect uptr. All library data structures use uptr. At some point their abstractions will meet the standard library’s (or someone else’s) abstractions, which want uptr, and they will get an error. If at that point they decide to cast, rather than realizing that they are using the wrong type, there’s nothing we can honestly do. However, I seriously doubt they won’t use the uptr type right away when it is absolutely everywhere in our APIs.


#5

So if most integers in the libraries were uptr, then

  • uptr and iptr are unusual names, while the nice and popular names int and uint are familiar to every C/C++/Java/Python/Haskell/etc. programmer
  • falling back to i32 instead of the most widespread integer type is counter-intuitive

#6

I think part of your argument is that you should have to think about overflow in the minimum number of cases. You argue that the default type should be pointer-sized so that more things will ‘just work’ without thinking about overflow. I think that is wrong. You must always consider overflow. We should choose a default integer which makes it easiest to consider overflow. I believe, but do not have data to back it up, that most uses of integers have a platform-independent maximum; therefore, where you need to check for overflow in your code does not change from platform to platform. Reasoning about overflow and integer size is thus easier when the size is fixed across platforms.
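The point about fixed-size types can be illustrated in current Rust: the overflow boundary of i32 is identical on every platform, while a pointer-sized type’s boundary moves between targets.

```rust
fn main() {
    // The overflow boundary of a fixed-size type is the same everywhere.
    assert_eq!(i32::MAX, 2_147_483_647);
    assert_eq!(i32::MAX.checked_add(1), None);
    // The boundary of a pointer-sized type depends on the target:
    // 2^32 - 1 on 32-bit systems, 2^64 - 1 on 64-bit systems.
    println!("usize::MAX = {}", usize::MAX);
}
```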

It is only when you are indexing into arrays (and doing similar operations) that reasoning about overflow becomes platform-independent, and thus easier, with platform-dependent integer sizes. Since array indexing (rare thanks to iterators) and pointer arithmetic (rare because it is unsafe) are both uncommon in Rust, I believe such situations are rarer than the platform-independent ones.

The caveat here is that, mostly, this doesn’t matter at all: in any real program you have mostly explicit annotations. Therefore, the only real benefits of the choice of integer fallback are pedagogical. I would like to get newcomers thinking about integer size and overflow from day one, so again I’m in favour of making the default i32.


#7

You must always consider overflow.

Well, I’m talking mostly about overflows in integers that store data sizes, and those can be prevented by making the “default” integer type platform-dependent.

Of course, when integers are used in implementations of numeric algorithms like SHA-1 or random number generators, you do have to think about overflow, indeed.

Since array indexing (rare thanks to iterators) and pointer arithmetic (rare because it is unsafe) are both uncommon in Rust, I believe such situations are rarer than the platform-independent ones.

That’s true for high-level code.

When you work with network protocols or data formats, or implement data structures, there are arrays and integer offsets everywhere.

Therefore, the only real benefits of the choice of integer fallback are pedagogical. I would like to get newcomers thinking about integer size and overflow from day one, so again I’m in favour of making the default i32.

If newcomers should think about integer size, there should be no fallback at all. (Personally, I prefer no fallback over a fallback to a 32-bit integer.)


#8

You may want to go through this thread for past discussions on renaming int to iptr / index / intptr:

On the other hand, for the “default” type of generic integer literals like 0, please refer to these two proposals:

(Note that I’m just a newbie who is trying to become more familiar with Rust.)


#9

Rust is very strict about types, and you cannot use anything other than what’s currently called uint to refer to an element of an array or a vector. So what the “default” type for casual arithmetic should be can be discussed separately from how to deal with large data sets.

They have unusual names by design. They are supposed to remind Rust programmers that they are dealing with something architecture-dependent.


#10

Rust is very strict about types, and you cannot use anything other than what’s currently called uint to refer to an element of an array or a vector.

You can easily cast, and programmers cast between integer types a lot without hesitation. For example, integer casts are used more than 100 times in libcore.
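And such casts truncate silently, which is the danger; a small sketch:

```rust
fn main() {
    let n: u64 = 5_000_000_000; // about 5 GB
    // `as` compiles without any warning and silently truncates.
    let m = n as u32;
    assert_eq!(m, 705_032_704); // 5_000_000_000 mod 2^32
    println!("{n} cast to u32 is {m}");
}
```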

They are supposed to remind Rust programmers that they are dealing with something architecture-dependent.

Users of most other programming languages (C/C++/Swift/Go) do not need to be reminded of that.


#11

This is almost like saying you cannot use any primitive integer type except i64. Please recall that 32-bit Linux with Large File Support lets you deal with files larger than 4 GB.
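Current Rust reflects this: file sizes are reported as u64 on every target, 32-bit included, rather than as a pointer-sized integer. A small sketch (the temp-file name is arbitrary):

```rust
use std::fs;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("lfs_demo.txt");
    fs::File::create(&path)?.write_all(b"hello")?;
    // len() returns u64 even on 32-bit targets: a file can be
    // larger than the whole address space.
    let size: u64 = fs::metadata(&path)?.len();
    assert_eq!(size, 5);
    fs::remove_file(&path)?;
    Ok(())
}
```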


#12

This is almost like saying you cannot use any primitive integer type except i64. Please recall that 32-bit Linux with Large File Support lets you deal with files larger than 4 GB.

That’s a good point.

64-bit integers on 32-bit systems have drawbacks:

  • AFAIR, operations on them are not atomic
  • on some systems they may be too slow because of the lack of 64-bit integer CPU instructions

If not for these drawbacks, I’d suggest making int i64 everywhere.

A platform-specific size for int is a trade-off among performance, safety, and flexibility.


#13

i64 is also slower than i32 even on 64-bit systems. It’s not as dramatic, but using numbers that are twice as large isn’t free.


#14

i64 is also slower than i32 even on 64-bit systems. It’s not as dramatic, but using numbers that are twice as large isn’t free.

That’s true. The integer operations themselves are slower, but worst of all, 64-bit integers use twice as much space in the processor cache, which is very limited on modern systems.
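The cache-footprint point is easy to check in Rust: an array of i64 occupies exactly twice the memory of a same-length array of i32.

```rust
use std::mem::size_of;

fn main() {
    // 1024 elements: 4 KiB of cache footprint as i32, 8 KiB as i64.
    assert_eq!(size_of::<[i32; 1024]>(), 4 * 1024);
    assert_eq!(size_of::<[i64; 1024]>(), 8 * 1024);
}
```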


#15

Users of most other programming languages (C/C++/Swift/Go) do not need to be reminded of that.

Keep in mind that the integer types in C and C++ are a total mess. The only good parts are size_t and the (u)intN_t types, which sadly are still not used everywhere. Also keep in mind that Rust is primarily a low-level language, in contrast to Go/Swift/Java.


#16

Keep in mind that the integer types in C and C++ are a total mess. The only good parts are size_t and (u)intN_t…

Yes. I propose that Rust’s uint be C++’s size_t, and that it be used as the default integer type.

Also keep in mind that Rust is primarily a low-level language, in contrast to Go/Swift/Java.

This argument is for a platform-dependent size of int, not against it: if higher-level languages (Go/Swift) are OK with such an int, then a low-level language (Rust) is especially OK.


#17

My understanding is that size_t isn’t guaranteed to cover the whole linear address space, but uintptr_t is. An ISO-compliant C compiler for the 8086 could have a 16-bit size_t and a 32-bit uintptr_t. (Correct me if I’m wrong.)

I want to emphasize how tricky it is to write valid C/C++ code. Rust is much better designed for safety.


#18

Also keep in mind that Rust is primarily a low-level language, in contrast to Go/Swift/Java.

This argument is for a platform-dependent size of int, not against it: if higher-level languages (Go/Swift) are OK with such an int, then a low-level language (Rust) is especially OK.

Not necessarily. Platform-dependent types are bad for the correctness of memory-unrelated algorithms because of overflow, but good for memory-related code, both for correctness (no overflow on valid sizes) and for performance. Platform-independent types, on the other hand, are better for correctness and can be worse for performance (though sometimes also better, e.g. i32 on 64-bit).

From that I draw the conclusion that there should be a platform-dependent uptr/usize/etc. used for vector sizes and similar things, but the default uint should be strictly an alias for u32. With the current Rust rules for integer conversions it will be painful to mix the two, so hopefully only a few programmers will try to do that.


#19

Also consider that a u16 can be slower to use on certain 32-bit processors when passing arguments into functions and returning values from them, because of narrowing. Take the ARM architecture, for example. Using uint can therefore produce higher-performance code across different architectures, contrary to the popular belief that passing a value smaller than uint, or of exactly the size needed, would be more beneficial.

The one problem I see is when uint turns out to be smaller than you assumed.

The caller or callee will actually narrow the arguments by doing an AND operation on them, including, IIRC, the return value.

I am not sure whether this applies directly to Rust and LLVM, because I have done no actual tests, but I would not be surprised to find that it does.