If `int` has the wrong size ...?


#1

@steveklabnik posed the question: what should we be suggesting as the type to use by default in the Guide for integer-valued things? (Rust Issue #15526)

Meanwhile, in the discussion about putting the integer fallback RFC (PR 212) back in, part of the argument during the team meeting for why we need a by-default fallback (as opposed to one you opt into via a scoped attribute) was that int, as in the type named “int” (whatever that may be), will be the type that people expect as the default. That argument ended with: “if int has the wrong size [then] that’s a different discussion.”

So, okay, let us have that discussion somewhere then. Maybe on this discuss thread. :slight_smile:


The fact that the integer arithmetic type that most everyone will reach for at first (int) is the one that is inherently platform dependent is troubling to me.
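To make the platform dependence concrete: in present-day Rust the pointer-sized types are spelled isize/usize, and a quick check with std::mem::size_of shows their width following the target’s pointer width, while i32/i64 stay fixed. This is only a sketch; the helper name pointer_width_bytes is made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: width in bytes of a raw pointer on the current target.
fn pointer_width_bytes() -> usize {
    size_of::<*const u8>()
}

fn main() {
    // The pointer-sized integers track the pointer width of the target
    // (8 bytes on a typical 64-bit target, 4 on a 32-bit one)...
    assert_eq!(size_of::<usize>(), pointer_width_bytes());
    assert_eq!(size_of::<isize>(), pointer_width_bytes());

    // ...while the fixed-size types have the same width everywhere.
    assert_eq!(size_of::<i32>(), 4);
    assert_eq!(size_of::<i64>(), 8);

    println!("usize is {} bytes on this target", size_of::<usize>());
}
```

Running the same program on a 32-bit and a 64-bit target prints different numbers, which is exactly the concern about making this the default type.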

Note that this is distinct from concerns about the presence of a fallback. I actually do find the “Lack of bugs” argument from RFC PR 212 persuasive. But that argument only discusses the cases where neither a type annotation nor an i32/i64 suffix is provided; it says nothing about the cases where the user wrote int even though it is actually inappropriate to their problem domain.


Rust Issue #9940 was one take (with a very long attached discussion) on how to address this problem. Well, sort of: it proposed renaming int/uint to something that made it clear that they are pointer-sized (and thus really only for use as indexes into objects in the address space). That proposal suggested intptr_t / uintptr_t.

  • Anyway, #9940 got closed because the person who filed the issue didn’t think the change was worth the trouble anymore.

You can read the back-and-forth arguments in the comment threads for #9940 and RFC PR 212 about what the size of a default type should be: some people thought it should be fast (i.e. i32), while others thought we should use something less likely to overflow or underflow (namely i64).

I am sympathetic to both arguments; I wanted us to consider i64 as a default size, but I would not fight i32 if my only alternative is the platform-dependent size aka intptr_t.
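The overflow side of the i32-versus-i64 tradeoff is easy to see with the standard checked_mul method, which returns None instead of wrapping or panicking. A minimal sketch; the product_i32/product_i64 wrappers are hypothetical names:

```rust
/// Hypothetical wrapper: multiply two i32 values, returning None on overflow.
fn product_i32(a: i32, b: i32) -> Option<i32> {
    a.checked_mul(b)
}

/// Hypothetical wrapper: the same operation at 64 bits.
fn product_i64(a: i64, b: i64) -> Option<i64> {
    a.checked_mul(b)
}

fn main() {
    // 100_000 * 100_000 = 10_000_000_000, which exceeds i32::MAX (2_147_483_647)...
    assert_eq!(product_i32(100_000, 100_000), None);
    // ...but fits comfortably in an i64.
    assert_eq!(product_i64(100_000, 100_000), Some(10_000_000_000));

    println!("i32::MAX = {}, i64::MAX = {}", i32::MAX, i64::MAX);
}
```

The price of the extra headroom is twice the storage per value, which is the “fast” side of the argument.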

In any case, renaming the current int to intptr_t is probably not enough on its own, because I suspect the type names int and uint (and likewise the suffixes i and u) are too sweet to give up. So then, what do we use them for?

(See also related topic: Integer Types )


#2

There is also Rust Issue #16446, which seems possibly related.

cc @nikomatsakis and @pcwalton, who I suspect will want to either weigh in or simply squash the discussion before it balloons out of control as others have. :wink:


#3

RFC PR 161 suggests index or intptr as alternative names. Other ideas: size, offset.


#4

One problem with names like index, size, and offset, besides the plain lack of familiarity, is that it might not be immediately obvious that this is an integer type at all. Perhaps names like isize and usize could help with this (this is harder with index because it already starts with an ‘i’).


#5

Maybe index for int and size for uint?

Having an unfamiliar type name may actually be a good thing here.

Anyway, the goal is also to encourage people to ask themselves “do I really need an architecture-related type here?”.


#6

My point was that having ‘i’ and ‘u’ prefixes (whatever name they’re prefixed to) would make the connection to i8 through i64 and u8 through u64 more obvious, and help comprehensibility.

(For example, if I saw a type name like size without any prefix, my first association might be to QSize rather than to size_t.)


#7

I agree that keeping i and u as the first letters of whatever types we use could be good here.

I had a recent thought that maybe part of the problem is the insistence on using the same suffix for both i and u in the pointer-sized case.

I.e., maybe if we dropped that implicit constraint, we could arrive somewhere interesting, like uaddr and ioffset. (Since, you know, the signed case arises from pointer arithmetic, it makes sense to talk about it as a relative offset, versus the unsigned one being an absolute address.)

  • Of course, going down that road might lead one to then attempt to encode rules in the types, like uaddr + ioffset -> uaddr and uaddr - uaddr -> ioffset, while making addition of two uaddr values illegal. Kind of funny looking. Maybe this is more appropriate for a private newtyped library of my own rather than the core language.
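For what it’s worth, the newtyped library imagined above can be sketched in present-day Rust with std::ops::Add and Sub. UAddr and IOffset are hypothetical names standing in for uaddr/ioffset; simply leaving out an Add<UAddr> impl is what makes adding two addresses a compile-time error.

```rust
use std::ops::{Add, Sub};

/// Hypothetical absolute address (the "uaddr" idea), wrapping a usize.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UAddr(usize);

/// Hypothetical relative offset (the "ioffset" idea), wrapping an isize.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct IOffset(isize);

// uaddr + ioffset -> uaddr
impl Add<IOffset> for UAddr {
    type Output = UAddr;
    fn add(self, rhs: IOffset) -> UAddr {
        // checked_add_signed (stable since Rust 1.66) returns None if the result
        // would leave the usize range; expect turns that into a panic.
        UAddr(self.0.checked_add_signed(rhs.0).expect("address out of range"))
    }
}

// uaddr - uaddr -> ioffset
impl Sub for UAddr {
    type Output = IOffset;
    fn sub(self, rhs: UAddr) -> IOffset {
        // Two's-complement difference reinterpreted as signed; correct as long
        // as the distance between the two addresses fits in an isize.
        IOffset(self.0.wrapping_sub(rhs.0) as isize)
    }
}

// Deliberately no `impl Add<UAddr> for UAddr`: adding two addresses is a type error.

fn main() {
    let base = UAddr(0x1000);
    let end = base + IOffset(16);
    assert_eq!(end, UAddr(0x1010));
    assert_eq!(end - base, IOffset(16));
    assert_eq!(base - end, IOffset(-16));
    // let bad = base + end; // does not compile: no impl Add<UAddr> for UAddr
}
```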

#8

Well, this reflects the pointer arithmetic that Rust tries to avoid, so it may be a good point. Anyway, a cast is still possible and is more explicit about the danger the developer is facing.

I like ioffset, but for range bounds we may prefer an unsigned type. That would also make sense with a usize, because a range should be able to span a collection’s size/length, and len() is often (in other languages) used as the upper bound of a for loop.

A usize would be more explicit than uaddr (a size is never negative) and is more abstract than an address. A ulength or ulen type would make sense too.


#9

What about floating-point fallback?
The lack of it creates relatively more inconvenience than the lack of integer fallback (although I’m still against both of them).

As for bikeshedding:
  • int / uint -> idx / uidx
  • integer fallback -> no (or i32)
  • floating-point fallback -> no (or f64)
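For reference, present-day Rust did end up with both fallbacks: an integer literal with nothing else constraining its type defaults to i32, and a float literal to f64. A small check using std::mem::size_of_val (the helper names are made up for this sketch):

```rust
use std::mem::size_of_val;

/// Hypothetical helper: width of an integer literal whose type is otherwise unconstrained.
fn int_literal_fallback_bytes() -> usize {
    let n = 7; // no suffix, no annotation: falls back to i32
    size_of_val(&n)
}

/// Hypothetical helper: width of a float literal whose type is otherwise unconstrained.
fn float_literal_fallback_bytes() -> usize {
    let x = 2.5; // no suffix, no annotation: falls back to f64
    size_of_val(&x)
}

fn main() {
    assert_eq!(int_literal_fallback_bytes(), 4); // i32
    assert_eq!(float_literal_fallback_bytes(), 8); // f64

    // A suffix or an annotation overrides the fallback.
    let big = 7i64;
    let single: f32 = 2.5;
    assert_eq!(size_of_val(&big), 8);
    assert_eq!(size_of_val(&single), 4);
}
```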


#10

It’s true that a signed value makes more semantic sense as some kind of relative value (offset), and an unsigned one as an absolute one (size). But I’m not sure if this outweighs the fact that from a principle-of-least-surprise perspective, I would really expect the signed and unsigned versions of the type to have the same name apart from the prefix (again, just like all of the other fixed-size integer types).

Anyway, to come out and say it: my personal preference right now would in fact be for isize and usize. These are the first names to come up in the whole discussion so far (going back to the threads on GitHub) that cross the threshold of being non-awkward enough that I could actually conceive of them replacing int and uint.


#11

I agree; there is no way to see offset and expect a person to be able to derive size from it. It feels like a special case.

The correct term for it is word, but assembler kind of spoiled it (why is it QWORD for 64 bits? That’s clearly the word size on my processor). I guess isize and usize are fine.


#12

The C semi-equivalent would be size_t and ssize_t, or Rustified maybe size and ssize.
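For comparison, the names that exist in Rust today, usize and isize, behave like that C pair in the obvious way: same width, differing only in signedness, with the signed type giving up one bit of positive range. A minimal check (the helper name same_width is hypothetical):

```rust
use std::mem::size_of;

/// Hypothetical helper: do the unsigned and signed pointer-sized types have the same width?
fn same_width() -> bool {
    size_of::<usize>() == size_of::<isize>()
}

fn main() {
    assert!(same_width());
    // The signed type trades one bit of positive range for negative values.
    assert_eq!(isize::MAX as usize, usize::MAX / 2);
    assert_eq!(isize::MIN, -isize::MAX - 1);
    println!("usize::MAX = {}, isize::MAX = {}", usize::MAX, isize::MAX);
}
```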


#13

I like isize and usize, as they’re short enough, get the point across, are familiar to those coming from C, and match the prefixes used for other integer types.


#14

+1 to usize/isize, also +1 to i32/u32 as the default signed/unsigned integer type.


#15

You know, I think this note gets at why I prefer uaddr over usize: this type should not be considered abstract, as it is very much tied to the underlying machine architecture, no?

(Having said that, I’d probably take usize/isize over today’s uint/int.)


A separate note: I am still concerned about my end note:

But perhaps people are indeed suggesting we throw out int/uint, and repurpose the i/u suffixes for isize/usize ?


#16

With the introduction of isize/usize, what will be the type of i in

for i in range(0u, 10) {
    println!("i = {}", i);
}

A loop variable of type usize strikes me as very unnatural.


#17

If we go with isize and usize, we should change the suffixes away from i/u too, since making them that short only really makes sense if we want to push people towards habitually using them.

for i in range(0usize, 10) { println!("i = {}", i); }

“Wow that looks gross! I better use u32!”

for i in range(0u32, 10) { println!("i = {}", i); }

Mission accomplished!


#18

The thing is that, outside of unsafe code, Rust programs do not and should not ever have to think about raw memory addresses. We have safe reference and box types whose semantics are defined abstractly and do not expose any notion of a machine-level memory address. Thus it seems strange for the naming of these types to reflect this low-level detail, because they are not primarily intended for use only with unsafe code. (This is the same thing which felt off to me about the earlier suggestions for intptr and uintptr.)

The aspect which safe Rust code (i.e. what should be the vast majority of Rust code) cares about is that these are the types which are appropriate for representing the sizes of and indexes into arrays (and presumably other kinds of collections as well).
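That role shows up directly in today’s standard library: slice len() returns usize, and slice indexing is implemented for usize (and ranges of usize), so a counter used as an index is inferred as usize without any annotation. A minimal sketch; sum_by_index is a made-up name, and idiomatic code would usually just iterate the slice:

```rust
/// Hypothetical helper: sums a slice by explicit indexing. `v.len()` returns
/// `usize`, and slices are indexed by `usize`, so `i` is inferred as `usize`.
fn sum_by_index(v: &[i64]) -> i64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

fn main() {
    let data: [i64; 3] = [3, 4, 5];
    assert_eq!(sum_by_index(&data), 12);

    // Writing `let j: i32 = 1; data[j]` would not compile:
    // there is no `SliceIndex` impl for `i32`.

    // The index-free alternative sidesteps the question of index types entirely.
    assert_eq!(data.iter().sum::<i64>(), 12);
}
```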

This is indeed an interesting follow-on question, and also ties into the choice of the default type for integer literals.

The defining characteristic of the int and uint types (resp. the i and u suffixes) is that, because they’re so “sweet” and culturally ingrained, they’re what people will habitually use whenever they don’t feel like thinking about it. Basically, the implied meaning of a type called “int” is that it is the type which it’s appropriate to use “most of the time”.

The question is whether such a type exists. Is there an integer type which it is appropriate to use “most of the time”? My feeling is that we should have a type called int, defined as that type, if and only if such a type exists; in other words, only if giving people a type called int won’t give them a false sense of security and lead them into mistakes. If, most of the time, you can’t skip thinking about the correct choice of type without putting the correctness of the program at risk, then we would be doing our users no favors by pretending otherwise, and it would be preferable to force a decision.

The only remotely plausible candidates for being int are i32, i64, and isize. One thing we could do is try to collect data, not necessarily about which type is used, but which types would be correct for the types of integers in existing Rust code. (There is the possibility that more than one of these would be correct, i.e. that it doesn’t matter, and also that none of them would, e.g. that you actually want i8.) If we find a clear majority for any of the three then most likely that type should be int. Otherwise, most likely none of them should. (Conversely, if we find lots of cases where int is used and it’s not correct, then that might be an argument against having an int.) But doing this would probably be a whole lot of effort.

(Most of the above discussion should also apply equally to uint, the u suffix, u32, u64, and usize, in the case where the programmer knows that she wants an unsigned type, but not what size.)


#19

(FWIW I think we could have 0is and 0us be the suffixes for isize and usize, if we decide that i and u shouldn’t be.)


#20

I think the same but am a little concerned about is and us being two actual words with other meanings, especially is. In practice this should not be a problem though.

Also we can repurpose i/u to mean i32/u32.