A tale of two's complement

A couple of questions:

Is two’s complement still guaranteed? For example, how do signed shifts work now? How does unary minus work on unsigned types? Does let a = -1u still wrap, or does it now panic or give an unspecified result, like let a = 0u - 1u? Or can unary minus no longer be applied to unsigned types at all? What optimization opportunities do unspecified values (overflow/underflow results) provide for a compiler?
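
For readers arriving later, here is a minimal sketch of how these questions play out in present-day Rust, which postdates this thread and is not what was being proposed at the time. The helper names are made up for illustration. Signed right shift is arithmetic, unary minus does not compile on unsigned types, and the wrap has to be requested explicitly (overflow checks are on by default in debug builds).

```rust
/// Arithmetic (sign-preserving) right shift on a signed value.
fn shift_right_signed(x: i32, n: u32) -> i32 {
    x >> n
}

/// Explicit two's-complement negation of an unsigned value.
/// `-x` on a `u32` does not compile, so the wrap is spelled out.
fn negate_unsigned(x: u32) -> u32 {
    x.wrapping_neg()
}

/// `0 - 1` on unsigned values, three ways.
fn zero_minus_one() -> (u32, Option<u32>, (u32, bool)) {
    let zero = 0u32;
    (
        zero.wrapping_sub(1),    // wraps around to u32::MAX
        zero.checked_sub(1),     // None: the overflow is reported
        zero.overflowing_sub(1), // wrapped value plus an overflow flag
    )
}

fn main() {
    println!("{}", shift_right_signed(-8, 1));
    println!("{}", negate_unsigned(1));
    println!("{:?}", zero_minus_one());
    // Reinterpreting the bits of -1 gives the all-ones pattern.
    println!("{}", (-1i32) as u32 == u32::MAX);
}
```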

Also, the unsigned wraps -1 + 1 -> 0 and 0 - 1 -> -1 are used quite often in C, in scenarios like:

// Reverse loop
for (size_t i = size - 1; i != (size_t)-1; --i) { /**/ }

// Loop counter
size_t count = -1;
while (/**/) {
    ++count; // Always incremented
    /*Some code with breaks and continues*/
}

It will be a pedagogical problem to teach people not to rely on them.
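
For comparison, a sketch of how the two idioms might be written in Rust. The function names are hypothetical. The reverse loop needs no sentinel at all, and the counter can keep the same shape as long as the wrap is requested explicitly.

```rust
/// Visit indices size-1, size-2, ..., 0 without a wrapping sentinel.
fn reverse_indices(size: usize) -> Vec<usize> {
    (0..size).rev().collect()
}

/// Counter that starts "one before zero" and is incremented first thing
/// in every iteration. Returns the index of the first zero item, the last
/// index if there is no zero, or usize::MAX for an empty slice.
fn count_until_zero(items: &[i32]) -> usize {
    let mut count = usize::MAX; // plays the role of the all-ones sentinel
    for &item in items {
        count = count.wrapping_add(1); // the wrap is explicit here
        if item == 0 {
            break;
        }
    }
    count
}

fn main() {
    println!("{:?}", reverse_indices(3));
    println!("{}", count_until_zero(&[3, 0, 5]));
}
```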

FYI although signed overflow is undefined behavior in C, unsigned overflow is well defined (which is weird).

This is surely the last chance to give the integer type a proper name that reflects what it actually is in Rust (unlike in other languages), and to avoid reproducing the same mistake.

Far less ambiguous than int/uint, especially for imem/umem.

The unfamiliarity is one of the reasons for the renaming. The int in Rust is not like the int in C or other languages. Beginners (and not only beginners) will surely fall into that trap.

Moreover, as already said, this is a unique opportunity to fix integer misuse in the current code base (as found in the stdlib, Servo, and a lot of other current code).

As seen in RFC 464, the vast majority of interested people are in favor of this breaking change.

There are things in the standard library that currently use int or uint for no good reason. For example, enum_set::CLike or the exp argument of Int::pow.

Is there a plan to do a pass on the libraries to change these to fixed-size types?

I’ve often done things a certain way just because the standard library did something similar. It should provide a good example.
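
As a point of reference, in present-day Rust (which postdates this thread) the exponent of the integer pow methods is a fixed-size u32. A minimal sketch, with a hypothetical helper name:

```rust
/// Raise `base` to the power `exp`; returns None if the result overflows.
/// The exponent is a fixed-size u32, not a pointer-sized integer.
fn checked_power(base: u64, exp: u32) -> Option<u64> {
    base.checked_pow(exp)
}

fn main() {
    println!("{:?}", checked_power(2, 10));
    println!("{:?}", checked_power(2, 64));
}
```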

I'm not entirely sure. The name int is too tempting, and education cannot fully change people's mindset. I believe there would be endless questions of "why does this not work?"

I thought the name came from a precedent (intptr_t) in C? We've generally followed precedent when naming things, although I have a preference for imem.

I think using int with a different meaning from other languages creates a much higher barrier than using an explicit name.

It doesn’t invoke undefined behaviour itself. It produces an undef result, which can easily cause undefined behaviour because the value can vary between each read. For example, using it for indexing an array can result in a different value for the underlying pointer arithmetic than the value that was bounds checked.

That’s just semantics: in any sane definition of Rust, the function

fn saturating_divide(d: uint, s: uint) -> uint {
    match s {
        0 => uint::MAX,
        n => d/s
    }
}

should never panic, but if we have LLVM undef-s, you could easily get a divide-by-zero. While this isn’t undefined behaviour by LLVM semantics, memcpy-ing a carefully crafted ROP payload into the stack isn’t undefined behaviour by x86 semantics either, but it is not desirably safe behaviour for a high-level language.
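
For reference, a sketch of the same function in present-day Rust syntax, assuming usize is the intended replacement for uint. Dividing by the binding from the match arm makes it explicit that the zero check and the division see the same value, and checked_div expresses the same intent in one call. The second function name is made up for illustration.

```rust
/// Divide, returning usize::MAX instead of panicking on a zero divisor.
fn saturating_divide(d: usize, s: usize) -> usize {
    match s {
        0 => usize::MAX,
        n => d / n, // divide by the value that was just matched as nonzero
    }
}

/// Same behavior, built on the standard checked division.
fn saturating_divide_checked(d: usize, s: usize) -> usize {
    d.checked_div(s).unwrap_or(usize::MAX)
}

fn main() {
    println!("{}", saturating_divide(10, 2));
    println!("{}", saturating_divide_checked(10, 0) == usize::MAX);
}
```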

Welp. I didn’t realize that. So it’s not memory safe at all - that’s not going to work.

I know, that’s why I reported these bugs as soundness issues. It could be fixed without adding a branch that can’t generally be optimized out. LLVM just needs to add an instruction attribute providing a way of having it return an unspecified value instead of an undefined value.

I think it’s a mistake to not rename int.

Some good options:

  • isize, usize (from C size_t)
  • nint, nuint (from Xamarin C# nint/nuint)
  • i32_64, u32_64 (from Rust i32/i64/u32/u64)

How about iwrd / uwrd?

That’s the wrong name too, since a word is not always pointer-sized. IMO iptr/uptr are the best names (with maybe index as an alias of iptr), but any name other than int would be better.

I think this looks great, but another +1 to renaming int/uint. As others have said, these are too tempting as default types (I personally still use them when I don’t want to bother thinking about sizes). Yes, it’s less intuitive to new users to rename them. But the types themselves are unintuitive - we shouldn’t make them look like they’re not. Making things less approachable is not an inherently bad thing, because the questions it forces users to ask are worth asking (and answering). In reality, there really isn’t a catchall default-appropriate integer type, and we shouldn’t pretend there is when it’s misleading to do so.

As for breaking literally all Rust code, that’s what pre-1.0 is specifically for: not worrying about backward-incompatible breakages when there’s a valid reason to break things. I contend that here there is.

fail! to panic! and namespaced enums both landed recently, both broke the world, and both seemed less necessary to me than renaming int. Add me as one more person asking to please not use breaking the world as an excuse not to do the right thing.

IMHO, falling back to i32/u32 without renaming int/uint will give the wrong impression that int/uint are synonyms of i32/u32, because that’s what some popular languages do. (In practice that means at least C/C++ on 32/64-bit systems, and C#.) And yes, int/uint are too “default looking” anyway.

I don’t see how giving wrong impressions is beginner friendly. If it is different, it should look so.
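
To make the difference concrete, a small sketch in present-day Rust, where the pointer-sized types are spelled isize/usize (one of the options floated earlier in this thread). The helper names are hypothetical. The width depends on the target, so it should not be assumed to match i32/u32.

```rust
use std::mem::size_of;

/// Width in bits of the pointer-sized unsigned integer on this target.
fn pointer_int_bits() -> u32 {
    usize::BITS
}

/// The pointer-sized integers are exactly as wide as a thin pointer.
fn same_size_as_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
        && size_of::<isize>() == size_of::<usize>()
}

fn main() {
    // Prints a target-dependent width, e.g. different on 32-bit and 64-bit targets.
    println!("usize is {} bits wide here", pointer_int_bits());
    println!("matches pointer width: {}", same_size_as_pointer());
}
```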

Another +1 to renaming. FWIW, I now prefer imem/umem.

I’ll accept that criticism. iadr / uadr? For “address-width”

I agree with OP that iptr/iaddr/ioffset/index etc are all too specific. C/C++ have specialized types like intptr_t/offset_t/size_t, even though some may be aliases of each other. But Rust has only two arch-dependent integer types, and they are used in many different contexts, so we should use names that are generic enough, but still different from int/uint.

That’s why I am now for imem/umem.

Talking about index types brings a question to mind: can we have vectors with custom index types, e.g.

let mut v = vec<f64, index_t = MyTime>::new();
...
v[MyTime::mod(time + offset, lag_length)] = x;

I have used a similar mechanism in a C++ project to good effect (increased type safety), unfortunately requiring a bit of hackery.
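
One way to approximate this without language support is a newtype index plus a wrapper that implements Index/IndexMut only for that newtype. This is a minimal sketch in present-day Rust. TimeIdx, TimeSeries and wrapped are hypothetical names standing in for MyTime and the vector in the snippet above; they are not an existing library API.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical index type: a time step, distinct from a bare integer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TimeIdx(usize);

impl TimeIdx {
    /// Wrap `time + offset` into `0..lag_length` (a ring-buffer position).
    fn wrapped(time: usize, offset: usize, lag_length: usize) -> TimeIdx {
        TimeIdx((time + offset) % lag_length)
    }
}

/// Hypothetical vector wrapper that can only be indexed by `TimeIdx`.
struct TimeSeries {
    data: Vec<f64>,
}

impl TimeSeries {
    fn new(len: usize) -> TimeSeries {
        TimeSeries { data: vec![0.0; len] }
    }
}

impl Index<TimeIdx> for TimeSeries {
    type Output = f64;
    fn index(&self, i: TimeIdx) -> &f64 {
        &self.data[i.0]
    }
}

impl IndexMut<TimeIdx> for TimeSeries {
    fn index_mut(&mut self, i: TimeIdx) -> &mut f64 {
        &mut self.data[i.0]
    }
}

fn main() {
    let lag_length = 4;
    let mut v = TimeSeries::new(lag_length);
    v[TimeIdx::wrapped(3, 2, lag_length)] = 1.5;
    println!("{:?}", v.data);
    // v[1usize] = 2.0; // would not compile: there is no Index<usize> impl
}
```

The type safety comes from the missing Index<usize> impl: a raw integer has to be converted into a TimeIdx on purpose before it can be used as a position.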

If we must rename int/uint, I’d prefer iptr/uptr or ix/ux (here x means a non-fixed number of bits, maybe 8/16/32/64, depending on the platform/machine; think of how we named i8/i16/i32/i64 and u8/u16/u32/u64).

I want to thank the core team for giving these integer proposals in-depth consideration, experimenting with names, and deciding! I'm not thrilled with the int/uint names, but such is life. (Would the names ix/ux or i/u have fared better?)

On the topic of breaking changes, note that this was not a risk-reduction choice. (It's more risky than having to revisit all uses of int/uint to find those that should become WrappingInt or int32.)
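
A sketch of what opting into wraparound through a type can look like, assuming the std::num::Wrapping<T> wrapper from present-day Rust as a stand-in for the WrappingInt named above (the two are not claimed to be the same design). The helper name is hypothetical.

```rust
use std::num::Wrapping;

/// Increment with wraparound selected by the type, not by the operator.
fn wrapping_increment(x: u32) -> u32 {
    (Wrapping(x) + Wrapping(1)).0
}

fn main() {
    println!("{}", wrapping_increment(41));
    println!("{}", wrapping_increment(u32::MAX));
}
```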

@arielb1 could you please add background for those of us who're unfamiliar with LLVM and x86 internals? ROP = Raster Operations Pipeline?