A tale of two's complement

ix/ux are interesting and logical and better than imem/umem in isolation, but I am afraid they may be too short and may collide with variable names. How about intx/uintx, with suffixes ix/ux?

1 Like

I find it surprising that nobody mentioned C/C++'s ptrdiff_t (which is the signed version of size_t). See the ptrdiff_t and size_t documentation.

I would like int to be renamed to idiff and uint to usize. I think these two names are suggestive of their uses. This should also encourage using unsigned types where appropriate.

1 Like

Actually size_t etc are discussed in the various RFCs. (See the most recent RFC 464 as a starting point.)

Also, I think ssize_t is the signed version of size_t. C/C++ give their int type names semantics other than their sizes. Just because ptrdiff_t and ssize_t may be aliases of each other, doesn’t mean they should be treated as the same thing.

But we have only one arch-dependent signed/unsigned integer type pair in Rust, so we should avoid using “non-generic” names.

@CloudiDust

I did not say that size_t wasn’t mentioned before, but I said that ptrdiff_t wasn’t.

AFAIK ssize_t is not in the C/C++ standard.

I am not sure what you are trying to tell me.

So I read the docs on size_t and ptrdiff_t, did a bit of googling, and found that I had misunderstood both ptrdiff_t and ssize_t: -1 is the only negative value that ssize_t is required to be able to represent according to the standard. It seems that when I did C/C++ back then, I was using the wrong type.

Because of misleading names. And many tutorials I read back then contained misinformation.

All the more reason to rename int/uint.

1 Like

If ssize_t actually can’t represent negative values other than -1 (to signal an error), I’d use Option<usize> in rust. My proposed name should discourage such misuse.
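The Option<usize> pattern mentioned above can be sketched as follows (find_byte is a hypothetical helper, not from the thread): instead of a C-style return value where -1 signals failure, the error case is a distinct variant the caller must handle.

```rust
// Hypothetical sketch: instead of a C-style `ssize_t` return where -1
// signals "not found", make the failure case explicit with Option.
fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

fn main() {
    let data = b"hello";
    assert_eq!(find_byte(data, b'e'), Some(1)); // found at index 1
    assert_eq!(find_byte(data, b'z'), None);    // no -1 sentinel needed
    println!("ok");
}
```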

1 Like

@theme, my point about ssize_t is that, I always assumed that it is for storing “index offsets for containers”, but actually the standard doesn’t agree with me. I was misguided by the tutorials I read. So, we should prevent the same thing from happening to Rust. (I was actually making this point clear to the OP, not arguing with you, sorry for the confusion.)

But I don’t agree with the names idiff/usize. As the OP said:

“Ultimately, pointer-sized integers are used for many purposes (indices, sizes, offsets, pointers), and we don’t want to favor one of those uses over the others in the name.”

I agree with that. And idiff/usize is too specific.

The only semantic meaning of int/uint in Rust is that they are pointer-sized, just like i32/u32 are 32-bit-sized. But they differ from intptr_t/uintptr_t in that they are not only used for “integers cast from pointers”.

imem/umem are better in this regard, they mean “integers for memory-related things”, but not too specific.

Still they are not ideal because some would think that “int memory”/“unsigned int memory” don’t seem right and don’t look like integer types at all. Personally I don’t find this a problem, but still …

Inspired by @liigo, I now think intx/uintx are better than imem/umem.

Because:

  1. It is clear that they are integer types.
  2. They are consistent with the other integer types in that they follow the signedness + size pattern.
  3. x signifies the fact that their exact size is in a sense “unknown” or “varies”, or, platform-dependent.
  4. for better or worse, intx/uintx somewhat look like index/uindex.
  5. Newcomers will ask the question: “why is there an x?” And they’ll actually read the manual.
4 Likes

Okay, I’ve had enough time to understand the consequences of this change, in particular the undefinedness of overflows. (I personally don’t care about uint and int, given that the implied changes to the “undefined behavior” are actually more important. I’ll refrain from such bikeshedding for now.) Huon on IRC clarified the intention of this change, so I’d like to rephrase his words to clear things up. Any mistake in rephrasing is my mistake.

Overflow is unspecified but defined. This is not the same as C/C++'s Undefined Behavior (UB), nor LLVM’s Undefined Value (UV), which is represented as undef in IR. It is unfortunate that Niko used the term “undefined value” for this behavior, since LLVM UV can actually trigger UB (for example, by being used in a conditional branch). The important distinction between UB/UV and unspecified behavior is that the optimizer cannot take advantage of unspecified behavior. The compiler can silently wrap integers or insert a panic on overflow, but once the choice is fixed (in rustc jargon: after trans) this behavior should not change. This implies that…

Non-debug builds will (hopefully) be the same as current builds under the same set of flags. The compiler will prefer the fastest option in non-debug builds. Modern architectures (okay, barring the Mill) strongly prefer the wrapping behavior, which matches the current Rust behavior, so the actual compiled code won’t change (hopefully). In fact the existence of the Mill justifies this choice: if a future architecture has good support for hardware overflow checks, the unspecified behavior can be changed to match it.
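For the record, the debug/release split described here is roughly what later shipped (behavior is governed by the `overflow-checks` codegen flag; the explicit method names below post-date this thread):

```rust
// Sketch of the debug/release split, assuming the eventually-shipped
// semantics: plain `+` panics on overflow in debug builds and wraps
// in release builds (with overflow-checks off).
fn main() {
    let x: u8 = 255;
    // `x + 1` would panic in a debug build ("attempt to add with
    // overflow") and wrap to 0 in a release build. For a guaranteed,
    // portable behavior, be explicit instead:
    assert_eq!(x.wrapping_add(1), 0);   // always wraps
    assert_eq!(x.checked_add(1), None); // overflow reported as None
    println!("ok");
}
```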

Dynamic overflow checks are here to stay, even in non-debug builds. It is the essential part of memory safety, so it won’t go away. The optimizer can remove redundant checks but it should not change the meaning of the program. This one is already explained above, in the “Can’t overflow sometimes cause crashes?” section.

You should always be careful when testing for overflows. Mandatory overflow checks make testing somewhat more robust, but only when the test is explicitly designed to trigger overflow. (This was the final disagreement between Huon and me after the clarification. I’m glad this one is relatively minor.) Rust will only prevent the worst-case scenario in the case of overflow, as it does now. The importance of implicit and explicit testing doesn’t change even after this change.

2 Likes

The new renaming proposal (int/uint -> intx/uintx) has been filed as RFC PR 544.

1 Like

I too think that renaming int/uint is a good idea. Because umem might be a little confusing for beginners, and names like idiff are too specific, I think the intx and uintx suggested by @CloudiDust are a good idea. Though maybe intp/uintp would be better, with p standing for platform and also sounding a little bit like pointer.

@naicode, IMHO, intp/uintp have the same problem as iptr/uptr: they may also be mistaken for “integer types that should store cast pointer values”, i.e. intptr_t/uintptr_t. Some may think that they suggest a specific use case, and this is not desirable.

Unspecified values are still fun. With unspecified values a compiler in each particular case can assume an overflow/underflow result to be any specific value it finds convenient. For example, in this snippet

// x has type int
if x + 1 < x {
    /*...*/
}

a compiler can choose saturating behavior, i.e. INT_MAX as the overflow result, and completely eliminate the if block based on this choice. This is quite similar to what C compilers currently do with “undefinedly behaving” overflows.
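The wrap-dependent `if x + 1 < x` test above is exactly the kind of code these semantics would make fragile. A checked-arithmetic version (using the `checked_add` method name from later stable Rust) expresses the same intent without relying on any overflow result:

```rust
// Instead of the wrap-dependent `if x + 1 < x` test, checked arithmetic
// makes the overflow case explicit and leaves nothing for the optimizer
// to exploit.
fn increment(x: i32) -> Option<i32> {
    x.checked_add(1) // None on overflow, Some(x + 1) otherwise
}

fn main() {
    assert_eq!(increment(41), Some(42));
    assert_eq!(increment(i32::MAX), None); // overflow detected, not wrapped
    println!("ok");
}
```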

I guess, to completely prohibit such optimizations, overflow still has to be partially specified. For example, it can be defined as “wrapped value or no value at all (panic, signal, abort, whatever)”. With this definition the user will never be able to assume wrapping behavior, and the compiler will not be able to perform dangerous optimizations.

P.S. It would be useful to somehow separate this thread into two parts - bikeshedding about renaming int/uint (hopeless, imho) and everything else.

2 Likes

+1 for this totally clear and specific definition for the result of integer overflow.

I'm happy that we're taking these steps towards better-is-better in the final stretch.

With regards to int/uint, see my comment on @CloudiDust's RFC. I think the other two issues were the more important ones, but I hope this decision will also be reconsidered with the newly suggested alternatives intx/uintx or intp/uintp, which address some of the concerns (unfamiliarity) with the previously suggested names.

With regards to the definedness of overflow, please read the RFC and accompanying discussion thread. I tried to make it painfully clear that this is not intended to allow the compiler to optimize based on it. That would be crazy, since the whole point is to increase correctness and reliability, not decrease it! The fact that LLVM's undef cannot be used to represent an overflowing value is explicitly noted in the RFC. What @lifthrasiir wrote is basically correct, except that overflow checks are not an essential part of memory safety (otherwise current Rust would be memory-unsafe!); we have a lot of liberty in terms of what knobs we offer to end-users to control where and when run-time checks get enabled (for which the only relevant factor that's considered should be performance). The decision in this thread's OP is to use a debug/release build distinction, which I'm fine with, at least initially; later we can add more fine-grained options like scoped attributes.

More concretely, I think the semantics of overflow should be to either signal an error somehow, or to lower directly to the platform's native hardware instructions (which usually means two's complement these days). There should also be room for things like delayed exceptions, "as infinitely ranged" semantics / what Ada does (see RFC).

Aren't you concerned about de facto lock-in for overflow?

There is certainly a concern that we will not be able to make overflow stricter without breaking code in the wild. We believe that having overflow checking be mandatory in debug builds should at least mitigate that risk ...

I think another fact which will work in our favor is simply that wraparound is so rarely the desired semantics (again, basically only for hashes and checksums), so it's unlikely that people will end up relying on it for this reason alone. The only other situation where it's useful is the associativity of addition and subtraction operations (i.e. maybe one operation overflows, but the next one "brings it back in range"); this is a more legitimate worry, but hopefully this is something that could be accommodated longer term with some AIR-like advancements.

Finally, I just want to note again that while all of the emphasis tends to be put on overflow, I suspect that the more important benefit will actually be around underflow of unsigned types. As I noted in a comment on the RFC, C++ style guides tend to recommend int even for may-only-be-positive values, simply because you can at least add asserts, while underflow of unsigned types is silent and undetectable. But with this, if we use an unsigned type, all of the shouldn't-become-negative asserts in all of the relevant places get added for us automatically! So using unsigned types suddenly becomes meaningful again.
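The underflow point can be made concrete with a sketch (using the `checked_sub`/`wrapping_sub` method names from later stable Rust): with mandatory debug checks, a decrement below zero on an unsigned type becomes a caught bug rather than a silent wrap to a huge value.

```rust
fn main() {
    let len: usize = 0;
    // In a debug build, `len - 1` panics ("attempt to subtract with
    // overflow"): the shouldn't-become-negative assert comes for free.
    // The explicit forms document intent either way:
    assert_eq!(len.checked_sub(1), None);            // underflow detected
    assert_eq!(len.wrapping_sub(1), usize::MAX);     // the silent C++ result
    println!("ok");
}
```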

One more thing: why was a WrappedInt type considered a better alternative than a WrappingOps trait implemented by the built-in integer types? (For reasoning in favor of the latter, see RFC.)

“One more thing: why was a WrappedInt type considered a better alternative than a WrappingOps trait implemented by the built-in integer types? (For reasoning in favor of the latter, see RFC.)”

I think the choice here will be made on a case-by-case basis: For hash and rng algos, I can see Wrapping<T> being more convenient, because expressions are often complex, so encoding them with functions/methods would be very unergonomic.
For one-off operations, WrappingOps may be better because you don't need to convert values to "wrapping" types and back.

In any case, I imagine that the low-level implementation of wrapping ops in the language will be via a bunch of intrinsics, such as wrapping_add_i32(a:i32, b:i32) -> i32, and everything else will be implemented in the library.
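As it happens, stable Rust ended up shipping both forms discussed here, so the trade-off can be shown directly (`std::num::Wrapping` and the `wrapping_*` methods post-date this thread):

```rust
use std::num::Wrapping;

fn main() {
    // Method form: convenient for one-off operations, no conversion needed.
    let a: u32 = u32::MAX;
    assert_eq!(a.wrapping_add(1), 0);

    // Wrapper-type form: convenient for hash/RNG code where complex
    // expressions would otherwise drown in method calls.
    let h = Wrapping(u32::MAX) + Wrapping(1);
    assert_eq!(h.0, 0); // .0 unwraps back to the plain integer
    println!("ok");
}
```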

I believe here that ROP is referring to return-oriented programming, a technique often used in exploits. I believe the point they were making is that defined behavior is not always desirable behavior.

1 Like

If the intent is to prohibit optimizations, be formal about it and don't use "unspecified values"; use other terminology. Above I showed how unspecified values allow "shady" optimizations similar to optimizations based on UB. If this were C, I'd use "implementation-defined value or no value ('signal an error somehow', as you said)", but in Rust there's nothing "implementation defined" here; Rust already postulates two's complement everywhere else (e.g. signed shifts), so you can freely use "no value or two's complement wrap-around". Repeating my previous message: with this definition the user will never be able to assume wrapping behavior, and the compiler will not be able to perform dangerous optimizations. (And this is similar to what @nodakai suggested, but less restrictive since it doesn't mention debug at all.)

Secondly, please address the following questions, related to overflow and underflow. I'm bothering you, as the author of the RFC, since the core team seems to be on vacation or just doesn't read this thread.

  1. Is unary minus allowed on unsigned numbers?

Currently it assumes wrapping behaviour, but with the new rules it will be the only arithmetic operation on unsigned numbers with this behaviour. It'd be extremely inconsistent to have -1u as a well-defined wrap-around and 0u - 1u as a panic.

If the intent is to eradicate modulo arithmetic from unsigned numbers, then, I'm afraid, unary minus should be nuked too and the operation, if needed, should be performed with a named function. It would be very sad to lose -1 as "the maximum value of unsigned type", but acceptable, I guess.
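For the record, stable Rust ultimately took roughly this route: unary minus on unsigned literals became a compile error, and the wrap-around is spelled with a named method (`wrapping_neg`, which post-dates this thread) or bitwise negation:

```rust
fn main() {
    // `let x: u32 = -1;` does not compile in stable Rust ("cannot apply
    // unary operator `-` to type `u32`"); the "all bits set" value is
    // requested explicitly instead:
    assert_eq!(1u32.wrapping_neg(), u32::MAX); // named wrapping operation
    assert_eq!(!0u32, u32::MAX);               // bitwise-not spelling
    println!("ok");
}
```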

  2. Conversions between signed and unsigned types

Does -1i as uint panic or not? Does 0xFFu8 as i8 panic or not? To be consistent with everything else, they should panic and only half of the range of any signed (unsigned) type should be convertible to their unsigned (signed) counterparts. Again, if a wrapping operation is needed, it should be performed with a named function.
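As it turned out, stable Rust answered this the other way: `as` casts between integer types were defined as bit-truncation/reinterpretation and never panic, with the fallible conversion spelled separately (`TryFrom`, stabilized much later than this thread):

```rust
use std::convert::TryFrom;

fn main() {
    // `as` casts reinterpret/truncate bits; they never panic.
    assert_eq!((-1i32) as u32, u32::MAX); // two's complement reinterpretation
    assert_eq!(0xFFu8 as i8, -1);         // same bits, signed view
    // The checked, possibly-failing conversion got a named form instead:
    assert!(u32::try_from(-1i32).is_err());
    println!("ok");
}
```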

After thinking about these two questions, a new set of symbolic operators with wrapping behaviour looks more and more tempting..

I opened up a new discussion on this topic at http://discuss.rust-lang.org/t/restarting-the-int-uint-discussion/1131 with a lot more detailed analysis.

It’s not a “detailed analysis”. It’s yet another rehash of misinformation and lacks all of the prominent view points on this issue with any consensus behind them.

2 Likes

With the new range syntax, there was a kind of “enumeration” trait introduced, Step, but it doesn’t consider enumerations that might not have a successor value, i.e. it does not handle types without wraparound.

It seems problematic to me (bug report here) – but it is ultimately tied to how we want to handle overflow.

Is it the user’s responsibility when typing 0i8.. to not take more elements from the range iterator than possible?

Will this blow up in debug builds only (which risks that clueless users will conclude they just need to use release builds), etc.?
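A hand-rolled successor function makes the Step concern concrete (enumerate_from is a hypothetical illustration, not the actual trait): with checked arithmetic, a small type like u8 simply runs out of successors instead of overflowing, which is exactly the "no successor value" case the current trait cannot express.

```rust
// Enumerate u8 values from `start` upward, stopping cleanly at the top
// of the range instead of overflowing past it.
fn enumerate_from(start: u8) -> Vec<u8> {
    let mut out = Vec::new();
    let mut cur = Some(start);
    while let Some(n) = cur {
        out.push(n);
        cur = n.checked_add(1); // None once we pass u8::MAX: no successor
    }
    out
}

fn main() {
    assert_eq!(enumerate_from(253), vec![253, 254, 255]);
    println!("ok");
}
```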