On Casts and Checked-Overflow
The RFC 560 text includes the following new error condition (with associated "defined result"):
- When truncating, the casting operation `as` can overflow if the truncated bits contain non-zero values. If no panic occurs, the result of such an operation is defined to be the same as wrapping.
This raises a few questions:
- How is the sign-bit treated? E.g., is the sign-bit of a negative input value considered a truncated bit when casting an `i8` to `u8`? (Note that this implies that `-1_i8 as u8` can panic.)
- Are the truncated bits interpreted directly, or are they logically inverted for a negative input value? For example, one might interpret the above text as saying that `-1_i16 as i8` can panic, since `-1_i16 == 0xffff_i16`, which is non-zero in its upper eight bits.
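To make the second question concrete, the following sketch extracts the bits that an `i16`-to-`i8` cast discards. The helper `truncated_bits` is hypothetical (it is not part of the RFC or the standard library) and exists only to show the two possible readings.

```rust
/// Hypothetical helper: the upper eight bits that `x as i8` throws away.
fn truncated_bits(x: i16) -> u8 {
    // Reinterpret as u16 first so that the shift is logical, not arithmetic.
    ((x as u16) >> 8) as u8
}

fn main() {
    let x: i16 = -1;

    // Read directly, the discarded bits are all ones (non-zero).
    assert_eq!(truncated_bits(x), 0xFF);

    // Logically inverted for a negative input, they are all zero.
    assert_eq!(!truncated_bits(x), 0x00);

    // Either way, the wrapping cast preserves the value -1.
    assert_eq!(x as i8, -1);
}
```

Under the direct reading, this cast would count as an overflow even though the value survives unchanged; under the inverted reading it would not.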
As part of the effort towards finishing off the Arithmetic Overflow Tracking Issue, I have spent some time working through the space of possible interpretations of this text, and have identified three potential interpretations that are each independently useful.
The goal of this post is to describe those three interpretations, and to provoke a dialogue about which interpretation (one of the three, or perhaps another I had not considered) Rust should use for overflow-checking of cast operations.
NOTE: Throughout this text I use literals, written in hexadecimal, in my examples; but the point is that you should think of them as representative of some code that just says `<identifier> as <type>`, and I happen to be telling you what the value and type are via the numeric literal syntax, using hexadecimal to make it clear which bits are set to nonzero values.
Goals and Examples
The goal of the overflow-checking is to catch bugs: cases where the programmer has not clearly expressed their intention, and may be inadvertently throwing away important state in the cast operation. At the same time, we do not want to introduce an undue burden when writing casts that are "intuitively valid."
So for example, here are some simple cases where it seems useful to trigger a panic:
- `0x102_i16 as u8`: an unchecked cast would yield `2`, throwing away the high-order bit. (This seems like the very definition of a dangerous truncation; consider e.g. casts from `u64` to `usize` on 32-bit targets.)
- `0x8003_i16 as i8`: again, an unchecked cast would yield `3`, throwing away the sign-bit (and the fact that this is a negative value of large magnitude).
- `0xFFFE_u16 as i8`: this case is analogous to the previous one; here an unchecked cast would yield `-2`, when the original input value was 65534.
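For reference, here is a sketch of what the wrapping `as` does with these three bit patterns, next to the standard library's `TryFrom` conversions, which report an out-of-range value as an error instead of panicking. This is not the check the RFC proposes; it only confirms the results quoted in the list. Because the compiler rejects out-of-range literals such as `0x8003_i16`, the sketch builds that input from a `u16` bit pattern.

```rust
use std::convert::TryFrom;

fn main() {
    // 0x102 is 258: the wrapping cast keeps only the low eight bits.
    let a: i16 = 0x102;
    assert_eq!(a as u8, 2);
    assert!(u8::try_from(a).is_err());

    // The bit pattern 0x8003, read as an i16, is -32765.
    let b = 0x8003_u16 as i16;
    assert_eq!(b, -32765);
    assert_eq!(b as i8, 3);
    assert!(i8::try_from(b).is_err());

    // 0xFFFE as a u16 is 65534; the wrapping cast yields -2.
    let c: u16 = 0xFFFE;
    assert_eq!(c as i8, -2);
    assert!(i8::try_from(c).is_err());
}
```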
But here are some examples where at least it is not so clear cut whether a panic is warranted (if not outright obvious that we should not panic):
- `0xFFFE_i16 as i8`: this is a cast of -2.
  - It seems like this should be entirely safe to cast to `i8`; but as noted in the questions above, ensuring we do not panic means we need to not treat the higher 8 bits as truncated here.
  - Note also that the input bit pattern here is the same as the one we gave in the third example above of where panic seemed warranted -- so only the type of the left-hand side is what makes the difference here.
- `0xFF_i8 as u8`: this is a cast of -1 to a range that cannot represent -1. However, I think one quite frequently encounters cases where one casts directly to the unsigned counterpart of a signed type in order to e.g. be able to do logical right-shift on the bits (i.e. shifting in zeroes rather than the sign-bit).
- `0x81_i16 as i8`: this is a cast of 129 to a range that cannot represent the value 129; but one can interpret the highest-order bit as a sign-bit, yielding the signed value -127.
  - So, in one sense, no bits of information have been lost (and thus there has been no truncation).
  - But in another sense, the denotation of the value has been completely changed, and thus perhaps a panic is warranted.
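The bit-manipulation use case mentioned for `0xFF_i8 as u8` looks roughly like this in practice. Right-shifting a signed integer in Rust is an arithmetic shift that copies the sign-bit, so a cast to the unsigned counterpart is a common way to get a logical shift. The helper name `logical_shr` is hypothetical.

```rust
/// Hypothetical helper: logical right-shift of an `i8` by `n` bits (n < 8),
/// implemented by round-tripping through the unsigned counterpart.
fn logical_shr(x: i8, n: u32) -> i8 {
    ((x as u8) >> n) as i8
}

fn main() {
    let x: i8 = -1; // bit pattern 0xFF

    // Arithmetic shift on the signed type: the sign-bit is shifted in.
    assert_eq!(x >> 1, -1);

    // Logical shift via the unsigned type: a zero is shifted in.
    assert_eq!((x as u8) >> 1, 0x7F);
    assert_eq!(logical_shr(x, 1), 127);
}
```

If `-1_i8 as u8` were allowed to panic, every such round trip would need a different spelling.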
Terminology
In the text below I use some technical phrases, which I will define here:
- An integral type is one of the types `iN` or `uN` for some `N` in {`8`, `16`, `32`, `64`, `size`}.
- The bitwidth of an integral type `iN` or `uN` is `N`. (Note that the bitwidth `size` is considered distinct from both `32` and `64`, regardless of the target architecture's address size.)
- The phrase "the signed version of `I`" (for some integral type `I` = `iN` or `uN`) denotes `iN`.
- The phrase "the unsigned version of `I`" (for some integral type `I` = `iN` or `uN`) denotes `uN`.
- Unless specified otherwise, `t` is some integral type.
- Unless specified otherwise, `x` is an identifier that has some integral type (which may or may not be equal to `t`).
- The phrase "the mathematical value of `x`" means the value of `x` when interpreted (according to its type) as a signed integer of arbitrary precision. Thus:
  - the mathematical value of `0xFF_i8` is -1
  - the mathematical value of `0xFF_u8` is 255
  - the mathematical value of `0x8000_i16` is -32,768
  - the mathematical value of `0x8000_u16` is 32,768
- The phrase "`x` falls in the range of `t`" means the mathematical value of `x` falls in the closed interval [min, max], where min and max are the mathematical values of `t::MIN` and `t::MAX`.
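Since every integral type considered here has at most 64 bits, one way to compute a mathematical value in code is a widening cast to `i128`. The sketch below assumes that `i128` is an adequate stand-in for "arbitrary precision"; the function name `falls_in_range` is hypothetical, and the out-of-range literals from the list are written as reinterpreting casts because the compiler rejects `0xFF_i8` as a literal.

```rust
/// Hypothetical helper: does the mathematical value `v` fall in [min, max]?
fn falls_in_range(v: i128, min: i128, max: i128) -> bool {
    min <= v && v <= max
}

fn main() {
    // Widening to i128 sign-extends signed inputs and zero-extends
    // unsigned ones, which yields the mathematical value.
    assert_eq!((0xFF_u8 as i8) as i128, -1);
    assert_eq!(0xFF_u8 as i128, 255);
    assert_eq!((0x8000_u16 as i16) as i128, -32_768);
    assert_eq!(0x8000_u16 as i128, 32_768);

    // "x falls in the range of t" for x = 0xFF_u8 and t = i8, then t = u8.
    let x = 0xFF_u8 as i128;
    assert!(!falls_in_range(x, i8::MIN as i128, i8::MAX as i128));
    assert!(falls_in_range(x, u8::MIN as i128, u8::MAX as i128));
}
```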
The Three Interpretations
So, with that in mind, here are the three interpretations I have identified:
- "Strict Range" - `x as t` may panic unless `x` falls in the range of `t`.
- "Width Oriented" - `x as t` may panic unless either
  - the bitwidths of the type of `x` and `t` are equal, or
  - `x` falls in the range of `t`
- "Loose Range" - `x as t` may panic unless either
  - `x` falls in the range of the signed version of `t`, or
  - `x` falls in the range of the unsigned version of `t`

(There may exist other interpretations of the text beyond these three, but these were the ones that I identified that seemed potentially useful.)
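The three rules can be written down as predicates over the mathematical value of `x` and descriptions of the source and target types. The following is a minimal sketch under that assumption, not the code from the linked gist; `Width`, `IntTy`, and the function names are all hypothetical. Each predicate returns `true` when its interpretation permits a panic.

```rust
/// Bitwidth labels; `Size` is kept distinct from the fixed widths.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Width { W8, W16, W32, W64, Size }

/// Hypothetical description of an integral type `iN` or `uN`.
#[derive(Clone, Copy)]
struct IntTy { signed: bool, width: Width }

const I8: IntTy = IntTy { signed: true, width: Width::W8 };
const U8: IntTy = IntTy { signed: false, width: Width::W8 };
const I16: IntTy = IntTy { signed: true, width: Width::W16 };

impl IntTy {
    /// Closed interval [MIN, MAX] of mathematical values for this type.
    fn range(self) -> (i128, i128) {
        let bits = match self.width {
            Width::W8 => 8,
            Width::W16 => 16,
            Width::W32 => 32,
            Width::W64 => 64,
            Width::Size => usize::BITS,
        };
        if self.signed {
            (-(1_i128 << (bits - 1)), (1_i128 << (bits - 1)) - 1)
        } else {
            (0, (1_i128 << bits) - 1)
        }
    }

    fn contains(self, v: i128) -> bool {
        let (min, max) = self.range();
        min <= v && v <= max
    }
}

/// "Strict Range": a panic is permitted unless `v` is in the range of `dst`.
fn strict_range(v: i128, _src: IntTy, dst: IntTy) -> bool {
    !dst.contains(v)
}

/// "Width Oriented": equal bitwidths also rule out a panic.
fn width_oriented(v: i128, src: IntTy, dst: IntTy) -> bool {
    src.width != dst.width && !dst.contains(v)
}

/// "Loose Range": either signedness of the target width rules out a panic.
fn loose_range(v: i128, _src: IntTy, dst: IntTy) -> bool {
    let signed = IntTy { signed: true, ..dst };
    let unsigned = IntTy { signed: false, ..dst };
    !signed.contains(v) && !unsigned.contains(v)
}

fn main() {
    // -1_i8 as u8: only "Strict Range" permits a panic.
    assert!(strict_range(-1, I8, U8));
    assert!(!width_oriented(-1, I8, U8));
    assert!(!loose_range(-1, I8, U8));

    // -1_i16 as u8: "Strict Range" and "Width Oriented" permit a panic.
    assert!(strict_range(-1, I16, U8));
    assert!(width_oriented(-1, I16, U8));
    assert!(!loose_range(-1, I16, U8));

    // 0x102_i16 as u8: all three permit a panic.
    assert!(strict_range(0x102, I16, U8));
    assert!(width_oriented(0x102, I16, U8));
    assert!(loose_range(0x102, I16, U8));
}
```

Passing the source type to every predicate keeps the signatures uniform, even though only "Width Oriented" inspects it.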
Some examples:
- All three interpretations say that `-1_iN as i8` can never panic, for any `N`, because -1 falls within the range [-128,127].
- "Strict Range" and "Width Oriented" both say that `-1_i16 as u8` can panic, since -1 falls outside the range [0,255].
- "Strict Range" says that `-1_i8 as u8` can panic, for the same reasoning as above.
- However, "Width Oriented" says that `-1_i8 as u8` can never panic, because the bitwidths of the input and output types are equal. (Note that this implies that `-1_i16 as i8 as u8` can never panic, even though `-1_i16 as u8` can.)
- "Loose Range" says `-1_iN as u8` can never panic (for any `N`), since -1 falls in the range [-128,255].
No two of the three interpretations are semantically-equivalent; for any two interpretations, there exist inputs where the panic behavior may differ (as illustrated in the examples above).
Comparison
Let's assume that one of the above three interpretations is the one we desire. The question is: Which one?
My current preferential ordering (most preferred first) is:
- "Width Oriented"
- "Loose Range"
- "Strict Range"
I believe a good solution must support `-1_i8 as u8`; at the same time, I think Rust should be allowed to panic on `129_i16 as i8` (more on this latter point in a few paragraphs).
"Width Oriented" is at the top because it satisfies both of the previously listed conditions and, in general, appears to have desirable behavior (see the illustrative implementation linked below).
"Strict Range" is at the bottom: I think `0xFF_i8 as u8` should not be allowed to panic; we need to make it easy to do bit-oriented computations between types of equivalent bitwidth, especially when it's not losing any actual bits of information.
"Loose Range" is in the middle because it beats "Strict Range" (by supporting `-1_i8 as u8`), but it is not at the top because I think it is strange.
- At first I thought "Loose Range" was a strong contender because it seems very uniform. But consider the cast `0x81_i16 as i8` (aka `129_i16 as i8`): the "Loose Range" interpretation allows this (i.e. will never panic), and converts the value 129 to -127. The "Width Oriented" interpretation, on the other hand, allows a panic to occur here, since 129 falls outside the range [-128,127].
- Note that "Width Oriented" allows a panic on `0x81_i16 as i8` and `0x81_u16 as i8` but forbids panic on `0x81_u8 as i8`; in all cases the input value is 129, but the relevant difference is the type. We consider it safe to do the `u8` to `i8` cast, because we assume that the matching bitwidths indicate that the reinterpretation of the sign-bit is intentional.
- Another way to look at this whole situation is that "Width Oriented" avoids a truncation of the sign-bit in such a case.
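The 129 case can be checked mechanically. The sketch below specializes the "Width Oriented" and "Loose Range" rules to a target type of `i8`; the function names are hypothetical, and each returns `true` when a panic is permitted.

```rust
/// Hypothetical "Width Oriented" check, specialized to a target of `i8`.
/// `value` is the mathematical value of the input, `src_bits` its bitwidth.
fn width_oriented_may_panic_to_i8(value: i64, src_bits: u32) -> bool {
    src_bits != 8 && !(-128..=127).contains(&value)
}

/// Hypothetical "Loose Range" check for a target of `i8`:
/// the union of the `i8` and `u8` ranges is [-128, 255].
fn loose_range_may_panic_to_i8(value: i64) -> bool {
    !(-128..=255).contains(&value)
}

fn main() {
    // The wrapping result is -127 whatever the source type is.
    assert_eq!(129_u8 as i8, -127);
    assert_eq!(129_i16 as i8, -127);
    assert_eq!(129_u16 as i8, -127);

    // "Width Oriented": only the equal-width cast is guaranteed not to panic.
    assert!(!width_oriented_may_panic_to_i8(129, 8)); // 129_u8 as i8
    assert!(width_oriented_may_panic_to_i8(129, 16)); // 129_i16 or 129_u16 as i8

    // "Loose Range": never permits a panic for 129, regardless of source type.
    assert!(!loose_range_may_panic_to_i8(129));
}
```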
Illustrative Implementation
The following linked gist has some code illustrating the three strategies and their behavior on various boundary cases when casting to `i8` or `u8`, as well as a transcript of the code running.