[off-topic] uintptr_t round tripping in C

From "Tootsie Pop" model for unsafe code:

(Trying out Discourse's "reply as linked topic" feature, since this has little to do with the source discussion or even Rust, but I feel like it's still worth replying publicly in case anyone cares.)

Yikes. I didn't even think of the possibility that the standard wording could be twisted that way. I'm not sure it actually holds up though, because the standard also says (6.5.9.6):

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Since the original and integer-round-tripped pointers compare equal, one of those must be true. Assume the original pointer is dereferenceable (not null or one-past-end); if it happens to point to the last element of an array, a pathological implementation could claim that the round-tripped version has "one-past-end" provenance, but in every other case, the two pointers must be "pointers to the same object".
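For anyone who wants the round trip spelled out, here is a minimal sketch. The function names are mine, not from the standard or the source thread. It only checks the equality that the `uintptr_t` definition promises for `void *`; whether the result may then be dereferenced is exactly what is being argued about above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a pointer to uintptr_t and back. The conversion goes through
   void * because the standard's uintptr_t guarantee is stated for void *. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
    return (void *)bits;             /* integer -> pointer */
}

/* Returns true when the round-tripped pointer compares equal to the original. */
bool round_trip_compares_equal(int *p)
{
    int *q = round_trip(p);
    return q == p;
}
```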

6.5.3.2.4 is somewhat vague about what can be dereferenced:

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined. 102)

  102) [...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

I suppose it's not explicitly ruled out that something can be a "pointer to the same object" yet also an "invalid value" – but by that logic *&x would also not be guaranteed to work, since the & operator's definition merely states that it "yields the address of its operand" (6.5.3.2.3), without any explicit language that the pointer value is "valid".

This can be distinguished from at least the most common cases where valid and invalid pointers can happen to compare equal on standard systems:

  • freed pointers: have indeterminate value; the value might or might not be allowed to change by itself, which would provide an excuse for dereferencing being undefined, but in any case doing so is explicitly banned in a footnote to 6.5.3.2.4:

    Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

  • one-past-end pointers: explicitly banned by 6.5.6.8:

    If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
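To make the one-past-end case concrete, here is a small sketch (the function name `sum_ints` is just for illustration). The one-past-end pointer is formed and compared, which is allowed, but never used as the operand of unary `*`.

```c
#include <stddef.h>

/* Sum an array, using a one-past-the-end pointer purely as a loop bound. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;  /* may point one past the last element */
    int total = 0;
    for (const int *it = first; it != end; ++it) {
        total += *it;                /* `it` never equals `end` here */
    }
    return total;
}
```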

Though I did find a GCC bug report where something similar (if less useful) is currently broken by optimizations; there's some disagreement about standard wording in there, but discussion seems to have petered out without deciding whether GCC's behavior is justifiable or not.

Anyway, it doesn't matter. Most programs that cast from uintptr_t to pointer don't do so just to test for equality. If your hardware doesn't support that, it will break those programs no matter what the standard says. A more useful question is how popular such casting is: I think most uses of uintptr_t are one-way (for hashing, printing, testing alignment, etc.), so it's hard to figure that out with a simple grep, but certainly many are not.
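By "one-way" I mean something like the following sketch, where the integer is inspected but never converted back to a pointer. Both helpers are hypothetical, and both assume the usual implementation-defined mapping where the integer is the flat address; the standard itself does not promise that the low bits reflect alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is aligned to `alignment`, which must be a power of two.
   Assumes the integer value of the pointer is its address. */
bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Map a pointer to a bucket index. `bucket_count` must be nonzero.
   The mixing step is arbitrary; this is not a recommended hash. */
size_t hash_pointer(const void *p, size_t bucket_count)
{
    uintptr_t bits = (uintptr_t)p;
    bits ^= bits >> 4;               /* fold in bits above the alignment zeros */
    return (size_t)(bits % bucket_count);
}
```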

Since this is a Rust forum – is there anything in Rust's stable API that inherently requires usize round-tripping to work? Is there any possibility that linting on casting/transmuting from integer to pointer is a good idea? I suspect that hardware that doesn't support it is too obscure for this to be the case, but I could be wrong.

I’d be incredibly surprised if there was any stable API that required it to work; my main reason for posting it was that the sole API reason to push the unsafe boundary wider than the module is to support it.

The maintenance reason - that refactoring things into different modules could change unsafe behavior - is not especially convincing IMO, and when the only API reason is on shaky ground…

The only reason one would want to go *T -> usize -> *T (rather than just sticking to *const T or *mut T) would be to take advantage of the numeric value, i.e. to perform math on it, and then dereference the derived value. THAT is wildly unsafe no matter how you slice it, because anything you can do that way could either be done with mem::transmute and “safer” pointer arithmetic, or is blatantly risking UB (e.g. pointer subtraction, which Rust lacks for Very Good Reasons).

If you never alter the usize - which is the only case in which ANY behavior is guaranteed - there’s no point to it anyway.

Also, there’s the whole thing about what “object” means - the object isn’t the data you read out, it’s the allocation. Pointers into different objects must not be equal, but a pointer into an object does not necessarily mean the ability to read from that object.

My understanding is that the CHERI thing takes advantage of the “subobject at its beginning” bit - a zero-length subobject, which thus cannot have any bytes read from it.

Have a look at the implementation of Once which uses an AtomicUsize to hold a pointer and some state in the low two bits (which are always 0 due to alignment).
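The same trick can be sketched in C, to match the thread title. This is not the actual Once implementation; `struct waiter`, the function names, and the 4-byte alignment are assumptions made for the example. The pointer and a two-bit state share one atomic word, and the state is masked off before converting back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)    /* low two bits carry the state */

/* Hypothetical node type; the alignment request keeps the low two bits free. */
struct waiter { _Alignas(4) int id; };

/* Pack a pointer and a two-bit tag into one atomic word. */
void store_tagged(atomic_uintptr_t *slot, struct waiter *w, unsigned tag)
{
    uintptr_t bits = (uintptr_t)(void *)w;
    assert((bits & TAG_MASK) == 0);  /* relies on address-like integer values */
    assert(tag <= TAG_MASK);
    atomic_store(slot, bits | tag);
}

/* Recover the pointer by clearing the tag bits before converting back. */
struct waiter *load_pointer(atomic_uintptr_t *slot)
{
    uintptr_t bits = atomic_load(slot);
    return (struct waiter *)(void *)(bits & ~TAG_MASK);
}

/* Recover the two-bit tag. */
unsigned load_tag(atomic_uintptr_t *slot)
{
    return (unsigned)(atomic_load(slot) & TAG_MASK);
}
```

Note that once the tag is masked off, the integer handed to the cast is the same value the original pointer produced, so on paper this is still the "unaltered" round trip.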

Ah, perhaps I should clarify: While implementations of stable APIs may well rely on it, I’d be incredibly surprised if it was exposed in the APIs themselves (and was thus uncorrectable).
