Types as Contracts

What’s actually the problem with making a type-punned load equivalent to a cast? You can’t get type-punned loads in purely safe code anyway. We could e.g. say that integers written in safe code, when loaded as pointers, aren’t valid pointers. That would be quite brutal, but it would be sound (or, slightly less brutally: if a value is stored to memory as an integer [in safe code?] and loaded as a pointer, the result is a non-dereferencable pointer).

Actually, the annoying part is that a “non-standard memcpy” could mix up your pointers with non-dereferencable pointers that have the same integer value. But there are other reasons to prohibit non-standard memcpys.

What’s actually the problem with making a type-punned load equivalent to a cast?

A consequence of this choice is that redundant load-stores cannot be optimized away. That is what I tried to explain in my last comment above: This

int *foo(int *a) {
  int **p = &a;
  int x = *(int*)p;
  *(int*)p = x;
  return a;
}

should be optimizable to

int *foo(int *a) {
  return a;
}

However, if the load (when type-punned) performs a cast, this optimization removes a ptr-int-ptr cast, which must not be done.

I got another one. :wink: (I am dumping these counterexamples here mostly so that I know where to go if I search for them again in the future.)

An interesting question is whether comparing two pointers where one is one-past-the-end is allowed to be non-deterministic. That is, if I write p == q twice, am I guaranteed to get the same result? The C++ standard can be read to not guarantee this, while C (as pointed out by @comex) is more restrictive and seems to rule out non-deterministic comparison in this case. For the original example this makes no difference, but there is a problem with guaranteeing deterministic comparison. Consider the following example (by Juneyoung Lee, I didn’t come up with this myself):

int f() {
  int p[1], q;
  p[0] = 0;
  if (p+1 == &q) {
       if (&q == (int*)0x200) { // or `(int)&q == 0x200`
         *(int*)0x1FF = 1;
       }
  }
  return p[0];
}

Most alias analyses will conclude that the function always returns 0. If pointer comparison is allowed to be non-deterministic, we can justify that by deciding that p+1 == &q always returns false, no matter the actual addresses involved. But following the C standard and making pointer comparison deterministic means that if q actually sits at 0x200, the access to 0x1FF should be guaranteed to be valid and to change p[0]. Essentially, comparing p with &q “captures” p and makes it available at a concrete physical address.


Btw, one may even take the opinion that C++ allows such pointer comparisons to yield an indeterminate value, aka undef or poison in LLVM. That would be a problem for Rust, because comparison of raw pointers is allowed in safe code…

Even if LLVM decides to make it “just” non-deterministic, that would still be observable from safe code unfortunately. The following two functions would not be equivalent:

fn foo<T>(x: *const T, y: *const T) -> (bool, bool) {
  let b = x == y;
  (b, b)
}

and

fn foo<T>(x: *const T, y: *const T) -> (bool, bool) {
  (x == y, x == y)
}

Your example C code requires the C standard to guarantee that the following are equivalent: p + 1 and (int*)(((ptrdiff_t)p) + sizeof(*p)), where int *p;. And even then, you’d trigger UB due to the unaligned access at 0x1FF.

Though, the only guarantees needed by Rust should be in the LLVM language reference, not the C standard.

I basically got this upside down. My "brutal" proposal was:

  1. every address in memory has an "is dereferencable pointer" bit.
  2. if a pointer is loaded from an address without that bit, you get a non-dereferencable pointer.
  3. pointer stores preserve the bit status of the original pointer.
  4. integer stores always leave the bit clear.
  5. integer to pointer casts create a dereferencable pointer, and are the only way of creating a new dereferencable pointer out of nothing.
  6. memcpy leaves the bit as it was before.

note 1: this disregards unaligned accesses, but they can probably be handled too.
note 2: transmute is equivalent to a type-punned load.

Derivative properties:

  1. having the "dereferencable" bit set always causes less UB than having it clear. This means that a load/store pair can always be removed without making code less defined.
  2. loads of dereferencable pointers always sync against either an int->ptr cast or a primitive pointer operation.
  3. All integers with the same numeric value should be equivalent.
  4. memcpy can't be implemented in Rust because you can't copy the dereferencable bits manually, but rather needs to be a compiler intrinsic.
  5. int->ptr cast becomes a "side-effectful" unsimplifiable operation. I'm not sure how bad it is from an optimization perspective, but I don't see a good way around it (this might mean we want to shift users who merely want to smuggle an integer inside a pointer, like slice iterators, to use a type-punned load).
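These rules can be sketched as a toy word-addressed interpreter memory. All names below are invented for illustration (this is not miri or any real implementation); each slot holds a numeric value plus the extra "dereferencable" bit:

```rust
// Toy model of rules 1-6: a memory of word-sized slots, each carrying
// the "is dereferencable pointer" bit alongside its numeric content.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Value {
    bits: u64,      // numeric content of the slot
    deref_ok: bool, // the extra "dereferencable" bit (rule 1)
}

struct Memory {
    slots: Vec<Value>,
}

impl Memory {
    fn new(n: usize) -> Memory {
        Memory { slots: vec![Value { bits: 0, deref_ok: false }; n] }
    }
    // rule 4: integer stores always leave the bit clear
    fn store_int(&mut self, addr: usize, bits: u64) {
        self.slots[addr] = Value { bits, deref_ok: false };
    }
    // rule 3: pointer stores preserve the bit status of the original pointer
    fn store_ptr(&mut self, addr: usize, ptr: Value) {
        self.slots[addr] = ptr;
    }
    // rule 2: loading a pointer from a slot without the bit yields a
    // non-dereferencable pointer (the bit simply comes back clear)
    fn load_ptr(&self, addr: usize) -> Value {
        self.slots[addr]
    }
    // rule 5: int-to-ptr casts are the only way to mint the bit from nothing
    fn int_to_ptr(bits: u64) -> Value {
        Value { bits, deref_ok: true }
    }
    // rule 6: memcpy copies slots wholesale, bit included
    fn memcpy(&mut self, dst: usize, src: usize, n: usize) {
        for i in 0..n {
            self.slots[dst + i] = self.slots[src + i];
        }
    }
}

fn main() {
    let mut m = Memory::new(8);
    let p = Memory::int_to_ptr(0x1000); // freshly minted, dereferencable
    m.store_ptr(0, p);
    assert!(m.load_ptr(0).deref_ok);    // rule 3: the bit survived the store
    m.store_int(0, 0x1000);
    assert!(!m.load_ptr(0).deref_ok);   // rules 2+4: same bits, bit gone
    m.store_ptr(1, p);
    m.memcpy(2, 1, 1);
    assert!(m.load_ptr(2).deref_ok);    // rule 6: memcpy keeps the bit
}
```

Note how storing the same numeric value as an integer and reloading it as a pointer yields a value that is bit-identical but non-dereferencable, which is exactly the distinction the derivative properties rely on.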

Which example are you referring to?

So, essentially, this introduces a distinction between "absolute pointers" (created by casting from an integer) and "plain integers"? Do you have another class of values, like "abstract pointers"? (I think they are needed to explain things like the "inbounds" restriction.)

I don't entirely follow though; I don't see what this changes for my example. What happens on integer loads of a (non)dereferencable pointer?

Unsimplifiable in which way? Dead casts can still be removed, right? Just casting back and forth cannot be removed, which I agree with for different reasons.

By itself this only creates a distinction between "non-dereferencable pointers" (created by type-punning integers) and "dereferencable pointers" (created by more legitimate means). This moves the pointer provenance problem away from integer optimizations.

I'm still not sure what's the best way to handle pointers created from integers - I think that a permission-limited version of the intptrcast (which has side-effectful ptr->int casts and inter-object pointer comparisons) - where an "absolute pointer" can alias any locks that were open when it was created - could work while allowing enough optimizations.

With respect to "magic integers" - aka Juneyoung Lee's example:
I don't think we can sanely make it undefined as long as we allow same-numeric integers to be identical. But I have always felt that comparing pointers into different objects (e.g. the p+1 == &q here) is "exactly as bad as" comparing these pointers as integers, and "broadcasts" the pointer. We should be able to compare pointers to the same object without "broadcasting" it, but we should stop doing weird optimizations between pointers to different objects.

With respect to your old examples:

int *foo(int *a) {
  int **p = &a;
  int x = *(int*)p;
  // this stores a non-dereferencable pointer
  *(int*)p = x;
  return a;
// anyone who accesses `*a` will get a non-dereferencable pointer
}

Optimized:

int *foo(int *a) {
  return a;
// This returns a pointer which might be dereferenced
// under the same circumstances as `a`, which isn't less
// defined than a pointer that can never be dereferenced.
}

If you try to load through one of these non-dereferencable pointers (or a GEP of it), you get back UB (you can still round-trip it to an integer and get the same number value, and depending on how we describe int -> ptr casts, maybe also do a proper int -> ptr cast on that integer and get a pointer you can dereference).

That depends on how you figure out the semantics for them, which I'm not exactly sure how it would be done. I believe it's possible to have them removable.

Seems reasonable. LLVM does a bunch of optimizations a la "these pointers come from different mallocs, so they will compare unequal", though.

So you suggest even integer values track this "dereferencable" bit? That's not compatible with "bit-equal integers can be substituted" though, as that extra bit may still differ.

Not if you also allow substitution of bit-equal integers, as the first example shows.

That's different from "broadcasting" rules - broadcast/listen semantics mean that a "listening" pointer might always be "derived from" a "broadcast" pointer. We probably want to put some restrictions there to maintain the best set of reorderings.

unsafe fn foo(x: *const u32) -> u32 {
    let other = malloc(12);
    // this is a "broadcast" (because malloc is from
    // a different object than `x`), so it is side-effectful
    // and can't be removed in an "endo-optimization". 
    // It is equivalent to:
    // > {
    // >     if different_objects(x, other) {
    // >         broadcast(x);
    // >         broadcast(other);
    // >     }
    // >     raw_equals(x, other)
    // > }
    x == other;

    let ptr = 0x1337 as *mut u32; // this is the "listen"
    // and now listen is possibly derived from broadcast, so they might alias
    *ptr = 42;
    *x
}

I like the "ordered" broadcast/listen semantics in theory because they seem clear. I hope that these semantics (with the consequence of pointer comparisons not being dead-code-eliminable) don't actually affect performance so badly in real workloads, because pointer comparisons should be restricted to code that is already side-effectful, but I didn't investigate that too far.

No! Integers always have the dereferencable bit unset (that's written as "4. integer stores always leave the bit clear."), so the store *(int*)p = x; stores a non-dereferencable value into the address, irrelevant of what was loaded.

Sure. Casts are very significant if you care about permissions - you can't even do an "IFC" (information flow control) based model.

unsafe fn is_ub_1012A() -> u32 {
    let mut x = 0;
    let n = (&x as *const u32 as usize);
    unsafe {
        // this is UB - `x` was never mutably borrowed,
        // but it is somehow aliased.
        *(n as *mut u32) = 1;
    }
    x
}

unsafe fn is_not_ub_1012B() -> u32 {

    let mut x = 0;
    let n = (&x as *const u32 as usize);
    unsafe { 
        // this is not UB, but if we just CSE p with n, and
        // dead-code eliminate the casts, we get to
        // the first example, which is UB.
        let mut p = (&mut x as *mut u32 as usize);
        assume(p == n); // CSE is OK here
        *(p as *mut u32) = 1;
    }
    x
}
int f() {
  int p[1], q;
  p[0] = 0;
  if (p+1 == &q) {
       if (&q == (int*)0x200) { // or `(int)&q == 0x200`
         *(int*)0x1FF = 1;
       }
  }
  return p[0];
}

@arielb1 I can follow your reasoning (mostly), but I don’t think I like the consequences. I am happy with specifying casts back-and-forth to not be NOPs, so that int2ptr(ptr2int(x)) cannot be reduced to x. But I don’t think a cast or comparison whose result is not used should have any consequences (other than maybe removing UB), that seems rather counter-intuitive to me.

unsafe fn is_ub_1012A() -> u32 {
    let mut x = 0;
    let n = (&x as *const u32 as usize);
    unsafe {
        // this is UB - `x` was never mutably borrowed,
        // but it is somehow aliased.
        *(n as *mut u32) = 1;
    }
    x
}

unsafe fn is_not_ub_1012B2() -> u32 {
    let mut x = 0;
    let n = (&x as *const u32 as usize);
    unsafe { 
        // this is not UB, but if we just CSE p with n, and
        // dead-code eliminate the casts, we get to
        // the first example, which is UB.
        let mut p = (&mut x as *mut u32 as usize);
        // (desugared `assume`)
        let eq = p == n;
        if eq {
            *(p as *mut u32) = 1;
        } else {
            std::intrinsics::unreachable(); // trigger UB
        }
    }
    x
}

I think 1012A/1012B example shows that we can't have all of the following in a single IR:

  1. If every mutable borrow of an address is only used locally, the value behind that address can't be modified except directly through that borrow (this is implied by the vague principle that "every mutation must happen through a mutable borrow").
  2. equal integers are equivalent - if 2 integer values are numerically equal at every potential dynamic execution, they can be interchanged while preserving semantics.
  3. DCE of integer equality.
  4. Removal of a conditional with 1 branch being immediate UB ("assume-based optimization") - other conditional optimizations work too.
  5. Dead code elimination of both int-to-ptr and ptr-to-int casts

I think (1) is important (and, as your examples show, it might be replaceable with many other aliasing rules), and (2), (3), and (4) are important "integer" optimizations I would like to keep, so (5) is the odd one out.

If we want 1 IR to do all the high-level optimizations, I don't see a good way of getting around this proof. We could have a "2 IR" semantics - say, non-constant branches can never be eliminated in MIR and are the "point of broadcast", while a lower IR can remove them. We're probably going to have a "2 IR" semantics anyway - a peephole optimizer doesn't have to lower a dead int -> ptr cast to anything - but we need a "higher-level" IR for inlining, and I think we want it to have 1 through 4.

Thanks, that is an interesting example! And sorry for the long delay. I had mostly thought about LLVM itself, where this distinction of “the address has been taken but not in a way that makes things mutable” does not come up.

It doesn’t seem to me that it’s the ptr-to-int cast whose DCE is the problem – it’s taking the address of x in a mutable way and getting that into a raw pointer. That is, after p got replaced by n in the conditional, it seems fine to turn &mut x as *mut u32 as usize into just &mut x as *mut u32 – but that one effect, of “making memory mutably accessible by raw pointer accesses”, cannot be removed. One could debate whether just &mut x is enough, but I think the idea that “casting to a raw pointer means you potentially leak the address” makes sense. This kind of cast is significant. It would also explain why the same problem does not come up in LLVM, where there is no distinction between references and raw pointers.

Of course, it would be a shame if, as a result of this, casting references to raw pointers for the purpose of comparing the addresses would pessimize optimizations. Maybe, finally, reference comparison should get an intrinsic.
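For what it’s worth, `std::ptr::eq` already packages up address comparison of two references without spelling out the casts at each use site (internally it still goes through raw pointers, so by itself it does not resolve the semantic question, but it would be a natural home for such an intrinsic):

```rust
fn main() {
    let a = 1u32;
    let b = 1u32;
    let r1 = &a;
    let r2 = &a;
    let r3 = &b;
    // Same location: addresses compare equal.
    assert!(std::ptr::eq(r1, r2));
    // Equal values, but distinct live locals, hence distinct addresses.
    assert!(!std::ptr::eq(r2, r3));
}
```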

My feeling is that pointer to usize conversions and "cross-object" pointer comparisons are rarer than reference to pointer conversions - reference to pointer conversions occur at every unsafety abstraction boundary.

I’ve got a proposal that is a sort of a combination of the intptrcast model and @arielb1’s comment. Notably:

  • Properties 1-4 of the 5 shown to be inconsistent by 1012A/B hold.
  • Equal pointers are equivalent.
  • Pointer equality tests do not have side effects and can be eliminated if dead.
  • int-to-ptr casts do not have side effects, are always defined if correctly aligned (I want this for, e.g., ptr::dangling), and can be eliminated if dead.
  • ptr-to-int(int-to-ptr(x)) can be optimized to x.
  • Where possible, operations result in poison instead of producing undefined behavior, allowing more code movement.
  • All operations but int-to-ptr, even if they are side-effecting, are referentially transparent and idempotent. If you run them twice on the same values, you get the same results and can eliminate one of the calls.
  • Even with int-to-ptr, an equality check between the result of two int-to-ptr calls with equal values will give either poison or true, so can be optimized to true. Put another way, int-to-ptr(x) == int-to-ptr(y) can always be optimized to x == y, even if the casts are done at wildly different times.
  • Programs that need to treat pointers as integers but don’t need to reconstruct pointers from those integers can use transmute and skip the side effects of a ptr-to-int cast. This is useful in, e.g., hash tables.
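The transmute-vs-cast distinction from the last bullet can be spelled out in today’s Rust. Both forms below produce the same number; only the proposal assigns them different side effects (the use of `mem::transmute` here follows the proposal’s wording, not any semantic difference in current Rust):

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;
    // The `as` cast: in the proposal, this "broadcasts" the allocation,
    // permitting later int-to-ptr casts to reconstruct a valid pointer.
    let via_as = p as usize;
    // The transmute: in the proposal, this only observes the numeric
    // value and leaves the allocation abstract (useful for hash tables).
    let via_transmute: usize = unsafe { std::mem::transmute(p) };
    // Numerically, the two are indistinguishable.
    assert_eq!(via_as, via_transmute);
}
```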

The idea is that every pointer is represented as a combination of 3 things. Like intptrcast’s pointers, they have an allocation (id) that they are part of and an offset into that allocation. Like @arielb1’s proposal, they have a validity bit, saying whether the pointer was constructed in a “reasonable” manner. When pointers are stored in memory, this extra information is stored in a side table, is fetched on pointer loads, is copied on memcpy (if we want to allow a byte-by-byte memcpy instead of an intrinsic, I can see ways to do that), and is cleared on integer stores.

The validity bit serves two purposes. First of all, it makes it so that a pointer that used to be invalid can never suddenly become valid. Secondly, it provides a bit of laziness to the validation so that casting an integer to a pointer can never fail on its own.

I distinguish between casting pointers to integers with as and transmuting pointers to integers. The former does the “broadcasting” which allows for later conversion of integers into pointers, while the latter skips that step and just gets the integer value of a pointer.

I’ve been thinking of it in pseudocode, so that is probably the clearest description:

/// This allocation is for casts to pointers that don't result in a pointer to any real place. This
/// allocation always has `base = 0`, `size = 0`, `broadcast = true`, and `pointers = vec![]`.
const NO_ALLOCATION: AllocId;

struct Pointer {
    alloc_id: AllocId,
    offset: u128,
    /// This is the equivalent of @arielb1's dereferenceable bit.
    valid: bool
}

struct Allocation {
    ...,
    /// A unique identifier for this allocation. AllocIds are *never* reused.
    id: AllocId,
    /// This is analogous to whether a block is concrete in the intptrcast model.
    broadcast: bool,
    /// All allocations, even abstract ones, get a base address. In practice, they may
    /// be assigned lazily, but giving all allocations addresses ensures that two
    /// things can't magically switch from being unequal to being equal. Additionally,
    /// it allows for things like transmutes and type punning that cast to integers
    /// without broadcasting the allocation.
    base: u128, /* Really a usize, but in the target, not in the host. */
    /// The size of the allocation, in bytes.
    size: u128,
    /// Where valid pointers are stored in this allocation. These are stored
    /// as (offset, pointer) pairs. Using a simply dereferenceable bitset as
    /// @arielb1 suggested is insufficient because it doesn't deal with loads
    /// that read part of one pointer and part of another and it doesn't preserve
    /// what AllocIds those pointers come from.
    // This could also be implemented as in miri's current relocations.
    pointers: Vec<(u128, Pointer)>
}

impl PartialEq for Pointer {
    fn eq(&self, other: &Pointer) -> bool {
        if self.transmute_to_integer() == other.transmute_to_integer() {
            if self.alloc_id != other.alloc_id || !self.valid || !other.valid {
                POISON
            } else {
                true
            }
        } else {
            false
        }
    }
}

impl Pointer {
    /// Cast to an integer properly, using `as`.
    fn cast_to_integer(&self) -> u128 {
        if self.valid {
            ALLOCATIONS[self.alloc_id].broadcast = true;
        }
        self.transmute_to_integer()
    }

    /// Transmute a pointer to an integer.
    fn transmute_to_integer(&self) -> u128 {
        ALLOCATIONS[self.alloc_id].base + self.offset
    }

    /// Cast from an integer properly, using `as`.
    fn cast_from_integer(value: u128) -> Pointer {
        Pointer::transmute_from_integer(value)
    }

    /// Transmute an integer into a pointer.
    fn transmute_from_integer(value: u128) -> Pointer {
        // TODO: With typed pointers, this should probably check alignment.
        let alloc_id = ALLOCATIONS.iter()
                                  .find(|alloc| alloc.base <= value &&
                                               value - alloc.base < alloc.size) // FIXME: Should this be <=? That causes problems with adjacent allocations.
                                  .map(|alloc| alloc.id)
                                  .unwrap_or(NO_ALLOCATION);
        Pointer {
            alloc_id: alloc_id,
            offset: value - ALLOCATIONS[alloc_id].base,
            valid: ALLOCATIONS[alloc_id].broadcast
        }
    }
}

impl Memory {
    fn load_pointer(address: Pointer) -> Pointer {
        self.validate_access(address, POINTER_SIZE, POINTER_ALIGN);
        if let Some(&(_, pointer)) = ALLOCATIONS[address.alloc_id].pointers
                                                                  .iter()
                                                                  .find(|&(offset, _)| offset == address.offset) {
            pointer
        } else {
            Pointer::transmute_from_integer(self.load_integer(address, POINTER_SIZE, POINTER_ALIGN))
        }
    }

    fn load_integer(address: Pointer, size: u128, align: u128) -> u128 {
        self.validate_access(address, size, align);
        ... // actually fetch the data from the allocation
    }

    fn store_pointer(address: Pointer, value: Pointer) {
        self.store_integer(address, value.transmute_to_integer(), POINTER_SIZE, POINTER_ALIGN);
        ALLOCATIONS[address.alloc_id].pointers.push((address.offset, value));
    }

    fn store_integer(address: Pointer, value: u128, size: u128, align: u128) {
        self.validate_access(address, size, align);
        ALLOCATIONS[address.alloc_id].pointers.retain(|&(offset, _)| offset >= address.offset + size || offset + POINTER_SIZE <= address.offset);
        ... // Actually store the data
    }

    fn validate_access(address: Pointer, size: u128, align: u128) {
        if !address.valid {
            UNDEFINED BEHAVIOR;
        }
        if address.transmute_to_integer() % align != 0 {
            UNDEFINED BEHAVIOR;
        }
        if address.offset + size > ALLOCATIONS[address.alloc_id].size {
            UNDEFINED BEHAVIOR;
        }
    }
}

My idea was to only have a bit set on the start address of the pointer, and to clear it on all the middle addresses and the previous size_of(ptr)-1 addresses when assigning.

So a pointer assignment is

memory[addr:addr+WORD_SIZE] = data;
valid[addr-WORD_SIZE+1:addr+WORD_SIZE] = false;
valid[addr] = true;
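As a sketch of this assignment rule (with an assumed `WORD_SIZE` and a plain boolean array standing in for the valid bits):

```rust
const WORD_SIZE: usize = 8; // assuming a 64-bit target

// Store a pointer at `addr`: any previously stored pointer whose word
// overlaps [addr, addr + WORD_SIZE) loses its start bit, then the start
// byte of the new pointer gets the bit set.
fn store_ptr_valid(valid: &mut [bool], addr: usize) {
    let lo = addr.saturating_sub(WORD_SIZE - 1);
    for b in valid[lo..addr + WORD_SIZE].iter_mut() {
        *b = false;
    }
    valid[addr] = true;
}

fn main() {
    let mut valid = vec![false; 64];
    store_ptr_valid(&mut valid, 16);
    assert!(valid[16]);
    // An overlapping store at 20 clobbers the pointer that started at 16.
    store_ptr_valid(&mut valid, 20);
    assert!(!valid[16]);
    assert!(valid[20]);
}
```

This captures why a partially overwritten pointer can never be loaded back as dereferencable: its start bit is cleared by any overlapping store.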

Ah, that makes more sense. I don't think a literal `Vec<bool>` is completely sufficient for what I want to do, but `Vec<Option<(bool, AllocId)>>` would preserve all the relevant information - the approaches are basically equivalent.

Now that I've read the intptrcast paper a few more times, I think I understand it and my own model better. The essential difference between the two models is that my proposal always allows integers to be cast into pointers (which is useful for ptr::dangling), while the intptrcast model makes many int-to-ptr casts undefined behavior. They support and validate basically the same optimizations; mine just allows more programs.

Incrementally, the changes from the intptrcast model are

  1. Instead of raising undefined behavior in the ill-defined cases, return a poison value.
    • This actually increases the number of useful optimizations, since it allows speculating operations on pointers.
    • Dereferencing invalid/out of bounds pointers is still UB since it can trap in real implementations.
  2. Assign base addresses to all blocks, even abstract ones. Instead of denoting abstract blocks with an undefined address, we maintain an explicit boolean flag saying whether a block is abstract or concrete.
    • Use this to define transmute, memory type punning, align_offset, and any other methods that observe the integer value of a pointer through some "side channel".
    • Notably, since we don't make blocks concrete on transmute or similar things, trivial transmutes or load/store pairs can still be optimized away.
  3. Allow int-to-ptr casts that don't land in any block (even deallocated blocks). The result is a pointer in the same block as null, block 0.
    • This makes ptr::dangling valid.
    • The resulting pointer will still be out of bounds, since block 0 has zero size.
  4. Instead of returning poison from incorrect int-to-ptr casts, return a value with a new "created from an invalid int-to-ptr cast" flag set.
    • This is mostly just a delaying tactic - almost all operations immediately return poison (or are UB, in the case of dereferencing) when given an invalid pointer.
    • However, ptr-to-int casts (as well as the side channels like transmute) are now defined, just returning whatever integer was used to create the invalid pointer.
    • Notably, ptr-to-int casts on invalid pointers don't make any blocks concrete.
    • This makes the new slice iterator valid.
  5. Allow comparisons between arbitrarily invalid pointers as long as the answer is false, i.e. if the integer value of the pointers are different.
    • Since this allows comparisons between out of bounds pointers, this allows comparisons to ptr::dangling(), making it much more useful.
    • As I described it in pseudocode, this actually sometimes makes comparison less defined than in the intptrcast model, since that supports comparisons within a block regardless of validity.
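Step 3 in miniature, with the block layout and helper names invented for illustration: an integer that lands in no block maps to block 0, which has zero size, so the resulting pointer is representable (and comparable) but never dereferencable:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Block {
    id: usize,
    base: u128,
    size: u128,
}

// Block 0 plays the role of NO_ALLOCATION: base 0, size 0.
const NO_ALLOCATION: Block = Block { id: 0, base: 0, size: 0 };

// Returns (block id, offset) for an integer reinterpreted as a pointer.
fn transmute_from_integer(blocks: &[Block], value: u128) -> (usize, u128) {
    let b = blocks
        .iter()
        .find(|b| b.base <= value && value - b.base < b.size)
        .copied()
        .unwrap_or(NO_ALLOCATION);
    (b.id, value - b.base)
}

fn main() {
    let blocks = [Block { id: 1, base: 0x1000, size: 16 }];
    // Lands inside the block: id 1, offset 8.
    assert_eq!(transmute_from_integer(&blocks, 0x1008), (1, 8));
    // Lands in no block: block 0. Since block 0 has zero size, every
    // offset is out of bounds, so a ptr::dangling-style address is
    // valid to create but UB to dereference.
    assert_eq!(transmute_from_integer(&blocks, 4), (0, 4));
}
```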

I'm unsatisfied with step 5. It feels too conditional -- code might be UB in one execution but not another, and ideally fundamentally invalid code will be UB as often as possible. In particular:

  • A comparison between a pointer from a deallocated region and a pointer from a later-allocated region will be poison if and only if the later-allocated region happens to be put on top of the deallocated region with the right overlap. Since the user has no control over this process, the code is wrong, but it usually won't be poison, so they might never notice.
  • int-to-ptr(x) == int-to-ptr(x) is true if and only if x happens to not land inside any allocation. Otherwise, it is poison. The user usually has no control over this.

Therefore, I think the definition of equality should be tweaked from what I wrote. I haven't settled on something I think is good yet, however.

@gereeter Wow, I should have replied sooner. So sorry. A lot has happened since then, and maybe you have seen this paper proposing a memory model for LLVM. There are a lot of parallels, in particular around “delaying” checks and using poison values instead of UB to enable code motion.

I think the key difference in terms of properties is that the model we propose there does allow DCE of ptr-to-int and int-to-ptr casts, and that comparing two pointers never yields poison. However, the comparison may be non-deterministic, and that is indeed one of the two big problems with that model, I think (the other one being out-of-memory situations; we cannot justify removing a malloc). I have yet to witness actually non-deterministic comparison in LLVM, but I have also not seen a formal model that would make comparison deterministic (and enable sufficiently many optimizations).

In terms of the definition, besides a more sophisticated model of a pointer, we do not have a piece of state that tracks whether an allocation has been “broadcast”. Instead, we have “twin allocations” and rely on non-determinism to argue why certain allocations cannot be accessed by pointers derived from integers.
