I’ve got a proposal that is a sort of a combination of the intptrcast model and @arielb1’s comment. Notably:
- Properties 1-4 of the 5 shown to be inconsistent by 1012A/B hold.
- Equal pointers are equivalent.
- Pointer equality tests do not have side effects and can be eliminated if dead.
-
int-to-ptr
casts do not have side effects, are always defined if correctly aligned (I want this for, e.g., ptr::dangling
), and can be eliminated if dead.
-
ptr-to-int(int-to-ptr(x))
can be optimized to x
.
- Where possible, operations result in poison instead of producing undefined behavior, allowing more code movement.
- All operations but
int-to-ptr
, even if they are side-effecting, are referentially transparent and idempotent. If you run them twice on the same values, you get the same results and can eliminate one of the calls.
- Even with
int-to-ptr
, an equality check between the result of two int-to-ptr
calls with equal values with give either poison or true, so can be optimized to true. Put another way, int-to-ptr(x) == int-to-ptr(y)
can always be optimized to x == y
, even if the casts are done at wildly different times.
- Programs that need to treat pointers as integers but don’t need to reconstruct pointers from those integers can use
transmute
and skip the side effects of a ptr-to-int
cast. This is useful in, e.g., hash tables.
The idea is that every pointer is represented as a combination of 3 things. Like intptrcast’s pointers, they have an allocation (id) that they are a part of and an offset in that allocation. Like @arielb1’s proposal, they have a validity bit, saying whether the pointer was constructed in a “reasonable” manner. When pointers are stored in memory, this extra information is stored in a side table, gets fetched an pointer load, is copied on memcpy
(If we want to allow byte-by-byte memcpy
instead of an intrinsic, I can see ways to do that), and is cleared on integer stores.
The validity bit serves two purposes. First of all, it makes it so that a pointer that used to be invalid can never suddenly become valid. Secondly, it provides a bit of laziness to the validation so that casting an integer to a pointer can never fail on its own.
I distinguish between casting pointers to integers with as
and transmuting pointers to integers. The former does the “broadcasting” which allows for later conversion of integers into pointers, while the latter skips that step and just gets the integer value of a pointer.
I’ve been thinking of it in pseudocode, so that is probably the clearest description:
/// This allocation is for casts to pointers that don't result in a pointer to any real place. This
/// allocation always has `base = 0`, `size = 0`, `broadcast = true`, and `pointers = vec![]`.
const NO_ALLOCATION: AllocId;
struct Pointer {
alloc_id: AllocId,
offset: u128,
/// This is the equivalent of @arielb1's dereferenceable bit.
valid: bool
}
struct Allocation {
...,
/// A unique identifier for this allocation. AllocIds are *never* reused.
id: AllocId,
/// This is analgous to whether a block is concrete in the intptrcast model.
broadcast: bool,
/// All allocations, even abstract ones, get a base address. In practice, the may
/// be assigned lazily, but giving all allocations addresses ensures that two
/// things can't magically switch from being unequal to being equal. Additionally,
/// it allows for things like transmutes and type punning that cast to integers
/// without broadcasting the allocation.
base: u128, /* Really a usize, but in the target, not in the host. */
/// The size of the allocation, in bytes.
size: u128,
/// Where valid pointers are stored in this allocation. These are stored
/// as (offset, pointer) pairs. Using a simply dereferenceable bitset as
/// @arielb1 suggested is insufficient because it doesn't deal with loads
/// that read part of one pointer and part of another and it doesn't preserve
/// what AllocIds those pointers come from.
// This could also be implemented as in miri's current relocations.
pointers: Vec<(u128, Pointer)>
}
impl PartialEq for Pointer {
fn eq(&self, other: &Pointer) -> bool {
if self.transmute_to_integer() == other.transmute_to_integer() {
if self.alloc_id != other.alloc_id || !self.valid || !other.valid {
POISON
} else {
true
}
} else {
false
}
}
}
impl Pointer {
/// Cast to an integer properly, using `as`.
fn cast_to_integer(&self) -> u128 {
if self.valid {
ALLOCATIONS[self.alloc_id].broadcast = true;
}
self.transmute_to_integer()
}
/// Transmute a pointer to an integer.
fn transmute_to_integer(&self) -> u128 {
ALLOCATIONS[self.alloc_id].base + self.offset
}
/// Cast from an integer properly, using `as`.
fn cast_from_integer(value: u128) -> Pointer {
Pointer::transmute_from_integer(value)
}
/// Transmute an integer into a pointer.
fn transmute_from_integer(value: u128) -> Pointer {
// TODO: With typed pointers, this should probably check alignment.
let alloc_id = ALLOCATIONS.iter()
.find(|alloc| alloc.base <= value &&
value - alloc.base < alloc.size) // FIXME: Should this be <=? That causes problems with adjacent allocations.
.map(|alloc| alloc.id)
.unwrap_or(NO_ALLOCATION);
Pointer {
alloc_id: alloc_id,
offset: value - ALLOCATIONS[alloc_id].base,
valid: ALLOCATIONS[alloc_id].broadcast
}
}
}
impl Memory {
fn load_pointer(address: Pointer) -> Pointer {
self.validate_access(address, POINTER_SIZE, POINTER_ALIGN);
if let Some(&(_, pointer)) = ALLOCATIONS[address.alloc_id].pointers
.iter()
.find(|&(offset, _)| offset == address.offset) {
pointer
} else {
Pointer::transmute_from_integer(self.load_integer(address, POINTER_SIZE, POINTER_ALIGN))
}
}
fn load_integer(address: Pointer, size: u128, align: u128) -> u128 {
self.validate_access(address, size, align);
... // actually fetch the data from the allocation
}
fn store_pointer(address: Pointer, value: Pointer) {
self.store_integer(address, value.transmute_to_integer(), POINTER_SIZE, POINTER_ALIGN);
ALLOCATIONS[address.alloc_id].pointers.push((address.offset, value));
}
fn store_integer(address: Pointer, value: u128, size: u128, align: u128) {
self.validate_access(address, size, align);
ALLOCATIONS[address.alloc_id].pointers.retain(|&(offset, _)| offset >= address.offset + size || offset + POINTER_SIZE <= address.offset);
... // Actually store the data
}
fn validate_access(address: Pointer, size: u128, align: u128) {
if !address.valid {
UNDEFINED BEHAVIOR;
}
if address.transmute_to_integer() % align != 0 {
UNDEFINED BEHAVIOR;
}
if address.offset + size > ALLOCATIONS[address.alloc_id].size {
UNDEFINED BEHAVIOR;
}
}
}