Would something like this work as a memory model?..
There is a set of memory allocations. These could refer to allocations on the
heap or on the stack, whatever. An allocation consists of a base address and a
type, the base address however is purely abstract - it is represented as a free
variable of type usize
, not as some specific value.
More specifically, the base address is represented by 64 boolean variables
(I’ll assume a 64 bit architecture throughout this post). For some
allocation named $a
I’ll refer to these variables as $a0
to $a63
.
Next we define a type of boolean polynomials over these variables.
struct Anded {
anded: HashSet<VarId>,
}
struct Bit {
xored: HashSet<Anded>,
}
Here, VarId
refers to some specific boolean variable (eg. $a23
). A Bit
represents formulas like:
($a2 * $a23 * $b7) + ($a0 * $b1) + $c3
Where *
is AND and +
is XOR. The important thing about these polynomials is
that they are the normal forms of a more generalised language of expressions
containing variables, 1
, AND, OR, XOR and NOT. eg. If I have some arbitrary
expression like:
(!$a0 | $b23) & $c45
I can normalise to (in this case)
($a0 & $b23 & $c45) ^ ($a0 & $c45) ^ $c45
This means that any two boolean expressions, consisting of boolean variables
and these operations, have decidable equality. Note that 1
is just the AND of an empty set of variables.
Next we define all our int and pointer types as being arrays of Bit
s.
struct U8([Bit; 8]);
struct Ptr([Bit; 64]);
// etc.
We can also define all our arithmetic operations on these types. eg.
U8([0, 0, 0, 0, 0, 0, 0, $a0]) + U8([0, 0, 0, 0, 0, 0, 0, $b0])
=> U8([0, 0, 0, 0, 0, 0, $a0 * $b0, $a0 + $b0])
When we perform a memory allocation, or get the address to a value on the
stack, we get a value of the form Ptr([$a0, $a1, $a2, ... , $a63])
. Note what
this is saying: the allocation address is some arbitrary value, unknown to the
model, but can still perform any kind of bitwise or arithmetic operation on it.
When we dereference a pointer, the pointer must be of the form $a + x
where
$a
is some allocation and x
is a valid offset. Anything else is undefined
behaviour.
Reading uninitialized data also gives us variables, only these variable don’t
correspond to any allocation.
Does this work? It seems to match my mental model, allow a lot of safe ways of generating valid addresses while still permitting optimizations.