Stacked Borrows: An Aliasing Model For Rust

Hmm, that makes sense.

Some other notes / nitpicks:

Dereferenceable

Given a function argument of type &T, I think your model already requires that the argument is a valid pointer (aligned, not trapping, not null), even if the function body never dereferences it: otherwise, the retagging and barrier operations wouldn't make sense. Thus, rustc is justified in applying LLVM dereferenceable, as it currently does. However, it might be worth stating this more explicitly.

On the other hand, given a function argument of type &&T, if the function body never dereferences it, the inner pointer could be invalid. So if LLVM hypothetically offered a "deferenceable N times" attribute, rustc couldn't use it.

De-reference-ification optimization

Another hypothetical optimization I've talked about in the past is: among function arguments that are immutable references to word-sized values (e.g. &usize, &&T), automatically detect cases where the function implementation doesn't care about pointer identity for that argument, and transform the ABI-level signature to take the value directly (usize, &T). Of course, it isn't always possible to statically determine whether the function cares about identity. But in many cases it is possible; in particular, if the function body passes the reference as an argument to another function, rustc can propagate the "doesn't care" flag from the callee to the caller.

If &&T can point to an invalid pointer, there would be some restrictions on that optimization:

  • It would still be okay to do the transformation to pass &T directly, but the new argument couldn't be marked dereferenceable.
  • If T is itself word-sized, it would not be okay to do the transformation twice and pass T by value.
  • On the other hand, the compiler could assume validity if it additionally detected that the argument is unconditionally dereferenced (again by the function body or its callees). But that isn't always the case; e.g. if the function has an early exit (perhaps using the ? operator) before it happens to touch that argument, it's not an unconditional dereference.

Probably not the end of the world.

Unsized types

You mentioned "mem::size_of<T>() many bytes"; I guess that for unsized T, that should that be mem::size_of_val(t) many bytes? ...I can't actually think of any interesting or problematic subtleties related to that, but I'm mentioning it anyway just for the record.

Reinterpreting memory

Unfortunately... I agree with gereeter that transmuting between pointers and references, or transmuting &&mut T to &&T as they suggested, or doing the equivalent of a transmute by copying bytes, etc. should be legal. I suspect there's a fair amount of existing code that uses such transmutes, and it would be surprising (in the "surprise! UB" sense :wink: ) if they didn't work – especially considering that they're legal in C.

For this to be accommodated, I guess it would have to be possible for arbitrary values to carry tags, not just reference-typed ones – including raw pointers, usize, and even u8. I suppose this contradicts your desire to make non-reference types as "non-magical" as possible. But I think the alternative is worse.

Local variables

Given:

fn foo() -> i32 {
    let mut x = 2;
    let _ = &mut x as *mut i32; // does not escape
    bar();
    x
}

Can the compiler optimize this to always return 2?

If x were immutable, that optimization would arguably be justified in your model, under the assumption that an immutable local variable acts like an immutable reference. But not if x is mutable.

In some sense this is logical. If the raw pointer were passed to bar() rather than thrown away, the optimization would clearly be invalid; but you want raw pointers to be non-magical and untracked, so the fact that it's thrown away shouldn't make a difference and it should be invalid here as well.

However, from a compiler perspective, I think that would be quite surprising. The raw pointer cast seems like dead code, but removing it would change semantics. And LLVM is definitely going to want to remove it, so you'd have to use some special mechanism to trick LLVM into thinking the pointer value escaped.

I think it would be better to make this optimization valid by introducing an additional rule, something about values that are completely non-escaping.

Such a rule would also allow something like

fn no_op(p: &mut i32) -> &mut i32 {
    unsafe { &mut *(p as *mut i32) }
}

to be optimized into a true no-op.