Stacked Borrows: An Aliasing Model For Rust

Hm, actually… if I’m reading the rules right, retagging activates the old borrow and initiates the new borrow. So retagging while memory is frozen would be UB, wouldn’t it? Then it seems like something like the following would be UB if moves of &mut locals retag:

let m1 = &mut foo; // initiate Uniq(m1)
let r = &foo; // freeze foo, and reset borrow stack
let m2 = {m1}; // activate Uniq(m1), initiate Uniq(m2) -- UB

Is this correct?

This is a very interesting direction to go, and it seems quite promising to me!

One thing which strikes me about this is that by providing a clear definition of the tagging behaviour in an instrumented machine, this opens the door to a sanitizer, and that would be wonderful. :slight_smile:

One thing which isn’t clear to me: how do multiple Raw tags in the same stack work? I have several thoughts on how this might work but I am not at all confident that they are correct. :wink:

This looks like a really nice model. I definitely like the general approach, especially the way that this takes the intptrcast model’s broadcast operation and makes it significantly more local and bounded through the creation of references. And making the broadcast be on casting from reference to raw pointers neatly sidesteps all the annoying questions about comparing pointers, since that requires casts to raw pointers.

For this purpose, we tag shared references with some kind of “timestamp” indicating when it was created.

This is definitely out of scope for now, but it seems like when generalized to multiple threads, this will want to be a vector or tree clock that interacts with the happens-before relationships generated by acquire-release pairs. Which seems like it would actually fit super well and possibly make a number of data races detectable, which is super nice.

So, when a &mut is cast to *mut like above, we instead push Raw onto the stack, indicating that any raw pointer may be used to access this location.

Is this also true when a &mut gets transmuted to *mut? I can see arguments for both sides. It would be nice to have a way to cast a &mut to a *mut while explicitly requesting that you not open the memory location up to modification through other raw pointers, for uses that never involve dereferencing, such as comparisons, hash consing, and memoization.

Shared references with interior mutability do not really have any restrictions in terms of what can happen to memory, so we treat them basically like raw pointers.

I’m a little bit confused as to what you mean by “treat them basically like raw pointers”. You’ve addressed in other comments that only the specific bytes that are within UnsafeCell are affected by this, but does that mean that Raw gets pushed when you create a reference to that location? Or just that the frozen flag is ignored?

I also think it might be worth having another marker specifically for this circumstance, i.e. when you create a shared reference to a location with interior mutability, you push a special Shared item onto the stack for the location. This would prevent other random raw pointers from modifying or viewing the area, but doesn’t impose any constraints on which reference is used to access the memory. This would allow transforming

fn foo(x: *mut i32, y: &Cell<i32>) -> i32 {
    unsafe { *x = 15 };
    y.set(3);
    unsafe { *x }
}

into

fn foo(x: *mut i32, y: &Cell<i32>) -> i32 {
    y.set(3);
    unsafe { *x = 15 };
    15
}

This might be totally useless in practice, though, since UnsafeCell is primarily used via the get method, which creates a raw pointer. I think to actually get a benefit we’d need to add an intrinsic for “reclaiming” a piece of memory, popping off the Raw and pushing a Shared.

“Time” here is a rather abstract notion, we really just need some counter that we bump up every time a new reference is created. This gives us a unique ID for each mutable reference – and, as we have seen, for shared references we actually exploit the fact that IDs are handed out in increasing order (so that we can test if a reference was created before or after a location was frozen).

I’m a bit concerned that making tags just timestamps makes it hard to reason about moving around the creation of references. There may be no way to actually compare the timestamps of references, but that doesn’t seem at all clear to me. Because of that, I don’t see that it would be valid to rewrite

fn bar<'a>(baz1: &'a mut u32, baz2: &'a mut u32) -> (&'a mut u32, &'a mut u32) {
    let p1 = &mut *baz1;
    let p2 = &mut *baz2;
    (p1, p2)
}

into

fn bar<'a>(baz1: &'a mut u32, baz2: &'a mut u32) -> (&'a mut u32, &'a mut u32) {
    let p2 = &mut *baz2;
    let p1 = &mut *baz1;
    (p1, p2)
}

which in my mind should absolutely be valid - I don’t feel like the relative time of creation of two pointers should matter.

Instead, would it work to tag pointers with a pair of a timestamp and just a fresh unique id? The timestamps remain usable for dealing with shared references, but when considering unique elements on the stack you just look at the id. These ids can be compared for equality, but aren’t ordered. Then, the only moments when the time gets stepped forward are when memory gets frozen, not when any reference gets created. In the example above, p1 and p2 would have the same timestamp, and reordering them would be perfectly valid.
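To make this concrete, a tag under that scheme might look roughly like the following sketch (my own illustration; all names are made up):

struct Timestamp(u64); // bumped only when memory gets frozen, not per reference
struct UniqId(u64);    // fresh for every reference; compared only for equality, never ordered

struct Tag {
    time: Timestamp, // consulted when checking a shared reference against a freeze
    id: UniqId,      // consulted when matching Uniq entries on the borrow stack
}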

In fact, I think in that case it might be possible to record time on a per-memory-location basis, which would mean that most timestamps would be highly unordered, making various properties about aliasing stronger. Possibly.

To be clear, these are just random ideas I’m having that might fix what I’m seeing as somewhat complicated reasoning. I’m not sure my idea would actually simplify things; it just seems like it might.

We change the tags of the mutable references to the current clock value, bumping up the clock after every tag we assign, and then we push those new tags on top of the borrow stack.

So essentially there is an implicit reborrow of every pointer passed into a function? That actually feels very natural, and might actually remove special cases since Rust already allows you to reborrow the input to a function without any extra syntax.
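Roughly, I read this retag-on-entry as if every reference argument were reborrowed at the top of the function body (a sketch of my reading, not a desugaring taken from the post):

fn callee(arg: &mut i32) {
    // as if a fresh reborrow happened on entry, pushing a fresh Uniq tag for `arg`
    let arg = &mut *arg;
    *arg += 1;
}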

(this can be a function argument, but also loading it from memory or getting it as the return value of some other function)

What happens when a pointer is loaded in many individual bytes and only reconstituted later? Does the retagging happen at reconstitution time?

The idea is to put a “barrier” into the stack of all function arguments when demo5 gets called, and to make it UB to pop that barrier from the stack before demo5 returns.

Does this, in analogy to retagging, also happen when a pointer is first moved into a function through other means (reading memory, etc.)?

A nice side-effect of barriers in combination with renumbering is that even if demo4 from the previous subsection would not use its arguments at all, it would still be UB to call it with two aliasing references:

Initially, I was definitely against this being a “nice” side-effect. However, if we want it to be possible to introduce unnecessary writes to some &mut pointers (which I do for the sake of lifting writes out of loops that might have 0 iterations) and reorder accesses to &mut pointers (which I really do), it has to be UB to just pass two &mut references to an empty function:

fn baz(ref1: &mut u32, ref2: &mut u32) { }

turns into

fn baz(ref1: &mut u32, ref2: &mut u32) {
    let val1 = *ref1;
    *ref1 = 15;
    *ref1 = val1;
    let val2 = *ref2;
    *ref2 = val2;
}

turns into

fn baz(ref1: &mut u32, ref2: &mut u32) {
    let val1 = *ref1;
    *ref1 = 15;
    let val2 = *ref2;
    *ref1 = val1;
    *ref2 = val2;
}

which breaks on aliasing inputs. So I’m tentatively convinced although I was very skeptical of the validity-based model before. And of course being able to safely mark things as noalias is very important.


TL;DR I like this model a lot, at least on first and second read-through. It may want tweaking, but it has a lot of properties that I like.


On an irrelevant editorial note,

…while “Stacked Borrows” is (to some extend) “access”-based.

That should be “to some extent”, I believe.

Thanks for the work!

Some questions:

  • It looks like threading could be worked into this by replacing simple timestamps with a partial order based on happens-before relationships for times between different threads; is something like that in the cards?

  • Does this imply that e.g. Rc will be less efficient than &T, since Rc contains a pointer rather than a ref as an implementation detail, and therefore the compiler must assume that someone could have mutated the object the Rc points to, even though it models an immutable reference? But I guess Rc could be changed to use a real ref instead of a pointer, so maybe that’s not a critical problem. Oh, but you can actually mutate the referent via get_mut(), so never mind.

  • How does this interact with transmutes, or other treatment of memory-as-bytes?

    E.g. right now rust guarantees that references and pointers have the same representation in memory, meaning you can build a data structure containing pointers and then transmute it to one containing references. (Or share a mmap-ed data structure containing references with another process, etc.) This is a really useful property; Rust’s general usefulness for working with memory-as-bytes is one of its key advantages right now, since standard C and C++ have given up this ability.

    A similar example: I have some code that deserializes an object graph, possibly containing cycles, into objects owned by an arena. The result is going to be read-only and shared between threads, so I want to use raw references (or something with equivalent performance) for inter-object references.

    If there’s a cycle, it’s inevitable that at least one reference in the graph must be created before its referent has finished deserializing. But under these rules, I’m not sure how that’s possible. Even if I try to launder the result via ref-pointer-ref conversion of a root pointer or whatever, the model still remembers when all the references in memory were originally created, and makes this UB.

    Hopefully the final model will enable cyclic reference graphs to be legally creatable somehow, even if it requires jumping through crazy hoops.

    Maybe this model can be tweaked to allow it? I think the problem is inherent in associating timestamps with references sitting in memory. What would happen if memory were associated with time-of-read or time-of-write, but references took their ‘createtime’ from the reference that was used to load them from memory?

Edit: reread the article and somehow I missed the details in 5.2 and misunderstood the proposal. It’s much more conservative than I thought at first, so I think my above cases are actually allowed. Now I have the opposite concern - when loading references from memory, it seems like some questionable things become legal:

let mut i = 5;
let p = &mut i as *mut i32;
let rr = &&i;
let y = **x;
unsafe {*p = 17};
let r = *rr;
assert!(*r == 17)

This seems like code that should be UB. The compiler should be free to assume that any chain of immutable references is just as frozen as a simple reference. But as I read the proposal, the timestamp of r is its time of creation (after the write to p), so this is actually legal:

Any time a reference is passed to us from “the outside world” (as function argument, loaded from memory, or returned from a callee; also below structs or enums but not below unions or pointer indirections), we retag.

Or is this what the ‘pointer indirections’ exception refers to, so in that case the loaded reference copies the timestamp of its ancestor?

However, in my interpretation, the reason it complains is that if it accepted the program, we would have UB in safe code! […] One important property, then, is that if the program has UB and does not use any unsafe code, the borrow checker must detect this.

I think this observation is particularly interesting because it provides a way to answer questions that have been coming up about what the borrow checker should actually allow. For example,

There are some things I am concerned about with NLL that I’d like to resolve before we make any final decision regarding stabilization. These are cases where NLL perhaps accepts “too much”. One such item has to do with the use of liveness, and in particular the implications of disjoint liveness ranges.
~ https://github.com/rust-lang/rust/issues/43234#issuecomment-410720455

Though of course it doesn’t help with “is this something we’re willing to promise forevermore to understand as being in the safe subset”.

I wonder if there’s a way to phrase the borrow checker as something that tries to constant-fold all the borrow stack operations…

Looks good.

I’m somewhat concerned about how barriers interact with cases where lifetime disjointness is enforced by dynamic checks. Let’s take your demo code, which you explained would be UB if x and y point to the same location:

fn demo5(x: &mut i32, y: usize) -> i32 {
  *x = 42;
  foo(y);
  *x
}

fn foo(y: usize) {
  let y = unsafe { &mut *(y as *mut i32) };
  *y = 7;
}

But then change &mut i32 to RefMut<i32>, and usize to &&RefCell<i32>:

use std::cell::{RefCell, RefMut};
use std::mem::drop;

fn demo5x(mut x: RefMut<i32>, y: &&RefCell<i32>) {
  *x = 42;
  drop(x);
  foo(y);
}

fn foo(y: &&RefCell<i32>) {
  *y.borrow_mut() = 7;
}

fn main() {
    let x = RefCell::new(42);
    demo5x(x.borrow_mut(), &&x);
}

This is now entirely safe code. But from your borrow model’s perspective, it ought to be equivalent:

  • RefMut<T> contains &mut T as a field.
  • &RefCell<T> is a shared reference with interior mutability, which in your model behaves like a raw pointer (using a Raw item). The double reference (&&RefCell<T>) is so that retagging doesn’t take effect, i.e. neither demo5x nor foo will push another Raw on entry.
  • I had to explicitly drop the RefMut, but that shouldn’t affect anything; it only writes to the RefCell's state field, not the value itself.

So this should also be UB. Am I mistaken?

What do you mean by “global problem”?

There is still one key difference but it falls out of the desugaring: If you use the same mutable reference multiple times, that asserts that there has been no other access to these locations in the mean time. However, using a raw pointer multiple times creates a new reference each time, and hence does not assert such exclusive access.

No that should not be UB – activating Uniq(m1) in the last line will unfreeze the location.

That is, in fact, the very next step in my roadmap. :slight_smile:

Several Raw tags right next to each other are redundant.

However, when there are Uniq tags between them, they do make a difference. Imagine we have a stack Raw, Uniq(x), Raw, Uniq(y). If we now use a raw pointer on this location, that’s fine. Next, we use x, which will pop the first Raw off the stack. If we now use a raw pointer again, that will pop Uniq(x)! So from now on, using x is not fine any more.

This situation can happen when you start with y, create a raw pointer from that, from that raw ptr create x, and then from that create another raw pointer. Using x again invalidates the “later” raw pointer; then using a raw pointer again invalidates x because this means you must be using the raw pointer that was created from y. Of course, raw pointers are not tagged so we cannot know where you got it from; we will always assume you are using the “latest” raw pointer that is still in the stack. (This allows most code.)
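Spelled out as code, that situation might arise like this (a sketch; the comments list the borrow stack in the same top-first order as above, eliding the Uniq entries for the temporary reborrows):

fn main() {
    let mut loc = 0i32;
    let y = &mut loc;              // stack: Uniq(y)
    let q1 = &mut *y as *mut i32;  // stack: Raw, Uniq(y)
    let x = unsafe { &mut *q1 };   // stack: Uniq(x), Raw, Uniq(y)
    let q2 = &mut *x as *mut i32;  // stack: Raw, Uniq(x), Raw, Uniq(y)
    unsafe { *q2 = 1 };            // fine: a Raw is on top
    *x = 2;                        // pops that Raw; stack: Uniq(x), Raw, Uniq(y)
    unsafe { *q1 = 3 };            // pops Uniq(x) to reach the lower Raw
    *x = 4;                        // UB under the model: Uniq(x) is gone
}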

Oh, right. But then, the unfreezing is an observable effect of the move? Extending the above example:

let m1 = &mut foo;
let r = &foo;
let m2 = {m1};
*r; // UB if previous line unfreezes foo

Ah, that’s something I had not even thought about, but you are right. :slight_smile: Of course, we still have to deal with LLVM’s model here, so pointer comparison is still a very subtle topic.

Hm, interesting. My expectation was to use a single global clock. If you have an operational model of concurrency, there still is a total order of all operations; it is just not observable by the program. So I would have used that order.

I would have to think more about whether there is any difference between these approaches. But, as you said, this is definitely not a priority at this point. :slight_smile:

Oh, transmutes. Good question. We cannot really do anything on transmutes because it is not, in general, possible to even detect when they happen. So nothing happens on the stack for a transmute.

My first inclination is to say that such a transmute is UB: The transmute would create a raw pointer with a tag, which is not a valid raw pointer (the tag usually gets stripped when casting &mut to *mut). That would sidestep the question. However, I am sure I will be told that people expect such transmutes to work. In the model as defined currently, the transmute would give you a raw pointer that may not be used, as there is no Raw on the stack. You have to explicitly say as *mut _ to inform the compiler about what is happening. I am not sure if that is what people expect.

Yes, Raw gets pushed if the location is not already frozen. (That can happen, e.g. when a &RefCell<T> and an &T alias.) The Borrow for a location covered by UnsafeCell below a shared reference is Mut(Raw).
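For instance (my own example), such aliasing can arise entirely in safe code:

use std::cell::RefCell;

fn observe(cell: &RefCell<i32>, val: &i32) {
    // `val` points into the bytes covered by the RefCell's UnsafeCell,
    // so it aliases the `&RefCell<i32>` passed alongside it.
    assert_eq!(*val, *cell.borrow());
}

fn main() {
    let cell = RefCell::new(5);
    let guard = cell.borrow();
    observe(&cell, &*guard);
}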

I think your transformation is illegal, and my model deliberately conflates raw pointers and shared references with interior mutability. Mostly this makes the model simpler so I have to consider fewer combinations of things. :wink:

For example, we might want to allow FFI code which takes a &Cell<i32> and passes it to C as int*. The C accesses would all be raw pointer accesses.

I totally did not think about this, but that is a good point. I have two answers: First of all, we are defining something more like a surface language semantics here. So I am less concerned about making everything reorderable. Actually doing so is also really hard. LLVM can still do these reorderings later because it does not have the stack or timestamps.

Secondly, for Uniq(_), the tag is only ever compared for equality. They happen to be ordered but it really does not matter. So actually I think it is not hard to argue that your two programs are equivalent. Things get slightly more complicated with freezing.

Yes. But not just pointers passed to functions, also pointers returned to us from other functions and pointers we load from memory.

Oh my, you have a nasty mind. :wink: When it is loaded as individual bytes, it is loaded as a bunch of u8. Then you might be calling transmute, which in this case is a function returning an &mut – so retagging would happen because you got a reference returned from a function.
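Concretely, something like this sketch (my own reconstruction of the scenario, not code from the post): the final transmute returns an &mut, so that is where the retag would happen.

use std::mem;

unsafe fn reconstitute(x: &mut i32) -> &mut i32 {
    // copy the reference out as a plain, pointer-sized byte array
    let bytes: [u8; mem::size_of::<&'static mut i32>()] = mem::transmute_copy(&x);
    // ... the bytes could be shuffled around as ordinary u8 here ...
    // putting them back together: this call returns an `&mut i32`, so it retags
    mem::transmute(bytes)
}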

No. That wouldn’t work, unfortunately. The barrier reflects the fact that the lifetime of a reference you get as function input will necessarily outlive the current function call. References you receive during the function execution could be living shorter, so adding a barrier would not be correct.

Lifting writes out is not usually possible (you cannot just insert a write, that would change what the rest of the code can observe about this location!). But we certainly want to be able to lift reads out of loops.

I also think that reordering accesses does not need barriers; retagging is enough.

That is the biggest argument for me, TBH. I am not even sure what exactly the rules are with noalias, but my hope is that with the barriers we are on the safe side.


This is a very interesting question, and it comes right back to what I just said above about transmutes. I did expect that someone would tell me these transmutes are needed, I did not expect that that had already happened. :wink: Could you maybe write a small Rust-only test case demonstrating what you think should be possible?

An alternative to transmutes, at least in Rust, might be to use uninitialized memory. But that may not work: We have to create one shared ref first, which should freeze whatever it points at, but that “whatever” will then later get mutated to close the cycle. I do not think it is possible to create a cycle of & this way (if there is no UnsafeCell).

I assume you meant let y = **rr?

In that case, this is not legal. When let r = *rr; happens, we are loading an &i32. That is covered by the clause “whenever a reference is loaded”, so we proceed with retagging. We first activate the reference we loaded. That reference was created further up when &&i happened, and it kept its timestamp while sitting in memory. Activating it, however, will cause UB because the location got unfrozen when you wrote to it using *p.

No, timestamps are never copied. Only freshly assigned during retagging.


Btw, I am extremely happy about all the feedback, questions and concerns I am getting here. If I had to describe a best-case scenario for what a discussion about this blog post would look like, this thread would be it. :heart:

Wow! Thanks for this example. I love it. :slight_smile: (And nice hack with the &&RefCell to trick retagging.)

However, I think what this actually shows is that RefMut is wrong. This is something I have been wondering about for quite some time.

The reason RefMut is wrong is that it contains an &mut with a lifetime which, in this case, outlives the function call – but the exclusive access of this reference does not always outlive the function call, as you showed.

Hm, yes. You are “touching” m1 and hence invalidating all references that were created after m1.

do you propose this should compile? https://play.rust-lang.org/?gist=e1fcf5b59f631a9a1da00beff522997b&version=stable&mode=debug&edition=2015

I don’t want that to compile (unless an “implicit block” is inserted by the compiler between let b and let d, but I think that’d be surprising).

@Soni this already compiles with NLL: https://play.rust-lang.org/?gist=cdea8cd3ec8d1f33e66a16580bda52cb&version=nightly&mode=debug&edition=2015. And that is fully intentional. No longer tying lifetimes to variable scopes is the exact point of NLL. This has nothing directly to do with my proposal, which is about unsafe code.

okay, so NLLs are implicit blocks. (e.g. add let e = b.do_thing(); after a.not_mut())

this is the context: https://cybre.tech/SoniEx2/rust.hexchat.hexchat-plugin/src/branch/master/src/lib.rs#L20-L58 (trying to implement hexchat_list_*), which is about unsafe code.

There will certainly be such transmutes in the wild, but I don’t think they are justified. If you know you have a &mut and want a *mut, casting it is superior to transmuting in every conceivable way.

A slightly more interesting case is if you have a memory region that stores pointers, and is accessed by code that needs raw pointers (e.g., it’s a data structure shared with C code), but some other code wants to view the memory as containing references. However, in that scenario it seems acceptable to me to require that all code agrees that the memory contains raw pointers, and let the code that wants to deal with references explicitly cast between raw pointers and references when loading and storing (and place any lifetime parameters in PhantomData).
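A sketch of what that could look like (names made up; the shared region stores only raw pointers, and references are created only at load time):

use std::marker::PhantomData;

struct SharedSlot<'a, T> {
    ptr: *const T,               // what is actually stored in the shared region
    _marker: PhantomData<&'a T>, // the lifetime lives only in this view type
}

impl<'a, T> SharedSlot<'a, T> {
    fn load(&self) -> &'a T {
        // the reference is created explicitly, from the raw pointer, at load time
        unsafe { &*self.ptr }
    }
}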

How does the model cover conditionality?

fn demo(x: &mut i32, y: &mut i32, flag: bool) {
    let mut ptr = x as *mut i32;
    if flag { ptr = y as *mut i32; }
    // ...
    if flag {
        *x = 1;
        unsafe { *ptr = 2; }
    }
}

@jonh This is a dynamic model, it just looks at whatever code is executed. So assuming flag did not change, that code is fine.

I just spent quite a while disagreeing with you and attempting to come up with a counterexample, and upon repeated failures I’m now convinced you’re right that the models are basically the same. So here is my proof sketch that, in fact, you are correct (at least with the interpretation of the partial order that I was mentally using). I think this helps clarify what the different points in the design space are, by looking at where the proof sketch might fall apart.

First of all, note that since the total order is compatible with the partial order, anything accepted by the partial order is accepted by the total order, so it suffices to show that anything accepted by the total order is also accepted by the partial order.

The only place where timestamp ordering is relevant is in checking the validity of shared references. In a case accepted by the total order, there are three relevant timestamps: t_freeze, when the location was frozen; t_create, when the shared reference was created; and t_use, when the value is read from under the shared reference. For this to be accepted, t_freeze <= t_create in the total order, and no writes to the location are scheduled between t_freeze and t_use. In addition, because of the data dependency, t_create must happen-before t_use. (N.B. This may not hold in the presence of weak memory, since the reference could have been read out of a promise.)

Then, in the partially ordered setting, at the moment of t_create, the creation of the reference would trigger freezing the already-frozen memory location. In my mental model the freeze time was actually an antichain frontier, so this would add t_create to the freeze time. Then, upon t_use, there have been no writes, so the location is still frozen, and the ordering confirms that t_create (in the freeze-time antichain) happens-before t_use, allowing the read. Q.E.D.-ish.

There are 2 big ways that I can see this proof failing:

  1. If t_create does not happen before t_use. This may be totally possible with weak memory orders. Essentially, the question is whether the following should be allowed:
    unsafe fn thread1(x: &Atomic<&'static u32>, ptr: *mut u32) {
        x.store(&*ptr, Ordering::Relaxed);
    }
    fn thread2(x: &Atomic<&'static u32>) {
        let value = *x.load(Ordering::Relaxed);
        println!("{}", value);
    }
    
    Note that the retagging at the return of load has the same effect as the actual read: it requires validity.
  2. If the freeze time is modeled as a single time instead of some frontier. This could make sense, but rules out
    unsafe fn thread1(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    unsafe fn thread2(ptr: *mut u32) {
        println!("{}", *(&*ptr));
    }
    
    as only the first freeze would “win” and the second reference would be invalid.

In both those cases, I’d actually like for the code to be accepted, so… color me convinced that, to the extent that the models are different from each other, the simple operational total order is better. Anyway, as we both said, this is not a question that is super relevant right now.

Yeah, my instinct is that this will be way too stringent. Especially since we say that &mut and *mut are the same layout, people will expect at least round trip conversion to be valid, and as you point out, we can’t even detect transmutes in general, which makes it nigh impossible to make miri check for this condition.

I at least am relatively happy with this as a compromise position (and it’s one of the things I proposed way back in the Types as Contracts thread). The lack of Raw on the stack just means that the new pointer is not dereferenceable, but it still allows comparison, hashing, etc. It also allows for a two-piece desugaring of the as *mut _ cast, as a side-effecting “broadcast” operation followed by a transmute, which I think is a reasonable model to use. The actual transmute can then be subject to, e.g., DCE, and the broadcast can be handled separately and will compile to no IR.

That said, I can definitely see this being another confusing way to shoot yourself in the foot.

Oh, sure. I just wanted to see if we could extend the model to make it legal. But since I finished writing that idea up and have been thinking about it more, I’m getting less and less convinced that it is useful, at least for now.

Ooh, FFI is an interesting aspect of this topic that I hadn’t thought about. If the rust code casts to a raw pointer and passes it to C, that certainly would be fine, but I assume you’re talking about an extern fn that directly takes &Cell<i32> as an argument? I’m not sure how this would cause more problems than just passing a reference directly to FFI. If I declare

extern "C" {
    fn foo(x: &mut i32);
}

then the C code will receive a reference, but will interpret it (which seems like a transmute to me) as a raw pointer. Then it will modify the memory, which would be invalid if there isn’t some implicit “all pointers immediately push Raw if crossing the FFI boundary.” This doesn’t seem to be a problem unique to Cell.

Honestly, while I do like making everything reorderable (and this particular case seems simple enough that I could totally see wanting a Rust pass to do something like this), my biggest motivation for wanting separate tags and timestamps is that it feels mentally simpler to me. Technically more complicated, but intuitively simpler.

That seems mostly reasonable, but I can see a few places where transmutes might throw a wrench in things. What if someone transmutes a &mut into a shared reference?

Actually, that’s an interesting question independent of the problem of timestamp tag ordering. I could see an argument for it just being UB, but it seems so innocuous, and more convincingly, I’d like to be able to transmute from &&mut T to &&T (mostly because I want more useful examples like transmuting &[&mut T] into &[&T]). It’s not clear what the right behaviour is. My instinct is that it interacts with retagging to make everything work out - when you actually load/receive the transmuted reference, it gets retagged. This activates the Uniq element on the stack (because despite the type, you actually got a mutable reference) and then tops off the stack and freezes it. But maybe that is too delayed.
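For concreteness, the kind of transmute in question would be something like this (i32 is just a stand-in; whether this is allowed is exactly the open question):

use std::mem::transmute;

fn downgrade<'a>(xs: &'a [&'a mut i32]) -> &'a [&'a i32] {
    // reinterpret the slice of mutable references as a slice of shared ones
    unsafe { transmute(xs) }
}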

:grin:

Not in general, certainly, but if you have &mut/noalias dereferenceable, it seems like that should lock things down enough for it to be valid (EDIT: I just noticed #53105, which is exactly this). Those assert exactly that there is no other code that can observe the location (and you won’t trap by trying to write). Obviously you need to only write back the value that was already there, but I don’t see a problem with that.

That said, upon experimentation, it looks like LLVM still won’t do an unconditional write, even in that highly restricted situation.

This mostly feels right to me. However, it occurs to me that it would be possible to support the intuition that when you explicitly drop a mutable reference, you are giving up the promise of exclusive access. There would be a “drop glue” (in quotes because it is purely at the instrumented VM level) for &mut that pops off the Uniq and any barrier that came with it.

The main case I can see for justifying them is if you have many references (i.e. in a data structure) and you want to convert all of them in one go. Iterating through and casting each one is expensive and possibly impossible, since we don’t really have a way to change the type of a data structure partially through (that would require true linear types to make it work).

This (generalized) was the motivation behind my RFCs #91 and #223, for example.

Could you maybe write a small Rust-only test case demonstrating what you think should be possible?

Here’s a short example that just creates a cyclic linked list, and a longer one that actually deserializes a graph.

Hmm, that makes sense.

Some other notes / nitpicks:

Dereferenceable

Given a function argument of type &T, I think your model already requires that the argument is a valid pointer (aligned, not trapping, not null), even if the function body never dereferences it: otherwise, the retagging and barrier operations wouldn’t make sense. Thus, rustc is justified in applying LLVM dereferenceable, as it currently does. However, it might be worth stating this more explicitly.

On the other hand, given a function argument of type &&T, if the function body never dereferences it, the inner pointer could be invalid. So if LLVM hypothetically offered a “dereferenceable N times” attribute, rustc couldn’t use it.

De-reference-ification optimization

Another hypothetical optimization I’ve talked about in the past is: among function arguments that are immutable references to word-sized values (e.g. &usize, &&T), automatically detect cases where the function implementation doesn’t care about pointer identity for that argument, and transform the ABI-level signature to take the value directly (usize, &T). Of course, it isn’t always possible to statically determine whether the function cares about identity. But in many cases it is possible; in particular, if the function body passes the reference as an argument to another function, rustc can propagate the “doesn’t care” flag from the callee to the caller.
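As a sketch of the rewrite being described (hypothetical functions, not an existing rustc transformation):

// Source-level signature: takes references, but never cares about pointer identity.
fn sum(a: &usize, b: &usize) -> usize {
    *a + *b
}

// The ABI-level signature it could be lowered to under that optimization.
fn sum_lowered(a: usize, b: usize) -> usize {
    a + b
}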

If &&T can point to an invalid pointer, there would be some restrictions on that optimization:

  • It would still be okay to do the transformation to pass &T directly, but the new argument couldn’t be marked dereferenceable.
  • If T is itself word-sized, it would not be okay to do the transformation twice and pass T by value.
  • On the other hand, the compiler could assume validity if it additionally detected that the argument is unconditionally dereferenced (again by the function body or its callees). But that isn’t always the case; e.g. if the function has an early exit (perhaps using the ? operator) before it happens to touch that argument, it’s not an unconditional dereference.

Probably not the end of the world.

Unsized types

You mentioned "mem::size_of<T>() many bytes"; I guess that for unsized T, that should be mem::size_of_val(t) many bytes? …I can’t actually think of any interesting or problematic subtleties related to that, but I’m mentioning it anyway just for the record.

Reinterpreting memory

Unfortunately… I agree with gereeter that transmuting between pointers and references, or transmuting &&mut T to &&T as they suggested, or doing the equivalent of a transmute by copying bytes, etc. should be legal. I suspect there’s a fair amount of existing code that uses such transmutes, and it would be surprising (in the “surprise! UB” sense :wink: ) if they didn’t work – especially considering that they’re legal in C.

For this to be accommodated, I guess it would have to be possible for arbitrary values to carry tags, not just reference-typed ones – including raw pointers, usize, and even u8. I suppose this contradicts your desire to make non-reference types as “non-magical” as possible. But I think the alternative is worse.

Local variables

Given:

fn foo() -> i32 {
    let mut x = 2;
    let _ = &mut x as *mut i32; // does not escape
    bar();
    x
}

Can the compiler optimize this to always return 2?

If x were immutable, that optimization would arguably be justified in your model, under the assumption that an immutable local variable acts like an immutable reference. But not if x is mutable.

In some sense this is logical. If the raw pointer were passed to bar() rather than thrown away, the optimization would clearly be invalid; but you want raw pointers to be non-magical and untracked, so the fact that it’s thrown away shouldn’t make a difference and it should be invalid here as well.

However, from a compiler perspective, I think that would be quite surprising. The raw pointer cast seems like dead code, but removing it would change semantics. And LLVM is definitely going to want to remove it, so you’d have to use some special mechanism to trick LLVM into thinking the pointer value escaped.

I think it would be better to make this optimization valid by introducing an additional rule, something about values that are completely non-escaping.

Such a rule would also allow something like

fn no_op(p: &mut i32) -> &mut i32 {
    unsafe { &mut *(p as *mut i32) }
}

to be optimized into a true no-op.

This really is an interesting example, and I think it does in some sense “feel” like it, or something like it, ought to be allowed. I think it is generally fine to say that your specific code is UB (in my mind at least, it pretty flagrantly mutates something that has had a shared reference taken, which is invalid) as long as there is a clear guideline for what should be done instead. Specifically, regardless of how ugly the resultant code is, it should definitely be possible to create a value of that specific Node type (no changes like requiring it to be more generic or giving it repr(C)) using unsafe code that is cyclic in the way you describe.

This is the best I was able to come up with. As long as UnfinishedNode and Node are layout-compatible (which is a guarantee I think we can choose to provide, though we don’t right now), this should do the same thing as your code without undefined behaviour.
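Roughly, the shape of that construction is something like the following sketch (simplified, and not the exact code linked above; UnfinishedNode is the hypothetical raw-pointer twin of Node):

use std::mem::transmute;
use std::ptr;

// The reference-carrying type we ultimately want.
struct Node<'a> {
    next: &'a Node<'a>,
    val: i32,
}

// The raw-pointer twin used while the cycle is under construction;
// the whole approach relies on it being layout-compatible with Node.
struct UnfinishedNode {
    next: *const UnfinishedNode,
    val: i32,
}

fn main() {
    // Build a two-node cycle through raw pointers first...
    let mut a = Box::new(UnfinishedNode { next: ptr::null(), val: 1 });
    let mut b = Box::new(UnfinishedNode { next: ptr::null(), val: 2 });
    a.next = &*b as *const UnfinishedNode;
    b.next = &*a as *const UnfinishedNode;
    // ...then reinterpret it as the reference-carrying type.
    let head: &Node = unsafe { transmute(&*a) };
    println!("{} -> {} -> {}", head.val, head.next.val, head.next.next.val); // 1 -> 2 -> 1
}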

I think I would be in favor of a strong enough guarantee about struct layout to make this sort of code work (possibly with some extra annotations). I don’t think this would be at all onerous, though it might get in the way of struct layout randomization; I don’t actually think it will though. As a specific strawman proposal designed to be extra conservative:

Two monomorphised types S and T are layout-compatible if any of the following hold.

  • S = T
  • S is a repr(transparent) wrapper around S2 and S2 and T are layout-compatible.
  • T is a repr(transparent) wrapper around T2 and T2 and S are layout-compatible.
  • S and T are both built-in pointer types (&mut, &, *mut, *const) and the pointer payloads are either
    • Both Sized,
    • Both slices,
    • The same trait
  • S and T are both structs with the same number of fields and, when paired up in declaration order, every pair of fields
    • Contain layout-compatible types
    • Have the same name (this is a bit extreme, but I wanted to be conservative)
  • S and T are compatible according to repr(C) or other specific representation rules (e.g. packed, repr on enums, etc.)

If we made those guarantees, I think the cyclic data-structure use case is not too heinous to deal with.

The primary difficulty is that it is really hard to maintain data dependency chains if you don’t track things the way they were written at the source level. Yes, it is painful to not be able to remove the dead cast, but that will be easily cleaned up later, and without it, you have to be sure not to break a different part of the chain. The traditional (well, I mean the one I’ve seen in multiple places; this isn’t really old enough to have tradition) example is

fn foo() -> i32 {
    let mut x = 2;
    let ptr = &mut x as *mut i32 as usize; // does not escape
    bar(ptr - ptr);
    x
}

Now bar in some sense has been passed the pointer, and unless you want to deal with some complicated questions of what information content certain values leak about the pointer, you need to let bar access x. But then you can’t rewrite ptr - ptr to 0. Complicating pointer-to-integer casts is annoying, but it seems to me like the least bad option.

See also the sort-of-tangent to the Types as Contracts discussion which I think did a great job of demonstrating the tradeoffs.

Additionally, I don’t think it would cause any confusion past the AST, as I’m imagining that the lowering from AST to MIR would lower a reference-to-pointer cast as two statements in the MIR: a (brand-new) PushRaw operation and then a transmute to do the actual cast. Then, the compiler would be free to remove the cast, just not the PushRaw which has side effects. The PushRaw in some sense acts like a compiler barrier - it prevents certain movement of operations, but when MIR gets lowered to LLVM, it would just disappear.

This is just a MIR-level model, so it is fine if LLVM removes the cast - we just can’t remove it (or, well, the push of Raw, which is a bit different) before lowering to LLVM. We do need to be careful to compile to valid LLVM, but as long as MIR undefined behaviour is a superset of LLVM undefined behaviour, which it should be since we throw away a bunch of extra information, then LLVM will generate valid code.