[Discussion] Super borrow, for super let, placement new, and hybrid borrow

Definition

Super borrows can only occur in function signatures. A super borrow:

  1. Prevents the callee from dropping the parameter (like a normal borrow).
  2. May carry an optional lifetime, which cannot be inferred automatically when present.
  3. May refer to a value that is not yet initialized (if we want something like placement new or super let).

Example 1: get_or_insert

Currently there is an interface called get_or_insert. It might insert an item, so an exclusive borrow &mut self is needed; it is also a getter, so a shared borrow &T is returned. We therefore end up with something like fn(&mut self) -> &T. Since the &T is regarded as downgraded from the &mut _, its existence prevents any other shared reference.[1] Because the signature says that the mutable borrow lives as long as the returned immutable borrow, the error cannot be bypassed even with Polonius:

#![feature(hash_set_entry)]
fn main() {
    let mut set = std::collections::HashSet::from([1, 2, 3]);
    let ret = set.get_or_insert(100);
    let ret2 = set.get(&2);
    dbg!(ret2);
    dbg!(ret);
}
$ rustc --edition 2024 -Z unstable-options -Zpolonius -C link-arg=-fuse-ld=mold test.rs  -o test && ./test
error[E0502]: cannot borrow `set` as immutable because it is also borrowed as mutable
 --> test.rs:5:16
  |
4 |     let ret = set.get_or_insert(100);
  |               --- mutable borrow occurs here
5 |     let ret2 = set.get(&2);
  |                ^^^ immutable borrow occurs here
6 |     dbg!(ret2);
7 |     dbg!(ret);
  |          --- mutable borrow later used here

error: aborting due to 1 previous error

For more information about this error, try `rustc --explain E0502`.
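
For comparison, here is a minimal sketch of what stable Rust already accepts when the insertion and the lookup are written as two separate calls ("try to insert, then get"). The function name `insert_then_get` is made up for this illustration:

```rust
use std::collections::HashSet;

// Hypothetical helper: performs the insert and the lookups as separate steps
// inside one scope, so the exclusive borrow ends before any shared borrow starts.
fn insert_then_get() -> (i32, Option<i32>) {
    let mut set = HashSet::from([1, 2, 3]);
    set.insert(100); // exclusive borrow lasts only for this statement
    let ret = set.get(&100).expect("value was just inserted");
    let ret2 = set.get(&2); // a second shared borrow coexists with `ret`
    (*ret, ret2.copied())
}
```

The two-call form works because nothing ties the `&mut` used by `insert` to the lifetime of `ret`; the single-call signature is what creates that tie.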

Even if we can prove that obtaining another reference is safe, we have no way to tell Rust so. In this case, I suppose my super borrow would work:

fn get_or_insert<'a>(super 'a self, value: T) -> &'a T {
    let ret = self.base.get_or_insert(value); // currently ret is downgraded from &mut
    unsafe {
        // SAFETY: we have proven that obtaining another shared borrow is safe here
        mem::transmute(ret)
    }
}

In this signature, self is super-borrowed: it is borrowed exclusively at first, and when the function returns the borrow becomes a shared borrow that lives for 'a.

For the borrow checker, a call to such a function could be translated directly into two steps in the caller:

fn get_or_insert<'a>(super 'a self, value: T) -> &'a T {/*omit*/}
fn caller() {
//    let ret = self.get_or_insert(val);
    let _ = &mut self; // ensure the value could be borrowed exclusively.
    let ret: &'a T = Self::get_or_insert(&'a self, val); // check whether we could borrow something for `&'a`.
}

For the function body, it might be enough to treat the super parameter as a normal parameter without any borrow, and to mark every borrow of a super parameter as living at least as long as its super lifetime.

Example 2: Placement new

Since we could pass an uninitialized variable down the stack, we could play some tricks to get placement new:

impl Foo {
    fn placement_new(super '_ s: Self) {
                        // ^^ Note the signature. this lifetime mark means s must be valid for at least `'_`, thus s must be valid after `placement_new` is called.
        s = Self { ... }
    }
}
// call placement new:
let a: Foo;
Foo::placement_new(a);
// call placement new in depth:
fn caller() {
    let a: Foo;
    caller2(a); // we could transfer super parameter between functions
    // since caller2(a) does not drop `a`, we could use `a` later.
    // question: should we mark such `a` with a different symbol, e.g., `super a/new a/out a/...`?
}
fn caller2(super '_ a: Foo) {
    Foo::placement_new(a);
}
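
Within a single function, stable Rust already tracks deferred initialization of a declared-but-unassigned local; the example above would need that tracking to extend across a call. A small sketch of the existing single-function behaviour (the `value` field and the name `deferred_init` are illustrative, not part of the proposal):

```rust
struct Foo {
    value: i32,
}

// Hypothetical helper showing definite-initialization checking on stable Rust.
fn deferred_init(flag: bool) -> Foo {
    let a: Foo; // declared, but not yet initialized
    if flag {
        a = Foo { value: 1 };
    } else {
        a = Foo { value: 2 };
    }
    // The compiler has verified that every path assigned `a` before this use.
    a
}
```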

Example 3: Controlling external exclusive state

Suppose you have an FfiInterface that can produce Unprotected values, and some calls on FfiInterface may trigger the foreign side's GC, which reclaims every unprotected object. You must therefore ensure that every Unprotected is either dropped or protected before the next FFI call.

A real-world example is the well-known statistical software R. Currently we either have to protect every variable in FFI calls, or expose an unsafe interface and manually check that all FFI calls are valid. Once super borrows are implemented, we might be able to write this in safe Rust.

pub struct Unprotected<'a>(*mut c_void, &'a ());
pub struct Protected(*mut c_void);
pub struct FfiInterface(()); // should not be constructed in safe Rust.
impl FfiInterface {
    unsafe fn new() -> Self { Self(()) } // constructing a new interface is unsafe.
    // `super 'a self` means that `self` is borrowed exclusively during the call, and the borrow that escapes lives at least `'a`.
    // Note that `allocate_in_ffi` may trigger GC, so on stable Rust it must take `&mut self` to ensure no other valid `Unprotected<'_>` exists,
    // which makes it another `fn(&mut self) -> &T`.
    pub fn allocate_in_ffi<'a>(super 'a self) -> Unprotected<'a> {
        unsafe { ffi_allocate() }
    }
}
impl<'a> Unprotected<'a> {
    pub fn protect(self) -> Protected {
        unsafe { ffi_protect(self.0) } // consume 'a, thus release the shared borrow of `FfiInterface`.
    }
}
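
As a point of reference, the stable-Rust `&mut self` shape mentioned in the comments above can be sketched as follows. This is only a mock: a counter stands in for the foreign allocator, and the names `new_mock`, `allocate`, `next_handle` and `handle` are assumptions made for this illustration, not part of any real FFI binding.

```rust
use std::marker::PhantomData;

// Mock interface: a counter stands in for the foreign allocator (assumption).
pub struct FfiInterface {
    next_handle: usize,
}

// Holds the exclusive borrow of the interface for as long as it is unprotected.
pub struct Unprotected<'a> {
    handle: usize,
    _borrow: PhantomData<&'a mut FfiInterface>,
}

pub struct Protected {
    handle: usize,
}

impl FfiInterface {
    pub fn new_mock() -> Self {
        Self { next_handle: 0 }
    }

    // `&mut self`: no other `Unprotected<'_>` can be alive across this call.
    pub fn allocate(&mut self) -> Unprotected<'_> {
        self.next_handle += 1;
        Unprotected { handle: self.next_handle, _borrow: PhantomData }
    }
}

impl<'a> Unprotected<'a> {
    // Consuming `self` ends the borrow of the interface.
    pub fn protect(self) -> Protected {
        Protected { handle: self.handle }
    }
}

impl Protected {
    pub fn handle(&self) -> usize {
        self.handle
    }
}
```

With this shape, an `Unprotected` blocks every other use of the interface until it is protected or dropped, including uses that could not trigger a GC; a shared borrow in the return position would be less restrictive.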

Example 4: Writer example for super let

fn super_let(super 'a file) -> Writer {
    println!("opening file...");
    let filename = "hello.txt";
    file = File::create(filename).unwrap(); // parameter file is initialized here.
    Writer::new(&file) // since file is not dropped, this line is OK.
}
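
A rough stable-Rust approximation of this pattern has the caller declare the slot and the callee fill it in. The sketch below uses an in-memory `Vec<u8>` instead of a file so that it is self-contained; `Writer`, `open_writer` and the `Option` slot are stand-ins chosen for the illustration, not the proposed semantics.

```rust
// Stand-in writer that borrows a sink owned by the caller.
struct Writer<'a> {
    sink: &'a mut Vec<u8>,
}

impl<'a> Writer<'a> {
    fn write(&mut self, bytes: &[u8]) {
        self.sink.extend_from_slice(bytes);
    }
}

// The caller owns `slot`; the callee initializes it and returns a writer
// that borrows from it, so the sink outlives this function call.
fn open_writer(slot: &mut Option<Vec<u8>>) -> Writer<'_> {
    let sink = slot.insert(Vec::new()); // initializes the caller's slot
    Writer { sink }
}
```

The cost compared with the proposal is visible in the signature: the caller has to spell out the `Option` slot, and the slot stays exclusively borrowed while the writer is alive.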

Example 5: Pin function for super let

Since turning macros into functions needs extra care, I could not avoid introducing a new grammar for optional parameters here:

impl<'a, T> Pin<'a, T> {
    pub unsafe fn new_with_super_borrow(val: T; super 'a mut value: T) -> Self {
                                           // ^ note that semicolon, everything after this semicolon could be optional.
        value = val; // thus val is masked.
        unsafe { Pin::unchecked(&mut value) }
    }
}
let thing = Pin::new_with_super_borrow(Thing { … });
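
For reference, the macro form that such a function would replace exists on stable Rust today as `std::pin::pin!`. A minimal sketch of its use (the `Thing` type and the helper names are illustrative):

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Illustrative type that opts out of `Unpin`.
struct Thing {
    value: u32,
    _pin: PhantomPinned,
}

fn read_pinned(thing: Pin<&mut Thing>) -> u32 {
    thing.value // shared access through `Deref` is always allowed
}

fn pin_on_stack() -> u32 {
    // `pin!` stores the value in the enclosing scope and hands back
    // a `Pin<&mut Thing>` to it, which a plain function cannot do today.
    let thing: Pin<&mut Thing> = pin!(Thing { value: 5, _pin: PhantomPinned });
    read_pinned(thing)
}
```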

Further discussion: Default values

I have no idea whether allowing default values is a good idea. Super values seem 100% suitable for default parameters, but that might make this discussion too large.

Alternatives:

  1. A better borrow checker?
  2. A more flexible borrow rule that allows a mut borrow to be handed back and reborrowed as a shared borrow, rather than merely downgraded, since a downgraded mut borrow cannot coexist with other shared borrows.
  3. A better allocator that could ensure placement new is optimized?
  4. A better FFI interface which could track the status of some locks

  1. Here, fn(&'a mut self) -> &'a T cannot be regarded as a plain downgrade, since we have both RefCell::<T>::get_mut(&mut self) -> &mut T and RefCell::<T>::borrow_mut(&'a self) -> RefMut<'a, T>. If fn(&'a mut self) -> &'a T were regarded as a downgrade that allowed more shared references to be created, we could use borrow_mut to modify the value behind the immutable borrow downgraded from get_mut. ↩︎


Your first example looks similar to my proposal "Downgradable mutable borrows" from December 2023:


I have several concerns about all the previous RFCs:

  1. If two lifetimes are provided, is the short lifetime really necessary? (My proposal: no)
  2. Is it possible to store downgradable borrows in something like a Vec? (My proposal: no)
  3. What if we have to downgrade and upgrade the borrow several times? (My proposal: in the function body, super 'a value can be treated as value itself, so it can be downgraded many times, or even reborrowed as another super 'b value. At the end of the function body, the return value may contain a lifetime shorter than the super borrow's lifetime.)
  4. How do we tell users what can be downgraded and what cannot? (My proposal: imagine that all the borrowing, reborrowing and downgrading happens in the caller. After a mut borrow ends, immutable borrows can of course be created.)

Actually, I made something almost identical to your RFC in July 2024. I eventually found that the borrow could be combined with super let, since it can be regarded as something that happens in a higher stack frame.

What's more, I strongly suspect that the short lifetime will cause some corner cases, since it can be neither extended nor shortened. In the simplest case, &'short mut lives at least 'short, and we cannot name a lifetime shorter than the whole function scope, so &'short mut lives for the whole function scope, which makes it impossible to obtain &'long inside the function.

fn poc<T>(&<'short mut, 'long> item: T) -> &'long T {
    &*item // Can we actually do this?
    // First, item is borrowed for at least 'short, and 'short is still alive here,
    // which makes it impossible to borrow for &'long.
    // Thus *item here is *(item as &'short mut), and only a `&'short T` is produced.
}

Just a clarification:

The borrow checker is working correctly. This signature:

fn<'this>(&'this mut self)->&'this T

Says that so long as the return type is used ('this is active), *self remains exclusively borrowed (the &mut self also has lifetime 'this).

That relationship can be relied on for soundness.

So the desired get_or_insert semantics will indeed require a new type of API, be it super borrow or something else; the existing API semantic is not a bug and can't be "fixed".


Re: Example #2

I don't see how this is substantially different from MaybeUninit in the part that actually matters for emplacement: the ability to do it transparently across function boundaries.

I know that, but the problem is that it is safe, and we should be able to make a shared borrow while the &'this T is alive.

Agreed. We can always downgrade a &mut T obtained from RefCell::get_mut to &T, but allowing further shared borrows (e.g., for RefCell::borrow_mut) would lead directly to a logic error.
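
The RefCell hazard can be sketched on stable Rust by keeping the two phases strictly separate. In the snippet below (function names are illustrative), the reference obtained via `get_mut` must be dead before the cell is shared; if it were allowed to stay alive, `mutate_through_shared` would change the value behind a live `&i32` without any runtime borrow flag noticing.

```rust
use std::cell::RefCell;

// Mutation through a *shared* reference, guarded only by RefCell's runtime flag.
fn mutate_through_shared(cell: &RefCell<i32>) {
    *cell.borrow_mut() += 1;
}

fn get_mut_then_share() -> (i32, i32) {
    let mut cell = RefCell::new(1);
    let before = {
        // `get_mut` bypasses the runtime flag; exclusivity is proven statically.
        let r: &i32 = cell.get_mut();
        *r
    }; // `r` ends here, so the static exclusive borrow ends too
    mutate_through_shared(&cell);
    let after = *cell.borrow();
    (before, after)
}
```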

I'll update it soon in Example 1.


Calling .assume_init() on a MaybeUninit is unsafe, whereas the compiler could check whether a super parameter is initialized.
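
For context, this is roughly what the out-parameter pattern looks like with MaybeUninit on stable Rust. The `Foo` fields and the function names are made up for the illustration; note that nothing here guarantees the value is constructed in place.

```rust
use std::mem::MaybeUninit;

struct Foo {
    id: u32,
    payload: [u8; 16],
}

// Callee initializes a slot owned by the caller.
fn init_foo(slot: &mut MaybeUninit<Foo>) -> &mut Foo {
    slot.write(Foo { id: 7, payload: [0; 16] })
}

fn make_foo() -> Foo {
    let mut slot = MaybeUninit::<Foo>::uninit();
    init_foo(&mut slot);
    // SAFETY: `init_foo` unconditionally wrote a valid `Foo` into `slot`.
    // The compiler does not check this; the programmer has to.
    unsafe { slot.assume_init() }
}
```

The `unsafe` block is the part a compiler-checked "initialized on return" marker would remove.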

Maybe we need another grammar to tell explicitly whether we should initialize the variable:

Actually, I first wanted to handle optional values and initialization together, since they are much the same: if a value is not provided, run the initialization code and use the default value; otherwise skip initialization and use the provided value. After some attempts, I found the grammar too difficult to handle.

That's really the trivial part. The problem with placement is that no amount of new kinds of pointers can guarantee in-place construction, especially not across function boundaries, without duplicating every constructor function ever written.

If this is critical, then IMHO super '_ out_value is enough: it says that out_value lives for at least '_, which means the borrow is alive and thus out_value is an initialized value.

I'll edit the main post to emphasize that.

No, I get that. It's just that "eliminating manual assume_init" doesn't really do much to get us closer to Placement New. Placement new isn't referring to the ability to write a new function that uses an out param, it refers to the ability to take more or less any function and make it return into a place, guaranteeing that intermediate stack copies are elided.


No, it is not safe, at least if done naively. Have you seen the link the comment you replied to provided? It shows exactly what could go wrong if this was allowed.


I'm only saying that making extra shared borrows after get_or_insert is safe (since it could be rewritten as "try to insert, then get"), and currently no grammar can tell the compiler that. IMHO, we currently cannot bind a mut borrow and an immutable borrow into one function without side effects. Not every &mut T can be downgraded to &T, but some of them can.

impl<T> MutexExt<T> for Mutex<T> {
    fn get_ref<'a>(super 'a self) -> &'a T {
        // `get_mut` acquires a `&mut T` with no locking
        let tmp1 = Mutex::<T>::get_mut(&mut self).unwrap();
        // here tmp1 is under &mut
        let tmp2 = tmp1 as &T; // tmp2 thus lives as long as the mut borrow.
        // `return tmp2` is not allowed here: we only allow self to be borrowed for 'a, but self is actually still borrowed mutably.
        #[cfg(break_safety_assurance)]
        unsafe {
            // SAFETY: this really breaks the safety requirement.
            return mem::transmute(tmp2);
        }
        // if you really want to return &T:
        // &*self.lock().unwrap()
        Mutex::<T>::lock(&self).unwrap() as &T
        // since we use a shared borrow here, the returned value agrees with the signature.
    }
}

This doesn't really make sense to me, 'a is a lifetime while "mutable" is a kind of borrow.

This is trying to return a reference to a temporary, are you sure this is supposed to work?
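
On stable Rust, the usual way around the temporary is to return the guard itself rather than a reference derived from it. A minimal sketch, with `bump_then_lock` as an illustrative name:

```rust
use std::sync::{Mutex, MutexGuard};

// Mutates without locking (needs `&mut`), then returns a lock guard.
// The guard owns the lock, so nothing borrowed from a temporary escapes.
fn bump_then_lock(m: &mut Mutex<i32>) -> MutexGuard<'_, i32> {
    *m.get_mut().unwrap() += 1; // no locking: exclusivity is proven statically
    m.lock().unwrap() // the caller keeps the guard alive while reading
}
```

The caller still sees `m` as exclusively borrowed while the guard is alive, which is the limitation this thread is about.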


This seems interesting though. I'm not sure this is what you meant, but here's how I re-interpreted it. Instead of having a function take a mutable reference, we could have a way to declare that it takes something more akin to a mutable place, which:

  • the function can mutably borrow for the duration of the call (i.e. not after it returns, including in the return value);
  • the function can sharely [1] borrow in the return value;
  • the caller considers to be sharely borrowed by the return value.

This avoids the issue with internal mutability because the call can't return a reference obtained from e.g. Cell::get_mut, since that mutably borrows from the place and the function can only return values with a shared borrow. It can however use Cell::get_mut in its body. Effectively the requirement of not downgrading a mutable borrow to a shared borrow is delegated to the body of the function call.
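
A common stable-Rust workaround with the same two-phase shape is to have the mutating call return a key or index instead of a reference, and let the caller take the shared borrows afterwards. This sketch is an illustration of that workaround (names are made up), not of the proposed feature:

```rust
// Phase 1: mutate during the call, return only an index (no borrow escapes).
fn get_or_push(items: &mut Vec<String>, wanted: &str) -> usize {
    if let Some(i) = items.iter().position(|s| s.as_str() == wanted) {
        return i;
    }
    items.push(wanted.to_owned());
    items.len() - 1
}

// Phase 2: the caller takes as many shared borrows as it likes.
fn two_phase_lookup() -> (String, String) {
    let mut items = vec!["a".to_owned(), "b".to_owned()];
    let idx = get_or_push(&mut items, "c");
    let inserted = &items[idx];
    let first = &items[0]; // coexists with `inserted`
    (inserted.clone(), first.clone())
}
```

The cost is a second lookup and the loss of the static guarantee that the index is valid, which a shared borrow in the return value would keep.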

One worry I have though is that this is very specific to the return value, where you only need to create a value with the given lifetime. But what if you also receive that lifetime as input, what would it represent then?


ps: honestly your proposals are always kinda hard to read due to various grammar mistakes or typos. I know this is usually kinda frowned upon, but have you considered using an AI to help you correct them? They are generally pretty good at that.


  1. is this even a word? ↩︎


I'm really sorry for that. Maybe I should try working with AI.

IMHO that's fine. Since the super in super 'a self can be interpreted as the variable self is allocated in the caller's scope, all operations on self can be imagined as taking place on the caller's scope. Here, 'a (and the seemingly useless 'a mut) should be understood as the borrowing state of self after the function is called. The meaning of 'a is that we will return a variable with a lifetime of 'a, so the caller is obligated to mark self as borrowed for a lifetime of 'a (or 'a mut if we really need that).

use std::sync::Mutex;
fn main() {
    let mut a = Mutex::new(5i32); // super variable
    // Here, `get_ref` is called. At this point, please pretend that all operations
    // on `a` within `get_ref` occur within the caller's scope (this is the meaning of the `super` identifier).
    let _ = a.get_mut().unwrap(); // `get_mut` is fine.
    let ret = &*a.lock().unwrap(); // `.lock().unwrap()` is also fine, since we discarded the result of `get_mut`
    // `get_ref` ends here (since it returns the value `ret`).
    // We can see that `ret` borrows `a` with a shared borrow.
    // Since only the signature of `get_ref` is visible to the caller,
    // and its implementation stays invisible,
    // we have to use the signature to tell every caller that
    // the returned value `ret` borrows `a` with a shared borrow.
    // This is why the signature `fn get_ref(super 'a self: Self)` contains a lifetime 'a.
    println!("{ret}"); // fine
    // `ret` is dropped here, so lifetime `'a` ends and no borrow of `a` remains.
    println!("{a:?}"); // Thus it is fine.
}

Here, I probably shouldn't abbreviate super 'a self, because it is actually more similar to mut self rather than &mut self. The full notation should be super 'a self: Self, not self: super 'a Self. In fact, disregarding the borrowing of the return value (and the drop at the end of the function scope), self is indeed of type Self. According to the current syntax rules, only when self is of type Self (rather than some kind of borrow) can we perform both mutable and immutable borrowing on self successively.

As for additional lifetimes, although I have no idea whether this counts as another super 'a mut self, writing them as usual might be an option:

fn foo<'a,'b: 'a,'c: 'b>(super 'a mut input: &'b mut T<'c>) -> &'a T<'a> { ... }
// As you can see, we do not discard the capability to simulate `fn(&'a mut T) -> &'b U`.

I once thought that two lifetime annotations were needed: one 'short to describe mutable borrowing and another 'long to describe the final immutable borrowing. (Thus we could have something like &<'long, 'short mut> value.) In this case I could explain the behavior more clearly:

// old thought
fn get_ref<'short, 'long: 'short>(&<'long, 'short mut> self) -> &'long T {
    // `get_mut` acquires a `&mut T` with no locking
    let tmp1 = Mutex::<T>::get_mut(&'short mut self).unwrap();
    // here tmp1 lives at most `&'short`
    let tmp2 = tmp1 as &'long T; // you cannot extend the lifetime here.
}

However, I suddenly realized that Rust currently lacks the ability to annotate a lifetime that is shorter than a function call. Therefore, once we use a 'short to describe a mutable borrow, since this 'short must live at least as long as the function call itself, we simply cannot create an immutable borrow with a 'long lifetime within the function body. This is also why I attempted to use the keyword super to change the semantics of borrowing to "temporarily own a variable that is allocated in the caller's scope."

Here, the 'a in super 'a self: Self only allows a shared borrow of self to leak outside the function. Since tmp2 actually keeps the &mut self borrow alive, it cannot be returned.


Almost exactly, except that these three points seem to allow a &'a mut borrow to be downgraded to a super 'a borrow, which might invite a lot of misuse.

I have no idea whether we should allow this, since in my proposal only &mut T and &T exist, and every super 'a value: T can be explained as "imagine that all operations on the parameter happen in the caller's scope". As a rather special case, we could move the value out and move it back several expressions later, since that is fine when performed in the caller's scope. In your version, since &mut T can be downgraded to super T, I'm afraid we would have to deal with &T, &mut T and super T together.
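
The "move out and move back" case can be illustrated with what stable Rust allows today. With an owned variable the compiler tracks the temporarily uninitialized state; behind a `&mut` the place must stay valid, so something like `std::mem::take` is needed. Both function names below are illustrative:

```rust
// Owned variable: moving out and re-initializing later is accepted.
fn round_trip_owned(mut data: Vec<i32>) -> Vec<i32> {
    let mut moved = data; // `data` is now uninitialized
    moved.push(3);
    data = moved; // moved back; `data` is usable again
    data
}

// Behind `&mut`: the place may never be left empty, so a placeholder is swapped in.
fn round_trip_borrowed(data: &mut Vec<i32>) {
    let mut moved = std::mem::take(data); // leaves an empty Vec behind
    moved.push(3);
    *data = moved;
}
```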