Two Levels of mut

There can be no references simultaneously with a mutable reference. But this is far too restrictive:

There is a frequent middle case like pushing onto a Vec. Any references (even mutable) to earlier parts of the vector would not be affected. Thus they could coexist. Same goes for inserting new keys into a HashMap. Only simultaneous references to the whole structure/container would be incompatible.

I suggest a separate keyword to denote this, like fn push(&grow self, value: T). This would be a guarantee that the method doesn’t alter existing elements or their keys/indeces.

This would apply recursively to all members of a struct. E.g. I need to access data both by index and key. So I have a struct of Vec<(Key, T)> and HashMap<Key, usize>. If a reference to this struct is &grow, that would imply that I may only call &grow methods on the members.

And such a reference could only be taken from a mut or &grow.

Pushing onto a Vec could cause the entire collection to be reallocated, thus invalidating all references to other parts of the slice. Same goes for inserting into a HashMap.

16 Likes

Another complication is that something like Vec::<T>::get(&'a self, ix: usize) -> Option<&'a T> is interpreted by the borrow checker in such a way that the &'a T reference to an item is essentially the same as the borrow &'a self of the whole Vec. So with regards to this statement

Only simultaneous references to the whole structure/container would be incompatible.

there is currently no meaningful distinction between “references to the whole structure/container” and “references only to items inside the structure/container, not to the whole structure/container”, at least as soon as any function boundaries are involved. (Borrow checking works locally and would analyze a call to Vec::get only based on its signature.)

So one would simultaneously need to introduce some new way to mark Vec::get, too.

1 Like

:face_with_symbols_over_mouth: Oops, real life getting in the way of ideals.

1 Like

At least for something like Vec::push_within_capacity, reallocations could be ignored though. Much more niche use-case, but it’s at least something to discuss without needing to worry about other approaches to tackle the reallocation issue.

(Regarding tackling the reallocation issue, in principle, at least references into the items, behind additional levels of indirections should still be unaffected even for ordinary Vec::push. I mean a reference to a T in a Vec<Box<T>> for example.)

2 Likes

Ach ja! That explains some of the overzealous error messages I’ve gotten.

A Vec that doesn't move elements when you push to it is typically called an arena or memory pool.

1 Like

Also for linked lists this would then work. (Even though fixing the links is technically changing the old part of the list, so would maybe need to be done unsafely.)

Interesting data type. Alas it doesn’t allow access other than through a mut iterator. So though it solves the memory no go, it still doesn’t give the desired feature.

The idea with typed-arena is that the alloc method gives back a &mut T to the pushed item, which is supposed to serve as the main way to get (and keep) access to the item. That’s alloc(&'a self, T) -> &'a mut T (notably &'a self is an immutable borrow, so this returned &'a mut T does not conflict with any further immutable borrows of the same arena).

2 Likes

This can be a trait: Rust Playground

Is it like two-phase borrows? Two-phase-borrows - Rust Compiler Development Guide.

No, I don’t think there’s any similarity.

I had read about Box<str> (which still mystifies me a bit, as I had originally misunderstood that I can not have a non-static str.)

I was struggling badly with lifetimes here. If I have a reference in a struct, I must put the same lifetime on both, meaning the struct can only live as long as whatever outer thingy its inside is referencing. But here the reference is from one inner container to another while the struct’s lifetime should not be restricted. I couldn’t figure out how to express that. Something like:

struct X {
    a: Vec<Box<str<'a>>>,
    b: Vec<&'a str>,
}

Now to get around not being able to reference into my container – on a whim I tried Rc<str>. To my delight it works great, AFAICT totally transparent! And it takes only 16 bytes of overhead, 8 bytes less than String.

Rust rocks, once it clicks!

That's what's called a "self-referencial struct". Rust doesn't have a useful way to express that with lifetimes. There are a few crates that try to make it possible to create useful self-referencial structs; they do so with unsafe and it is very hard to do soundly. Most or all such crates have at least some niche unsound use cases.

(You can construct self-referencial structs safely in some cases, but it makes them almost unusable.)

2 Likes

This requirement seems like it can be solved through my recent new design pattern. We can design an API like this:

#![feature(vec_push_within_capacity)]
use std::marker::PhantomData;
use std::slice::SliceIndex;
type InvariantLifetime<'brand> = PhantomData<fn(&'brand ()) -> &'brand ()>;
pub struct VecHandle<'vec, 'brand, T> {
    vec: &'vec mut Vec<T>,
    _lifetime: InvariantLifetime<'brand>,
}
pub struct ValueToken<'brand> {
    _lifetime: InvariantLifetime<'brand>,
}
pub trait VecExt<T> {
    fn push_scope<'vec, F, R>(&'vec mut self, fun: F) -> R
    where
        for<'brand> F: FnOnce(VecHandle<'vec, 'brand, T>, ValueToken<'brand>) -> R;
}
impl<T> VecExt<T> for Vec<T> {
    fn push_scope<'vec, F, R>(&'vec mut self, fun: F) -> R
    where
        for<'brand> F: FnOnce(VecHandle<'vec, 'brand, T>, ValueToken<'brand>) -> R,
    {
        let handle = VecHandle {
            vec: self,
            _lifetime: Default::default(),
        };
        let token = ValueToken {
            _lifetime: Default::default(),
        };
        fun(handle, token)
    }
}
impl<'vec, 'brand, T> VecHandle<'vec, 'brand, T> {
    pub fn push_within_capacity(&mut self, value: T) -> Result<(), T> {
        self.vec.push_within_capacity(value)
    }
    pub fn get<'handle, 'token, I>(&'handle self, index: I, _token: &'token ValueToken<'brand>) -> Option<&'token <I as SliceIndex<[T]>>::Output>
    where
        I: SliceIndex<[T]>,
    {
        // SAFETY: We convert the lifetime of return value from 'handle to 'token.
        // This is safe because the 'token is scoped by the `scope` call,
        // and inside the whole scope, we'll never move/drop any items.
        // Also `scope` method exclusively borrowed the inner vec,
        // so others outside the scope also can't modify the inner vec.
        unsafe {
            std::mem::transmute(self.vec.get(index))   
        }
    }
}

And use it like this:

pub fn main() {
    let mut v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    v.reserve(10);
    let output = v.push_scope(|mut handle, token| {
        let x1 = handle.get(0, &token).unwrap().as_ref();
        let x2 = handle.get(1, &token).unwrap().as_ref();
        handle.push_within_capacity("d".to_string()).unwrap();
        let x4 = handle.get(3, &token).unwrap().as_ref();
        [x1, x2, x4].join(" ")
    });
    dbg!(output);
}

Checkout the implementation here: Rust Playground

And for the correctness or soundness, please checkout my post:

@steffahn gave some suggestions, inspiring me to apply the design pattern I recently found (I haven't found a good enough name for it) to meet your requirements.

3 Likes

Hey Tenny, thanks for this! It indeed looks like what I was asking for. I haven’t thought about it long enough yet, to really get it.

Two big limitations are the unsafe, which I hardly want in my code, and the within capacity. I totally get that part, but it’s no fit for my initially unknown data size.

1 Like

While the original post's argument is flawed, I think there is a kernel of truth: sometimes an object can be mutable in different ways, some of which will not invalidate existing references, and it can be annoying when the compiler can't tell which kind of mutability it's looking at. For instance, my_vec[0] = foo; requires mut my_vec but cannot reallocate, and yet the compiler doesn't distinguish it from the kind of mutability that can invalidate the other references (like my_vec.push(bar)).

I'm having trouble coming up with a concrete example where this is annoying, but I know I've run into it before with HashMaps before. I was trying to hold references to the (mutable) HashMap’s values, but couldn't because I was also mutating the HashMap (but not in a way that would invalidate any references). I had to replace the non-Copy values with a usize index into a Vec that stored the actual values I cared about.

1 Like

It's not clear to me that different "kinds" of mutability (or any language level change) is required here: the sought functionality in a particular case can usually be provided by the library once the right abstraction/API has been identified.

Under the hood it'll (ultimately) be implemented with unsafe code, and it may also be possible to implement such downstream without waiting for the library to provide a new/safe API. Indeed, such downstream implementations are a common way to experiment before solutions are adopted in the standard library.

You may well find that hashbrown (which is the implementation underlying the standard library's HashMap but which contains fuller and lower level APIs) provides APIs upon which a solution to your problem can be built.