Design safe collection API with compile-time reference stability in Rust

TennyZhuang · February 8, 2024, 11:29am

My recent blog: Design safe collection API with compile-time reference stability in Rust - COCl2's blog home

The blog post propose a design pattern that can provide reference stability for collections in safe rust. Reference stability is important ability in C++ STL while rust std collections doesn't have. Although I do not have the details on how to apply this pattern to the standard library yet, I hope to gather some relevant discussions and suggestion in order to organize a better Pre-RFC. Therefore, I am posting it here.

Jules-Bertholet · February 8, 2024, 7:06pm

This is very nice! I haven't seen the invariant lifetime brand pattern used this way before.

Hopefully, one day partial borrows will allow having several lifetimes directly in the self reference (one for structure and one for data), obviating the need for a separate token.

steffahn · February 8, 2024, 8:21pm

I fully agree that these kinds of API ideas should use some version of partial borrows if possible. The added benefit is that this would mean one could probably refine the lifetimes on existing API.

For example… well, there’s not even that many examples in the standard library because most collections to move values on re-allocation at least…

Your proposal is supposed to be about enhancing VecDeque then, it seems from the remark at the end of the article?

A collection can add the fine-grained permission API by only adding a scope method, without breaking any existing APIs. I’m considering to propose a VecDeque::scope API to std library, which allowed call push while holding stable value references.

For that however, remember that VecDeque on .push does re-allocate memory and move elements if it runs out of capacity, so (unlike C++’s deque) it cannot promise valid references to elements after insert.

Also generally for such APIs in std, since some “partial borrows” based solution seems best, that might be a good reason against standard library inclusion of such APIs.

An interesting (but even more complex beyond more basic “partial borrows”) idea IMO here would also be so somehow distinguish “shallow” access to data from “deep(-only)” access.

That way, a Vec<Box<T>> (or VecDeque<Box<T>>) could have push even on re-allocation only access the contained Boxes “shallowly”, and any borrows of the deep T (“deep” means it’s behind further indirection, and thus unaffected by simple moves of the containing Box<T>) directly could be allowed to be kept.

Another interesting challenge I’ve thought about and don’t know a good way how to model is append-only accesses.

For examples, a fn push_and_take_mutable_reference(self: &mut Vec<T>, elem: T) -> &mut T style method (and ignoring the reallocations for now) would only mutate the structure part, and then keep a mutable borrow to the values part of the Vec, however, this only borrows sort-of “newly introduced” parts of that values part of the Vec… so in principle one shouldn’t make this in conflict with existing references to values.

However it cannot just be a shared borrow of values either, otherwise a subsequent get(len - 1) after that push_and_take_mutable_reference call would support acquiring an aliasing reference to the mutable reference to the last element.

One thing that your article doesn’t mention: of course some use-cases could also be addressed by an API that makes the collection !Sync and permits mutability to the structure through interior mutability. This is of course a trade-off (in this case of reducing API-complexity, especially if done in a library-only fashion^[1], vs. the ability to do shared accesses thread-safely, e.g. throwing in a rayon parralel iterator traversal or the like).

Your article makes me think that !Sync protected interior-mutability based APIs could also work with a scope-ish model, or rather, simpler, with a fn (&'a mut self) -> HelpfulWrapper<'mut, …> API, so you can have only some section of the code use the otherwise ordinary (and thread-safe) collection in such a manner. (With such “scoped” access, they could even both exist! Both a !Sync interior-mutable structure API and a ValuePerm one.)

Here are a few notes I’ve had while reading:

However, r1 is definitely valid after a push_back operation in a double-ended linked list. The code is correct but can’t pass the safe rust compilation.

Not necessarily; as in Rust, operations that can invalidate references include mutable access, an unfortunate implementation of push_back could access mutably the whole relevant Node that needs to be updated and invalidate r1. (I’ve asked miri, it looks like the current implementation doesn’t, but still, such restrictions can come up in Rust, compared to C++)

inconvinience

(couldn’t help but notice the typo, sorry)

pub fn new_lru_cache<K, V, F>(fun: F)
where
    F: for<'brand> FnOnce(ValuePerm<'brand>, LruCache<'brand, K, V>),

is the issue with dropping the LruCache early supposed to be addressed yet? (Reading futher eventually shows “no”, but maybe you could acknowledge that that issue is still supposed to be ignored, in some half-sentence somewhere in the “Associate LruCache and ValuePerm” section… or maybe by making the forward reference

Fortunately, this issue is resolved in the subsequent design, so let’s temporarily ignore it here.

more precise by saying something like

Fortunately, this issue is resolved in the “scope API” design below, so let’s temporarily ignore it here.

i.e. without any possible future “partial borrows” ↩︎

Vorpal · February 8, 2024, 10:19pm

I don't have a full proof but I'm pretty sure this runs into issues with Rice's theorem. For example if you insert an arbitrary number of elements (based on the output of some complex calculation), how many items from the end are forbidden from access?

You obviously can't answer that statically at compile time. So the list of operations that you can perform around the context of such borrows must be very limited. And I for sure don't have a coherent model in my head of what exactly that limited set of operations should look like, especially in a principled implementable way.

steffahn · February 8, 2024, 10:34pm

I didn’t mean for any analysis based on indexes . The only analysis I was expecting was that references created before call to push_and_take_mutable_reference can stay alive. Like…

// still let’s *ignore* the re-allocation issue here,
// imagine `Vec` used `Box<Item>` items internally, if you want.
let mut v = vec![1];
let r1: &i32 = &v[0];
let r2: &mut i32 = v.push_and_take_mutable_reference(42);
// let r3: &i32 = &v[0]; // not allowed, no reasoning about indexes, and v[1] here would be bad!
use_refs(r1, r2);

the r3 here wouldn’t work, but of course

// still let’s *ignore* the re-allocation issue here,
// imagine `Vec` used `Box<Item>` items internally, if you want.
let mut v = vec![1];
let r1: &i32 = &v[0];
let r2: &i32 = v.push_and_take_immutable_reference(42); // other API, returning only immutable ref
let r3: &i32 = &v[0];
use_refs(r1, r2);

which means there must be some kind of “mutability” to the access that push_and_take_mutable_reference does and that differentiates it from push_and_take_immutable_reference, but no mutability that precludes the let r1: &i32 = &v[0] that was created before from staying valid.

Edit: Ah, perhaps you got the idea of the analysis being more complicated from me not being clear that: the push_and_take_mutable_reference functionality would be implemented involving unsafe code! (Maybe you thought I meant that it could be implemented using push and get_mut(self.len() - 1) and the compiler ought to accept that.)

TennyZhuang · February 9, 2024, 6:17am

Thank you for reminding me. I realize that my implementation is indeed a simulation of partial borrow, breaking down a lifetime into two within the scope. This is a very useful insight.

TennyZhuang · February 9, 2024, 7:42am

This is a very useful insight. I didn't realize before that my implementation was actually simulating the scope API.

This is also something I didn't know before. I mistakenly thought that the Vec in VecDeque referred to allocating multiple segments of Vec. I just finished looking at the underlying implementation of VecDeque. Indeed, this example is wrong. Maybe only LinkedList in std collections wouldn't move elements?

The case is similar to get_many_mut, I don't think it can be achieved at compile time, unless the structure itself is multi-version, and we can only allow the get on a previous snapshot after push_and_take_mutable_reference .

I've tried that, but !Sync types are really too hard to use in a tokio runtime, and tokio is almost the de facto standard for async Rust., so I don't mention it here. It's another problem that how !Sync types can be more usable in async rust.

I'm glad that my article can provide some better ideas for designing Rust APIs.

Thank you very much for these notes. They are very helpful in refining my article, and I will make the revisions ASAP. This can also help more readers (I may submit the article to next TWIR) better understand my ideas.

TennyZhuang · February 9, 2024, 7:47am

I get your idea, I guess we should prevent calling a random get method after the first push_and_take_mutable_reference. May the method can return a handle without random get method, for following operations? I'll think it more later.

TennyZhuang · February 12, 2024, 1:33pm

Find a related discussion. So it's really partial borrow

system · May 12, 2024, 1:34pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Safe Intrusive Collections with Pinning language design	17	5657	March 25, 2019
Static path-dependent types and deferred borrows language design	28	4333	June 20, 2021
Blog post: Indirect ownership, shallow borrow and self-referential data structures language design	6	1594	July 27, 2022
Sharing for a lifetime language design	18	3323	March 25, 2019
Notes + Interviews on Fallible Collection Allocations	53	7829	March 25, 2019

Design safe collection API with compile-time reference stability in Rust

Related topics