Zero-cost interior mutability - proposal

Vorpal · November 4, 2024, 7:30pm

Even limited global analysis risks slowing down build times. They are already not great with Rust. Similar to C++ before modules, and I have heard from colleagues that with modules build times in C++ have actually improved.
Is it? I think I have had to use Rc once, RefCell never, and Cell once. I have used Arc a few more times and Mutex a few times (though never together, my Arcs could all be readonly, just with complicated life times). So this is likely very domain specific. Usually you get better performance (for the common threaded case) by resturcturing your code to avoid sharing. And cleaner code by avoiding interior mutability in the single threaded case.

John_Nagle · November 4, 2024, 7:36pm

The "slightly less restrictive rules" section is easily implementable. That's a single-function test. Less restrictive than that, and more analysis is required.

SkiFire13 · November 4, 2024, 8:38pm

That seems to have the same interprecedural issues though.

John_Nagle · November 4, 2024, 8:57pm

Not really. The thing you can't do is hold a .borrow() or .borrow_mut() and, while holding it, call something else that does a conflicting .borrow() or .borrow_mut(). SafeCell is a zero-cost abstraction, so there's no reason to hold a .borrow() or a .borrow_mut() for very long.

Don't do

  let obj = obj_link.borrow_mut();
  // ... lots of stuff including calls that do other manipulations.

Keep the borrows local and it's hard to have a clash.

theme · November 5, 2024, 3:07am

How do you check whether that something else might call a conflicting borrow() or borrow_mut()?

SkiFire13 · November 5, 2024, 6:52am

Yeah, that's the source of the I was talking about.

I don't see how these two facts are correlated. If anything, people generally love to keep their borrows for longer than necessary.

That's a guideline, but there's nothing preventing people from not doing so.

John_Nagle · November 5, 2024, 7:20am

It's a proposed compile time check, covered above.

theme · November 5, 2024, 8:31am

How would such a compile-time check work?

afetisov · November 5, 2024, 12:57pm

If I understand correctly, those signatures are entirely bogus. borrow_mut, and likely take, actually work with &self rather than &mut self. That's bad, regardless of whether we agree to do global reasoning and post-mono errors. If something takes a &mut, it should work like a &mut. Both for correctness, legibility and learnability reasons.

Maybe SafeCell requires a better notation, or maybe the method implementations should call some magic upgrade_safecell intrinsics, but in any case the user-facing function signatures should be accurate.

Are you proposing to add TBAA? It has a poor track record in C++, and Rust specifically avoids any kind of typed memory in its memory model. In the presence of unsafe code and transmutes (which can happen via pointer casts in faraway code), your API looks unsound. That is, unless we declare that transmuting the contents of SafeCell is, at least in most cases, UB. But that goes directly against current assumptions of unsafe code, and planned language evolution.

TBAA UB is hard to avoid in the presence of low-level memory manipulation (though perhaps in Rust it would be easier, since it would affect one specific deliberately chosen type rather than whole program), and its existence goes counter to the goal of allowing arbitrary low-level operations.

It would also make unsafe generic code unsound, since the type could always contain a SafeCell. That means we'd need a new auto trait similar to Freeze, which would exclude any SafeCell from the type. I expect it to take forever to be stabilized, for the same reasons that other new auto traits are a pain to add.

Do I understand correctly that the intended goal is to support self-referential types? In that case, can you provide a reasonably complete mock example, which would show how something like a double-linked list could be implemented via SafeCell?

binarycat · November 8, 2024, 9:37pm

it seems like what this proposal actually is concerned with is not just "safe interior mutability", but cyclic references.

the name is also confusing, SafeCell sounds like a safe version of UnsafeCell, but we already have that, it's called Cell.

RefCell is also safe, so a name implying that it is checked via global reasoning would be much more helpful IMO.

John_Nagle · November 8, 2024, 10:09pm

I don't have any strong preference on naming. Suggestions?

Yes, this is really about circular references. The RefCell/SafeCell part of the problem looks do-able. It has to mesh well with a similar compile-time solution for Rc and Weak.

The general idea is to replicate the common Rust idiom where forward references are Rc, but backwards references are Rc::Weak, upgraded locally as needed. If all the checking for that can be done at compile time, we have safe no-cost circular references.

What's thinking about StaticRc? It seems to be going in the right direction. Is it sound if no unsafe functions are used? It's complicated, and seems to need more explanation.

scottmcm · November 8, 2024, 11:17pm

If you're looking at an inter-procedural analysis to determine things like this, what about trying to put such a check in a proof tool instead? They're already looking at ways of declaring invariants and running checkers to prove you don't invalidate them, and in many ways this kind of "RefCell doesn't ever panic on me" sounds very similar to the "this unwrap doesn't ever panic on me" kinds of things that is, AFAIK, already in the domain of what they want to accomplish.

John_Nagle · November 8, 2024, 11:23pm

Speaking as one of the first people to build a usable verification system, full-scale proof tools are too hard to use, require too much user annotation, and require too much abstract thinking for most programmers.

Rust's borrow checker is a simple kind of proof tool. Here, I'm discussing something at roughly the same level.

Compilers are much better than humans at detecting that some code here is inconsistent with other code way over there. This is that kind of check.

SkiFire13 · November 9, 2024, 12:40am

I think the point was: couldn't you first prototype this in some proof tool to explore its feasibility? It doesn't need to be in a full-scale proof tool, like you mentioned it could be a simplier one, similar to the borrow checker.

John_Nagle · November 9, 2024, 5:39am

That's a legit question.

Can we get enough benefit from some simple global analysis to justify implementing such machinery at compile time? Set membership transitive closure up the call tree isn't that hard, and it's enough for this SafeCell concept. But it's too much machinery for just that.

What other problems can we solve with such a global mechanism? Possibilities include:

Compile time Rc without too much annotation.
Detecting mutex self-deadlock due to nested locks from the same thread.

The goal here is to reduce or eliminate the temptation to use unsafe merely because someone can't make their data structure inter-references work. That's a worthy goal, in keeping with Rust's design principles. Much rejection of Rust by C++ programmers comes from what seem to be unnecessary restrictions on references. Often, those restrictions are unnecessary, but Rust doesn't know how to check that at compile time.

DARPA's TRACTOR project is trying to do automatic translation of existing C code to safe Rust code. This is going to result in Rust code with C-type reference patterns, and thus probably heavy use of Rc and RefCell. See section Q3.1 of the TRACTOR FAQ linked above:

Q3.1: Is it acceptable to use Rust’s smart pointer types, like Rc and RefCell?

A3.1: These pointer types, and others like them in the Rust standard library, are all acceptable in code that may be synthesized. Proposers should pay attention to the program metrics with respect to CPU performance, memory usage, and idiomatic code.

So it's worth thinking about how to deal with Rc-heavy code. Might even be fundable.

parasyte · November 11, 2024, 2:03am

Solving the interprocedural conflicts problem with borrowing would be a pretty big one, even in a limited form only within a compilation unit. (A general solution is still likely to require syntax to avoid semver hazards.) This subject comes up quite often, and the most recent thread I'm aware of was Idea: Experimental attribute based support for partial borrows

Speaking from my personal experience, reference counting in C was largely ad hoc and intrusive within the data structures that needed it. It was more common to abstract reference counted wrappers in C++. Is it just me?

Take libevent as an example, just the first library I thought of. It has reference counts in at least two structs:

They don't even use the same field name.

It's awesome that TRACTOR wants to automate some C-to-Rust translation, but how would a tool realistically detect that these could potentially be extrapolated to Rc<SearchState> and Rc<BufferPrivate>? Are the semantics even equivalent for this transformation?

John_Nagle · November 11, 2024, 4:51am

Interprocedural conflicts with borrowing, as discussed, can usually be worked around by introducing an additional struct. That may be slightly annoying to write, but is easy to understand. Having to pass multiple fields as separate mutable references is also annoying, but, again, not too hard to understand. There's enough expressive power in Rust to handle those cases.

I'm more concerned with common C/C++ reference idioms that can't be expressed directly in Rust. This often comes up when someone tries to do something that looks like inheritance in Rust. It's possible to get past such problems with Rc, RefCell, and Rc::Weak, but that just defers the problem to run time, and confuses who really owns what.

What you actually want, most of the time, is A owns B, while C and D also use B. Then you need a guarantee that C and D have dropped all references to B before A releases ownership of B. You usually don't want C or D keeping B alive after A is done with it, which is what Rc will do. At that point you usually have a memory leak due to circular references.

parasyte · November 11, 2024, 5:51am

I'm aware of the workarounds for interprocedural conflicts, and so are the people who want a solution for it. You asked what else the global analysis could be used for.

You probably already know all of this, but that's exactly what lifetimes do. So long as you can prove that the drop happens in C and D before A, it will Just Work. Rc doesn't have the proof burden specifically because it can keep B alive: the lifetime is 'static. Arenas are a common way to provide that proof with cyclic references without ref counting. Everything in the arena is dropped with it, no memory leaks. Since you're throwing out workarounds, there's one for you.

I think the global analysis idea is interesting because it can solve some real problems.

ryanavella · November 12, 2024, 12:00am

I haven't seen you mention it yet, but a lot of these shared-mutability C idioms can be simulated with &Cell<T> where T: Copy or T: Default (assuming the Default impl is cheap)

Even if T is not Copy or Default you can still go far with just the Cell::replace method.

use std::cell::Cell;
use std::num::NonZeroU8;

// Note: `(NonZeroU8, String)` is neither `Copy` nor `Default`
pub fn f(cell: &Cell<(NonZeroU8, String)>) {
    const PLACEHOLDER: (NonZeroU8, String) = (NonZeroU8::MIN, String::new());

    // Take an owned value out of the Cell
    let mut x = cell.replace(PLACEHOLDER);

    // Mutate the value
    x.0 = x.0.saturating_add(1);
    x.1.push('0');

    // Stick the value back into the Cell
    cell.set(x);
}

All of the above methods have practically no runtime overhead. In the worst case scenario it will just pessimize alias-based optimizations, but that would have been equally true for equivalent C/C++.

~~If you are okay with a small amount of runtime overhead you can use &Cell<Option<T>> to access the methods of Cell<T> where T: Default. The runtime cost is just a branch prediction~~ (EDIT: &Cell<Option<T>> probably doesn't have a significant advantage over &RefCell<T>, it is worth profiling but probably overkill)

scottmcm · November 12, 2024, 1:32am

You can couple this with temporaries & deref for a fun way of doing it, too.

I tossed together a sketch of it here, for example:

github.com/scottmcm/pawn-rs

src/lib.rs

73ddbe1bf


      
          #[test]
          fn it_works() {
              let v = Cell::new(vec![1]);
              v.pawn().push(2);
              {
                  let mut t = v.pawn();
                  t.push(3);
                  t.push(4);
              }
              assert_eq!(v.into_inner(), [1, 2, 3, 4]);
          }
          
          #[test]
          fn pawn_twice() {
              let v = Cell::new(vec![1]);
              {
                  let mut a = v.pawn();
                  assert_eq!(a[..], [1]);
          
                  // You can pawn something again without failing,

This file has been truncated. show original

Topic		Replies	Views
Exploring Interior Mutability, Auto Traits and Side Effects language design	14	2698	February 24, 2021
impl<T> Clone for Cell<Rc<T>> libs	25	3641	March 25, 2019
PRs removing RefCells, longer term plans compiler	29	5882	March 25, 2019
Sharing for a lifetime language design	18	3404	March 25, 2019
Adding shared mutability, while still maintaining memory safety language design	14	760	November 29, 2024

Zero-cost interior mutability - proposal

Related topics