[Idea?] Access state for borrowing

Hello all

I am not a rust developer, but I’m an intermediate C++ developer, and read a lot about rust. Looking for opportunity to use rust.

Like every C++ dev, I’m uncomfortable with Rust strict borrowing rules. (Ownership is awesome however in my opinion)

Specifically, I think that multi mutable borrowing should be legal and this is what we always did before rust. By multi mutable borrowing, I mean that an object could be referenced by many mutable references simultaneously. I know what Rust philosophy is, and I understood exactly why Rust avoid this - you can’t be sure that access is safe.

Two examples when it’s happen:

  • Double linked list/tree. I know that the community has found a workaround without using unsafe, but it’s unnatural.
  • Gaming/Systems Component relations - I don’t need to explain it to you very much, but I want to state an example: Amethyst Game Engine use entity component system(ECS), and by this design the engine gained the power to use mutable references simultaneously, because objects(here, entities), are referenced by integer index and not by pointer. This “cheating” is criticized in more detail at Jonathan Blow’s video.

Rust current solution is RefCel, but this is just dynamic safety.

Why therefore low level devs use such design and feal safe? Because in their head/documentations, they claim: Reference A is using object X at this section of the “loop”, and reference B is using object X at other section of the “loop”.

I suggest (assuming it wasn’t suggested yet) a solution, how to put this “thinking” formally at rust: Add an access state concept.

Let “States” be a static user defined enum, and a object “component” which is linked to access state enum “States”.

Each borrowing of “component” should determine(manually or half-automatically?) which access state it wish to borrow.

If one wish to use the mutable/const reference, it must be in the scope of the access state. By access state scope I mean adding syntax/using macros to declare access scopes.

The compiler can easily validate this, and the user gained a more detailed access state transparency.

I don’t know if it might solves double linked list/tree issue somehow, but I’m sure it solves components relations. For example, in game logic you have steps in the game loop - they correspond to access states.

What do you think about this? Any similar idea has been suggested?

1 Like

Well, I have a few things to say, Rust doesn't think in terms of mutable and immutable, those are just the effects of the safe defaults (shared mutation is denied by default). Rust thinks in terms of shared and unique references. But these references fundamentally are not supposed to be able to check every correct program.

Also, you can use Cell as a almost non-existent wrapper around a value to get shared mutation on single threaded programs. It supports Copy and Default types, but you can use types that aren't either (it will just be harder).

How often do you actually need a doubly linked list? Trees are really easy to define in Rust, because they have a clear ownership semantic.

struct Tree {
    left: Option<Box<Tree>>,
    right: Option<Box<Tree>>,
    value: T
}

You could also use indicies,

struct Tree<T> {
    data: Vec<Node<T>>,
}

struct Node<T> {
    value: T,
    right: Option<usize>,
    right: Option<usize>,
}

And this may be more cache friendly.

I would encourage you to watch the talk "Leak Freedom by Default" by Herb Sutter from the 2016 cppcon. In this talk he shows how to achieve leak freedom by using unique_ptr<T> (Box<T> in Rust), and shared_ptr<T> (Arc<T> in Rust, in Rust we also have Rc<T> which has no C++ equivalent), and arenas (indexed vectors!).

So it looks like, even in C++, there are people who think that the way Rust does things are good.

I have seen Jonathan Blow's video about this, but he doesn't represent any majority of programmers. His needs are very specific (why he is building a new language). So I would take his word with a grain of salt. Game Programming is a very different beast than most other domains of programming. Game Programming is bug tolerant and requires every last last microsecond to be squeezed out. So there is no wonder that Rust wouldn't be a perfect fit, the borrow checker would get in the way most of the time.

Not everything can be checked statically. We could also generalize this argument and say that Mutex and RwLock are just dynamic safety, but no one will buy that argument. The reason is, sometimes you need dynamic safety because your problem is too complex to be verified statically.

7 Likes

This is just another way of putting the same component system that Amethyst/specs is using, and a lot more work, just to save one indexing operation.

The particles that specs gives out could just as well have been pointers to the object, which the system upgrades to “real” references. Instead, they chose to use an arbitrary key that is translated by the system. (It happens to be an index/generation pair.)

And in using arbitrary keys rather than pointers, you gain powers: e.g. to kill an entity without having to notify everyone immediately, just when they try to access it. How is this not better than using pointers? You pay a very minor cost to translate the opaque key into the reference, and for it gain the ability to actually destroy entities and recycle their memory automatically.

Additionally, it gives the power to expand the capacity and retain quick iteration over all components of the same type without having to rewrite every pointer.

This is even the exact pattern described by http://gameprogrammingpatterns.com. I don’t see how it’s a bad default.

2 Likes

As I said, I do understand rust philosophy and I really liked it!

My claim is that the compiler shouldn’t be less stricter, but smarter instead in order to allow(safely) good programming that are tend to be dangerous.

The problem exists, there are 2 partial solutions (RefCel or unsafe code).

I suggest a consistent method for the programmer to convince the compiler that he is correct. I think it’s very simple and natural to divide to access scopes which correspond to access states.

1 Like

There is work for this in the form of polonius, which will make the borrow checker allow more correct programs.

How is unsafe a partial solution?

I’ve feared to reference Jonathan Blow’s video, just because this misunderstanding:

I know why it’s good to use integers instead of pointers. I know what data oriented design is and why its cache hits are good. I know what is the advantages of entity component system, I used Unity too. (best similar engine that I know)

I think personally that Amethyst is good.

I referenced Jonathan Blow’s video in order to show that Amethyst ignores “safely” the strict borrow checking of rust. I wanted to prove the important need of my suggestion.

Buggy code can be perfectly memory safe. Rust's first safety concern is memory safety, not verifying that your code is bug free. As I said before, not everything can be verified, and this is why we have unsafe.

Could you please provide an example of how your idea will be used.

1 Like

This is a falsehood. Amethyst's ECS system (specs) fully obeys Rust's borrowing rules, and that's why it's safe.

The rule of Rust is "any live mutable borrow is a unique borrow (mumble mumble pinning makes this more complicated unfortunately)".

The particles that the ECS uses to key an entity or component are not references, thus they have no borrowing requirements, because they don't borrow anything.

The components remain owned by the central store, which then loans out borrows keyed by the particles. These references are then statically guaranteed to follow Rust's mutability xor aliasing rules.

In what way is this different from your "states"? It really feels like specs is the library solution to what you're getting at. The particles cannot mean anything without the blessing of the store.

Note also that having &mut Store in scope isn't enough to validate particles, either. To activate the particles they have to create a new borrow from the store, which prevents use of the store while the particle is live. This means any language solution can't be any better than the library solution, since you have to reborrow the store anyway.

I'm a full-time C++ developer, and my first reaction to Rust was "Yes! All this 'ownership' stuff that's pure tribal knowledge in C++ (and the reason you are a danger to yourself and others if you haven't read Effective C++ yet) is actually built into the language! Why did no one do this sooner!?"

@RustyYato and @CAD97 seem to have covered all the misinformation in the top half of the post already, so on to what I think you actually wanted to say...

My main thought is that it's extremely hard to tell what you're suggesting here. In particular, parts of this only make sense if you're proposing a dynamic/run-time solution, and other parts only make sense if you're proposing a compile-time check.

The talk of "each borrow" having a different "access state" and "scope" sounds like it's proposing a compile-time check not dissimilar from the current borrow checker or the new NLL borrow checker. @RustyYato mentioned "polonius" earlier; that's the codename for yet another evolution of the borrow checker that isn't production-ready yet. If you're actually trying to propose a concrete improvement to the borrow checker, it's extremely likely that whatever you have in mind is already covered by NLL or polonius, or that you need to read about them before you can articulate your idea comprehensibly.

Likewise, the talk of a "user-defined State enum" sounds like a dynamic/run-time solution similar to Cell and RefCell, which is especially puzzling since you imply you're trying to describe an alternative to RefCell. I think this part is just confused.

The problem with doubly-linked lists isn't really about multiple mutable borrows, it's more about borrow circles / self-borrowing (and the other posts already explain how it's not relevant to ECS at all).

Self-borrowing is a genuinely hard problem in Rust, but part of the reason Rust's model works so well in practice is that it is very, very rare to actually need self-borrowing. The general experience of the Rust community so far seems to confirm that the benefits of being able to guarantee basic things like &T always refers to a valid T value, no exceptions, vastly outweigh the occasional need for genuine self-borrowing.

It is true that today, self-borrowing usually ends up with either dynamic checks, or unsafe code. But it's important to remember that unsafe Rust is basically the same as C++ in terms of its safety guarantees, so the claim that implementing a doubly-linked list with unsafe code is a "workaround", i.e. the implication that C++ has an advantage over Rust here, is simply a misunderstanding of what C++ has always been. AFAIK, both languages have to write essentially the same code with all of the same raw pointer manipulations to make their linked list implementations work.

Which is not to say that we're not interested in enabling self-borrowing use cases in safe Rust. It's just that C++ has never even claimed to provide anything close to a compile-time guarantee of memory-safety for self-referencing/pointing user-defined types. Any self-borrowing use cases that we could enable in safe Rust would be genuinely novel safety features that AFAIK no language has ever had before. In fact, I believe the Pin APIs are just such a novel feature.

9 Likes

@Tal can you "tl;dw" the point in this video? It's an hour long and after watching about 10 minutes of it which included him futzing around with his webcam a bunch, the most I managed to get was that he agreed with most of the ideas in the original RustConf talk and claims the approach it describes is how most games work today, but that he had a point that's "very subtle" which I was too impatient to wait for him to get around to. I clicked around trying to find his point and only found extraneous rambling.

5 Likes

He basically said that arenas are just a fancy allocator and that indicies into the arena are just fancy pointers. Because they are fancy, Rust doesn’t understand that it is “just” an allocator, and so it can’t help protect againts safety issues. There were some other points, but this was his main issue.

Disclaimer: I watched this video a while ago, so I may be misremembering his main point

2 Likes

Interesting. That feels like it’s talking past any of the actual claims of Rust though.

Has there ever been a programming language capable of providing a compile-time memory-safety guarantee for non-trivial memory allocator code? I always assumed that was impossible in practice because allocator code must talk to the OS today’s major OSes are not designed to support such a thing (maybe Redox could do it?).

If the argument Blow’s trying to make is something like “Rust’s safety guarantees don’t apply to video games because all video games need custom allocators”, then I think this was probably a reasonable hypothesis at one point, but has since been effectively disproven by Chucklefish’s experience report, assuming I am correctly remembering the very unambiguous wording about having near-constant UB issues with C++ even with extremely high C++ expertise, and basically none of that in Rust.

Unfortunately, I can’t double-check my memory because… https://www.rust-lang.org/static/pdfs/Rust-Chucklefish-Whitepaper.pdf 404s! Where does one report this sort of thing?

3 Likes

Here is some abstract empty code to demonstrate what I think that should be legal. This not fix self-borrowing, but it fix some sort of multi mutable borrowing:

struct A {

}

struct B<'a> {
  [access_state(B)]
  a: &'a mut A
}

struct C<'a> {
  [access_state(C)]
  a: &'a mut A
}

fn main() {
    let mut _a = A{};

    let _b = B{a:&mut _a};
    let _c = B{a:&mut _a};

    loop {
        access B {
            // Do something with _b
        }

        access C {
            // Do something with _c
        }
    }
}

[access_state(*)] macro tell the compiler which borrow state it wish to borrow.

access * { *** } is the scope that permits using the borrowed references only for * access state references. This is what I meant by saying “step in the loop” in the original post.

Again, this doesn’t overcome self-borrowing, but does allow the user to share object(“component”) at many places, safely.

1 Like

What happens when you nest access blocks?

access B {
    access C {
        
    }
}

Does this automatically cause an error, or do you need to use the reference to error?

Can you use multiple access_state attributes on the same field?

Ah, now we’re getting somewhere. This is much closer to what we sometimes call “partial borrowing” or “disjoint borrows”, which does actually exist today in some simple forms like this:

struct A { x: i32, y: i32 }

fn main() {
    let mut a = A { x: 0, y: 1 };

    // the compiler can tell x and y are "disjoint" parts of a
    let x = &mut a.x;
    let y = &mut a.y;

    while *x < 10 {
        // so these &mut borrows don't interfere with each other
        *x += 1;
        println!("{}", x);
        *y += 1;
        println!("{}", y);
    }
}

I believe something similar is done to closure captures.

It’s probably possible to look into extending this sort of reasoning to more complicated cases, but we still need a clear example of something that’s not permitted today yet would make sense to permit and cannot be more easily done some other way.

Do you have a concrete example where these B and C accesses are not fields on a struct? (as discussed above, I don’t think an ECS would apply here)

3 Likes

It remains unclear whether the OP is talking about borrows disjoint in “space” (e.g., using different parts of a structure) or statically disjoint in “time”. If it is the former, @Ixrec has covered it, so I will address the latter.

It is possible to have a shared handle to the same “element”, and be able to temporarily “upgrade” one into a unique and thus mutable reference, while “freezing” the other handle. One implementation of this is &RefCell<T> (and more generally, any shared reference to a wrapper offering interior mutability), but it is true that some patterns do not need runtime checks (to promote the handles) and should be able to thus be statically checked; qcell is a crate offering such pattern, by moving the unique / shared problem to an external element. Borrowing such element kind of creates the scopes you were talking about.

Personally, I haven’t found a case where I would need such pattern, since instead of having a “remote owner” I could simply refactor the code and make it “non-remote”.

  • In your code example, for instance, making B and C not have a reference to A, but taking one within some method, would make the pattern also work, no shenanigans required.

The only time where refactoring becomes near-impossible is with self-borrows; these very often require unsafe, given how hard the invariants such self-borrowing structures rely on are difficult to express at compile-time.

3 Likes

As for the example - It happens all the time, in classical OOP games, aka game graphs. It happens not just in games, the componenets interactions happens too much in any system programming, the top example is operating systems.

I can’t give you right away an example because everything that I’m familiar with contains by its nature self borrowing references.

When I said ECS doesn’t really solve the problem and cheat, I meant that rust philosophy isn’t just “memory and thread safety”, it catch by its nature logical problems. I said that if one wish to implement a component system without ECS, say OS for example, he will have to get too much unsafe code, or at least RefCell, which isn’t perfect.

I gave an idea how to replace RefCell by a static approach. Even though it doesn’t solve self borrowing, it’s still usefull.

In addition, all those algorithm you mentioned here are very powerful indeed, but I don’t think it’s natural and ideal to let them decide what code is valid and what isn’t. I think that like rust current syntax, the programmer should write it’s intentions explicitly, i.e. give the compiler(and the rest of humanity that maintain the code) a “proof” by natural syntax. Therefore, I explicitly mentioned here the “access state” syntax. Maybe it can somehow be extended to self borrowing verification, don’t know how to do this properly.

I suspect this is the sticking point. It seems like, in practice, the kind of code that would benefit from these annotations so much that it's theoretically worth adding them as a core language feature is probably also code that needs a significant amount of self-borrowing or, like an OS, needs a ton of unsafe code for completely unrelated reasons.

5 Likes

Worth mentioning that apparently this was the dominant view pre-NLL, and the NLL RFC talks about this:

Part of the reason that Rust currently uses lexical scopes to determine lifetimes is that it was thought that they would be simpler for users to reason about. Time and experience have not borne this hypothesis out: for many users, the fact that borrows are "artificially" extended to the end of the block is more surprising than not. Furthermore, most users have a pretty intuitive understanding of control flow (which makes sense: you have to, in order to understand what your program will do).

5 Likes

Rust's rules are a lot more specific than this, but the defaults and naming confuse a lot of people when they start looking into the details.

As @RustyYato mentioned, &T and &mut T are about "shared" vs "unique," not "immutable" vs "mutable." This has far deeper implications than perhaps you realize:

  • A &mut T is, by virtue of being a unique reference, allowed to change the "shape" of an object. For example, it can change an enum variant, reallocate a vector or string, or deallocate a box.
  • These kinds of operations can invalidate other references into the object, causing them to dangle. This makes your access states proposal insufficient- it would also need to ensure that a given state does not invalidate the references used by another state.
  • Even if you solve that problem, intuitive reasoning about &mut T is still insufficient, because &mut T enables a lot of optimizations that can be invalidated in more subtle ways. It's treated much like a C restrict pointer.

RefCell gets way more attention than it deserves in these kinds of discussions, because it's not a fundamental part of the Rust model. It is just a simplistic API that tries to push you back into the world of &T/&mut T as quickly as possible.

Coming from C++, what you really want is &Cell<T>. This is a T that can be mutated directly via shared references, with no dynamic checking or conversion back to &mut T. This also has some subtle implications:

  • Because unrestricted mutation à la &mut T can change an object's "shape," thus invalidating references to its interior, &Cell<T> by default forbids forming those references to begin with, and only allows the entire object to be overwritten.
  • Some objects never change shape- arrays and structs always have the same sub-objects in the same place no matter how you overwrite them. This means it is safe to go from a &Cell<[Element]> to a &Cell<Element>, or from a &Cell<Struct> to a &Cell<Field>, for example.
  • The real tool behind both Cell<T> and RefCell<T> is UnsafeCell<T>- a T that the optimizer knows might be aliased, backing out of the optimizations mentioned above. You can build your own safe shared mutability tools on top.

Rust is still a little immature in this area. The &Cell<[Element]> to &Cell<Element> conversion is extremely recent, and the type system has no way to express &Cell<Struct> to &Cell<Field>. It's all there in the model, though, so you can wrap up an unsafe implementation in a safe API and know it will work.

Given the tools above, this is more of a problem of ownership and lifetimes than mutability. The solution is just like any other container- you need to view the whole data structure together, and treat all its nodes as part of a single thing, rather than trying to use &'a T for some impossible 'a.

Rust basically gives you no tools for implementing this safely. You just need to bite the bullet and do it with unsafe, then wrap it in a safe API. A language that did provide a safe way to do this would be far more complicated, probably requiring some general theorem proving features. Punting on that while still enabling safe wrapper APIs is still a huge gain over C++.

11 Likes