PRs removing RefCells, longer term plans

My speculation went a bit off track, but what I want to emphasize is that there are other abstractions which permit safe mutation through shared references but cannot fail at runtime, e.g. append-only vectors, insert-only hash tables, etc. This would give us extra flexibility in removing uses of RefCell without plumbing around mutable references or splitting structures.

In cases where RefCell is used gratuitously and we just need to pass some context structure by mutable reference instead, I think itā€™s a clear win to do so. Generally I think @pythonesque has the right idea, I just want to make sure weā€™re exploring other options in the design space. I donā€™t think Cell and RefCell are the last word in single-threaded interior mutability.

2 Likes

I havenā€™t had the time to look through the PRs in detail, but in general Iā€™m in favor of moving away from RefCell and towards &mut where possible. Letā€™s use the static guarantees that the language provides us with.

I have this notion (which hasnā€™t yet made too much corrosive contact with reality) that it should be possible to structure things into passes that build up/aggregate some kind of information data-structure (which gets threaded through as an &mut or is a return value) while the information objects from previous passes are threaded through as shared references:

fn compile() {
    let ast = parse();
    let resolve_map = resolve(&ast);
    let polytypes = collect(&ast, &resolve_map);

    ...
}

Working data needed by the pass would then be passed through in an &mut context object. This is especially viable for passes with many small, independent work packages, like type-inference and function translation. I think thatā€™s pretty much the direction the discussed PRs start off into.

For things that need to be shared and mutated by multiple threads (e.g. monomorphized functions impls) this wonā€™t work but I think the idea of strictly growing data-structures, as has been discussed above, will be a good strategy in most cases. We effectively have this already for many things like interned ty::Tys and many of the caches being built up throughout.

Iā€™ve just looked through the trans PR a bit and one thing I find a bit troublesome is that before Cells and RefCells clearly marked things that could change and one could rely on everything else to not change. Now there is no more distinction between ā€œmutableā€ and ā€œimmutableā€ fields anymore (this is one of the cases where Iā€™d like to have const fields that canā€™t be mutated after initialization even through a mutable reference). I guess itā€™s a trade-off between having to remember which fields not to mutate and having to fear runtime errors. Iā€™m surprised to say that, from a documentation point of view, &FunctionContext + Cell/RefCell might be better than &mut FunctionContext. It could be worked around some by making fields private and only providing getters though. And I donā€™t suspect this to be a big problem as long as passes and their context objects are focused enough and donā€™t contain too much stuff that isnā€™t actually needed.

Yes, sometimes I wish that we required you to mark fields as mut in order to change them directly (even though, of course, one could change them by reassigning the struct entirely). I guess being able to mark fields as const would move back in that direction in a compatible way. Usually I don't find this problematic because most state isn't shared that widely; even here, grep'ing through trans isn't hard, but of course it's nicer to see it at your fingertips.

Some time ago I put this style into practice (at small scale) in the variance inference. I found it worked ok, but I'm not sure how it would scale up to bigger things. I think it might work better with either struct extension or perhaps just abusing Deref -- it gets annoying to remember that, in order to access (e.g.) the AST, you need to do polytypes.resolve_map.ast.nodes[3] vs just cx.ast_nodes[3].

Oh, I absolutely agree with this. I've been wondering about monontonic, non-moving data structures for a while. But I've yet to experiment with building one. In my experiments in that direction, I found that perhaps a persistent data structure would work better -- in particular I was experimenting with different versions of persistent that were oriented at short-lived snapshots and a single master copy, trying to reduce the overhead of inserting data and so forth in that scenario.

The advantages of a persistent data structure is that it shares many of the same properties, but it is able to reorganize and reallocate its internal structure freely. Maybe I wasn't thinking hard enough, but all the ideas I had for monotonic, non-moving hashtables and the like seemed to have poor lookup performance over time. And of course if you have a persistent data-structure, you can wrap it in an Rc (or Arc) and trivially extract and clone it, and then the user gets a nice snapshot.

Of course, if possible, I find IVars even more appealing: that is, leave a hole for the data, create a standard hashmap or set or whatever sequentially, and then plug it in. But I've found that in practice most of the maps DO wind up being mutated in later stages, though often it's only during cross-crate inlining. If we moved away from using global maps, this would be much less of an issue.

Yesterday I started, but didn't have time to get very far with (story of my life these days, it seems like), going through the data structures and trying to document and list up where they are modified from. I want to sketch out in more detail what I am talking about with regard to per-fn data structures, as well as identifying problematic cases in advance.

1 Like

Hm... now I have some interesting ideas for how we might go about this based on MVCC databases, particularly how they handle things like lock elision, update chaining, and tombstoning. I'd never considered applying such ideas in a single-threaded context, but now that I am thinking about it I think it could both be done and made way simpler :slight_smile: In particular, imagine using stuff like versioned transactions and value locking to get strong safety and correctness guarantees with very little extra overhead, using lifetimes for correctness only at a very coarse level. I realize this is pretty abstract, but I think together with HKTs you could effectively turn any strict collection into a persistent one without losing much performance (maybe even making the performance losses entirely opt-in). Let me play around with it...

Yes, overwriting the whole struct is always possible. However, my motivation for wanting to have const fields was always more a question of keeping a composite value internally consistent and not of preventing some memory slot from being mutated. But that's just a side remark and not really important to this discussion.

Do you have an example to look at where Deref is used in such a way?

rustc typically uses getter methods (e.g. every instance of fn tcx)

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.