PRs removing RefCells, longer term plans

So @pythonesque has opened two heroic PRs trying to remove RefCells from various places:

There has been a certain amount of mixed feedback on those PRs. I wanted to open this thread to discuss the two in general and decide whether this is the right approach. I’m on the fence myself.

From what I see, the advantages are primarily less RefCell noise and overhead, and perhaps more clarity about what gets mutated where. But the cost is that we wind up splitting the data structures quite a bit and threading a lot more context around explicitly.

This impacts long-term design plans to some extent. For example, RefCell is somewhat anti-parallel, but in most of these cases I think it’s a non-issue. This is because the RefCells are specific to a fn item, and I envision parallelization occurring only between functions.

I’d also like to start off some more discussion on what the right data structures are, but I’m short of time, so I’ll content myself with saying that I think everyone agrees the current set of hashmaps is not ideal. I’d like to see those maps become per-fn, to start, and I think we can often just remove the maps altogether in favor of structs with ivars, much as @Aatch did in his PR.

Anyway, the main thing is to decide about the PRs in question for the moment (and it may be that we want one and not the other). Thoughts?

ps @pythonesque, to clarify: if you think I did a bad job selling the PRs, I’d love to see a better one. :slight_smile: I’ve been wanting to write this post for a while but wound up with only a few minutes to do it in, so I figured it was better to get started with something hastily written than wait for something more deeply researched.

The biggest reason I want to get rid of a lot of the RefCells is for clarity (though performance is a nice bonus). For the most part, the RefCells aren’t necessary anyway–there are usually just a couple of contexts to pass around and nearly all the borrows are extremely short (presumably due to fears of ICEs). That is, I think most of the disadvantages you are referring to (threading structures through, etc.) are already effectively happening, it’s just not being reflected in the type signatures.

The reason I submitted these patches was that I was having trouble figuring out what was going on due to the pervasive RefCell usage. For someone who hasn’t worked with the codebase much, like myself, it is really hard to tell what is safe to modify. Even if the total number of RefCell-related ICEs these days is low, I suspect the price of that is that many parts of the compiler have to operate in very granular phases (with mutability during each phase restricted to a couple of variables). While having distinct phases is certainly not a bad thing, it feels to me like it’s very difficult to change the phases because everything has to fit just so. Again, I haven’t worked with the code as much as you, so I could be wrong about this, but my impression is that this has resulted in many “logical” passes having their code scattered all over the compiler, since they have to do a bit of work in each context (before the variables become sacrosanct).

The other problem I have is that RefCell ends up being insidiously viral: people end up using it solely because, the way the code is written, mutable lifetimes can’t possibly work. By adopting conservative lifetimes (plain &mut), it becomes a lot easier to refactor, as well as to see whether something is actually being used. One example is the Block arena in trans; it turned out we didn’t need the fcx in the blocks, and as eddyb pointed out we may not even need the arena at all. In typeck, there was a variable in Inherited that wasn’t even in use. There are a lot of other little examples of things like that.
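
To make the contrast concrete, here is a contrived before/after sketch (names made up, not the actual rustc types):

```rust
use std::cell::RefCell;

// Before: with a RefCell field, nothing in the signature says whether the
// tables get mutated, and a field that is never really needed is easy to miss.
struct CtxtBefore {
    tables: RefCell<Vec<u32>>,
}

fn check_before(cx: &CtxtBefore) {
    cx.tables.borrow_mut().push(0); // mutation hidden behind a plain `&`
}

// After: with plain `&mut`, the mutation is visible in the types, and a field
// that is never mutably borrowed anywhere stands out immediately.
struct CtxtAfter {
    tables: Vec<u32>,
}

fn check_after(cx: &mut CtxtAfter) {
    cx.tables.push(0);
}
```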

I am not familiar enough with the compiler internals to address the “right data structure” question definitively; however, my experience has been that a combination of Cells and arenas is really effective for representing self-referential data structures, which seems to be what you’re referring to (though I wasn’t familiar with the term ivar). However, I don’t think it’s enough by itself. For rustc, I suspect at any time you can divide a context into roughly two parts: one with clear ownership and mutable collections where you are building up data (often temporary data), and a longer-lived context of mostly-immutable data that is pointer spaghetti. That is a pattern that seems to come up pretty often for me in other Rust projects, and it seems correct conceptually as well.

I agree with many of the issues people are raising (re: ugliness), but I also think it is, for the most part, correctable. A lot of the ugliness is because I am not trying to remove every RefCell in the compiler at once, which is why we have to carry around several contexts. From experience, though, if you just pass around a mutable context from the get-go (particularly if you are writing methods on the context in impl blocks), a lot of these problems go away.

The remaining issues are, IMO, not really fundamental problems with the approach; they’re indicative of missing syntax (that I’ve wanted before). Specifically: Rust doesn’t provide any convenient sugar for temporary groups of references. The behavior you want is for the mutable references in the group to be reborrowed on each call, each with a distinct lifetime, without having to write any of the outer lifetimes out. You get basically that behavior if you consistently pass the references in as separate parameters, but it’s verbose (and also kind of poor as an abstraction). Writing methods in impl blocks seems like it should be the least verbose option, but it makes it really annoying to split up the groups. If you bundle the references directly into a structure, you have to include explicit lifetime parameters, and the whole group will be moved every time you call a function (assuming one of the references is &mut); if you try to address that by passing the group itself by &mut reference, and the references have different lifetimes (likely if one of them is immutable), you have to keep deconstructing and reconstructing the group every time you want to pass it to a new function. Such sugar would be useful for a lot more projects than just rustc and (I think) would address most of the ugliness concerns; but I might be being overoptimistic about its effectiveness.
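
Here is a rough sketch of the bundling problem (again with made-up names):

```rust
// A struct bundling an `&mut` together with an `&` needs both lifetimes
// spelled out, and the group is *moved*, not reborrowed, at each call site.
struct Ctxt<'a, 'tcx> {
    tables: &'a mut Vec<u32>, // stand-in for a per-fn mutable table
    tcx: &'tcx str,           // stand-in for the long-lived immutable context
}

fn pass(cx: Ctxt) {
    cx.tables.push(cx.tcx.len() as u32);
}

fn caller(cx: Ctxt) {
    pass(cx);
    // pass(cx); // ERROR: `cx` was moved by the previous call.
    //
    // To call again we would have to rebuild the group field by field
    // *before* the move:
    //     pass(Ctxt { tables: &mut *cx.tables, tcx: cx.tcx });
    //     pass(cx);
    // whereas separate `&mut`/`&` parameters would simply be reborrowed
    // implicitly at every call.
}

fn main() {
    let mut tables = Vec::new();
    caller(Ctxt { tables: &mut tables, tcx: "example" });
}
```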


And also–since I feel like it was more of an afterthought before–I do want to stress that I feel that the single best reason to get rid of the RefCells is that they are useless. Thus far, nearly all of the places we actually needed shared mutability either could have been or was already in a Cell. The vast majority of the others follow strict stack discipline, so for the most part, we’re just inviting potential future crashes and hurting performance by keeping them around.

I’m not a huge fan of RefCell. Here are my gripes:

  • RefCell does an end run around the usual static guarantees of Rust. It feels like an indictment of Rust when the compiler pervasively subverts the spirit of its own type system.
  • I’ve run into borrow conflict panics in rustc “in the wild.” I’ve also introduced a few of my own. They’re generally easy to debug, but not finding them until runtime makes our long build times even less forgivable.
  • Dynamic borrows introduce an invisible “action-at-a-distance” between disparate areas of code. This makes local reasoning difficult.
  • This is compounded by borrows being relinquished implicitly when a guard goes out of scope. It’s very easy to not notice that you are holding a borrow longer than you intended (see the sketch just after this list). The old (unsound) scoped thread API harbored a similar pitfall, which was a factor in abandoning attempts to salvage it.
  • As a corollary, this introduces a stumbling block for new contributors. If you experiment with making changes to one subsystem, you might induce confusing failures in unrelated code which you then have to decipher. Of course, RefCell is hardly the only way you can add excessive coupling to your program architecture, but it’s one that’s idiosyncratic to Rust.
  • The resulting cascade of angle brackets in types gives me flashbacks to Scheme, only pointier.
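
Here is a minimal sketch of the failure mode (contrived, but representative):

```rust
use std::cell::RefCell;

fn main() {
    let table = RefCell::new(vec![1, 2, 3]);

    // The guard returned by `borrow()` lives until the end of its scope.
    let first = table.borrow();
    if let Some(&x) = first.first() {
        // Looks innocent, but `first` is still alive here, so this
        // mutable borrow panics at runtime ("already borrowed").
        table.borrow_mut().push(x);
    }
}
```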

They do have their upsides, though:

  • You can introduce mutation to an existing code path without plumbing new lifetime parameters or mut modifiers throughout a sizable chunk of the call graph.
  • The same arguments about coupling and new contributors then apply, but for different reasons.

So, on the one hand, I’d love to see RefCell removed where possible. On the other, I recognize that they can reduce boilerplate and make adding features more agile and surgical.

RefCell is sort of a single-threaded version of a mutex (well, an RW lock to be more specific). Mutexes have acquired a reputation for being fiddly and error-prone, with experts now recommending concurrency abstractions that are less general but safer. I feel like we need similar alternatives to RefCell, but I’m not exactly sure what they would look like, or if they could be made to perform better.

These are my more general thoughts on the matter. I’ll dive into the PRs when I have time.


I feel like we need similar alternatives to RefCell, but I'm not exactly sure what they would look like, or if they could be made to perform better.

Well, the holy grail for concurrency is either linearizability or transactions, depending on who you talk to. Of course, in single-threaded programs we already have linearizability (...mostly). And the thing about transactions (and this is why thinking about RefCell as a "single threaded RwLock", though accurate, only goes so far) is that if they're rolled back, the end user is presumably still hanging around to handle it. I can't think of a good way to deal with something similar for RefCell; maybe coroutines? Attach a "session" callback and pretend you're working on it from another thread, maybe? When you borrow a RefCell someone else has grabbed, "context switch" as long as that one is in another session? Seems like a pretty big stretch. Also, while blocking on a mutex is totally acceptable behavior, dynamic borrow failures are analogous to deadlocks (they occur under the same circumstances).

On the other hand, in concurrency land atomics if anything have a far worse reputation than mutexes, but in single threaded Rust, Cell is awesome. All of which is to say: I don't know if leaning too hard on the concurrency analogy is necessarily the best approach here (though you can certainly get a lot of mileage out of it!). Interrupt handlers might get us closer (the general problem being single-threaded reentrancy), and there the approach I believe people tend to take is "keep actual shared state to a minimum and defer the bulk of the work to an event loop" (admittedly, there are also practical reasons why you don't want to spend too long in an interrupt that have little to do with RefCell--all analogies fail eventually).

Edit: Well, there is one abstraction from concurrency world that perfectly translates... move semantics :stuck_out_tongue:

I don’t think there’s a single replacement for RefCell that can be used in most cases: I see RefCells as something to resort to when there isn’t an existing safe abstraction for an invariant I’m depending on. The alternative is writing unsafe code to get a safe abstraction tailored to the specific case.

For example, the “window” iterator functions provide a safe wrapper when you want to access chunks of a collection - it’s safe because you know that the windows don’t overlap, but short of implementing a theorem prover in the compiler, it can’t work that out on its own, so you write the window function in unsafe code and make the guarantees yourself.
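
A standard-library example of this pattern (used here as a stand-in for the "window" functions) is split_at_mut: it is implemented with unsafe code internally, but the wrapper guarantees the two halves never overlap, so callers get two disjoint `&mut` borrows with no RefCell and no runtime check:

```rust
fn main() {
    let mut data = [1, 2, 3, 4, 5, 6];
    let (left, right) = data.split_at_mut(3);

    // Both halves can be mutated at the same time; the borrow checker is
    // satisfied because the library promised (via unsafe) they are disjoint.
    left[0] += 10;
    right[0] += 100;

    assert_eq!(data, [11, 2, 3, 104, 5, 6]);
}
```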

If the window function didn’t exist, the only alternative for safe code would be to use a RefCell. However, the compiler isn’t limited to safe code, so it may be possible to factor out all of the RefCells from the compiler, by finding all uses and building the minimal set of safe wrappers which are capable of enforcing the particular guarantees required in each case.

At the very least, it would be worthwhile explicitly annotating every use of RefCell to explain why it’s guaranteed to never be borrowed multiple times (checking that would make quite a nice lint)

edit: Speaking of safe abstractions, here’s a neat one which I haven’t seen yet. CellRc<T>: it acts like a hypothetical Cell<Rc<T>> which is not directly possible because Cell only accepts Copy data. It can’t be borrowed, but it has set/get methods, like Cell. It’s great for storing immutable data structures as it’s much more flexible than Cell, but has less overhead and stronger static guarantees than RefCell (it’s impossible for methods on it to fail due to a double borrow). There’s also a multithreaded equivalent: AtomicArc<T>. (Atomics being the multithreaded equivalent of Cells)
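
A minimal sketch of what such a CellRc<T> could look like (a hypothetical type built on UnsafeCell; an illustration rather than a vetted implementation):

```rust
use std::cell::UnsafeCell;
use std::mem;
use std::rc::Rc;

/// Hypothetical `CellRc<T>`: a `Cell`-like wrapper around an `Rc<T>` that
/// only offers `get`/`set` and never hands out references into the stored value.
pub struct CellRc<T> {
    inner: UnsafeCell<Rc<T>>,
}

impl<T> CellRc<T> {
    pub fn new(value: Rc<T>) -> CellRc<T> {
        CellRc { inner: UnsafeCell::new(value) }
    }

    /// Clone the stored `Rc` out. `Rc::clone` only bumps a reference count,
    /// so no user code can run while the cell is being read.
    pub fn get(&self) -> Rc<T> {
        unsafe { (*self.inner.get()).clone() }
    }

    /// Replace the stored `Rc`. The old handle is swapped out *before* being
    /// dropped, so even if its destructor reentrantly touches this cell, it
    /// only ever sees the new, consistent value.
    pub fn set(&self, value: Rc<T>) {
        let old = unsafe { mem::replace(&mut *self.inner.get(), value) };
        drop(old);
    }
}
```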


On the subject of using Cell for non-Copy types: I believe that the reason it was considered unsafe was actually Rc’s implementation of clone. But I wonder if we don’t now have enough type system machinery to just make an unsafe trait that inherits from Clone and acts as a marker that says “the value returned when calling clone() on a value of this type can’t be used to take a reference into owned data of the first type” (since only owned data could be affected by the Drop; basically, piggyback off dropck for safety). I’d have to think about whether there aren’t other gotchas, but if it is safe, it seems like a pretty good solution to the (IMO, frequent) situation where you have a bunch of non-Copy values that you would like to share, and you’re willing to pay the cost of a “real” Clone when it happens (you could still share non-owned data, I suppose, but you can already do that with Cell).

Let’s look at the analogy between atomic variables and Cell. An important property of atomic operations is that they are lock-free and wait-free. That is, a task accessing such a variable can be suspended indefinitely at any point and all other tasks will still be able to make forward progress. With some difficulty, atomic variables can be leveraged to create larger scale lock-free, wait-free data structures which guarantee freedom from deadlocks. We already know about Cell; what would single-threaded equivalents of the more complex structures look like?

Another approach we could borrow is the use of functional data structures. For example, two different threads can safely “modify” a shared cons list by prepending or dropping leading elements. This corresponds to a single-threaded structure that follows a stack discipline, like… well, the call stack.
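
A toy sketch of the single-threaded version of that (illustrative only):

```rust
use std::rc::Rc;

// A tiny persistent cons list: "modifying" it by prepending never touches
// the shared tail, so several owners can extend it independently.
enum List {
    Nil,
    Cons(u32, Rc<List>),
}

fn main() {
    let shared = Rc::new(List::Cons(1, Rc::new(List::Nil)));

    // Two independent views prepend different heads onto the same tail;
    // neither invalidates the other, and no interior mutability is needed.
    let a = List::Cons(2, shared.clone());
    let b = List::Cons(3, shared.clone());

    let _ = (a, b);
}
```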

Append-only or otherwise monotonically “increasing” structures could also be fruitful. There are many tables in the compiler which accumulate new entries but never lose old ones. As long as you avoid reallocations, append-only mutation need not invalidate extant borrows. LVars are an abstraction from the world of concurrency which attempt to capture the core notion of such monotonic growth. Perhaps there is a single-threaded analogue?

The more I think about it, the more it seems like there is a rigorous correspondence here waiting to be exploited. Is anyone aware of any academic research in this direction?


Append-only or otherwise monotonically "increasing" structures could also be fruitful. There are many tables in the compiler which accumulate new entries but never lose old ones. As long as you avoid reallocations, append-only mutation need not invalidate extant borrows. LVars are an abstraction from the world of concurrency which attempt to capture the core notion of such monotonic growth. Perhaps there is a single-threaded analogue?

Rustc already leans heavily on arenas for this purpose (it doesn't let you iterate over them but that's more an accident of the API than anything else, I suspect).
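
To make the append-only point concrete, here is a small sketch using the typed-arena crate as a stand-in for rustc's internal arenas:

```rust
use typed_arena::Arena;

fn main() {
    let arena: Arena<String> = Arena::new();

    // `alloc` takes `&self`, so we can keep appending through a shared
    // reference while earlier borrows are still alive.
    let first: &String = arena.alloc("first entry".to_string());

    for i in 0..1000 {
        arena.alloc(format!("entry {}", i));
    }

    // `first` was never invalidated by the later allocations.
    println!("{}", first);
}
```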

I think the fundamental restriction of Cell is that you shouldn’t be able to borrow anything owned by the cell for a lifetime which is tied to the Cell itself. (There’s probably some mathematical name for this!)

With a Cell<Rc<T>>, all references to T are protected by Rc’s own borrow machinery, so there shouldn’t be any unsafety (provided Rc’s clone implementation doesn’t leak a reference to &self into user code).

An unsafe trait implemented for all Copy types by default would do the trick. (But what to call it? TransparentClone?) You’d also need a completely separate trait for the multithreaded one (AtomicClone?)
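
A sketch of what that might look like (a hypothetical trait, not in std; per-type impls are used here rather than the blanket Copy impl, just to keep the sketch simple):

```rust
use std::rc::Rc;

/// Unsafe contract: `clone()` must not hand the caller references into data
/// owned by the original value.
unsafe trait TransparentClone: Clone {}

// Copy types trivially satisfy the contract.
unsafe impl TransparentClone for u32 {}
unsafe impl TransparentClone for bool {}

// `Rc<T>` qualifies too: its `clone` only bumps a reference count.
unsafe impl<T> TransparentClone for Rc<T> {}

// A `Cell`-like container could then accept any `T: TransparentClone`
// instead of only `T: Copy`.
```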

With a Cell<Rc<T>>, all references to T are protected by Rc's own borrow machinery, so there shouldn't be any unsafety (provided Rc's clone implementation doesn't leak a reference to &self into user code).

I suppose a "write-only" variant of Cell<Rc<T>> might be safe, but write-only is the effective outcome if you can't take any references into the Rc. If you could take references into it and write to it, it's a recipe for memory unsafety, since setting the interior value could still drop things inside the Rc. If you can't drop things inside the Rc, then it sounds identical to Rc<Cell<T>>, which is already safe (albeit relatively uncommon, probably because Cells tend to be embedded within larger structures).

An unsafe trait implemented for all Copy types by default would do the trick. (But what to call it? TransparentClone?) You'd also need a completely separate trait for the multithreaded one (AtomicClone?)

I think you could probably just go with TransparentClone + Sync; if a type is (1) thread safe and (2) has a clone implementation, then it must be safe to clone across threads, and if it additionally (3) guarantees that the clone doesn't have shared ownership, I think that fulfills all our requirements, no? I admit I haven't thought about this too carefully.

Edit: Actually yeah, it wouldn't really work, because atomic operations don't last long enough to do something like this. In my experience, any code that does pointer manipulation with atomics is very subtle; Cell has the luxury of having effectively infinite time to perform its "atomic" operation.

Hmm. This is an interesting discussion, though a bit far afield of the original question. @pythonesque and @bkoropoff, I definitely enjoyed your take on why RefCell is bad. I’m sort of on the fence here. I want us to use the Rust type system in the best way, and I do find that (in general) pursuing stricter divisions of mutable/immutable state makes the code clearer, will help us avoid ICEs, and so forth. OTOH, the current system is pretty simple and scalable, and I don’t see any huge problems with it.

On balance, I think I am currently in favor of removing the RefCell usage. If nothing else, it may help us to identify things we could change in Rust to make this more ergonomic. But also, if we were writing the compiler from scratch in today’s Rust, I think we would use a lot less RefCell. I suspect over time we will be able to refactor the code in the PRs further so that it is cleaner and requires less manual context passing. Basically, if we DO this change, it seems like overall a sideways step now, or maybe a step back in terms of “cleanliness”, but it puts us on a path that I think leads to a better tomorrow.

Note that there are some specific decisions I don’t like in the PRs. For example, I want to keep Typer as an object-safe trait, because it cuts down on generics (and avoids extra codegen cost). In general, objects make the code so much nicer. But probably the best thing is for me to check out the PRs and play around with them; doing so often gives me an appreciation for the choices the author made originally. =)

Regarding parallel constructs and phasing:

I actually LIKE the current compiler structure where things are written in distinct phases and passes. Probably we need clearer documentation, but I think in the end a pass-oriented structure is easier to reason about than one where things occur in a lazy order (the usual alternative). It’s also more parallel friendly (maps to fork-join in a trivial way, whereas laziness requires a kind of micro-task-graph). I’ve certainly found when reading other compilers that those which use a more lazy scheme can be harder to understand, because the control flow is very data-dependent, whereas those that use a series of passes are relatively easy to diagram and describe.

But I suspect we need clearer documentation about what will be written when. The PR by @Aatch https://github.com/rust-lang/rust/pull/22564 suggested one route here. Basically for those bits of data that will be “filled in” later, the struct includes an ivar (which is a write-once cell that begins empty, essentially). This ivar gives a natural place to document when the value will be provided and so forth, and makes it easy to diagnose when you are accessing data that is not yet supplied.
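
For concreteness, here is a minimal sketch of such an ivar (illustrative only; the type in the PR may differ in its details):

```rust
use std::cell::UnsafeCell;

/// Write-once cell: starts empty, is filled exactly once, and hands out
/// shared references afterwards. (Not thread-safe; `UnsafeCell` keeps it `!Sync`.)
pub struct Ivar<T> {
    slot: UnsafeCell<Option<T>>,
}

impl<T> Ivar<T> {
    pub fn new() -> Ivar<T> {
        Ivar { slot: UnsafeCell::new(None) }
    }

    /// Fill the ivar. Panics if it has already been filled, surfacing
    /// "this was written in the wrong phase" bugs immediately.
    pub fn fill(&self, value: T) {
        unsafe {
            let slot = self.slot.get();
            assert!((*slot).is_none(), "ivar filled twice");
            // No `&T` into the contents can exist yet (get() would have
            // panicked), so writing through the raw pointer is fine.
            *slot = Some(value);
        }
    }

    /// Read the ivar. Panics if a pass reads it before it has been supplied.
    pub fn get(&self) -> &T {
        unsafe { (*self.slot.get()).as_ref().expect("ivar read before fill") }
    }
}
```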


So what would those annotations look like, and how would a lint pick them up?

As far as Typer specifically goes, I tried to keep it object safe for precisely that reason, but ran into other issues (I unfortunately don't recall precisely what they were now, but I'm sure you will find them quickly if you start messing around with it :smile:). I think arielb1 presented a good way forward long term with Typer (and actually, part of the reason I started working on trans next was because it's one of the biggest non-typeck consumers of Typer, if not the biggest).

If you're looking for specific things that would make refactoring like this easier in the future, the single biggest is of course nonlexical lifetimes, but this isn't news to anyone :slight_smile:. Being able to skip writing out all the lifetime parameters when you really just need to specify one of them would also help ergonomics a lot, as would the "group" thing I mentioned.

I actually LIKE the current compiler structure where things are written in distinct phases and passes. Probably we need clearer documentation, but I think in the end a pass-oriented structure is easier to reason about than one where things occur in a lazy order (the usual alternative). It's also more parallel friendly (maps to fork-join in a trivial way, whereas laziness requires a kind of micro-task-graph). I've certainly found when reading other compilers that those which use a more lazy scheme can be harder to understand, because the control flow is very data-dependent, whereas those that use a series of passes are relatively easy to diagram and describe.

Hopefully, I conveyed in my post that I don't think it's bad to do things in phases, exactly. What I was getting at was that it feels extremely difficult to make alterations to the high level structure of the compiler (e.g. changing order of phases) because the contracts are relatively fragile, affect tons of code, and aren't compiler-checked. Obviously, it's a fairly large project and it's never going to be easy to make sweeping changes, but more flexibility can't be a bad thing, surely!

My only concern with the ivars was that we're paying a runtime cost for the privilege, and this time one not required for memory safety. But the cost isn't all that high, and it's not like the compiler is exactly micro-optimized anyway, so adding something that will help us catch bugs can only be a good thing, I suppose. I do feel like it should be possible to do it in a way where you don't usually pay the cost, though: e.g., once pass x is done, double-check that all the values you expect to be set are set, and then turn them from Cell<Option<T>> into plain T as a "type" transformation that only takes place at compile time. (This is easier without the Option, and easier still if you have &mut references while the types are modifiable, since you can just shuttle your reference off behind something immutable whenever you want to freeze the values, which lets you do a whole batch of them at a time. BTW, am I wrong in thinking that if you have &mut Cell<T> you should be allowed to take direct references into its inner value?) Anyway... like you said, extending Cell's functionality is really a separate discussion.

Yes, this is a downside of a more phase-oriented structure. It is also somewhat limiting, in that there are language constructs which cannot always be easily ordered. So I feel like we should take advantage of it where it makes sense, but not everywhere. I could be persuaded that moving entirely to a lazier architecture makes sense as well, but I'd definitely rather go there more gradually.

I think the runtime cost is negligible. And of course the runtime cost IS required for memory safety in the absence of some fancy type features (or some transmutes). Both of these will, however, introduce divergence into the type signatures (now we have distinct types for "a class def'n before field X is set" and "a class def'n after field X is set"). This is nice, but it affects all consumers of the code, and you wind up writing a lot of very generic code (I've been down this path in past lives...). I'm not sure it's worth it overall. But, in any case, writing the code with ivars is a natural first step.

Rc's Clone impl is actually perfectly safe (it can’t invalidate anything). The Drop impl is somewhat problematic, however: when you are assigning, you must only drop the old Rc after you have placed the new value. This is more problematic in the concurrent case, because the Arc can be dropped between a thread reading it and cloning it, so you need more synchronization, e.g. some kind of hazard pointers (for instance, Linux uses RCU for its AtomicOptionArc equivalents).

So what would those annotations look like, and how would a lint pick them up?

I was thinking of just using a doc-comment on the declaration. They'd follow a particular pattern which a lint could recognise by starting with a known prefix or something:

```rust
/// Borrows: mutably borrowed once between each pass
foo: RefCell<Bar>
```

It's then immediately obvious how 'foo' is used by the program, and when it's safe to borrow 'foo' either mutably or immutably.

But if you manage to serialize that graph to disk, you have something like a holy grail of incremental compilation :wink:

(I have never implemented something like this, so I do not know how feasible it would be in practice!)


I think in practice this kind of graph would have too fine a granularity to do this effectively without invading nearly every aspect of the code and making it rather hard to read.