MIR: how to represent back-edges and the CFG in general?

So in @nagisa’s https://github.com/rust-lang/rust/pull/33628 PR, I noticed that he has run up against the problem of needing easy access to the predecessor’s from the MIR graph.

The same problem arose discussing https://github.com/rust-lang/rust/issues/32966 with Scott Carr (username unknown). In particular, I wanted to port some graph algorithms code that I wrote to operate on MIR so that we can easily compute dominators, post-dominators, and so forth, but that code needs access to the predecessors for a node.

Anyway, all of this has reminded me that I wanted to discuss whether we are representing the graph in the right way. In the first version of the MIR representation that I wrote, I based it on the rustc_data_structures::graph code. This means effectively that each basic block could have any number of successors and so forth – the terminator did not contain the basic blocks directly. Therefore, if you had a Goto terminator, for example, you would expect exactly one successor, and if the terminator was If, you would expect two successors (true, false), and so forth.

The advantage of this is that it is super easy to write generic graph traversal code – right now we have to use some rather awkward accessors (successors, successors_mut) for this. The disadvantage is that you have to maintain the correspondence for the terminator and the number of successors etc yourself. On balance, it seems better to use the generic graph than not to me-- I’ve used this system in many compilers in the past and never really had a problem with the terminator / set-of-successors getting out of scope.

So how do people feel about moving MIR back to use librustc_data_structures::graph again to represent the graph? This would mean replacing the Vec<BasicBlockData>, basically, with a Graph<BasicBlockData>.

I’d also be open to some other graph representation, though (a) I think the one we have is pretty decent and (b) I’d prefer to just upgrade librustc_data_structures::graph in that case, rather than having something special for MIR.

cc @nagisa @eddyb @arielb1

cc @Aatch :slight_smile:

I wonder how would one represent a BB’s successor inside a generic graph. Would we simply have a struct BB {..., term_kind: TerminatorKind } and rely on successor count and, especially, ordering being just right? Current representation has a benefit of successors being essentially impossible to get wrong. OTOH, as long as the graph guarantees some sort of successor order it should be right? Optional successors seem like a particularly tricky case. Call, especially, because it has two optional successors. Technically, it is possible to reconstruct what each successor means from the data in terminator (e.g. 1 successor and no destination means the successor is for the unwind branch), but it seems trickier than it needs to be.

So how do people feel about moving MIR back to use librustc_data_structures::graph again to represent the graph?

I would strongly support this. I believe that any serious work involving CFG would be hindered by lack of efficient graph representation (in terms of time, not space) and algorithms. I’m somewhat surprised, actually, how well we’ve managed so far with just a plain vector.


an afterthought (can’t be bothered to rewrite the comment again): graphs support arbitrary data on edges, right? We could have enum EdgeKind { Unwind, Success, whatnot } as edge data? (EDIT: or even represent the terminators as edge data?)

Yes, this is what I was proposing.

Yes, this is the downside to what I was proposing :slight_smile: -- or, rather, the upside to the current system.

Yes, I imagine that we could provide some helper methods to help here -- basically a way to get the appropriate successor "by name" which then indexes into the graph. I don't really have a better idea at the moment.

Interesting. Yes, we could do that.

Why can’t we just use a trait?

I'm not sure what this means...? A trait for what?

However, there was some chatter on IRC in which @arielb1 made a reasonable case for keeping the current system and re-computing predecessors when needed. This would work fine too, as long as that code is accessible enough.

(In terms of integrating with that dominators etc code that I wrote, I imagine the generic graph trait I had would be implemented by the predecessor cache, essentially, which would I guess also mediate access to the underlying MIR...?)

I meant that.

I think we need to decide who would implement his trait. Would this trait by implemented by Mir, CFG, Builder, or ...?

Currently we build a CFG in Builder but throw that information away. The MIR transforms work on a Mir and do an ad-hoc recreation of a CFG. Maybe a Mir itself should retain some notion of a CFG?

Part of the question is who are the consumers of the back-edges and CFG? Builder or transforms or both?

ps, we don’t have to use this trait, but the trait that I defined was a very simple one:

It just allows you to query the preds/succs of a graph node (which are basically named via an index) and get back an iterator. This is then the basis for all the other algorithms.

I think what @arielb1 was proposing was that we would have some new type that you create which lets you access the MIR as a graph (re-creating the information it needs).

This CFG doesn't really have any additional information that MIR doesn't, iirc?

Yes, you're right. Our CFG never explicitly stores a CFG :wink:

But I was thinking that when Builder creates a new terminator it sort of 'knows' the new edge(s) it just created. It 'knows' what block the new terminator terminated and where it jumps to. Anyway, I think this is sort of orthogonal at this point.

+1 Trait. I think the similar tricks that work for intrusive collections https://github.com/Amanieu/intrusive-rs https://github.com/RustOS-Fork-Holding-Ground/intrusive-collections should work here—the ideal graph library for IRs would be some sort of intrusive graph.

Might also be cool to get rid of basic blocks and teach graphing library how to optimize linear chains.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.