Breaking down the hashtables to be per-item etc


#1

So I wanted to lay out a mid-level vision for refactoring the compiler. My goal is to move the compiler towards supporting parallelized and incremental compilation. This post is primarily talking about concrete steps we can start taking now that will help move us there. I’m specifically not talking about a move to MIR or LIR or any such thing, though I am generally in favor of such a move (but also feeling somewhat unsettled about just what shape such an IR should take).

I think the first step we have to do no matter what is to start the migration away from “side table city”. The current strategy of using massive hashtables indexed by node-id seems to be generally bad:

  • likely to have poor locality, as the data for a particular fn is scattered about by the hash fn
  • uses a lot of memory
  • relatively expensive lookup, since hashtable conflicts become more likely as occupancy goes up

I think we should start moving on a migration to restructure the hashtables so that there are relatively few (perhaps as few as one per item, or per class of item). These hashtables would map each item to a data structure. To accommodate phasing, those data structures would have “holes” in them in the form of ivars; each ivar gives us a place to document the pass that produces the relevant information. This is roughly the pattern that Aatch followed in his PR. This would probably allow us to remove or subsume a number of hashtables, including:

    pub tcache: RefCell<DefIdMap<TypeScheme<'tcx>>>,
    pub trait_defs: RefCell<DefIdMap<&'tcx TraitDef<'tcx>>>,
    pub predicates: RefCell<DefIdMap<GenericPredicates<'tcx>>>,
    pub super_predicates: RefCell<DefIdMap<GenericPredicates<'tcx>>>,
    pub ty_param_defs: RefCell<NodeMap<TypeParameterDef<'tcx>>>,
    pub item_variance_map: RefCell<DefIdMap<Rc<ItemVariances>>>,
    pub impl_or_trait_items: RefCell<DefIdMap<ImplOrTraitItem<'tcx>>>,
    pub trait_item_def_ids: RefCell<DefIdMap<Rc<Vec<ImplOrTraitItemId>>>>,
    pub trait_items_cache: RefCell<DefIdMap<Rc<Vec<ImplOrTraitItem<'tcx>>>>>,
    pub impl_trait_refs: RefCell<DefIdMap<Option<TraitRef<'tcx>>>>,
    pub tc_cache: RefCell<FnvHashMap<Ty<'tcx>, TypeContents>>,
    pub struct_fields: RefCell<DefIdMap<Rc<Vec<field_ty>>>>,

For fns in particular, there are a lot of hashtables that are really storing “per fn” information, but we are using one combined hashtable for all fns in the crate. This “per fn” state should be combined into one “fn info” struct, reachable either through its own hashtable or as part of the “fn item info” struct. At least the following hashtables could be moved in there, I believe:

    pub region_maps: RegionMaps,
    free_region_maps: RefCell<NodeMap<FreeRegionMap>>,
    node_types: RefCell<NodeMap<Ty<'tcx>>>, // XXX weird
    pub item_substs: RefCell<NodeMap<ItemSubsts<'tcx>>>,
    pub freevars: RefCell<FreevarMap>,
    pub ast_ty_to_ty_cache: RefCell<NodeMap<Ty<'tcx>>>,
    pub adjustments: RefCell<NodeMap<AutoAdjustment<'tcx>>>,
    pub upvar_capture_map: RefCell<UpvarCaptureMap>,

Note that this is something we can do gradually. We can start with one table, say adjustments, set up the necessary scaffolding, and try to migrate additional tables as we go. I would expect to see performance improvements over time.

Of course once we do this we’ve still got a fair distance to go before we can parallelize. One of the most interesting questions we have to solve is how to deal with the type inference tables, for example. But I think isolating out the state and removing hashtables is definitely a good start.

It’s a bit afield, but longer term, I’ve started to come around to the idea of restructuring the compiler into more of a DAG, rather than using passes that run to completion. In particular, if we do this right, it would not only help to document the connections between items, it might be used to automatically construct a dependency DAG, which can then serve for incremental compilation. My thought is basically that rather than poke at the data structures in tcx directly, we’d be using a trait, which would serve to document the interface between passes, and let us easily insert intermediaries to capture requests.

Thoughts?


#2

trait_defs is already such a map (I should move impl_or_trait_items into it, though).

If you want to use IVar-s, we should probably go with an item-based SmallVecMap or something. If we use compacted ItemId-s (per-crate or interned) we could probably get a non-terrible load, and locality of reference should improve.

I actually kind-of like the hash-table system - it is essentially an ECS. The reason aatch got his 7% boost on his patch is because our struct_fields caching is broken.


#3

+1 for all of this. I think it’s hard to narrow down the design much further without picking a table, implementing this redesign and seeing what breaks.

One thing I dislike in the compiler atm is the number of different contexts we have floating about. Whilst we do this work I think it would be nice to also think about how we can combine or redesign the system of contexts.

On the longer term stuff, I wonder how this would interact with ideas for having one or more IRs. My preferred architecture would be for a series of IRs and lowering steps between them. That suggests a by-phase architecture. However, one could imagine lowering on a function by function (or basic block by basic block) basis, which would probably work well with the lazy/DAG architecture.

@arielb1 could you expand “ECS” please? I’m not familiar with the acronym.


#4

Ah, @arielb1, you reminded me of something. Another reform I’ve wanted to do for a very long time is to move away from having just one NodeId and instead have distinct kinds of ids: ExprId, ItemId, etc. This would help keep the numbers lower and also clarify what is going on, I think. The main challenge is that there are a fair number of places in the compiler where we take advantage of “punning” – and we could still have a NodeId enum if needed that encompasses any sort of id. In any case, this seems like another sort of thing that can be done incrementally.


#5

The main reason that I like the IVars or something similar is that I think it would be easier to read the code. I’m not dead set against hashtables in general. One concern I do have is that there are a number of cases – e.g., generics or the predicates hashtables – where they apply across many kinds of items (traits, etc), and it seems useful to just have that all in one hashtable. So I imagine there might be a mix when all is said and done?

To be honest the thing I feel most passionate about is moving the fn state out of global, crate-wide hashtables into per-fn hashtables. I am less certain about how to organize the other stuff, though IVars seem like a step forward to me.

@nrc, regarding the multiple IRs, I saw that as something else that would be stored. Right now our “lowered” method IR is basically the AST combined with side tables. Eventually that gook would be replaced with something more structured. Depending how many lowerings we want to do, though, we may want to be able to throw away the MIR once the LIR is built or whatever, but that seems easy enough to achieve.


#6

I don’t think I approve of this change - it is very nice for tools to be able to have an id which can point anywhere into the AST and is unique within the crate.

I would like more nodes in the AST to have node ids though.


#7

Aside: The current ast_map doesn’t do parenting well, the parent node is always the nearest Item or similar.

I think having a properly parented ast_map would be useful for tools too.


#8

That does not prevent also having a more-specific id.


#9

Hmm, interesting. That might be true, but I’m not entirely convinced that node-ids as they are are particularly good for tools (examples?). The current node ids are tremendously unstable, for example, and they don’t have any particular mapping to things the user knows. It seems like it’d be fairly easy to construct richer pointers (i.e., expr 22 of fn 23), and, if integers are really required, to have a side-map that maps node-ids to these more stable references.

To some extent, though, my ardor for segregating node-ids will depend on what we decide with respect to MIR and so forth. In particular, if node-ids mainly play a role in the front-end, but we have some other IR later, then maybe that IR can use a better scheme (and I imagine most tooling will be primarily concerned with the AST).


#10

I think this is definitely a step in the right direction. I also like the ivar approach.

What I was not able to glean from your write-up is your idea of how long these “per fn” data structures should live. Ideally, the things needed for compiling a single item (which mostly means a single function) from type-checking to translation would be aggregated during these steps and then discarded again before compiling the next item. That should reduce memory consumption quite a bit and would result in something easily parallelizable:

                PARSE
                  |   
                EXPAND
                  |
               RESOLVE
                  |
               COLLECT
         /        |        \
        /         |         \
    ANALYSIS0  ANALYSIS1  ANALYSIS2
       |          |          |
     TRANS0     TRANS1     TRANS2

Of course, the above diagram is very simplified and there are probably things that are computed during compilation of one item, where it makes sense to cache them for re-use instead of re-computing them later. But overall, in the best case, that would mean that for a large chunk of data only as many instances as there are parallel threads need to exist at the same time.

With that, do you mean a DAG of compilation phases, i.e. the nodes of the graph are compilation steps and the edges signify the before/after relationship? Or do you think of something where the DAG is at the item-level, i.e. the nodes of the graph are actual AST nodes/type definitions/etc?


#11

ECS: entity-component-system, a sort of design pattern in which entities (essentially IDs) are associated with components (data) which are acted on by systems (transformations). As far as I know people mostly use that design in game engines. So it was an interesting surprise that @arielb1 drew that connection – nice catch!


#12

I think those points you are raising are more concerned with the second half of what I was talking about, when I described changing compilation away from a discrete series of passes and more into a “demand-driven” DAG-based approach. I hadn’t really thought about the possibility of lowering overall memory usage by not keeping around the data for functions – that’d be a good design goal, though.

I was imagining initially that the tcx would have a map from fn ids to the “FnDefinition” information, which would include all the per fn tables. This maps more closely to what we do today and obviously will not allow us to throw away data earlier, but I think the two concerns are separable. Eventually these definitions might never live in a central table, as you point out, and instead just flow from the parser to the back-end (where they’d get serialized for inlining and incremental codegen purposes, presumably).


#13

More stable node ids would be nice, it is true. I’m not sure how to trade off the simplicity of simple, global ids vs stability. It’s really nice to have a single id and be able to dump that into a database and get meaningful stuff for it. It can then be used as a key for looking stuff up really easily, and we can go back to the compiler and ask for info on it.

To be fair, I don’t think having more complex IDs will hurt too much – already we have to use def ids when dealing with multiple crates, so we don’t really have simple ids. But the simpler things are, the better – I’ve already had bugs in DXR due to converting def ids to a global id.

Agree on the MIR stuff.