A short while ago I started a topic on implementing an IR (or two) for the compiler. Since then I have been investigating how we’d best go about doing it. One of the more obvious things I’ve discovered is that the current type system code will need a good amount of refactoring in order to make any architectural changes going forward.
`middle::ty` and its satellite modules have become, over time, a dumping ground for a variety of
other systems. There are fields in the context object for many things that are only loosely related
to typing:
- `used_unsafe` - A set of used unsafe nodes. This seems to exist solely to allow for “unused unsafe” warnings.
- `node_lint_levels` - A map from nodes to what lints are allowed/warned/denied. This exists on `tcx` only to allow for the enum sizes lint - in trans!
- `stability` - A map from a node to a stability. This seems very out of place and again only serves to power a warning.
- `used_mut_nodes` - Set of local variables, marked `mut`, that get used (in a way that means they need to be `mut`). This is for a lint, again.
On top of this, many fields are just tracking/caching small amounts of information that are specific
to certain nodes. We have a cache for object-safety that is just a map to boolean values! We have so
many caches it’s ridiculous. Any possible large-scale change to the compiler is going to have to
start at `middle::ty`. This is what I think needs to be done:
1: More complete representation of the type system
There is no specific type you can go and look at and say “this represents the definition of a struct type”. Instead, struct types have to be constructed from several different maps. The same goes for other types.
So, I propose that we create a “definition” representation of types that is allocated in an arena
and interned like types currently are. This definition representation would have the name, the type
parameters, and the contents of the type. Importantly, instead of an opaque `DefId` that you use to
look up information in a map, this would be a pointer to the actual definition. This is a big win from
a performance perspective, as you 1) don’t have to handle the very unlikely case that the entry
isn’t in the map and 2) don’t have to do a much more expensive hashtable lookup.
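To make that a bit more concrete, here is a rough sketch of what I have in mind. None of these names exist in the compiler: `StructDef`, `FieldDef`, the simplified `Ty`, and the helper functions are all made up for illustration, and plain borrows stand in for arena-allocated, interned pointers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's opaque `DefId`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct DefId(pub u32);

/// Hypothetical, heavily simplified type representation.
#[derive(Debug)]
pub enum Ty<'tcx> {
    Int,
    Bool,
    /// A struct type points straight at its definition.
    Struct(&'tcx StructDef<'tcx>),
}

/// Hypothetical definition of a single struct field.
#[derive(Debug)]
pub struct FieldDef<'tcx> {
    pub name: &'static str,
    pub ty: Ty<'tcx>,
}

/// Hypothetical "definition" representation: name, type parameters, contents.
#[derive(Debug)]
pub struct StructDef<'tcx> {
    pub name: &'static str,
    pub type_params: Vec<&'static str>,
    pub fields: Vec<FieldDef<'tcx>>,
}

/// Side-table style: hash the id, then handle the "not in the map" case.
pub fn field_count_via_map(
    table: &HashMap<DefId, Vec<&'static str>>,
    id: DefId,
) -> Option<usize> {
    table.get(&id).map(|fields| fields.len())
}

/// Pointer style: the definition is already in hand, so the lookup cannot fail.
pub fn field_count_via_def(def: &StructDef<'_>) -> usize {
    def.fields.len()
}

/// Builds `Point { x, y }` and `Line { start, end }`, then queries both styles.
pub fn demo_field_counts() -> (Option<usize>, usize, &'static str) {
    let point = StructDef {
        name: "Point",
        type_params: vec![],
        fields: vec![
            FieldDef { name: "x", ty: Ty::Int },
            FieldDef { name: "y", ty: Ty::Int },
        ],
    };
    let line = StructDef {
        name: "Line",
        type_params: vec![],
        fields: vec![
            FieldDef { name: "start", ty: Ty::Struct(&point) },
            FieldDef { name: "end", ty: Ty::Struct(&point) },
        ],
    };

    // The side table only knows about DefId(0); asking for DefId(1) fails.
    let mut table = HashMap::new();
    table.insert(DefId(0), vec!["x", "y"]);
    let via_map = field_count_via_map(&table, DefId(1));

    // Following the pointer from a field's type to its definition is direct.
    let start_ty_name = match &line.fields[0].ty {
        Ty::Struct(def) => def.name,
        _ => "<not a struct>",
    };

    (via_map, field_count_via_def(&line), start_ty_name)
}
```

The map-based helper has to return an `Option` even though the entry is almost always there, while the pointer-based one has nothing to check.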
2: Remove anything that isn’t directly related to typing from the context
We have a certifiable God Object going on here. As I mentioned earlier, it is incredibly bloated and much of the data doesn’t need to be around for most of the compilation. There needs to be a small-as-possible typing context that contains only what is necessary for working with types. Unless we make stability and lints part of the type system, they have no reason to be leeching off of the ubiquity of the type context.
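As a very rough sketch of the kind of split I mean, assuming hypothetical `TypeCtxt` and `LintCtxt` structs (neither exists today, and `NodeId` is simplified to a plain integer): lint-only state lives in its own context, and only the passes that need it ever see it.

```rust
use std::collections::HashSet;

/// Simplified stand-in for an AST node id.
pub type NodeId = u32;

/// Hypothetical typing-only context: nothing in here is about lints.
pub struct TypeCtxt {
    pub type_names: Vec<&'static str>,
}

impl TypeCtxt {
    pub fn type_name(&self, index: usize) -> Option<&'static str> {
        self.type_names.get(index).copied()
    }
}

/// Hypothetical lint-only context holding state that currently sits on the
/// type context.
#[derive(Default)]
pub struct LintCtxt {
    pub used_unsafe: HashSet<NodeId>,
    pub used_mut_nodes: HashSet<NodeId>,
}

impl LintCtxt {
    /// Reports the unsafe blocks that were never marked as used.
    pub fn unused_unsafe(&self, all_unsafe_blocks: &[NodeId]) -> Vec<NodeId> {
        all_unsafe_blocks
            .iter()
            .copied()
            .filter(|id| !self.used_unsafe.contains(id))
            .collect()
    }
}

/// Only block 7 is marked as used, so blocks 3 and 9 should be reported.
pub fn demo_unused_unsafe() -> Vec<NodeId> {
    let mut lints = LintCtxt::default();
    lints.used_unsafe.insert(7);
    lints.unused_unsafe(&[3, 7, 9])
}
```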
On the same topic:
3: Stop putting everything in hash maps
I get it, hash maps are incredibly useful data structures. But I think we have a problem with hashmap abuse. Consider this an intervention.
A lot of this data should be stored inline where it’s actually used/useful. Part of the problem is
that we don’t have anywhere to put a lot of this data (hence the first point), but that doesn’t
stop it from being a problem. Whether or not a trait is object-safe should be part of the
representation of the trait, not in a hashmap from a `DefId` to a bool. What traits (like `Copy` and
`Sized`) a type implements should be part of the representation of those definitions. Whether or not
a definition is an associated type shouldn’t even be something we have to store. It should
just… be.
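Here is a small sketch of the difference, under the assumption of a made-up `TraitDef` and `TraitItem` (these are illustrative names, not existing compiler types, and the `Container` trait is invented). Object-safety becomes a field on the trait's definition, and "is this an associated type" falls out of which enum variant the item is.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's opaque `DefId`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct DefId(pub u32);

/// Side-table style: a separate cache that may or may not have the answer.
pub fn is_object_safe_via_cache(cache: &HashMap<DefId, bool>, id: DefId) -> Option<bool> {
    cache.get(&id).copied()
}

/// Hypothetical item inside a trait. Being an associated type is simply
/// which variant this is, so there is nothing extra to store.
pub enum TraitItem {
    Method { name: &'static str },
    AssociatedType { name: &'static str },
}

impl TraitItem {
    pub fn is_associated_type(&self) -> bool {
        matches!(self, TraitItem::AssociatedType { .. })
    }
}

/// Hypothetical trait definition carrying its own facts inline.
pub struct TraitDef {
    pub name: &'static str,
    pub object_safe: bool,
    pub items: Vec<TraitItem>,
}

/// Builds an invented `Container` trait with one associated type and one method.
pub fn demo_container_trait() -> TraitDef {
    TraitDef {
        name: "Container",
        // Assumed for the example; a real compiler would compute this once
        // when the definition is built.
        object_safe: true,
        items: vec![
            TraitItem::AssociatedType { name: "Item" },
            TraitItem::Method { name: "get" },
        ],
    }
}
```

With the inline version, `def.object_safe` is a plain field read, and there is no "what if the cache has no entry" case to think about.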
4: Use a Typed AST
There is pretty much no part of the compiler that doesn’t need the types of various parts of the AST. Instead of constantly looking up the same types in the same places and doing the same “is it there” checks, we need to have a secondary AST that incorporates the types directly. This would be a massive boost not only to performance, but also to productivity. Currently, seemingly-simple tasks like “what is the type of this expression” require more work than is expected.
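For illustration, a toy version of the idea could look like the following. Everything here is hypothetical (`Expr`, `TypedExpr`, `lower`, and the two-type `Ty` are invented for the example and bear no relation to the real AST): the untyped tree is lowered once into a second tree where every node carries its type, so later passes read `expr.ty` instead of consulting a side table.

```rust
/// Toy type representation for the example.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    Int,
    Bool,
}

/// Toy untyped AST.
pub enum Expr {
    IntLit(i64),
    BoolLit(bool),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
}

/// Toy typed AST: every node stores its own type inline.
pub struct TypedExpr {
    pub ty: Ty,
    pub kind: TypedExprKind,
}

pub enum TypedExprKind {
    IntLit(i64),
    BoolLit(bool),
    Add(Box<TypedExpr>, Box<TypedExpr>),
    Less(Box<TypedExpr>, Box<TypedExpr>),
}

/// Checks an untyped expression and builds the typed tree in one pass.
pub fn lower(expr: &Expr) -> Result<TypedExpr, String> {
    match expr {
        Expr::IntLit(n) => Ok(TypedExpr { ty: Ty::Int, kind: TypedExprKind::IntLit(*n) }),
        Expr::BoolLit(b) => Ok(TypedExpr { ty: Ty::Bool, kind: TypedExprKind::BoolLit(*b) }),
        Expr::Add(l, r) => {
            let (l, r) = (lower(l)?, lower(r)?);
            if l.ty != Ty::Int || r.ty != Ty::Int {
                return Err("`+` expects integer operands".to_string());
            }
            Ok(TypedExpr { ty: Ty::Int, kind: TypedExprKind::Add(Box::new(l), Box::new(r)) })
        }
        Expr::Less(l, r) => {
            let (l, r) = (lower(l)?, lower(r)?);
            if l.ty != Ty::Int || r.ty != Ty::Int {
                return Err("`<` expects integer operands".to_string());
            }
            Ok(TypedExpr { ty: Ty::Bool, kind: TypedExprKind::Less(Box::new(l), Box::new(r)) })
        }
    }
}

/// Lowers `(1 + 2) < 4` and answers "what is the type of this expression".
pub fn demo_type_of_comparison() -> Result<Ty, String> {
    let sum = Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)));
    let expr = Expr::Less(Box::new(sum), Box::new(Expr::IntLit(4)));
    lower(&expr).map(|typed| typed.ty)
}
```

After lowering, the "is it there" check has already happened once, and asking for the type of any sub-expression is a field access.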
What I’m hoping is that people that understand this code better can provide some more details. As I said, refactoring this code is almost certainly going to be required for any major changes going forward.