OK, going to try and organise my thoughts here. I should say that I haven’t tried any of this out, so if there are better ideas for handling this then I’d be keen to hear them out.
Overview
The compiler knows all things. We need to get data into rls-analysis, which lets rustdoc display the ‘methods available’ for a type. The pipeline is:
- save-analysis must be configured to do this, we can’t include this data by default (see Config)
- we need some API in the trait system for providing the list of methods available for a type, this needs to be organised by impl (and in fact, just having the list of impls is probably enough)
- we need data structures in the save-analysis data structures to represent this information (I think this is mostly done, but if the design changes, it will need more work, see Impl)
- in the save-analysis crate, we need to add all the impls for each type, as found from the trait system
- we need to process this data in rls-analysis and make it available in a nice API
The complicated bit is that there are many ways that an impl
can apply to a type (consider Deref
impls, the various auto-referencing/dereferencing that can happen in the .
operator, blanket impls, etc.). I really, really want to avoid re-implementing the trait system in rls-analysis (or rustdoc), so I think it is important that the ‘computation’ happen in the compiler by expanding the trait system. We should not have the ‘raw data’ in the save-analysis data and then compute in rls-analysis.
‘Locality’ is also awkward, consider crates C1
, C2
, and C3
where C3
depends on C2
which depends on C1
, and a concrete type T
in C2
. There might be impls in all three crates which apply to T
. Should the impl data be included where the impl
occurs or where the concrete type occurs? What if the impl is downstream/upstream of the concrete type?
Data design
I propose that for any concrete type Foo
in a crate, save-analysis data includes an Impl
for every impl which applies in the current or upstream crates (note that this means that a save-analysis impl could be from an upstream crate, not just the current one, which is a new thing for save-analysis). For every syntactic impl in a crate, the data includes an Impl
for every concrete type it applies to in upstream crates. This means that for every impl
in the source code, there will be many Impl
s in the save-analysis data. There is some repetition here and it may be worth separating out data about the syntactic impl (e.g., the span and children (i.e., the list of implemented methods) from the relation between concrete type and impl - we already have a Relation type, but it doesn’t quite fit yet). EDIT - I think we need to make that last optimisation as part of this work - because if the impl is upstream then we don’t have easy access to the children, docs, etc. So, we should change the RelationKind::Impl
variant a bit to include data about the impl, and make Impl
just about the syntactic impl. We need a way to refer to impls then and I’m not sure if they have some kind of Ids, if they don’t in the compiler today, then we can synthesis them one in save-analysis.
ImplKind records the kind of impl which relates a concrete type to a trait. Direct
means something like impl Foo for Bar
whereas Indirect
means something like impl Foo for Box<Bar>
. Blanket
is of the form impl<T> Foo for T
. Note that there might be a chain of impls which cause a trait to apply to concrete type and we don’t store information about the chain, only the ‘last’ impl, i.e., the one which names the trait (though we should have records of other impls in the chain, we don’t store their presence in any chain, I think this is OK for the rustdoc use case). Deref
describes an impl for a type that applies to a concrete type via a Deref impl (note that there is no record of the actual Deref
impl); e.g., Consider Foo and Bar where there is Foo: Deref<Target = Bar> and a trait impl Tr for Bar
, then there will be an Impl
in the save-analysis data which links Foo
to Tr
and which has kind Deref("Bar", id)
where id
is the DefId
of Bar
.
Changes to save-analysis
I think this is the easy bit, whenever we encounter an impl
we have to record the methods etc. I think we already do this, but iirc, only for inherent impls. Then for every concrete type we need to query the trait system to get all known impls and list out these as Relation
s. We need to stop recording the impl relations we currently record. When we record an impl
we need to find all upstream concrete types which the new impl applies to. I’m not sure what that API looks like or how it would/could work.
New trait system API
AIUI, when doing method resolution, we know the type of the receiver, then search through all impls to see if the named method exists there (including doing the deref adjustments). The new API we need starts with a concrete type and instead of searching all possible impls, makes a list of them and returns it. I have been told this is not too hard, but I don’t know where to even start.
The harder question is what to do for concrete types with impls in downstream crates.
Changes to rls-analysis
We need to read in the impl and relation data as raw save-analysis data (this should be easy), then lower it to an appropriate format (I think this should be a single mapping from concrete type DefIds to Impl ids, plus for each crate a table of impls. The lowering should include getting rid of the distinction between same-crate, upstream, and downstream impls).
Finally, we’d need to present an API for rustdoc (and other potential clients). I think, we probably want to be able to iterate over all impls for a type, and then for a given type, iterate over items in the impl (methods, assoc types, etc.). Alternatively, we might iterate over the items and each includes a tag for the impl it comes from. Or perhaps we want both views.
I’m not sure where we should instantiate the Self and associated types - does this need doing at all? If so, should it be done by rls-analysis or rustdoc? Under the above model, we can’t do it in the compiler. It seems like functionality that all clients would want, which suggests rls-analysis, but it is very different functionality from anything it does at the moment.
MVP
This is all quite a lot of work! We should start with something minimal. I suggest:
- we cover all kinds of impls, I believe that the trait system makes it as easy to do this as to look at just simple impls - it might be easier to elide Deref impls and impls which require deref adjustments in method calls, if so, we should elide them to start with.
- we should stick to same-crate or just same-crate + upstream impls, and ignore downstream impls
- we should not worry too much about an efficient data representation
- not sure if there are other areas we do less, but we should try!