Rustdoc2, rls-analysis, and the compiler: help wanted!


#1

Hey everyone!

So, now that rustdoc2 is in full swing, I’m starting to run up against limitations of the RLS with regards to what it needs. This is very much expected!

However, @nrc is quite busy with various things at the moment, and so I’m gonna need some help to get this done. @nikomatsakis suggested I create a post here to maybe coordinate some initial work.

Before I talk about the problem, I want to emphasize that I don’t need something perfect at this stage. Furthermore, getting the basics of this working is a hard blocker for me shipping something useful, so I’d like to try to get something going ASAP. I am totally okay with shipping a bad, hacky solution at first, and then later replacing it with something good. I’d much prefer that, honestly, as it gets me going.

Right now, I can get a Def for various things. That works just fine. But, what I can’t do is, from the Def of a struct or trait, get a list of methods, their type signatures, and their docs. That’s the issue. It’s also why it’s a hard blocker: right now any struct’s page is going to simply be a name, its doc string, and nothing else!

@nrc told me that this info is not yet inside of save-analysis, and we’d need some compiler work to make this work, and that it needs designed at all.

On the whole “minimal solution” thing again, it would be 100% okay for this first bit to only have inherent impls; I just need some kind of useful info, not full fidelity yet. Dealing with traits overall is going to be a Hard Problem, but I don’t want to block shipping something useful while we work on getting all of that right.

So, any ideas on how we should tackle this? I’m totally interested in helping ship this, but I don’t do much compiler hacking, and so would need some help.


#2

Quick thought: Isn’t Relation in Analysis going to achieve what you want here? Not sure exactly which trait impls are recorded and collected by the rls-analysis, but I imagine initially you could walk through all the implemented traits for a Def and collect their children member Defs and whatnot. Probably relation might need better exposing via rls-analysis interface to be more easily accessible.

I’ll have to take a closer look, as this was just me speaking off the top of my head, but I’m more than willing to help with this, as I did some related work in the context of rls-analysis and the compiler for the RLS!


#3

I’ve never seen that API before! So maybe!


#4

OK, going to try and organise my thoughts here. I should say that I haven’t tried any of this out, so if there are better ideas for handling this then I’d be keen to hear them out.

Overview

The compiler knows all things. We need to get data into rls-analysis, which lets rustdoc display the ‘methods available’ for a type. The pipeline is:

  • save-analysis must be configured to do this, we can’t include this data by default (see Config)
  • we need some API in the trait system for providing the list of methods available for a type, this needs to be organised by impl (and in fact, just having the list of impls is probably enough)
  • we need data structures in the save-analysis data structures to represent this information (I think this is mostly done, but if the design changes, it will need more work, see Impl)
  • in the save-analysis crate, we need to add all the impls for each type, as found from the trait system
  • we need to process this data in rls-analysis and make it available in a nice API

The complicated bit is that there are many ways that an impl can apply to a type (consider Deref impls, the various auto-referencing/dereferencing that can happen in the . operator, blanket impls, etc.). I really, really want to avoid re-implementing the trait system in rls-analysis (or rustdoc), so I think it is important that the ‘computation’ happen in the compiler by expanding the trait system. We should not have the ‘raw data’ in the save-analysis data and then compute in rls-analysis.

‘Locality’ is also awkward, consider crates C1, C2, and C3 where C3 depends on C2 which depends on C1, and a concrete type T in C2. There might be impls in all three crates which apply to T. Should the impl data be included where the impl occurs or where the concrete type occurs? What if the impl is downstream/upstream of the concrete type?

Data design

I propose that for any concrete type Foo in a crate, save-analysis data includes an Impl for every impl which applies in the current or upstream crates (note that this means that a save-analysis impl could be from an upstream crate, not just the current one, which is a new thing for save-analysis). For every syntactic impl in a crate, the data includes an Impl for every concrete type it applies to in upstream crates. This means that for every impl in the source code, there will be many Impls in the save-analysis data. There is some repetition here and it may be worth separating out data about the syntactic impl (e.g., the span and children (i.e., the list of implemented methods) from the relation between concrete type and impl - we already have a Relation type, but it doesn’t quite fit yet). EDIT - I think we need to make that last optimisation as part of this work - because if the impl is upstream then we don’t have easy access to the children, docs, etc. So, we should change the RelationKind::Impl variant a bit to include data about the impl, and make Impl just about the syntactic impl. We need a way to refer to impls then and I’m not sure if they have some kind of Ids, if they don’t in the compiler today, then we can synthesis them one in save-analysis.

ImplKind records the kind of impl which relates a concrete type to a trait. Direct means something like impl Foo for Bar whereas Indirect means something like impl Foo for Box<Bar>. Blanket is of the form impl<T> Foo for T. Note that there might be a chain of impls which cause a trait to apply to concrete type and we don’t store information about the chain, only the ‘last’ impl, i.e., the one which names the trait (though we should have records of other impls in the chain, we don’t store their presence in any chain, I think this is OK for the rustdoc use case). Deref describes an impl for a type that applies to a concrete type via a Deref impl (note that there is no record of the actual Deref impl); e.g., Consider Foo and Bar where there is Foo: Deref<Target = Bar> and a trait impl Tr for Bar, then there will be an Impl in the save-analysis data which links Foo to Tr and which has kind Deref("Bar", id) where id is the DefId of Bar.

Changes to save-analysis

I think this is the easy bit, whenever we encounter an impl we have to record the methods etc. I think we already do this, but iirc, only for inherent impls. Then for every concrete type we need to query the trait system to get all known impls and list out these as Relations. We need to stop recording the impl relations we currently record. When we record an impl we need to find all upstream concrete types which the new impl applies to. I’m not sure what that API looks like or how it would/could work.

New trait system API

AIUI, when doing method resolution, we know the type of the receiver, then search through all impls to see if the named method exists there (including doing the deref adjustments). The new API we need starts with a concrete type and instead of searching all possible impls, makes a list of them and returns it. I have been told this is not too hard, but I don’t know where to even start.

The harder question is what to do for concrete types with impls in downstream crates.

Changes to rls-analysis

We need to read in the impl and relation data as raw save-analysis data (this should be easy), then lower it to an appropriate format (I think this should be a single mapping from concrete type DefIds to Impl ids, plus for each crate a table of impls. The lowering should include getting rid of the distinction between same-crate, upstream, and downstream impls).

Finally, we’d need to present an API for rustdoc (and other potential clients). I think, we probably want to be able to iterate over all impls for a type, and then for a given type, iterate over items in the impl (methods, assoc types, etc.). Alternatively, we might iterate over the items and each includes a tag for the impl it comes from. Or perhaps we want both views.

I’m not sure where we should instantiate the Self and associated types - does this need doing at all? If so, should it be done by rls-analysis or rustdoc? Under the above model, we can’t do it in the compiler. It seems like functionality that all clients would want, which suggests rls-analysis, but it is very different functionality from anything it does at the moment.

MVP

This is all quite a lot of work! We should start with something minimal. I suggest:

  • we cover all kinds of impls, I believe that the trait system makes it as easy to do this as to look at just simple impls - it might be easier to elide Deref impls and impls which require deref adjustments in method calls, if so, we should elide them to start with.
  • we should stick to same-crate or just same-crate + upstream impls, and ignore downstream impls
  • we should not worry too much about an efficient data representation
  • not sure if there are other areas we do less, but we should try!

#5

So, after a little discussion with Steve there is an even more minimal MVP. The basic idea is to only take account of inherent impls (we also get the most basic trait impls in the same crate). There’s a little bit of work to do in a few places since the existing code is meant for a different use case. I think that the key work items are:

  • check that we emit a Relation struct for every impl we’re interested in (it’s possible bugs have happened since I last used this data
  • add the id of an impl to Relation::Impl or change the to id in Relation to be the id of the impl, not the trait
  • emit information about the impl - I think we should use the Impl struct for this, though we might miss some fields such as ImplKind
  • I believe then we have enough information to link a type to some of its impls and then to the methods contained inside (or to a trait and then to methods)
  • we’ll need to modify rls-analysis to store all this data in its tables and API to make the data available.

As opposed to the full plan in the previous comment, I know how to do all of this! I believe this shouldn’t be too much work either - hours rather than weeks. I’d be happy to mentor anyone who wants to have a go, and can help out too, though won’t have time to do everything since I am travelling next week and have a massive backlog of work.


#6

Here’s a start.


#7

Here’s part 2