The RLS / Rustdoc interface to rustc


#1

So I decided to spin off another thread relating specifically to the interface between the compiler and the RLS / Rustdoc. This is continuing the conversation we started in the Splitting compiler team meetings into proactive/reactive thread, but that thread name didn’t feel very … suitable.

Since our IRC discussion, I’ve had some more thoughts (as well as some conversations with @nrc and @steveklabnik). It felt like we made progress in the meeting, but didn’t reach a total consensus. I figured we can discuss a bit in this thread, and maybe we can schedule another meeting for details.

I’m going to keep this initial message very short, and just ask for a few people to leave comments with their current thoughts:

  • Regarding RLS, @nrc mentioned that he would try to write up his thoughts after the discussion.
  • Regarding rustdoc, I think it would be great if @steveklabnik or @GuillaumeGomez (or someone else) could draw up a list of the major blocking architectural issues.
    • My impression is that rustdoc/RLS mostly want the same information out of the compiler, and they should be built on the same libraries.

I’d also like to note that it may be helpful to try and focus on the short-to-medium term. That is, we are still actively building up the query/incremental system, and it’s clear that – whatever we want long-term – we can’t drop save-analysis tomorrow. What we can do, I think, is (a) start getting the RLS to support incremental compilation and (b) start trying to define the interface between the compiler and the RLS a bit more precisely.

(Ultimately, my take is that what matters most is what the interface is between compiler and the RLS. If it seems to make things cleanest to have the compiler dump a bunch of data that the RLS can load and manage itself, that’s fine – and we’re going to be doing it in the short term. But how does that data get generated? What kinds of links and information does it include? These seem like the same questions that any query-based approach must answer.)


#2

I can give a first (short) list based on the issues I encountered and that I remember:

  • Rustdoc has to redefine a lot of things in order to display them (mainly HIR structs).
  • Rustdoc has some limitations. For example, I tried to add the list of all traits implemented for a type, including the generic implementations from parent crate(s); I had discussions with @eddyb about this but we didn’t get much further.

#3

I’ll be back with some detail about RLS planning. Re Rustdoc, here are some thoughts on Rustdoc and the RLS working together.

Proposed architecture

I propose that Rustdoc depend on the RLS. This disconnects it from the compiler and provides an easily understandable interface, while sharing a large amount of code with other important projects (primarily IDE support).

RLS background

The RLS is a bunch of libraries which form a layer around rustc and Cargo and provide an interface for tools to get information about Rust programs. The RLS is not a single thing - there are different crates which provide different levels of abstraction and different modes of operation.

  • librustc_save_analysis - part of the compiler, turns the compiler’s internal data about a crate into a more easily understood (and slightly abstracted) format. Output can be accessed as an API (to query a single AST node) or as a dump of all info about a crate; the dump can be passed as in-memory data, JSON, or CSV (and can be extended to other formats).
  • rls-analysis - takes save-analysis as input (either directly in memory or via the JSON dumps) and presents the data as a Rust API. This involves some post-processing and cross-referencing of data, then storing it in a set of hashtables.
  • rls-vfs - a virtual file system for the RLS, not relevant to rustdoc
  • rls - a client of rls-analysis, it manages builds using Cargo and rustc, uses rls-analysis to process the results of the builds, and presents this all using the LSP.
  • rls-span, rls-data - helper libs, not relevant

Example clients:

  • rls-vscode - our reference IDE implementation uses the rls lib, communicates over LSP
  • other IDE plugins - similar to above, use rls over LSP
  • rustw - a web-based code exploration tool, uses rls-analysis from its (Rust) backend, does not use the rls crate and shells out to Cargo directly (will also work with other build systems, which rls does not). I would expect rustdoc to follow this model.
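As a rough illustration of the post-processing and cross-referencing step that rls-analysis performs (all names here are simplified stand-ins, not the crate’s real API), a dump of definition records can be loaded into hashtables and linked parent-to-child:

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of one record in a save-analysis dump.
struct DumpedDef {
    id: u32,
    name: String,
    parent: Option<u32>,
}

// Cross-referenced tables, roughly what a tool builds after loading a dump.
struct Analysis {
    defs: HashMap<u32, DumpedDef>,
    children: HashMap<u32, Vec<u32>>, // parent id -> child ids
}

impl Analysis {
    fn from_dump(dump: Vec<DumpedDef>) -> Analysis {
        let mut defs = HashMap::new();
        let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
        for d in dump {
            if let Some(p) = d.parent {
                children.entry(p).or_insert_with(Vec::new).push(d.id);
            }
            defs.insert(d.id, d);
        }
        Analysis { defs, children }
    }

    // Names of all children of a def, e.g. the items in a module.
    fn child_names(&self, id: u32) -> Vec<&str> {
        self.children
            .get(&id)
            .map(|ids| ids.iter().map(|c| self.defs[c].name.as_str()).collect())
            .unwrap_or_default()
    }
}
```

A rustdoc-like client would then answer "what is on this module’s page?" with a single `child_names`-style lookup rather than rescanning the dump.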

Overview

(This is a very early sketch of how things could look, don’t take it as a concrete proposal, only to illustrate how the RLS and Rustdoc could work together)

I imagine that we would have a backend written in Rust which would act as a web server. That is, Rustdoc pages would not be statically generated as they are now, but would be generated on demand. The backend would operate on a save-analysis dump, i.e., it does not itself include a build step (note that for the distribution, the dump can be installed by rustup; we should provide a helper program (probably a Cargo plugin) that builds a project and starts the Rustdoc backend with the data). The backend uses rls-analysis to read and process the data and does no raw-data processing itself. It uses rls-analysis’s API to get information and processes it into something easily digestible by the frontend, provided as a RESTful (ish) HTTP API.

The frontend should be a ‘single page web app’ using standard JS web tech. Personally, I would use React, but we could use Ember, Angular, whatever. It would send Ajax requests to the backend to fetch the data for the docs - one request per ‘page’.

rls-analysis API

We’d need some new APIs, but a lot already exists. The key data structure is a Def which represents any definition:

pub struct Def {
    pub kind: DefKind,
    pub span: Span,
    pub name: String,
    pub qualname: String,
    pub api_crate: bool,
    pub parent: Option<u32>,
    pub value: String,
    pub docs: String,
    pub sig: Option<Signature>,
}

Note that we already include data about docs, parent (which gives info for ‘up links’), and the signature (more on this below). We have a function for_each_child_def which gives access to all children defs (e.g., fields in a struct, items in a module, etc.), and find_impls which gives all the impls for a type (although this is not well-tested and will probably need some bug-fixing).

We would need new APIs for finding the ‘roots’ (i.e., the top-level modules of each crate; easy) and for text search (we can already search by identifier, but we can’t do the kind of fuzzy search that rustdoc currently does, and we want this for IDEs too), and we possibly need to add more data for the details of impls.
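At its simplest, the fuzzy search mentioned above is a case-insensitive subsequence match over names; a minimal sketch (this is not rustdoc’s actual search algorithm, just an illustration of the idea):

```rust
// Returns true if `pattern` appears as a (case-insensitive) subsequence of
// `candidate`, e.g. "hmap" matches "std::collections::HashMap".
fn fuzzy_match(pattern: &str, candidate: &str) -> bool {
    let mut cand = candidate.chars().flat_map(char::to_lowercase);
    pattern
        .chars()
        .flat_map(char::to_lowercase)
        .all(|p| cand.any(|c| c == p))
}

// Filter a list of qualnames down to the fuzzy matches.
fn search<'a>(pattern: &str, qualnames: &[&'a str]) -> Vec<&'a str> {
    qualnames
        .iter()
        .copied()
        .filter(|q| fuzzy_match(pattern, q))
        .collect()
}
```

Sharing something like this between rustdoc and the IDE path is exactly the kind of code reuse the proposal is after.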

Signatures need re-doing. The current version doesn’t really work. The concept is that they would contain enough data to render any item. There is a little design work plus implementation (which touches the compiler) to do here. Perhaps it should be integrated with DefKind. Straw-man sketch:

enum DefKind {
    Fn(FnSignature),
    ...
}

struct FnSignature {
    generics: Vec<(String, Id)>,
    args: Vec<Arg>,
    ret: Option<(String, Id)>,
    vis: Visibility,
}

struct Arg {
    var_name: String,
    var_id: Id,
    ty_name: String,
    ty_id: Id,
}

enum Visibility {
    Pub,
    PubRestricted(String, DefId),
    None,
}
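To make the straw-man concrete, here is one way such signature data could be rendered back into the string a doc page would display. The types are restated in simplified form (dropping the ids on Arg and the PubRestricted case); this is a sketch of the concept, not a proposed implementation:

```rust
type Id = u32; // stand-in for a real def id

struct FnSignature {
    generics: Vec<(String, Id)>,
    args: Vec<Arg>,
    ret: Option<(String, Id)>,
    vis: Visibility,
}

struct Arg {
    var_name: String,
    ty_name: String,
}

enum Visibility {
    Pub,
    None,
}

// Render a function signature into the string a doc page would display.
fn render(name: &str, sig: &FnSignature) -> String {
    let mut s = String::new();
    if let Visibility::Pub = sig.vis {
        s.push_str("pub ");
    }
    s.push_str("fn ");
    s.push_str(name);
    if !sig.generics.is_empty() {
        let gs: Vec<&str> = sig.generics.iter().map(|(g, _)| g.as_str()).collect();
        s.push_str(&format!("<{}>", gs.join(", ")));
    }
    let args: Vec<String> = sig
        .args
        .iter()
        .map(|a| format!("{}: {}", a.var_name, a.ty_name))
        .collect();
    s.push_str(&format!("({})", args.join(", ")));
    if let Some((ty, _)) = &sig.ret {
        s.push_str(&format!(" -> {}", ty));
    }
    s
}
```

The ids that accompany each name are what would let the frontend turn every type mention into a link.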

The big missing piece is ‘logical children’, for example taking into account deref coercions or impls which are not straightforward (e.g., when doc’ing Ty, impl ... for &Ty or blanket impls). I’m not even sure what this should look like. But, I don’t think there is anything super-hard here, and we probably want something very similar for compiler-powered autocomplete.

Proof of concept

rustw has a ‘summary’ view which is a bit more source-oriented than rustdoc, but is very similar in concept; it demonstrates that this approach basically works. It is probably not encapsulated enough from the rest of rustw to be a foundation for Rustdoc, though.

Handlebars version: https://github.com/nrc/rustw/blob/master/templates/summary.handlebars

React version: https://github.com/nrc/rustw/blob/react/static/summary.js#L93 (WIP)

Rationale

The RLS is used by IDEs and is likely to be continually developed. While the underlying architecture might change (in particular to take advantage of incremental compilation) it is unlikely to disappear or for the API provided by rls-analysis to change dramatically. The level of abstraction feels right for rustdoc - it abstracts away a lot of the low-level detail from the compiler’s data structures, but doesn’t lose any info that we might need for Rustdoc.

Rustdoc only talks to the compiler via a data dump and some helper libraries. There is no need to be linked to the compiler directly, nor do you have to worry about versioning (too much). Most developers only need to understand the API of rls-analysis which is fairly small and straightforward. For some debugging, reading JSON is required. But it should be very rare to have to add features to the compiler. There is no unstable code, and no need to be in the same repo as the compiler, or even to be a submodule or whatever (modulo distribution issues).

Using the compiler directly is a bad idea:

  • the data structures are very low-level and need a lot of plumbing. You would be mostly duplicating code in librustc_save_analysis to do this.
  • the compiler’s API is not stable - breaking changes are expected and these would break rustdoc. Fixing these would require knowledge of compiler internals. The only way to not get such breakage is to keep Rustdoc in-tree.
  • you have to build against the compiler which means long build times
  • rustdoc would have to be part of the rustup distribution route and could only be installed this way. Using the RLS, it could be installed with Cargo, or built and used from source with any compiler.
  • you have to build the project you want to document; this makes projects like rustdoc.org or users building their own std lib docs less convenient.

#4

This all makes sense, except for the part where users need to run RLS and a web server to read documentation. That seems rather undesirable compared to the static HTML pages of today. Among many other things, it significantly increases the cost of hosting docs as rust-lang.org and docs.rs and a still-nontrivial number of github pages do. Furthermore making it a single page app makes it more difficult to scrape the HTML (let alone redo the search), e.g. for integration into something like devdocs. The only potential benefit I can see is sharing code with RLS for the search, but that seems a relatively minor thing compared to not having to deal with compiler internal data structures.

Naively, I’d expect that rustdoc would remain a command line tool that uses RLS to query everything it needs and still generates HTML and a compressed search index.
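The static model described here would walk the defs once, emit HTML, and write out a search index; the index itself is just a name-to-pages mapping, sketched below with illustrative names (a real rustdoc index also stores kinds, parents, and descriptions, and is compressed):

```rust
use std::collections::BTreeMap;

// Minimal stand-in for one documented item.
struct Item {
    name: String,
    path: String, // e.g. "mycrate/struct.Foo.html"
}

// Build a compact search index: item name -> list of page paths.
fn build_index(items: &[Item]) -> BTreeMap<&str, Vec<&str>> {
    let mut index: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for it in items {
        index.entry(it.name.as_str()).or_default().push(it.path.as_str());
    }
    index
}
```

Such an index can be serialized once at build time and shipped alongside the static pages, which is what keeps hosting costs at docs.rs-scale low.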


#5

I think it’s even more important because a ‘save-analysis’-style interface would be much more convenient for basically any tool except an IDE, so we should never drop it. Regarding the exact interface, I think it makes sense to look at Google’s Kythe: https://kythe.io/docs/schema/ which seems to solve similar problems (I have not used it myself at all, though).


#6

rustdoc totally deserves some love :slight_smile:

A RLS based rustdoc could probably be both:

  • An interactive service that you can query to make it render information as needed
    • that also has a “watch” mode to react to code changes immediately
    • which has a lot of code in common with the IDE feature showing docs for an item (or even more fancy stuff)
  • and a “static” version that queries all available data to render everything it can
    • just like rustdoc does today
    • incl. generating a search index

If you want to go all the way to Rustdoctopia, it might also be possible to compile rustdoc to WebAssembly and export the RLS data for a crate as JSON, so you get a browser-based rustdoc that queries a document store.


#7

Yeah, there is certainly a tradeoff here between ease of development and contribution, and ease of hosting. I think that from a user perspective, they’d just run a rustdoc command which would start up a local server, so it should be just as easy, if not easier, than the current model. I agree it makes life harder for things like docs.rs. On the other hand, having a running server is so standard in the web world nowadays that it might not be such a bad thing. I believe the indexability of single-page web apps is a solved problem.


#8

Here are my thoughts on the plan going forward for the RLS, in particular the interface with the compiler. This is mostly the result of the compiler meeting last week and related discussions.

motivations

The motivations for change, as I understand them:

  • need to cope with incremental compilation,
  • there is friction in save-analysis when changing the compiler, in particular with work towards incremental compilation,
  • there is a feeling that we can ‘do better’ than the current setup.

long-term plan

This is the long-term plan for the RLS and compiler integration (i.e., implemented after incremental compilation). More immediate action items are below.

compilation model

  • the compiler will continue to run in a batch mode, rather than as a long-running service
  • incremental compilation data should be readable from and writable to memory as well as disk
  • the RLS will continue to run Cargo and rustc and orchestrate builds
  • the RLS will manage incremental data (similar to how it currently manages open files) and pass it to/from rustc
  • when running as a library (as in the RLS), after compilation, the client can query the compiler via a new API to get information about a program from the current compile
  • if there are no changes to files, then starting the compiler and querying it should be very fast
    • the RLS could inform the compiler that there are no changes so it doesn’t even need to read files from disk to check
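The ‘readable from and writable to memory as well as disk’ point could be expressed as a small storage abstraction that the RLS hands to the compiler. This is purely a sketch with invented names, not rustc’s actual incremental-compilation interface:

```rust
use std::collections::HashMap;

// Hypothetical interface for incremental compilation data: the compiler
// reads and writes opaque blobs keyed by name, without caring whether
// they live in memory or on disk.
trait IncrData {
    fn load(&self, key: &str) -> Option<Vec<u8>>;
    fn store(&mut self, key: &str, bytes: Vec<u8>);
}

// In-memory store the RLS could keep between builds and pass to rustc,
// analogous to how it currently manages open files.
struct MemStore {
    map: HashMap<String, Vec<u8>>,
}

impl IncrData for MemStore {
    fn load(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn store(&mut self, key: &str, bytes: Vec<u8>) {
        self.map.insert(key.to_string(), bytes);
    }
}
```

A disk-backed implementation of the same trait would cover today’s batch-mode behaviour, so the compiler code need not know which mode it is running in.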

API

We didn’t really discuss the API the compiler will expose. There has been a lot of talk about queries, but we’re basically just talking about a regular API.

I expect the API will be roughly similar to that presented by the rls-analysis crate (at least the informational parts, see https://github.com/nrc/rls-analysis/blob/master/src/lib.rs#L253-L493). I expect the compiler API might be a bit lower level, with rls-analysis doing some post-processing. The expectation is that clients would not have to store any data, do cross-referencing, or make multiple queries for typical IDE functionality. However, it might take a few calls to get the required data. For a hover, for example, we might call one function to go from a span to the id of the definition, a second to go from that id to something like a Def which includes the type (by id), and a third to get a string representation of the type. ‘Find all refs’ would be similar: it wouldn’t need O(n) calls.
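The hover example can be sketched as three lookups against hypothetical tables (spans and ids reduced to integers; none of these are the real compiler or rls-analysis APIs):

```rust
use std::collections::HashMap;

type Span = u32; // stand-in for a real source span
type Id = u32;

struct DefInfo {
    type_id: Id,
}

// Hypothetical query API exposed by the compiler after a build.
struct Api {
    span_to_def: HashMap<Span, Id>,   // call 1: span -> definition id
    defs: HashMap<Id, DefInfo>,       // call 2: id -> def (type by id)
    type_names: HashMap<Id, String>,  // call 3: type id -> display string
}

impl Api {
    // The three calls an IDE would chain to answer a hover request.
    fn hover(&self, span: Span) -> Option<&str> {
        let def_id = self.span_to_def.get(&span)?;
        let def = self.defs.get(def_id)?;
        self.type_names.get(&def.type_id).map(|s| s.as_str())
    }
}
```

The point is that each call is a constant-time lookup, so chaining two or three of them is cheap; it is only per-reference iteration (O(n) calls) that the API must avoid.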

Some areas which need improvement to make this work:

  • spans
  • Ids (probably in the RLS, rather than the compiler)
  • metadata - need detailed info about function bodies of dependent crates

The rls-analysis crate would end up being mostly a thin wrapper around the compiler’s API and could perhaps disappear altogether.

It’s not clear whether a new API should be fresh or an evolution of save-analysis, or whether there is a place for both a new API and save-analysis (I suspect not, in the long term).

short-term

Things we could do now, or in parallel with the incremental compilation work.

new APIs

  • Compiler backed code completion
  • Auto-imports

Would be useful to experiment with now. Not sure where best to put this stuff - save-analysis or somewhere new.

compilation model

We would need to add a way to make callbacks after compilation; this should fit easily into the compiler driver API. I think the other changes could be done sooner rather than later, but I’m not sure it is worth prioritising.

porting APIs

I think it might be useful to start experimenting with the new API. I’d be keen to get a sense of how well something like ‘find all refs’ will work in the new system.

changes to save-analysis

librustc_save_analysis has a bunch of technical debt, mostly in the form of code which is not really used. We could reduce the surface area of save_analysis by removing stuff which is not used by the RLS or might potentially be used by Rustdoc. That should hopefully reduce some of the friction when making changes to the rest of the compiler.

Outstanding questions

  • need to profile to ensure that ‘find all refs’ and other operations which require cross-referencing can be done fast enough without pre-cross-referencing data.
  • are we happy to expand crate metadata so that the compiler has info about function bodies of dep crates? What work needs to be done there?
  • querying while compilation is in progress
    • will compilation be fast enough that we don’t care?
    • if not, will we provide a way to query old data while a new compilation is running? Seems like this could be done by running multiple compiler sessions with slightly different inputs, managed by the RLS.
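One plausible shape for ‘query old data while a new compilation is running’ is double-buffering: readers clone an Arc to the last completed analysis, and the RLS swaps in the fresh one when a build finishes. A minimal sketch with invented names, not the RLS’s actual mechanism:

```rust
use std::sync::{Arc, RwLock};

// The read-only result of one completed compilation.
struct Analysis {
    generation: u64,
}

// Holds the latest completed analysis; queries never block on a build.
struct AnalysisSlot {
    current: RwLock<Arc<Analysis>>,
}

impl AnalysisSlot {
    fn new(initial: Analysis) -> AnalysisSlot {
        AnalysisSlot { current: RwLock::new(Arc::new(initial)) }
    }

    // Readers clone the Arc and keep using their snapshot even if a
    // swap happens mid-query.
    fn snapshot(&self) -> Arc<Analysis> {
        self.current.read().unwrap().clone()
    }

    // Called by the build orchestrator when a new compilation finishes.
    fn publish(&self, fresh: Analysis) {
        *self.current.write().unwrap() = Arc::new(fresh);
    }
}
```

This matches the suggestion above of running compiler sessions with slightly different inputs: the RLS owns the slot, each finished session publishes into it, and in-flight queries are unaffected.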

#9

Another thing that occurred to me is what to do about the std libs (and closed-source libs). Save-analysis currently supports API-only dumps of data that are built on CI and can be installed by rustup. Under the model I proposed above that would not be necessary. However, I think it would mean crate metadata, and thus the Rust distro, would get a little larger.