2019 Strategy for Rustc and the RLS


#61

So you’re planning to let libsyntax2 be just a code analyzer for a while, rather than yet another frontend for rustc?

In some respects I agree with your ‘another language server’ approach and really respect your work, but does it really need to be a part of libsyntax2? I find it confusing that libsyntax2 has libeditor (<=> rls-vfs), libanalysis (<=> rls-analysis), … and other IDE-related crates. I know libsyntax2 is still in its early stage, but I really want these crates separated from libsyntax2, to make it easy to integrate with the RLS or other crates.


#62

Yep. Specifically, I’d like for libsyntax2-thing to be able to handle at least macro expansion and item resolution before merging with rustc. Macros seem to be a hard problem worth solving in isolation (some thoughts)

but does it really need to be a part of libsyntax2?

So, I think this is mostly a naming issue. I sometimes refer to hypothetical Rust Code Analyzer as libsyntax2, because the nascent code analyzer lives in a repository with libsyntax2 name, but, really, it’s slightly more at this point. Specifically, the repo currently contains (and I think that’s a good split long-term)

  • libsyntax2 – this is basically a pure function &str -> SyntaxTree, where SyntaxTree does not store any information besides the syntax itself. There’s hope that all other tools could reuse it by composition, by attaching all additional info to syntax nodes via side-tables.
  • libeditor – this is very much not rls-vfs. libeditor is all the IDE stuff you can do given only a single file (so, things like extend selection, matching braces, simple syntax-based intentions, scope-based completion). For rustc, there’s little difference between a crate that is a bunch of files and a crate that is a single file with all modules inline. However, for IDEs there’s a huge leap of complexity between “here’s a single file’s text” and “here are two files which exist in the file system and somehow relate to each other”, and that’s why I think it’s worthwhile to mark this boundary very explicitly. libeditor could be used to write a no-frills editor plugin which provides some basic functionality but does not care about the build system at all, to implement CTAGS-like indexing with a real parser instead of regexes, or (my favorite) to make a WASM plugin for the Rust playground, so that you get client-side syntax checking and basic completion, outline, and goto-def.
  • libanalysis – this is the main Code Analyzer, which cares about many files, and about files being modified. It contains that WorldState / WorldSnapshot thing. It doesn’t do any IO, though; you have to explicitly teach it about existing files.
  • server – this is a mostly thin wrapper around libanalysis, which translates its internal data-structures and API to LSP.
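The layering above could be sketched roughly like this: a pure parsing function plus side tables for everything else. This is an illustrative sketch only, assuming a minimal node shape; none of these names (`SyntaxNode`, `SideTable`, the range-keyed map) are the actual libsyntax2 API.

```rust
use std::collections::HashMap;

/// A syntax node storing only the syntax itself: kind, text range, children.
#[derive(Debug, PartialEq)]
struct SyntaxNode {
    kind: &'static str,
    range: (usize, usize),
    children: Vec<SyntaxNode>,
}

/// libsyntax2 as a pure function: &str -> SyntaxTree.
/// Real parsing is elided; we just return a root node covering the input.
fn parse(text: &str) -> SyntaxNode {
    SyntaxNode { kind: "SOURCE_FILE", range: (0, text.len()), children: Vec::new() }
}

/// Other tools attach extra information (types, resolutions, ...) via
/// side tables keyed by node, instead of storing it inside the tree.
struct SideTable<T> {
    map: HashMap<(usize, usize), T>,
}

impl<T> SideTable<T> {
    fn new() -> Self {
        SideTable { map: HashMap::new() }
    }
    fn insert(&mut self, node: &SyntaxNode, value: T) {
        self.map.insert(node.range, value);
    }
    fn get(&self, node: &SyntaxNode) -> Option<&T> {
        self.map.get(&node.range)
    }
}
```

The point of this shape is that the tree stays reusable by composition: a type checker, a highlighter, and an indexer can each keep their own side table without the tree knowing about any of them.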

These crates are already pretty separable, and can be used independently. However, I very much prefer a monorepo + workspaces setup to coordinated changes across repositories, so that’s why everything (including the VS Code plugin) lives in the single repository. Rust “crates are a DAG” restriction makes it difficult to introduce accidental dependencies between components, so additional splitting across repositories is not required.


#63

To elaborate on this more, one of IntelliJ Rust’s architectural problems is that it uses a single API for IDE-frontend stuff and compiler-backend stuff. For example, the “resolve” function which is called when you invoke “go to definition” on an element is the same “resolve” function that is invoked recursively during the name-resolution process, and that’s problematic because its interface needs to fit both the IDE’s and the compiler’s needs.

I now think it’s better to create two separate functions, resolve and resolve_internal, and write some glue code between them. Even if at first the glue looks silly (just forwarding args), in the long term I think there’s a win.
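A minimal sketch of that split might look like this. The names `resolve` and `resolve_internal` follow the comment above; everything else (`Def`, the scope representation) is invented for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Def {
    name: String,
}

/// Compiler-facing resolution: tuned for recursive use during name
/// resolution, free to take compiler-internal context and return
/// compiler-internal data.
fn resolve_internal(name: &str, scope: &[Def]) -> Option<Def> {
    scope.iter().find(|d| d.name == name).cloned()
}

/// IDE-facing resolution (e.g. "go to definition"): at first just a thin
/// glue layer forwarding to the internal function, but free to diverge
/// later (caching, cancellation, error tolerance) without touching the
/// compiler path.
fn resolve(name: &str, scope: &[Def]) -> Option<Def> {
    resolve_internal(name, scope)
}
```

The glue is trivially a forwarder today; the win is that the two call sites no longer constrain each other's signatures.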


#64

Sure. If you can generate HIR deltas (i.e., handle macro expansion and resolution), I think it would be easier to adapt rustc queries to be “online” than to rewrite them - I think all the information is already available.

That depends on how much of an API separation we want between the IDE and the compiler. If a good part of IDE support ends up in-tree (and doing accurate analysis out-of-tree feels like a PITA that will break at random compiler refactors to me), then we don’t need hard API separation.

And we probably want to have some functionality (e.g., a def-to-use mapping) tightly tied to the compiler, while some other functionality to be less so.


#65

Could you elaborate on this point? I’m not sure I understand what you are saying here. Thanks.


#66

To clarify, the end-goal is absolutely to move everything in-tree as soon as compiler and code analyzer start actually sharing code.

Sure, that’s true, but some soft separation might be nice from a software-engineering perspective. Specifically, it would be nice to have some kind of facade module, so that IDE developers need only read the facade to answer questions like “How do I get the type of an expression? How do I resolve this reference to a definition?”, instead of referring to the specific part of the compiler that does the corresponding analysis. In Kotlin, such a facade is provided by BindingContext and the descriptors hierarchy.
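Such a facade could be sketched as a single trait that IDE code programs against, with the dispatch to specific compiler subsystems hidden behind it. All names here (`Analysis`, `ExprId`, the stub impl) are hypothetical, invented only to show the shape of the idea; they are not rustc or rust-analyzer APIs.

```rust
struct ExprId(u32);

#[derive(Debug, PartialEq)]
struct Type(String);

#[derive(Debug, PartialEq)]
struct Definition(String);

/// The facade: the handful of questions an IDE needs answered, in one place.
trait Analysis {
    fn type_of(&self, expr: ExprId) -> Option<Type>;
    fn resolve(&self, expr: ExprId) -> Option<Definition>;
}

/// Behind the facade, each call is dispatched to the relevant compiler
/// subsystem (type checker, name resolution); those subsystems can be
/// reorganized freely without breaking IDE code.
struct CompilerAnalysis;

impl Analysis for CompilerAnalysis {
    fn type_of(&self, _expr: ExprId) -> Option<Type> {
        // Stub: would invoke the type checker.
        Some(Type("i32".to_string()))
    }
    fn resolve(&self, _expr: ExprId) -> Option<Definition> {
        // Stub: would invoke name resolution.
        Some(Definition("std::vec::Vec".to_string()))
    }
}
```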


#67

The compiler handles the expanded, resolved source as what the compiler calls “HIR”. HIR handling is incremental and on-demand, while HIR generation is a batch process, so you’ll want to handle HIR generation yourself.


#68

Oh, for some reason I thought that macro expansion happens before AST -> HIR. I thought the process was: Parse -> AST -> Macro Expansion -> AST -> HIR -> MIR -> LLVM IR


#69

That’s exactly the process. The problem is that expansion & resolution are currently a batch process, so any IDE will need to handle them by itself.

However, if the IDE can handle them, it can probably generate HIR, which type-checking can then handle by itself fairly incrementally.


#70

We do need to start talking about the Rust 2019 roadmap, yes, and I agree that there is intersection. How we’ve proceeded in the past is:

  • collate data from the survey; this is a starting point
  • put out a call for blog posts
  • read those blog posts :slight_smile: and try to summarize major themes

I think this worked fairly well but we’re always tinkering.

To a certain extent, my goal with this thread is to try and figure out what I think we ought to have on the roadmap from a compiler point-of-view. To that end I also think we should potentially be looking more than a year out – given how long these engineering projects can take, I think we should set some ambitious goals and try to figure out incremental steps we can take towards them.

I think another theme here is that one big push that we’ve had for the last few years is starting to “finish up”. That is, we set out with a goal to do a lot of ergonomic and productivity improvements in the language. I don’t by any means think we are “done” with that – there is lots of room to continue improving – but a number of the projects from that initial round (e.g., modules, NLL, etc) are coming to a close, and I want to think about where we should be headed.


#71

So, I was thinking about this. Looking beyond the RLS + incremental for a second, there are a number of proposed ideas that I think could have a significant effect on overall compilation time. Many of them will benefit one another, as well:

  • End-to-end queries
    • Right now, parsing + macro-expansion + name-resolution are not part of the query system and incremental can’t do anything with them
    • Creating the incremental infrastructure from the start would let us start to change that
    • This would be very useful for RLS integration of queries, I think
    • However, this “session” code is grungy and old and I think this is probably non-trivial. We sketched out some plans at Rust All Hands, I’d have to go back to our notes, but I remember the conclusion was “this will be work”.
    • This is also one place where the libsyntax2 story definitely intersects
  • Parallelize the compiler (tracking issue)
    • @Zoxc did awesome work enabling us to parallelize the compiler
    • But the last mile isn’t done! There were various refactorings and architectural changes we wanted to make as well.
    • It is not integrated with incremental, either
    • We’ve not done much measuring of perf
  • Multicrate sessions
    • The “full vision” for queries is that we should be able to compile many crates at once, as I talked about earlier
    • This might allow us to skip a lot of work
  • MIR-only rlibs (tracking issue)
    • Right now, our rlibs contain actual executable code
    • The idea here is that we just store MIR in the rlibs so that we can do a better job reducing duplicates and so forth
  • Erasing types where possible
    • We’ve often discussed (most recently here) the idea of erasing types (to reduce monomorphization overhead) when we can tell it’s safe. This is a big job but a worthwhile one.
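The “query” idea behind the first bullet can be illustrated with a toy sketch: computations are memoized functions keyed by their inputs, so bringing parsing, expansion, and resolution into the query system would let unchanged inputs skip recomputation. This is only the memoization core under invented names; the real rustc query system additionally tracks dependencies and revisions.

```rust
use std::collections::HashMap;

/// A toy query cache: maps a file name to its "parse result"
/// (token count stands in for a real syntax tree).
struct QueryCache {
    parse_results: HashMap<String, usize>,
    /// How many times we actually did the work, for demonstration.
    computations: usize,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache { parse_results: HashMap::new(), computations: 0 }
    }

    /// The "parse" query: recomputed only on a cache miss, so repeated
    /// requests for an unchanged file are free.
    fn parse(&mut self, file: &str, text: &str) -> usize {
        if let Some(&cached) = self.parse_results.get(file) {
            return cached;
        }
        self.computations += 1;
        let tokens = text.split_whitespace().count();
        self.parse_results.insert(file.to_string(), tokens);
        tokens
    }
}
```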

#72

This would not only provide a speedup, right? It would reduce binary size as well.

Sorry, I’m very much new here. What are rlibs?


#73

rlib is what you usually get when compiling a Rust crate as a library. It contains compiled functions, but also things like macro definitions (which you can’t compile if you don’t know the invocation) and generic functions (you can type-check a generic function in isolation, but, to generate actual executable code, you need to know the precise types which a downstream crate will use).

This is in contrast to C++, whose object files contain only executable code, but which requires you to put macros and templates in header files.
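The generic-function point can be seen in a two-line example (the function name is made up for illustration):

```rust
/// This body type-checks in isolation: `T: Clone` is all it needs to know,
/// so an rlib can ship it (as metadata/MIR) after checking it once.
fn duplicate<T: Clone>(x: T) -> (T, T) {
    (x.clone(), x)
}
// Executable code, however, exists only per concrete type: each distinct
// instantiation a downstream crate uses (duplicate::<u32>, duplicate::<&str>,
// ...) forces a separate machine-code copy to be generated, which is why the
// rlib can't contain finished object code for `duplicate` itself.
```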


#74

Yes, potentially. We do run LLVM mergefunc now, I think, which can catch a lot of these cases too, but I’m not sure if it can catch them all.


#75

@eddyb does Lykenware’s GLL parser generator you are talking about here fit into the libsyntax2 / RLS 2.0 story?


#76

I hope we can replace the libsyntax parser with one generated from a CFG, with additional disambiguation rules on top, especially if we can easily make that incremental.

It could overlap or conflict with libsyntax2 plans. RLS would likely benefit from it, and could feed back information into it, about the delta from the last successful parse (improving error recovery).

We weren’t even planning to work on a Rust grammar for a longer while! But there seems to be some interest, and the benefits appear almost too good to be true.

We’ll have to see if performance is competitive with the hand-written parser, especially with the cost of additional features for error reporting and recovery.
I expect you’ll hear more within the coming months, especially after the Rust 2018 edition release.


Proposal: Grammar Working Group
#77

That’s something I’m especially interested in if some parser auto-generated from grammar replaces current libsyntax.
With manually written recursive descent you can easily implement basically any heuristics to report useful diagnostics.


#78

Yes, that’s an explicit goal of ours.

The GLL algorithm gives you a “formally correct” answer to “is there any way to parse this input as this grammar rule?”.

At the same time, we want to keep around enough information for any RD-style heuristic to be run, including the ability to feed recovery possibilities back into an incremental update system (as if the user had fixed the syntax in the input themselves).
You can read about some of that in the issue linked in my previous comment.

One advantage of using GLL for this is that you’re no longer limited to one execution trace as with hand-written RD, and can explore multiple possibilities “at the same time”.


#79

It’s unlikely to conflict, I think :slight_smile: What I care about is the concrete syntax tree API; the actual parsing algorithm is more or less irrelevant, as long as it can deal with incomplete input. Moreover, in the current libsyntax2 codebase the parser and parse-tree construction are nicely separated, so it shouldn’t be too hard to use another parser with the current syntax tree API, or, vice versa, to reuse the parser to produce a different tree. The implementation of the syntax tree is swappable as well; the API is hopefully not leaky.

I’ll be more than happy to replace the hand-written parser with something generated, if it works great. The reason why libsyntax2 uses a hand-written parser at the moment is just that it was easier to implement. Like, I wrote my own parser generator, and fiddled quite a bit with LALRPOP, and wrote a blog post about how an ideal parser generator should look (REPL-ish features and inline tests!!!), and in the end just wrote something that works okay-ish by hand :slight_smile:
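The parser / tree-construction separation could be sketched as the parser driving a sink trait, so either side can be swapped independently. The names here (`TreeSink`, `SExprSink`, the toy parse function) are invented for illustration, not the actual libsyntax2 interfaces.

```rust
/// The parser talks only to this trait; any tree representation
/// (or none at all) can implement it.
trait TreeSink {
    fn start_node(&mut self, kind: &'static str);
    fn token(&mut self, text: &str);
    fn finish_node(&mut self);
}

/// A toy "parser" driving a sink; real parsing logic is elided.
fn parse_fn_header(sink: &mut impl TreeSink) {
    sink.start_node("FN_DEF");
    sink.token("fn");
    sink.token("main");
    sink.finish_node();
}

/// One possible sink: renders the events as an S-expression string.
/// A rowan-style green-tree builder would be another implementor.
#[derive(Default)]
struct SExprSink {
    out: String,
}

impl TreeSink for SExprSink {
    fn start_node(&mut self, kind: &'static str) {
        self.out.push('(');
        self.out.push_str(kind);
    }
    fn token(&mut self, text: &str) {
        self.out.push(' ');
        self.out.push_str(text);
    }
    fn finish_node(&mut self) {
        self.out.push(')');
    }
}
```

With this split, swapping in a generated parser means only producing the same event stream; the tree API never notices.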

One non-obvious thing in this area I’d like to point out is that “being incremental” probably shouldn’t be the primary focus of a parser, for two reasons:

  • The vast majority of files are read-only.
  • Parsers are fast enough. Reparsing parser.rs (which is the largest non-generated Rust file I know about) completely from scratch takes 20 milliseconds using the current libsyntax2 parser (which allocates all over the place). I think that’s fast enough even for interactions which block the UI (for example, handling the Enter key to auto-indent correctly), and is totally fine for async things like syntax highlighting. Adding super-simple incrementality (reuse tokens and don’t reparse outside balanced {}) should push the timings below 10 ms for the vast majority of edits, I think.
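The “don’t reparse outside balanced {}” trick from the last bullet can be sketched as follows: find the innermost balanced block containing the edit and reparse only that span. This is a toy under simplifying assumptions (braces inside strings and comments are ignored, and real incrementality would also reuse tokens); the function name is made up.

```rust
/// Returns the byte range of the innermost balanced `{ ... }` block
/// containing `edit_offset`, or the whole file if the edit is outside
/// any block. Only that range would need to be reparsed after an edit.
fn reparse_range(text: &str, edit_offset: usize) -> (usize, usize) {
    let mut stack = Vec::new();          // offsets of unmatched '{'
    let mut best = (0, text.len());      // fall back to the whole file
    for (i, ch) in text.char_indices() {
        match ch {
            '{' => stack.push(i),
            '}' => {
                if let Some(start) = stack.pop() {
                    // This block spans start..=i; keep the smallest one
                    // that still contains the edit position.
                    if start <= edit_offset
                        && edit_offset <= i
                        && i - start < best.1 - best.0
                    {
                        best = (start, i + 1);
                    }
                }
            }
            _ => {}
        }
    }
    best
}
```

For a typical edit inside a function body, this confines reparsing to that one block, which is how the sub-10 ms estimate above becomes plausible even without a fully incremental parser.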

#80

I wonder if that means we should fast-track file-level incremental reparsing - i.e. avoid reparsing unchanged files (we already hash the whole file) in rustc’s existing incremental setup.

cc @michaelwoerister