2019 Strategy for Rustc and the RLS


#21

Though I don’t think racer is very easy to maintain, this comparison is not fair.

Racer’s last commit is 5 years ago.


#22

I don’t think so. The fact that we extract parts of the compiler as libraries does not mean that we need to stabilize them. For example, Cargo is published to crates.io as a library, but it has a 0.x version and is explicitly unstable. We do have convos like “- Please provide better docs for APIs!; - APIs are unstable, you are on your own” once in a while on the issue tracker, but that’s it.

Of course, there’s a certain risk that making parts of the compiler usable will lead to folks using them in their tools, and thus to a de facto stabilization. I think the risks here are low: if someone writes a tool, that’s great, but they’ll just have to keep up with upstream’s pace of development themselves.


#23

Maaaybe? For Code Analyzer, I’d rather just abstract proc macro interface behind a trait: applying TokenStream -> TokenStream transformations is in scope of CA, but creating such a transformation by compiling Rust code is def out of scope.
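A minimal sketch of what hiding proc macros behind a trait could look like. Every name here (`TokenStream` as a plain token list, `ProcMacroExpander`, `AppendTokens`, `Unavailable`, `expand_or_keep`) is hypothetical and chosen for illustration; none of it is an existing rustc or `proc_macro` API.

```rust
// Hypothetical sketch: these names are illustrative, not rustc or proc_macro APIs.

/// Stand-in for a real token stream: just a list of token strings.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenStream(pub Vec<String>);

/// All the analyzer needs to know about a proc macro:
/// it maps one token stream to another, or fails.
pub trait ProcMacroExpander {
    fn expand(&self, input: &TokenStream) -> Result<TokenStream, String>;
}

/// Toy expander that appends fixed tokens, standing in for a real derive.
pub struct AppendTokens(pub Vec<String>);

impl ProcMacroExpander for AppendTokens {
    fn expand(&self, input: &TokenStream) -> Result<TokenStream, String> {
        let mut tokens = input.0.clone();
        tokens.extend(self.0.iter().cloned());
        Ok(TokenStream(tokens))
    }
}

/// Expander that always fails, e.g. because the macro could not be loaded.
pub struct Unavailable;

impl ProcMacroExpander for Unavailable {
    fn expand(&self, _input: &TokenStream) -> Result<TokenStream, String> {
        Err("proc macro not available".to_string())
    }
}

/// Analyzer-side code depends only on the trait.
/// On failure it keeps the unexpanded tokens so analysis can continue.
pub fn expand_or_keep(expander: &dyn ProcMacroExpander, input: &TokenStream) -> TokenStream {
    expander.expand(input).unwrap_or_else(|_| input.clone())
}
```

With this shape, how an expander gets created (compiling Rust code, talking to another process, or a test double like the ones above) stays outside the analyzer.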

As for the impl, I suspect the mid-term solution would be to use a separate native process for proc macros plus an IPC mechanism? (Loading proc macros into the process is OK for a batch compiler, but won’t work in CA due to reliability and perf requirements.)


#24

I don’t think so? The feature freeze will be required only when merging two strands of development, but that point seems to be far off. Optimistically, I’d say two years is the minimal time required to create an IDE foundation; it’s not something we can really reasonably spike. I don’t think IDE work should affect language planning in a principal way until we have at least the basics working.


#25

I think stabilization and specification efforts have to be explicitly out of scope to actually ship something usable.


#26

I’m pretty excited about the libsyntax2 work. I see a lot of potential to improve features in IDEs and help the compiler towards very incremental compilation. It would be great to improve Racer with it, for one thing.

However, my impression is that libsyntax2 allows us to do more, but doesn’t help that much with the type-driven actions which the RLS currently provides. Or rather it would let us do something similar and faster, but not as accurate (without implementing all of type checking there).

The perfect end-state I envision is to have a single rust-analyzer library, which is used by both command-line compiler as well as the language server

This is pretty much my ideal end-state too. And I also worry about how to get there. In particular, the worry I expressed to Niko is that we use it in the RLS, but then there aren’t the resources to use it in the compiler and we have much increased maintenance costs for the RLS.

The rust-analyzer idea is basically moving 80-90% of the compiler into a library. Which I think is a great idea, I’m a big fan of moving stuff out of the repo. However, there are issues with doing so (e.g., linking std versions to compiler versions) and realistically, from the RLS perspective, we won’t get anything new, it’s just an implementation detail.

The second reason for this is that working on RLS itself is super hard. …

This is a bit of an exaggeration: if you pull and use that day’s nightly it will work 8 times out of 10. It’s been worse recently because of the increased rate of changes for the edition. However, I totally agree it would be better if this didn’t happen.

Half of the commits to RLS is “update X”, and there’s also a busy-work process of updating RLS in rust-lang/rust.

This is about right and it is extremely frustrating. I think though that any solution in this space will have these problems. There’s either API compat issues or you have the harder problem of keeping up with language changes without any compiler errors. I believe we can do much better by improving the tooling here - either using a mono-repo or some kind of structured integration of changes to Rust components.

As an example, I’ve implemented “run the test under cursor” feature both …

This isn’t really a fair test. It is something that libsyntax2 should be good at, but which isn’t a goal for the RLS design.

The tests in the RLS are bad, but I think rather than being a reason to throw it all out and start again, we should just write more (and better) tests.

The fact that libsyntax2 and RLS repositories have approximately the same number of Rust source code lines, if you exclude tests and generated code, also hints that adding a line of code to RLS is very expensive

I’m not sure that is a useful metric? If you really want to compare lines of code you should include half the compiler and the RLS support crates. I won’t deny that it is relatively difficult to add features to the RLS, but I expect that two years down the line when libsyntax2 is more complete it will be a lot more expensive to add code there too.

I thought that RLS was fairly close to being a “server API” around Racer, hence its functionality is orthogonal to libsyntax2, which is a Rust frontend.

This is pretty true, and important.

Capitalize on and improve upon existing Racer, by incrementally substituting its heuristics with precise algorithms from Code Analyzer.

This indeed sounds like a good idea. We’ve also thought about the RLS giving type information to Racer so that it can complete a lot more. It might be useful to do the same for your code analyzer?

Capitalize on existing save-analysis infrastructure by using it for dependencies.

The hard part of dealing with deps is that you have to model the Cargo project. Hopefully some Cargo refactoring will make this much easier, but you still have to deal with some issues in any case (and this is where a lot of the RLS’s nasty concurrency issues come from).

In contrast, achieving 100% feature/bug parity with libsyntax1 will require huge effort.

ISTM that we have to address this eventually, whatever approach you take, and it will be a huge effort.

Some people think that this would be fixed by incrementally improving RLS

I’m strongly in this camp. While most of the points you make are true, none of them are really about the foundation of the RLS and I think they can mostly be addressed without throwing everything away. It is a well-known truism in software engineering that you can have a clean prototype, but as you add features you inevitably end up with less clean code. We’ll surely have a bunch of problems with a clean start too, we’ll just have wasted a year or so before we get to that point.

The second big RLS problem is its “shared mutable state is fine” handling of concurrency

We can do much better. Fundamentally the concurrency issues are difficult due to wanting to be responsive to the user and still be as accurate as possible. Using snapshots or something like that might be a reasonable improvement.

I think it very much depends on what you want to do as to whether repeatable read is important. AFAIK, nothing in the RLS currently needs it? We might deliver slightly out of date data, but that is usually not a problem and if the user is typing, then it’s the best you can do.
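One way the snapshot idea could be sketched, purely as an illustration and not as how the RLS works: readers grab an immutable, reference-counted view of the world and keep using it (which gives repeatable reads for the duration of a request), while edits publish a new view. `Snapshot`, `AnalysisHost`, and their methods are hypothetical names, and copying the whole file map on each edit is a simplification.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Immutable view of all file contents at one point in time (hypothetical type).
#[derive(Debug, Default, Clone)]
pub struct Snapshot {
    pub files: HashMap<String, String>,
}

/// Holds the current snapshot; writers replace it, readers clone the Arc.
#[derive(Default)]
pub struct AnalysisHost {
    current: RwLock<Arc<Snapshot>>,
}

impl AnalysisHost {
    /// Cheap: bumps a reference count, does not copy the file map.
    pub fn snapshot(&self) -> Arc<Snapshot> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Builds a new snapshot with one file changed and publishes it.
    /// Snapshots handed out earlier are unaffected.
    pub fn set_file(&self, path: &str, text: &str) {
        let mut guard = self.current.write().unwrap();
        // Simplification: copies the whole map; a real design would share structure.
        let mut next = (**guard).clone();
        next.files.insert(path.to_string(), text.to_string());
        *guard = Arc::new(next);
    }
}
```

A request that started before an edit keeps seeing its own slightly stale snapshot, which matches the "slightly out of date data" trade-off described above, but it never sees a half-applied change.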

The third big RLS problem is that it is deeply integrated with the build system and runs rustc to get save-analysis data.

I don’t see this as a problem. Effectively (and this will require some work in Cargo) we will ask Cargo how to build a project and then have a very quick way to query “this file has changed, what should be rebuilt?” and then we do it. Other build systems will use Cargo as an abstraction layer. You never have to do a full build unless you want to (or, say, the Cargo.toml is changed).

If you can only guess at the project layout, I don’t think you can get accurate type info.
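To make the “this file has changed, what should be rebuilt?” query concrete, here is a rough sketch of what such a query might compute. It is not a Cargo API: the `Workspace` struct, its fields, and `crates_to_rebuild` are invented for illustration, and crates are identified by plain names for simplicity.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical workspace description: which crate owns each file,
/// and which crates each crate depends on.
pub struct Workspace {
    pub file_owner: HashMap<String, String>,
    pub deps: HashMap<String, Vec<String>>,
}

impl Workspace {
    /// Returns the crate owning `file` plus every crate that depends on it,
    /// directly or transitively. Empty if the file is unknown.
    pub fn crates_to_rebuild(&self, file: &str) -> BTreeSet<String> {
        let mut dirty = BTreeSet::new();
        let mut stack: Vec<String> = self.file_owner.get(file).cloned().into_iter().collect();
        while let Some(krate) = stack.pop() {
            if !dirty.insert(krate.clone()) {
                continue; // already visited
            }
            // Any crate listing `krate` as a dependency must be rebuilt too.
            for (dependent, its_deps) in &self.deps {
                if its_deps.contains(&krate) {
                    stack.push(dependent.clone());
                }
            }
        }
        dirty
    }
}
```

The linear scan over all crates for each visited crate is fine for a sketch; a real implementation would presumably keep a reverse-dependency index.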

a significant chunk of RLS is build orchestration, which could have been just cargo check otherwise

This is not quite right. The complexity comes from tracking dependencies in workspaces where there might be multiple current crates and having files in memory which have been changed. My hope is that this can be factored out of the RLS and into Cargo.

if someone writes a tool, that’s great, but they’ll just have to keep up with upstream’s pace of development themselves

This has not worked in the past. If a tool appears which makes the libs de facto stable, and people use that tool, then the compiler developers are in an awful bind and basically have to ensure back compat.

Maintenance of the RLS

My hope is that by factoring things out to Cargo, and relying on save-analysis and the proposed query system for the compiler, we can ensure that the RLS is a fairly thin intermediary which doesn’t need too much maintenance and that by maintaining the compiler and Cargo, the RLS is also mostly looked after.

In the short term, making incremental improvements to the compiler, Racer, and the RLS is something that can be reasonably done with the resources we have. Rewriting the compiler front-end would seem to need lots more people working for a long time.


#27

It is harder to coordinate “cross-cutting refactorings”, since you have to modify a number of components at once. (And if tooling is dependent on you, we may be reluctant to make breaking changes.)

They may be only internal libraries, which are only used by rustc.

So it’s easier to make refactorings over all, but you still have very clear interfaces.


#28

I’d just like to throw in my 2 cents here. I agree with above points (including, but not limited to):

  1. Settle on a more principled approach to concurrency
  2. Use the query/incremental system instead of re-running rustc --emit=metadata ...
  3. Facilitate tool development + repo coordination (this is frustrating, I agree)

These are real problems and I would also like to address them, however I honestly don’t like the ‘everything is broken and nothing works’ narrative because I feel like this is counter-productive here and pushes us further away from the goal, if anything.

Most of the RLS problems are growing pains. What first started as an LSP-compatible wrapper around cached rustc invocations has grown into something bigger; it now bundles other tools like Clippy or Rustfmt, reuses Cargo to run a customized cargo check, later caches + orchestrates the build itself across multiple crates, and also supports more loosely related features, such as the aforementioned Run test functionality.

The long-term planned direction sounds great and it’d be awesome if we could make it a reality next year (or the one after?). I just wanted to say that we definitely don’t need to throw out everything; we can focus on actively improving the status quo, instead. I imagine abstracting away the project model (incl. support for other build systems?), some amounts of refactoring, enhancing Racer with save-analysis data and cleaning up the RLS test suite will do wonders and it’s definitely doable now. Frankly, I think the RLS just doesn’t have the manpower to do it as quickly.


#29

Pipe-dream warning, so I’m hesitant to even ask:

How feasible would it be, as an end goal, to combine rustc and RLS into a stateful incremental compiler, i.e. one that holds the entire code in all stages from source to MIR in memory (or mmapped)? Source changes would propagate through the structures and incrementally update them. Queries could be run on the structures on-line to provide type information etc to satisfy the needs of the RLS.
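A toy illustration of the “hold everything in memory and answer queries on demand” idea, under heavy simplifying assumptions: there is a single derived query (counting lines that start with `fn `), and invalidation is a per-file version check. `Database`, `set_source`, and `fn_count` are hypothetical names; this is not the rustc query system.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory database of sources and derived facts.
#[derive(Default)]
pub struct Database {
    /// file -> (text, version); the version bumps on every edit.
    sources: HashMap<String, (String, u64)>,
    /// file -> (cached fn count, version of the text it was computed from).
    fn_counts: HashMap<String, (usize, u64)>,
    /// How often the derived query really ran (kept for illustration).
    pub recomputations: usize,
}

impl Database {
    /// Input: set or replace the text of a file. Nothing is recomputed here.
    pub fn set_source(&mut self, file: &str, text: &str) {
        let version = self.sources.get(file).map_or(0, |(_, v)| v + 1);
        self.sources.insert(file.to_string(), (text.to_string(), version));
    }

    /// Derived query: number of lines starting with `fn `.
    /// Computed lazily and reused while the file is unchanged.
    pub fn fn_count(&mut self, file: &str) -> Option<usize> {
        let (text, version) = self.sources.get(file)?;
        if let Some((cached, cached_version)) = self.fn_counts.get(file) {
            if cached_version == version {
                return Some(*cached);
            }
        }
        let count = text
            .lines()
            .filter(|line| line.trim_start().starts_with("fn "))
            .count();
        let version = *version;
        self.fn_counts.insert(file.to_string(), (count, version));
        self.recomputations += 1;
        Some(count)
    }
}
```

Editing one file only invalidates facts derived from that file, and nothing is computed for files nobody asks about.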


#30

I don’t see this as a problem. Effectively (and this will require some work in Cargo) we will ask Cargo how to build a project and then have a very quick way to query “this file has changed, what should be rebuilt?” and then we do it. Other build systems will use Cargo as an abstraction layer. You never have to do a full build unless you want to (or, say, the Cargo.toml is changed).

This is the principal issue we disagree on. I’ll address only it, but thoroughly.

Let’s start with definitions, to agree precisely on what we disagree about.

Build Process is a sequence of tool invocations which produce project artifacts. Each sufficiently large project has its own slightly unique build process.

Project Model is a logical model of dependencies between various source files. Unlike a per-project Build Process, Project Model is fixed for a language. For Java, the model is classpath, for Rust, the model is a DAG of anonymous crates with attributed edges.
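As a rough sketch of that Rust project model, here is one possible encoding of “a DAG of anonymous crates with attributed edges”. It assumes the edge attribute is the name under which the dependent refers to the dependency; that reading, and all type and method names (`CrateGraph`, `CrateId`, `Dependency`, `add_dep`, `resolve`), are assumptions made for illustration.

```rust
/// Index of a crate in the graph; crates carry no global name ("anonymous").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CrateId(pub usize);

/// An attributed edge: the name the dependent uses for the dependency.
#[derive(Debug)]
pub struct Dependency {
    pub name: String,
    pub target: CrateId,
}

/// A crate is just a root file plus outgoing dependency edges.
#[derive(Debug)]
pub struct CrateData {
    pub root_file: String,
    pub deps: Vec<Dependency>,
}

#[derive(Debug, Default)]
pub struct CrateGraph {
    crates: Vec<CrateData>,
}

impl CrateGraph {
    pub fn add_crate(&mut self, root_file: &str) -> CrateId {
        self.crates.push(CrateData { root_file: root_file.to_string(), deps: Vec::new() });
        CrateId(self.crates.len() - 1)
    }

    /// Adds an edge `from -> to`. Simplification: edges may only point to
    /// crates added earlier, which keeps the graph acyclic by construction.
    pub fn add_dep(&mut self, from: CrateId, name: &str, to: CrateId) -> Result<(), String> {
        if from.0 >= self.crates.len() || to.0 >= from.0 {
            return Err(format!("invalid edge {:?} -> {:?}", from, to));
        }
        self.crates[from.0].deps.push(Dependency { name: name.to_string(), target: to });
        Ok(())
    }

    /// Resolves a dependency name as seen from one particular crate.
    pub fn resolve(&self, from: CrateId, name: &str) -> Option<CrateId> {
        self.crates.get(from.0)?.deps.iter().find(|d| d.name == name).map(|d| d.target)
    }
}
```

Because names live on edges rather than on crates, the same crate can be visible under different names from different dependents, and nothing in the structure says how any crate is built.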

Now we can formulate the two approaches whose merits we are debating.

Approach 1: Code Analyzer should instrument the Build Process, intercept compiler calls and work from that.

Approach 2: Code Analyzer should be build-system and Build Process agnostic and instead work with Project Model.

The RLS’s current architecture and roadmap focus on the first approach. I am a proponent of the second one.

Now, with definitions sorted out, let me argue for the second approach.

The main argument for approach 1 is correctness: there’s an assumption that, to get analysis 100% correct, Code Analyzer should exactly repeat what the compiler does.

There are two problems with this argument.

First, as eloquently expressed by @petrochenkov in a sibling thread, building large projects is hard, which results in Code Analyzer working great for small stuff, but being completely helpless for enterprisey use-cases.

Second, the assumption “you need to mimic compiler exactly to be precise” does not hold in practice. Quoting myself from that other thread,

As another point for “you can have a horrendous build system and precise Code Analyzer”, consider IntelliJ itself. It is a huge project, and I think it is built with every tool you can use to build Java. I definitely saw Ant, Maven, Gant, Gradle and JPS in various parts of it. However, IntelliJ is developed in IntelliJ and, naturally, all code insight features just work: IDEA knows little about its build system, it only has a project model which is a set of XML files describing classpaths of various modules. Now, syncing those XML files and the build system is a pain. However, in Rust a similar task would be much simpler: it has a stronger project model and it has mostly one build system (Cargo).

As for positive arguments for approach 2, there are three of them.

First and foremost, it’s more reliable: getting a Project Model is much simpler than intercepting Build Process, so Code Analyzer works at least partially with any code-base.

It’s also more performant. It’s true that you can achieve good incrementality using approach 1. However, the main requirement for Code Analyzer is not incrementality, but on-demandness, and that I believe is incompatible with approach 1.

Finally, nobody does approach 1 :slight_smile:

For Java, there are a ton of different build systems but IntelliJ is not married to any particular one, and happily works with various mixtures.

Kotlin is also build-system agnostic. I think currently Kotlin itself is built with Gradle, but up until recently its build system was a zoo of nightmares as well (like, you had to have three different JDKs installed), which hasn’t prevented the IDE from doing completions, type inference, and all other IDE stuff.

As linked above, Dart Analyzer also does not care much about Build Process.

The only project that I know of that does Build Process instrumentation is Kythe. And that makes perfect sense for Kythe, which is geared towards offline code indexing, DXR style, and not towards working with red-hot freshly typed code.


#31

@matklad The only other robust IDE-like tool for Rust that I’m aware of is intellij-rust, which, as I understand it, did the ground-up compiler re-write in Kotlin strategy. Given your involvement in both rustc and intellij-rust, could you provide a “lessons-learned” or comparison between the two approaches?

For example, does intellij-rust suffer from the same problems listed above? Does it suffer from different problems? Are there pros/cons to how tightly intellij-rust is integrated with the greater IntelliJ platform (e.g. quick-fix, intentions, etc.)?


Obviously one advantage intellij-rust has is that JetBrains also funds it directly. I assume their business model is that it will boost CLion sales, and/or an eventual paid Rust IDE (similar to the track of GoLand). Perhaps part of the solution is to get more companies onboard to support the RLS? For example, I’d think Microsoft would have at least a small interest in the extra promotion/branding VSCode gets from its support of Rust.


#32

Just jumping in with a semi-related idea: What I would really love is the ability to generate some sort of semantic source graph thing. Basically, you get a fully annotated version of the source code that you can browse and click through. Every identifier has a link to its definition and (other) usage sites. I had the opportunity to work with such a tool recently at a company, and it was amazingly useful.

W.r.t. RLS and rustc:

  • Having such a tool would, I think, relieve pressure on RLS to use perfect semantic completion for everything.
  • The rust compiler already does all of the necessary analysis for this. We would just need to dump it in some stable format (e.g. json). Perhaps save-analysis could be extended for this? We could then integrate this with rustdoc (it already shows a snapshot of the source code anyway). Also, since this is not “interactive” in the same sense as an IDE, the pressure on compiler performance might be relieved a bit.

Thoughts?


#33

Something like Microsoft’s Reference Source?

It seems like that would really just be another consumer of non-incremental save-analysis data that would be used for deps, but presented in a nice visual way, so I don’t see why not.


#34

@mark-i-m You might want to look at https://github.com/nrc/cargo-src. IIRC it uses the RLS backend and thus save-analysis to generate the navigation data.


#35

I’m enjoying this thread. =) Lots going on here.

I want to try and drop a few thoughts:

On the question of mono- vs poly-repo:

This has come up twice. First, in @arielb1’s response to my thoughts about trying to break up the compiler, but also in the discussion of the RLS. The issues are somewhat different but strongly related.

I think I really want a few things:

  • Straightforward cargo test, IDE-friendly workflow.
    • having to run x.py is unfortunate, but not the end of the world
  • Clear “sub-units” of the compiler:
    • so that you can master one part but not the others
    • ideally, with unit-testable, well-defined boundaries

I think you could achieve this in either a mono- or a poly-repo setup. Our current mono-repo introduces some hurdles:

  • you need to use x.py, not a standard workflow
  • we tend to build the stdlib first thing which is slow etc
  • landing requires going through the bors queue
  • using rust-lang/rust as your repo means that your issues get all mixed together etc

On the other hand, some of those same things (e.g., issues all mixed together) can be a kind of advantage (e.g., more eyes on the same issues, triage team will help sort them, etc).

I’m not really sure what precise setup is best here. Probably the thing to do is to focus on integrating Chalk and Polonius and seeing how it goes: in particular, is it a pain in the neck to keep them in a separate repo? Does that work out well? Is the “input/output” barrier a useful thing?


#36

Another thing that came up when I was talking to @aturon is that it probably makes sense to “think bigger” than 2019. The truth is, these are big engineering projects, and they take time, and we should probably be trying to lay out a plan for the next “edition” – e.g., around three years. I’m not sure what changes here but it is useful to have that much time to think.

I am excited about the idea of really trying to turn rustc into a “world class” compiler over the next few years. We’ve done a ton of groundwork around this – ranging from incremental, to MIR, to projects like chalk, to the query system – but we’ve not really “reaped all the benefits” yet.


#37

Visual Studio does! :slight_smile: Though the project model approach is definitely useful and VS does use it in some aspects.


#38

The answer is obviously micro-services! :stuck_out_tongue_winking_eye:

More seriously, I feel this is a rather tactical question, which probably could be extracted to a separate thread as it feels pretty orthogonal to overall code architecture?

I personally have a huge distaste for submodule-based setups and putting commit hashes into Cargo.tomls, and prefer a single source tree where different components live in different folders, have mostly-independent build processes and are tied together by a top level build system, which is used for integration tests, but is avoided during normal development.

An interesting spin on this model is to have a single source tree, a repository per component (with independent issues and CI), and one master repository which pulls components from the forks (this I think is the Linux kernel approach?).


#39

I agree, although I think that these questions of how to manage things are not truly orthogonal. The repo setup tends to reflect the interdependencies of the code underneath. In this case, perhaps, one underlying question is whether we want to try to have more “reusable libraries” that are independent from rustc (wherever they live). I think we should, but I also think we should prove out the approach.

In any case, the thread on the whole is a bit of a mish-mash. =) I’ve been looking over the previous comments and pondering how to extract the best questions to focus on.

One thing I could imagine is to make sure that we are all on the same page with respect to our goals. For example, if I were going to come up with a handful of “top goals” for the compiler, it would probably be this:

  • Truly excellent RLS integration
  • Chalk integration, which unlocks lazy normalization, GATs
  • Const generics (also blocked on lazy norm)
  • Continuing improvements to NLL

This list is sort of driven by a combination of things I think are important (GATs, const generics) and things where we have good ideas that are not yet fully implemented. I’d be interested to hear about other compiler-related things that are not on this list. (For example, I know that @michaelwoerister has been working on things like cross-language inlining in the context of Gecko. Or, I feel like we can do much better FFI integration, but that is a big project we’ve done no legwork on. And depending on the details of how it worked, it might or might not require changes in rustc proper.)


#40

Just a data point: in the few job experiences I have had, the mono-repo has held a lot of benefits in terms of:

  • code sharing and reuse
  • not having to version stuff
  • finding out immediately when you have broken someone else’s stuff

This has apparently large enough benefits that one company I interned with actually put a lot of man-power into switching from ~100 individual repos to a mono-repo while I was there.

I think the goals you listed are definitely achievable in a mono-repo, but it would require some additional tooling (static analysis) and strongly enforced conventions, I think.