2019 Strategy for Rustc and the RLS

Yeah, that looks exactly like what I want. Very useful! If something like that could be integrated with rustdoc, it would be extremely useful.

I do want to circle back to this a bit too — what is the long term strategy we see around the RLS?

I've gone back and forth on this. I think right now my sense is that we should be trying to merge the RLS and rustc, both in terms of teams and other things. I guess I think that the RLS is going to be important enough to Rust that it's really a shame that (for example) I don't know much about it.

This is not to say that there won't be various bits of logic in the RLS that are pretty independent from the compiler. I would like the two to have a fairly rigorous "boundary" (probably consisting of queries, though?) between them. (But the same holds within rustc, where e.g. chalk or polonius might have a rigid boundary.)

(Maybe this argues for moving the RLS into the Rust repository? This might also help with the breakage, of course, but perhaps has other complications?)

OK, I decided maybe I’m derailing the thread with my last few posts. Sorry, I’m just thinking through a number of things.

Perhaps the thing to focus on – and certainly the most immediate question – is What to do with libsyntax2? (Hey, that rhymes! :grin:)

To expand a bit:

  • I am excited about the work that @matklad is doing. I would like to see our existing libsyntax replaced with something that can fulfill its original purpose:
    • In particular, I would like to have a nice library to serve as a foundation for rustc (both in the RLS and without) as well as procedural macros and the like. That library should support incremental re-parsing, error recovery, faithful reproduction, and so forth.

However, at the same time, I am not sure what the best way forward is.

I am nervous about building up a “separate compiler” – I don’t want to end up with two sets of parsers to maintain long term. I think we need to map out a route that includes a way to bring the libsyntax2 effort into rustc.

I also think that the existing libsyntax is sort of the “foundation” of rustc, so converting all the code to use libsyntax2 might be a big pain. I’m not really sure what that implies exactly tbh. The AST itself isn’t that widely used, it’s more the span and codemap code, and maybe there are relatively few dependencies there (most things just pass spans around and don’t care too much about how they are composed?)

I am (separately) interested in trying to extract out macro expansion, name resolution, and type checking if we can. But that’s a long way out, realistically.

I am not sure that there are only two parties in play. It seems to me that there's an important state-management component which sort-of lives in between the compiler and the IDE frontend. I also don't think that recursive, potentially cyclic queries are the right boundary.

From the IDE frontend's point of view, the most convenient interface would consist of two types, WorldState (named World in the sketch below) and WorldSnapshot. WorldState is a mutable object that holds the current source state of the world, can incorporate changes, and can give out an instance of an immutable WorldSnapshot. A snapshot is, conceptually, a fully annotated view of the world's state at a particular point in time, with syntax, symbol and type information. So, roughly the following API:

// API sketch only: `(...)` elides the fields and method bodies are omitted.
struct World(...);
// Intended: World is !Send + !Sync + !Clone.
impl World {
    fn add_file(&mut self, path: PathBuf, content: String);
    fn change_file(&mut self, path: PathBuf, edit: TextEdit);
    fn remove_file(&mut self, path: PathBuf);
    fn snapshot(&self) -> WorldSnapshot;
}

struct WorldSnapshot(...);
// Intended: WorldSnapshot is Send + Sync + Clone.
impl WorldSnapshot {
    fn get_syntax(&self, path: &Path) -> SyntaxTree;
}
impl SyntaxTree {
    fn get_expr_at_offset(&self, text_offset: usize) -> Expr;
}
impl Expr {
    fn get_type(&self) -> Type;
}

The key property of a snapshot is that it acts as if it were just a giant precompiled blob of everything that the compiler knows about the code. It does not have queries on the surface; it is static. Now, as an impl detail, it surely should be 100% on-demand and as incremental as possible, but that's not part of its API.
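To make the state/snapshot split a bit more concrete, here is a minimal runnable sketch. It only tracks raw file text (no syntax or type information), and the copy-on-write storage via Arc is my own assumption about one way to get cheap snapshots, not something proposed above. The file_text method is a hypothetical stand-in for the richer accessors in the API sketch.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Mutable owner of the current source state (hypothetical sketch).
#[derive(Default)]
struct World {
    files: Arc<HashMap<PathBuf, Arc<String>>>,
}

/// Immutable, cheaply clonable view of the world at one point in time.
#[derive(Clone)]
struct WorldSnapshot {
    files: Arc<HashMap<PathBuf, Arc<String>>>,
}

impl World {
    fn add_file(&mut self, path: PathBuf, content: String) {
        // Copy-on-write: the map is cloned only if a snapshot still shares it.
        Arc::make_mut(&mut self.files).insert(path, Arc::new(content));
    }

    fn remove_file(&mut self, path: &Path) {
        Arc::make_mut(&mut self.files).remove(path);
    }

    fn snapshot(&self) -> WorldSnapshot {
        WorldSnapshot { files: Arc::clone(&self.files) }
    }
}

impl WorldSnapshot {
    /// Hypothetical accessor; a real snapshot would expose syntax and types.
    fn file_text(&self, path: &Path) -> Option<&str> {
        self.files.get(path).map(|text| text.as_str())
    }
}

fn main() {
    let mut world = World::default();
    world.add_file(PathBuf::from("lib.rs"), "fn a() {}".to_string());
    let before = world.snapshot();

    // Later edits do not affect snapshots that were already handed out.
    world.add_file(PathBuf::from("lib.rs"), "fn b() {}".to_string());
    world.remove_file(Path::new("unused.rs"));
    let after = world.snapshot();

    println!("before: {:?}", before.file_text(Path::new("lib.rs")));
    println!("after:  {:?}", after.file_text(Path::new("lib.rs")));
}
```

The point of the design is that a snapshot can be sent to a worker thread and stays consistent no matter what the editor does to the World afterwards.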

For the compiler itself, the API should be different: it can't view the world as static, because its job is to populate it with derived information. This I think is a characteristic of the queries, that they can be recursive and cyclic. The right model here looks like an immutable snapshot of the source information, plus an append-only database of derivable facts, in the form of the query engine.

So, we have a fully-static view on the one side, a mutable append-only view on the other side, and the real world where files are actually added, edited and removed. This looks like a separate third component whose task is to manage the state and invalidation.
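As a rough illustration of the compiler-side view ("immutable source snapshot plus append-only database of derived facts"), here is a toy sketch. The names Database, line_count and total_lines are invented for this example and have nothing to do with rustc's actual query system; the only derived fact is a per-file line count, which is computed on demand and then cached.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Immutable source inputs plus an append-only cache of derived facts.
struct Database {
    sources: HashMap<String, String>,             // file name -> text, never mutated
    line_counts: RefCell<HashMap<String, usize>>, // derived facts, only ever added
    computations: Cell<usize>,                    // how many facts were really computed
}

impl Database {
    fn new(sources: HashMap<String, String>) -> Self {
        Database {
            sources,
            line_counts: RefCell::new(HashMap::new()),
            computations: Cell::new(0),
        }
    }

    /// Query: number of lines in `file`, computed at most once per file.
    fn line_count(&self, file: &str) -> Option<usize> {
        let cached = self.line_counts.borrow().get(file).copied();
        if let Some(count) = cached {
            return Some(count);
        }
        let count = self.sources.get(file)?.lines().count();
        self.computations.set(self.computations.get() + 1);
        self.line_counts.borrow_mut().insert(file.to_string(), count);
        Some(count)
    }

    /// Query built on top of another query.
    fn total_lines(&self) -> usize {
        self.sources
            .keys()
            .filter_map(|file| self.line_count(file))
            .sum()
    }
}

fn main() {
    let mut sources = HashMap::new();
    sources.insert("a.rs".to_string(), "fn a() {}\nfn b() {}\n".to_string());
    sources.insert("b.rs".to_string(), "fn c() {}".to_string());

    let db = Database::new(sources);
    println!("total lines: {}", db.total_lines());
    println!("a.rs lines: {:?}", db.line_count("a.rs")); // served from the cache
    println!("facts computed: {}", db.computations.get());
}
```

Invalidation is deliberately missing here: when a file changes, something outside this database has to decide which facts to throw away, which is exactly the job of the third component described above.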

Yet another solution in this space is kythe, which has a rust indexer that seems to be unmaintained at this point (https://github.com/google/kythe/tree/master/kythe/rust).

Insofar as the RLS is at its lowest level a mirror of the filesystem state, informed partly by file change notification APIs, it could be useful to look at how Watchman handles keeping queries of their in-memory view of the FS consistent over time and across multiple clients: Query Synchronization | Watchman.

A big benefit is that all PRs to these components get tested outside of the rustc build cycle, and that when one updates the component in the rustc repo, all of these get tested at once. When that happens, there often is a bit of churn because rustc tests things that these components do not, but if these components have good and comprehensive CI, this can significantly reduce the bors queue in rustc itself.

I totally agree with that. I always imagined that the idea would be to basically merge this “libsyntax2-based compiler” with rustc at some point so that there is only one.

I would prefer if most bits and pieces in rustc would live in their own separate crates in the nursery, and rustc becomes a super-project in which the only changes going on are actually just about continuously updating these crates.

We can have two crates doing the same job that can be kind of swapped out. This is currently not true for polonius and chalk, but maybe at some point when we remove the old borrowck, someone could re-implement it as a crate that could be swapped out with polonius, just producing different results. So I don't think that having two libsyntaxes, one for IDEs and one for rustc, is in itself a huge deal. Sure, if we only need one that would be better, but if we really do need two, maybe it would be worth considering integrating libsyntax2 in rustc in such a way that it could also be swapped out, so that when updating other components, we can make sure that libsyntax2 is not broken (or is properly updated if it is).

It’s a lot of work to find the right interfaces

These interfaces already exist in rustc in some form; the whole compiler is split up into crates. If making cross-cutting changes becomes more of a pain, which it will, these interfaces might evolve to become more decoupled and make cross-cutting changes easier, or less necessary. That might not be a bad thing. In any case, this will take a lot of time, so I think the question here at this point is more like "Do we ever want to get there?". How exactly to get there can be discussed later. [0]

  1. I feel (but I have no proof) that with sufficiently smart use of incremental and the build-system, modifying an end-module should be basically free ( --keep-stage 0 almost gets us there, but more love is needed). For example, as of today, working on rustc_codegen_llvm can be done with no unneeded recompilations.

I find this is really hard to discover and wish ./x.py would do the right thing by default for those working on the compiler who use it the most. Maybe we could mark some of the crates as "--keep-stage 0" so that if you only make changes to it, ./x.py test src/rustc_codegen_llvm automatically does that for you. That is, maybe the better default for ./x.py should be "build and basic test" as fast as possible, and if you want a full stage 2 test, then you should opt into it.

I never faced the “rustc initial build” barrier, because I always have a fully-ready rustc somewhere on my PC,

When working on a laptop, each rustc build with incremental compilation takes dozens of GBs. I wish there was a way to work on pieces of rustc without having to build it all.

Similarly we need to ensure that all of our bots are running on all these crates, and that we have a kind of consistent reviewing strategy

  • right now we are very focused on rust-lang/rust

That heavily depends on the crate: check out libc, stdsimd, packed_simd, etc.

That's basically what I am talking about in this thread :slight_smile:

For example, does intellij-rust suffer from the same problems listed above?

I think intellij-rust is in an excellent state, and I believe that if enough people are using Rust, JetBrains' products will have great Rust support, because for IntelliJ-Rust it's mostly a question of pouring more resources into incrementally improving the infrastructure that already exists.

However, there are two things about IntelliJ-Rust which are not perfect, and which are the reason why I am trying to switch gears to libsyntax2.

First, it's not written in Rust. In general, it's not true that a language's dev-tools should be written in the language itself. For example, Swift is written in C++ because Swift should be a great language for app development, and not for compiler development. C++ was just the better language to write a production compiler in, at the time of writing.

Now, I firmly believe that Rust is absolutely the best language today for writing compilers and code analyzers (like, it's a cross between OCaml and C++, that's as perfect for compilers as it can get), so there's no question for me that Rust should have an excellent code analyzer implemented in Rust itself.

The second thing is that IntelliJ Rust is not a reusable library. With some effort, it could be extracted as a stand-alone jar, but that's probably not worth the effort, as JVM-based libraries are not exactly a lightweight dependency.

The question of funding Rust development is interesting, but is probably not specific to IDEs.

It's also worth keeping in mind that while JetBrains indeed funded IntelliJ-Rust, it's not like there's a huge team behind it. I think up until the moment I left, there was at most one full-time and one part-time paid developer working on it?

I also disagree that resources are the principal problem here. In the case of the RLS, I personally simply don't see an infrastructure I can apply incremental improvements to. This may sound like 'everything is broken and nothing works', yes, but I don't think this is counter-productive? If I am right, there are some serious action items we should complete. If I am wrong, the discussion should provide a much better understanding of trade-offs and possible solutions.

One thing I would like to raise around this topic is compile times. It seems to me that many of the same things that would be useful for a fast IDE experience would also be useful for fast incremental debug builds. In which case perhaps there could be quite a significant benefit to aiming towards the shared components approach. Alternatively perhaps trying to make the compiler libraries too generic could make it more difficult to optimise things? (probably/hopefully not?).

In any case, IMO compile times are at least as important as a good IDE experience, and they seem to be related in that the work to improve both of these things is likely to involve fairly significant refactoring of rustc, so I think it would be good if the design for RLS also was created with this in mind.

If I get time, I might try and write up a blogpost about this. Overall, I agree with other posters that 2019 might be a good year to focus on the compiler and toolchain quality and de-emphasise work on language features. I feel like Rust is already a fantastic language compared to for example Go, but is relatively weaker in terms of things like compile times and cross compiling.

I'm pretty sure this is 100% true. In fact, a "Fast Incremental Debug Build" is effectively the same thing as the IDE semi-continuously recompiling changed parts on-demand. In other words, if you solve the RLS/IDE problem you'll pretty-much automatically solve the "Fast Incremental Debug Build" problem without any significant additional effort.

This is mostly true, but, it would be a shame if things like "Constant Generics", "Generic Associated Types", "Async/Await", "Macros 2.0", and "Generators/Yield" were completely delayed until the IDE/RLS/Compiler issues were sorted.

Things like full HKT and other more advanced language things could, and possibly should (IMHO), take a back-seat to these compiler/RLS/IDE integration issues though.

Ha-ha, good luck.
We have a whole team of people whose responsibility is "designing new language features", and some of them really like their work.

This was my assumption, but the topic of "compile times in general" is more complex than that. For example, @michaelwoerister recently landed a pretty exciting PR implementing ThinLTO, which means we can start using incremental for optimized builds -- @michaelwoerister has also found that their work on cross-language inlining allows us to inline LLVM calls into the compiler, and might yield a non-trivial speedup, iirc?

Anyway, the point is that we should also think about defining what kinds of compilation times we care most about. My opinion is that we probably still ought to be focusing primarily on IDE-like use cases (that is, rapid turnaround for small changes), but there's a case to be made for other stuff too.

Back to the question on what we should do with libsyntax2.

To start, I’d like to make a couple of general observations.

First, “merging libsyntax2 into rustc” is not a goal, it’s an optimization of resources. By just switching the parser, we will get neither a better IDE nor a better compiler.

Second, I think “amortized complexity analysis” also works here. Specifically, I don’t think that the amount of work required to integrate libsyntax2 into the compiler at time T_1 will be larger than the work at time T_2 >= T_1. That means that the moment when it starts to be painful to maintain two compilers is the moment where it makes sense to initiate the merge. We don’t win much by merging earlier.

With this in mind, I think the plan proposed in 2019 Strategy for Rustc and the RLS makes sense, including the “for merging with rustc, let’s just see how it goes” bit.

Specifically, I’d like to at least implement macro expansion and module tree resolution in an alternative code analyzer before merging with rustc. I feel these are the core ingredients to make on-the-fly analysis work, and they are really challenging: module resolution is a fixed-point computation, intertwined with macro expansion which could create new modules, and it is also the bit that merges the real, ugly and quirky, file system world with idealized compiler vision.
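To illustrate why module resolution is a fixed-point computation once macros are involved, here is a toy model. It is an assumption-heavy sketch, not how rustc or libsyntax2 do it: items are just module declarations, macro definitions and macro calls, names are global, and the `files` map stands in for the file system. A macro call can only be expanded once the module defining the macro has been discovered, and expanding it may declare new modules, so the loop runs until a round makes no progress.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Debug)]
enum Item {
    ModDecl(String),             // `mod foo;`, contents come from `files`
    MacroDef(String, Vec<Item>), // a macro that expands to the given items
    MacroCall(String),           // `foo!();`
}

/// Arbitrary guard against self-recursive macros in this toy model.
const MAX_ROUNDS: usize = 64;

/// Discovers all modules reachable from `root`, iterating to a fixed point.
fn resolve_modules(root: Vec<Item>, files: &HashMap<String, Vec<Item>>) -> BTreeSet<String> {
    let mut modules = BTreeSet::new();
    let mut macros: HashMap<String, Vec<Item>> = HashMap::new();
    let mut pending = root;

    for _ in 0..MAX_ROUNDS {
        let mut progress = false;
        let mut next = Vec::new();
        for item in pending {
            match item {
                Item::ModDecl(name) => {
                    progress = true;
                    if let Some(items) = files.get(&name) {
                        if modules.insert(name) {
                            next.extend(items.iter().cloned());
                        }
                    } else {
                        modules.insert(name);
                    }
                }
                Item::MacroDef(name, body) => {
                    progress = true;
                    macros.insert(name, body);
                }
                Item::MacroCall(name) => {
                    if let Some(body) = macros.get(&name).cloned() {
                        progress = true;
                        next.extend(body);
                    } else {
                        // Macro not known yet: retry in the next round.
                        next.push(Item::MacroCall(name));
                    }
                }
            }
        }
        pending = next;
        if !progress {
            break; // fixed point: nothing new was learned this round
        }
    }
    modules
}

fn main() {
    // The root calls a macro that is only defined inside module `a`,
    // and that macro in turn declares module `b`.
    let mut files = HashMap::new();
    files.insert(
        "a".to_string(),
        vec![Item::MacroDef("make_mod".to_string(), vec![Item::ModDecl("b".to_string())])],
    );
    files.insert("b".to_string(), vec![]);

    let root = vec![Item::MacroCall("make_mod".to_string()), Item::ModDecl("a".to_string())];
    println!("{:?}", resolve_modules(root, &files));
}
```

Real name resolution also has to deal with scoping, imports, shadowing and the actual file system layout, which is where most of the difficulty mentioned above comes from.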

Now, I think the most contentious bit of that plan is that it effectively proposes to start an alternative implementation of language server, instead of working off the current RLS code base, and I’d like to dig into this concern more.

First of all, to clear non-technical issues, you might think that @matklad just wants to be the “author” of the Rust language server and to prove that the RLS is somehow wrong. I would be lying if I said that there isn’t a tiny little bit of that as well, but keep in mind that by switching onto the next code analyzer, I am abandoning my work on IntelliJ Rust as well.

But I hope this is irrelevant, because technical points bear much more weight:

  • It is important to have a cargo +stable test build process for the Code Analyzer, and the RLS is incompatible with that.
  • Linking to Cargo and running the compiler after the user hasn’t typed anything for 1500 milliseconds is just the wrong approach to IDEs (see 2019 Strategy for Rustc and the RLS and Let proc macros pass information to the compiler for in-depth discussion)
  • Having strong guarantees about what the world looks like, and transactional/snapshot semantics, is important, and the RLS does not have that.

It’s also important that the primary role of that alternative language server is to be the test-bed for Code Analyzer. Shipping IDE experience to users, at least mid-term, is supposed to be done by integrating with Racer and existing RLS.

I wouldn't count on the speedups from cross-lang inlining being too exciting. A few percent for non-check builds, hopefully. Assuming that we can actually do it without breaking our CI budgets.

I don’t have any insights to add to this discussion (as the discussion is way over my head), but there are two related things I’d like to mention:

  1. I love the discussion and ideas going on in this thread. This is what truly sucks me into the Rust ecosystem and what makes me excited to check out the Rust forums/Reddit/… during my morning routine, lots of activity, lots of civility, and a continuous march forward. And in the process, I keep learning more and more about not only Rust itself, but software development and compiler hacking in general. Keep it up :heart:

  2. Seeing the topic title “2019 Strategy for Rustc and the RLS”, seeing some estimations of projects like these costing 1+ years easily, and knowing that almost all effort is currently going into the 2018 edition release, which is coming closer, as is the end of the year itself. Is there anywhere but here that discusses the main focus points for 2019 (and beyond) for rust itself? The first post by @nikomatsakis casually mentions “lately I’ve been having separate conversations”, but is this in relation to plans being made for “post-2018”, and is it somehow related to the community survey that went out a couple of weeks ago?

    I ask, because this feels like a very important focus point, and one that I would gladly shelve other potential improvements for, if I know we’re working towards improving what’s discussed here, but I can also understand that there are many other people that might not care for this at all, and are more interested in the many RFCs that need some TLC after the 2018 edition ships.

There's no single formal convergence point for that discussion right now, but https://github.com/rust-lang/rfcs/pull/1728 describes the annual "roadmap" process which will no doubt happen again. The 2018 roadmap RFC was https://github.com/rust-lang/rfcs/pull/2314, so I'm expecting a 2019 Roadmap RFC to pop up around a similar time (i.e., January of 2019 ish).

Also the survey for 2018 is out: https://blog.rust-lang.org/2018/08/08/survey.html … There are 2 days left to answer it if you want. Then, we get the results sometime later…

So here is an example that came up recently. I suspect we all know the experience of building some crate for the first time and spending a long time building a long chain of dependencies. This can be a real issue — often, it may be the case that only a small fraction of that code winds up really needed in the final binary.

If we pushed hard on the query system, we might be able to do much better here. We've often thought about liberating rustc from the "crate at a time" mindset that it currently has, and instead having it be able to compile across many crates at once. Then you might (for example) only type-check/trans/build those parts of your dependencies that you actually use, etc. You would also get to the point of compilation errors much faster, since those don't depend on building machine code at all. etc
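A very rough sketch of the "only build what you actually use" idea: if we had a cross-crate graph of which item uses which, the work to do would be the set of items reachable from the final binary's entry points. Everything here (the item names, the `uses` map, the function `items_to_compile`) is hypothetical and ignores traits, generics and everything else that makes this hard in practice.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns every item reachable from `roots` in the `uses` graph,
/// i.e. the only items that would need to be checked and built.
fn items_to_compile(roots: &[&str], uses: &HashMap<String, Vec<String>>) -> BTreeSet<String> {
    let mut needed = BTreeSet::new();
    let mut stack: Vec<String> = roots.iter().map(|root| root.to_string()).collect();
    while let Some(item) = stack.pop() {
        if needed.contains(&item) {
            continue; // already scheduled
        }
        if let Some(deps) = uses.get(&item) {
            stack.extend(deps.iter().cloned());
        }
        needed.insert(item);
    }
    needed
}

/// Small helper to record that `item` uses each of `deps`.
fn add_uses(graph: &mut HashMap<String, Vec<String>>, item: &str, deps: &[&str]) {
    graph.insert(item.to_string(), deps.iter().map(|dep| dep.to_string()).collect());
}

fn main() {
    let mut uses = HashMap::new();
    add_uses(&mut uses, "app::main", &["http::get"]);
    add_uses(&mut uses, "http::get", &["http::connect"]);
    add_uses(&mut uses, "http::post", &["http::connect", "json::encode"]);
    add_uses(&mut uses, "json::encode", &[]);

    // `http::post` and the whole `json` crate are never reached from `app::main`.
    let needed = items_to_compile(&["app::main"], &uses);
    println!("{:?}", needed);
}
```

In this made-up workspace the `json` dependency would never need to be type-checked or translated at all, which is the kind of saving a cross-crate, demand-driven compiler could in principle provide.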

Yes please! That would let you start building a crate before you've completely finished building the crates it depends on. I think this is necessary to make significant build-time improvements for projects like ours with many crates (42 so far) in a single workspace with long dependency chains. Similar projects in C++ can build with almost perfect parallel scaling on large machines (thanks to a lot of manual labour crafting headers, to be fair), but cargo/rustc aren't getting anywhere near that.

I think something similar is needed for RLS for multi-crate projects. Currently RLS works tolerably as long as one only edits a single crate, but when one starts changing a different crate RLS seems to trigger something akin to a full workspace build, which is intolerably slow.
