Moving bits of rustc into crates

Dear rustc developers (or potential rustc developers),

Now that rustc is able to depend on crates from crates.io, I think we should look at trying to break out bits of rustc where it makes sense. In fact, this transition is already underway. For example, we have a number of crates that rustc depends on which are managed by the rustc team:

  • ena (unification)
  • chalk-engine (trait solving, eventually)
  • polonius (borrow checking, eventually)

However, we have a number of other crates that are currently “baked in”, mostly in rustc_data_structures. Many of these have a crates.io counterpart already:

  • FxHashMap (there is fxhash on crates.io)
  • the rustc graph stuff and graph-algorithms (there is petgraph on crates.io, but it differs in some particulars; probably it’s mostly better, but…)

I see a number of advantages of moving things out from the main repo into separate crates:

  • Faster iteration and unit testing.
  • These crates can build up their own communities and maintainers. This has been a big success in servo.
    • Hacking on a subcrate is a good way to get involved without having to learn about all of rustc, or deal with rustc build times.
  • It encourages us to build up unit tests and mocks and the like for testing corner cases. This is good.

On the other hand, there are some costs:

  • It’s harder to land an “atomic change” that affects many things.

I think, on balance, it’s worth it. Here are some specific questions to think about:

  • What code is a good candidate for “moving out”?
    • In addition to the existing stuff, I think basically all the code in rustc_data_structures fits the bill.
  • How do we balance having control versus using other things?
    • For example, should we port to use petgraph in place of rustc’s graph? What about using fxhash?
    • My preference, I think, is that for these sorts of core data structures, rustc should have the ability to tweak as we need to, but it’s not a clear call.
  • In cases where the repo is owned by the rust-lang project, should it be part of rust-lang or rust-lang-nursery?
    • I lean towards rust-lang: this is shipping code used in rustc etc! It’s not exactly immature or experimental in the usual sense.
    • That said, a lot of the APIs are not particularly ergonomic etc. Maybe we ought to signal that somehow.

I personally think this is a great idea. I feel like there’s a lot of untapped potential in growing out these crates! One “pro” I’d add as well is that this can often be a way to even more aggressively fix bugs or head off bugs in rustc. Often, when crates get used outside their original context (aka rustc), users run into bugs that aren’t too difficult to fix with a PR. That way we can fix future rustc bugs before they come up!

I think you’ve sort of alluded to this so far but it definitely seems best to start with data-structure-like crates. Things that don’t change too too often and could use some love in terms of API and documentation. In terms of home I think either rust-lang-nursery or rust-lang can work. I might lean more towards rust-lang-nursery but only because the APIs haven’t been well vetted.

Another pro: crates on crates.io will compile faster, since Cargo disables incremental compilation for them.

I think the one thing we should aim for is making sure that the parts we move out effectively work as crates – I would like to avoid introducing more submodules than we have today.

I personally don’t think the distinction between rust-lang and rust-lang-nursery is important (I’d personally move to consolidate them, and make the maintenance/stability guarantees via notes in the README); I agree with @nikomatsakis that rust-lang feels better for this – especially because people are more likely to have permissions for it.

Externing anything using unstable features will make it especially annoying to change those features. If nothing else, the current cfg(stage0) hacks would have to get even more hackish. Being nightly-only will also limit the chances of such crates building up their own community.

For those crates that can work with the stable language, it seems less of a problem.

Your prior post is also relevant to remember:

Well, that prior post is actually a partial motivator here. For example, chalk would like to use FxHashMap; right now, it is using fxhash. Others on crates.io are presumably using that too. This crate is not under our control (though I’ve not spoken with the maintainer; I’m sure they are a lovely and careful person).

If we moved our hashmap to a rust-lang crate and published it, likely fxhash wouldn’t exist, and we’d have more users to boot to catch bugs. This would in turn ensure that we have fewer transitive dependencies that are out of our control.

There is some tension though — rustc will often want a kind of “slimmed down” profile of otherwise general purpose crates. Yet another reason though that it’s good if we have some ownership stake in them, I suppose.

cc @cbreeden — I see you are the author of the fxhash crate. In order to make progress with chalk integration, I kind of want to make use of it. Honestly, I could just make rustc use it directly, but I was curious what you would think about moving it over to the rust-lang or rust-lang-nursery organizations? (I’d sort of rather that we have the ability to patch and push new revisions if needed, etc, though you’re welcome to stay on as an administrator)

I think I would also prefer this. As of now, I don't think there are any unvetted/unpolished Rust library crates in rust-lang. It would be nice to keep rust-lang as an indicator that the crate has achieved some minimum level of quality. The two library crates there now (libc and regex) have both gone through the RFC process. I don't necessarily believe that the RFC process is the desired standard here, but I think it should be somewhere between "let's just slap a crate in there" and "let's spend a couple years getting feedback and going through the RFC process." :slight_smile:

I’m fine with rust-lang-nursery.

Actually, I’m very fond of the idea of a properly semver'ed libsyntax, which could serve as a reference implementation of a Rust source parser and be used by the community (and all the dev tools), if that’s possible at all…

Maybe that would put too much of a maintenance burden on the team, I don’t know. Just an idea.

that’s certainly a … mid-term goal :slight_smile:

Perhaps create a new rust-lang-rustc org for these to live under?

I considered that, but it’s already so annoying to deal with two orgs, I don’t really want to deal with three…

Yeah, there is already a mirror of libsyntax on crates.io (rustc_ap_syntax) which is used by rustfmt and Racer. It would be cool if we could make that the source of truth rather than a mirror. Making libsyntax stable is a bit challenging, but if it only had to happen once it should be OK. Separating out libsyntax tests would be another challenge. Perhaps we could start this with some of the deps of libsyntax?

Since “stable” out-of-tree libsyntax was mentioned, see also @matklad’s libsyntax2 experimental project (RFC, repo, LALRPOP)

Sounds like a good idea for code that doesn’t change much anymore. However, fixing bugs in external code requires two pull requests and two reviews. There’s also the chance of reviewers not being as strict for crates outside of rustc, so I’d proceed with care.

Well, it depends. Often it’s just one PR, which lands without bors, followed by a simple bump of the minimum version in Cargo.toml. But the larger point definitely stands: coordinating updates is more work.

Anyway tbh my primary immediate concern is the FxHashMap stuff, which chalk uses. I see a few options here:

  • Make chalk use fxhash but leave compiler untouched
  • Make compiler use and re-export fxhash
  • Create our own fxhash-like crate (rustc-hash?) in rust-lang-nursery and have compiler + chalk use that

I’m leaning towards the third option right now.
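For anyone wondering how small such a crate would be, here is a rough sketch of what an fxhash-like crate could export. It is not the actual code from rustc_data_structures or from fxhash: the names FxBuildHasher and count_words are made up for illustration, and write() is simplified to hash one byte at a time, so its values for byte slices would differ from the real implementations.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Multiplicative constant used by the 64-bit variant of the Fx hash.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Minimal Fx-style hasher: fast, not DoS-resistant.
#[derive(Default, Clone, Copy)]
pub struct FxHasher {
    hash: u64,
}

impl FxHasher {
    /// Core mixing step: rotate, xor in the new word, multiply.
    #[inline]
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Simplification (assumption of this sketch): mix one byte at a
        // time instead of reading word-sized chunks.
        for &b in bytes {
            self.add_to_hash(u64::from(b));
        }
    }

    fn write_u32(&mut self, i: u32) {
        self.add_to_hash(u64::from(i));
    }

    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i);
    }

    fn write_usize(&mut self, i: usize) {
        self.add_to_hash(i as u64);
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Hypothetical aliases that both the compiler and chalk could import.
pub type FxBuildHasher = BuildHasherDefault<FxHasher>;
pub type FxHashMap<K, V> = HashMap<K, V, FxBuildHasher>;

/// Illustrative usage: count whitespace-separated words.
pub fn count_words(text: &str) -> FxHashMap<&str, usize> {
    let mut counts = FxHashMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Since the whole thing is a hasher plus a couple of type aliases, whichever option we pick, callers would only change an import path, and we would keep the ability to tweak the hasher in one place.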

A bit more creativity please :slight_smile: "chalk" and "polonius" are good names. I'm sure you can come up with something interesting: "flashhash", "ultrahash" or something else that does or does not contain the word "hash" ^^'

TripleRMap (Rust Ricky Resin)? :slight_smile:

I think factoring out generic bits and pieces of the code and moving it into separate crates is a great idea.

The great drawback was already mentioned which is atomic changes across the entire codebase. I think that components can be refactored though to not rely on a multitude of rustc’s internals and be generic.
