Moving bits of rustc into crates

nikomatsakis · May 21, 2018, 6:54pm

Dear rustc developers (or potential rustc developers),

Now that rustc is able to depend on crates from crates.io, I think we should look at trying to break out bits of rustc where it makes sense. In fact, this transition is already underway. For example, we have a number of crates that rustc depends on which are managed by the rustc team:

ena (unification)
chalk-engine (trait solving, eventually)
polonius (borrow checking, eventually)

However, we have a number of other crates that are currently “baked in”, mostly in rustc_data_structures. Many of these have a crates.io counterpart already:

FxHashMap (there is fxhash on crates.io)
the rustc graph stuff and graph-algorithms (there is petgraph on crates.io, but it differs in some particulars; probably it’s mostly better, but…)

I see a number of advantages of moving things out from the main repo into separate crates:

Faster iteration and unit testing.
These crates can build up their own communities and maintainers. This has been a big success in servo.
- Hacking on a subcrate is a good way to get involved without having to learn about all of rustc, or deal with rustc build times.
It encourages us to build up unit tests and mocks and the like for testing corner cases. This is good.

On the other hand, there are some costs:

It’s harder to land an “atomic change” that affects many things.

I think, on balance, it’s worth it. Here are some specific questions to think about:

What code is a good candidate for “moving out”?
- In addition to the existing stuff, I think basically all the code in rustc_data_structures fits the bill.
How do we balance having control versus using other things?
- For example, should we port to use petgraph in place of rustc’s graph? What about using fxhash?
- My preference, I think, is that for these sorts of core data structures, rustc should have the ability to tweak as we need to, but it’s not a clear call.
In cases where the repo is owned by the rust-lang project, should it be part of rust-lang or rust-lang-nursery?
- I lean towards rust-lang: this is shipping code used in rustc etc! It’s not exactly immature or experimental in the usual sense.
- That said, a lot of the APIs are not particularly ergonomic etc. Maybe we ought to signal that somehow.

alexcrichton · May 21, 2018, 7:29pm

I personally think this is a great idea; I feel like there’s a lot of untapped potential in growing out these crates! One “pro” I’d add as well is that this can often be a way to even more aggressively fix bugs or head off bugs in rustc. Oftentimes, if crates get used outside their original context (aka rustc), users will run into bugs, but it isn’t too difficult to send in a PR to fix them. That way we can fix future rustc bugs before they come up!

I think you’ve sort of alluded to this so far but it definitely seems best to start with data-structure-like crates. Things that don’t change too too often and could use some love in terms of API and documentation. In terms of home I think either rust-lang-nursery or rust-lang can work. I might lean more towards rust-lang-nursery but only because the APIs haven’t been well vetted.

Mark_Simulacrum · May 21, 2018, 7:43pm

Another pro: Crates on crates.io will compile faster since Cargo disables incremental compilation for them.

I think the one thing we should aim for is making sure that the parts we move out effectively work as crates – I would like to avoid introducing more submodules than we have today.

I personally don’t think the distinction between rust-lang and rust-lang-nursery is important (I’d personally move to consolidate them, and make the maintenance/stability guarantees via notes in the README); I agree with @nikomatsakis that rust-lang feels better for this – especially because people are more likely to have permissions for it.

cuviper · May 21, 2018, 7:56pm

Externing anything using unstable features will make it especially annoying to change those features. If nothing else, the current cfg(stage0) hacks would have to get even more hackish. Being nightly-only will also limit the chances of such crates building up their own community.

For those crates that can work with the stable language, it seems less of a problem.

Your prior post is also relevant to remember:

nikomatsakis · May 21, 2018, 10:00pm

Well, that prior post is actually a partial motivator here. For example, chalk would like to use FxHashMap; right now, it is using fxhash. Others on crates.io are presumably using that too. This crate is not under our control (though I’ve not spoken with the maintainer; I’m sure they are a lovely and careful person).

If we moved our hashmap to a rust-lang crate and published it, likely fxhash wouldn’t exist, and we’d have more users to boot to catch bugs. This would in turn ensure that we have fewer transitive dependencies that are out of our control.

There is some tension though — rustc will often want a kind of “slimmed down” profile of otherwise general purpose crates. Yet another reason though that it’s good if we have some ownership stake in them, I suppose.

nikomatsakis · May 22, 2018, 5:55pm

cc @cbreeden — I see you are the author of the fxhash crate. In order to make progress with chalk integration, I kind of want to make use of it. Honestly, I could just make rustc use it directly, but I was curious what you would think about moving it over to the rust-lang or rust-lang-nursery organizations? (I’d sort of rather that we have the ability to patch and push new revisions if needed, etc, though you’re welcome to stay on as an administrator)

burntsushi · May 22, 2018, 6:12pm

I think I would also prefer this. As of now, I don't think there are any unvetted/unpolished Rust library crates in rust-lang. It would be nice to keep rust-lang as an indicator that the crate has achieved some minimum level of quality. The two library crates there now (libc and regex) have both gone through the RFC process. I don't necessarily believe that the RFC process is the desired standard here, but I think it should be somewhere between "let's just slap a crate in there" and "let's spend a couple years getting feedback and going through the RFC process."

nikomatsakis · May 22, 2018, 6:13pm

I’m fine with rust-lang-nursery.

crlf0710 · May 22, 2018, 7:03pm

Actually I’m very fond of the idea of a properly semver'ed libsyntax, which could serve as a reference implementation of a Rust source parser and be used by the community (and all the dev tools), if that’s possible at all…

Maybe that’ll impose too much of a maintenance burden, I don’t know. Just an idea.

nikomatsakis · May 22, 2018, 7:10pm

that’s certainly a … mid-term goal

mark-i-m · May 22, 2018, 9:21pm

Perhaps create a new rust-lang-rustc org for these to live under?

nikomatsakis · May 22, 2018, 10:28pm

I considered that, but it’s already so annoying to deal with two orgs, I don’t really want to deal with three…

nrc · May 22, 2018, 11:19pm

Yeah, there is already a mirror of libsyntax on crates.io (rustc_ap_syntax) which is used by rustfmt and Racer. It would be cool if we could make that the source of truth rather than a mirror. Making libsyntax stable is a bit challenging, but if it only had to happen once it should be OK. Separating out libsyntax tests would be another challenge. Perhaps we could start this with some of the deps of libsyntax?

CAD97 · May 23, 2018, 4:58am

Since “stable” out-of-tree libsyntax was mentioned, see also @matklad’s libsyntax2 experimental project (RFC, repo, LALRPOP)

michaelwoerister · May 23, 2018, 2:24pm

Sounds like a good idea for code that doesn’t change much anymore. However, fixing bugs in external code requires two pull requests and two reviews. There’s also the chance of reviewers not being as strict for crates outside of rustc, so I’d proceed with care.

nikomatsakis · May 23, 2018, 6:00pm

Well, it depends. Often it is just one PR, which lands without bors, followed by a simple bump of the minimum version in Cargo.toml. But the larger point definitely stands: coordinating updates is more work.

nikomatsakis · May 23, 2018, 6:01pm

Anyway tbh my primary immediate concern is the FxHashMap stuff, which chalk uses. I see a few options here:

Make chalk use fxhash but leave compiler untouched
Make compiler use and re-export fxhash
Create our own fxhash-like crate (rustc-hash?) in rust-lang-nursery and have compiler + chalk use that

I’m leaning towards the third option right now.
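
For context on what such a crate would contain: an FxHashMap is essentially the standard HashMap with a faster, non-cryptographic hasher plugged in. Here is a minimal sketch, assuming a simplified Fx-style hasher. The names SketchFxHasher, SketchFxHashMap, and word_counts are hypothetical, and the byte-at-a-time `write` is a simplification rather than what rustc or fxhash actually does.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// 64-bit multiplier in the style of the Fx hash (assumed here for illustration).
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Simplified Fx-style hasher: fast, but not resistant to hash-flooding attacks.
#[derive(Default, Clone, Copy)]
pub struct SketchFxHasher {
    hash: u64,
}

impl SketchFxHasher {
    #[inline]
    fn add_to_hash(&mut self, word: u64) {
        // Rotate, xor in the new word, then multiply by the seed.
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for SketchFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Simplification: mix in one byte at a time instead of word-sized chunks.
        for &b in bytes {
            self.add_to_hash(u64::from(b));
        }
    }

    fn write_u32(&mut self, i: u32) {
        self.add_to_hash(u64::from(i));
    }

    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i);
    }

    fn write_usize(&mut self, i: usize) {
        self.add_to_hash(i as u64);
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// A std HashMap that uses the sketch hasher instead of the default SipHash.
pub type SketchFxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<SketchFxHasher>>;

/// Example use: count whitespace-separated words.
pub fn word_counts(text: &str) -> SketchFxHashMap<&str, usize> {
    let mut counts = SketchFxHashMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Because the map is only a type alias over the standard HashMap, whichever option we pick mostly comes down to where the hasher lives and who gets to publish new versions of it.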

MajorBreakfast · May 23, 2018, 7:32pm

A bit more creativity, please: "chalk" and "polonius" are good names. I'm sure you can come up with something interesting: "flashhash", "ultrahash" or something else that does or does not contain the word "hash" ^^'

gbutler · May 23, 2018, 7:42pm

TripleRMap (Rust Ricky Resin)?

est31 · May 23, 2018, 8:47pm

I think factoring out generic bits and pieces of the code and moving it into separate crates is a great idea.

The great drawback, which was already mentioned, is that atomic changes across the entire codebase become harder. I think, though, that components can be refactored to be generic rather than relying on a multitude of rustc’s internals.
