So, thanks to all the hard work on ./x.py, we have the ability to pull in crates from crates.io and elsewhere. This is wonderful. However, we currently impose some pretty minimal requirements on said crates. As far as I know, we just check that they have suitable licenses.
Recently, rayon encountered some problems due to its use of compiletest-rs, which turns out to depend (on Windows) on the dbghelp library. This is because of miri, which uses backtrace. This strikes me as a kind of warning: our set of dependencies is growing rapidly, and I don’t know that we are paying any attention to it. This increases brittleness.
I am wondering if we should consider a whitelist on the transitive crates we employ. I feel like adding a new dependency to rustc should be a “more momentous” (but not necessarily very momentous) decision – perhaps one that we FCP.
Thoughts? Am I over-reacting?
UPDATE: @retep998 opened this issue describing the particular problem rayon hit. To be clear, I’m not saying that this particular issue is necessarily serious, but it does seem to me that the set of dependencies is growing rather rapidly. I’m not sure whether that growth is a problem in itself, though.
The “dependency allowed to exist” part of this recent post seems relevant, at least in the sense that there is certainly a use-case for this level of control.
Yes, it’s a dev-dependency. The problem we were encountering was Travis failures. We use it to check that things fail to compile when they should. I’m interested in other solutions to that problem, but that’s kind of a separate question.
Mostly what I’m trying to ask here is whether we should try to at least throw up some roadblock to adding new dependencies, so that we have a chance to look at them as they go in. I think we don’t have to be sticklers about it, but I suspect many of them are kind of “accidental”.
In this case, for example, miri depends on backtrace, but that’s not really needed from within the compiler; I think it’s used for prettier errors. We could have potentially made it an optional feature.
I would personally be a fan of trying out a whitelist for compiler dependencies (false positives through cargo/rustfmt/rls/etc shouldn’t affect this). The compiler dependencies end up being much trickier in practice I think than rls/cargo/etc because we continue to use them to generate programs, whereas cargo/rls are just standalone programs.
I’m also starting to get worried about our build times for the compiler itself. Whenever we put more dependencies in the compiler, those are crates we have to build, and it can take longer and longer to do that. I’ve just done some analysis which shows that we’re clearly regressing on how long it takes to bootstrap rustc over time; it’s just not clear to me what’s actually causing the problem. I don’t think pulling in tons of crates from crates.io helps much, however.
Could we perhaps try an explicit whitelist of allowed crates for a while and see how it goes? If it’s too onerous we can reconsider; otherwise, I’d agree that having a double check for “are you sure you wanted to add this dependency?” would be a good idea.
I'm in favor of trying it out. Is it hard to do? =) I'd hate to add more onto your plate personally, I'm not really following who hacks around the build system though.
Hm, it won’t be too hard, but I think it will be nontrivial.
We’ve already got dependency checking which verifies licenses, so it should mostly be a matter of adding an array next to EXCEPTIONS for a whitelist (and then checking the names of directories and such). The trick is that we only want to verify the dependencies of rustc itself, so we’d have to run cargo metadata to learn about the dependency graph. The build system already does that, though, so the logic could perhaps just be moved there.
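To make the idea concrete, here is a rough sketch of what such a check might look like. It is not the actual tidy code: the `WHITELIST` contents, the `DepGraph` type, the helper names, and the toy graph in `example_graph` are all made up for illustration, and a real check would build the graph from `cargo metadata` output rather than by hand. The point is only that the walk starts from the rustc crate, so dependencies of other in-tree tools are never visited.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical whitelist of crates the compiler may depend on.
/// The names are illustrative, not a real list.
const WHITELIST: &[&str] = &["rustc", "syntax", "log", "bitflags"];

/// Crate name -> direct dependencies. In a real check this would be
/// derived from `cargo metadata`; here it is written by hand.
type DepGraph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Collects `root` and every crate reachable from it (depth-first).
fn transitive_deps<'a>(graph: &DepGraph<'a>, root: &'a str) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        // `insert` returns false if the crate was already visited.
        if !seen.insert(name) {
            continue;
        }
        if let Some(deps) = graph.get(name) {
            stack.extend(deps.iter().copied());
        }
    }
    seen
}

/// Returns, in sorted order, every crate reachable from `root`
/// that is not on the whitelist.
fn unlisted_deps<'a>(graph: &DepGraph<'a>, root: &'a str, whitelist: &[&str]) -> Vec<&'a str> {
    transitive_deps(graph, root)
        .into_iter()
        .filter(|name| !whitelist.iter().any(|allowed| allowed == name))
        .collect()
}

/// A toy graph, NOT rustc's real dependency graph.
fn example_graph() -> DepGraph<'static> {
    let mut g = HashMap::new();
    g.insert("rustc", vec!["syntax", "log", "miri"]);
    g.insert("syntax", vec!["bitflags", "log"]);
    g.insert("miri", vec!["backtrace"]);
    g.insert("backtrace", vec!["dbghelp-sys"]);
    g.insert("installer", vec!["xz2"]);
    g
}
```

With this toy graph, `unlisted_deps(&example_graph(), "rustc", WHITELIST)` flags `backtrace`, `dbghelp-sys` and `miri`, while `xz2` is never reached because nothing under `rustc` depends on it. Starting the walk from a different root (say `installer`) would give a different answer, which is why the choice of root matters.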
I guess it’s more of a question of vetting. I don’t think a whitelist is sufficient. Before we upgrade any dependency we would want to make sure nothing fishy was added. Moreover, we don’t want to transitively trust huge numbers of crates, so we would need to then vet dependencies of dependencies, and so on (i.e. the whole DAG)…
That sounds like a huge effort, but it does come with a lot of benefits:
- we know exactly what we depend on and what we don’t
- we might get ideas for where to trim some of the excess weight
- it incentivizes not adding dependencies -> faster builds, smaller binaries
I’m happy to try to start this, but my bandwidth is a bit low ATM, so it might be kind of slow…
Yeah, I figured it was something like that, but that includes dependencies of every in-tree tool too. For instance, xz2 is only used by the installer to generate tarballs; it’s not in anything we ship to users.
Maybe we do want to track all that fully – that’s what I’m asking.
This does raise an interesting question: is xz2 actually ok to use in the compiler? The whitelist prototype based on cargo metadata can’t differentiate…