Crate dependency discovery

If you'd like to see a bit of prior art in this area, I implemented Rust caching support in sccache a few years back. sccache is built to be a ccache replacement, so it operates at the compiler invocation level, relying on cargo to figure out all the crate dependencies and pass the proper command-line options. It simply invokes rustc --emit=dep-info to find all the source files involved in compiling a crate, but relies on the fact that cargo has already built any crate dependencies. I think it'd be great to figure out how to push cargo towards more of a bazel-like compilation model where it has enough information to model the full build up-front. That probably involves working out a replacement for the current build.rs support though, since build scripts are not at all declarative. (I looked into prototyping starlark integration into cargo at one point to see if we could try out replacing build scripts with bazel-style BUILD files, but it wound up being a bit of a slog and I never got it working.)
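To make the dep-info step concrete, here is a minimal sketch of reading the Makefile-style text that rustc --emit=dep-info writes and collecting the listed source paths. It is not sccache's actual code. The function names are hypothetical, and it assumes rules of the form `target: prereq prereq`, spaces in paths escaped with a backslash, and `#` comment lines that can be skipped.

```rust
use std::collections::BTreeSet;

/// Collect every prerequisite path named in Makefile-style dep-info text.
/// (Hypothetical helper; assumes the simple rule layout described above.)
fn sources_from_dep_info(dep_info: &str) -> BTreeSet<String> {
    let mut sources = BTreeSet::new();
    for line in dep_info.lines() {
        let line = line.trim();
        // Skip blank lines and comment lines.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((_target, prereqs)) = split_rule(line) {
            for path in split_escaped(prereqs) {
                sources.insert(path);
            }
        }
    }
    sources
}

/// Split "target: prereqs" at the first ": ", or accept a bare "target:".
/// Searching for ": " (with the space) avoids splitting inside "C:\...".
fn split_rule(line: &str) -> Option<(&str, &str)> {
    if let Some(idx) = line.find(": ") {
        Some((&line[..idx], &line[idx + 2..]))
    } else {
        line.strip_suffix(':').map(|target| (target, ""))
    }
}

/// Split on whitespace, treating a backslash followed by a space as a literal space.
fn split_escaped(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' && chars.peek() == Some(&' ') {
            chars.next();
            cur.push(' ');
        } else if c.is_whitespace() {
            if !cur.is_empty() {
                out.push(std::mem::take(&mut cur));
            }
        } else {
            cur.push(c);
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}
```

Note that this only lists the files of one crate. As described above, it still relies on something else (cargo, in sccache's case) having already built the crates it depends on.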

Thanks @luser for posting, so I don't feel bad for resurrecting a dead thread to chime in :wink:

This thread touches on a number of things that I'm interested in:

Firstly, I'd love to have a reliable way to discover all the sources of a crate without having to do a full build or have the dependencies available. This isn't possible in general because proc macros can make mod references up on the fly, but setting them aside, it seems to me that it should be possible, even if the mod declarations are conditional or produced by macros (since I don't think it's possible to do token pasting to construct module names). The current state of the art I know of is srcfiles, which works reasonably well within its limits (namely: Rust 2018 only, and no general discovery of macro-generated mod declarations).
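As a rough illustration of what source discovery without a build involves, here is a deliberately naive sketch that scans one file's text for out-of-line `mod name;` declarations and lists the two conventional file locations for each. This is not how srcfiles works, and the function names are hypothetical. It is line-based, so it ignores `#[path]` attributes, block comments, declarations split across lines, and anything produced by macros, which are exactly the hard cases mentioned above.

```rust
use std::path::{Path, PathBuf};

/// Return the names declared as out-of-line modules (`mod name;`) in `source`.
/// Inline modules (`mod name { ... }`) are skipped because they add no file.
fn declared_modules(source: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in source.lines() {
        let line = line.trim();
        if line.starts_with("//") {
            continue; // line comment
        }
        let rest = strip_visibility(line);
        if let Some(rest) = rest.strip_prefix("mod ") {
            if let Some(name) = rest.trim().strip_suffix(';') {
                let name = name.trim();
                let is_ident = !name.is_empty()
                    && name.chars().all(|c| c.is_alphanumeric() || c == '_');
                if is_ident {
                    names.push(name.to_string());
                }
            }
        }
    }
    names
}

/// Drop a leading `pub` or `pub(...)` so the `mod` keyword comes first.
fn strip_visibility(line: &str) -> &str {
    match line.strip_prefix("pub") {
        Some(rest) if rest.starts_with('(') => match rest.find(')') {
            Some(end) => rest[end + 1..].trim_start(),
            None => line,
        },
        Some(rest) if rest.starts_with(char::is_whitespace) => rest.trim_start(),
        _ => line,
    }
}

/// The two conventional locations for module `name` under `module_dir`.
fn candidate_paths(module_dir: &Path, name: &str) -> [PathBuf; 2] {
    [
        module_dir.join(format!("{}.rs", name)),
        module_dir.join(name).join("mod.rs"),
    ]
}
```

A conditional declaration such as `#[cfg(unix)] mod e;` on two lines is still reported here, since the sketch does not evaluate cfg at all. A real tool would have to decide whether to include every possible file or only those active for one configuration.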

Secondly, I'm also very interested in building Rust with Buck rather than Cargo, not only to build in a polyglot environment, but also to take advantage of distributed building and pervasive reliable caching.

It's also worth making the point that by eliminating extern crate, Rust has moved in the direction of requiring an external build system to manage crate dependencies; they can't be reconstructed from the sources themselves. I think this is an excellent direction, since it removes redundant information: the build system will always need that info, whereas rustc can easily get it from the build environment. Going the other way around is much more complex in general, and would inevitably result in a very Cargo-centric design.

Even if that were the case, many of its design decisions do actually help. For example: the decision to make it so that you need to explicitly declare direct dependencies rather than implicitly getting access to transitive dependencies. That's a design decision which could have gone the other way at an early stage - when it wouldn't have made much difference either way (or even made things "easier") - but would have been a disaster in retrospect and hard to undo (see: C++).

Rust's emphasis on build determinism (strictly versioned hash), while really intended to make incrementality tractable, also helps a lot with caching and distributed builds. If local variations creep in because the distributed build environments don't match up exactly, then we get an immediate signal that something has gone wrong rather than apparently successful but divergent builds (see: C++).

The crate design gives us the nice experience of not having to repeat everything in separate implementation and interface files, but it does lose build parallelism vs (non-modular) C++. Pipelining is one immediate solution for that, but there are a number of other hiccups which prevent us from extracting all the parallelism we'd like, such as monolithic linking (being addressed in https://github.com/rust-lang/rust/issues/64191) and proc macros, which end up being build bottlenecks (@dtolnay's watt is a prototype of using WebAssembly to help mitigate this).

Let's assume the multi-library crates are here (I think it's only a matter of time). Imagine a complex package (say, a web browser) with a large number of internal as well as external library crates. Each internal library may depend on some subset of internal and external crates. In the model you've described, maintaining accurate dependency information is the user's responsibility. Want to use crate foo in baz/lib.rs in addition to bar/lib.rs? You need to first go and declare this dependency in some other file.

An alternative approach could have been to continue declaring dependencies with extern crate in the same source file where one needs them and let the build system extract this information and sort out the dependencies between crates within the package.

I'm not sure what you mean by this. Do you mean a Cargo package which can contain multiple library crates?

If that's the case, I'm not sure it really makes a difference either way, since Cargo packages are not a Rust language-level concept, and a crate is still a crate.

This is awkward for a build system because it effectively requires dynamic dependencies - you can't construct the entire dependency graph at once - you need to build up the graph iteratively. If the build system handles dynamic dependencies and can cache them properly, then it may not be a huge deal - but not many do, and for the rest it would mean a two-phase process of generating a build description from the Rust sources, and keeping it up to date as the code changes. (And that's assuming the code is correct enough to extract dependency info - if not, it adds another level of iteration.)
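The iterative graph construction described above can be sketched as a worklist loop. This is only an illustration: `discover` and its `scan` callback are hypothetical names, and `scan` stands in for whatever would parse a crate's sources to find the crates it names.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Build the dependency graph reachable from `root` by scanning one crate
/// at a time. Each scan may reveal crates that were not known before, so
/// the full graph is only available once the queue has drained.
fn discover<F>(root: &str, mut scan: F) -> BTreeMap<String, Vec<String>>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut graph: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut queue = VecDeque::new();
    queue.push_back(root.to_string());

    while let Some(krate) = queue.pop_front() {
        if graph.contains_key(&krate) {
            continue; // already scanned via another path
        }
        let deps = scan(&krate);
        for dep in &deps {
            if !graph.contains_key(dep) {
                queue.push_back(dep.clone());
            }
        }
        graph.insert(krate, deps);
    }
    graph
}
```

With dependencies declared in a manifest, the same map can be read directly from the declarations before touching any source file, which is what lets a build system plan the whole build up-front.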

Ultimately the programmer needs to declare their dependencies. An extern crate only names a crate, but has no other metadata like source location or version info, so it would still need external metadata to provide that info - and if it has that external metadata, you don't need the extern crate.
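To illustrate where that external metadata ends up, here is a small sketch of a build system turning its declared direct dependencies into rustc's `--extern name=path` arguments. The `Dep` type, the `extern_args` function, and the file paths are hypothetical; only the `--extern` flag itself is rustc's.

```rust
use std::path::PathBuf;

/// A direct dependency as the build system knows it (hypothetical type):
/// the name used in the source, plus the exact artifact to link against.
struct Dep {
    name: String,
    rlib: PathBuf,
}

/// Produce the `--extern name=path` argument pairs for one rustc invocation.
fn extern_args(deps: &[Dep]) -> Vec<String> {
    let mut args = Vec::new();
    for dep in deps {
        args.push("--extern".to_string());
        args.push(format!("{}={}", dep.name, dep.rlib.display()));
    }
    args
}
```

The source only ever mentions the crate name. Which version or build of the crate that name resolves to is decided entirely by the path the build system passes in.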
