Crate dependency discovery

Is there a way to discover a crate's dependencies before compiling it? I am still wrapping my head around Rust's compilation and linking model, so maybe I am missing something obvious, but so far I couldn't find anything relevant.

Currently (AFAICS) a crate's root source file can have the following dependencies (a sketch illustrating all four follows the list):

  • Module source files (mod foo;).

  • Other crates (--extern foo=...).

  • Native libraries (-lfoo).

  • Files included with include_str!.
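
To make this concrete, here is a sketch of a root source file exhibiting all four kinds (not compilable as-is: foo.rs, the util crate, and zlib are stand-ins):

mod foo;        // module source file: src/foo.rs or src/foo/mod.rs

use util::f;    // another crate, passed as --extern util=...

#[link(name = "z")]    // native library, i.e., -lz
extern "C" {
    fn zlibVersion() -> *const std::os::raw::c_char;
}

// File included at compile time.
const README: &str = include_str!("../README.md");

fn main() { f(); }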

I am particularly interested in the first two. Before the 2018 edition I could probably write a scanner (or, better yet, rustc could provide a mode like this) that extracted this information from the mod foo; and extern crate foo; declarations without needing foo.rs or libfoo.rlib to exist. Now, with extern crate being optional, I am not sure this will be easy/possible.
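
For illustration, before 2018 such a scanner could have been as naive as the following sketch (it ignores cfg attributes, comments, visibility modifiers like pub mod, and inline mod blocks with bodies):

use std::fs;

// Collect `mod foo;` and `extern crate foo;` items from a crate root.
fn scan(path: &str) -> (Vec<String>, Vec<String>) {
    let src = fs::read_to_string(path).expect("read crate root");
    let (mut mods, mut crates) = (Vec::new(), Vec::new());
    for line in src.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("mod ") {
            if let Some(name) = rest.strip_suffix(';') {
                mods.push(name.trim().to_string()); // needs name.rs or name/mod.rs
            }
        } else if let Some(rest) = line.strip_prefix("extern crate ") {
            if let Some(name) = rest.strip_suffix(';') {
                crates.push(name.trim().to_string()); // needs libname.rlib
            }
        }
    }
    (mods, crates)
}

fn main() {
    let (mods, crates) = scan("src/main.rs");
    println!("modules: {:?}, crates: {:?}", mods, crates);
}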

As to why someone would need this: more precise dependency information would allow more accurate change detection, better parallelism (no need to wait on crates we don't need), and support for auto-generated source code as part of the main build step (as opposed to a pre-build step). To illustrate the first point, consider this setup:

cargo new --lib util
cargo new --bin prog

# Use util in prog.
#
echo 'util = {path="../util"}' >>prog/Cargo.toml
echo 'extern crate util;' >>prog/src/main.rs

# Add library crate to prog.
#
touch prog/src/lib.rs 

cd prog
cargo build

touch ../util/src/lib.rs
cargo build -v

The last command's output will be along these lines:

rustc ... ../util/src/lib.rs
rustc ... src/lib.rs --extern util=...
rustc ... src/main.rs --extern util=... --extern prog=...

As you can see, prog/src/lib.rs is recompiled even though it does not depend on util.

Interestingly, --emit=dep-info -Z binary-dep-depinfo takes a step in this direction by omitting (from the .d file) references to crates that are not actually used. However, it still expects the complete crate metadata for the crates that are used (I tried to pass a dummy .rmeta for an empty crate with the same name but that didn't work out).
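
For reference, with plain --emit=dep-info the .d file for a single-file crate looks along these lines (Makefile syntax; exact contents vary by version and flags):

$ rustc --emit=dep-info main.rs
$ cat main.d
main.d: main.rs

main.rs:

With -Z binary-dep-depinfo, paths to the used crates' .rlib/.rmeta files appear as prerequisites as well.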

I think that's not possible in general, as mod declarations might be created "on the fly" with macros.

1 Like

mod declarations might be created "on the fly" with macros

In fact, crates like automod generate mod declarations from the filesystem (please don't do this...), because some people really, really hate writing mod.
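
To illustrate (assuming the automod crate; the path is relative to the directory containing Cargo.toml):

// Expands to one `mod <name>;` item per .rs file in the directory.
automod::dir!("src/handlers");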

3 Likes

I wonder if allowing/supporting this at the expense of being able to do static dependency discovery was a conscious decision?

FWIW, we just went through a similar issue for C++20 modules where an import can no longer be the result of a macro expansion (I realize these are different macros but the underlying problem is the same; see P1703 for details). So, in this regard and IMO, C++ has a saner model here. And if C++ is saner than you, that's not a good sign ;-).

3 Likes

I imagine the first step in changing that would be a warn-by-default clippy lint for mods being introduced by macros.

Even if it never becomes officially deprecated, that could discourage its use as something which constrains what tooling can be written and applied to your project.

1 Like

With my IDE writer hat on, I'd love to see this!

With my Rust user's hat on, I feel that would be a rather painful restriction, as things like cfg_if! and even #[cfg] (which is sort of a macro) wouldn't work.
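
For example, this common pattern would be affected (a sketch assuming the cfg_if crate and that the referenced module files exist):

#[cfg(unix)]
mod sys_unix;

cfg_if::cfg_if! {
    if #[cfg(windows)] {
        mod sys_windows;
    } else {
        mod sys_fallback;
    }
}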

3 Likes

I don't think banning the use of #[cfg], etc., with mod would be desirable or necessary if this dependency discovery were implemented by rustc (again, in C++ land we can still have import inside #if/#endif).

1 Like

We can examine the spans of the emitted tokens to see if they were in the original file, and if needed the surrounding tokens can be sanity checked too.

Generating mods from macros is occasionally quite useful in tests as a way to namespace them. I've used the pattern a fair amount. Example: https://github.com/BurntSushi/byteorder/blob/058237661a88c2f610f280340310fce19d98d265/src/lib.rs#L2455
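
A condensed sketch of the pattern (hypothetical names; the linked code applies the same idea per integer type):

macro_rules! test_suite {
    ($name:ident, $ty:ty) => {
        mod $name {
            #[test]
            fn roundtrip() {
                // Placeholder; a real suite would exercise type-specific code.
                assert_eq!(<$ty>::default(), <$ty>::default());
            }
        }
    };
}

test_suite!(u16_tests, u16);
test_suite!(u32_tests, u32);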

3 Likes

If I understand correctly, these are inline modules, which don't pose the same problem (though having different rules could be undesirable).

2 Likes

Thinking some more on it, I don't see how this can even work: if I add a file to a directory, how does cargo know to recompile the crate? My understanding is that, since nothing listed in the .d file has changed, cargo will assume the crate is up to date. Or do I need to run cargo clean && cargo build manually for this to work?

Thinking some more on it, I don't see how this can even work: if I add a file to a directory, how does cargo know to recompile the crate?

That's why I recommended against it, but the fact is any solution would break things like automod, so either can't be done or would require deprecation.

C++ had to solve this problem only because it requires declarations up front even just to parse their uses. That's what forces their module parse order. (Secondarily, they also use smaller TUs with a separate link step, forcing the build system to handle this ordering even within a single linked artifact.)

Rust does not have this problem at all: you can parse files in any order you like, even across crates! And because its TUs are whole crates, post-parse cross-module dependencies are just worked out by the compiler as part of normal execution. All cross-crate dependencies are just specified manually to the build system. (They already were pre-2018 for versioning anyway; this duplication is why extern crate was removed.)

You can make all the accuracy and throughput improvements you want to the build system without changing this model. (Though @matklad is right that some restrictions could potentially simplify IDEs and incremental compilation.)

4 Likes

Well, C++ modules were designed when all the advanced build tooling already existed, so module-related problems like quick/early dependency extraction or deep dependency trees could be perceived as regressions rather than the norm.

Rust wasn't generally designed with large-scale projects / quick rebuilds / highly parallel builds / distributed builds in mind. So what is a regression in C++ is just an everyday matter in Rust.
People have recently started working on some improvements, though; incremental compilation has also been continuously improved over the last few years.

I suspect the key difference from C++ is that any fundamental build system improvement in Rust will have to go through compiler support first.

1 Like

Hm, this hasn't been my experience at all. Specifically, generating the dependency information (--emit=dep-info), which feels like it should need the bare minimum of parsing, requires full extern crate metadata. For example, if I change prog/src/main.rs from my example to read:

fn main() {
    util::f();
}

where util::f() is defined in the util crate, and then try to generate the dep info, I get an error unless I provide the complete metadata for the util crate:

$ rustc +nightly -Z binary-dep-depinfo --emit=dep-info main.rs
error[E0433]: failed to resolve: use of undeclared type or module `util`

So when you say "can" in the above quote, do you mean "theoretically possible" because the language is designed in such a way as to allow parsing without knowing what the elements in paths actually refer to? Sorry, I don't have enough knowledge of the language to answer this myself. Though my intuition suggests that this would be very error-prone (is util an extern crate, or did the user just misspell a module/type name?).

1 Like

Looks like in the case of --emit=dep-info, compilation goes further than strictly necessary.
It only needs to expand macros (that is, resolve macros and imports, and load metadata from crates that define macros), not resolve things like function calls.

Even for -Z binary-dep-depinfo we can avoid resolving util::f because we can see the dependency from --extern util=PATH if util is a crate. (Inferring dependencies from --extern options is a pessimization though, because some of them can actually be unused.)

One unfortunate thing may be determining the implicit panic/allocator/whatever runtime binary dependency, because that may require resolving everything and analyzing the whole tree of actually used dependencies.

That's the impression I am getting as well. I guess the point of my post is to confirm this and to see if there is any work/interest in improving things in this area. While I think early metadata availability and incremental compilation are great improvements, it feels like better parallelization (through more accurate dependency information) is lower-hanging fruit with a potentially more significant benefit, especially if having Multiple libraries in a cargo project becomes reality (I really don't believe manually tracking dependencies between them will scale).

That's pretty much the case for C++ as well, at least where the dependencies are concerned; implementing a conforming preprocessor (which is a minimum requirement for anything general-purpose) is a non-trivial effort.

Note that rustc still doesn't perform full compilation when --emit=dep-info is passed. E.g. type checking is never reached (type errors are not reported): Compiler Explorer.
So improving the situation here means moving the stopping point closer to the start of compilation and minimizing the work required to calculate dep info by querifying macro expansion and early name resolution.
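
A minimal sketch of the behavior being described:

fn main() {
    // Rejected by a full `rustc main.rs`, but `rustc --emit=dep-info main.rs`
    // succeeds, because type checking is never reached in dep-info mode.
    let _x: u32 = "not a number";
}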

3 Likes

I would say that the answer is yes and also no.

Yes for the large-projects part, because for a long time language development was driven mostly by needs and experience from developing the Rust compiler itself and an entire browser engine (sans the JavaScript runtime).

But also no, in the sense that Rust was (and still is) designed to be a build-agnostic language. I will use C++ as an example of the opposing approach. In C++ you have the power, but also the responsibility, to optimize for build times, incremental builds, and so on by structuring code into separate headers and source files. If you do this well then your builds are quick, incremental, and distributable. If you neglect this aspect then you suffer badly. Rust, on the other hand, does not want to put such a burden on the programmer and makes it an implementation detail of the compiler.

This is a short-term/long-term trade-off in my eyes. A build-agnostic system would perform worse in the short term, but as the compiler matures it can do very well in the end, without any extra work from the programmer. One important thing is that a build-agnostic system is more open to new ideas and different approaches.

Right now rustc is still pretty immature and is also limited by the LLVM build model.

5 Likes

To me, a C/C++ header/source "module" (and, in C++20, a real module) corresponds pretty closely to a Rust crate. And it's the user's responsibility to structure crates (i.e., decide on their granularity, dependencies, etc.).

In other words, Rust's "build-agnostic"-ness (as you put it) is at the crate level, not at the project-and-all-its-dependencies level. And so I don't see how improvements to rustc can address issues that stem from inter-crate dependencies.