Dev-dependencies may not even be present in the normalized Cargo.toml. For example, if you use a path for a dev-dependency without also specifying a version, it gets omitted entirely when building the package.
I am sympathetic to your plight. As package maintainers for Rust crates in Fedora Linux, we often encounter the same issues while reviewing crate contents, and we also file pull requests to exclude problematic files from published crates.
Big however: In my experience, the inverse problem happens much more frequently, i.e. not enough files are included in published crates for what we need - and I fear that including fewer files by default would exacerbate this issue.
The most frequent problem is that license texts - especially for licenses where including the license text is mandatory - are missing, which is a hard blocker for packaging purposes. It appears that there are multiple open RFCs for cargo that could improve this situation with better Cargo.toml metadata or cargo behaviour (cargo#5933, cargo#16666, cargo#16893, cargo#12053, cargo#9972, etc.), some of which are three Editions old.
The second most frequent issue is that published crates don't contain input files for tests. We run cargo test during packaging (where possible) to make sure we don't ship software that is completely broken. I understand that most developers exclude test input files larger than a few kilobytes from published crates - this is a reasonable decision. However, it currently appears to be "best practice" (or at the very least, "usual practice") to make extensive use of the include_bytes! or include_str! macros to embed that test data in test code, which means that cargo test fails to compile (due to missing input files) instead of failing the test at runtime. Tests that harmlessly fail at runtime are easy to deal with (cargo test -- --skip test::name), but if cargo test fails fast because the tests do not compile, that is much more annoying to deal with.
In both cases, I fear that the proposal here (include fewer files by default) would make things worse, while only having a small positive effect in most other circumstances.
Tangential to the rest of the thread, but I think that extensive use of include_bytes! or include_str! for test data should be considered poor practice. It has several disadvantages:
- Fails compilation if the file is missing, as you note.
- Requires recompilation whenever the file contents change. (Only relevant when working on that specific test.)
- Passes the entire file contents through rustc and the linker, which is unnecessary effort and puts copies of the file in intermediate artifacts and the binary. (Only relevant if the file is large.)
I expect that the main reason people use include_bytes! is that it is very convenient: it is concise, available with zero imports, and takes a file-relative path. It's also "more statically checked" in that it is guaranteed not to fail at run time, but that creates a packaging hazard, as you have seen.
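For comparison, here is a minimal sketch of reading test data at run time instead, so that a missing file skips the test rather than breaking compilation. The helper names (test_data_path, load_test_data) and the file tests/data/sample.bin are hypothetical, and the sketch assumes the test is run through cargo, which sets CARGO_MANIFEST_DIR in the test process's environment.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve a path relative to the crate root (hypothetical helper).
/// Falls back to the current directory if cargo did not set the variable.
fn test_data_path(relative: &str) -> PathBuf {
    let root = std::env::var_os("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."));
    root.join(relative)
}

/// Read a test input file. Returns Ok(None) if the file does not exist,
/// e.g. because it was excluded from the published crate.
fn load_test_data(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

#[test]
fn parses_sample_input() {
    // Hypothetical input file; adjust to the crate's own layout.
    let path = test_data_path("tests/data/sample.bin");
    let Some(data) = load_test_data(&path).expect("I/O error reading test data") else {
        eprintln!("skipping: {} is not present", path.display());
        return;
    };
    assert!(!data.is_empty());
}
```

The trade-off is the one described above: this needs a few lines of boilerplate and a crate-root-relative path, and a typo in the path shows up as a silently skipped test rather than a compile error.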
I disagree. We use include_bytes! pretty extensively in RustCrypto by encoding KATs into a custom binary format, including the resulting blob, and parsing it at compile time (see the blobby crate).
What do you propose to use instead? Keep thousands of KAT lines as part of source code? Or use file IO to read KATs at runtime? We want to have compact tests, keep the compile time checks (e.g. that the blobs have expected format) and to be able to execute tests on no_std targets (with some minor tweaking).
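To illustrate what a compile-time format check on an embedded blob can look like, here is a rough sketch. It does not use the blobby crate or its format: the layout (one count byte followed by length-prefixed records) and the names is_well_formed and KAT_BLOB are made up for this example, and the blob is written inline so that the snippet is self-contained. In a real crate the constant would come from include_bytes!.

```rust
/// Check a hypothetical blob layout at compile time:
/// byte 0 is the record count, followed by records of the form [len, data...].
const fn is_well_formed(blob: &[u8]) -> bool {
    if blob.is_empty() {
        return false;
    }
    let count = blob[0] as usize;
    let mut pos = 1;
    let mut seen = 0;
    while pos < blob.len() {
        let len = blob[pos] as usize;
        pos += 1 + len; // skip the length byte and the record body
        seen += 1;
    }
    // The records must end exactly at the end of the blob.
    pos == blob.len() && seen == count
}

// Stand-in for `include_bytes!(...)`: two records, "abc" and "z".
const KAT_BLOB: &[u8] = &[2, 3, b'a', b'b', b'c', 1, b'z'];

// Evaluated by the compiler: a malformed blob fails the build, not the test run.
const _: () = assert!(is_well_formed(KAT_BLOB));
```

Nothing here needs std or file I/O, which is why this style also works for tests on no_std targets.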
Now, I agree that excluding only the blobs and thus breaking the tests is not correct and should be discouraged. For example, in cmac we exclude both the "large" blob and the module that uses it.
Running tests on no_std targets is a strong reason to use include_bytes!(). But most projects do not support running tests outside the host environment at all. I’m not saying that include_bytes!() should always be avoided; rather, that it has significant disadvantages and should be used thoughtfully rather than for all test data.