Dev-dependencies may not even be present in the normalized Cargo.toml. For example, if you specify a dev-dependency by path without also specifying a version, it is omitted entirely when the package is built.
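To make the consequence concrete, here is a toy model of that normalization step. The type and function names are hypothetical and this is not Cargo's own code. A dev-dependency declared only by path has no version requirement, so it is dropped, and any test that uses it cannot compile from the published package.

```rust
/// Hypothetical, simplified `[dev-dependencies]` entry (not Cargo's own type).
/// `version` is `None` for an entry written only as `{ path = "../helper" }`.
struct DevDependency<'a> {
    name: &'a str,
    version: Option<&'a str>,
}

/// Names of the dev-dependencies that survive into the normalized manifest:
/// entries without a version requirement are dropped during packaging.
fn packaged_dev_dependencies<'a>(deps: &[DevDependency<'a>]) -> Vec<&'a str> {
    deps.iter()
        .filter(|dep| dep.version.is_some())
        .map(|dep| dep.name)
        .collect()
}
```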
I am sympathetic to your plight. As package maintainers for Rust crates in Fedora Linux, we often encounter the same issues when reviewing crate contents, and we also file pull requests to exclude problematic files from published crates.
A big caveat, however: in my experience, the inverse problem happens much more frequently, i.e. published crates do not include enough of the files we need, and I fear that including fewer files by default would exacerbate this issue.
The most frequent problem is that license texts, especially for licenses where including the license text is mandatory, are missing, which is a hard blocker for packaging purposes. There appear to be multiple open RFCs for cargo that could improve this situation with better Cargo.toml metadata or cargo behaviour (cargo#5933, cargo#16666, cargo#16893, cargo#12053, cargo#9972, etc.), some of which are three Editions old.
The second most frequent issue is that published crates don't contain input files for tests. We run cargo test during packaging (where possible) to make sure we don't ship software that is completely broken. I understand that most developers exclude test input files larger than a few kilobytes from published crates, which is a reasonable decision. However, it currently appears to be "best practice" (or at the very least, "usual practice") to use the include_bytes! or include_str! macros extensively to embed that test data in test code, which means that cargo test fails to compile (due to the missing input files) instead of failing the affected tests at runtime. Tests that harmlessly fail at runtime are easy to deal with (cargo test -- --skip test::name), but if cargo test fails fast because the tests don't compile, that's much more annoying to deal with.
In both cases, I fear that the proposal here (include fewer files by default) would make things worse, while only having a small positive effect in most other circumstances.
Tangential to the rest of the thread, but I think that this should be considered poor practice. It has several disadvantages:
- Fails compilation if the file is missing, as you note.
- Requires recompilation whenever the file contents change. (Only relevant when working on that specific test.)
- Passes the entire file contents through rustc and the linker, which is unnecessary effort and puts copies of the file in intermediate artifacts and the binary. (Only relevant if the file is large.)
I expect that the main reasons people use include_bytes! are that it is very convenient: it is concise, available with zero imports, and takes a file-relative path. It's also "more statically checked" in that it is guaranteed not to fail at run time — but that creates a packaging hazard, as you have seen.
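As an illustration of the runtime alternative, here is a minimal sketch, assuming the test data lives somewhere under the package directory. `test_data_path` and `load_test_data` are hypothetical helper names; `std::fs::read` and the `CARGO_MANIFEST_DIR` variable that Cargo sets are real.

```rust
use std::path::{Path, PathBuf};

/// Resolves a test-data path relative to a base directory. In a real test the
/// base would be `env!("CARGO_MANIFEST_DIR")`, which Cargo sets when compiling.
fn test_data_path(base: &str, relative: &str) -> PathBuf {
    Path::new(base).join(relative)
}

/// Loads test data at run time. A missing file yields `None` instead of a
/// compile error; any other I/O error still panics with the offending path.
fn load_test_data(path: &Path) -> Option<Vec<u8>> {
    match std::fs::read(path) {
        Ok(bytes) => Some(bytes),
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => None,
        Err(e) => panic!("failed to read {}: {e}", path.display()),
    }
}
```

A test would call it as `load_test_data(&test_data_path(env!("CARGO_MANIFEST_DIR"), "tests/data/vectors.bin"))` (hypothetical path) and then either panic with a clear message or return early on `None`. Either way, a file missing from the published package becomes a runtime failure that `cargo test -- --skip` can work around, rather than a build failure.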
I disagree. We use include_bytes! pretty extensively in RustCrypto by encoding KATs into a custom binary format, including the resulting blob, and parsing it at compile time (see the blobby crate).
What do you propose to use instead? Keep thousands of KAT lines as part of source code? Or use file IO to read KATs at runtime? We want to have compact tests, keep the compile time checks (e.g. that the blobs have expected format) and to be able to execute tests on no_std targets (with some minor tweaking).
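For comparison, here is a rough sketch of the compile-time-checked approach. The record format (a one-byte length followed by that many bytes) is an invented stand-in, not the actual blobby format. A byte-string literal replaces `include_bytes!` so that the example is self-contained.

```rust
/// Hypothetical test-vector blob. In a real crate this would be
/// `include_bytes!("data/vectors.blb")`.
const BLOB: &[u8] = b"\x03abc\x02de\x00";

/// Returns true if `blob` splits exactly into length-prefixed records.
const fn is_well_formed(blob: &[u8]) -> bool {
    let mut i = 0;
    while i < blob.len() {
        let len = blob[i] as usize;
        i += 1 + len; // skip the length byte and the record body
    }
    i == blob.len()
}

// Evaluated at compile time: a malformed blob fails the build, not the test run.
const _: () = assert!(is_well_formed(BLOB));

/// Iterates over the records at run time. This assumes a well-formed blob;
/// `split_at` panics if a length byte points past the end.
fn records(blob: &[u8]) -> impl Iterator<Item = &[u8]> {
    let mut rest = blob;
    core::iter::from_fn(move || {
        let (&len, tail) = rest.split_first()?;
        let (record, tail) = tail.split_at(usize::from(len));
        rest = tail;
        Some(record)
    })
}
```

Nothing in the sketch needs `std`, which is in line with the no_std requirement mentioned above.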
Now, I agree that excluding only the blobs, and thus breaking the tests, is not correct and should be discouraged. For example, in cmac we exclude both the "large" blob and the module that uses it.
This is a strong reason to use include_bytes!(). But most projects do not support running tests outside the host environment at all. I’m not saying that include_bytes!() should always be avoided; rather, that it has significant disadvantages and should be used thoughtfully rather than for all test data.
It's good to see the discussions flourishing here.
Meanwhile, I've hopefully read through most of the issues and comments linked to the license problem mentioned by @epage. From what I gather, it mostly boils down to the fact that workspace crates "lose" license files during packaging if those files are located only in the workspace root. As I understand it, the comment about deciding against a glob pattern for matching licenses concerns copying files from the workspace root into the individual workspace crates to circumvent that issue, specifically when running cargo new (Symlink-or-copy `LICENSE` / `LICENSE-*` in workspace root when creating new member · Issue #13328 · rust-lang/cargo · GitHub).
I agree that this is not a good way to go about it, because it makes assumptions about which license belongs to which workspace crate and which license file corresponds to which SPDX entry, while ignoring that there may also be a custom license in the mix. So to solve this issue, I think the discussed approaches of linking SPDX identifiers to specific files (either with a table that maps each SPDX identifier to a license file path or, as also mentioned, with an adjustment of the SPDX standard to include the file path directly in the SPDX entry) are reasonable.
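The table-based idea could be sketched roughly as follows: a map from SPDX identifier to license file path, checked against the crate's license expression. The map stands in for a hypothetical manifest table (no such Cargo field exists today), and the expression parsing is deliberately naive. A real implementation would use a proper SPDX expression parser.

```rust
use std::collections::BTreeMap;

/// Naively extracts identifiers from an SPDX license expression such as
/// "(MIT OR Apache-2.0) AND LicenseRef-Custom": splits on whitespace and
/// parentheses and drops the AND/OR/WITH operators. Exception identifiers
/// after WITH are not distinguished from license identifiers here.
fn spdx_ids(expr: &str) -> Vec<&str> {
    expr.split(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .filter(|token| !token.is_empty() && !matches!(*token, "AND" | "OR" | "WITH"))
        .collect()
}

/// Identifiers from `expr` that have no license file mapped to them in the
/// hypothetical `files` table (SPDX identifier -> path inside the package).
fn missing_license_files<'a>(expr: &'a str, files: &BTreeMap<&str, &str>) -> Vec<&'a str> {
    spdx_ids(expr)
        .into_iter()
        .filter(|id| !files.contains_key(*id))
        .collect()
}
```

With an explicit mapping like this, a custom `LicenseRef-*` entry is handled the same way as a standard identifier, and no guessing based on file names is needed.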
Yet I believe that the glob-based matching of license files that I have described above (like pip does it) is not related to the solutions discussed as part of the workspace-license issue.
- In this Pre RFC we try to limit the number of files included by default in a crate during packaging.
- As part of this, we are only dealing with files that are already there, either as part of a single crate or as part of workspace crates.
- During packaging, I've proposed that we take only a sensible selection of files instead of all of them, to reduce crate size and ease supply chain reviews. What that sensible selection actually is should be figured out in this (Pre) RFC.
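One possible "sensible selection" could look like the sketch below. The rules are purely hypothetical: root-level `Cargo.toml`, `build.rs`, README and license files, plus everything under `src/`. The license prefixes only approximate globs such as `LICEN[CS]E*` and `COPYING*`. They are neither an existing Cargo default nor pip's exact pattern list.

```rust
/// Hypothetical license-file name prefixes, compared case-insensitively.
/// They roughly correspond to globs like `LICEN[CS]E*`, `COPYING*` and `NOTICE*`.
const LICENSE_PREFIXES: &[&str] = &["license", "licence", "copying", "notice"];

fn is_license_file(file_name: &str) -> bool {
    let lower = file_name.to_ascii_lowercase();
    LICENSE_PREFIXES.iter().any(|&prefix| lower.starts_with(prefix))
}

/// Hypothetical default selection for a package without explicit
/// `include`/`exclude` lists. `rel_path` is relative to the package root
/// and uses `/` as the separator.
fn included_by_default(rel_path: &str) -> bool {
    match rel_path.split_once('/') {
        // Files directly in the package root.
        None => {
            rel_path == "Cargo.toml"
                || rel_path == "build.rs"
                || rel_path.to_ascii_lowercase().starts_with("readme")
                || is_license_file(rel_path)
        }
        // Anything under `src/`; every other directory is left out.
        Some((top_dir, _)) => top_dir == "src",
    }
}
```

This particular selection drops `tests/` and `benches/`. That is exactly the trade-off raised in this thread on behalf of distribution packagers and crater.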
What we are not doing with the approach described in this Pre RFC:
- We are not adding any files from outside of a (workspace) crate. We are only reducing the number of included files to a subset of the files that are already there.
- We are not making any assumptions about which license belongs to which crate or which SPDX identifier. We are not categorizing licenses into SPDX licenses or custom licenses. License files in a crate root have been put there deliberately by the crate authors. License files in a workspace crate have been put there deliberately as well, to work around the current limitations of cargo's license handling.
So in my opinion the issue of workspace licenses can be handled independently from the topic described in this Pre RFC and is not a prerequisite. Adding a glob-based license file matching to make a sensible selection of license files to include in a crate is not a short term, lukewarm solution to the issue of workspace licenses and is not influencing any decisions in that process. As soon as a solution is found for the workspace license issue, the glob matching of license files can be (and has to be) replaced by pulling in clear information about which license files need to go where, and I think that should be easily possible. During that step, crate authors have to make adjustments to their file structures anyway. But since currently most (if not all) workspaces have to rely on workarounds to correctly ship license files, the glob-based matching should be a fitting approach alongside it.
As mentioned, the workspace-license discussion has been going on since 2018, so there is a good chance that I have not read about every aspect of that topic. Please let me know if I've missed a crucial part that affects what I have written above.
I hope this is not taken as carelessness on my part.
It is also very interesting to read about the use cases of tests that are shipped with crates. Personally, I've never run a test from a dependency in all my years of programming Rust, whether at my day job or in personal projects, so it's good to learn about what others are doing with them. Since I mostly work on macOS, I am not familiar with Linux package managers, and crater, on the other hand, feels like a somewhat mysterious thing looming in the background.
So it's great to get a bit more insight into these topics.
It's good to hear about other Rustaceans who go through the struggles of supply chain reviews as well.
As mentioned, the license issue you are describing is definitely a pain point, but I fear it is out-of-scope for this particular Pre RFC.
Regarding tests, I am not sure that reducing the number of files included by default will automatically lead to fewer working tests. Automatically checking the tests as part of the packaging/publishing process has been discussed in this thread as well, but I fear it might also be out of scope. I would also reckon that crate authors who are aware of the usefulness of tests in their crate will make sure to include the correct set of files.
Having more complex test setups should go hand in hand with a heightened awareness of why including additional test data in the shipped crate is important (or is not important). And here, requiring the authors to explicitly include these files should make sense in my opinion.
Globbing wouldn't ever be a solution to the workspace license problem. However, solving workspace licenses means Cargo would have a definitive way of knowing and bundling license files, making the globs obsolete. That is what needs to be engaged with.
I linked to issues not for the directly reported problem but the overlaps in use case and designs to make sure there is good alignment. While we may need to limit scope to move forward, we should consider the bigger picture to make sure we fit within it.
That should block publishing in my opinion, just as it would for normal dependencies.
I disagree, I think it is a strong argument against removing files by default until tests are run during publishing. Most crate authors are not aware, and because of the poor support today, crater and Linux distros are likely the only users of the tests currently. Let's not make that worse.
That would break publishing of serde. serde_json depends on serde, but serde depends on serde_json for doc tests. And serde_derive depends on serde for tests too (which in turn depends on serde_derive). At work we had a similar problem with a project that we wanted to publish to crates.io. Making use of this dev-dependency removal feature allowed us to unblock publishing.
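The serde situation can be modelled as a small publish-order check. Publishing crates one at a time only works if the graph of edges that must already exist on the registry is acyclic. This is a toy model with hypothetical names (`publish_order`, `Kind`), not how Cargo or crates.io actually validate a publish, and the edges used in the usage example are simplified from the description above.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Normal,
    Dev,
}

enum Mark {
    InProgress,
    Done,
}

/// Depth-first visit; returns false if a cycle is reached through `name`.
fn visit<'a>(
    name: &'a str,
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    marks: &mut BTreeMap<&'a str, Mark>,
    order: &mut Vec<&'a str>,
) -> bool {
    match marks.get(name) {
        Some(Mark::InProgress) => return false, // back edge: cycle
        Some(Mark::Done) => return true,        // already placed in `order`
        None => {}
    }
    marks.insert(name, Mark::InProgress);
    for &dep in &deps[name] {
        if !visit(dep, deps, marks, order) {
            return false;
        }
    }
    marks.insert(name, Mark::Done);
    order.push(name);
    true
}

/// Returns a publish order (each crate after the crates it requires), or `None`
/// if those requirements are cyclic. Dev edges only count when `require_dev` is set.
fn publish_order<'a>(edges: &[(&'a str, &'a str, Kind)], require_dev: bool) -> Option<Vec<&'a str>> {
    let mut deps: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(from, to, kind) in edges {
        deps.entry(to).or_default();
        let required = deps.entry(from).or_default();
        if kind == Kind::Normal || require_dev {
            required.push(to);
        }
    }
    let mut marks = BTreeMap::new();
    let mut order = Vec::new();
    for &name in deps.keys() {
        if !visit(name, &deps, &mut marks, &mut order) {
            return None;
        }
    }
    Some(order)
}
```

With the edges `serde_json -> serde` (normal), `serde -> serde_json` (dev), `serde -> serde_derive` (normal) and `serde_derive -> serde` (dev), ignoring dev edges gives the order `serde_derive`, `serde`, `serde_json`. Requiring dev-dependencies to exist first yields `None`, which is the deadlock described above.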
You could have a flag to opt out for those rare cases. Or perhaps atomically publish several crates from the same workspace together.
You can actually publish circular dependencies (or at least you used to be able to, though I think cargo removes them during packaging now): crates.io does not check version constraints on publish, only that the crate name exists, so you just have to publish a name-reservation version before the first real publish.
Cargo does check the version constraints though. We already had a version of all involved crates previously published.
Only during verification. For that reason (and IIRC others, though it's been a while), I commonly have to publish with --no-verify and just pray.
I am aware that you linked to the issue (and everything that is linked and cited from there) to draw attention to the bigger picture and all the conversations that have happened around licenses and license files. As you mentioned, it is important to keep these conversations in mind, and I went through them without limiting myself to information about the specific issue I linked in my post. I only included it because you mentioned earlier that the Cargo team had decided against using globs for license files.
Yet that discussion happened for a very different use case, as I tried to make clear in the post you replied to. I spent the first part of my post going into detail on this because I got the impression that this specific detail is currently considered a blocker for the changes laid out in this Pre RFC.
The rest of the post deals with the general relationship between the issue of workspace-licenses and this Pre RFC and how I think they can exist separately. I've tried to discuss how making a decision here would not affect or limit the possibilities for a future solution of the workspace-license issue. I've included why I think a glob-based license file matching fits in the current state of workspace licenses and would be a reasonable approach to figure out which license files to include in a crate by default.
Do you think my post is going in the direction of analyzing overlaps in use case and design or would you wish for something else in that regard? I don't claim to have covered every detail, but in my opinion, it should at least be a good start to compare both issues.
I absolutely get your point, and I understand why running tests would alleviate that problem. What I meant by "out of scope" is that this would likely require a separate change (maybe even an RFC) that would block the changes described here from being implemented.
Btw I just had a look at the crater run for the latest 1.95.1 beta (the full report). To my knowledge, these reports are currently the best way to gain insight into crater runs and how they are utilized in the Rust ecosystem, is that correct?
I was wondering whether someone has already looked into, e.g., the "build fail" category to understand which targets are failing and why (of particular interest to us: failures to build test targets). Also, I have yet to find documentation for the different categories in the report. Have I not looked in the right place yet, or is that something that still needs doing?
There are two parts here
- I called it a short-term solution and you said it wasn't, but I'm not really seeing a justification as to why. Re-iterating the structure of your post does not help.
- Your responses tend to be very verbose and take a while to get to the core concern raised by the other person, to the point that I've been wondering how much is being written by an AI and whether it is worth engaging. If you are using an AI, please don't. If this is you, then keep in mind that technical writing has the concept of the Inverted Pyramid structure, where you make your core point first and then slowly expand from there, so people can't miss your point and can get as much detail as they need.
I can assure you that I never use LLMs when writing my posts. But I gotta say it is quite disheartening to get such a presumptuous reply.