This attack requires compromising the git repository. The security benefit is precisely to require that. It's not so hard to make the git repository more secure than the crates.io tokens are: just encrypt your SSH private key. Since we really want to secure our git repositories in any case, making crates.io at least as secure seems a major benefit, even if it's opt-in.
One can hope that the maintainers of the most widely used crates will take the effort to secure their repository keys.
When I said that, I had a different attack in mind, where a malicious actor publishes their own crate and their own repo, except the repo has source code that looks innocent while the crate contains something else. This is a way to mislead users who review source on GitHub instead of reviewing the actual crate source.
I am very worried about the insecurity of access tokens, but IMHO that should be addressed directly with 2FA on crates-io, and not with indirect and more complicated repo<>crate matching machinery. There's no guarantee that the repo is well protected, but crates-io could enforce its own 2FA.
Indeed, I think the most useful/least misleading thing would be if crates.io could tell you the SHA corresponding to a crate. Trying to attest to tags/branches seems absolutely the wrong thing to do.
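For illustration, here is a rough sketch of reading a commit SHA out of a published crate. It assumes the .cargo_vcs_info.json file that Cargo writes into a package built from a git checkout, with the hash stored under a "sha1" key. The function name recorded_sha is made up for this example, and the string scanning is a stand-in for a real JSON parser. Note that this value is self-reported by whoever uploaded the crate, so it is a hint for reviewers rather than something crates.io attests to.

```rust
/// Extracts the commit hash recorded in a crate's `.cargo_vcs_info.json`.
/// Hypothetical helper: it scans for the "sha1" key instead of parsing JSON,
/// which is good enough for a sketch but not for production use.
fn recorded_sha(vcs_info: &str) -> Option<&str> {
    // Everything after the first occurrence of the "sha1" key.
    let after_key = vcs_info.split_once("\"sha1\"")?.1;
    // Skip the colon, then the opening quote of the value.
    let after_colon = after_key.split_once(':')?.1;
    let after_quote = after_colon.split_once('"')?.1;
    // The value runs up to the closing quote.
    let (sha, _) = after_quote.split_once('"')?;
    // Only accept something that looks like a full 40-character hex SHA-1.
    let is_hex = sha.len() == 40 && sha.chars().all(|c| c.is_ascii_hexdigit());
    is_hex.then_some(sha)
}
```

A reviewer could then check out exactly that commit and compare it with the crate contents, instead of trusting a tag or branch name that can be moved later.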
Isn't it true that an attacker who has gained access to publish to crates.io is free to publish a version of a crate that references the attacker's own GitHub repo?
If the goal is to make it harder for attackers to publish malicious versions of an existing crate with compromised access tokens, I think implementing some sort of 2FA or staged uploads on the crates.io side would offer better protection than adding optional code verification against a git repository.
While not impossible, implementing code verification wouldn't be trivial either. Cargo modifies the source code of the crate when it generates .crate files (for example it moves the original Cargo.toml to Cargo.toml.orig, changing the contents of Cargo.toml), so we'd have to detect which version of Cargo was used to upload the crate (which is not possible for older Rust versions), download that version (potentially a nightly) and run cargo package on the repository. Cloning git repositories could also be really slow, clogging the crates.io background jobs queue when a lot of crates are published in a short amount of time.
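To make the comparison problem concrete, here is a minimal sketch of what the matching step might look like once both trees are available. It is not how crates.io works. The names differing_paths and CARGO_REWRITTEN are hypothetical, both trees are assumed to be loaded into memory as path-to-bytes maps, and the skip list assumes that Cargo.toml, Cargo.lock and .cargo_vcs_info.json are the only entries Cargo rewrites or adds, which may not hold for every Cargo version. Files that exist only in the repository are ignored, since packages routinely exclude files.

```rust
use std::collections::BTreeMap;

/// Files that `cargo package` rewrites or adds. This list is an assumption
/// for the sketch; the exact set depends on the Cargo version used.
const CARGO_REWRITTEN: &[&str] = &["Cargo.toml", "Cargo.lock", ".cargo_vcs_info.json"];

/// Returns the paths in the crate archive whose contents do not match the
/// repository checkout, skipping the files Cargo is known to rewrite.
fn differing_paths(
    crate_files: &BTreeMap<String, Vec<u8>>,
    repo_files: &BTreeMap<String, Vec<u8>>,
) -> Vec<String> {
    let mut bad = Vec::new();
    for (path, contents) in crate_files {
        if CARGO_REWRITTEN.contains(&path.as_str()) {
            continue;
        }
        // Cargo keeps the author's manifest as Cargo.toml.orig, so compare
        // that file against the repository's Cargo.toml.
        let repo_path = if path == "Cargo.toml.orig" {
            "Cargo.toml"
        } else {
            path.as_str()
        };
        // A file that is missing from the repository also counts as a mismatch.
        if repo_files.get(repo_path) != Some(contents) {
            bad.push(path.clone());
        }
    }
    bad
}
```

Skipping the rewritten files sidesteps the need to reproduce the exact Cargo version, at the cost of not checking the normalized manifest at all. A stricter check would have to rerun cargo package as described above.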
I'm a little confused on the proposed logistics here. Assume that "cargo publish" did the above-described validation. What would happen in the following scenario:
I, a malicious crate publisher, insert some malicious code into my crate. I commit the malicious code via mercurial/pijul/git/whatever. Due to the aforementioned validation, I'm required to push this code to some public repo before publishing. I do that. (I assume I also have to record the name of a version control reference in my Cargo.toml file, for example the name of a git tag.) Then I run cargo publish.
Crates.io on the backend will validate that the publicly available code matches what is uploaded in the .crate file (taking into account the differences described by pietroalbini).
But after that validation happens, I immediately delete the commit/tag that I just published, and perhaps publish a new one in its place (one that doesn't have the malicious code).
What happens now? Anyone can still download the crate file to inspect the code (edit: but the initial code verification is now useless). Will there be additional tools that will allow a downstream user of this malicious crate to say "the code uploaded at package time used to match some public commit, but now that commit cannot be found"? This is too expensive to do every time a crate is downloaded. Maybe cargo-crev could learn to do this?
FWIW, I am somewhat skeptical of this code-comparison approach, and I also think that 2FA or something similar is the better approach to protect against the misuse of compromised crates.io credentials.
Not if my idea is implemented and a previous version referred to the repository. The idea is to use the previous version's repository configuration to prevent that from happening.
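One possible reading of this idea, as a small sketch: at publish time, compare the repository declared by the new version with the one declared by the previous version, and refuse the upload if they differ. The function names repository_unchanged and normalize are invented for this example, and the URL normalization (dropping a trailing slash and a trailing .git) is an assumption about what should count as the same repository.

```rust
/// Strips cosmetic differences so that two spellings of the same repository
/// URL compare equal. Which differences are cosmetic is an assumption here.
fn normalize(url: &str) -> &str {
    url.trim().trim_end_matches('/').trim_end_matches(".git")
}

/// Hypothetical publish-time rule: once a version has declared a repository,
/// later versions must keep declaring the same one.
fn repository_unchanged(previous: Option<&str>, new: Option<&str>) -> bool {
    match (previous, new) {
        // No earlier version declared a repository, so there is nothing to pin to.
        (None, _) => true,
        // Both versions declare one: they must point at the same place.
        (Some(old), Some(new)) => normalize(old) == normalize(new),
        // Dropping the repository field would bypass the check, so reject it.
        (Some(_), None) => false,
    }
}
```

With a rule like this, a stolen crates.io token alone would not be enough to repoint the crate at an attacker-controlled repository. A legitimate repository move would need some separate, more strongly authenticated path.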
I agree that would provide more security if it were used. If it were optional, it might well provide less security. Security approaches that are complicated or inconvenient aren't used.
Not crates.io itself, but certainly a Rust-specific crate source viewer website would be an excellent place to implement Rust-only features like running rust-analyzer or building registry-wide crossreferences to symbols. If this site became the standard for casual source review of Rust crates, then the "I'm not actually looking at the published source" problem goes away.
What is the right place to submit feature requests for this? For example, when pointing to a particular function, it would be good to be able to link to a line in the code, similar to what the rustdoc, GitHub or GitLab source viewers do.
I don't know how useful it is to have both the docs.rs source view and the auto-generated rustdoc source. Docs.rs will never be able to offer go-to-definition or anything like that, since it doesn't have any info about the code itself. IMO what we should do instead is make it easier to navigate between the Rust source that rustdoc generates and the non-Rust code hosted by docs.rs that rustdoc ignores.
(To be clear, when I say non-Rust code, I mean things like README.md, build.rs, Cargo.toml.)
I can neither remember the name of the project nor find it quickly, but I recall a project from a while back that generated static webpages where the code contained go-to-definition style links, type annotations on hover, and similar. It was a clever thing that I wish were more widely used. (Yes, I know not remembering the name isn't helpful.)
I would presume something like this could theoretically be tied into the docs.rs build setup, though it would certainly extend the already stressed build times.
Rustdoc excludes source files that aren't used while compiling the current crate. That includes source files used on different platforms, build scripts and files read by build scripts.