I like everything in this post a lot.
I especially think it's insightful that shipping precompiled artifacts through crates.io is probably going to provide the biggest compile-time win on average, rather than improving compiler performance directly. It's something we've thought about, but I haven't seen its impact relative to the other things we're working on pointed out in this way before. It's a complicated project, though.
Can we actually measure this? I fear that the win wouldn't be as big as some might hope, since most time is spent on codegen, and a significant amount of codegen happens in the leaf crate due to monomorphization. Note that precompiled artifacts won't affect the edit-build-run cycle, only the first iteration of it.
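To make the monomorphization point concrete, here's a minimal sketch. The `upstream` module stands in for a crates.io dependency (in a real build it would be a separate crate compiled into an rlib); the module and function names are invented for illustration:

```rust
// Sketch of why a precompiled rlib doesn't eliminate leaf-crate codegen.
// `upstream` stands in for a crates.io dependency.
mod upstream {
    // A generic function ships in the rlib as an uninstantiated blueprint:
    // each concrete instantiation is code-generated in the crate that uses it.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
        let mut best = items[0];
        for &item in &items[1..] {
            if item > best {
                best = item;
            }
        }
        best
    }

    // A non-generic function, by contrast, is fully compiled into the rlib
    // and merely linked by downstream crates.
    pub fn answer() -> u32 {
        42
    }
}

fn main() {
    // These calls force `largest::<i32>` and `largest::<f64>` to be
    // monomorphized and code-generated in *this* (leaf) crate, so a cached
    // rlib for `upstream` saves front-end work but not this codegen time.
    assert_eq!(upstream::largest(&[3, 7, 2]), 7);
    assert_eq!(upstream::largest(&[1.5, 0.5]), 1.5);
    assert_eq!(upstream::answer(), 42);
    println!("monomorphized instances computed in the leaf crate");
}
```

Heavily generic dependencies (serde, futures, etc.) therefore see less benefit from precompiled artifacts than mostly-concrete ones.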
That said, I’ve recently experimented with adding caching of `target/` for crates.io dependencies on CI, and the wins were substantial.
Doesn’t this only help the first compile, though? That’s not generally the largest pain point, in my experience.
Or any time you add a dependency or update your compiler version. But @matklad is probably right that we’re being really impacted by monomorphization. So much of that library code is being compiled again and again because it’s monomorphized into the downstream crate.
Or any time you add a dependency or update your compiler version.
Or any time you do `cargo install` or `cargo install-update` for some tool pulling in half of crates.io to build, like
Note also this thread: [Idea] Cargo Global Binary Cache
I think it’ll get us 70% of the benefit of a shared crates.io cache with a small amount of design and implementation work.
Fun observation: there’s a desire to add an “rlib dependencies” feature to Cargo, to integrate with other build systems and such. This is a huge design space. However, just sharing the `target` dir for all crates.io work would give us the shared-cache aspect of rlib dependencies, and it’s very simple because the layout of the target dir is controlled by Cargo and is an implementation detail. Basically, we need to tweak a couple of lines in Cargo to say “if this dep comes from crates.io, compile it to `~/.cargo/target`” rather than to the project-local `target/` directory.
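One subtlety with a shared `~/.cargo/target` is that the cache location has to be keyed on everything that affects the compiled artifact, or builds with different compilers or flags would collide. A hypothetical sketch (this is not Cargo's actual fingerprinting code; all names here are invented):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical: the inputs that must distinguish one cached build of a
// crates.io dependency from another.
#[derive(Hash)]
struct UnitKey<'a> {
    name: &'a str,          // e.g. "serde"
    version: &'a str,       // e.g. "1.0.200"
    rustc_version: &'a str, // rlibs aren't stable across compiler versions
    target_triple: &'a str, // cross-compilation target
    profile: &'a str,       // debug vs release, opt-level, etc.
    features: &'a [&'a str],
}

fn shared_cache_dir(key: &UnitKey) -> String {
    // DefaultHasher is a stand-in; a real cache would want a stable,
    // collision-resistant hash.
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    format!("~/.cargo/target/{}-{}/{:016x}", key.name, key.version, h.finish())
}

fn main() {
    let old = shared_cache_dir(&UnitKey {
        name: "serde", version: "1.0.200", rustc_version: "1.78.0",
        target_triple: "x86_64-unknown-linux-gnu", profile: "debug",
        features: &["derive"],
    });
    let new = shared_cache_dir(&UnitKey {
        name: "serde", version: "1.0.200", rustc_version: "1.79.0",
        target_triple: "x86_64-unknown-linux-gnu", profile: "debug",
        features: &["derive"],
    });
    // Updating the compiler changes the key, so the cache re-fills instead
    // of serving an incompatible artifact.
    assert_ne!(old, new);
    println!("cache keys differ across compiler versions");
}
```

This is also why a shared cache still re-compiles everything after a toolchain update, as noted above.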
Let’s avoid getting too deep in the weeds on this thread. Good thing for the cargo team to discuss!
While I appreciate that a binary cache would speed up cargo, I’d really like to avoid having one enabled by default, and having people get used to the idea of downloading and running binaries.
For this specifically, you’re already downloading code, compiling it to a binary, and running that binary. Downloading and running a binary is no more problematic, as you still have to trust the distribution, unless you’re in the habit of auditing the actual code you’ve downloaded.
Said cache would definitely want to be the one producing the compiled binaries, rather than trusting uploaders to send the correct ones, so that you only have to trust one source instead of many; but you still have to trust the distributor.
I don’t see how downloading a binary is any more dangerous than downloading the code and compiling it (other than the local CPU time). If the host is compromised (knock on wood), it can alter either. If hashes from a separate source are used, they prevent tampering with either; if the hashes come from the same source, they prevent neither.
Yes, downloading unsandboxed code and running it is a security risk, and it’s one that’s only solved by trusting the source. But the risks are the same, pre-compiled or not. You can even verify that the pre-compiled binary is the same as what you get from building the code yourself when you audit it, if you’re actually auditing the code. If it’s already been built and a trusted source confirms it was built correctly, what would you gain from building it again?
(This is assuming it’s being built with the same flags of course, if you have a rare configuration you’d just build it locally.)
TL;DR: assuming reproducible builds, what risk does downloading a hash-verified binary have that a hash-verified source doesn’t? As far as I can tell, the trust problem is identical for both of them.
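The TL;DR can be made concrete: whether the downloaded bytes are source code or a compiled binary, verification against a digest pinned by a trusted index is the same check. A toy illustration (DefaultHasher stands in for a cryptographic hash like SHA-256, which a real registry would use; all data here is made up):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic digest of an artifact's bytes.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

// The verification step is byte-oriented: it neither knows nor cares
// whether the artifact is source code or machine code.
fn verify(downloaded: &[u8], pinned: u64) -> bool {
    digest(downloaded) == pinned
}

fn main() {
    let source = b"fn main() { println!(\"hello\"); }".as_slice();
    let binary = b"\x7fELF...precompiled bits...".as_slice();

    // The trusted index pins a digest for each artifact at publish/build time.
    let pinned_source = digest(source);
    let pinned_binary = digest(binary);

    // Identical check for both; tampering is caught either way.
    assert!(verify(source, pinned_source));
    assert!(verify(binary, pinned_binary));
    assert!(!verify(b"tampered bytes", pinned_binary));
    println!("source and binary verification are the same check");
}
```

The remaining difference, as the thread notes, is who computes the pinned digest for the binary: that only holds if the builds are reproducible or if the builder itself is the trusted party.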
You’re assuming reproducible builds and someone auditing the binaries to make sure that they were built appropriately.
Are we talking about having a central service build all Rust code once? And if so, how do we make sure that service (a high-value target) doesn’t get compromised? (Especially when it builds random Rust crates.)
That said, I can think of one advantage to building all Rust crates once. Imagine if, when a crate gets uploaded with `cargo publish`, it goes into a staging area until built, and if it doesn’t build (in an isolated environment with only the declared dependencies and no network access), it gets rejected.
But there are existing crates that have dependencies other than the declared ones. Like `openssl-sys`, which additionally depends on OpenSSL, and `winapi`, which depends on a bunch of closed-source DLLs. And those crates are useful, and you can’t legally bundle the source code for kernel32.dll.
`winapi` is quite capable of building without access to those proprietary DLLs.
And we need better solutions to handle dependencies on third-party libraries.
FYI: some popular source licenses don’t allow redistribution in binary form, or impose additional obligations in that case.
That wouldn’t be an Open Source license.
Primarily the obligation to distribute the corresponding source code, which any service hosting binaries of crates would have to do anyway.