I want to compile rustc completely from sources, with no binary blobs.
This is partly to satisfy my curiosity. It's also because I am concerned about the long-term implications of relying so much on binary blobs for rustc development. (For example, I have not been able to deduce the complete security story of blobs pulled by rustup.)
Is there a known happy path to compile rustc entirely from sources?
My best understanding is that I'll need to iteratively bootstrap versions of rustc until I get a working stage0 compiler. I'll need to start with a C compiler to bootstrap that first version of rustc.
If nobody else has done this, I'm happy to take advice and run experiments.
There should be several write-ups around on how to bootstrap current Rust from an existing compiler such as GCC/g++.
Here's one description from 2018: https://guix.gnu.org/blog/2018/bootstrapping-rust/ and I hope it covers what you want to do: building Rust "from scratch". Building from OCaml would be a gigantic task, whereas the more recent approaches, like the one described here, have actually been done.
I've heard rumors that mrustc support for compiling Rust 1.29 is in progress @thepowersgang
Yeah, back in the early bootstrapped days you almost had to compile each merge commit with the previous one; there was no snapshotting like now ("every stable Rust is built with the previous stable Rust"), as far as I know. So that would likely mean hundreds of bootstrap builds, retracing all of Rust's history...
Is there something particular about Rust that causes you concern here? Do you trust the binaries that build your C compiler more for some reason? Is there something that C does better here that you'd like Rust to do?
(Not to mention the binary in your processor, or other such things...)
Sounds like mrustc is the way to go! I'll look into contributing to that project. Thanks everyone!
@scottmcm I'm not saying trust no binaries, I'm saying reduce the number that we have to trust. I'm aware of, and sympathetic to, security issues like the trusting trust attack. The fact that there are binary blobs in our CPUs doesn't imply that we want to use more binary blobs than we need to. (I believe this to be especially true in Rust's case, given the aforementioned questions around the security of rustup blob creation.) :-]
What I would find useful is if some small number of reputable companies would do an mrustc-based bootstrap from rustc 1.19.0 and, shortly after each official compiler release, post some small attestation that the official checksum matches the one that they get from their bootstrap chain. I think if Google, Facebook, Microsoft, and maybe one other reputable company were doing this as a community service, it would basically eliminate any concerns around simply using the official blobs even in use cases where that would ordinarily be far out of the question.
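To make the attestation idea concrete, here is a minimal Python sketch of the comparison step. It assumes each organization publishes the SHA-256 digest that its own bootstrap chain produced. The attester names, the `quorum` parameter, and both function names are hypothetical, and fetching or signature-checking the published digests is left out entirely.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_attestations(official_digest, attestations, quorum):
    """Compare independently obtained digests against the official one.

    `attestations` maps a (hypothetical) attester name to the hex digest
    that attester got from its own bootstrap chain.
    Returns (accepted, agreeing, disagreeing). `accepted` is True only when
    no attester disagrees and at least `quorum` attesters agree.
    """
    official = official_digest.lower()
    agreeing = sorted(n for n, d in attestations.items() if d.lower() == official)
    disagreeing = sorted(n for n, d in attestations.items() if d.lower() != official)
    accepted = not disagreeing and len(agreeing) >= quorum
    return accepted, agreeing, disagreeing
```

A single disagreeing attester rejects the release in this sketch, on the assumption that a mismatch is worth investigating by hand rather than outvoting.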
This sounds super-useful to do (checksum publishing etc) and would perhaps be something the https://reproducible-builds.org/ team might be interested in helping with?
This is not correct. We had a snapshot-registration system in place from the very beginning of bootstrapping. There have just been many, many snapshots over the years. Replaying even the snapshots would take a long time.
I'd love to have something like this in the general case so we could build systems that use precompiled binaries without completely punting on the trust issue like we mostly do currently. With all the work that has gone into reproducible builds over the years it seems like it should be feasible to build a public blockchain-style log where information on builds of specific pieces of software could be published along with signatures as attestation. You could build a Merkle Tree by adding entries for the builds of the tools you use (like gcc/clang/etc) and referencing them in the builds of other software (like rustc). If you could convince a few independent organizations to perform builds and submit attestations you could have fairly high confidence that a compiled binary for a specific piece of software is at least the same thing produced by other people.
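A rough Python sketch of how such a log could hang together, under stated assumptions: every entry is a small JSON record, an entry refers to the toolchain entries it was built with by their digests, and a Merkle root summarizes the whole log. The field names (`artifact`, `output_sha256`, `built_with`) and the function names are made up for illustration, signatures are omitted, and the tree layout is not meant to be compatible with any existing transparency log format.

```python
import hashlib
import json


def entry_digest(entry):
    """Hash one log entry (a JSON-serializable dict) in a canonical form."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # The 0x00 prefix marks a leaf, so a leaf can never collide with an inner node.
    return hashlib.sha256(b"\x00" + encoded).hexdigest()


def merkle_root(leaf_digests):
    """Compute a Merkle root over a list of hex leaf digests.

    An odd node at the end of a level is promoted unchanged to the next level.
    """
    if not leaf_digests:
        raise ValueError("cannot compute the root of an empty log")
    level = [bytes.fromhex(d) for d in leaf_digests]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            # The 0x01 prefix marks an inner node.
            next_level.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0].hex()


# Placeholder digests, not real build outputs.
gcc = {"artifact": "gcc", "output_sha256": "aa" * 32, "built_with": []}
gcc_id = entry_digest(gcc)
# The rustc entry names the gcc entry it was built with by digest.
rustc = {"artifact": "rustc", "output_sha256": "bb" * 32, "built_with": [gcc_id]}
root = merkle_root([gcc_id, entry_digest(rustc)])
```

Because the rustc entry embeds the digest of the gcc entry, changing anything about the recorded gcc build changes the rustc entry's digest and the root as well, which is the property that lets independent parties compare a single value.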
I haven't heard of such a thing proposed before. Mozilla does have a proposal for Binary Transparency, but this is primarily for the purpose of allowing Firefox users to verify that the binaries they install are the same ones that Mozilla has produced and everyone else is installing (i.e., you haven't been targeted by a malicious actor who may be able to subvert code signatures). Debian has test infrastructure that attempts to build packages in a reproducible way and reports statistics. The closest thing is probably F-Droid's support for reproducible builds. F-Droid is an alternative software repository for Android, and they allow developers to publish signed applications if F-Droid can exactly reproduce the application build (modulo the signature). As part of this they have a Verification Server that anyone can run to attempt to reproduce builds of published applications. It feels like that would be a pretty good start towards creating such a thing; you'd just need to have somewhere to publish the results in a way that others could consume them.
I put up https://github.com/dtolnay/bootstrap which is a reliable and transferable setup for bootstrapping the most recent stable rustc from source, starting from mrustc 0.9 and Rust 1.29.
It doesn't yet exactly match the hashes of the official binary releases, but the hope is that once the reproducibility of the Rust compiler is in better shape (rust-lang/rust#34902) we can get the bootstrap chain from this setup to converge exactly with the official one.
At that point we can establish a diverse set of independent bootstrap chains at some number of reputable organizations in environments that are trusted by each one, and provide validation of the official releases as they come out.
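For a sense of how long a chain from mrustc to a recent stable is, here is a small Python sketch that lists the steps. It relies on the rule mentioned earlier in the thread that each stable release is built with the previous stable one, and it ignores patch versions. `bootstrap_plan` is a hypothetical helper written for this post, not something taken from the linked repository.

```python
def bootstrap_plan(first_minor, target_minor, seed="mrustc"):
    """Return (builder, built) pairs for a linear 1.x bootstrap chain.

    Assumes `seed` builds rustc 1.<first_minor> and every later 1.N is
    built by 1.(N-1). Patch versions are ignored.
    """
    if target_minor < first_minor:
        raise ValueError("target release is older than the first bootstrapped release")
    steps = [(seed, f"rustc 1.{first_minor}")]
    for minor in range(first_minor + 1, target_minor + 1):
        steps.append((f"rustc 1.{minor - 1}", f"rustc 1.{minor}"))
    return steps


for builder, built in bootstrap_plan(29, 32):
    print(f"{builder} -> {built}")
```

Each additional stable release adds one more full compiler build to the chain, so the cost of re-running the bootstrap grows linearly with the distance from 1.29.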