Rust’s release packaging and distribution has several deficiencies today. The most important, and blocking rustup 1.0, is that updates suffer from periodic checksum failures due to how the data rustup reads from static.rust-lang.org is structured. rustup also needs to incorporate signature validation, and to better compress the binaries it downloads.
This is an outline of a design to fix these problems. It is building on a flawed system that has grown in ad-hoc ways over the years Rust has been published. Since each time we publish iterations to this scheme they must essentially be supported forever, I’m advertising this iteration in advance in hopes of getting feedback to avoid making any major design mistakes (such as the one that led to the checksum bug).
This should also serve as documentation as to how the system works, since there is no such documentation now. This is technical information about the mechanics of distribution through rustup, and mostly won’t be of interest to end-users.
Please understand that the design here will by necessity build on the existing system. A complete rewrite is out of scope. In the near term, the only new feature that I am definitely planning on implementing is the checksum fix, but I want to make sure there is a path forward for the others as well.
If you don’t care about all the background you can just skip ahead to read about how rustup distribution will work tomorrow.
- How rustup distribution works today
- The problems with rustup distribution
- How rustup distribution will work tomorrow
How rustup distribution works today
I’m going to focus on the distribution of the published artifacts, and not how they are actually produced (which is complex, and I would also like to revisit someday).
The static.rust-lang.org layout
The easiest way to think about this is to first understand how all the published artifacts are arranged on static.rust-lang.org. So here’s a map of static.rust-lang.org with the key pieces named and noted:
- rustup.sh - This is the old script that downloads and installs a single toolchain. Unrelated to rustup.rs, but using the same metadata.
- rust-key.gpg.ascii - The Rust GPG signing key, a subkey of which is used to sign the distribution manifests and tarballs (which are validated by rustup.sh if GPG is available, but not by rustup.rs). No part of the official distribution tools actually uses the key by downloading from this URL - rustup.sh includes an embedded copy.
- dist/ - The directory containing Rust distributables: the manifests
describing each release and the corresponding binaries, checksums and
signatures. This does not contain cargo binaries, for legacy reasons, or
rustup binaries, for organizational reasons.
- channel-rust-stable.toml - The “v2” manifest for the stable release channel. This is the file that describes the contents of a Rust release, produced for all three release channels as well as duplicated with stable URLs for each numbered release. The contents of this file are a list of "components", their architectures, their relationships, URLs, hashes, and assorted metadata. See the following section for an example. The URLs listed in the manifest all point into the archives. This is the entry point rustup uses to discover new releases today.
- channel-rust-stable.toml.sha256 - The SHA-256 sum of the above, as produced by the sha256sum command on Unix.
- channel-rust-stable.toml.asc - The GPG signature of the above.
- channel-rust-stable - The “v1” manifest. This is the legacy manifest format, used primarily in the process of building the v2 manifest, but also for installing old toolchains that don’t have v2 manifests.
- channel-rust-stable-date.txt - A file recording the archive date corresponding to a v1 manifest, produced as part of v1 manifest construction, used in the production of the v2 manifests, but otherwise not part of the distribution process.
- channel-rust-beta.toml - The manifest for the beta channel. Accompanied by all the same sibling files as the stable channel.
- channel-rust-nightly.toml - The manifest for the nightly channel. Accompanied by all the same sibling files as the stable channel.
- channel-rust-1.12.0.toml - The manifest for a single stable release of Rust. rustup uses these for installing specific releases. These notably are not accompanied by the v1 manifest files.
- (There are many other files in this directory, many produced as part of the release process, but are irrelevant to distribution through rustup. The files described in the archives below are duplicated here but are essentially unused).
- $YYYY-$MM-$DD/ - Archive folders. The binary artifacts produced each day
are published here, along with duplicate copies of the above-mentioned
manifests and their accompanying sibling files. Importantly, every file in
these folders is written once ever (modulo accidentally publishing twice
in one day). The components installed by rustup are sourced from these folders.
- (The manifest files above are all reproduced here. Eliding their descriptions).
- rustc-nightly-x86_64-apple-darwin.tar.gz - The binary containing the rustc “component”. It is in rust-installer format. rustup combines components as described in the manifest and requested by the user to produce a working toolchain.
- rustc-nightly-x86_64-apple-darwin.tar.gz.sha256 - The SHA-256 checksum of the above.
- rustc-nightly-x86_64-apple-darwin.tar.gz.asc - The signature of the above.
- (Various other components using the same scheme are in this directory, as well as the source tarballs corresponding to the release).
- cargo-dist/ - This directory has the same structure as dist/, including archives, except that it contains only the cargo build artifacts. The two are separated only for legacy reasons. The rust manifests contain URLs that point into this directory to retrieve the cargo components.
- rustup/ - The directory for distributing the rustup tool.
- rustup-init.sh - The script hosted at sh.rustup.rs for installing rustup via curl on Unix.
- dist/ - The directory in which the current rustup-init binaries are published.
rustup is installed via a single self-contained executable that is downloaded
- $target-triple/ - Each installer is located in a target-specific directory.
- rustup-init - The installer, a platform-specific executable. On Windows it is called rustup-init.exe and in the future could be called rustup-init.msi.
- rustup-init.sha256 - The checksum of that installer.
- archive/ - The rustup archives.
- $version - Each directory in the archives is a version number (whereas
the Rust archives are dates).
- (The contents of this directory are identical to rustup/dist/).
The v2 manifest format
Here’s an example of what the manifest looks like, with redundant details elided. The most important things to note:
- The “date” is the same as the archive date.
- manifest-version allows the format to change incompatibly (rustup will reject unknown versions), but new types of information can be added to the manifest backwards-compatibly.
- The manifest is made of packages defined for some number of targets.
- Packages may have subcomponents which are themselves packages.
- Components may be required (the “components” key), or optional (the “extensions” key).
- Any given component may be unavailable, allowing lower-tier releases to fail to build.
- Components define URLs of the tarball and hashes of that tarball.
- The “rust” package is the only one with subcomponents. When rustup installs Rust it reads this package and traverses the component tree from there.
- The “rust” package’s tarball is never downloaded by rustup, only those of its components. The “rust” tarball is a combined package of all the required components and is the one individuals install off the website today.
- The “version” key provides the same information as
date = "2016-08-16"
manifest-version = "2"

[pkg]

[pkg.rust]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust.target]

[pkg.rust.target.aarch64-unknown-linux-gnu]
available = true
hash = "10cb2ed86992f6273d0b3bf631b7eed4c8418d88baa2b9c8057c7a60011dd4ce"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rustc"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-std"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-docs"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "cargo"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.extensions]]
pkg = "rust-std"
target = "aarch64-apple-ios"

...

[pkg.cargo]
version = "0.12.0-nightly (6b98d1f 2016-07-04)"

[pkg.cargo.target]

[pkg.cargo.target.aarch64-unknown-linux-gnu]
available = true
hash = "1bdf1f446199b164b8e4234fede68fb82d4528adce9757ad69fa6ace67859d94"
url = "https://static.rust-lang.org/cargo-dist/2016-07-05/cargo-nightly-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rust-docs]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust-docs.target]

[pkg.rust-docs.target.aarch64-unknown-linux-gnu]
available = true
hash = "a48747395ef79578e208b36a94c6337d30aa3993a62d2fba22c0cc8da3077c9d"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-docs-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rust-std]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust-std.target]

[pkg.rust-std.target.aarch64-unknown-linux-gnu]
available = true
hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rustc]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rustc.target]

[pkg.rustc.target.aarch64-unknown-linux-gnu]
available = true
hash = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url = "https://static.rust-lang.org/dist/2016-08-16/rustc-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...
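To make the traversal concrete, here is a minimal Python sketch of how a client might walk from the “rust” package to the tarballs it needs. It is not rustup's actual code. The manifest is written as the dictionary a TOML parser would produce, trimmed to two components; the function name resolve_downloads and the unavailable aarch64-apple-ios entry are assumptions made for illustration.

```python
TARGET = "aarch64-unknown-linux-gnu"
BASE = "https://static.rust-lang.org/dist/2016-08-16/"

# The manifest as a TOML parser would return it, trimmed to two components.
MANIFEST = {
    "manifest-version": "2",
    "date": "2016-08-16",
    "pkg": {
        "rust": {"target": {TARGET: {
            "available": True,
            "components": [
                {"pkg": "rustc", "target": TARGET},
                {"pkg": "rust-std", "target": TARGET},
            ],
            "extensions": [{"pkg": "rust-std", "target": "aarch64-apple-ios"}],
        }}},
        "rustc": {"target": {TARGET: {
            "available": True,
            "hash": "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1",
            "url": BASE + "rustc-1.11.0-aarch64-unknown-linux-gnu.tar.gz",
        }}},
        "rust-std": {"target": {
            TARGET: {
                "available": True,
                "hash": "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31",
                "url": BASE + "rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz",
            },
            # Hypothetical entry: the example manifest elides this target.
            "aarch64-apple-ios": {"available": False},
        }},
    },
}


def resolve_downloads(manifest, target, extensions=()):
    """Return (url, hash) pairs for the required components plus chosen extensions."""
    if manifest["manifest-version"] != "2":
        raise ValueError("unknown manifest version")
    root = manifest["pkg"]["rust"]["target"][target]
    wanted = list(root["components"])
    # Extensions are optional: include only the ones the user asked for.
    wanted += [e for e in root["extensions"] if (e["pkg"], e["target"]) in extensions]
    downloads = []
    for comp in wanted:
        entry = manifest["pkg"][comp["pkg"]]["target"][comp["target"]]
        if not entry["available"]:
            raise LookupError(f"{comp['pkg']} is unavailable for {comp['target']}")
        downloads.append((entry["url"], entry["hash"]))
    return downloads
```

Note that the “rust” entry's own url and hash are never consulted: only the components' tarballs end up in the download list.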
How rustup interprets this information
OK, here’s the important stuff. There are two aspects of rustup updating: toolchain updates and self-updates.
Updating a toolchain from a release channel:
- rustup downloads the checksum file published next to the release channel manifest.
- rustup compares the checksum to that of its cached stable toolchain. If they are the same then there’s nothing further to do.
- rustup downloads https://static.rust-lang.org/dist/channel-rust-stable.toml, the release channel manifest.
- rustup compares the downloaded checksum to the checksum of the manifest. If they are not the same, it reports an error and stops.
- Based on the component tree described in the manifest, user options, and the currently-installed toolchain, rustup builds a list of tarballs to download and their checksums.
- rustup downloads each tarball from the archives and compares the contents to the checksum in the manifest. If any fails it reports an error.
- rustup installs each component by interpreting the tarball contents per the rust-installer format. If any of these fail it rolls back all installation operations.
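The first steps of the toolchain update can be sketched in Python as follows. This is an illustration under stated assumptions rather than rustup's implementation: fetch is a hypothetical callable that maps a URL to the bytes served there, and the function and exception names are invented for the example.

```python
import hashlib


class ChecksumMismatch(Exception):
    """A downloaded file did not match the checksum it was expected to have."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_new_manifest(fetch, manifest_url, cached_checksum):
    """Return the new manifest bytes, or None if the cached toolchain is current.

    `fetch` is a hypothetical callable: URL in, response body (bytes) out.
    """
    # sha256sum writes "<hex digest>  <file name>"; only the digest matters.
    published = fetch(manifest_url + ".sha256").decode().split()[0]
    if published == cached_checksum:
        return None  # same checksum as the cached toolchain: nothing to do
    manifest = fetch(manifest_url)
    calculated = sha256_hex(manifest)
    if calculated != published:
        # Two separate downloads that must agree: this is where updates fail
        # when the manifest and its checksum are not paired correctly.
        raise ChecksumMismatch(f"expected: '{published}', calculated: '{calculated}'")
    return manifest


def verify_tarball(data: bytes, manifest_hash: str) -> None:
    """Check a component tarball against the hash listed in the manifest."""
    if sha256_hex(data) != manifest_hash:
        raise ChecksumMismatch("component tarball does not match the manifest")
```

Passing a dictionary's lookup method as fetch (for example server.__getitem__) is enough to exercise the flow without touching the network.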
Notable things rustup does not do while installing toolchains:
- Make use of the .sha256 files corresponding to component tarballs. These checksums are drawn from the manifest instead.
- Make use of the GPG signatures published next to the manifests and tarballs.
Updating rustup itself:
- rustup downloads the checksum of the current rustup-init installer.
- rustup compares the checksum to that of the current installation. If they are the same then there’s nothing further to do.
- rustup downloads the rustup-init installer itself.
- rustup compares the downloaded checksum to the checksum of the installer. If they are not the same it reports an error and stops.
- rustup runs the installer and replaces itself.
The problems with rustup distribution
This section is going to describe the problems under consideration, along with some of the constraints to solving them.
Both the toolchain upgrade process and the self-upgrade process begin with a step that involves downloading a checksum of an artifact from a known location, downloading the artifact next to it, and comparing them. Sometimes these two files are not paired correctly and the check fails.
Exactly why these files don’t become available together is unclear, but static.rust-lang.org is distributed through a CDN, and the process of distributing the files takes quite a while. So it seems exceedingly likely that in the process of distribution these two files that must always be paired become unpaired for a window of time.
Today rustup does not guarantee that the artifacts it is installing were actually produced by the Rust build machines. It only guarantees that they come from a server that some certificate authority says is static.rust-lang.org.
The Rust build process does sign its artifacts using GPG, but GPG is seemingly unsuitable for use in rustup because it is a complex, unlibrarified dependency built on a lot of non-Rust code.
rustup needs to guarantee that the binaries it puts on your computer are the ones produced by the Rust release builders.
Today Rust binaries are distributed with gzip compression, which is not very efficient. There’s a lot of demand for us to use a compression scheme with a higher compression ratio. A compression scheme should favor decompression speed over compression speed since we can afford to waste time compressing, but want installation to be fast. The best candidate seems to be XZ.
How rustup distribution will work tomorrow
So the primary thing I’m trying to solve here is the checksum failures, but thinking ahead to the other issues, anything we do here will need to be extensible to support signing.
After review, I think solving the signing issue should be done by just implementing The Update Framework. It seems to be a well-considered design, so I want to just do the work to implement the whole thing and be done with it. At that point Rust distribution should have best-in-class security properties.
TUF actually solves the checksum problem as well, but since implementing TUF will take considerable work I would like to do this in two stages: first solve the checksum problem with some simple hacks, expecting to throw them away later; then solve it right via TUF.
Stage 1: Fix the checksum failure
There are two distinct checksum failures: the self-update failure, and the toolchain manifest failures; and we’ll fix them in distinct ways. As part of this we’ll make the assumption that we are not using checksums to verify integrity. In the current iteration, we use HTTPS to ensure integrity, that the data is transmitted correctly. Once we convert to TUF, we can transmit over less reliable transports and TUF itself will ensure integrity, as well as provenance.
The self-upgrade checksum
The key thing to solve here is to convert the upgrade “entry point”, the initial file downloaded that determines whether to do the upgrade and where to look for the data, from two URLs to one, with all subsequent artifacts coming out of the archives (which are never overwritten and thus don’t suffer any sort of version drift). There are two obvious ways to do this: with a directory symlink, and with a different file format.
The mechanism for setting up a symlink is CDN-specific and doing so as part of the release process is sufficiently complex that I prefer not to do it. So my proposed fix is to create another single-file scheme for this purpose.
Besides being a single-file entry point, the only other thing it needs to accomplish is to provide some unique key that can be compared to the existing install, to determine whether the upgrade should be done at all.
The toml-based file scheme is:
schema-version = "1"
version = "..."
This file is named
There is no corresponding
.sha256 file, because avoiding having a second file
is the whole goal. (We may though, just for consistency, generate and upload the
.sha256 file. Perhaps some find them useful).
Determining whether to do an upgrade will be performed by comparing the contents
of this file to the contents of the previously-installed file. It could also
be done by comparing the
version field or the artifact hashes.
Like the manifest format, this one comes with a version, schema-version, and
rustup will verify that it understands this version before proceeding. Although
we expect this file to be deprecated quickly, and unrecognized fields will be
ignored, versioning it is still prudent.
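As a rough sketch of the upgrade decision described above, assuming the two-field file shown earlier: the tiny parser below is a stand-in for a real TOML parser and only understands flat key = "value" lines, and the function names are hypothetical.

```python
SUPPORTED_SCHEMA = "1"


def parse_release_file(text: str) -> dict:
    """Parse flat `key = "value"` lines (a stand-in for a real TOML parser)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def needs_upgrade(downloaded: str, installed: str) -> bool:
    """Decide whether to self-upgrade from the single entry-point file."""
    fields = parse_release_file(downloaded)
    schema = fields.get("schema-version")
    if schema != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema-version: {schema!r}")
    # Unrecognized fields are simply never looked at. The decision is a plain
    # comparison against the copy saved by the previous installation.
    return downloaded != installed
```

Because there is only one file to download, there is no second file for it to fall out of step with.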
To do an upgrade, rustup will simply look in the archives under the folder
The toolchain manifest checksum
We’ll just paper over this one for now. When rustup sees a manifest checksum failure today it says:
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
warning: update not yet available, sorry! try again later
error: checksum failed, expected: 75b220d4bdf9c4d670d4787e98de8444a7641a14cc82c898db2a36138248bb4', calculated: '8f10396e1feee2e8f69f6d1406ce5750cc0b3291924b0e11b0cac75fb71bbc70'
Instead we’ll just print the same thing we do when there are no updates:
info: syncing channel updates for 'nightly-x86_64-pc-windows-msvc'
We can do this for the manifests because the manifests are small, whereas rustup updates are several MB.
Stage 2: TUF
The TUF spec is required background for this section.
The release-stable.toml file has no role in this scheme;
it is deprecated, but will continue to be produced for some period of time to
accommodate old clients.
The static.rust-lang.org site is a single TUF ‘repo’, meaning that all
toolchains and rustup versions will be reachable from the same TUF ‘root’.
The metadata root is https://static.rust-lang.org/update-metadata. All TUF
metadata files live under this directory. The base URL is
https://static.rust-lang.org/, meaning that the metadata may reference
‘target’ files anywhere on the server.
We will define the following delegated target roles, for assigning the associated releases, each with their own keys:
The rust roles can provide the “/dist” and “/cargo-dist” paths. The rustup role can provide the “/rustup” path. The channels are divided by release channel so that each channel can have its own keys. The main channel roles only contain the latest release on a channel, while the archive roles contain the entire release history. Versioned Rust releases (that is, the manifests named after the release version, not the release channel) are provided by "rust-stable-archive.json".
The files captured in these target metadata files are:
All these paths exist today, and are written once and then never changed. When rustup needs to decide which manifest/rustup-init to install it searches for it in the target list of the correct role.
To solve the checksum problem, the metadata files will use “consistent snapshots”. What this means is that the files, in addition to being uploaded with their regular names, are additionally uploaded with a name prefixed by a hash of their contents. We will only use consistent snapshots for the metadata files, not for the target files, since we already have a scheme that results in consistent target data (archives). Adding consistent snapshots of those files would double our storage requirements for little gain. Using consistent snapshots only for metadata does not seem to be accounted for in the spec so we may need to develop an extension, discuss with the TUF author, etc.
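A small Python sketch of the idea, with an in-memory dictionary standing in for the server. The "<hash>.<name>" layout, the choice of SHA-256, and the update-metadata/targets.json path are assumptions for illustration; the exact naming would follow whatever the TUF spec or our extension of it requires.

```python
import hashlib


def publish_metadata(store: dict, name: str, data: bytes) -> str:
    """Upload `data` under its regular name and under a hash-prefixed name.

    Returns the hash-prefixed ("consistent snapshot") name.
    """
    digest = hashlib.sha256(data).hexdigest()
    directory, _, filename = name.rpartition("/")
    prefixed = f"{digest}.{filename}"
    snapshot = f"{directory}/{prefixed}" if directory else prefixed
    store[name] = data      # regular name: overwritten on every release
    store[snapshot] = data  # hash-prefixed name: written once, never changed
    return snapshot
```

A client that learned the hash-prefixed name from a parent metadata file keeps reading the same bytes even after the regular name has been overwritten by a later release, so two files that must agree can no longer drift apart.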
We will continue using the Rust manifest format, although the TUF format is extensible such that it could accommodate the Rust manifests’ contents. Making changes here does not seem worth the effort at this time.
This is a pretty easy extension. In the manifest files we’ll add two new lines to every component. Where today we list the tarball url and hash,
hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
tomorrow we’ll add the following fields:
hash_xz = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url_xz = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.xz"
rustup will just know to use the xz artifacts if available.
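The selection logic could be as small as the following Python sketch. The function name pick_tarball and the supports_xz flag are hypothetical; the entry is one component's table from the manifest after parsing.

```python
def pick_tarball(entry: dict, supports_xz: bool = True):
    """Return the (url, hash) pair to download for one manifest component entry."""
    # Prefer the xz artifact, but only when both new fields are present, so
    # that manifests published before the extension keep working unchanged.
    if supports_xz and "url_xz" in entry and "hash_xz" in entry:
        return entry["url_xz"], entry["hash_xz"]
    return entry["url"], entry["hash"]


entry = {
    "hash": "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31",
    "url": "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz",
    "hash_xz": "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1",
    "url_xz": "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.xz",
}
```

Old clients ignore the two new keys and keep downloading the gzip tarball, which is what makes the change backwards-compatible.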
Unfortunately, actually modifying the release process to generate the xz tarballs and extend the manifest is fairly involved.
- The rustup checksum issue.
- The rustup signature issue.
- The xz issue.
- The Update Framework.
- rust-packaging. The project used to generate the Rust combined installers from the individual cargo and rustc installers, and which produces the v1 manifests consumed by build-rust-manifest.py to produce v2 manifests.
- rust-buildbot. The buildbot instance that coordinates the release build process.
- build-rust-manifest.py. The script in rust-buildbot that constructs v2 manifests.
- rust-installer. The script that creates Rust package tarballs and defines the format interpreted by rustup to install individual Rust components.