Rust’s release packaging and distribution has several deficiencies today. The
most important, and blocking rustup 1.0, is that updates suffer from periodic
checksum failures due to how the data rustup reads from
static.rust-lang.org is structured. rustup also needs to incorporate signature
validation, and to better compress the binaries it downloads.
This is an outline of a design to fix these problems. It is building on a flawed
system that has grown in ad-hoc ways over the years Rust has been
published. Since each time we publish iterations to this scheme they must
essentially be supported forever, I’m advertising this iteration in advance in
hopes of getting feedback to avoid making any major design mistakes (such as the
one that led to the checksum bug).
This should also serve as documentation as to how the system works, since there
is no such documentation now. This is technical information about the mechanics
of distribution through rustup, and mostly won’t be of interest to end-users.
Please understand that the design here will by necessity build on the existing
system. A complete rewrite is out of scope. In the near term, the only new
feature that I am definitely planning on implementing is the checksum fix, but
I want to make sure there is a path forward for the others as well.
If you don’t care about all the background you can just skip ahead to read about
how rustup distribution will work tomorrow.
How rustup distribution works today
I’m going to focus on the distribution of the published artifacts, and not how
they are actually produced (which is complex, and I would also like to revisit
someday).
The easiest way to think about this is to first understand how all the published
artifacts are arranged on static.rust-lang.org. So here’s a map of
static.rust-lang.org with the key pieces named and noted:
- static.rust-lang.org/
- rustup.sh - This is the old script that downloads and installs a single
toolchain. Unrelated to rustup.rs, but using the same metadata.
- rust-key.gpg.ascii - The Rust GPG signing key, a subkey of which is used to
sign the distribution manifests and tarballs (which are validated by
rustup.sh if GPG is available, but not by rustup.rs). No part of the
official distribution tools actually use the key by downloading from this
URL - rustup.sh includes an embedded copy.
- dist/ - The directory containing Rust distributables: the manifests
describing each release and the corresponding binaries, checksums and
signatures. This does not contain cargo binaries, for legacy reasons, or
rustup binaries, for organizational reasons.
- channel-rust-stable.toml - The “v2” manifest for the stable release
channel. This is the file that describes the contents of a Rust release,
produced for all three release channels as well as duplicated with stable
URLs for each numbered release. The contents of this file are a list of
"components", their architectures, their relationships, URLs, hashes, and
assorted metadata. See the following section for an example. The URLs
listed in the manifest all point into the archives. This is the entry
point rustup uses to discover new releases today.
- channel-rust-stable.toml.sha256 - The SHA-256 sum of the above, as
produced by the sha256sum command on Unix.
- channel-rust-stable.toml.asc - The GPG signature of the above.
- channel-rust-stable - The “v1” manifest. This is the legacy manifest
format, used primarily in the process of building the v2 manifest, but
also for installing old toolchains that don’t have v2 manifests.
- channel-rust-stable-date.txt - A file recording the archive date
corresponding to a v1 manifest, produced as part of v1 manifest
construction, used in the production of the v2 manifests, but otherwise
not part of the distribution process.
- channel-rust-stable-date.sha256
- channel-rust-stable-date.asc
- channel-rust-beta.toml - The manifest for the beta channel. Accompanied
by all the same sibling files as the stable channel.
- channel-rust-nightly.toml - The manifest for the nightly
channel. Accompanied by all the same sibling files as the stable channel.
- channel-rust-1.12.0.toml - The manifest for a single stable release
of Rust. rustup uses these for installing specific releases. These notably
are not accompanied by the v1 manifest files.
- (There are many other files in this directory, many produced as part of
the release process, but they are irrelevant to distribution through
rustup. The files described in the archives below are duplicated here but
are essentially unused).
- $YYYY-$MM-$DD/ - Archive folders. The binary artifacts produced each day
are published here, along with duplicate copies of the above-mentioned
manifests and their accompanying sibling files. Importantly, every file in
these folders is written once ever (modulo accidentally publishing twice
in one day). The components installed by rustup are sourced from these
directories.
- (The manifest files above are all reproduced here. Eliding their
descriptions).
- rustc-nightly-x86_64-apple-darwin.tar.gz - The binary containing the
rustc “component”. It is in rust-installer format. rustup combines
components as described in the manifest and requested by the user to
produce a working toolchain.
- rustc-nightly-x86_64-apple-darwin.tar.gz.sha256 - The SHA-256 checksum
of the above.
- rustc-nightly-x86_64-apple-darwin.tar.gz.asc - The signature of the
above.
- (Various other components using the same scheme are in this directory, as
well as the source tarballs corresponding to the release).
- cargo-dist/ - This directory has the same structure as dist/, including
archives, except that it contains only the cargo build artifacts. The two
are separated only for legacy reasons. The rust manifests contain URLs that
point into this directory to retrieve the cargo components.
- rustup/ - The directory for distributing the rustup tool.
- rustup-init.sh - The script hosted at sh.rustup.rs for installing rustup
via curl on Unix.
- dist/ - The directory in which the current rustup-init binaries are published.
rustup is installed via a single self-contained executable that is downloaded
and run.
- $target-triple/ - Each installer is located in a target-specific directory.
- rustup-init - The installer, a platform-specific executable. On
Windows it is called rustup-init.exe and in the future could be
called rustup-init.msi.
- rustup-init.sha256 - The checksum of that installer.
- archive/ - The rustup archives.
- $version - Each directory in the archives is a version number (whereas
the Rust archives are dates).
- (The contents of this directory are identical to rustup/dist/).
The v2 manifest format
Here’s an example of what the manifest looks like, with redundant details
elided. The most important things to note:
- The “date” is the same as the archive date.
- manifest-version allows the format to change incompatibly (rustup will
reject unknown versions), but new types of information can be added to the
manifest backwards-compatibly.
- The manifest is made of packages defined for some number of targets.
- Packages may have subcomponents which are themselves packages.
- Components may be required (the “components” key), or optional (the
"extensions" key).
- Any given component may be unavailable, allowing lower-tier releases to fail
to build.
- Components define URLs of the tarball and hashes of that tarball.
- The “rust” package is the only one with subcomponents. When rustup installs
Rust it reads this package and traverses the component tree from there.
- The “rust” package’s tarball is never downloaded by rustup, only those of its
components. The “rust” tarball is a combined package of all the required
components and is the one individuals install off the website today.
- The “version” key provides the same information as rustc --version.
date = "2016-08-16"
manifest-version = "2"
[pkg]
[pkg.rust]
version = "1.11.0 (9b21dcd6a 2016-08-15)"
[pkg.rust.target]
[pkg.rust.target.aarch64-unknown-linux-gnu]
available = true
hash = "10cb2ed86992f6273d0b3bf631b7eed4c8418d88baa2b9c8057c7a60011dd4ce"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rustc"
target = "aarch64-unknown-linux-gnu"
[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-std"
target = "aarch64-unknown-linux-gnu"
[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-docs"
target = "aarch64-unknown-linux-gnu"
[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "cargo"
target = "aarch64-unknown-linux-gnu"
[[pkg.rust.target.aarch64-unknown-linux-gnu.extensions]]
pkg = "rust-std"
target = "aarch64-apple-ios"
...
[pkg.cargo]
version = "0.12.0-nightly (6b98d1f 2016-07-04)"
[pkg.cargo.target]
[pkg.cargo.target.aarch64-unknown-linux-gnu]
available = true
hash = "1bdf1f446199b164b8e4234fede68fb82d4528adce9757ad69fa6ace67859d94"
url = "https://static.rust-lang.org/cargo-dist/2016-07-05/cargo-nightly-aarch64-unknown-linux-gnu.tar.gz"
...
[pkg.rust-docs]
version = "1.11.0 (9b21dcd6a 2016-08-15)"
[pkg.rust-docs.target]
[pkg.rust-docs.target.aarch64-unknown-linux-gnu]
available = true
hash = "a48747395ef79578e208b36a94c6337d30aa3993a62d2fba22c0cc8da3077c9d"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-docs-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
...
[pkg.rust-std]
version = "1.11.0 (9b21dcd6a 2016-08-15)"
[pkg.rust-std.target]
[pkg.rust-std.target.aarch64-unknown-linux-gnu]
available = true
hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
...
[pkg.rustc]
version = "1.11.0 (9b21dcd6a 2016-08-15)"
[pkg.rustc.target]
[pkg.rustc.target.aarch64-unknown-linux-gnu]
available = true
hash = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url = "https://static.rust-lang.org/dist/2016-08-16/rustc-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
...
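The traversal described in the notes above can be sketched as follows. This is only an illustration, not rustup's actual code: the function name `required_tarballs` is hypothetical, and the manifest is assumed to have already been parsed into nested dictionaries by a TOML parser. The sample data is a hand-trimmed copy of the example manifest with two components.

```python
def required_tarballs(manifest, target, root="rust"):
    """Collect (url, hash) for each required component of `root` on `target`."""
    root_entry = manifest["pkg"][root]["target"][target]
    tarballs = []
    # Only the "components" key is walked; "extensions" are optional and skipped.
    for component in root_entry.get("components", []):
        entry = manifest["pkg"][component["pkg"]]["target"][component["target"]]
        if not entry.get("available", False):
            raise LookupError(
                f"{component['pkg']} is unavailable for {component['target']}"
            )
        tarballs.append((entry["url"], entry["hash"]))
    return tarballs


TARGET = "aarch64-unknown-linux-gnu"
DIST = "https://static.rust-lang.org/dist/2016-08-16"

# Hand-written stand-in for a parsed manifest, trimmed to two components.
manifest = {
    "pkg": {
        "rust": {"target": {TARGET: {
            "available": True,
            "hash": "10cb2ed86992f6273d0b3bf631b7eed4c8418d88baa2b9c8057c7a60011dd4ce",
            "url": f"{DIST}/rust-1.11.0-{TARGET}.tar.gz",
            "components": [
                {"pkg": "rustc", "target": TARGET},
                {"pkg": "rust-std", "target": TARGET},
            ],
        }}},
        "rustc": {"target": {TARGET: {
            "available": True,
            "hash": "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1",
            "url": f"{DIST}/rustc-1.11.0-{TARGET}.tar.gz",
        }}},
        "rust-std": {"target": {TARGET: {
            "available": True,
            "hash": "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31",
            "url": f"{DIST}/rust-std-1.11.0-{TARGET}.tar.gz",
        }}},
    }
}

for url, digest in required_tarballs(manifest, TARGET):
    print(digest[:12], url)
```

Note that the combined “rust” tarball never appears in the result; only the tarballs of its components do.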
How rustup interprets this information
OK, here’s the important stuff. There are two aspects of rustup updating: toolchain
updates and self-updates.
Updating a toolchain from a release channel:
- rustup downloads
https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256
- rustup compares the checksum to that of its cached stable toolchain. If they
are the same then there’s nothing further to do.
- rustup downloads
https://static.rust-lang.org/dist/channel-rust-stable.toml,
the release channel manifest.
- rustup compares the downloaded checksum to the checksum of the manifest.
If they are not the same, it reports an error and stops.
- based on the component tree described in the manifest, user options, and the
currently-installed toolchain, rustup builds a list of tarballs to download
and their checksums.
- rustup downloads each tarball from the archives and compares the contents to
the checksum in the manifest. If any fails it reports an error.
- rustup installs each component by interpreting the tarball contents per the
rust-installer format. If any of these fail it rolls back all installation
operations.
Notable things rustup does not do while installing toolchains:
- Make use of the .sha256 files corresponding to component tarballs. These
checksums are drawn from the manifest instead.
- Make use of the .asc signature files.
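The first four steps of the toolchain update, plus the per-tarball check, can be sketched roughly as below. This is a simplified model rather than rustup's implementation: `fetch_new_manifest`, `verify_tarball` and `ChecksumMismatch` are hypothetical names, `fetch` stands for any function that maps a URL to a response body, and the server is an in-memory dictionary.

```python
import hashlib


class ChecksumMismatch(Exception):
    """The downloaded manifest does not match its .sha256 file."""


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def fetch_new_manifest(fetch, manifest_url, cached_checksum):
    """Return the new manifest bytes, or None if the cached toolchain is current.

    `fetch` is any callable mapping a URL to the response body as bytes.
    """
    # sha256sum output is "<hex digest>  <file name>"; only the digest matters.
    remote_checksum = fetch(manifest_url + ".sha256").decode().split()[0]
    if remote_checksum == cached_checksum:
        return None
    manifest = fetch(manifest_url)
    calculated = sha256_hex(manifest)
    if calculated != remote_checksum:
        raise ChecksumMismatch(
            f"expected: '{remote_checksum}', calculated: '{calculated}'"
        )
    return manifest


def verify_tarball(data, manifest_hash):
    """Check a component tarball against the hash listed in the manifest."""
    return sha256_hex(data) == manifest_hash


# In-memory stand-in for static.rust-lang.org, for illustration only.
URL = "https://static.rust-lang.org/dist/channel-rust-stable.toml"
body = b'manifest-version = "2"\n'
server = {
    URL: body,
    URL + ".sha256": (sha256_hex(body) + "  channel-rust-stable.toml\n").encode(),
}

updated = fetch_new_manifest(server.__getitem__, URL, cached_checksum="stale")
unchanged = fetch_new_manifest(server.__getitem__, URL, sha256_hex(body))
print(updated is not None, unchanged is None)
```

The important detail is that two separate requests are made before anything is verified, which is where the checksum failures described later come from.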
Updating rustup itself:
- rustup downloads
https://static.rust-lang.org/rustup/dist/$target-triple/rustup-init.sha256
- rustup compares the checksum to that of the current installation. If they are
the same then there’s nothing further to do.
- rustup downloads
https://static.rust-lang.org/rustup/dist/$target-triple/rustup-init
- rustup compares the downloaded checksum to the checksum of the installer. If
they are not the same it reports an error and stops.
- rustup runs the installer and replaces itself.
The problems with rustup distribution
This section is going to describe the problems under consideration, along with
some of the constraints to solving them.
Checksum failures
Issue.
Both the toolchain upgrade process and the self-upgrade process begin with a
step that involves downloading a checksum of an artifact from a known location,
downloading the artifact next to it, and comparing them. Sometimes these two
files are not paired correctly and the check fails.
Exactly why these files don’t become available together is unclear, but
static.rust-lang.org is distributed through a CDN, and the process of
distributing the files takes quite a while. So it seems exceedingly likely that
in the process of distribution these two files that must always be paired become
unpaired for a window of time.
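If the CDN explanation is right, the failure window can be reproduced in miniature. The sketch below models two hypothetical CDN edges as dictionaries: one fully propagated, and one that has received the new checksum file but still serves the old manifest. The file names and contents are made up for the example.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


old_manifest = b'date = "2016-08-15"\n'
new_manifest = b'date = "2016-08-16"\n'

# An edge that has finished receiving the new release.
consistent_edge = {
    "channel.toml": new_manifest,
    "channel.toml.sha256": sha256_hex(new_manifest),
}

# An edge caught mid-propagation: new checksum, old manifest.
stale_edge = {
    "channel.toml": old_manifest,
    "channel.toml.sha256": sha256_hex(new_manifest),
}


def pair_is_consistent(edge):
    """True when the manifest served by this edge matches its checksum file."""
    return sha256_hex(edge["channel.toml"]) == edge["channel.toml.sha256"]


print(pair_is_consistent(consistent_edge))
print(pair_is_consistent(stale_edge))
```

Any scheme that needs two mutable URLs to agree has this window; a single entry-point file, or files that are written once, does not.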
Unverified provenance
Issue.
Today rustup does not guarantee that the artifacts it is installing were
actually produced by the Rust build machines. It only guarantees that they come
from a server that some certificate authority says is static.rust-lang.org.
The Rust build process does sign its artifacts using GPG, but GPG is seemingly
unsuitable for use in rustup because it is a complex, unlibrarified dependency
built on a lot of non-Rust code.
rustup needs to guarantee that the binaries it puts on your computer are the
ones produced by the Rust release builders.
Poor compression
Issue.
Today Rust binaries are distributed with gzip compression, which is not very
efficient. There’s a lot of demand for us to use a compression scheme with a
higher compression ratio. A compression scheme should favor decompression speed
over compression speed since we can afford to waste time compressing, but want
installation to be fast. The best candidate seems to be XZ.
How rustup distribution will work tomorrow
So the primary thing I’m trying to solve here is the checksum failures, but
thinking ahead to the other issues, anything we do here will need to be
extensible to support signing.
After review, I think solving the signing issue should be done by just
implementing The Update Framework. It seems to be a well-considered
design, so I want to just do the work to implement the whole thing and be done
with it. At that point Rust distribution should have best-in-class security
properties.
TUF actually solves the checksum problem as well, but since implementing
TUF will take considerable work I would like to do this in two stages: first
solve the checksum problem with some simple hacks, expecting to throw them away
later; then solve it right via TUF.
Stage 1: Fix the checksum failure
There are two distinct checksum failures: the self-update failure, and the
toolchain manifest failures; and we’ll fix them in distinct ways. As part of
this we’ll make the assumption that we are not using checksums to verify
integrity. In the current iteration, we rely on HTTPS to ensure integrity, that is, that the
data is transmitted correctly. Once we convert to TUF, we can transmit over less
reliable transports and TUF itself will ensure integrity, as well as provenance.
The self-upgrade checksum
The key thing to solve here is to convert the upgrade “entry point”, the initial
file downloaded that determines whether to do the upgrade and where to look for
the data, from two URLs to one, with all subsequent artifacts coming out of the
archives (which are never overwritten and thus don’t suffer any sort of version
drift). There are two obvious ways to do this: with a directory symlink, and
with a different file format.
The mechanism for setting up a symlink is CDN-specific and doing so as part of
the release process is sufficiently complex that I prefer not to do it. So my
proposed fix is to create another single-file scheme for this purpose.
Besides being a single-file entry point, the only other thing it needs to
accomplish is to provide some unique key that can be compared to the existing
install, to determine whether the upgrade should be done at all.
The toml-based file scheme is:
schema-version = "1"
version = "..."
This file is named https://static.rust-lang.org/rustup/release-stable.toml.
There is no corresponding .sha256 file, because avoiding having a second file
is the whole goal. (We may though, just for consistency, generate and upload the
.sha256 file. Perhaps some find them useful).
Determining whether to do an upgrade will be performed by comparing the contents
of this file to the contents of the previously-installed file. It could also
be done by comparing the version field or the artifact hashes.
Like the manifest format, this one comes with a version, schema-version.
rustup will verify that it understands this version before proceeding. Although
we expect this file to be deprecated quickly, and unrecognized fields will be
ignored, versioning it is still prudent.
To do an upgrade, rustup will simply look in the archives under the folder
corresponding to version.
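A rough sketch of how a client might consume this file follows. It assumes the comparison is done on the version field (one of the options mentioned above) and that the archive layout is rustup/archive/$version/$target-triple/rustup-init, as in the site map earlier. The names `parse_release_file` and `plan_self_update` are hypothetical, the tiny parser is only a stand-in for a real TOML parser, and the version numbers are invented.

```python
SUPPORTED_SCHEMA = "1"
ARCHIVE_ROOT = "https://static.rust-lang.org/rustup/archive"


def parse_release_file(text):
    """Parse flat `key = "value"` lines; a stand-in for a real TOML parser."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def plan_self_update(release_text, installed_version, target):
    """Return the installer URL to download, or None if already up to date."""
    release = parse_release_file(release_text)
    # Refuse to interpret a schema this client does not understand.
    if release.get("schema-version") != SUPPORTED_SCHEMA:
        raise ValueError(f"unknown schema-version: {release.get('schema-version')!r}")
    version = release["version"]
    if version == installed_version:
        return None
    # Everything after the entry point comes from the write-once archives.
    # (On Windows the file would be rustup-init.exe instead.)
    return f"{ARCHIVE_ROOT}/{version}/{target}/rustup-init"


# Invented version numbers; the unknown "note" field is simply ignored.
release_text = 'schema-version = "1"\nversion = "1.2.3"\nnote = "ignored"\n'
print(plan_self_update(release_text, "1.2.2", "x86_64-unknown-linux-gnu"))
print(plan_self_update(release_text, "1.2.3", "x86_64-unknown-linux-gnu"))
```

Only one mutable URL is read, so there is nothing for it to fall out of sync with.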
The toolchain manifest checksum
We’ll just paper over this one for now. When rustup sees a manifest checksum
failure today it says:
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
warning: update not yet available, sorry! try again later
error: checksum failed, expected: 75b220d4bdf9c4d670d4787e98de8444a7641a14cc82c898db2a36138248bb4', calculated: '8f10396e1feee2e8f69f6d1406ce5750cc0b3291924b0e11b0cac75fb71bbc70'
Instead we’ll just print the same thing we do when there are no updates:
info: syncing channel updates for 'nightly-x86_64-pc-windows-msvc'
We can do this for the manifests because the manifests are small, whereas rustup
updates are several MB.
Stage 2: TUF
The TUF spec is required background for this section.
The previously-described release-stable.toml file has no role in this scheme;
it is deprecated, but will continue to be produced for some period of time to
accommodate old clients.
The entire static.rust-lang.org site is a single TUF ‘repo’, meaning that all
toolchains and rustup versions will be reachable from the same TUF ‘root’
metadata.
The metadata root is https://static.rust-lang.org/update-metadata. All TUF
metadata files live under this directory. The base URL is
https://static.rust-lang.org/, meaning that the metadata may reference
‘target’ files anywhere on the server.
We will define the following delegated target roles, for assigning the
associated releases, each with their own keys:
- “targets/rustup-stable.json”
- “targets/rustup-stable-archive.json”
- “targets/rust-stable.json”
- “targets/rust-nightly.json”
- “targets/rust-beta.json”
- “targets/rust-stable-archive.json”
- “targets/rust-nightly-archive.json”
- “targets/rust-beta-archive.json”
The rust roles can provide the “/dist” and “/cargo-dist” paths. The rustup role
can provide the “/rustup” path. The roles are divided by release channel so
that each channel can have its own keys. The main channel roles only contain the
latest release on a channel, while the archive roles contain the entire release
history. Versioned Rust releases (that is, the manifests named after the
release version, not the release channel) are provided by
“rust-stable-archive.json”.
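The path restriction on roles can be illustrated with a small check. This is a deliberately simplified model: in TUF the permitted paths are declared in the delegating metadata, whereas here they are hard-coded, and the names `ROLE_PATH_PREFIXES`, `role_family` and `role_may_provide` are hypothetical.

```python
# Path prefixes each family of roles is allowed to provide targets under.
ROLE_PATH_PREFIXES = {
    "rust": ("/dist/", "/cargo-dist/"),
    "rustup": ("/rustup/",),
}


def role_family(role_name):
    """Map e.g. "targets/rust-nightly-archive.json" to "rust"."""
    stem = role_name.rsplit("/", 1)[-1]
    return stem.split("-", 1)[0]


def role_may_provide(role_name, path):
    """True if `path` falls under a prefix delegated to the role's family."""
    return path.startswith(ROLE_PATH_PREFIXES[role_family(role_name)])


print(role_may_provide("targets/rust-nightly.json",
                       "/dist/2016-08-16/channel-rust-nightly.toml"))
print(role_may_provide("targets/rust-stable.json",
                       "/rustup/archive/1.2.3/x86_64-unknown-linux-gnu/rustup-init"))
```

A client would reject a target listed by a role outside its delegated paths, so a compromised rustup key could not be used to publish a Rust toolchain.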
The files captured in these target metadata files are:
/rustup/archive/$version/$target/rustup-init
/dist/channel-rust-$version.toml
/dist/$date/channel-rust-nightly.toml
/dist/$date/channel-rust-beta.toml
/dist/$date/channel-rust-stable.toml
/dist/$date/$package.tar.gz
All these paths exist today, and are written once and then never changed. When
rustup needs to decide which manifest/rustup-init to install it searches for it
in the target list of the correct role.
To solve the checksum problem, the metadata files will use “consistent
snapshots”. What this means is that the files, in addition to being uploaded
with their regular names, are additionally uploaded with a name prefixed by a
hash of their contents. We will only use consistent snapshots for the metadata
files, not for the target files, since we already have a scheme that results in
consistent target data (archives). Adding consistent snapshots of those files
would double our storage requirements for little gain. Using consistent
snapshots only for metadata does not seem to be accounted for in the spec so we
may need to develop an extension, discuss with the TUF author, etc.
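A minimal sketch of the naming scheme, assuming the hash is SHA-256 and that the prefix is joined to the file name with a dot (both are assumptions here, not something this design has settled). The helper names and the metadata contents are invented for the example.

```python
import hashlib


def consistent_snapshot_name(filename, content):
    """Return the hash-prefixed name under which `content` is also uploaded."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{digest}.{filename}"


def publish(store, filename, content):
    """Upload under the regular name and under the hash-prefixed name."""
    store[filename] = content  # mutable: overwritten on every release
    store[consistent_snapshot_name(filename, content)] = content  # written once


store = {}
first = b'{"version": 1}'
second = b'{"version": 2}'
publish(store, "snapshot.json", first)
publish(store, "snapshot.json", second)

# The regular name now serves the new content, but a client that was told
# the hash of the first version can still fetch exactly that version.
print(store["snapshot.json"] == second)
print(store[consistent_snapshot_name("snapshot.json", first)] == first)
```

Because a hash-prefixed name can only ever refer to one set of bytes, a client that follows hashes from the root metadata downward never sees a mismatched pair, regardless of how slowly the CDN propagates.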
We will continue using the Rust manifest format, although the TUF format is
extensible such that it could accommodate the Rust manifests’ contents. Making
changes here does not seem worth the effort at this time.
Better compression
This is a pretty easy extension. In the manifest files we’ll add two new lines
to every component. Where today we list the tarball url and hash,
hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"
tomorrow we’ll add the following fields:
hash_xz = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url_xz = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.xz"
rustup will just know to use the xz artifacts if available.
Unfortunately, actually modifying the release process to generate the xz tarballs
and extend the manifest is fairly involved.
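On the client side, the selection rule could look something like the sketch below. The function name `select_artifact` and the `supports_xz` flag are hypothetical; the entry is a parsed component table using the field names proposed above.

```python
def select_artifact(entry, supports_xz=True):
    """Pick (url, hash) for a component, preferring xz when it is listed."""
    if supports_xz and "url_xz" in entry and "hash_xz" in entry:
        return entry["url_xz"], entry["hash_xz"]
    # Old manifests, or clients without xz support, fall back to gzip.
    return entry["url"], entry["hash"]


DIST = "https://static.rust-lang.org/dist/2016-08-16"
NAME = "rust-std-1.11.0-aarch64-unknown-linux-gnu"

old_entry = {
    "hash": "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31",
    "url": f"{DIST}/{NAME}.tar.gz",
}
new_entry = dict(
    old_entry,
    hash_xz="49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1",
    url_xz=f"{DIST}/{NAME}.tar.xz",
)

print(select_artifact(old_entry)[0])
print(select_artifact(new_entry)[0])
```

Since the existing url and hash keys are untouched, older versions of rustup that ignore unknown keys keep working with the extended manifest.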