Future updates to the rustup distribution format

Rust’s release packaging and distribution have several deficiencies today. The most important, and the one blocking rustup 1.0, is that updates suffer from periodic checksum failures due to how the data rustup reads from static.rust-lang.org is structured. rustup also needs to incorporate signature validation, and to better compress the binaries it downloads.

This is an outline of a design to fix these problems. It builds on a flawed system that has grown in ad-hoc ways over the years Rust has been published. Since each iteration of this scheme that we publish must essentially be supported forever, I’m advertising this iteration in advance in hopes of getting feedback to avoid making any major design mistakes (such as the one that led to the checksum bug).

This should also serve as documentation as to how the system works, since there is no such documentation now. This is technical information about the mechanics of distribution through rustup, and mostly won’t be of interest to end-users.

Please understand that the design here will by necessity build on the existing system. A complete rewrite is out of scope. In the near term, the only new feature that I am definitely planning on implementing is the checksum fix, but I want to make sure there is a path forward for the others as well.

If you don’t care about all the background you can just skip ahead to read about how rustup distribution will work tomorrow.

How rustup distribution works today

I’m going to focus on the distribution of the published artifacts, and not how they are actually produced (which is complex, and I would also like to revisit someday).

The static.rust-lang.org layout

The easiest way to think about this is to first understand how all the published artifacts are arranged on static.rust-lang.org. So here’s a map of static.rust-lang.org with the key pieces named and noted:

  • static.rust-lang.org/
    • rustup.sh - This is the old script that downloads and installs a single toolchain. Unrelated to rustup.rs, but using the same metadata.
    • rust-key.gpg.ascii - The Rust GPG signing key, a subkey of which is used to sign the distribution manifests and tarballs (which are validated by rustup.sh if GPG is available, but not by rustup.rs). No part of the official distribution tools actually uses the key by downloading from this URL - rustup.sh includes an embedded copy.
    • dist/ - The directory containing Rust distributables: the manifests describing each release and the corresponding binaries, checksums and signatures. This does not contain cargo binaries, for legacy reasons, or rustup binaries, for organizational reasons.
      • channel-rust-stable.toml - The “v2” manifest for the stable release channel. This is the file that describes the contents of a Rust release, produced for all three release channels as well as duplicated with stable URLs for each numbered release. The contents of this file are a list of “components”, their architectures, their relationships, URLs, hashes, and assorted metadata. See the following section for an example. The URLs listed in the manifest all point into the archives. This is the entry point rustup uses to discover new releases today.
      • channel-rust-stable.toml.sha256 - The SHA-256 sum of the above, as produced by the sha256sum command on Unix.
      • channel-rust-stable.toml.asc - The GPG signature of the above.
      • channel-rust-stable - The “v1” manifest. This is the legacy manifest format, used primarily in the process of building the v2 manifest, but also for installing old toolchains that don’t have v2 manifests.
      • channel-rust-stable-date.txt - A file recording the archive date corresponding to a v1 manifest, produced as part of v1 manifest construction, used in the production of the v2 manifests, but otherwise not part of the distribution process.
      • channel-rust-stable-date.sha256
      • channel-rust-stable-date.asc
      • channel-rust-beta.toml - The manifest for the beta channel. Accompanied by all the same sibling files as the stable channel.
      • channel-rust-nightly.toml - The manifest for the nightly channel. Accompanied by all the same sibling files as the stable channel.
      • channel-rust-1.12.0.toml - The manifest for a single stable release of Rust. rustup uses these for installing specific releases. These notably are not accompanied by the v1 manifest files.
      • (There are many other files in this directory, many produced as part of the release process, but they are irrelevant to distribution through rustup. The files described in the archives below are duplicated here but are essentially unused).
      • $YYYY-$MM-$DD/ - Archive folders. The binary artifacts produced each day are published here, along with duplicate copies of the above-mentioned manifests and their accompanying sibling files. Importantly, every file in these folders is written once ever (modulo accidentally publishing twice in one day). The components installed by rustup are sourced from these directories.
        • (The manifest files above are all reproduced here. Eliding their descriptions).
        • rustc-nightly-x86_64-apple-darwin.tar.gz - The binary containing the rustc “component”. It is in rust-installer format. rustup combines components as described in the manifest and requested by the user to produce a working toolchain.
        • rustc-nightly-x86_64-apple-darwin.tar.gz.sha256 - The SHA-256 checksum of the above.
        • rustc-nightly-x86_64-apple-darwin.tar.gz.asc - The signature of the above.
        • (Various other components using the same scheme are in this directory, as well as the source tarballs corresponding to the release).
    • cargo-dist/ - This directory has the same structure as dist/, including archives, except that it contains only the cargo build artifacts. The two are separated only for legacy reasons. The rust manifests contain URLs that point into this directory to retrieve the cargo components.
    • rustup/ - The directory for distributing the rustup tool.
      • rustup-init.sh - The script hosted at sh.rustup.rs for installing rustup via curl on Unix.
      • dist/ - The directory in which the current rustup-init binaries are published. rustup is installed via a single self-contained executable that is downloaded and run.
        • $target-triple/ - Each installer is located in a target-specific directory.
          • rustup-init - The installer, a platform-specific executable. On Windows it is called rustup-init.exe and in the future could be called rustup-init.msi.
          • rustup-init.sha256 - The checksum of that installer.
      • archive/ - The rustup archives.
        • $version - Each directory in the archives is a version number (whereas the Rust archives are dates).
          • (The contents of this directory are identical to rustup/dist/).
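
As a rough illustration of the layout above, the following Python sketch builds the URLs of a channel manifest and its sibling files. The function name `manifest_urls` and its parameters are made up for this example; only the path scheme is taken from the map.

```python
BASE = "https://static.rust-lang.org"


def manifest_urls(channel, archive_date=None):
    """Return the v2 manifest URL plus its checksum and signature siblings.

    `channel` is "stable", "beta", "nightly", or a version such as "1.12.0".
    If `archive_date` ("YYYY-MM-DD") is given, the URLs point at the copy in
    that archive folder instead of the top-level copy in dist/.
    """
    directory = f"{BASE}/dist"
    if archive_date is not None:
        directory = f"{directory}/{archive_date}"
    manifest = f"{directory}/channel-rust-{channel}.toml"
    return {
        "manifest": manifest,
        "sha256": manifest + ".sha256",
        "signature": manifest + ".asc",
    }
```

For example, `manifest_urls("nightly", "2016-08-16")["manifest"]` yields the archived nightly manifest under `dist/2016-08-16/`.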

The v2 manifest format

Here’s an example of what the manifest looks like, with redundant details elided. The most important things to note:

  • The “date” is the same as the archive date.
  • manifest-version allows the format to change incompatibly (rustup will reject unknown versions), but new types of information can be added to the manifest backwards-compatibly.
  • The manifest is made of packages defined for some number of targets.
  • Packages may have subcomponents which are themselves packages.
  • Components may be required (the “components” key), or optional (the “extensions” key).
  • Any given component may be unavailable, allowing lower-tier releases to fail to build.
  • Components define URLs of the tarball and hashes of that tarball.
  • The “rust” package is the only one with subcomponents. When rustup installs Rust it reads this package and traverses the component tree from there.
  • The “rust” package’s tarball is never downloaded by rustup, only those of its components. The “rust” tarball is a combined package of all the required components and is the one individuals install off the website today.
  • The “version” key provides the same information as rustc --version.
date = "2016-08-16"
manifest-version = "2"

[pkg]

[pkg.rust]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust.target]

[pkg.rust.target.aarch64-unknown-linux-gnu]
available = true
hash = "10cb2ed86992f6273d0b3bf631b7eed4c8418d88baa2b9c8057c7a60011dd4ce"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rustc"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-std"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "rust-docs"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.components]]
pkg = "cargo"
target = "aarch64-unknown-linux-gnu"

[[pkg.rust.target.aarch64-unknown-linux-gnu.extensions]]
pkg = "rust-std"
target = "aarch64-apple-ios"

...

[pkg.cargo]
version = "0.12.0-nightly (6b98d1f 2016-07-04)"

[pkg.cargo.target]

[pkg.cargo.target.aarch64-unknown-linux-gnu]
available = true
hash = "1bdf1f446199b164b8e4234fede68fb82d4528adce9757ad69fa6ace67859d94"
url = "https://static.rust-lang.org/cargo-dist/2016-07-05/cargo-nightly-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rust-docs]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust-docs.target]

[pkg.rust-docs.target.aarch64-unknown-linux-gnu]
available = true
hash = "a48747395ef79578e208b36a94c6337d30aa3993a62d2fba22c0cc8da3077c9d"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-docs-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rust-std]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rust-std.target]

[pkg.rust-std.target.aarch64-unknown-linux-gnu]
available = true
hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...

[pkg.rustc]
version = "1.11.0 (9b21dcd6a 2016-08-15)"

[pkg.rustc.target]

[pkg.rustc.target.aarch64-unknown-linux-gnu]
available = true
hash = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url = "https://static.rust-lang.org/dist/2016-08-16/rustc-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

...
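
To make the traversal concrete, here is a minimal Python sketch that walks the component tree of the “rust” package and collects the tarballs to download. It assumes the manifest has already been parsed into nested dictionaries by some TOML library; `resolve_downloads` and its parameters are hypothetical names, and this is not a description of rustup's actual code.

```python
def resolve_downloads(manifest, target, extensions=()):
    """List (pkg, target, url, hash) for the components of the "rust" package.

    `manifest` is the parsed v2 manifest as nested dicts. `extensions` is a
    collection of (pkg, target) pairs requested in addition to the required
    components.
    """
    if manifest["manifest-version"] != "2":
        raise ValueError("unknown manifest version")

    # Start at the "rust" package; its own tarball is never downloaded.
    rust = manifest["pkg"]["rust"]["target"][target]
    wanted = [(c["pkg"], c["target"]) for c in rust.get("components", [])]

    # Optional components must be listed under "extensions" to be installable.
    offered = {(e["pkg"], e["target"]) for e in rust.get("extensions", [])}
    for ext in extensions:
        if ext not in offered:
            raise ValueError(f"extension not offered: {ext}")
        wanted.append(ext)

    downloads = []
    for pkg, pkg_target in wanted:
        entry = manifest["pkg"][pkg]["target"][pkg_target]
        if not entry.get("available", False):
            raise ValueError(f"{pkg} is unavailable for {pkg_target}")
        downloads.append((pkg, pkg_target, entry["url"], entry["hash"]))
    return downloads
```

Raising an error for an unavailable component is one possible policy chosen for the sketch; the manifest format itself only records the `available` flag.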

How rustup interprets this information

OK, here’s the important stuff. There are two aspects of rustup updating: toolchain updates and self-updates.

Updating a toolchain from a release channel:

  • rustup downloads https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256
  • rustup compares the checksum to that of its cached stable toolchain. If they are the same then there’s nothing further to do.
  • rustup downloads https://static.rust-lang.org/dist/channel-rust-stable.toml, the release channel manifest.
  • rustup compares the downloaded checksum to the checksum of the manifest. If they are not the same, it reports an error and stops.
  • Based on the component tree described in the manifest, user options, and the currently-installed toolchain, rustup builds a list of tarballs to download and their checksums.
  • rustup downloads each tarball from the archives and compares the contents to the checksum in the manifest. If any fails it reports an error.
  • rustup installs each component by interpreting the tarball contents per the rust-installer format. If any of these fail it rolls back all installation operations.
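
The checksum-related steps above can be sketched in Python as follows. This is an illustration under assumptions, not rustup's implementation: `fetch` is a caller-supplied download function, and `check_for_update` and `verify_tarball` are names invented for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_for_update(fetch, channel_url, cached_manifest_sha256):
    """Return the new manifest bytes, or None if the cached toolchain is current.

    `fetch(url) -> bytes` is supplied by the caller.
    """
    # Download the .sha256 file; sha256sum output is "<hex>  <filename>".
    expected = fetch(channel_url + ".sha256").decode().split()[0]
    # Same checksum as the cached manifest: nothing further to do.
    if expected == cached_manifest_sha256:
        return None
    # Download the manifest and compare it to the checksum fetched above.
    manifest = fetch(channel_url)
    actual = sha256_hex(manifest)
    if actual != expected:
        raise ValueError(
            f"checksum failed, expected: '{expected}', calculated: '{actual}'"
        )
    return manifest


def verify_tarball(data: bytes, manifest_hash: str) -> None:
    """Compare a downloaded component tarball to the hash from the manifest."""
    if sha256_hex(data) != manifest_hash:
        raise ValueError("component checksum mismatch")
```

Note that the manifest check depends on two separately downloaded files agreeing with each other, while the tarball check depends only on the manifest.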

Notable things rustup does not do while installing toolchains:

  • Make use of the .sha256 files corresponding to component tarballs. These checksums are drawn from the manifest instead.
  • Make use of the .asc signature files.

Updating rustup itself:

  • rustup downloads https://static.rust-lang.org/rustup/dist/$target-triple/rustup-init.sha256
  • rustup compares the checksum to that of the current installation. If they are the same then there’s nothing further to do.
  • rustup downloads https://static.rust-lang.org/rustup/dist/$target-triple/rustup-init
  • rustup compares the downloaded checksum to the checksum of the installer. If they are not the same it reports an error and stops.
  • rustup runs the installer and replaces itself.

The problems with rustup distribution

This section is going to describe the problems under consideration, along with some of the constraints to solving them.

Checksum failures

Both the toolchain upgrade process and the self-upgrade process begin with a step that involves downloading a checksum of an artifact from a known location, downloading the artifact next to it, and comparing them. Sometimes these two files are not paired correctly and the check fails.

Exactly why these files don’t become available together is unclear, but static.rust-lang.org is distributed through a CDN, and the process of distributing the files takes quite a while. So it seems exceedingly likely that in the process of distribution these two files that must always be paired become unpaired for a window of time.
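
The suspected failure can be pictured with a toy model in Python. This is only a simulation of the hypothesis above, with a dictionary standing in for one CDN edge; `publish` and `client_check` are hypothetical names, and nothing here measures real CDN behavior.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def publish(edge, name, data, propagate_checksum=True, propagate_artifact=True):
    """Copy a release to one CDN edge; each file may arrive independently."""
    if propagate_artifact:
        edge[name] = data
    if propagate_checksum:
        edge[name + ".sha256"] = sha256_hex(data)


def client_check(edge, name):
    """Fetch the checksum, then the artifact next to it, and compare them."""
    expected = edge[name + ".sha256"]
    actual = sha256_hex(edge[name])
    return expected == actual


edge = {}
publish(edge, "rustup-init", b"release 1")
assert client_check(edge, "rustup-init")      # both files from release 1

# Release 2 is mid-propagation: the checksum has arrived, the artifact has not.
publish(edge, "rustup-init", b"release 2", propagate_artifact=False)
assert not client_check(edge, "rustup-init")  # new checksum, old artifact

# Once the artifact arrives, the pair is consistent again.
publish(edge, "rustup-init", b"release 2")
assert client_check(edge, "rustup-init")
```

Any scheme in which the entry point is a single file, or in which the second file lives at a never-overwritten URL, removes the window in the middle of this example.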

Unverified provenance

Today rustup does not guarantee that the artifacts it is installing were actually produced by the Rust build machines. It only guarantees that they come from a server that some certificate authority says is static.rust-lang.org.

The Rust build process does sign its artifacts using GPG, but GPG is seemingly unsuitable for use in rustup because it is a complex, unlibrarified dependency built on a lot of non-Rust code.

rustup needs to guarantee that the binaries it puts on your computer are the ones produced by the Rust release builders.

Poor compression

Today Rust binaries are distributed with gzip compression, which is not very efficient. There’s a lot of demand for us to use a compression scheme with a higher compression ratio. A compression scheme should favor decompression speed over compression speed since we can afford to waste time compressing, but want installation to be fast. The best candidate seems to be XZ.

How rustup distribution will work tomorrow

So the primary thing I’m trying to solve here is the checksum failures, but thinking ahead to the other issues, anything we do here will need to be extensible to support signing.

After review, I think solving the signing issue should be done by just implementing The Update Framework. It seems to be a well-considered design, so I want to just do the work to implement the whole thing and be done with it. At that point Rust distribution should have best-in-class security properties.

TUF actually solves the checksum problem as well, but since implementing TUF will take considerable work I would like to do this in two stages: first solve the checksum problem with some simple hacks, expecting to throw them away later; then solve it right via TUF.

Stage 1: Fix the checksum failure

There are two distinct checksum failures: the self-update failure, and the toolchain manifest failures; and we’ll fix them in distinct ways. As part of this we’ll make the assumption that we are not using checksums to verify integrity. In the current iteration, we use HTTPS to ensure integrity, that the data is transmitted correctly. Once we convert to TUF, we can transmit over less reliable transports and TUF itself will ensure integrity, as well as provenance.

The self-upgrade checksum

The key thing to solve here is to convert the upgrade “entry point”, the initial file downloaded that determines whether to do the upgrade and where to look for the data, from two URLs to one, with all subsequent artifacts coming out of the archives (which are never overwritten and thus don’t suffer any sort of version drift). There are two obvious ways to do this: with a directory symlink, and with a different file format.

The mechanism for setting up a symlink is CDN-specific and doing so as part of the release process is sufficiently complex that I prefer not to do it. So my proposed fix is to create another single-file scheme for this purpose.

Besides being a single-file entry point, the only other thing it needs to accomplish is to provide some unique key that can be compared to the existing install, to determine whether the upgrade should be done at all.

The toml-based file scheme is:

schema-version = "1"
version = "..."

This file is named https://static.rust-lang.org/rustup/release-stable.toml.

There is no corresponding .sha256 file, because avoiding having a second file is the whole goal. (We may though, just for consistency, generate and upload the .sha256 file. Perhaps some find them useful).

Determining whether to do an upgrade will be performed by comparing the contents of this file to the contents of the previously-installed file. It could also be done by comparing the version field or the artifact hashes.

Like the manifest format, this one comes with a version, schema-version. rustup will verify that it understands this version before proceeding. Although we expect this file to be deprecated quickly, and unrecognized fields will be ignored, versioning it is still prudent.

To do an upgrade, rustup will simply look in the archives under the folder corresponding to version.
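
A minimal Python sketch of this single-file entry point follows. It assumes the release file has already been parsed into a dictionary by a TOML library, compares the version field (one of the options mentioned above), and builds the archive URL from the rustup/archive/ layout described earlier. `plan_self_update` is a hypothetical name, and the Windows `.exe` suffix is not handled.

```python
SUPPORTED_SCHEMA = "1"
ARCHIVE = "https://static.rust-lang.org/rustup/archive"


def plan_self_update(release, installed_version, target_triple):
    """Return the rustup-init URL to install, or None if already up to date.

    `release` is the parsed release-stable.toml: a dict with the keys
    "schema-version" and "version". Unrecognized keys are ignored.
    """
    schema = release.get("schema-version")
    if schema != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema-version: {schema!r}")
    version = release["version"]
    if version == installed_version:
        return None
    # Everything after the entry point comes from the write-once archives,
    # so there is no second mutable file that could be out of step.
    return f"{ARCHIVE}/{version}/{target_triple}/rustup-init"
```

Because the only mutable file is the entry point itself, a stale CDN copy can at worst delay the upgrade; it cannot produce a mismatched pair.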

The toolchain manifest checksum

We’ll just paper over this one for now. When rustup sees a manifest checksum failure today it says:

info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'
warning: update not yet available, sorry! try again later
error: checksum failed, expected: 75b220d4bdf9c4d670d4787e98de8444a7641a14cc82c898db2a36138248bb4', calculated: '8f10396e1feee2e8f69f6d1406ce5750cc0b3291924b0e11b0cac75fb71bbc70'

Instead we’ll just print the same thing we do when there are no updates:

info: syncing channel updates for 'nightly-x86_64-pc-windows-msvc'

We can do this for the manifests because the manifests are small, whereas rustup updates are several MB.
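
In code, the change amounts to treating a mismatch like “no update”. The Python sketch below shows the idea; `sync_channel`, `fetch`, and `log` are illustrative names rather than rustup internals.

```python
import hashlib


def sync_channel(fetch, channel_url, toolchain, cached_sha256, log=print):
    """Return the manifest bytes if a consistent update exists, else None.

    `fetch(url) -> bytes` and `log(message)` are supplied by the caller.
    """
    log(f"info: syncing channel updates for '{toolchain}'")
    expected = fetch(channel_url + ".sha256").decode().split()[0]
    if expected == cached_sha256:
        return None  # genuinely no update
    manifest = fetch(channel_url)
    if hashlib.sha256(manifest).hexdigest() != expected:
        # The checksum and manifest are probably mid-propagation. Stay quiet
        # and behave as if there were no update; a later run will retry.
        return None
    return manifest
```

The cost of a false “no update” is one wasted manifest download, which is why this shortcut is acceptable for manifests but not for the much larger rustup-init binaries.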

Stage 2: TUF

The TUF spec is required background for this section.

The previously-described release-stable.toml file has no role in this scheme; it is deprecated, but will continue to be produced for some period of time to accommodate old clients.

The entire static.rust-lang.org site is a single TUF ‘repo’, meaning that all toolchains and rustup versions will be reachable from the same TUF ‘root’ metadata.

The metadata root is https://static.rust-lang.org/update-metadata. All TUF metadata files live under this directory. The base URL is https://static.rust-lang.org/, meaning that the metadata may reference ‘target’ files anywhere on the server.

We will define the following delegated target roles, for assigning the associated releases, each with their own keys:

  • “targets/rustup-stable.json”
  • “targets/rustup-stable-archive.json”
  • “targets/rust-stable.json”
  • “targets/rust-nightly.json”
  • “targets/rust-beta.json”
  • “targets/rust-stable-archive.json”
  • “targets/rust-nightly-archive.json”
  • “targets/rust-beta-archive.json”

The rust roles can provide the “/dist” and “/cargo-dist” paths. The rustup roles can provide the “/rustup” path. The roles are divided by release channel so that each channel can have its own keys. The main channel roles only contain the latest release on a channel, while the archive roles contain the entire release history. Versioned Rust releases (that is, the manifests named after the release version, not the release channel) are provided by “rust-stable-archive.json”.
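
The path restriction on these roles can be sketched as a prefix check. This Python example only illustrates the rule stated above; it is not TUF's delegation format, and `ROLE_PATHS`, `role_family`, and `may_provide` are names invented here.

```python
# Path prefixes each family of delegated roles is allowed to provide.
ROLE_PATHS = {
    "rustup": ("/rustup/",),
    "rust": ("/dist/", "/cargo-dist/"),
}


def role_family(role_name):
    """Map e.g. "targets/rust-nightly-archive.json" to "rust"."""
    stem = role_name.rsplit("/", 1)[-1]
    if stem.startswith("rustup-"):
        return "rustup"
    if stem.startswith("rust-"):
        return "rust"
    return None


def may_provide(role_name, target_path):
    """True if the role is allowed to list `target_path` among its targets."""
    family = role_family(role_name)
    if family is None:
        return False
    # str.startswith accepts a tuple of prefixes.
    return target_path.startswith(ROLE_PATHS[family])
```

A client applying this check would refuse, for instance, a rustup role that tried to list a file under /dist/, which limits the damage if one role's keys are compromised.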

The files captured in these target metadata files are:

  • /rustup/archive/$version/$target/rustup-init
  • /dist/channel-rust-$version.toml
  • /dist/$date/channel-rust-nightly.toml
  • /dist/$date/channel-rust-beta.toml
  • /dist/$date/channel-rust-stable.toml
  • /dist/$date/$package.tar.gz

All these paths exist today, and are written once and then never changed. When rustup needs to decide which manifest/rustup-init to install it searches for it in the target list of the correct role.

To solve the checksum problem, the metadata files will use “consistent snapshots”. What this means is that the files, in addition to being uploaded with their regular names, are additionally uploaded with a name prefixed by a hash of their contents. We will only use consistent snapshots for the metadata files, not for the target files, since we already have a scheme that results in consistent target data (archives). Adding consistent snapshots of those files would double our storage requirements for little gain. Using consistent snapshots only for metadata does not seem to be accounted for in the spec so we may need to develop an extension, discuss with the TUF author, etc.
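
The naming idea behind consistent snapshots can be sketched in Python as follows. The exact `<sha256>.<filename>` pattern used here is an assumption for illustration, as are the function names; the TUF spec should be consulted for the real naming rules.

```python
import hashlib


def consistent_snapshot_names(filename, content):
    """Return both names a metadata file is uploaded under.

    The hash-prefixed name is content-addressed: different content gets a
    different name, so that name is never overwritten with other data.
    The "<sha256>.<filename>" pattern is an assumption of this sketch.
    """
    digest = hashlib.sha256(content).hexdigest()
    return filename, f"{digest}.{filename}"


def upload(store, filename, content):
    """Store the content under its regular name and its hash-prefixed name."""
    for name in consistent_snapshot_names(filename, content):
        store[name] = content


def fetch_pinned(store, filename, expected_digest):
    """Fetch the copy named by `expected_digest` and verify it."""
    content = store[f"{expected_digest}.{filename}"]
    if hashlib.sha256(content).hexdigest() != expected_digest:
        raise ValueError("content does not match its name")
    return content
```

A client that learns the expected hash from one metadata file and fetches the next file by its hash-prefixed name always gets matching content or nothing, even while a newer release is still being uploaded.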

We will continue using the Rust manifest format, although the TUF format is extensible such that it could accommodate the Rust manifests’ contents. Making changes here does not seem worth the effort at this time.

Better compression

This is a pretty easy extension. In the manifest files we’ll add two new lines to every component. Where today we list the tarball URL and hash,

hash = "5bc07fe375913dee02dc4dba1d2388e6f35166f24ba69b0568d560f335689f31"
url = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.gz"

tomorrow we’ll add the following fields:

hash_xz = "49d3977bc4d3d868ae0632b860308a43975255a81a92d26c3a40569b4186bdc1"
url_xz = "https://static.rust-lang.org/dist/2016-08-16/rust-std-1.11.0-aarch64-unknown-linux-gnu.tar.xz"

rustup will just know to use the xz artifacts if available.
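
The selection logic is small enough to show in full. This Python sketch assumes a component entry parsed into a dictionary with the field names above; `pick_artifact` and the `supports_xz` flag are hypothetical.

```python
def pick_artifact(entry, supports_xz=True):
    """Return (url, hash) for a component entry, preferring xz when listed.

    Older manifests have only "url" and "hash", and older clients ignore the
    "_xz" fields, so the addition is backwards-compatible in both directions.
    """
    if supports_xz and "url_xz" in entry and "hash_xz" in entry:
        return entry["url_xz"], entry["hash_xz"]
    return entry["url"], entry["hash"]
```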

Unfortunately, actually modifying the release process to generate the xz tarballs and extend the manifest is fairly involved.

Thanks for writing all this up! It’s a lot to digest, but I’m excited at any progress towards rustup 1.0.

I do like the idea of TUF and I’ve seen a fair amount of excitement about it, but are there any package managers that are actually using it in production yet?

There are a few, though you may have to stretch “package manager” slightly:

The first is probably the best example, and Notary may be of use as an implementation of TUF - both client and server side - to study.

Thanks so much for writing this all up, @brson.

If this can be implemented as a separate package, we could use it on crates.io too, which is one of its longest outstanding bugs.

You should look into using zstd instead of xz for compression. It has a high compression ratio, almost to XZ-level compression, but it compresses 60x faster than XZ and decompresses 10x faster, getting about a quarter the decompression speed of LZ4. It is truly a next generation compression standard.

The new compression format will be used for all of the available rust releases, so XZ is the only real choice since it’s (almost) everywhere. It will certainly be interesting to see if any distros pick up these new compression formats for their own package distribution, then there might be a new standard.

Don’t forget Cabal too

@brson I just skimmed this for now, but my initial thought is that if we are planning this far ahead, we might as well go all the way and decide rustup is just responsible for delivering rustc, and cargo (optionally) downloads cached builds of individual crates.

Kind of like what @steveklabnik says, Cargo needs TUF too regardless of rustup, and while it would be great to somehow implement it for both tools I suspect the code reuse gains here are slim as crates.io and static.rust-lang.org have little in common. So I’d say go for phase 1 as written, but then adapt phase 2 to what I said so we only implement TUF once. (Perhaps Cargo can cache executables too, and rustup just installs the binary Cargo acquires in accordance with TUF.)

All Linux distributions already support zstd. Debian, Ubuntu, Fedora, Arch Linux, and Gentoo all offer libzstd. XZ isn’t supported any more than zstd.

This https://github.com/Phaiax/rsaltpack library looks like a good start on a reusable NaCl implementation. As long as its data format satisfies TUF's spec, or something close enough to it, then it could save a comparably significant portion of TUF's python code for signing, verifying, and generating keys.

Does it come installed in the default minimal installation? It’s the same problem as GPG signing, if it isn’t installed by default, then rustup can’t rely on it.

There aren’t any minimal installs that I know of which feature XZ support by default. Additionally, considering rustup requires network access to install, I fail to see why it’s a problem to add a check for and even automatically install libzstd. There’s official support for libzstd in Rust already, so it’d be trivial to implement.

I might be off base here, but I thought the intention was to add the new compressed archives in addition to the existing ones, not instead of. As such, since rustup is a statically-linked executable, how does it matter whether or not the compression library is likely to already be installed?

You don’t need anything like saltpack. I’d also note that saltpack is “homebrew crypto” (ideally it’d be using something like Rogaway’s STREAM construction), and that crate does not appear to be in anything close to a good state.

All TUF relies on are hash functions and digital signature algorithms. There are already plenty of implementations of the former. I would recommend using Ed25519 for the latter. There’s a pure Rust implementation available in rust-crypto as well as wrappers for the ref10 C code in both sodiumoxide and ring.

I agree with this.

TUF is a pretty complex thing to implement. I recommend that you sort out the key management issues before you implement TUF. For example, how are you going to manage the HSM/smartcards or whatever that hold the keys? If so, are you going to need to use RSA or ECDSA signatures since very few (no?) HSMs and smartcards support Ed25519? Or, are you going to use the threshold mechanism as a means to avoid needing HSMs/smartcards? Are keys held by individual developers or are the keys held in a central access-controlled signing server? (IMO, it is better to use both smartcards and the threshold mechanism if there is no central access-controlled signing server.) What happens w.r.t. key replacement when a developer becomes untrusted for whatever reason?

You will probably find that this key management stuff (the stuff in the previous paragraph) takes a surprisingly large amount of time to get right.

The difference between XZ and zstd is that while zstd is widely available, XZ is widely accepted. It’s considered an acceptable compression format for the data.tar member of a deb, it’s now the primary compression format GNU uses for source releases, etc.

No one would be surprised at having software distributed in XZ, but zstd is a new and unfamiliar thing. PR is a factor to consider, and the gain from using zstd over XZ is not huge.

Why should PR even matter in the first place? The compression technology used is just an implementation detail that’s invisible to the user of rustup. In addition, Rust cannot state that it is widely accepted either, so it’s silly to consider what’s widely accepted. Libraries can only become widely accepted when software actually uses the libraries. Let’s be among the next generation applications that are using libzstd so that we can push zstd forward as a widely accepted standard on Linux.

Finally, zstd does offer a huge gain over XZ if you’ve seen the benchmarks, especially with decompression speed. In fact, it’s by the same author as LZ4.

XZ is slower, but is it slow enough to worry about it? How much time is spent in decompression versus plain IO time writing all the new files?

rustc decompresses in a few seconds anyway - the only time installation is slow is when network conditions are suboptimal, and in that case we want high compression ratio, not high decompression speed.

And according to zstd's web-site (Zstandard - Real-time data compression algorithm), zstd is still 10% larger than xz. As xz is the current standard and zstd does not provide a significant advantage, we should be using xz.

The compression format being added isn’t just for rustup, it will be used by all forms of rust distribution. And from the sounds of it, just adding a single format is a lot of work. So adding a format that is perceived as a widely accepted standard feels like the right move.

Also just a side-note, rustup should not install packages on the user’s system. Besides requiring administrative access, it makes too many assumptions about the user’s platform.