Rust CI / release infrastructure changes

We’re in the process of making a number of changes to Rust’s integration and release infrastructure. The goals are to make it simpler for the community to participate in Rust’s infrastructure maintenance, to make the infrastructure more reliable, to expand the release build process so that it can accommodate more Rust standard tooling, such as the RLS and clippy, and to accommodate distribution of the new book and other Rust docs.

Here I’m going to explain what’s happening and why.

Some of the factors at play in this transition include:

  • Our buildbot-based CI / release infrastructure cannot be maintained by community members, and is generally bottlenecked on Alex and myself.
  • Our buildbot configuration has reliability issues, particularly around managing dynamic EC2 instances.
  • Our nightly builds sometimes fail for reasons not caught during CI and are down for multiple days.
  • Packaging Rust for distribution is overly complex, involving many systems and source repositories.
  • The beta and stable branches do not run the test suite today. With the volume of beta backports each release receives, this is a frightening situation.
  • As certain core Rust tools mature we want to deliver them as part of the Rust distribution, and this is difficult to do within the current infrastructure / build system design. Distributing additional tools with Rust is particularly crucial for those intimately tied to compiler internals, like the RLS and clippy.

At the end of this process we expect that:

  • Anybody can modify the majority of Rust’s CI / release infrastructure by submitting a PR to rust-lang/rust.
  • Rust’s CI builds and testing will be running on Travis and AppVeyor, not private buildbot instances.
  • Rust will not be running buildbot at all.
  • There are binary builds available for every PR that is merged, for use in testing and bisecting.
  • Every day that a PR lands there will be a nightly.
  • Beta and stable branches receive the same amount of automated testing as the master branch.
  • Rust release builds are produced on Travis and AppVeyor.
  • The rust-lang/rust build itself will optionally compile, package, and install an expanded Rust distribution that includes cargo, and will be extensible to other core tools such as RLS and clippy in the future, laying the groundwork to install the RLS with Rust via rustup.

Note: in this document I make a distinction between “rustc” and the "platform", when discussing the build, where the platform refers to rustc plus the critical tools beyond rustc that are tied to rustc internals or that otherwise need to be distributed with Rust. I realize the word “platform” is tainted because of the previous library discussion, but the use of the term here shouldn’t be conflated with the distribution of additional libraries with Rust.

Moving CI to Travis and AppVeyor

Already underway and near completion, we are converting our CI builds - the ones that run on every PR and must pass before a PR is merged - to run on Travis and AppVeyor. This will make it easier for people to help fix the automation, since the configuration is stored directly in the source tree, and since others have easy access to their own instances. It should also make the builders more reliable since these systems are widely used and they create consistent, clean environments for every build.

These builds will still be coordinated by bors and the workflow for contributors and reviewers will remain the same.

We have dedicated instances from both Travis and AppVeyor.

Publishing builds from CI

Today the release build process tends to break in ways that are not detected during CI and nightlies are often not produced.

To address this, we are going to modify the CI build so that it produces full release artifacts on every merge. Every time bors builds Rust it will also run make dist and upload the binaries to S3 under a directory named after the commit sha. So we know that every time bors merges a PR we have a full set of release binaries for all supported platforms. And by having binaries of every merged PR available it will make it easier to bisect regressions.
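Because every merged commit has prebuilt binaries, bisecting a regression reduces to a binary search over the list of merge commits, with no compilation in the loop. Here is a minimal Python sketch of that idea. It assumes a hypothetical `is_bad(sha)` callback that would download the binaries for a commit and run the failing test, and it assumes that the regression persists in every commit after the one that introduced it.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    `commits` is ordered oldest to newest; the first entry is assumed good
    and the last is assumed bad. `is_bad` is a hypothetical callback that
    would fetch the prebuilt binaries for a commit and run the failing test.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = ["a1", "b2", "c3", "d4", "e5", "f6"]
    # Pretend the regression landed in "d4".
    print(first_bad_commit(history, lambda sha: sha >= "d4"))
```

With N merged commits in the range, this needs about log2(N) downloads rather than log2(N) full builds.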

We can later use this same infrastructure for producing the actual releases.

Integrating Rust tools into the build process

Today the build process for producing full Rust releases is quite convoluted: it can’t reasonably be reproduced outside our infrastructure, and it is difficult to extend to include additional tools that we want to distribute with Rust, like the RLS.

To address this we are going to integrate all the scripts and build steps involved in producing a full Rust release directly into the rust-lang/rust source tree. This includes functionality that currently lives in rust-packaging and rust-buildbot, as well as the complete cargo build. By doing this, anybody should be able to easily produce the same set of release artifacts we do, it will be easy for Linux distros to keep their cargo build properly paired with rustc, and it will be easy to expand our own release builds to include other tools.

Even though this will expand the set of build artifacts produced by the rust-lang/rust build, we will make it optional, leaving the rustc developer experience as-is.

When building / testing a complete release one will pass the --enable-platform flag, that is “enable the full Rust platform” (though we may need to pick a different name than “platform” since that word is tainted). When this is enabled several things happen:

  • at the end of the final stage build, cargo is built
  • during final stage testing, cargo is tested
  • during ‘make dist’, cargo is packaged
  • during ‘make dist’, combined ‘rust’ installers are produced that include cargo
  • during ‘make dist’, other package formats like msi are produced
  • ‘make install’ installs rust and cargo

This logic extends to future additions to the Rust distribution, like the RLS and clippy.
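The effect of the flag can be pictured as a function from configuration to a list of build steps. The following Python sketch is only an illustration: the `planned_steps` function and the step names are hypothetical and do not correspond to the real build system's code.

```python
from typing import List, Sequence


def planned_steps(enable_platform: bool, tools: Sequence[str] = ("cargo",)) -> List[str]:
    """List the build steps for a rustc-only or an extended build.

    Step names are illustrative; `tools` are the extra components that the
    extended build adds on top of rustc.
    """
    steps = ["build rustc", "test rustc", "dist rustc"]
    if not enable_platform:
        return steps  # the default rustc developer experience is unchanged
    for tool in tools:
        steps += [f"build {tool}", f"test {tool}", f"dist {tool}"]
    # Combined installers bundle rustc with every extra tool.
    steps.append("dist combined installer (rust + " + " + ".join(tools) + ")")
    return steps


if __name__ == "__main__":
    for step in planned_steps(True, ["cargo", "rls"]):
        print(step)
```

Adding a tool such as the RLS then amounts to extending `tools`, while a build without the flag is unaffected.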

Adding the RLS and clippy to the Rust distribution

Once the aforementioned work on the build system is complete, that is, once we have an extended “platform” build that can be optionally enabled to build the full Rust distribution, we can begin adding other tools to Rust. This is most important for the RLS and clippy. Both tools are, or will be, widely used and an important part of the Rust developer experience, and both are intimately tied to the compiler’s internals. Because of this the most reasonable way for them to be distributed is as part of Rust - otherwise it is too difficult to pair the correct build of the tools with the correct build of rustc. An alternative would be to augment cargo in some way so that it can understand installation of toolchain-specific crates, but considering that this is exactly what rustup does, the most obvious path forward today is to distribute them through rustup or otherwise as part of the Rust distribution.

As mentioned before with the addition of cargo to the platform build, once e.g. rls is part of the build, it will be optionally built, tested, and packaged with every build of Rust; and thus installable via rustup.

For the most part it will be a straightforward addition. The major question is how to maintain the RLS within the tree. We could e.g. create an RLS submodule, but I think this is likely to be quite difficult to maintain since changes to rustc will break the RLS, and RLS tests will need to pass for rustc changes to land. For that reason, I believe the most practical solution is simply to pull the RLS in tree and deprecate the RLS repo. Again, because the extended platform build is optional, this should have little effect on day to day development, but Rust developers will be on the hook to keep the RLS building. Since delivering an RLS is a key part of our product strategy, this seems reasonable to me.

On the other hand, merging projects will also merge their issue trackers, and possibly slow down their development.

Future changes to doc build and distribution

This plan naturally extends to the treatment of Rust’s docs. Documentation issues on the horizon include the distribution of Rust’s new book, the future of the nomicon and the reference, and distributing Cargo’s docs.

Like with tooling, docs will become part of the extended build, continuing to produce the single rust-docs package. Also like with tooling, I think the question of where the documentation is maintained - either in-tree or out-of-tree - will be mostly driven by the stability of its subject matter. My preference is for things that can be maintained out of tree to be maintained out of tree.

So we might end up with the new book, the nomicon, and the reference, as submodules, all discussing stable matters, and a separate in-tree document for unstable matters.

Then we can think about expanding the standard Rust bookshelf further. The first candidate will be the cargo docs, which should be accessible the same way as Rust’s other documentation.

As part of this process we’ll take a build-time dependency on mdBook, only when the extended build is enabled. This should be fine once the old build system is removed.

Producing releases from CI builds

As natural fallout of the above changes, we can then eliminate almost all of the nightly, beta, and stable build machinery. After a successful PR merge we will have the complete set of binaries available on S3, nearly ready for release as-is.

By releasing these binaries, it does mean that there is another intermediate party involved with our release infrastructure compared to today: in addition to AWS / macstadium we are also depending on Travis and AppVeyor to maintain the integrity of our release build environment.

We’ll need to consider how to make this setup reasonably secure, and it’s clear that we need to start making Rust’s release security better, not worse. We’ll think through this more clearly before doing anything.

The final steps after the binaries are produced are to sign them, generate the manifest used by rustup, and publish them to their final location for distribution.

To accomplish this we would have a small machine that periodically does the work. Exactly when it will run will depend on the release channel, but it will do a few things:

  • Download all the artifacts for the commit being released to local disk
  • Generate the release channel manifest
  • Sign all the artifacts
  • Upload them to static.rust-lang.org’s dist folder and archives
  • Run CDN invalidations
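The manifest-generation step in the list above can be sketched as hashing every downloaded artifact and recording the result. This Python sketch assumes a simplified, hypothetical manifest layout (file name mapped to a URL and a SHA-256 digest); it is not the actual manifest format rustup consumes, and the signing, upload, and CDN invalidation steps are left out.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir: Path, base_url: str) -> Dict[str, Dict[str, str]]:
    """Map each artifact file name to its download URL and SHA-256 digest.

    The layout is a hypothetical simplification, not rustup's real format.
    """
    manifest = {}
    for path in sorted(artifact_dir.iterdir()):
        if path.is_file():
            manifest[path.name] = {
                "url": f"{base_url.rstrip('/')}/{path.name}",
                "sha256": sha256_of(path),
            }
    return manifest
```

Calling `build_manifest(Path("artifacts"), base_url)` on the directory of downloaded artifacts would yield one entry per file; both the directory name and `base_url` are placeholders.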

As an extension, we can also produce rustup-installable releases for every single bors-merge commit, not just once per day.

Beta / stable release changes

The beta and stable branches will work just like the master branch: PRs go through bors and run all tests. Today, by contrast, commits are just merged into beta and stable.

One change here is that, with releases keyed off of bors merges to beta and stable, there will be times when we need to submit a no-op PR just to trigger a build.

More efficient bootstrapping

As part of the switch away from buildbot, to fit within our time budget on Travis, we will modify the bootstrap to build fewer stages by default. We will build the compiler and standard library only twice instead of three times: once with the bootstrap compiler, and then once more. (I’m not going to talk about stages here because the way stage artifacts are intermingled makes it too confusing to communicate about; instead I will talk about the “first build”, “second build”, etc.) Today we always do three builds of std+rustc during bootstrap; tomorrow we will do two by default, and validate separately that the third continues to work.

In the Rust bootstrap process, the second and third builds are assumed to be ABI-identical (the first and the second builds are often not ABI-identical). In practice today there is no validation that this is true, and the purpose of doing the third build is to confirm that rustc can rebuild itself.

So by default, the rustc build will only do two builds, and we will have one integration machine testing the third build.
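The chain of builds can be written down as follows. This Python sketch only models which compiler produces which build, using the “first build” / “second build” naming from above; the function name is made up and nothing here invokes a real compiler.

```python
from typing import List, Tuple


def bootstrap_chain(builds: int = 2) -> List[Tuple[str, str]]:
    """Return (compiler used, build produced) pairs for a bootstrap.

    The first build is compiled by the bootstrap compiler; each later build
    is compiled by the output of the previous one.
    """
    names = ["first build", "second build", "third build"]
    if not 1 <= builds <= len(names):
        raise ValueError("builds must be between 1 and 3")
    chain = []
    compiler = "bootstrap compiler"
    for name in names[:builds]:
        chain.append((compiler, name))
        compiler = name  # the output of this build compiles the next one
    return chain


if __name__ == "__main__":
    for compiler, produced in bootstrap_chain(3):
        print(f"{compiler} -> {produced}")
```

The default of two corresponds to the normal build; passing 3 corresponds to the single integration machine that checks the third build.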

Travis/Appveyor build matrix

This next section is simply cataloging all the configurations used in the new system (the “build matrix” in Travis terms).

These are distinguished along three dimensions: whether they are building/testing rustc, whether they are building/testing the extended "platform" tools, or whether they are just providing other coverage.

Some of the division here is motivated by reducing total build time.

Here “host” means “platforms that run rustc”, and “target” means "platforms that run std".

There are 46 configurations, publishing bins for 44 platforms.

Cross host rustc/platform builders

These are the large set of cross-compiled host platforms that can be built from Linux. They do not run tests and therefore can afford to build both rustc and the extended platform. They publish rustc, the platform tools, and combined installers.

  • aarch64-unknown-linux-gnu
  • armv7-unknown-linux-gnueabi
  • arm-unknown-linux-gnueabi
  • arm-unknown-linux-gnueabihf
  • i686-unknown-freebsd
  • mips-unknown-linux-gnu
  • mipsel-unknown-linux-gnu
  • mips64-unknown-linux-gnuabi64
  • mips64el-unknown-linux-gnuabi64
  • x86_64-unknown-freebsd
  • x86_64-unknown-netbsd
  • powerpc-unknown-linux-gnu
  • powerpc64-unknown-linux-gnu
  • powerpc64le-unknown-linux-gnu
  • s390x-unknown-linux-gnu

Native host rustc builders

These build host toolchains on their native build platforms, as well as additional targets. Hosts are tested. Targets may be tested. They publish only the target artifacts, not the host artifacts, which are published by the platform builders in the next section.

  • x86_64-apple-darwin
    • aarch64-apple-ios
    • armv7-ios
    • armv7s-ios
    • x86_64-ios
    • i386-ios
  • i686-apple-darwin
  • x86_64-unknown-linux-gnu
    • mips-unknown-linux-musl
    • mipsel-unknown-linux-musl
    • x86_64-rumprun-netbsd
    • arm-unknown-linux-musleabi
    • arm-unknown-linux-musleabihf
    • armv7-unknown-linux-musleabihf
  • x86_64-unknown-linux-gnu (not tested, not uploaded)
    • asmjs-unknown-emscripten (test)
    • wasm32-unknown-emscripten (test)
    • i586-unknown-linux-gnu (test)
    • i686-unknown-linux-musl (test)
    • x86_64-unknown-linux-musl (test)
  • x86_64-unknown-linux-gnu (not tested, not uploaded)
    • arm-linux-androideabi (test)
    • aarch64-linux-android
    • i686-linux-android
  • x86_64-unknown-linux-gnu (not tested, not uploaded)
    • armv7-linux-androideabi (test)
  • i686-unknown-linux-gnu
  • i686-pc-windows-gnu
  • i686-pc-windows-msvc
  • x86_64-pc-windows-gnu
  • x86_64-pc-windows-msvc
    • i586-pc-windows-msvc

Native host platform builders

These build and test host platform bins on the native OS. They don’t run the rustc test suite but do test the additional platform components. All create combined tarballs and platform-specific installers and publish their artifacts.

  • x86_64-apple-darwin (+pkg)
  • i686-apple-darwin (+pkg)
  • x86_64-unknown-linux-gnu
  • i686-unknown-linux-gnu
  • i686-pc-windows-gnu (+msi)
  • i686-pc-windows-msvc (+msi)
  • x86_64-pc-windows-gnu (+msi)
  • x86_64-pc-windows-msvc (+msi)

Other coverage

These are additional configurations for test coverage. They publish nothing.

  • x86_64-unknown-linux-gnu (nopt tests)
  • i686-unknown-linux-gnu (nopt tests)
  • x86_64-unknown-linux-gnu (debuginfo in compiler, no test)
  • x86_64-unknown-linux-gnu (cargotest, grammartest)
  • x86_64-unknown-linux-gnu (LLVM 3.7, tests)
  • x86_64-unknown-linux-gnu (makefiles, tests)
  • x86_64-apple-darwin (makefiles, tests)
  • x86_64-pc-windows-msvc (makefiles, tests)
  • x86_64-pc-windows-msvc (cargotest)
  • i686-pc-windows-gnu (makefiles)
  • x86_64-unknown-linux-gnu (distcheck)
  • x86_64-unknown-linux-gnu (stage2 bootstrap)

CI / release build transition roadmap, and how to help

There’s a lot here, some immediate, some more speculative. In the short term we need to finish moving to travis/appveyor and configure them to be able to produce complete rust releases. The steps to that are part of this tracking issue. The faster we get through this the faster we can have the rls just work!

Beyond that is less clear, but we’ll be reworking the release process, and packaging the rls, the book, clippy, and other things. We can begin adding those to the build in relatively short order, once we have a --enable-platform flag, even if we don’t begin distributing them immediately. Here are some of the tasks that need to be done:

  • add rls submodule, add to rustbuild, build / test when --enable-platform
  • teach rustbuild to create the rls installer
  • add clippy submodule, add to rustbuild, build / test when --enable-platform
  • teach rustbuild to create the clippy installer
  • add new book submodule, removing / deprecating the old

Just curious, is there some reason why Travis and AppVeyor are chosen over TaskCluster?

Thank you for writing all this up! I know it’s been hectic working on this lately, and that the time taken to write this up takes away from time spent actually doing the work, so I really appreciate it :heart:

Not sure if this is the reason, but I personally would like to avoid TaskCluster until it’s easy to use TC when you’re not Mozilla (either as a service or an easy-to-set-up open source project). Right now, that isn’t the case, so replicating CI is hard.

This is great work and also should ease adding signing to the build process. This is so good.

Here are some reasons for not picking TaskCluster for our CI:

  • TaskCluster is not very accessible to non-Mozillians, and one of our main goals is to have open infrastructure that others can contribute to. It is not practical to run one's own TaskCluster instance.
  • Travis and AppVeyor are well-known to open source contributors, so they can more easily contribute.
  • TaskCluster is a complex system that is managed by another team who are mostly beholden to Firefox, while Travis and AppVeyor are relied on by diverse customers
  • I've not enjoyed writing TaskCluster scripts in JavaScript in the past, though maybe it has other bindings that I would feel more comfortable with

There may be a role for TaskCluster in the release process yet, motivated by Firefox releng's work on creating secure workflows in TaskCluster.

As I understand it the plan is to retire buildbot. I’m currently scraping the buildbot logs to link releases to rust-lang git commits without having to download every release. Can you please make sure this information is available after the switch?

That makes sense. Thanks!

Nice! This should enable someone to write a Rust equivalent of the Perl 6 Bisectable bot. Here's a little description of what it can do:

If you are not on #perl6 channel very often, you might not know that we have a couple of interesting bots. One of them is bisectable. In short, Bisectable performs a more user-friendly version of git bisect, but instead of building Rakudo on each commit, it has done it before you even asked it to! That is, it has over 5500 rakudo builds, one for every commit done in the last year and a half. This turns the time to run git bisect from minutes to about 10 seconds [...] So if you pop up on #perl6 with a problem that seems to be a regression, we will be able to find the cause in seconds.

This would be great even if it's just a tool compiler devs have available. It doesn't necessarily need to be an IRC bot.

@brson: Great news, great timing. For DragonFly I wanted to get the buildbot running these days, now I think I will just wait until your work has landed. Is there any chance to add 64-bit DragonFly to the travis build matrix? And I don’t see OpenBSD listed there as well… This would greatly simplify life (and save myself a lot of time) for anyone developing on these platforms. Currently, rustup is not working on DragonFly because there are no official binaries available, which is a pity.

To be included in the Travis build matrix, a target is expected to be cross-buildable from Linux.

For OpenBSD, it is a bit complex: the linker in base (binutils-2.17 for amd64 and i386) is a bit old and has also diverged from upstream. So some work is required before having a proper way to cross-build openbsd binaries from Linux.

For DragonFly, I think it should be simpler.

For those following along at home, rust-lang/rust CI is now gated on Travis and AppVeyor. #38598 was the first PR to go green on both CI systems and #38536 was the first PR merged via automation watching Travis and AppVeyor.

We’ve now merged 18 PRs with Travis + AppVeyor automation. The buildbot instances are still running and have remained green as well. If everything looks ok in a few weeks or so we’ll retire buildbot auto bots entirely.

There’s still a number of outstanding issues associated with the movement to Travis and AppVeyor. Notably we’ve seen some new spuriously failing tests:

Additionally the Travis/AppVeyor configuration isn’t quite as fast as I would like. My goal was a 2 hour limit on PRs, and currently the only matrix entry routinely slower than this is the arm-android image on Travis. I’ve sent a number of PRs to hopefully bring down this test time, however:

Hopefully when compounded with https://github.com/rust-lang/rust/pull/38631 we can get under that 2 hour ceiling!


Please report any issues you see or just ping me on IRC, and if all goes well we’ll retire those buildbots soon! The next steps would be to mirror all of our release infrastructure on Travis/AppVeyor (which is already in the works).

Thanks for the update @alexcrichton.

Are these available yet?

@cuvpier Yes they exist now but are not being used for the actual deployment yet.

The scheme is something like

https://s3.amazonaws.com/rust-lang-ci/rustc/builds/$SHA/rustc-nightly-x86_64-apple-darwin.tar.gz

@alexcrichton is just about done with the server that takes these CI builds and packages them for release, and we’ll probably switch from buildbot for nightlies in the next week.

In case someone wonders, it should be rustc-builds, not rustc/builds.
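Putting the scheme and this correction together, the URL for an artifact can be built as in the small Python sketch below. The function name is made up, and the file-name pattern is inferred from the single example URL above, so other artifacts may well be named differently.

```python
def ci_artifact_url(sha: str, target: str,
                    component: str = "rustc", channel: str = "nightly") -> str:
    """Build the S3 URL of a CI artifact for a given merge commit.

    Uses the corrected "rustc-builds" path segment. The
    "<component>-<channel>-<target>.tar.gz" pattern is an assumption
    based on one example and is not a documented guarantee.
    """
    base = "https://s3.amazonaws.com/rust-lang-ci/rustc-builds"
    return f"{base}/{sha}/{component}-{channel}-{target}.tar.gz"


if __name__ == "__main__":
    # "abc123" stands in for a full commit sha.
    print(ci_artifact_url("abc123", "x86_64-apple-darwin"))
```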

What’s the current state on this?

@ker we’re very nearly reaching completion currently. The last steps are:

  • First we need to land building Cargo in tree. We’ll likely have a few bugs to sort through after this, but it means that the “extended build” will now be totally self-contained.
  • Next we need to figure out where/when to run Cargo’s own tests. This will likely involve modifying the aux-test suite.
  • Next we need to disable uploads of Cargo’s binaries and likely tone down the amount of CI we do.
  • Finally we need to test out infrastructure which will be producing the next beta release of Rust. That is, rustbuild/Travis/AppVeyor will be responsible for the 1.17 stable and beta releases.

Once that’s all done we can see where the RLS is at and start moving it in tree, if it’s ready. And… I think that’s it!
