Pre-RFC: Split `rust-lang/rust`'s `.travis.yml` into 3 files (normal/`auto`/`try`)


#1
  • Feature Name: (irrelevant)
  • Start Date: 2017-08-13
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

Split .travis.yml into three, managed by bors, so that non-auto branches and PRs run with minimal number of builders.

Motivation

Currently each rust-lang/rust build runs 41 jobs, but outside of the bors testing branch (auto) only 1 job is relevant. We use the ALLOW_PR and ALLOW_TRY environment variables to stop the irrelevant jobs, but still, those 40 jobs need to wake up to know they are not needed. These lead to:

  • 40 slots in the queue being occupied for nothing (see rust-lang/rust#43797);
  • Spurious red cross :x: on PR due to network issue (cannot clone the repo);
  • Spurious red cross :x: on PR due to the VM just failed to start (happens mostly with Mac builds);

Ideally we should be able to specify that those 40 build-matrix entries can only run in the auto branch, but this feature has been turned down by Travis. This pre-RFC proposes a workaround by modifying how the build-bot merges the commits.

Guide-level explanation

Split .travis.yml into three

As suggested by Travis, we would create three .travis.yml files:

  • .travis.yml — used by normal PRs, and the master/beta/stable branch
  • .travis.auto.yml — used in the auto branch (for @bors r+)
  • .travis.try.yml — used in the try branch (for @bors try)

The normal .travis.yml will only run one job, the ALLOW_PR one. Similarly, .travis.try.yml will only run the ALLOW_TRY one.

When one issues @bors r+, bors will rename .travis.auto.yml to .travis.yml on the auto branch, allowing Travis to run all 41 jobs.

Modifying the auto branch structure

When bors begin to test a PR, it will reset the auto branch to point to the master branch (or beta/stable/etc), and then create a merge commit with the PR.

(before this RFC)

     master       auto 
-------A----------A+B
                  /
      PR         /
-------B---------

If the test is successful, bors will fast-forward master to the head of auto:

(before this RFC)

                 master 
                  auto 
-------A----------A+B
                  /
      PR         /
-------B---------

This RFC suggests to add one more commit after the merge commit:

(after this RFC)

    master                 auto
-------A----------A+B-------C
                  /
      PR         /
-------B---------

The new commit “C” will only rename .travis.auto.yml to override .travis.yml.

// current:
$ git checkout auto 
$ git reset --hard master
$ git merge --no-ff contributor/pr -m "Auto merge of #12345 ..."
// new additional:
$ git mv .travis.yml .travis.pr.yml
$ git mv .travis.auto.yml .travis.yml
$ git commit -m "Prepared #12345 for testing ..."
$ git push origin auto -f

After the test is successful, it will fast-forward master to auto^:

(after this RFC)

                 master    auto
-------A----------A+B-------C
                  /
      PR         /
-------B---------

The head of the auto branch is then abandoned. When a new PR is tested, a new rename commit “E” is created:

(after this RFC)

                        (orphaned)
                      ------C
                     /
                 master            auto
-------A----------A+B-----A+B+D-----E
                           /
   another PR             /
-------D------------------

Reference-level explanation

(See the “Guide-level explanation”)

Drawbacks

  • Splitting the .travis.yml into three means the content may diverge in an incompatible way.

  • Implementing this does not increase the Travis capacity, i.e. you still cannot add a proper a Redox builder without upgrading the Travis plan. It won’t reduce the actual test time per PR (>2 hours) either.

  • Complicates maintenance of the CI configuration.

Rationale and Alternatives

Alternatives:

  • Instead of using 3 files, one could find a way to mark parts of .travis.yml to be included or excluded in the auto and try branches.

  • Lobby Travis for reopening travis-ci/travis-ci#2778 :wink:.

Unresolved questions

None


#2

Thanks for exploring the possibilities here! I’ve opened an issue on Travis to track this feature request as well historically, and figured it’d be good to cc here.

Unfortunately though there’s two other downsides to this which may throw a bit of a wrench into the issue:

  • As-is this would break the deployment of binaries. The deployment is baed on the HEAD commit, and we’d have to figure out how to reverse that back and get the parent to the HEAD commit.
  • This doesn’t play well with GitHub’s “required status checks” which we try to use when possible across repositories. The commit being merged wouldn’t be the one with status checks passing, so this strategy would cause problems for that.

I think it may be worth quantifying the problem here perhaps? Whenever I’ve looked at graphs of our capacity usage in the past everything drops off extremely quickly as Travis plows through the “noop” builders. Not to say it takes 0 time, but it takes what seems to me as a negligible amount of time historically.


#3

One could do this, but this is a big hack that I don’t recommend it at all

diff --git a/src/bootstrap/channel.rs b/src/bootstrap/channel.rs
index 9c1ae83d38..4fb52169fc 100644
--- a/src/bootstrap/channel.rs
+++ b/src/bootstrap/channel.rs
@@ -64,12 +64,12 @@ impl GitInfo {
                                       .arg("--date=short")
                                       .arg("--pretty=format:%cd"));
         let ver_hash = output(Command::new("git").current_dir(dir)
-                                      .arg("rev-parse").arg("HEAD"));
+                                      .arg("rev-parse").arg("HEAD^"));
         let short_ver_hash = output(Command::new("git")
                                             .current_dir(dir)
                                             .arg("rev-parse")
                                             .arg("--short=9")
-                                            .arg("HEAD"));
+                                            .arg("HEAD^"));
         GitInfo {
             inner: Some(Info {
                 commit_date: ver_date.trim().to_string(),

or one could git checkout HEAD^ inside the CI? :laughing:


Ah that’s bad. Reading the status API it seems one could copy the status from auto to auto^, but this solution is ugly.


Personally I only concern about the spurious :x:, checking first 4 pages of merged commits (100 PRs) we find two affected PRs (⇒ 2%):

But these are before my “Change Travis CI job order” PR (#43287) so they may not affect us now.

Checking the open PRs (currently 59), there are 12 PRs with a red cross, out of which only one PR #43550 (Travis) is spurious, again a Mac. So the spurious rate is roughly 1~2% as well.


#4

Using the status api directly is very straightforward, if that’s the only blocker to this proposal it shouldn’t be a strong one. I have used it on private repos in the past.


#5

Thanks for looking that up! Do you think this is high enough to pursue a solution here? It seems low to me, but you’re watching the queue much more nowadays than I am :slight_smile:


#6

I’d keep this open as long as the rate is not 0%. But since the rate is pretty low I don’t plan to submit it as a full RFC before 2018. Probably Travis would become more reliable the coming months :smile:.


#7

Recently I found that Travis’s Triggering build API allows replacing the .travis.yml, which could be used instead of creating a new commit.

Similar API for AppVeyor does not allow replacing the appveyor.yml. The API for Circle CI supports adding environment variables, but not replacing circle.yml (the equivalent of travis-ci/travis-ci#2778 on Circle CI is https://discuss.circleci.com/t/different-build-command-per-branch/1187). IIUC, Circle CI 2.0 steps can be branch-specific, but 2.0 doesn’t support macOS.


#8

Interesting! In that case we could maybe disable CI on every branch except auto, and then we could use @bors or a similar bot to trigger builds for PRs


#9

I think it is more natural to disable CI on auto and try, and make bors use triggering builds on these two branches. If we disable CI on the master branch, the PR tests will require API triggering.


#10

Seems plausible to me!


#11

Just submitted https://github.com/rust-lang/rust/pull/44631. Consider this Pre-RFC obsolete.

Turns out Travis does support branch-specific jobs via conditional jobs, despite no news coming from bugs 2778 or 7451. Probably accidentally implemented alongside their Build Stages beta feature.