Moving to TaskCluster: The Plan

Thanks for writing up how we could achieve this; alternative CI solutions are certainly one piece of the puzzle facing us.

I'd like to dig further into the motivating reasons for the move and the problems you'd expect moving to solve. Travis is already maintained by 'other people' (which I like!), and it's not clear exactly what you mean by "more powerful infrastructure". Your thread "RFC: Build our own self hosted CI infrastructure" is more explicit, but it wasn't written with taskcluster in mind, so it's unclear how much of it applies here.

Let's take a hypothetical motivation: fixing spurious timeout failures by eliminating time limits (I'm not saying this is actually anyone's intention; it's just a strawman example). Assuming we can set up unbounded time limits on taskcluster, great! As soon as we've moved everything over to taskcluster, we've eliminated those spurious failures. However, it looks like more than 50% of our timeout failures were on Windows, which is item 4 on your list, one of the more speculative 'to investigate' items. Let's put that aside for a minute, though, and assume we do have a solution. The next question for me is whether we should even be trying to support unlimited run time: we've had 11 PRs in the last 24 hours, meaning we should be targeting a merge time of around 2 hours (24 hours / 11 PRs ≈ 2.2 hours each, assuming they merge one at a time).
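As a rough sanity check on that 2 hour target, here is a minimal Python sketch. It assumes that bors tests and merges PRs strictly one at a time, with no batching or rollups, so each merge has to fit within the average gap between incoming PRs. That serial model is my simplifying assumption, and `max_merge_hours` is a hypothetical helper name.

```python
# Hypothetical back-of-the-envelope check. Assumes bors tests and merges
# one PR at a time (no batching or rollups), so the per-PR merge time must
# fit within the average gap between incoming PRs.

def max_merge_hours(prs_per_day: float, hours_per_day: float = 24.0) -> float:
    """Return the longest per-PR merge time (in hours) that still keeps up
    with the incoming PR rate under strictly serial merging."""
    if prs_per_day <= 0:
        raise ValueError("prs_per_day must be positive")
    return hours_per_day / prs_per_day

budget = max_merge_hours(11)
print(f"{budget:.2f} hours per merge")  # prints "2.18 hours per merge"
```

At 11 PRs a day this gives roughly 2.18 hours per merge. Rolling several PRs into one merge would loosen that budget, which is why the serial assumption matters when comparing against unbounded run times.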

More generally, as @nrc points out in "Putting bors on a PIP" (post #4 by nrc), we should look to the future and work towards it. It may well be that taskcluster is part of that, which would make this plan valuable! But we should make sure we're pointing in the right direction before we start running, and I expect we'll discuss this at the Rust all hands next week.