Yeah I definitely agree that the @bors queue can be frustrating, especially when your PR fails for a spurious reason and then takes days to reach the head of the queue again. In general spurious failures are extremely annoying for everyone involved!
That said, we need to approach improvements with a principled eye. Before proposing a solution or a change, I’d recommend learning about the current system (e.g. why things are the way they are) so you can predict the impact of a proposed change. I’m always up for answering questions about our CI and/or build system!
The statement here is not true, nor is it the cause of much of the slowdown. We use a custom LLVM so we have a place to backport fixes, which we do on a regular basis. Put another way, it’s guaranteed that every system LLVM is buggy in one way or another, so it’s not acceptable to just blanket always use the system LLVM. Note that we also have a builder which uses the system LLVM, and that’s what runs on Travis (one of our fastest configurations).
Also note that building LLVM is not a time hog. We leverage sccache on all builders to cache builds of LLVM. If you look at the timings in the logs you’ll notice that LLVM typically takes ~5 minutes to build from a warm cache. Out of a multi-hour build, that’s a drop in the bucket.
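For a concrete picture of how that works, here’s a minimal sketch of pointing an LLVM CMake build at sccache via the compiler-launcher variables. This is illustrative only (the paths, build directory, and exact flags are made up, not necessarily what rustbuild passes):

```python
#!/usr/bin/env python3
# Minimal sketch (not the actual rustbuild logic) of how a compiler cache
# like sccache slots into an LLVM build: point CMake's compiler-launcher
# variables at sccache so every C/C++ compile goes through the cache.
import os
import subprocess

LLVM_SRC = os.path.abspath("src/llvm")   # illustrative path to the LLVM submodule
BUILD_DIR = "build/llvm"                 # illustrative build directory

os.makedirs(BUILD_DIR, exist_ok=True)
subprocess.check_call(
    ["cmake", LLVM_SRC,
     "-DCMAKE_BUILD_TYPE=Release",
     # On a warm cache most object files come straight from sccache, which is
     # what keeps the LLVM step down to a few minutes.
     "-DCMAKE_C_COMPILER_LAUNCHER=sccache",
     "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache"],
    cwd=BUILD_DIR)
subprocess.check_call(["cmake", "--build", BUILD_DIR])
```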
So in summary, switching to the system LLVM would (a) mean we can’t fix critical bugs and (b) not actually help build times all that much.
Switching to different CI providers should always be an option, so it’s worth considering. So long as a CI provider integrates with GitHub it’ll be able to interact with the @bors queue correctly.
That being said, I highly doubt that switching will make our builds 80% faster. We’re a CPU-bound project (we’re a compiler), so unless they’ve got 80% faster CPUs we’re not really going to see that much improvement. I’d recommend fully investigating such an alternative before curtly stating that we should switch.
Furthermore, the lack of “docker support on Travis” isn’t really a problem. We cache docker images across PRs today, so rebuilding them doesn’t impact build times.
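If you’re curious what that kind of caching can look like, here’s a rough, hypothetical sketch keyed on a hash of the Dockerfile. The image name, paths, and cache layout are all made up for illustration; this is not our actual src/ci scripts:

```python
#!/usr/bin/env python3
# Hypothetical sketch of docker image caching: key the cached image on a hash
# of the Dockerfile so the image is only rebuilt when the Dockerfile changes.
import hashlib
import os
import subprocess

IMAGE = "rust-ci-x86_64-gnu"                         # hypothetical image name
DOCKERFILE = "src/ci/docker/x86_64-gnu/Dockerfile"   # hypothetical path
CACHE_DIR = os.path.expanduser("~/.cache/docker-images")

def dockerfile_hash():
    with open(DOCKERFILE, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:16]

def main():
    os.makedirs(CACHE_DIR, exist_ok=True)
    cached = os.path.join(CACHE_DIR, dockerfile_hash() + ".tar")

    if os.path.exists(cached):
        # Cache hit: skip the (slow) image build entirely.
        subprocess.check_call(["docker", "load", "-i", cached])
    else:
        subprocess.check_call(["docker", "build", "-t", IMAGE,
                               os.path.dirname(DOCKERFILE)])
        subprocess.check_call(["docker", "save", "-o", cached, IMAGE])

if __name__ == "__main__":
    main()
```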
So along a similar vein, I’ve never actually used a fire extinguisher in my life! I know what it is, and I’ve learned what it’s used for, but I’ve never personally needed it! It sure does take up a lot of space under my kitchen sink, and it’s getting inconvenient now that I have more stuff I want to put under there. That means I should throw out my fire extinguisher, right?
In a less joking fashion, this isn’t up for change at this time. I do realize it’s tough to envision a world without @bors, as we’ve had it for so long (for the “lifetime” of many Rust contributors!). Those who remember the dark ages before @bors, however, will swear that going back is never worth it.
A system like @bors is not without its downsides, of course, but I’d rather waste some space under my sink on a fire extinguisher than have my whole house burn down the one time I need it.
Yeah, spurious failures are exceptionally annoying for basically everyone except @bors, who is eternally hungry for more PRs. We have a number of other active issues for spurious failures, of which I believe the most notorious is the spurious segfaults in the OSX linker.
I’d love to encourage people to tackle these issues, as they’re incredibly high-profile bugs to fix and you get the benefit of making basically every rust-lang/rust developer’s life nicer. I can’t remember the last time I saw 24 hours of yellow suns in my inbox from @bors, and I’d love to see it again! In the meantime I unfortunately end up sinking a lot of time into reading every failure log @bors generates to make sure it’s not spurious.
It would be useful to quantify the pain here rather than just state that such a clone is painful. I’ve typically seen an LLVM clone take ~5 minutes, which is a drop in the bucket for our builds. I’ve investigated depth-1 submodule cloning in the past but never got it to work out; it’d be great to speed this up regardless!
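For reference, the depth-1 experiment boils down to something like the sketch below (the submodule path is illustrative). The usual snag is that a shallow fetch only works when the pinned commit is reachable from a ref the server advertises, so you need a fallback:

```python
#!/usr/bin/env python3
# Rough sketch of the depth-1 submodule experiment. Fetch only the single
# pinned commit instead of all of LLVM's history; if the shallow fetch fails
# (e.g. the pinned commit isn't advertised by the server), fall back to the
# full clone.
import subprocess

SUBMODULE = "src/llvm"  # illustrative submodule path

def update_submodule(path):
    try:
        subprocess.check_call(
            ["git", "submodule", "update", "--init", "--depth", "1", path])
    except subprocess.CalledProcessError:
        # Shallow fetch didn't work out; do the full (slow) clone instead.
        subprocess.check_call(["git", "submodule", "update", "--init", path])

if __name__ == "__main__":
    update_submodule(SUBMODULE)
```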
As @eddyb points out, I encourage you to link to factual evidence for claims like this. The OSX builds are about to become the slowest overall builds.
Can you elaborate on what you’d like to see from a hypothetical “docker support with Travis”? I’m under the impression that we wouldn’t benefit much at this point (other than having someone else maintain the support), but I may be missing something!
I completely agree, and I believe this is some of the lowest-hanging fruit for motivated contributors to help out with our CI. The Homu project has long languished for lack of a solid maintainer, and we have a laundry list of issues and feature improvements we could add to Homu. I unfortunately don’t personally have much time to work on this, but we can very easily update Homu whenever we need to! Some issues off the top of my head would be:
- Automatic rollups. There’s a boatload of heuristics we could throw into this, and there’s been a novel’s worth of previous conversation on this topic as well. I totally agree with @petrochenkov that this would help tremendously.
- Homu could comment on a PR when the PR’s own Travis run fails. That run is now just a subset of the main test suite, so a failure there is a guaranteed failure once the PR reaches the head of the queue.
- Homu could link to failing logs as opposed to just the build itself, making it easier to explore what failure happened.
- Homu could have different prioritization logic. We’ve long wanted to favor new contributors in the queue to help improve the “first patch experience” (a rough sketch of what this could look like is at the end of this comment).
- Homu could work with Unicode in PR titles/descriptions. Right now if a PR with such a description gets to the head of the queue the whole world grinds to a halt.
And much more! The aspirations for Homu go well beyond the Rust project itself to the Rust community as a whole. A one-click integration with a solid CI bot would be massive for everyone!
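To make the prioritization idea a bit more concrete, here’s a hypothetical sketch of the kind of heuristic I have in mind. None of these names or data structures are Homu’s actual internals; it just shows one way to keep explicit priorities first, bump first-time contributors ahead of the rest, and fall back to oldest-approval-first:

```python
# Hypothetical queue-prioritization sketch (not Homu's real data model).
from dataclasses import dataclass

@dataclass
class QueuedPr:
    number: int
    author: str
    priority: int           # from `@bors p=N`, default 0
    approved_at: float      # unix timestamp of the r+
    author_merged_prs: int  # how many PRs this author has already landed

def queue_key(pr: QueuedPr):
    first_timer = pr.author_merged_prs == 0
    # Sort by: explicit priority (highest first), then first-time
    # contributors, then oldest approval.
    return (-pr.priority, not first_timer, pr.approved_at)

def sort_queue(queue):
    return sorted(queue, key=queue_key)
```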