CircleCI?
Disclaimer: I’ve had my own grievances with Circle, but overall their platform is well worth considering, because I believe it at least meets the Hard Requirements. As a college student who has only used their free plan, I don’t have much experience working at Rust’s scale, but I do believe they could meet these demands.
For what it’s worth, my team had many of the same issues with Travis CI’s spotty maintenance and incident response. I give them a pass because they are one of the largest supporters of Open Source through their free builds, but I think there are definitely competitive options out there, especially for paying customers.
We don’t use anywhere near as many workers as Rust would, but moving from Travis to CircleCI cut our build times by over 50%, because Travis has much higher overhead on each job. Anecdotally, a job that took 1 minute on Travis to run 5 seconds of tests took 7 seconds on CircleCI. In my time on CircleCI, we’ve had very few outages caused by CircleCI itself; every service disruption I’ve been affected by came from upstream providers, such as GitHub API issues. The speed of Circle’s job runners also means that backlogs clear faster after an outage.
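To make the per-job overhead argument concrete, here is the anecdote’s arithmetic as a small Python sketch. It uses only the three figures quoted above (5 s of tests, 1 minute on Travis, 7 s on CircleCI) and assumes that everything other than test execution counts as "overhead"; it says nothing about other jobs or about Rust’s workload.

```python
# Figures from the anecdote; "overhead" is assumed to mean everything that is
# not test execution (queueing, container/VM startup, setup steps).
TEST_SECONDS = 5
TRAVIS_TOTAL_SECONDS = 60
CIRCLE_TOTAL_SECONDS = 7


def overhead(total_seconds: float, test_seconds: float) -> float:
    """Seconds a job spends on anything other than running tests."""
    return total_seconds - test_seconds


def reduction(before: float, after: float) -> float:
    """Fractional reduction in wall-clock time (0.5 means 50% faster)."""
    return (before - after) / before


travis_overhead = overhead(TRAVIS_TOTAL_SECONDS, TEST_SECONDS)
circle_overhead = overhead(CIRCLE_TOTAL_SECONDS, TEST_SECONDS)
saving = reduction(TRAVIS_TOTAL_SECONDS, CIRCLE_TOTAL_SECONDS)

print(f"Travis overhead:   {travis_overhead} s")
print(f"CircleCI overhead: {circle_overhead} s")
print(f"Wall-clock reduction for this job: {saving:.0%}")
```

On Travis, 55 of the job’s 60 seconds are overhead, versus 2 seconds on CircleCI. For short jobs like this one, overhead dominates, so the saving (about 88%) is much larger than it would be for jobs where test execution takes most of the time.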
In short, CircleCI:
- Is a Docker-first platform. Workers are Docker containers (except on macOS), so they start almost instantly and overhead is quite low. Paid plans would almost certainly support Docker intermediate layer caching.
- Definitely meets the requirement for direct/premium support, and is hosted by CircleCI, though a self-hosted option is also available.
- Has a sales team that would hopefully be able to answer your questions and provide concrete quotes.
- Has macOS and Linux workers, and pricing scales with the capacity configured on the account.
- Has default resource limits of, I believe, 2 CPUs x 4 GB, but the resource_class setting (available by support request) allows scaling up to 8 CPUs x 16 GB.
- Provisions capacity at the organization level: you set up a certain number of workers for the organization, and those workers are always available as soon as a build starts.
As such, I think Circle at least covers the Hard Requirements. As for the nice-to-haves, some are worth asking sales about, but a few I can say are pretty much satisfied:
- The ability to log into the builders remotely to investigate spurious failures is built in. You can re-run a job with SSH enabled and log in with any SSH key associated with your GitHub account.
- The ability to easily run and debug builds locally is also built in: there is a CLI that can be used for config validation and for running jobs locally.
- Built-in caching support is also available, for both shorter-term “caching” and longer-term “artifact” storage (the former mainly for sharing data between builds, the latter for downloadable build outputs).
- Pay-for-what-you-use might be possible. In my experience, scaling resource limits is certainly easier on Circle than on Travis.
My team has found a lot of things really nice, such as each commit getting multiple commit statuses on GitHub, so you can click “Details” to go straight to a specific build, or mark certain jobs as “required” for branches to be mergeable. I believe these behaviors are configurable if they aren’t wanted.
There are some cons, though.
- Circle’s web UI isn’t the most intuitive, nor is their configuration documentation bulletproof. They’re improving both of these things, but in their current forms they aren’t perfect.
- I’ve run into performance issues when loading logs from builds with lots of output: the logs can simply take a long time to load.