Taskcluster
Taskcluster fails the first hard requirement, but might deserve some discussion for completeness.
Taskcluster is Mozilla’s response to Firefox’s testing needs outgrowing Buildbot. It can be thought of as a loose collection of hosted services with APIs (Queue, Index, Secrets, …) and pieces of software (API client libraries in various languages, worker agents, …) that you can put together to build any CI system. It is designed to be very generic and “self-service” in order to enable unanticipated use cases, but the flip side is that anything non-trivial takes some work to set up.
I’ve spent a few months migrating Servo from Buildbot to Taskcluster. (A few jobs are still on Buildbot, but all the infrastructure-level pieces are there to support them.) For more, see the tracking issue as well as the scripts and READMEs starting here. Part of that time went into learning about and experimenting with TC (and ops in general), but I think part of it was also because Servo was possibly the first project of this scale other than Firefox to use TC. I think a lot of that work can be reused. For example, the generic decisionlib.py is separated from the Servo-specific decision_task.py script that uses it.
Some aspects of Servo + TC that might be unusual in off-the-shelf CI systems:
- Testing each PR starts with a “decision task” that runs a script from the repository (so the script can be modified in the same commit that is being tested); that script uses the API to schedule other tasks with an arbitrary dependency graph.
- Docker images are built on demand from Dockerfiles in the repository (again, so they can be modified in the same commit that uses the modifications) and cached.
- A single build task can produce an executable that is then used by many testing tasks in parallel.
- Automatic provisioning and scaling (based on queue) of AWS EC2 instances of any type, with any system image. (I think support for other cloud providers is being added.)
- Bring your own hardware: any machine with appropriate credentials can pick up tasks from the queue. Servo does this with non-virtualized machines from Packet (in order to run Android emulators with KVM for CPU acceleration) and macOS workers from MacStadium.
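The decision-task workflow from the list above can be sketched in Python, the language decisionlib.py is written in. This is a minimal, hypothetical sketch, not Servo’s decisionlib.py and not the Taskcluster API: TaskGraph, docker_image_deps, decision_task and the placeholder commands are invented for illustration, and an in-memory set stands in for a cache lookup service such as Taskcluster’s Index. It shows three of the points above: the graph is computed by code that lives in the repository, a Docker image is rebuilt only when the hash of its Dockerfile is new, and one build task fans out to several test tasks that can run in parallel.

```python
"""Hypothetical sketch of a "decision task" building a task graph.

Real Taskcluster task definitions need more fields (such as created,
deadline and metadata) and are submitted through the Queue service;
here tasks are plain dicts and nothing is submitted anywhere.
"""
import hashlib
import uuid
from graphlib import TopologicalSorter  # Python 3.9+


def new_task_id() -> str:
    # Hypothetical ID scheme; Taskcluster itself uses 22-character "slugids".
    return uuid.uuid4().hex


class TaskGraph:
    def __init__(self):
        self.tasks = {}        # task_id -> simplified task definition
        self.image_tasks = {}  # Dockerfile hash -> task_id of its build task

    def add(self, name, command, dependencies=()):
        task_id = new_task_id()
        self.tasks[task_id] = {
            "name": name,
            "dependencies": list(dependencies),
            "payload": {"command": command},
        }
        return task_id

    def waves(self):
        """Group task names into waves: tasks in the same wave can run in
        parallel once every task in earlier waves has completed."""
        graph = {tid: t["dependencies"] for tid, t in self.tasks.items()}
        sorter = TopologicalSorter(graph)
        sorter.prepare()
        result = []
        while sorter.is_active():
            ready = sorter.get_ready()
            result.append(sorted(self.tasks[tid]["name"] for tid in ready))
            sorter.done(*ready)
        return result


def docker_image_deps(graph, dockerfile, image_cache):
    """Return the task IDs a job must wait on to have its Docker image.

    The cache key is a hash of the Dockerfile contents: editing the
    Dockerfile in a commit yields a new key and therefore a rebuild.
    image_cache is a hypothetical set of keys built by earlier runs.
    """
    key = hashlib.sha256(dockerfile.encode("utf-8")).hexdigest()
    if key in image_cache:
        return []  # built previously: nothing to wait for
    if key not in graph.image_tasks:  # schedule each image build only once
        graph.image_tasks[key] = graph.add(
            "docker-image", ["docker", "build", "."]
        )
    return [graph.image_tasks[key]]


def decision_task(dockerfile, image_cache, test_chunks=4):
    """Project-specific part: one build task fanning out to test tasks.
    The ./build.sh and ./test.sh commands are placeholders."""
    graph = TaskGraph()
    image = docker_image_deps(graph, dockerfile, image_cache)
    build = graph.add("build", ["./build.sh"], dependencies=image)
    for i in range(test_chunks):
        graph.add(f"test-{i}", ["./test.sh", str(i)], dependencies=[build])
    return graph


if __name__ == "__main__":
    dockerfile = "FROM ubuntu:22.04\nRUN apt-get update\n"
    for wave in decision_task(dockerfile, image_cache=set()).waves():
        print(wave)
```

On a first run the waves are the image build, then the build, then all four test chunks together; once the Dockerfile’s hash is in the cache, the image wave disappears. Splitting generic helpers (TaskGraph, docker_image_deps) from the project-specific decision_task mirrors, in spirit, the decisionlib.py / decision_task.py split mentioned earlier.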
TC is also moving from being hosted (there is a single Queue service in the world, operated by Mozilla) to “shipped” software (anyone can deploy their independent instance). Of course, everything is open-source.
I think that TC can meet every requirement and nice-to-have except the first one: there is no company that I know of that you can contract for Taskcluster support. People on Mozilla’s Taskcluster team are generally helpful when asked on IRC, but that’s not enough at this scale. For a TC-based CI system to be viable for Rust, there needs to be someone whose paid job (or at least part of it) is to build and maintain this system.
Then there is a question perhaps trickier than money for their salary: which company can provide the legal structure to have that person as an employee?