So, I’ve actually had plans for fixing this for years; I just don’t have the time to do the work. Part of the issue is that bors isn’t testable yet, so making large changes is tough.
One thing to note is that rollups are not about prioritizing small PRs over large ones; they are about prioritizing less risky PRs over risky ones. My personal heuristics are:
- PRs marked “rollup” are not risky
- PRs that touch the build system are risky
- PRs that touch architecture-specific code are risky
- PRs that are humongous are risky
- PRs that haven’t finished their Travis run yet are risky
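The heuristics above can be sketched as a small classifier. This is an illustration only, assuming a hypothetical `PullRequest` record; the field names, the build-system path prefixes, the architecture directory names, and the "humongous" line threshold are all made up for the example and are not taken from bors. It also assumes the “rollup” mark overrides the other checks.

```python
from dataclasses import dataclass

# Hypothetical PR record; field names are illustrative, not bors's schema.
@dataclass
class PullRequest:
    number: int
    marked_rollup: bool   # reviewer said "@bors rollup"
    touched_paths: list   # paths of files changed by the PR
    lines_changed: int
    ci_finished: bool     # has the Travis run completed?

# Assumed values, for illustration only.
BUILD_SYSTEM_PREFIXES = ("src/bootstrap/", "src/build_helper/")
ARCH_DIRS = {"x86", "x86_64", "arm", "aarch64", "mips"}
HUGE_DIFF_LINES = 2000

def risk_reasons(pr):
    """Return the heuristics that flag this PR as risky (empty list = not risky)."""
    if pr.marked_rollup:
        return []  # assumption: the "rollup" mark overrides everything else
    reasons = []
    if any(p.startswith(BUILD_SYSTEM_PREFIXES) for p in pr.touched_paths):
        reasons.append("touches the build system")
    # Match whole path segments so that e.g. "search.rs" does not count as "arch".
    if any(seg in ARCH_DIRS for p in pr.touched_paths for seg in p.split("/")):
        reasons.append("touches arch-specific code")
    if pr.lines_changed > HUGE_DIFF_LINES:
        reasons.append("very large diff")
    if not pr.ci_finished:
        reasons.append("Travis has not finished")
    return reasons
```

A PR touching only `src/bootstrap/` with a green Travis run would come back as `["touches the build system"]`; anything returning a non-empty list is a candidate for testing on its own rather than in a rollup.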
Generally, when doing rollups I’ll also give particularly risky PRs a higher priority (p=5 or so) so that if the rollup fails, they get tested (the non-risky PRs will get in via a rollup eventually, but these would never drain out of the queue otherwise).
The kernel of the idea is that instead of using the rollup column to mark tiny “please roll this up” PRs, we repurpose it to signal riskiness. We can have four levels: not risky at all (what “@bors rollup” means now), somewhat risky but okay to roll up, “probably do not roll up”, and “never roll up” (for things like backports and perf changes, where we absolutely do not want a rollup). The queue page would show Travis statuses and let you sort by this column to build rollups.
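As a rough sketch of the four-level column and the sorted queue view, assuming a hypothetical `RollupRisk` enum and `QueueEntry` record (neither exists in bors today), the rollup candidates could be filtered and ordered like this:

```python
from dataclasses import dataclass
from enum import IntEnum

class RollupRisk(IntEnum):
    # Lower value = safer to roll up. Names are hypothetical.
    NOT_RISKY = 0        # what "@bors rollup" means today
    SOMEWHAT_RISKY = 1   # okay to include in a rollup
    PROBABLY_NOT = 2     # "probably do not roll up"
    NEVER = 3            # backports, perf changes: always tested alone

@dataclass
class QueueEntry:
    number: int
    risk: RollupRisk
    travis_passed: bool

def rollup_view(queue):
    """Rollup-eligible entries: safest first, green Travis before pending/red."""
    candidates = [e for e in queue if e.risk <= RollupRisk.SOMEWHAT_RISKY]
    return sorted(candidates, key=lambda e: (e.risk, not e.travis_passed, e.number))
```

Using an ordered enum means "okay to roll up" is a single comparison, and sorting on `(risk, not travis_passed)` puts the safest, already-green PRs at the top of the list someone would pick from when assembling a rollup.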
This itself isn’t hard to implement.
However, once we have this, the next logical step is to automate it: have bors automatically create rollups containing mostly not-risky PRs plus a limited number of “somewhat risky” ones, and test those. If a rollup fails, bors moves on to testing one of the “do not roll up” PRs, and hopefully by then someone will have noticed what caused the failure and r-'d the offending PR. If we want to get smart about it, bors can bisect by attempting a smaller rollup with half the PRs, and keep going until it narrows things down. (There are a bunch of nuanced issues to handle here.)
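A minimal sketch of automated rollup composition plus bisection follows. The `passes(batch)` callable is a hypothetical stand-in for a full CI run, and the sketch assumes every failure is caused by a single PR rather than an interaction between PRs, and that halves can be tested independently. Those are exactly the kinds of nuances a real implementation would have to handle. Falling back to a “do not roll up” PR after a failure is not modeled here.

```python
def build_rollup(not_risky, somewhat_risky, max_somewhat):
    """All not-risky PRs plus at most `max_somewhat` somewhat-risky ones."""
    return list(not_risky) + list(somewhat_risky[:max_somewhat])

def bisect_rollup(prs, passes):
    """Split a failing rollup in half until the culprits are isolated.

    `passes(batch)` is a hypothetical stand-in for one CI run on `batch`.
    Returns (mergeable PRs, culprit PRs). Assumes each failure comes from
    a single PR, not from an interaction between two PRs.
    """
    if not prs:
        return [], []
    if passes(prs):
        return list(prs), []
    if len(prs) == 1:
        return [], list(prs)  # a lone failing PR is a culprit
    mid = len(prs) // 2
    merged_left, bad_left = bisect_rollup(prs[:mid], passes)
    merged_right, bad_right = bisect_rollup(prs[mid:], passes)
    return merged_left + merged_right, bad_left + bad_right
```

With eight PRs where two are broken, this narrows the failure down to the two culprits while still merging the other six, at the cost of several extra CI runs; in practice each of those runs takes hours, which is why the limited “somewhat risky” budget matters.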
We can also have PRs in the “probably don’t roll up” category go through automated try runs in parallel; passing a try run promotes a PR to “somewhat risky but okay to roll up”.
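The promotion step could look roughly like the following, assuming a hypothetical four-level enum mirroring the levels described earlier and a hypothetical `run_try_build(pr) -> bool` callable standing in for a real try run. The thread pool only illustrates running the try builds concurrently; the real work would happen on CI.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import IntEnum

class RollupRisk(IntEnum):
    # Hypothetical levels mirroring the four described in the text.
    NOT_RISKY = 0
    SOMEWHAT_RISKY = 1
    PROBABLY_NOT = 2
    NEVER = 3

def promote_after_try_runs(risk_by_pr, run_try_build, max_parallel=4):
    """Try-build every "probably don't roll up" PR in parallel and
    promote the ones that pass to SOMEWHAT_RISKY."""
    pending = [pr for pr, risk in risk_by_pr.items() if risk == RollupRisk.PROBABLY_NOT]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so zip pairs each PR with its result.
        results = dict(zip(pending, pool.map(run_try_build, pending)))
    updated = dict(risk_by_pr)  # leave the caller's mapping untouched
    for pr, passed in results.items():
        if passed:
            updated[pr] = RollupRisk.SOMEWHAT_RISKY
    return updated
```

“Never roll up” PRs are deliberately skipped: a passing try run says nothing about whether a backport or perf change should share a merge commit with other PRs.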
Another tactic is to take the rollup-eligible PRs and run bisecting try runs on them in parallel. This may be a lot more resource-intensive.
FWIW, I did a ton of rollups last month, and the biggest papercut was that we have jobs on both Travis and AppVeyor that often hit their time limits.