So, I've actually had plans for fixing this for years; I just don't have the time to do the work. Part of the issue is that bors isn't testable yet, so making large changes is tough.
One thing to note is that when doing rollups it's not about prioritizing small PRs over large ones, it's about prioritizing less risky PRs over risky ones. My personal heuristics are:
- PRs marked "rollup" are not risky
- PRs that touch the build system are risky
- PRs that touch arch stuff are risky
- PRs that are humongous are risky
- PRs which haven't yet finished travis are risky
Generally when doing rollups I'll also mark particularly risky PRs as p=5 or something so that if the rollup fails, they get tested (since the non-risky PRs will get in via rollup eventually, but these will never drain out otherwise).
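To make the heuristics above concrete, here's a rough Python sketch of how they could be checked mechanically. This is not how bors works today: the `PullRequest` fields, the `HUGE_DIFF_LINES` cutoff, and the "unclassified" fallback are all made up for illustration, and the rule that any risk signal outweighs a "rollup" mark is my assumption.

```python
from dataclasses import dataclass

HUGE_DIFF_LINES = 1000  # arbitrary cutoff for "humongous"; tune to taste


@dataclass
class PullRequest:
    # Every field here is hypothetical, not bors's actual data model.
    number: int
    marked_rollup: bool = False
    touches_build_system: bool = False
    touches_arch_code: bool = False
    lines_changed: int = 0
    travis_finished: bool = True


def risk_reasons(pr):
    """List the heuristics that flag this PR as risky (empty if none do)."""
    reasons = []
    if pr.touches_build_system:
        reasons.append("touches the build system")
    if pr.touches_arch_code:
        reasons.append("touches arch-specific code")
    if pr.lines_changed >= HUGE_DIFF_LINES:
        reasons.append("humongous diff")
    if not pr.travis_finished:
        reasons.append("travis has not finished")
    return reasons


def classify(pr):
    # Assumption: any risk signal outweighs a "rollup" mark.
    if risk_reasons(pr):
        return "risky"
    return "not risky" if pr.marked_rollup else "unclassified"
```

Returning the reasons rather than a bare boolean makes it easier for whoever is building the rollup to see why a PR got left out.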
The kernel of the idea is that instead of the rollup column marking tiny PRs as "please rollup this", we repurpose it to signal riskiness. We can have four levels: not risky at all (what "@bors rollup" is now), somewhat risky but okay to roll up, "probably do not roll up", and "never roll up" (the last one is for stuff like backports and perf things for which we absolutely do not want a rollup). The queue shows travis statuses and lets you sort by this to create rollups.
This itself isn't hard to implement.
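A minimal sketch of the four levels and the queue sort, in Python. The `Risk` names and the tuple layout of a queue entry are invented here; the real queue would carry a lot more state.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical names for the four proposed levels.
    NOT_RISKY = 0       # what "@bors rollup" means now
    SOMEWHAT_RISKY = 1  # risky-ish, but okay to roll up
    PROBABLY_NOT = 2    # probably do not roll up
    NEVER = 3           # backports, perf-sensitive changes


def rollup_candidates(queue):
    """Filter and sort queue entries for building a rollup.

    `queue` is a list of (pr_number, risk, travis_passed) tuples.
    Least risky entries come first; within a level, PRs that have
    passed travis sort ahead of those that have not.
    """
    eligible = [entry for entry in queue if entry[1] <= Risk.SOMEWHAT_RISKY]
    return sorted(eligible, key=lambda entry: (entry[1], not entry[2]))
```

Using an ordered enum means "at most somewhat risky" is just a comparison, which keeps the filter readable.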
However, once we have this, the next logical step is to just automate it; have bors automatically create rollups containing mostly not-risky stuff as well as a limited number of "somewhat risky" stuff, and test those. If there's a failure, go on and test one of the "do not roll up" ones, and hopefully by then we'll have noticed what caused the failure and r-'d the PR. If we want to get smart about it we can have bors bisect by attempting a smaller rollup with half the PRs, and continue trying. (There are a bunch of nuanced issues to handle here.)
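Roughly, the automated selection and the bisection could look like the Python sketch below. Everything in it is hypothetical: `passes` stands in for "run the test suite on a rollup of these PRs", the cap of two "somewhat risky" PRs is an arbitrary number, and the bisection assumes exactly one PR is at fault and that failures are deterministic, which glosses over most of the nuanced issues.

```python
def pick_rollup(not_risky, somewhat_risky, max_somewhat_risky=2):
    """Take every not-risky PR plus a capped number of somewhat-risky ones."""
    return list(not_risky) + list(somewhat_risky)[:max_somewhat_risky]


def find_failing_pr(prs, passes):
    """Bisect a failed rollup down to the single PR that breaks it.

    Precondition (assumed, not checked): the full rollup failed, exactly
    one PR is to blame, and PRs do not interact with each other.
    `passes(batch)` is a hypothetical callback that tests a smaller rollup.
    """
    candidates = list(prs)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if passes(first_half):
            # First half is clean, so the culprit is in the second half.
            candidates = candidates[mid:]
        else:
            candidates = first_half
    return candidates[0]
```

A real version would presumably merge each passing half instead of throwing the result away, and would need a fallback for flaky failures where no half reproduces the problem.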
We can also have the "probably don't roll up" category go through automated try runs in parallel; passing a try run promotes it to "somewhat risky and okay to roll up".
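The promotion rule is a one-step state change. A tiny Python sketch, with the level strings invented for illustration:

```python
def level_after_try(level, try_passed):
    """Promote a "probably do not roll up" PR one step if its try run passed.

    Every other level, and any failed try run, leaves the level unchanged.
    """
    if level == "probably do not roll up" and try_passed:
        return "somewhat risky"
    return level
```

Note that "never roll up" deliberately stays put here: a green try run says nothing about whether a backport or perf change should be batched.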
Another tactic is to take the rollupable PRs and in parallel do bisecting try runs. This may be a lot more resource-intensive.
FWIW I did a ton of rollups last month, and the biggest papercut was that we have jobs on both travis and appveyor which often run out of time.