Proposal: move to bors-ng

bors-ng is a rewrite of bors/homu in Elixir. Interesting things:

  • uses the new GitHub integration APIs.
  • automatic rollup/bisect. No manual action needed to clear the queue.
  • can be used with any CI that posts commit status. This means we can potentially move to TaskCluster at some point.
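To make the "automatic rollup/bisect" point concrete, here is a minimal sketch of a batch-then-bisect strategy. It is not bors-ng's actual code: `land_batch` and `ci_passes` are hypothetical names, and the sketch assumes CI is deterministic and that a batch fails only when it contains a bad PR, which is exactly the assumption a flaky CI would break.

```python
from typing import Callable, List, Sequence


def land_batch(
    prs: Sequence[int],
    ci_passes: Callable[[Sequence[int]], bool],
) -> List[int]:
    """Return the PRs that may be merged.

    The whole batch is tested once. If it fails, the batch is split in
    half and each half is retried, until single failing PRs are isolated.
    `ci_passes` is a hypothetical callback standing in for a full CI run.
    """
    if not prs:
        return []
    if ci_passes(prs):
        # Everything in this batch lands together, like a rollup.
        return list(prs)
    if len(prs) == 1:
        # A lone PR that fails CI is rejected.
        return []
    mid = len(prs) // 2
    return land_batch(prs[:mid], ci_passes) + land_batch(prs[mid:], ci_passes)
```

With four queued PRs of which one is bad, this sketch needs five CI runs: more than a single clean rollup, but nobody has to clear the queue by hand.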

Concerns: Will the algorithm work well with the flaky CI infrastructure?

From what I can tell, the automatic rollup is decent, but in my opinion not all that good for us. We build artifacts on every PR merge, which are then used for bisection and performance reporting; this makes rollups problematic for PRs that could change perf or introduce accidental breaking changes. Beyond the automatic rollups, I don’t think there’s really any major advantage to using bors-ng, other than possibly better handling of various states due to more active maintenance.

So are you having problems due to the manual rollups we currently do? Why would procedurally generated rollups be different?

When doing manual rollups we (or at least I) try to limit them to PRs that are simple or unlikely to turn out to be the cause when bisecting. It’s possibly not the best approach, but it does work. As far as I can tell, bors-ng by default rolls up all PRs, not just those marked with rollup. This may be incorrect though.

You’re correct.

If you want the artifact cache to work the same way as it’s been working, you’re going to want parallel building, which sounds like something Travis CI would love :money_mouth: (but I’m going to guess that it would be within Rust’s budget, if the code were implemented).

BTW, I think Rust would also like the (admittedly very simple, so it shouldn't be that hard to add to homu) "try arguments." Basically, you leave a command like this:

bors try --platform=linux

And bors-ng writes this into the commit message:

Try #NNN: --platform=linux

Your test harness can then use the git cli to grab (and sed to parse?) the commit message and use it to determine which platform to test on. Rust already does this kind of stuff to limit the subset of the test suite that's run when PRs are opened.
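As a rough illustration of that parsing step, here is a small sketch in Python rather than sed. The function name `parse_try_args` is made up for this example, and the only format it assumes is the `Try #NNN: ...` line shown above; anything else about the commit message layout is a guess.

```python
import re
import shlex
from typing import List, Optional

# Matches a line such as "Try #123: --platform=linux".
# [ \t]* (not \s*) keeps the match from spilling onto the next line.
TRY_LINE = re.compile(r"^Try #(\d+):[ \t]*(.*)$", re.MULTILINE)


def parse_try_args(commit_message: str) -> Optional[List[str]]:
    """Return the try arguments from a commit message.

    Returns None when the message has no "Try #NNN:" line, and an empty
    list when the line is present but carries no arguments.
    """
    match = TRY_LINE.search(commit_message)
    if match is None:
        return None
    # shlex.split handles quoting the way a shell would.
    return shlex.split(match.group(2))
```

In a harness, the message itself could come from `git log -1 --format=%B`, and the returned list could be handed to an argument parser to pick the platform.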

Of course, the ideal setup would throw a second Integration into the mix, to make this work:

bors try --cargobomb

That would probably require a custom webhook, since cargobomb is not going to run within Travis.

Indeed, parallel building is the best way to speed up testing and leave the artifacts intact. This is best described in the documentation for zuul. But as alexcrichton pointed out, “Right now we unfortunately just don’t have capacity”.

The discussion on adding procedurally generated rollups to the existing infrastructure is at homu/issues/102, although so far it is mostly me working conceptually. I would love to hear what people are concerned about and what strategies they would or would not find acceptable.

Also, one day I’d love to see this in Rust, but those projects are still pretty far off.

I still find that automatic rollup would be much better, because:

  • It’s hard to predict whether a PR is 100% safe or not. We have oversights.
  • PRs get rolled up all the time. This means that we can stay at a relatively idle state, avoiding the rollup mistakes that happen under pressure, when the queue fills up with too many PRs that can potentially fail.

But well, @notriddle I found the lack of priority a blocker for migration. Let’s get it implemented soon. bors-ng is really well maintained :slight_smile:

The described batching algorithm does seem like it has a high potential to not work well for Rust, with our long turnaround times, and high intermittent rates. I’d be more comfortable adopting it if it supported our existing workflow and let us gradually experiment with others.

If bors-ng is going to be maintained, and it can support our workflow, and somebody is willing to do the work of the transition, and preferably if it can remain under the venerable ‘bors’ account, then it might be worth pursuing. We’ve long wanted to replace homu. I’m glad somebody has finally done the work - I didn’t realize it was ongoing.

Here’s what it looks like in action:

Sadly there’s no activity in the queue.

Also, re the batching algorithm, we often use bors in ways where we probably don’t want batching. For example, just last week I threw a PR at bors several times, fully expecting it to fail, just so I could see what the Windows CI results would be.

It would be prudent to identify our current bors wishlist, and see how bors-ng will address them. We’ve often discussed changes to homu we want, but don’t make because nobody wants to hack on homu.

  • Every repository gets its own queue.
  • Right now, the front dashboard page only shows repos that you have reviewer permissions for. That’s not going to change: on the public bors-ng instance, where anybody can add their repo, a global “awaiting review” list would be too long.
  • Right now, the individual repo pages are only accessible to reviewers. That shouldn’t be hard to fix, by providing a sort of “read only” mode when non-reviewers open them.
  • From an admin’s POV, I can see that there are webhook hits every single day. I should probably set up a public metrics collector?
  • Right now, my main concern is fixing the hit-by-a-bus factor. I assume this would be a high priority for Rust, as a potential user.

Here’s an example of what it looks like in RO mode. The PR to enable this is in review, now.

Edit: PR landed and deployed. Here’s the queues for:

In the context of finding enough people who know the language and are willing to improve the bors-ng codebase, is the usage of Elixir expected to be a benefit, a detractor, or neutral?

Compared to Rust, I expect it to be neutral as far as finding developers. Compared to Python, however, it probably makes it harder to find developers.

I am working on understanding how homu works so I can hack on it. I am starting with batching, and I am trying to design something very small and conservative to start with.

If we have a wishlist, then let’s get it together so people can help.

As bors-ng seems better maintained, I’d advocate for the switch once it meets most of the requirements. Not that I have much say in the matter.

It does prioritization now. Shout-out to @khodzha for implementing it, BTW!

@alexcrichton I think it’s a good time to migrate to bors-ng. Are you interested in rolling it out on all nursery repositories, or even cargo?

It’s already being adopted at Be patient; there’s still a few missing features before the bigger rust-lang repos can use it.


Any progress?