IDEA: `cargo bisect-rustc` -- a tool to help users help us


#1

So, oftentimes we get regression reports that are troublesome to reproduce – for example, they might be on a platform we don’t have ready access to. As it happens, we now have “PR-by-PR” artifacts available that really help us to narrow down precisely where the problem was introduced. It would be so cool if we had a tool, cargo bisect-rustc, that, given some inputs, finds the exact PR that caused the build to fail.

In my ideal world, the tool would require zero input from the user, though they would be able to optionally supply some stuff to make it go faster.

First, it would verify that the build fails with the current nightly that is in use.

It would work by doing backwards jumps over nightlies to find a “working” date. So, it would start with the previous nightly, then try 2 nightlies ago, then 4, then 8, etc. until the build starts to pass again.

Then it would narrow down to a specific nightly.
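
The search described above (verify the failure, jump back 1, 2, 4, 8, ... nightlies, then narrow down) could be sketched roughly as follows. This is only an illustration, not the tool's implementation: `find_regression`, the `is_broken` callback, the convention of indexing nightlies by their offset before the current one, and the `max_back` cap are all assumptions made here. It also assumes there is a single point where the build goes from working to broken.

```rust
/// Finds the boundary of a regression among nightlies.
///
/// `is_broken(n)` reports whether the build fails on the nightly that is
/// `n` nightlies before the current one (`0` is the current nightly).
/// This indexing and the `max_back` cap are assumptions of the sketch.
///
/// Returns `Some((good, bad))` with `good == bad + 1`, where `good` is the
/// offset of the last working nightly and `bad` the offset of the first
/// broken one. Returns `None` if the current nightly works or if no
/// working nightly is found within `max_back`.
fn find_regression<F>(mut is_broken: F, max_back: u32) -> Option<(u32, u32)>
where
    F: FnMut(u32) -> bool,
{
    // Step 1: verify that the build fails with the current nightly.
    if !is_broken(0) {
        return None;
    }

    // Step 2: jump back 1, 2, 4, 8, ... nightlies until the build passes.
    let mut bad: u32 = 0; // most recent offset known to be broken
    let mut step: u32 = 1;
    let mut good = loop {
        if step > max_back {
            return None;
        }
        if is_broken(step) {
            bad = step;
            step = step.checked_mul(2)?; // give up on overflow
        } else {
            break step;
        }
    };

    // Step 3: bisect between `bad` (broken) and `good` (working).
    while good - bad > 1 {
        let mid = bad + (good - bad) / 2;
        if is_broken(mid) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    Some((good, bad))
}
```

In a real tool `is_broken` would install the nightly for that offset and run the build. With a plain closure, `find_regression(|n| n < 5, 1000)` returns `Some((5, 4))`.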

Naturally, the user should also be able to supply bounds explicitly, and also to specify the failing command and its expected result (e.g., is the build not supposed to pass? or maybe it’s a test/example that fails?).

All of that can basically always be done. But then we can go further and use the PR-by-PR builds. We can find the SHA1 info from the nightlies where the breakage occurs and use rustup-toolchain-install-master to install their builds, and then do bisection.

This would basically just be an easier-to-use, zero configuration version of the various tools that already exist – e.g., @est31 I think has some bisection script, @kennytm wrote the rustup-toolchain-install-master, etc.

Naturally such a tool isn’t just for users – it would help us too. Some of us – cough me cough – always get hives thinking about running git bisect, for example, since it requires me to think hard, take a bunch of manual steps, etc.

What do you all think?


#2

Besides checking whether a supplied file fails to build, the tool could also evaluate a test case by running the compiled file.

Are previous builds of rustc stored somewhere so that they can easily be invoked by cargo bisect instead of having to be rebuilt?


#4

The state-of-the-art bisection method currently is to use bisect-rust, which performs a “PR-by-PR” bisection using git bisect. It will require the user to specify a test script that describes what constitutes a failure, so it is not “zero input”.

A date-based bisection was used by rust-bisect, but it is no longer maintained.

If this new cargo-bisect-rustc command needs to support PR-by-PR bisection, it will need a copy of the Rust git repository (can be a bare repo) in order to perform git bisect.

I think @Mark_Simulacrum has more ideas on this (ref https://github.com/rust-lang-nursery/rustc-guide/issues/78#issuecomment-371201988)


#5

Perhaps it can do the checkout to a temporary directory. Expecting git seems reasonable. Expecting a rustc checkout less so.


#6

I personally think removing this requirement is key :slight_smile:

The difference between saying:

Hey, can you do this:

> cargo install cargo-bisect-rustc-regression
> cargo bisect-rustc-regression

and just about anything else is huge.


#7

So I suppose this tool would work like this:

  1. By default, it will execute cargo build. If the exit code is 0, it is “working”; if the exit code is not 0, it is “broken”. This means that a normal error and an ICE are both considered “broken”.

  2. You could also specify a custom command line, similar to the interface of rust-bisect

    cargo bisect-rustc -- rustc -Zsanitizer=asan -O a.rs
    cargo bisect-rustc -- ./test.sh
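
As a rough illustration of the exit-code rule in point 1, the check might look like the sketch below. The names `Outcome`, `classify` and `run_and_classify` are made up for this example. Treating a process that was killed by a signal (and so has no exit code) as “broken” is also an assumption, since the proposal does not cover that case.

```rust
use std::process::Command;

#[derive(Debug, PartialEq)]
enum Outcome {
    Working,
    Broken,
}

/// Exit code 0 is "working"; anything else is "broken".
/// `None` (no exit code, e.g. killed by a signal) is treated as "broken",
/// which is an assumption of this sketch.
fn classify(exit_code: Option<i32>) -> Outcome {
    match exit_code {
        Some(0) => Outcome::Working,
        _ => Outcome::Broken,
    }
}

/// Runs `program` with `args` and classifies the result.
/// An `Err` means the command could not be started at all.
fn run_and_classify(program: &str, args: &[&str]) -> std::io::Result<Outcome> {
    let status = Command::new(program).args(args).status()?;
    Ok(classify(status.code()))
}
```

The default would then be something like `run_and_classify("cargo", &["build"])`, and a custom command line after `--` would simply replace the program and its arguments.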
    

A draft interface:

  • --targets TRIPLE1,TRIPLE2,TRIPLE3 — some regressions happen on non-Tier-1 platforms, so we need to download additional targets. The default command will become cargo build --target TRIPLE1 && ....

  • --with-cargo — download cargo. Default is to use the existing one i.e. assume cargo is irrelevant to the regression, which often is.

  • --preserve — save the downloaded artifacts.

  • -v, --verbose — show all outputs, etc.

  • HTTPS_PROXY/ALL_PROXY — set an HTTP proxy for downloading.
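
To make the `--targets` behaviour concrete, here is a small sketch of how the default command could be derived from the comma-separated list. The helper names `parse_targets` and `default_command` are hypothetical, and joining the per-target builds into one shell-style string is just one way to read "cargo build --target TRIPLE1 && ...".

```rust
/// Splits the comma-separated value of `--targets` into triples,
/// ignoring empty entries and surrounding whitespace.
fn parse_targets(value: &str) -> Vec<&str> {
    value
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .collect()
}

/// Builds the default command line for the given target triples.
/// With no targets this falls back to a plain `cargo build`.
fn default_command(targets: &[&str]) -> String {
    if targets.is_empty() {
        return "cargo build".to_string();
    }
    targets
        .iter()
        .map(|triple| format!("cargo build --target {}", triple))
        .collect::<Vec<_>>()
        .join(" && ")
}
```

A real implementation would more likely run one `cargo build --target ...` process per triple and stop at the first failure than pass a joined string to a shell.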

Specifying range of regression:

  • (nothing) — use current nightly as end point, and use the 1→2→4→8→… backwards jump to find the starting point, saturating at 2014-11-07. It needs to handle missing nightlies gracefully.

  • --end 2018-03-16 — date-based bisection with a specific end point.

  • --start 2017-12-22 --end 2018-03-16 — date-based bisection with a specific range.

  • --start 41b82b6a --end e65547d4 — PR-by-PR bisection.

    A shallow clone of the Rust repository up to 168 days ago will be checked out automatically (note that our current policy is to delete PR artifacts after 90 days).

    Do we want to allow the user to reuse an existing Rust repository? Would this destroy the user’s repo if we only use it to do git bisect?

  • --end e65547d4 — PR-by-PR bisection, starting point found by 1→2→4→8→… previous commits, if this is possible at all. Stop at 4096 commits.
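
Since `--start` and `--end` accept either a date or a commit hash, the tool has to tell the two apart. One possible way to do that is sketched below. The `Bound` type, `parse_bound`, and the accepted hash length of 7 to 40 hex characters are assumptions for illustration only. The date check validates just the shape and rough ranges, not whether that nightly exists.

```rust
#[derive(Debug, PartialEq)]
enum Bound {
    /// A nightly date such as `2018-03-16` (date-based bisection).
    Date { year: u16, month: u8, day: u8 },
    /// A possibly abbreviated commit hash such as `41b82b6a` (PR-by-PR).
    Commit(String),
}

/// True if `s` consists of exactly `len` ASCII digits.
fn is_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// Parses the value of `--start` or `--end`.
fn parse_bound(s: &str) -> Option<Bound> {
    // Try the YYYY-MM-DD form first.
    let parts: Vec<&str> = s.split('-').collect();
    if let [y, m, d] = parts.as_slice() {
        if is_digits(y, 4) && is_digits(m, 2) && is_digits(d, 2) {
            let year: u16 = y.parse().ok()?;
            let month: u8 = m.parse().ok()?;
            let day: u8 = d.parse().ok()?;
            if (1..=12).contains(&month) && (1..=31).contains(&day) {
                return Some(Bound::Date { year, month, day });
            }
            return None;
        }
    }
    // Otherwise accept a hexadecimal commit hash (length range assumed).
    if (7..=40).contains(&s.len()) && s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Some(Bound::Commit(s.to_ascii_lowercase()));
    }
    None
}
```

Two `Date` bounds would select date-based bisection and two `Commit` bounds PR-by-PR bisection. What to do with a mixed pair is left open here, as it is in the draft.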


In the “2.0” version the tool could also be used to find performance regressions, but this is much more difficult to reliably automate. The command to execute will produce a real number, and a build is “broken” if that number significantly increases.
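
What “significantly increases” means is not pinned down above. One plausible reading, sketched here purely as an assumption, is to compare the median of several runs against a baseline and flag the build when the relative increase exceeds a threshold. The function names and the idea of a fractional threshold are not part of the proposal.

```rust
/// Median of several measurements, used to dampen run-to-run noise.
/// Returns `None` for an empty slice. Sorts the slice in place.
fn median(samples: &mut [f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by(|a, b| a.total_cmp(b));
    let mid = samples.len() / 2;
    Some(if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    })
}

/// True if `measured` exceeds `baseline` by more than `threshold`,
/// where `threshold` is a fraction of the baseline (0.05 means 5%).
fn is_perf_regression(baseline: f64, measured: f64, threshold: f64) -> bool {
    baseline > 0.0 && (measured - baseline) / baseline > threshold
}
```

Picking the threshold and the number of runs is the hard part, which is presumably why this is harder to automate reliably than a pass/fail build.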


#8

The interface @kennytm proposes seems reasonable; I’ll take a shot at implementing it or something like it over the next few days.


#9

This is awesome. One minor possible extension:

Yes. It would be great if we also had convenient switches for other common cases:

# Build, only an ICE is considered a failure -- perhaps just grep output?
> cargo bisect-rustc --ICE

# Build, success is considered failure
> cargo bisect-rustc --should-fail
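
For what it’s worth, the two switches above mostly change how a run is judged. A minimal sketch of that decision follows, assuming the “just grep output” approach for ICEs. The `Mode` enum and `is_regressed` are hypothetical names, and matching on the text "internal compiler error" in stderr is the heuristic suggested above rather than anything the tool is known to do.

```rust
#[derive(Clone, Copy)]
enum Mode {
    /// Default behaviour: any non-zero exit code is a failure.
    AnyError,
    /// `--ICE`: only an internal compiler error is a failure.
    IceOnly,
    /// `--should-fail`: a successful build is the failure.
    ShouldFail,
}

/// Decides whether a single run shows the regression being hunted.
/// `exit_code` is `None` if the process had no exit code.
fn is_regressed(mode: Mode, exit_code: Option<i32>, stderr: &str) -> bool {
    let succeeded = exit_code == Some(0);
    match mode {
        Mode::AnyError => !succeeded,
        // Heuristic: look for the ICE banner in the compiler's output.
        Mode::IceOnly => !succeeded && stderr.contains("internal compiler error"),
        Mode::ShouldFail => succeeded,
    }
}
```

With `--ICE`, an ordinary type error would then count as “working” for the purposes of the bisection, which is what you want when hunting the PR that introduced a crash.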

And then those could be combined with a few other common things:

# Build the given example
> cargo bisect-rustc --example foo

# Run the given test; I am imagining that this does a command like
# `cargo test --all -- foo`
> cargo bisect-rustc --test foo

But these are all “nice to haves”.


#10

I personally wouldn’t bother in v1. It’d maybe be a nice to have in some specific circumstances.


#11

I’d prefer if the tool could at least optionally take a script because sometimes an error occurs when running tests, passing particular features, etc. The alternative here would be to support the whole cargo interface, but at least the possibility of using a script should be there.

Also, it would be great if this tool could first narrow down the last working nightly and the first broken nightly, and then, in an opt-in mode, start downloading rustc, compiling it, etc. to bisect between both nightlies on a PR-by-PR basis.

As a user I might be able to quickly improve bug reports by using this tool to narrow the last working nightly, but depending on my computing power, it might not make sense to compile rustc 4 times to narrow the PR responsible on an old laptop (might take 5 hours to do so).

Also, with the nightly and the error context, humans can often guess which of the PRs involved is actually the culprit (the error is about const eval, only one PR touched that, … that PR immediately becomes suspicious).


#12

Optionally, yes.


#13

Note that we are talking about downloading the binaries for doing the PR-by-PR hunt, not compiling. But I agree we should offer ways to get varying degrees of resolution. For one thing, PR-by-PR binaries are only available from the last 90 days.


#14

The initial implementation is up: https://github.com/rust-lang-nursery/cargo-bisect-rustc.

It currently doesn’t implement the --ICE and --should-fail flags, but they should in theory be fairly easy to implement. Most other parts of the suggested interface are implemented.

Please try it and leave issues; I’m thinking that after some discussion and feedback from a few other users, either here or on the issue tracker, we’ll probably publish it to crates.io.


#15

Wow, awesome!

Seems like a great opportunity to mentor. Maybe post some instructions?

I’m sure I’ll have an opportunity to put it to use in the very near future =)