Consolidating compile-fail and ui testing, and possibly others as well


#1

Continuing the discussion from Testing the compiler:

I’d like to propose a Grand Unification of our testing system. In particular, I’d like to unify run-pass, compile-fail, and ui testing. What follows is a write-up of some of my reasoning. Presuming we get some agreement, then I’m hoping to mentor people who want to work on this refactoring.

What do we want from a testing setup?

Let me start by elaborating the kinds of things I think we want from tests:

  • We want to be sure that broken code fails to compile
    • compile-fail does this, ui does too
  • We want to be able to check how we format our errors
    • ui tests do this today
  • We want to maximize “happy accidents” in testing
    • e.g., if I screw up formatting errors in some subtle way, I might not find it from targeted tests, but if we are checking that all of our compile-fail tests continue to work as expected, we’re more likely to catch the problem
    • similarly, if I suddenly make NOTE and HELP messages go away, I probably want to know that too
  • ui tests, since they test the complete output, maximize the chance of catching subtle problems
  • We want to make it easy to update tests
    • e.g., if we change from using - to ~ to do underlines, we should be able to easily update all of the reference files
    • similarly, if we change the test itself, regenerating its expected output should be just as easy
    • compile-fail is good if we add to the test, but bad if we make a systemic change to wording (or some other thing that it considers)
    • ui tests have this covered with the script
  • We want to make it easy to review
    • ui tests make it easy to see what new errors etc look like, and make it hard for small diffs to sneak in unnoticed
  • We want to make it easy to read the tests and know which scenarios result in an error
    • compile-fail tests do this well, since the //~ ERROR annotations are inline in the test
    • ui tests require you to load a separate file and correlate line numbers, not good
  • We want to check the JSON output
    • nothing tests this now, really
    • compile-fail SORT OF does, but only partially, because it only looks at some of the JSON fields
    • something like ui but for the json output would be better

One case that is not covered well today:

  • We want to be able to easily enumerate related cases that should and should not work
    • nothing does this very well today
    • sometimes you can put them all into a compile-fail test
    • if you need separate files, you either spread them across two directories (run-pass, compile-fail) or else use #[rustc_error], which is annoying and a hack

What do I propose?

I think I would like to unify run-pass, run-fail, compile-fail, and ui tests into one sort of thing. However, I’d like to start by just unifying compile-fail and ui tests, and then move on to the others. Let me describe that first.

compile-fail and ui

These tests are very close. In fact, we could just copy all the compile-fail tests into the ui directory, run the “ui generation script”, and call it a day. But that would be a bit odd, because then we’d have all these //~ ERROR lines that have no semantic meaning anymore. In fact, I think they are kind of useful, because (as I noted above) if you have a test like this, it’s nice to be able to tell which of the functions are errors and which are not while reading the source itself, without having to cross-reference (this comes from compile-fail/lifetime-bound-will-change-warning.rs, which I picked at random):

fn test1<'a>(x: &'a Box<Fn()+'a>) {
    // just_ref will stay the same.
    just_ref(&**x)
}

fn test1cc<'a>(x: &'a Box<Fn()+'a>) {
    // same as test1, but cross-crate
    lib::just_ref(&**x)
}

fn test2<'a>(x: &'a Box<Fn()+'a>) {
    // but ref_obj will not, so warn.
    ref_obj(x) //~ ERROR mismatched types
}

fn test2cc<'a>(x: &'a Box<Fn()+'a>) {
    // same as test2, but cross crate
    lib::ref_obj(x) //~ ERROR mismatched types
}

So I propose that we extend the ui runner so that it ALSO scans the file and extracts the //~ ERROR annotations. It would then grep the output and check that we do indeed get errors on the lines where we say we will.

Note that this is a kind of “secondary” check. We don’t need to have //~ NOTE annotations or any of that jazz, because the ui test is there to check that the message is what we expected and that everything looks right. In fact, we could probably simplify the //~ annotations so that they don’t even include a message, just //~ ERROR. It’s just there to help you when you skim a list of cases so you can see which ones result in errors and which don’t.
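To make the secondary check concrete, here is a minimal sketch (not the actual compiletest code, and ignoring variants like //~^ that point at earlier lines) of how the runner could collect the lines carrying a //~ ERROR annotation and compare them against the lines on which the compiler actually reported errors:

```rust
use std::collections::BTreeSet;

/// Collect the 1-based line numbers of every `//~ ERROR` annotation.
/// Simplification: only same-line annotations are handled here.
fn expected_error_lines(source: &str) -> BTreeSet<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains("//~ ERROR"))
        .map(|(idx, _)| idx + 1)
        .collect()
}

/// Check that the annotated lines exactly match the lines the
/// compiler reported errors on (extracted elsewhere, e.g. from
/// the diagnostic output).
fn check(source: &str, reported: &BTreeSet<usize>) -> Result<(), String> {
    let expected = expected_error_lines(source);
    if expected == *reported {
        Ok(())
    } else {
        Err(format!(
            "expected errors on lines {:?}, compiler reported {:?}",
            expected, reported
        ))
    }
}

fn main() {
    let source = "fn main() {\n    let x: u32 = \"hi\"; //~ ERROR\n}\n";
    let reported: BTreeSet<usize> = [2].into_iter().collect();
    assert!(check(source, &reported).is_ok());
    println!("annotations match");
}
```

The point is that this check is cheap and line-based; the reference file remains the authority on the full text of the diagnostics.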

So this is the precise change then that I propose:

  • Move all compile-fail tests to ui tests.
  • Simplify to just support //~ ERROR and //~ WARNING; go back to grepping the output instead of parsing JSON.
    • Also, just support matching the “main” message, which is easy to grep
    • Remove all the //~ NOTE, //~ HELP, etc, because those are complicated to grep
  • Generate reference tests, naturally

supporting run-pass, run-fail etc

Eventually I’d like to support other kinds of tests. I think I would want to just have the sort of test be determined by an annotation at the top (e.g., // run-pass) rather than the directory it lives in. For a run-pass test, I’d like to have a reference file that tells you the expected output when it is executed.
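A test under this scheme might look like the following sketch (the directive name follows the proposal above; the reference-file name in the comment is purely illustrative):

```rust
// run-pass
// The kind of test is declared by the header comment above rather
// than by the directory the file lives in.

fn greeting() -> &'static str {
    "hello from a run-pass test"
}

fn main() {
    // The runner would compile and execute this file, then diff its
    // stdout against a checked-in reference file, e.g.
    // `run-pass-example.stdout` (hypothetical name).
    println!("{}", greeting());
}
```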

other kinds of tests

Finally, if we want to test json output, we can make tests that use // compile-flags:--error-format=json and just use the JSON output as the expected output. We might need to tweak the normalization code, not sure.
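The main normalization issue is machine-specific paths in the JSON: they would have to be rewritten to a stable placeholder before diffing, along the lines of the $DIR substitution the ui suite already does for plain output. A rough sketch, with everything beyond the $DIR placeholder being illustrative:

```rust
/// Replace the absolute path of the test's directory with a stable
/// `$DIR` placeholder so the reference file is machine-independent.
fn normalize(output: &str, test_dir: &str) -> String {
    output.replace(test_dir, "$DIR")
}

fn main() {
    let raw = r#"{"message":"mismatched types","file_name":"/home/user/rust/src/test/ui/foo.rs"}"#;
    let normalized = normalize(raw, "/home/user/rust/src/test/ui");
    assert!(normalized.contains("$DIR/foo.rs"));
    println!("{}", normalized);
}
```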

thoughts?

What do people think?

One concern that @eddyb raised is that he doesn’t like the way that the ui test relies on a script to be maintainable. In particular, the expected workflow is to run the compiler, check the diffs in the output, and accept them with the script if they look ok. He feels that this is too prone to error. I am sympathetic, but I cannot come up with another way to test all the notes and formatting completely and universally. However, my belief is that this concern is mitigated by two factors:

  • First, the fact that ui tests would still check that //~ ERROR and //~ WARNING occur on the right lines means you can hand-craft the tests, to some degree, to ensure you get the expected results.
    • You just also run the script in that case, but it’s not the “primary line of defense”.
  • Second, I find that changes to ui tests are a joy to review, so I do expect to rely somewhat on reviewers, and the ability to easily skim a git diff, to detect “surprising” changes.
    • If anything, I’d expect we’ll do a BETTER job than we do today, since we’ll notice when smaller notes or other details change that are not tested by compile-fail.

#2

This is exactly what I’d like to see as well. Former compile-fail tests still ensure that erroneous code produces errors, former ui tests still ensure the formatting is good, the cross-checks add extra reliability, and all of this without having to write something like

//~^ NOTE
//~| NOTE 
//~| NOTE
//~| HELP
//~| HELP
//~| HELP

manually.


#3

As always, I’d be happy to work on this. :sunglasses: