Pre-RFC: Make #[test] and #[bench] more flexible


#1

Summary

Allow #[test] and #[bench] to accept an additional function signature

Detailed design

#[test]

The test attribute now accepts two signatures. The one currently accepted:

#[test]
fn my_test() {
    assert_eq!(2, 1 + 1);
}

And this new signature:

extern crate test;

#[test]
fn my_test() -> test::TestResult {
    if 1 + 1 == 2 {
        test::TestResult::Success
    } else {
        test::TestResult::Failure("Oops!".to_string())
    }
}

Definition of TestResult:

pub enum TestResult {
    /// The test succeeded.
    Success,

    /// The test failed. Contains a description of the problem.
    Failure(String),

    /// The test has determined that it must be ignored (for example, the OS doesn't support a necessary feature).
    Ignore,

    /// The test succeeded but encountered a problem. Still counts as a success but the message is displayed.
    Warn(String),

    /// Multiple sub-tests have been run.
    /// Contains the name and result of each individual sub-test.
    Multiple(Vec<(String, TestResult)>),
}

Rustc’s testing library will still be responsible for printing the results of the tests. The reason is that we may want to output the test results in formats other than plain text on stdout. If we allowed the user to print the result to stdout directly, we would likely close the door to any JSON/XML/whatever output.
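To illustrate why keeping the printing in the harness matters, here is a standalone sketch of a plain-text renderer walking a `TestResult` tree; a JSON or XML renderer would walk the same tree. The enum is a local copy of the proposed type (the real one would live in the `test` crate), and the renderer's name and output format are assumptions, not part of the proposal.

```rust
// Local copy of the proposed enum; the real one would live in the `test` crate.
#[derive(Debug)]
pub enum TestResult {
    Success,
    Failure(String),
    Ignore,
    Warn(String),
    Multiple(Vec<(String, TestResult)>),
}

// Hypothetical plain-text renderer. Because the harness owns this step,
// swapping in a JSON/XML renderer would not require changing any test.
fn render_text(name: &str, result: &TestResult, indent: usize) -> String {
    let pad = "  ".repeat(indent);
    match result {
        TestResult::Success => format!("{}{} ... ok\n", pad, name),
        TestResult::Failure(msg) => format!("{}{} ... FAILED: {}\n", pad, name, msg),
        TestResult::Ignore => format!("{}{} ... ignored\n", pad, name),
        TestResult::Warn(msg) => format!("{}{} ... ok (warning: {})\n", pad, name, msg),
        TestResult::Multiple(subs) => {
            let mut out = format!("{}{}:\n", pad, name);
            for (sub_name, sub) in subs {
                out.push_str(&render_text(sub_name, sub, indent + 1));
            }
            out
        }
    }
}

fn main() {
    let result = TestResult::Multiple(vec![
        ("addition".to_string(), TestResult::Success),
        ("overflow".to_string(), TestResult::Failure("wrapped".to_string())),
    ]);
    print!("{}", render_text("arithmetic", &result, 0));
}
```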

Examples of using a test framework

Let’s say that you want to test an HTTP server for conformance with HTTP 2.0, and have a library named mahstf (“my awesome http server testing framework”) that tests this.

#[test]
fn tests_entry_point() -> test::TestResult {
    let port = 8000;
    let server = teepee::Server::listen(port);
    mahstf::start_testing(port)
}

The mahstf::start_testing function will likely return a Multiple containing one entry for each category. Each category would contain another Multiple representing each individual test that has been run.
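A hypothetical sketch of what such a nested result might look like. The category and test names, and the `start_testing` body, are invented for illustration; only the `Multiple` nesting shape comes from the proposal (the enum is again a local copy).

```rust
// Local copy of the proposed enum.
#[derive(Debug)]
pub enum TestResult {
    Success,
    Failure(String),
    Ignore,
    Warn(String),
    Multiple(Vec<(String, TestResult)>),
}

// Hypothetical framework entry point: one Multiple per category,
// each holding its own Multiple of individual tests.
fn start_testing(_port: u16) -> TestResult {
    TestResult::Multiple(vec![
        ("headers".to_string(), TestResult::Multiple(vec![
            ("huffman coding".to_string(), TestResult::Success),
            ("header table size".to_string(),
             TestResult::Failure("table entry not evicted".to_string())),
        ])),
        ("flow control".to_string(), TestResult::Multiple(vec![
            ("window update".to_string(), TestResult::Success),
        ])),
    ])
}

// Count leaf results to show the tree shape.
fn leaves(r: &TestResult) -> usize {
    match r {
        TestResult::Multiple(subs) => subs.iter().map(|(_, s)| leaves(s)).sum(),
        _ => 1,
    }
}

fn main() {
    let result = start_testing(8000);
    println!("{} leaf tests", leaves(&result));
}
```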

#[bench]

Similarly, the #[bench] attribute now accepts two signatures:

#[bench]
fn my_bench(_: &mut test::Bencher) {
}

And this new signature:

#[bench]
fn my_bench() -> test::BenchResult {
}

Definition of BenchResult:

use std::time::duration::Duration;

pub enum BenchResult {
    /// The benchmark has run successfully.
    /// Contains the duration of each iteration.
    Bench(Vec<Duration>),

    /// Multiple sub-benches have been run.
    /// Contains the name and result of each individual sub-bench.
    Multiple(Vec<(String, BenchResult)>),

    /// There was an error during the benchmarking.
    Error(String),
}
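Since the harness owns the printing here too, it could compute summary statistics from the raw samples. A sketch, assuming a hypothetical `median` helper and using the modern `std::time::Duration` path (the pre-RFC text uses the older `std::time::duration::Duration`); the enum is a local copy of the proposed type.

```rust
use std::time::Duration;

// Local copy of the proposed enum.
pub enum BenchResult {
    Bench(Vec<Duration>),
    Multiple(Vec<(String, BenchResult)>),
    Error(String),
}

// Hypothetical harness-side summary: median iteration time of a Bench result.
fn median(result: &BenchResult) -> Option<Duration> {
    match result {
        BenchResult::Bench(samples) if !samples.is_empty() => {
            let mut sorted = samples.clone();
            sorted.sort();
            Some(sorted[sorted.len() / 2])
        }
        _ => None,
    }
}

fn main() {
    let result = BenchResult::Bench(vec![
        Duration::from_micros(12),
        Duration::from_micros(10),
        Duration::from_micros(11),
    ]);
    println!("median: {:?}", median(&result).unwrap());
}
```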

Again, it is rustc’s responsibility to print the results.

Example: benchmarking OpenGL

Benchmarks are an important part of computer graphics, but they can’t be written with the current bencher system.

#[bench]
fn triangle() -> test::BenchResult {
    let display = init_opengl();

    let mut results = Vec::new();
    for _ in range(0, 500000) {
        // inserts an entry in the asynchronous commands queue that queries for the current timestamp of the GPU
        let before = display.query_timestamp();

        // send the commands to draw a triangle
        draw_triangle(&display);

        // insert another entry which is queried only after the triangle has been drawn
        let after = display.query_timestamp();

        results.push(after.get() - before.get());
    }

    test::BenchResult::Bench(results)
}

#2

Looks good. Multiple in particular looks very useful for composing tests; the xUnit frameworks I’ve used were pretty ugly in that regard.

Might it be worth having Ignore take a String to explain why it was ignored? “Explicitly ignored” versus “Couldn’t run because missing feature X” seems like a distinction that somebody reading the output might care about, and even explicitly ignored tests might benefit from a comment (“Failing due to issue #12345 but not worth breaking the build for”).
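The suggested change could look like the following sketch, where `Ignore` carries the explanation. The variant shape and the feature probe are illustrative assumptions, not part of the pre-RFC text.

```rust
// Sketch of the suggestion: `Ignore` carrying a reason string.
#[derive(Debug)]
pub enum TestResult {
    Success,
    Failure(String),
    Ignore(String), // was a unit variant in the pre-RFC text
    Warn(String),
}

fn ssl_conformance_test() -> TestResult {
    // Hypothetical feature probe; a real test would detect this at runtime.
    let ssl_available = false;
    if !ssl_available {
        return TestResult::Ignore("ssl library not linked".to_string());
    }
    TestResult::Success
}

fn main() {
    match ssl_conformance_test() {
        TestResult::Ignore(reason) => println!("ignored: {}", reason),
        other => println!("{:?}", other),
    }
}
```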


#3

I believe bench automatically works out how many runs it needs in order to minimise variance. Perhaps the argument should instead be a trait object with an appropriate selection of methods for benchmarking callbacks. The most general would take a name and a closure, and return the measured time taken. You could then have simpler methods for more common cases.
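A minimal sketch of that idea, assuming invented names throughout (`Bencher`, `SimpleBencher`, `bench`) and a fixed iteration count in place of the real harness's adaptive run-count logic:

```rust
use std::time::{Duration, Instant};

// Hypothetical trait the harness would pass as a trait object.
// Trait-object safety rules out generic methods, so the closure is
// taken as `&mut dyn FnMut()`.
trait Bencher {
    /// Run `f` repeatedly under the given name; return time per iteration.
    fn bench(&mut self, name: &str, f: &mut dyn FnMut()) -> Duration;
}

// Toy implementation with a fixed iteration count; the real harness
// would pick the count adaptively to minimise variance.
struct SimpleBencher {
    iterations: u32,
}

impl Bencher for SimpleBencher {
    fn bench(&mut self, name: &str, f: &mut dyn FnMut()) -> Duration {
        let start = Instant::now();
        for _ in 0..self.iterations {
            f();
        }
        let total = start.elapsed();
        println!("{}: {:?} total over {} iterations", name, total, self.iterations);
        total / self.iterations
    }
}

fn main() {
    let mut b = SimpleBencher { iterations: 1000 };
    let mut acc: u64 = 0;
    let per_iter = b.bench("sum", &mut || acc = acc.wrapping_add(1));
    println!("per iteration: {:?} (acc = {})", per_iter, acc);
}
```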


#4

Yeah, that’s a good point.


#5

To me, that just seemed like the distinction between Ignore and Warn(String). You are warning the user that the test has failed, but there is some outstanding reason which makes the failure permissible.

For example:

  • Warn("cannot test [feature]: ssl library not linked")
  • Warn("test should pass when #12345 is resolved.")

I would use Ignore for, e.g., skipping tests that aren’t applicable on a given platform. By extension, there’s no benefit to printing information if the test is ignored, since the test itself is irrelevant on that platform.


#6

I think there’s a significant difference between “test was not run at all” (Ignore) and “test ran and passed but with problems” (Warn, according to the pre-RFC text). That said, I’m having trouble coming up with a situation where Warn in that sense would be useful; I’ve never used a framework that had it and never really missed it.

Something like a benchmark regressing very slightly, maybe? @tomaka, could you maybe add a motivating example?


#7

I would rather use Ignore for cannot test [feature]: ssl library not linked, in which case it makes sense to add a String to it. Ignore should be used when the test is not being run.

What I had in mind for Warn was, for example, Expected 1.0 but got 0.9999998. But now that you point it out, I can’t find any other example either. I’m inclined to just remove it from the RFC.


#8

Yeah, if you’re testing floating-point code for exact bitwise equality, odds are you’re doing it wrong.