Pre-RFC: machine-readable test output


#1

Summary

Add a --format flag to rustc --test to select the test output format.

Motivation

Currently, testing in Rust provides only human-readable output, which is fine in most cases, as we are humans (or other humanoids). But from time to time we need to feed the results of our tests to a machine (e.g. a CI system that reports which tests failed), and there is no "civilised" way to do that: we have to parse output that was never meant to be machine-readable, which isn't nice. But wait: there are some nice, standardized and well-described formats for providing test output, such as TAP and its variants.

Use them for greater good!™

Detailed design

As described above, just add a flag to select the desired test output format, taking one of the following values:

  • default - the current, human-readable format
  • human - an alias for default
  • tap - Test Anything Protocol, a nice middle ground between human-readable and machine-readable
  • tap-y - a next-generation TAP variant that emits a YAML stream of test results, allowing more data to be provided for the sake of developer tooling
  • tap-j - like the above, but uses JSON instead of YAML

Drawbacks

This adds some complexity to the display of test results, but no other drawbacks are known at this time.

Alternatives

Leave things as they are; otherwise none.

Unresolved questions

The set of available output formats. For now they have been chosen based on my knowledge of well-known and popular test output formats, but I may have missed something.


#2

This should probably also work for benchmarks.


#3

I’ve assumed that benchmarks are also tests (as they are under the cfg(test) flag).


#4

I just thought I’d mention it, because the output is not quite the same. Also for machine-readable output, we may want to output the timings with all precision we have.

Also note that neither tap nor tap-y/j have a provision for incorporating benchmark results (though the extra section is an obvious candidate), so we need to define our own convention/format here.


#5

Great idea—this could be a nice way to reduce the complexity of the test framework.

Outputting TAP seems to allow easy conversion to other formats (either by us or 3rd party tools). We could also deprecate the current ‘human’ format in favor of piping the TAP output into smaller tools like faucet (rewriting this in Rust might be a nice beginner project; cargo could bundle such output filters).

I haven’t read much about TAP-Y/J. They seem to offer the developer some advantages (generating JSON is easy, and valid JSON is valid YAML, so you get both for free) and might contain more information. From what I can tell, converting between TAP-J and TAP looks easy as well (given a TAP and a JSON parser).

Let me tell you about one more radical idea: assuming that the internal representation of test results is already similar to the schema of TAP-J, offering only TAP-J could also be an option to further reduce complexity. External tools (that could be bundled with cargo, as above) could be used to convert it to anything. (The default test output would just be JSON.)

(All this would probably be a breaking change for users, though.)


#6

A “built-in” parser for TAP-J could be added for backwards compatibility. I don’t see how to do it otherwise without a breaking change.


#7

I just recategorized this to “tools and infrastructure” – but I think it’s a great idea! Code-wise it’d be nice to rewrite the current “front end” as a TAP consumer.


#8

I would think about using something like TAP-J with a public specification. I could try to write one when I find some time.


#9

I agree that this sounds like a pretty awesome idea! Semantically this probably wouldn’t be a flag to the compiler but rather to the test binary itself, but beyond that this is certainly one in a long list of items it’d be cool if our test infrastructure did by default.

I’ve long wanted the ability to plug in your own test harness instead of always having to use libtest as that would allow developing this kind of functionality on crates.io before moving it into the main distribution. Plus it’d even allow for faster iteration! This is pretty far out though, so just mentioning it as a passing thought.

I do like the idea of being as composable as possible, so we may wish to cut back the scope here to only have human and machine readable output so long as the latter can be convertible to basically anything else.


#10

What do you think about this:

Detailed design

Replace the current human-readable output with JSON-based output modeled on TAP-J. For compatibility with existing workflows, a built-in parser for that protocol should be added which renders it in the current format.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Protocol description

The output of a test suite should be a stream of JSON objects that can be parsed by additional tooling.

Structure

The output is a stream of lines, each containing one JSON object, separated by newlines. Each object MUST contain a type field whose value is one of suite, test, bench or final. Any object MAY have an extra field containing an open mapping for additional information.

Suite

{
  "type": "suite",
  "name": "Doc-Test",
  "build": "2015-08-21T10:03:20+0200",
  "count": 13,
  "rustc": "2a89bb6ba033b236c79a90486e2e3ee04d0e66f9"
}

Describes the test suite. It MUST appear exactly once, at the beginning of the stream.

Fields:

  • type MUST be suite.
  • build MUST be an ISO 8601 timestamp of the build date. This makes it possible to detect accidentally running a stale test suite at zero cost.
  • name SHOULD contain the current suite type (“Test”, “Doc-Test”, “Benchmark”).
  • count MUST be the count of all tests (including those ignored at runtime).
  • rustc MUST be the version of the Rust compiler used to build the test suite.

Test

{
  "type": "test",
  "subtype": "should_panic",
  "status": "ok",
  "label": "octavo::digest::md5::tests::test_md5",
  "file": "src/digest/md5.rs",
  "line": 684,
  "stdout": "",
  "stderr": "",
  "duration": 100
}

Each test MUST produce one and only one test object.

A test unit MUST have a status field whose value MUST be one of: ok, fail or ignore.

It is RECOMMENDED to add a subtype field containing either test, bench or should_panic.

A unit MUST also contain a label field giving the name of the test.

A test SHOULD contain file and line fields for the sake of debugging.

A test MAY contain stdout and stderr fields holding the output captured on those streams.

It is RECOMMENDED to include a duration field containing the test run time in nanoseconds.

Benchmark

{
  "type": "bench",
  "status": "ok",
  "label": "octavo::digest::md5::tests::bench_md5",
  "file": "src/digest/md5.rs",
  "line": 698,
  "iterations": 382,
  "duration": 300
}

The field descriptions are the same as for test, with two additional conditions:

  • the duration field MUST be present
  • an additional iterations field MUST be present, giving the number of iterations measured by the benchmark

Finally

{
  "type": "final",
  "results": {
    "ok": 10,
    "fail": 0,
    "ignore": 2
  }
}

This MUST terminate the stream, and a parser MUST reject any input that appears after this object.

The results object MUST include the fields ok, fail and ignore, which give how many tests passed, failed and were ignored, respectively.
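To make the shape of the stream concrete, here is a rough emitter sketch in plain Rust. The field values are illustrative, and the JSON is hand-formatted for brevity; a real implementation would use a proper JSON serializer and escape string contents.

```rust
// Sketch of an emitter for the TAP-J-style protocol described above.
// Hand-formatted JSON for illustration only; strings are not escaped.

fn suite_line(name: &str, build: &str, count: u32, rustc: &str) -> String {
    format!(
        r#"{{"type":"suite","name":"{}","build":"{}","count":{},"rustc":"{}"}}"#,
        name, build, count, rustc
    )
}

fn test_line(status: &str, label: &str, duration_ns: u64) -> String {
    format!(
        r#"{{"type":"test","status":"{}","label":"{}","duration":{}}}"#,
        status, label, duration_ns
    )
}

fn final_line(ok: u32, fail: u32, ignore: u32) -> String {
    format!(
        r#"{{"type":"final","results":{{"ok":{},"fail":{},"ignore":{}}}}}"#,
        ok, fail, ignore
    )
}

fn main() {
    // One JSON object per line; the `final` record terminates the stream.
    println!("{}", suite_line("Test", "2015-08-21T10:03:20+0200", 2, "1.2.0"));
    println!("{}", test_line("ok", "tests::test_md5", 100));
    println!("{}", test_line("fail", "tests::test_sha1", 250));
    println!("{}", final_line(1, 1, 0));
}
```

A consumer then only needs to split on newlines, parse each line as JSON, and dispatch on the type field.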


#11

Is this supposed to say duration? And is the benchmark time also in nanoseconds (0.01 ns in the example JSON seems odd)?


#12

Wow, thanks for putting so much effort into this, @hauleth!

In most specs I’ve read so far, this was always described as a ‘stream of JSON documents’, where the schema of the JSON documents was specified later as objects. Shouldn’t matter, though; just something I noticed.

To make your description of the test schema a bit more concise, I translated it to this struct:

use std::path::PathBuf;
use std::time::Duration;

enum TestSubType { Test, Bench, ShouldPanic }
enum TestStatus { Ok, Fail, Ignore }

struct Test {
    subtype: Option<TestSubType>,
    status: TestStatus,
    label: String,
    file: Option<PathBuf>,
    line: Option<u64>,
    stdout: Option<String>,
    stderr: Option<String>,
    duration: Option<Duration>,
    iterations: Option<u64>,
}

Concerning benchmarks:

I think you meant ‘duration’ instead of ‘time’.

Also, what about measuring ‘throughput’? The current benchmarks are able to do that, so we might want to model that.

Oh, and I noticed the suite structure contains a time stamp. Is there something in std (or libc for that matter) that can output this?


#13

Yeah, that was a typo due to my live work on the spec. Sorry, I’ll fix that.

To be honest, I assumed the bench type would keep changing, as the test crate isn’t stable yet. I would like, for example, to see an array nspi instead of a single duration number. It would make it possible to check for timing differences between calls (I’m working on a crypto crate named Octavo, where it would be really helpful) or to check cache reuse. It would also allow more subtle statistical analysis.

Now I see that I’ve also missed a description of failed tests, which could additionally provide a stack trace.


#14

I’ve finally committed the RFC: https://github.com/rust-lang/rfcs/pull/1284