Pre-RFC: retryable tests

Motivation

In an ideal world, all tests would be deterministic. In reality, though, they are often flaky for a number of reasons, such as:

  • Transient filesystem errors
  • Attempting to contact a remote service
  • Testing that a task runs at a specified time

Proposal

A simple #[attempts(n)] attribute, where n > 0 is the number of attempts allowed (default: 1). When the test fails, it is re-run until it passes or all n attempts have been used up.

#[test]
#[attempts(3)]
fn test_that_might_fail() {
    do_thing_that_could_potentially_error().unwrap();
}

Prior art

Most heavier-weight testing frameworks have something like this.

Assuming the proposal is to retry the test immediately, couldn't this just as easily happen inside the test itself? For convenience, one could even get essentially the proposed syntax with an attribute macro, as far as I can tell. Note that I'm not deeply familiar with Rust's testing framework, though; please give some more details on what the benefits of having this compiler-provided would be, in case there are technical details I'm not aware of.
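For illustration, here is roughly what retrying "inside the test" could look like today with only the standard library. This is a minimal sketch, assuming the test signals failure by panicking (as unwrap does); do_thing_that_could_potentially_error is a hypothetical stand-in that fails twice and then succeeds, and in a real suite test_that_might_fail would carry #[test].

```rust
use std::panic;
use std::sync::atomic::{AtomicU32, Ordering};

// Counts calls so the hypothetical stand-in below can fail "transiently".
static CALLS: AtomicU32 = AtomicU32::new(0);

// Hypothetical stand-in for the proposal's flaky operation:
// the first two calls fail, every later call succeeds.
fn do_thing_that_could_potentially_error() -> Result<u32, String> {
    let call = CALLS.fetch_add(1, Ordering::SeqCst) + 1;
    if call < 3 {
        Err(format!("transient failure on call {call}"))
    } else {
        Ok(call)
    }
}

// Would be annotated with #[test] in a real test suite.
fn test_that_might_fail() {
    const ATTEMPTS: u32 = 3;
    for attempt in 1..=ATTEMPTS {
        // A panic inside the closure (e.g. from `unwrap`) becomes an `Err` here.
        let outcome = panic::catch_unwind(|| {
            do_thing_that_could_potentially_error().unwrap();
        });
        match outcome {
            Ok(()) => return,
            // The last attempt failed too: re-raise its panic so the test fails as usual.
            Err(payload) if attempt == ATTEMPTS => panic::resume_unwind(payload),
            // Otherwise, swallow this failure and try again.
            Err(_) => {}
        }
    }
}
```

Re-raising with resume_unwind on the final attempt keeps the usual failure behavior. Note that the default panic hook still prints a message for every failed attempt, so the output is noisier than a harness-level retry would need to be.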

If it's only about convenience, then having this as a language/compiler feature would probably need further supporting arguments, for example: that it's commonly needed, and that the proposed API is likely to be useful and stable in the long term; or things like that. (I don't know of any hard or clear criteria that such a feature needs to fulfill off the top of my head, but naturally, better arguments make it more likely to be accepted.)

Given that the examples you gave (filesystem access, a remote service, or measuring timing) already require a good amount of infrastructure to run in the first place (i.e. not your average minimal-dependency tiny unit test), needing a helper crate with a macro as described above arguably doesn't seem all that bad to me. And maybe you even want retry logic that is more complex and/or application-specific, which such a crate could provide (IDK, maybe you'd need to wait for some time before retrying, or distinguish errors that should be retried from ones that shouldn't).
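A rough sketch of that kind of application-specific retry logic, assuming the operation returns a Result rather than panicking. The names retry_with and is_transient are hypothetical, and which io::ErrorKind values count as "transient" is an illustrative choice, not a recommendation.

```rust
use std::io::{self, ErrorKind};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `op` up to `attempts` times, sleeping `delay`
/// between tries, but only retrying errors that `is_retryable` accepts.
/// Returns the first success, or the last error encountered.
fn retry_with<T, E>(
    attempts: u32,
    delay: Duration,
    is_retryable: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(attempts > 0, "attempts must be at least 1");
    let mut tries = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only if attempts remain and the error looks transient.
            Err(err) if tries < attempts && is_retryable(&err) => {
                tries += 1;
                thread::sleep(delay);
            }
            // Out of attempts, or an error that retrying won't fix.
            Err(err) => return Err(err),
        }
    }
}

// Illustrative policy: only these I/O error kinds are worth another attempt.
fn is_transient(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        ErrorKind::Interrupted | ErrorKind::TimedOut | ErrorKind::WouldBlock
    )
}
```

Passing the retry policy in as a closure is exactly the kind of per-application decision that is easy for a helper crate to expose and harder to fit into a single built-in #[attempts(n)] attribute.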

Note that there is a related feature for flushing out flaky tests: running a test N times, or until it fails, to detect the flaky bits. This is usually better done at the top level rather than on specific tests, however, since the flakiness level to probe for is typically (IME) a post-build decision based on resources or the like.

However, in line with this:

Indeed.

One benefit of the harness understanding retries natively is that statistics on test flakiness and/or reliability become easier to gather. That said, this could also be part of the attribute-macro mechanism.

Prior art here is CTest's ctest_test(REPEAT) mechanism.
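The "run N times until failure" mode mentioned above is essentially the inverse of a retry: keep repeating while the test passes. A minimal sketch with std::panic::catch_unwind, assuming the test fails by panicking; run_until_failure and make_flaky_test are hypothetical names, not part of libtest or CTest.

```rust
use std::panic::{self, RefUnwindSafe};
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical flake detector: run `test` up to `runs` times, stopping at the
/// first failure (panic). Returns the number of the first failing run, or
/// `None` if every run passed.
fn run_until_failure<F>(runs: u32, test: F) -> Option<u32>
where
    F: Fn() + RefUnwindSafe,
{
    for run in 1..=runs {
        // `&F` is callable because `F: Fn()`, and it is `UnwindSafe` because
        // `F: RefUnwindSafe`, which is what `catch_unwind` requires.
        if panic::catch_unwind(&test).is_err() {
            return Some(run);
        }
    }
    None
}

/// Hypothetical flaky "test": passes on its first three runs, then panics.
fn make_flaky_test() -> impl Fn() + RefUnwindSafe {
    let runs_so_far = AtomicU32::new(0);
    move || {
        let previous = runs_so_far.fetch_add(1, Ordering::SeqCst);
        assert!(previous < 3, "failed on run {}", previous + 1);
    }
}
```

Because the run count is just an argument here, a top-level runner could choose it after the build, which fits the point that this is usually a post-build decision.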

Reporting in general is much better if retries are part of the test harness output (including structured JSON output understood by IDEs, CI, etc.).

The state of the art here is JUnit XML output (at least as far as "what is understood everywhere" goes). Does it already have some notion of "repeating tests to discover temporal behavior differences"?

There's also TAP, which is about as broadly understood. Notably, neither protocol emits attempt counts, though.

I recently tried to implement a workaround using FutureExt::catch_unwind, since this is an async test and I was hoping to avoid spawning entire new tasks for it. I did run into a problem, though: catch_unwind requires the future to be UnwindSafe, and a future that holds an UnsafeCell reference across a suspend point isn't. Tokio's tokio::time::sleep (a perfectly sensible function to call within an integration test) relies on one, so tests using it don't compile.

Would it be preferable in such situations to spawn a thread or equivalent instead? Edit: I've since corrected my code so that it doesn't need this anymore.
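For the synchronous case, spawning a thread per attempt is one option: joining a thread returns Err when that thread panicked, so neither catch_unwind nor an UnwindSafe bound is involved (the test closure must be Sync instead). A minimal sketch using std::thread::scope (Rust 1.63+); attempts_on_threads is a hypothetical name, and an async test would additionally need to build its own runtime inside each thread, which this sketch leaves out.

```rust
use std::thread;

/// Hypothetical helper: run each attempt of `test` on its own scoped thread and
/// treat a panic on that thread as a failed attempt. Returns the number of the
/// first passing attempt, or panics once all `attempts` have failed.
fn attempts_on_threads<F>(attempts: u32, test: F) -> u32
where
    F: Fn() + Sync,
{
    for attempt in 1..=attempts {
        // `join` returns `Err` if the attempt's thread panicked. Joining
        // explicitly also keeps `thread::scope` itself from panicking.
        let passed = thread::scope(|s| s.spawn(&test).join().is_ok());
        if passed {
            return attempt;
        }
    }
    panic!("test failed on all {attempts} attempts");
}
```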

I think the core of what this is missing is a reason why it needs to be a built-in attribute, as opposed to just using a function to do the same thing, like:

#[test]
fn test_that_might_fail() {
    attempts(3, ||
        do_thing_that_could_potentially_error().unwrap()
    )
}

(Could even have the function understand Result and thus let the test be attempts(3, do_thing_that_could_potentially_error).)
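A minimal sketch of that Result-aware variant, assuming the operation returns Result<T, E>. The attempts name comes from the post above, but this signature is a hypothetical choice. Since a #[test] function may return Result<(), E> when E: Debug, the helper's return value can serve directly as the test's tail expression.

```rust
/// Hypothetical `attempts` helper that understands `Result`: call `f` until it
/// returns `Ok` or `n` calls have been made, then hand back the last result.
fn attempts<T, E>(n: u32, mut f: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    assert!(n > 0, "n must be at least 1");
    let mut result = f();
    // At most `n - 1` further calls, stopping as soon as one succeeds.
    for _ in 1..n {
        if result.is_ok() {
            break;
        }
        result = f();
    }
    result
}
```

A plain function item such as do_thing_that_could_potentially_error also implements FnMut, so it can be passed without wrapping it in a closure.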

Yeah, I'm taking back my request now. Rust's native testing mechanism seems to be geared towards very simple, fully deterministic use cases.

Edit: I wish I could mark this thread as resolved somehow.

Indeed. See my prior pre-RFC for the ability to do runtime "this test cannot work" skipping.