Pre-RFC: retryable tests

dead-claudia · October 30, 2022, 11:43am

Motivation

In an ideal world, all tests would be deterministic. However, in reality, they're often flaky due to a number of reasons, like:

Transient filesystem errors
Attempting to contact a remote service
Testing a task is run at a specified time

Proposal

A simple #[attempts(n)] attribute, where n > 0 represents the number of attempts allowed (default: 1). The test is repeated on each failure.

#[test]
#[attempts(3)]
fn test_that_might_fail() {
    do_thing_that_could_potentially_error().unwrap();
}

Prior art

Most heavier testing frameworks have something like this.

steffahn · October 30, 2022, 2:40pm

Assuming this proposes the test to be re-tried immediately, I suppose this is something that could easily happen inside of the test, too? For convenience, with an attribute macro one could even get essentially the proposed syntax, as far as I can tell. Note that I'm not deeply familiar with Rustʼs testing framework though, please give some more details of what the benefits of having this compiler-provided would be, in case there are technical details I'm not aware of.

If it's only about convenience, then having this as a language/compiler feature would probably need further supporting arguments, for example: it's commonly needed and the proposed API is likely to be useful and stable in the long term; or things like that. (I don't know hard or clear criteria that such a feature needs to fulfill off the top of my head, but naturally, better arguments mean it's more likely to be accepted.)

Given that the examples you gave (filesystem access, remote service, or measuring timing) already require a good amount of infrastructure to be run in the first place (i.e. not your average minimal-dependency tiny unit test), arguably the need for some helper crate with a macro as described above doesn't seem all that bad to me? And maybe you even want something more complex and/or application-specific in terms of retry logic, which such a crate could provide (IDK, maybe you'd even need to wait for some time before retrying, or distinguish between kinds of errors that should and shouldn't trigger a retry).
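To make the "more complex retry logic" idea concrete, here is a minimal sketch of what such a helper could look like using only the standard library. The names retry_with_delay and Decision are hypothetical, invented for illustration; they do not come from any existing crate or from this thread.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical classification of an error: retry, or fail right away.
pub enum Decision {
    Retry,
    GiveUp,
}

/// Hypothetical helper: calls `op` up to `max_attempts` times, sleeping
/// `delay` between attempts. `classify` decides whether a given error is
/// worth retrying. Returns the first success, or the error that ended
/// the loop.
pub fn retry_with_delay<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    mut classify: impl FnMut(&E) -> Decision,
) -> Result<T, E> {
    assert!(max_attempts > 0, "at least one attempt is required");
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // Stop when attempts are exhausted or the error is not transient.
                if attempt >= max_attempts || matches!(classify(&err), Decision::GiveUp) {
                    return Err(err);
                }
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A test could then retry on, say, std::io::ErrorKind::TimedOut while failing immediately on std::io::ErrorKind::NotFound, and call unwrap() on the final result.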

mathstuf · October 30, 2022, 2:57pm

Note that there is a similar feature for flushing out flaky tests, where one wants to run a test N times until failure to detect these flaky bits. This is usually better done at the top level rather than on specific tests, however, as the flakiness level is typically (IME) a post-build decision based on resources or the like.
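As a rough illustration of the "run N times until failure" idea, the sketch below repeats a test body and reports the first run that panics. The function name first_failure is hypothetical, and this is a per-test helper rather than the top-level harness option described above.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: runs `test` up to `max_runs` times and returns the
/// 1-based index of the first run that panicked, or `None` if every run
/// passed.
pub fn first_failure<F: FnMut()>(max_runs: usize, mut test: F) -> Option<usize> {
    for run in 1..=max_runs {
        // AssertUnwindSafe is an assumption that the test body leaves no
        // broken state behind after a panic.
        if panic::catch_unwind(AssertUnwindSafe(|| test())).is_err() {
            return Some(run);
        }
    }
    None
}
```

A flakiness check would then assert that first_failure(n, body) is None for some n chosen according to available resources.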

However, in line with this:

Indeed.

One benefit of it being innately understood is that stats on test flakiness and/or reliability can be more easily gathered. That said, it can also be part of the attribute macro mechanism as well.

Prior art here is CTest's ctest_test(REPEAT) mechanism.
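For the point about gathering flakiness stats, a retry helper can simply return counts instead of hiding them. This is only a sketch under assumed names (AttemptStats, attempts_with_stats); it shows what a library-level approach could record, not what the built-in harness does.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical record of how a retried test behaved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AttemptStats {
    pub attempts: usize,
    pub failures: usize,
}

impl AttemptStats {
    /// True if at least one attempt succeeded.
    pub fn passed(&self) -> bool {
        self.failures < self.attempts
    }

    /// True if the test passed only after one or more failed attempts.
    pub fn was_flaky(&self) -> bool {
        self.failures > 0 && self.passed()
    }
}

/// Hypothetical helper: runs `test` until it passes or `max_attempts` is
/// reached, and returns the collected counts.
pub fn attempts_with_stats<F: FnMut()>(max_attempts: usize, mut test: F) -> AttemptStats {
    let mut stats = AttemptStats { attempts: 0, failures: 0 };
    while stats.attempts < max_attempts {
        stats.attempts += 1;
        if panic::catch_unwind(AssertUnwindSafe(|| test())).is_ok() {
            return stats;
        }
        stats.failures += 1;
    }
    stats
}
```

The caller decides what to do with the result, for example logging it and then asserting stats.passed().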

jdahlstrom · October 30, 2022, 7:03pm

Reporting in general is much improved if it's a part of the test harness output (including structured (json) output understood by IDEs/CI/etc).

mathstuf · October 31, 2022, 3:15pm

State-of-the-art here is JUnit output (at least as far as "what is understood everywhere"). Does it have some notion of "repeating tests to discover temporal behavior differences" already?

dead-claudia · November 2, 2022, 9:45pm

There's also TAP, which is about as broadly understood. Notably, neither protocol emits attempt counts, though.

dead-claudia · November 2, 2022, 9:58pm

I did recently attempt to implement a workaround using FutureExt::catch_unwind, as it's in an async test and I was hoping I could avoid spawning entire new tasks for it. I did run into a problem, though: UnsafeCell references can't be used across suspend points, and Tokio's tokio::time::sleep (a perfectly sensible function to call within an integration test) relies on one, so tests using it don't compile.

~~Would it be preferred in such situations to spawn a thread or equivalent instead?~~ Edit: corrected my code to not require it anymore.

scottmcm · November 2, 2022, 11:04pm

I think the core of what this is missing is a reason why it needs to be a built-in attribute, as opposed to just using a function to do the same thing, like

#[test]
fn test_that_might_fail() {
    attempts(3, ||
        do_thing_that_could_potentially_error().unwrap()
    )
}

(Could even have the function understand Result and thus let the test be attempts(3, do_thing_that_could_potentially_error).)
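For reference, a function like that can be written with just the standard library. The following is one possible sketch, not an existing API: attempts catches panics with std::panic::catch_unwind, and attempts_ok is a hypothetical name for the Result-aware variant.

```rust
use std::fmt::Debug;
use std::panic::{self, AssertUnwindSafe};

/// Runs `test` up to `n` times and returns at the first run that does not
/// panic. If every attempt panics, the last panic is re-raised so the
/// surrounding #[test] still fails.
pub fn attempts<F: FnMut()>(n: usize, mut test: F) {
    assert!(n > 0, "at least one attempt is required");
    let mut last_panic = None;
    for _ in 0..n {
        // AssertUnwindSafe is an assumption that the test body tolerates
        // being re-run after a panic.
        match panic::catch_unwind(AssertUnwindSafe(|| test())) {
            Ok(()) => return,
            Err(payload) => last_panic = Some(payload),
        }
    }
    panic::resume_unwind(last_panic.expect("n > 0, so at least one attempt ran"));
}

/// Result-aware variant (hypothetical name): retries while `op` returns
/// `Err`, and panics with the last error once attempts are exhausted.
pub fn attempts_ok<T, E, F>(n: usize, mut op: F) -> T
where
    E: Debug,
    F: FnMut() -> Result<T, E>,
{
    assert!(n > 0, "at least one attempt is required");
    let mut last_err = None;
    for _ in 0..n {
        match op() {
            Ok(value) => return value,
            Err(err) => last_err = Some(err),
        }
    }
    panic!(
        "all {} attempts failed; last error: {:?}",
        n,
        last_err.expect("n > 0, so at least one attempt ran")
    );
}
```

Each caught panic still goes through the default panic hook, so failed attempts remain visible in the test output even when a later attempt passes.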

dead-claudia · November 2, 2022, 11:41pm

Yeah, I'm taking back my request now. Rust's native testing mechanism seems to be geared towards very simple, fully deterministic use cases.

Edit: I wish I could mark this thread as resolved somehow.

mathstuf · November 3, 2022, 12:33pm

Indeed. See my prior pre-RFC for the ability to do runtime "this test cannot work" skipping.

system · February 1, 2023, 12:33pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
