Introduce env var CARGO_RUN_ID for identifying related processes/threads

bonega · July 6, 2024, 7:27pm

Hello Cargo team,

I'd like to propose a new feature for Cargo: the introduction of a CARGO_RUN_ID environment variable. This would solve an issue in identifying which processes or threads are spawned from the same Cargo command.

Problem:

In projects using libraries like SQLx for database testing, it's crucial to coordinate cleanup operations across multiple test processes. Currently, there's no built-in way to determine whether separate test processes originated from the same Cargo invocation, which leads to race conditions when cleaning up the test database (see the linked sqlx issue).

Additionally, this issue extends beyond testing. Any scenario where multiple processes need to coordinate based on their origin from a single Cargo command would benefit from this feature.

Proposal:

Add an environment variable CARGO_RUN_ID that is available for the crate being compiled, similar to other Cargo-set environment variables (Environment Variables - The Cargo Book).

Properties of the generated ID:

- Unique across Cargo invocations
- Chronologically ordered

Given these requirements, a UUID v7 seems like a suitable choice for generating this ID.
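
To make the two properties concrete, here is a minimal std-only sketch of how a UUID v7 style ID could be built: a 48-bit millisecond timestamp in the leading bits, followed by version, variant, and random bits. The function names (`uuid_v7_from_parts`, `weak_random_u64`, `new_run_id`) are hypothetical, and the randomness source is a stand-in chosen only to avoid external crates; a real implementation would more likely use a proper RNG or an existing UUID crate.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Formats the UUID v7 fields as a canonical lowercase UUID string.
/// Layout: 48-bit Unix timestamp (ms), 4-bit version (7), 12 random bits,
/// 2-bit variant (binary 10), 62 random bits.
fn uuid_v7_from_parts(unix_ms: u64, rand_a: u16, rand_b: u64) -> String {
    let ts = unix_ms & 0xFFFF_FFFF_FFFF; // keep the low 48 bits
    let ver_rand_a = 0x7000 | (rand_a & 0x0FFF); // version nibble + 12 random bits
    let var_rand_b = 0x8000_0000_0000_0000 | (rand_b & 0x3FFF_FFFF_FFFF_FFFF);
    format!(
        "{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        ts >> 16,
        ts & 0xFFFF,
        ver_rand_a,
        var_rand_b >> 48,
        var_rand_b & 0xFFFF_FFFF_FFFF
    )
}

/// Assumption: a weak, std-only source of random-looking bits.
/// `RandomState` is randomly seeded, so hashing nothing yields a varying value.
/// This is NOT suitable where strong randomness is required.
fn weak_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Builds a new run ID from the current wall-clock time.
fn new_run_id() -> String {
    let unix_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis() as u64;
    uuid_v7_from_parts(unix_ms, weak_random_u64() as u16, weak_random_u64())
}
```

Because the timestamp occupies the most significant bits, comparing two such strings lexicographically orders them by creation time at millisecond granularity. IDs created within the same millisecond are still distinct (with high probability) but their relative order is not meaningful in this sketch.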

Benefits:

- Improved coordination in multi-process scenarios (e.g., SQLx database cleanup)
- Ability to group logs or artifacts from related processes
- More effective strategies for cleanup and resource management in complex builds and operations
- A uniform method for tools and libraries to identify related processes, benefiting the entire Rust ecosystem

Nextest has implemented a similar feature with NEXTEST_RUN_ID, demonstrating the value of such an identifier.

I'm willing to implement this feature myself if the proposal is accepted. Would this be something the Cargo team would consider? I'm open to feedback or suggestions to refine this proposal.

Thank you, Andreas Liljeqvist

weihanglo · July 7, 2024, 2:52pm

Could you show an example of how it would be used? As far as I know, there is only one cargo invocation when you run cargo test.

bonega · July 7, 2024, 3:48pm

A custom Cargo test runner could spawn several processes that run in parallel (nextest does).

If our tests use #[sqlx::test], any leftover databases should be cleaned up the next time the cargo nextest command runs. This is a race condition, since sqlx wasn't designed to handle a multi-process runner.

github.com/launchbadge/sqlx

Comment by abonander - fix: sqlx::macro db cleanup race condition by adding a margin to current timestamp

launchbadge:main ← fhsgoncalves:fix/test-macro-monotonic-time

The real issue is with the multiprocess model of `cargo-nextest`, which this was… not designed for, which has been a known issue: #2123

The existing code assumes all testing is happening in the same process, which is why it uses a static `AtomicBool` to decide whether to clean up or not. You'd need some sort of multi-process lock to fix this properly.

I had a brief look around for crates implementing something like that but didn't find anything I particularly liked:

* There's `proc_lock` which uses the `fs2` crate, which hasn't been updated in 6 years.
* There's `named_lock` which uses `flock()` on UNIX, which has some pretty weird semantics: https://users.rust-lang.org/t/cross-platform-library-for-file-locking/68698/4

The POSIX specification _has_ named semaphores but I can't find any crate wrapping it that's even worth considering. `named_semaphore` hasn't been updated in 2 years and its API is completely undocumented. `sema` also hasn't been updated in 6 years, and specifically doesn't do named semaphores.

Sometimes the current database gets dropped and the test aborts.

CARGO_RUN_ID would be a way to group processes that originate from the same cargo command. It makes the coordination a lot simpler compared to handling multi-process locking etc.

See PR for sqlx
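
As a rough illustration of the grouping idea (not the code from the sqlx PR), the sketch below assumes each test database is tagged with the run ID of the process that created it, and that run IDs sort chronologically as strings, as a UUID v7 would. The `TestDb` type and `stale_databases` function are hypothetical names introduced for this example.

```rust
/// A test database tagged with the run ID of the process that created it.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct TestDb {
    name: String,
    run_id: String,
}

/// Returns the names of databases that belong to an earlier run.
/// Databases from the current run (created by sibling processes) and from
/// any newer run are left alone, so siblings never drop each other's data.
fn stale_databases<'a>(current_run_id: &str, existing: &'a [TestDb]) -> Vec<&'a str> {
    existing
        .iter()
        // String comparison is chronological only if IDs are time-ordered.
        .filter(|db| db.run_id.as_str() < current_run_id)
        .map(|db| db.name.as_str())
        .collect()
}
```

With this rule, every process in a run makes the same decision without any cross-process lock. One limitation of the sketch: if an older Cargo invocation is still running concurrently, its databases would also be treated as stale.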

epage · July 8, 2024, 2:07pm

Looking into NEXTEST_RUN_ID, the reason it was added is that insta writes pending snapshots to disk and only wants snapshots stored for the "latest" run. See Detecting invocations of nextest · Issue #416 · nextest-rs/nextest · GitHub

This only seems relevant once we start running binaries in parallel. It's not too clear why it would be used now.

bonega · July 8, 2024, 2:28pm

In the context of nextest, CARGO_RUN_ID would replace NEXTEST_RUN_ID.

For libraries like SQLx we want to do a similar thing: only keep the test databases from the latest run.

So for SQLx we want to listen to CARGO_RUN_ID instead of an unknown set of {NEXTEST_RUN_ID, OTHER_TEST_RUNNER_ID, ETC...}
The command cargo nextest will run the binaries in parallel.
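
A minimal sketch of what "listening" could look like on the library side, assuming the proposed CARGO_RUN_ID existed and falling back to nextest's NEXTEST_RUN_ID. CARGO_RUN_ID is only a proposal in this thread, and the helper names (`RUN_ID_VARS`, `run_id_from`, `current_run_id`) are hypothetical.

```rust
use std::env;

/// Variable names to try, in priority order.
/// CARGO_RUN_ID is the variable proposed in this thread (it does not exist today);
/// NEXTEST_RUN_ID is the one nextest sets.
const RUN_ID_VARS: [&str; 2] = ["CARGO_RUN_ID", "NEXTEST_RUN_ID"];

/// Returns the first non-empty run ID found via `lookup`.
/// Taking the lookup as a parameter keeps the logic testable without
/// touching the real process environment.
fn run_id_from<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    RUN_ID_VARS
        .iter()
        .find_map(|&name| lookup(name).filter(|value| !value.is_empty()))
}

/// Reads the run ID from the real process environment at runtime.
fn current_run_id() -> Option<String> {
    run_id_from(|name| env::var(name).ok())
}
```

Without a shared variable, the list in `RUN_ID_VARS` has to grow with every test runner a library wants to support, which is the maintenance burden described above.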

epage · July 8, 2024, 2:41pm

So you're more asking for cargo to support this now so that it can be part of the expected interface that test runners (e.g. nextest) use when spawning test harnesses (e.g. test binaries)?

bonega · July 8, 2024, 2:47pm

Yes, though I think that there might be utility that is broader than just tests.

epage · July 8, 2024, 2:49pm

> I think that there might be utility that is broader than just tests.

It would be good to have concrete use cases / users, rather than us developing for theoretical users.

weihanglo · July 8, 2024, 2:55pm

Also note that Cargo only passes down a very limited set of environment variables to 3rd-party plugins. Setting an environment variable when building a crate may or may not help if a plugin invokes Cargo multiple times.

bonega · July 8, 2024, 8:55pm

Thanks for the feedback, I hope to address some of the concerns.

Library-focused solution: CARGO_RUN_ID primarily benefits libraries like SQLx and Insta that need to coordinate across test processes. It solves real issues:
- SQLx: Cleaning up test databases without race conditions
- Insta: Managing snapshot storage for the latest run
Test runner agnostic: CARGO_RUN_ID would allow these libraries to work consistently across different test runners. Instead of handling NEXTEST_RUN_ID, OTHER_TEST_RUNNER_ID, etc., they'd use one standard identifier.
Future-proofing: When Cargo implements parallel test binary execution, libraries using CARGO_RUN_ID will be ready without changes.
Not limited to testing: I chose CARGO_RUN_ID over CARGO_TEST_RUN_ID because its utility extends beyond testing. It can coordinate any processes from a single Cargo invocation, allowing for broader future applications.
Plugins: Exposing CARGO_RUN_ID to plugins would be nice, but it is not the main focus currently.

Hope this clarifies the idea.

epage · July 8, 2024, 9:02pm

As I said, saying there might be use cases is insufficient on its own. If we want to design for this and call it out, we need to identify real users who could benefit.

bonega · July 8, 2024, 9:03pm

OK, so you would be happier with calling it CARGO_TEST_RUN_ID?
That is a concrete use case.

bonega · July 9, 2024, 7:35am

I think I understand what you're getting at now; sorry for the misunderstanding. You don't think I should mention that point since it is speculative? I mentioned it only as an extra plus, not as a core part.

sahnehaeubchen · July 9, 2024, 10:35am

These variables are build-time variables. You probably do not want that, as it would trigger a complete recompile on every cargo invocation.

bonega · July 10, 2024, 7:34am

That sounds less than optimal. Well, then we would have to find another place.
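
To illustrate the distinction, here is a small sketch contrasting a compile-time read with a runtime read of the proposed variable. CARGO_RUN_ID does not exist today, and the function names are hypothetical. One possible "other place" would be the environment of the spawned test process, read at runtime, though nothing in this thread settles that.

```rust
use std::env;

/// Compile-time read: if the variable is set while the crate is compiled,
/// its value is baked into the binary. A value that changes on every
/// invocation would therefore invalidate the build each time.
fn compile_time_run_id() -> Option<&'static str> {
    option_env!("CARGO_RUN_ID")
}

/// Runtime read: the value is looked up in the environment of the running
/// process, so a new value does not require recompiling anything.
fn runtime_run_id() -> Option<String> {
    env::var("CARGO_RUN_ID").ok()
}
```

For the coordination use case only the runtime form is needed: the test binary has to learn which run it belongs to when it starts, not when it was built.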

sunshowers · September 5, 2024, 9:16pm

(nextest author here)

I'd lean towards not adding knowledge of this in Cargo -- NEXTEST_RUN_ID is very useful but its semantics aren't fully fleshed out, and the relationship between it and CARGO_RUN_ID may not be 1:1.

For example:

- Nextest will at some point let you rerun failed tests. We probably want to use separate run IDs but may need a parent ID that groups those reruns together.
- A single nextest invocation might lead to more than one Cargo build in the future. Many devs ask for a single nextest run to encompass something like cargo hack's --feature-powerset.

Not committing to particular semantics for this within Cargo means that these more concrete details can be fleshed out over time.

> Also note that Cargo only passes down a very limited set of environment variables to 3rd-party plugins

Yes, I would imagine that if Cargo did support this, nextest would have to have a second implementation. But that's something nextest already has to do and it's not a huge deal.
