Introduce env var CARGO_RUN_ID for identifying related processes/threads

Hello Cargo team,

I'd like to propose a new feature for Cargo: a CARGO_RUN_ID environment variable. It would make it possible to identify which processes or threads were spawned by the same Cargo command.

Problem:

In projects using libraries like SQLx for database testing, cleanup operations must be coordinated across multiple test processes. Currently there is no built-in way to determine whether separate test processes originated from the same Cargo invocation, which leads to race conditions when cleaning up test databases (see the sqlx issue).

The problem also extends beyond testing: any scenario where multiple processes need to coordinate based on having been started by the same Cargo command would benefit from this feature.

Proposal:

Add an environment variable CARGO_RUN_ID that is available to the crate being compiled, similar to other Cargo-set environment variables (Environment Variables - The Cargo Book).

Properties of the generated ID:

  • Unique across Cargo invocations
  • Chronologically ordered

Given these requirements, a UUID v7 seems like a suitable choice for generating this ID.
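To illustrate what the two properties mean in practice, here is a minimal, std-only sketch of a UUIDv7-style generator, using the bit layout from RFC 9562. The function names are hypothetical, and the randomness comes from std's randomly keyed `RandomState` hasher, which is not a cryptographic RNG; a real implementation would more likely use a dedicated crate such as `uuid` with its `v7` feature.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;
use std::time::{SystemTime, UNIX_EPOCH};

/// Pack a millisecond Unix timestamp and random bits into the UUIDv7 layout:
/// 48-bit timestamp | 4-bit version | 12 random bits | 2-bit variant | 62 random bits.
fn uuid_v7_from_parts(unix_ms: u64, rand: u128) -> u128 {
    let ts = (u128::from(unix_ms) & 0xFFFF_FFFF_FFFF) << 80; // bits 127..80
    let version = 0x7u128 << 76; // bits 79..76 = 0111
    let rand_a = ((rand >> 62) & 0xFFF) << 64; // bits 75..64
    let variant = 0b10u128 << 62; // bits 63..62 = 10
    let rand_b = rand & ((1u128 << 62) - 1); // bits 61..0
    ts | version | rand_a | variant | rand_b
}

/// Render a 128-bit value in the canonical 8-4-4-4-12 lowercase hex form.
fn format_uuid(u: u128) -> String {
    let h = format!("{:032x}", u);
    format!("{}-{}-{}-{}-{}", &h[0..8], &h[8..12], &h[12..16], &h[16..20], &h[20..32])
}

/// Hypothetical generator for a run ID. NOTE: `RandomState` is used only to
/// keep the sketch dependency-free; it is not a cryptographic RNG.
fn new_run_id() -> String {
    let unix_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis() as u64;
    let hi = RandomState::new().hash_one(unix_ms);
    let lo = RandomState::new().hash_one(hi);
    let rand = (u128::from(hi) << 64) | u128::from(lo);
    format_uuid(uuid_v7_from_parts(unix_ms, rand))
}
```

Because the timestamp occupies the most significant bits, IDs created in different milliseconds sort in creation order both as integers and as canonical hex strings; within the same millisecond the order is random, while the random bits keep the IDs distinct.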

Benefits:

  • Improved coordination in multi-process scenarios (e.g., SQLx database cleanup)
  • Ability to group logs or artifacts from related processes
  • Enables more effective strategies for cleanup and resource management in complex builds and operations
  • Provides a uniform method for tools and libraries to identify related processes, benefiting the entire Rust ecosystem

Nextest has implemented a similar feature with NEXTEST_RUN_ID, demonstrating the value of such an identifier.

I'm willing to implement this feature myself if the proposal is accepted. Would this be something the Cargo team would consider? I'm open to feedback or suggestions to refine this proposal.

Thank you, Andreas Liljeqvist

Could you show an example of how it would be used? As far as I know, there is only one Cargo invocation when you run cargo test.

A custom Cargo test runner can spawn several processes that run in parallel (nextest does).
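A minimal sketch of how such a runner might hand one shared ID to every process it spawns, assuming the proposed CARGO_RUN_ID name. `commands_for_run` and `configured_run_id` are hypothetical helpers for illustration, not part of nextest or Cargo; the runner would generate `run_id` once per invocation.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical: build one `Command` per test binary, all carrying the same run ID.
fn commands_for_run(binaries: &[&str], run_id: &str) -> Vec<Command> {
    binaries
        .iter()
        .map(|bin| {
            let mut cmd = Command::new(bin);
            // Every child gets the same identifier, so code running inside
            // the test processes can tell they belong to the same invocation.
            cmd.env("CARGO_RUN_ID", run_id);
            cmd
        })
        .collect()
}

/// Inspect which run ID a `Command` was configured with (without spawning it).
fn configured_run_id(cmd: &Command) -> Option<&OsStr> {
    cmd.get_envs()
        .find_map(|(key, value)| if key == OsStr::new("CARGO_RUN_ID") { value } else { None })
}
```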

If our tests use #[sqlx::test], any leftover databases are cleaned up the next time the cargo nextest command runs. With a multi-process runner this creates a race condition, since sqlx wasn't designed to handle one.

Sometimes a database still in use by the current run gets dropped, and the test aborts.

CARGO_RUN_ID would be a way to group processes that originate from the same Cargo command. It makes coordination much simpler than handling multi-process locking and the like.

See the PR for sqlx.
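A minimal sketch of how that grouping could drive cleanup. It assumes a hypothetical record in which each test database stores the run ID that created it; this is not SQLx's actual naming scheme or the approach in the PR. Because the proposed IDs are chronologically ordered, the sketch drops only databases from older runs, so neither sibling processes of the current run nor a concurrent newer run lose their databases.

```rust
/// Hypothetical record of a test database and the run that created it.
#[derive(Debug)]
struct TestDb {
    name: String,
    run_id: String,
}

/// Select leftover databases that are safe to drop: those created by an
/// *older* run. Databases tagged with the current run ID (parallel sibling
/// processes) or a newer one (a concurrent later invocation) are kept.
/// Assumes run IDs are canonical lowercase UUIDv7 strings, whose lexicographic
/// order follows creation time at millisecond granularity.
fn stale_databases<'a>(dbs: &'a [TestDb], current_run_id: &str) -> Vec<&'a TestDb> {
    dbs.iter()
        .filter(|db| db.run_id.as_str() < current_run_id)
        .collect()
}
```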

Looking into NEXTEST_RUN_ID, it was added because insta writes pending snapshots to disk and only wants to keep snapshots from the "latest" run. See "Detecting invocations of nextest" (Issue #416, nextest-rs/nextest on GitHub).

This only seems relevant once we start running binaries in parallel. It's not too clear why it would be used now.

In the context of nextest, CARGO_RUN_ID would replace NEXTEST_RUN_ID.

For libraries like SQLx we want to do something similar: only keep the latest test database run.

So SQLx would read CARGO_RUN_ID instead of an open-ended set of {NEXTEST_RUN_ID, OTHER_TEST_RUNNER_ID, ETC...}.

And the cargo nextest command already runs the binaries in parallel.
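Without a standard variable, a library has to probe each runner's variable itself. A minimal sketch of that probing, with the proposed CARGO_RUN_ID checked first and nextest's NEXTEST_RUN_ID as a fallback; the function names are hypothetical and the fallback list is only illustrative. With a standard variable, the list would shrink to a single entry.

```rust
/// Variables to consult, in priority order. CARGO_RUN_ID is the proposed
/// (not yet existing) variable; NEXTEST_RUN_ID is set by nextest.
const RUN_ID_VARS: &[&str] = &["CARGO_RUN_ID", "NEXTEST_RUN_ID"];

/// Return the first run ID found. `lookup` abstracts the environment so the
/// logic can be checked without mutating the real process environment.
fn resolve_run_id<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    RUN_ID_VARS.iter().find_map(|&name| lookup(name))
}

/// Read from the real environment when the test process starts.
fn run_id_from_env() -> Option<String> {
    resolve_run_id(|name| std::env::var(name).ok())
}
```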

So you're asking for Cargo to support this now, so that it becomes part of the expected interface that test runners (e.g. nextest) use when spawning test harnesses (e.g. test binaries)?

Yes, though I think that there might be utility that is broader than just tests. :+1:

I think that there might be utility that is broader than just tests.

It would be good to have concrete use cases / users, rather than us developing for theoretical users.

Also note that Cargo only passes down a very limited set of environment variables to 3rd-party plugins. Setting an environment variable when building a crate may or may not help if a plugin invokes Cargo multiple times.

Thanks for the feedback, I hope to address some of the concerns.

  1. Library-focused solution: CARGO_RUN_ID primarily benefits libraries like SQLx and Insta that need to coordinate across test processes. It solves real issues for both, as described above.

  2. Test runner agnostic: CARGO_RUN_ID would allow these libraries to work consistently across different test runners. Instead of handling NEXTEST_RUN_ID, OTHER_TEST_RUNNER_ID, etc., they'd use one standard identifier.

  3. Future-proofing: When Cargo implements parallel test binary execution, libraries using CARGO_RUN_ID will be ready without changes.

  4. Not limited to testing: I chose CARGO_RUN_ID over CARGO_TEST_RUN_ID because its utility extends beyond testing. It can coordinate any processes from a single Cargo invocation, allowing for broader future applications.

  5. Exposing CARGO_RUN_ID to plugins would be nice, but is not the main focus currently.

Hope this clarifies the idea.

As I said, saying there might be use cases is insufficient on its own. If we want to design for this and call it out, we need to identify real users who could benefit.

OK, so you would be happier calling it CARGO_TEST_RUN_ID? That is a concrete use case.

I think I understand what you're getting at now; sorry for the misunderstanding. You mean I shouldn't mention that point since it is speculative? My intention in mentioning it was just to note an extra plus, not a core part of the proposal.

These variables are build-time variables. You probably do not want that, as it would trigger a complete recompile on every Cargo invocation.

That sounds less than optimal :smiley: Well, then we would have to find another place for it.
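To make the build-time vs. run-time distinction concrete, here is a minimal sketch. `option_env!` bakes the value into the binary when the crate is compiled, while `std::env::var` reads it when the process starts. Per the concern above, a value that changes on every invocation is only workable with the second form; the constant and function names here are hypothetical.

```rust
/// Compile-time lookup: resolved when this crate is built and baked into the
/// binary. A value that changes on every Cargo invocation would tie the
/// compiled artifact to that invocation.
const BUILD_TIME_RUN_ID: Option<&str> = option_env!("CARGO_RUN_ID");

/// Run-time lookup: resolved when the process starts, so the same compiled
/// binary can be reused across invocations.
fn run_time_run_id() -> Option<String> {
    std::env::var("CARGO_RUN_ID").ok()
}
```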