Summary
This RFC aims to simplify collecting code coverage data for Rust libraries. By involving Cargo in instrumenting Rust libraries and running their unit tests, the sequence of steps needed to get coverage results is reduced. This RFC also proposes adding support for Cargo to selectively choose which crates are instrumented for gathering coverage results.
Motivation
The motivation behind this feature is to give Rust developers a simple way to run and obtain code coverage results for a specific set of crates, to build confidence in code quality and correctness. Currently, to get instrumentation-based code coverage, a Rust developer has to update either the RUSTFLAGS environment variable or Cargo manifest keys. Either approach enables instrumentation of every Rust crate in the dependency graph, not just the top-level crate. Instrumenting all crates, including transitive dependencies, does not help developers ensure test coverage for their own crate.
Guide-level explanation
This section examines the features proposed by this RFC:
CLI option
A new flag for the cargo test command would be added. The new flag, --coverage, would instruct Cargo to pass the Rust flag -C instrument-coverage for the given crate only. This means that only the top-level crate would be instrumented, and code coverage results would be reported for this crate only. As an example, let's take the following crate foo:
/Cargo.toml
  +-- src
      +-- lib.rs
Where crate foo has a dependency on the regex crate:
[dependencies]
regex = "*"
And lib.rs contains:
use regex::*;

pub fn add(left: usize, right: usize) -> usize {
    left + right
}

pub fn match_regex(re: Regex, text: &str) -> bool {
    let Some(caps) = re.captures(text) else {
        return false
    };
    let matches = caps.iter().filter_map(|s| s).collect::<Vec<Match>>();
    matches.len() > 0
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let result = add(2, 2);
        assert_eq!(result, 4);
    }

    #[test]
    fn find_match() {
        let result = match_regex(Regex::new(".*").unwrap(), "Hello World");
        assert_eq!(result, true);
    }

    #[test]
    fn find_no_match() {
        let result = match_regex(Regex::new("a+").unwrap(), "Hello World");
        assert_eq!(result, false);
    }
}
Now running cargo test --coverage would produce the coverage results for the foo crate only and ignore all functions defined outside of this crate.
Reference-level explanation
As mentioned earlier, this RFC proposes adding a new flag to the cargo test command. This flag, --coverage, would be responsible for setting the -C instrument-coverage flag that Cargo passes on to the rustc invocation of the top-level crate. In the previous example, foo would be the top-level crate and regex would be an upstream dependency.
With the --coverage flag, Cargo itself would set the -C instrument-coverage flag only for the crate foo. If the RUSTFLAGS environment variable had already been set to include the -C instrument-coverage flag, then Cargo would still pass that flag to all crates within the dependency graph, including the regex crate and any transitive dependencies.
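The per-crate flag selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function extra_rustc_args and its parameters are hypothetical names and do not reflect Cargo's actual internals, and skipping the flag when RUSTFLAGS already contains it is an assumption of this sketch, not something the RFC specifies.

```rust
/// Hypothetical helper (not part of Cargo): computes the extra rustc
/// arguments for one crate in the dependency graph.
///
/// * `is_top_level`: true for the crate `cargo test` was invoked on.
/// * `coverage`: true when `--coverage` was passed.
/// * `rustflags`: the current contents of RUSTFLAGS (may be empty).
fn extra_rustc_args(is_top_level: bool, coverage: bool, rustflags: &str) -> Vec<String> {
    // RUSTFLAGS keep applying to every crate, exactly as they do today.
    let mut args: Vec<String> = rustflags.split_whitespace().map(String::from).collect();

    // Assumption of this sketch: do not add the flag a second time if the
    // user already supplied it through RUSTFLAGS.
    let already_present = args
        .windows(2)
        .any(|w| w[0] == "-C" && w[1] == "instrument-coverage")
        || args.iter().any(|a| a == "-Cinstrument-coverage");

    // Only the top-level crate is instrumented by `--coverage`.
    if coverage && is_top_level && !already_present {
        args.push("-C".to_string());
        args.push("instrument-coverage".to_string());
    }
    args
}
```

In the foo example, a call for foo with --coverage and an empty RUSTFLAGS would yield the coverage flag, while the same call for regex would yield no extra arguments.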
This should not break any existing workflows and is strictly an opt-in feature.
To use this new feature do the following:
cargo test --coverage
This flag would also be responsible for setting the LLVM_PROFILE_FILE environment variable, which the LLVM profiling runtime reads to decide where to write the .profraw file for each test executable that is run. Once again, if the environment variable is already set, Cargo would leave the value as is so that the user-defined file name is used. If the environment variable is not set, Cargo would set it so that each generated .profraw file gets a unique name.
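A minimal sketch of the LLVM_PROFILE_FILE handling described above is shown here. The helper names and the exact naming pattern are assumptions made for illustration; the RFC does not prescribe a scheme. The pattern relies on the %p (process ID) and %m (binary signature) specifiers that the LLVM profiling runtime expands in LLVM_PROFILE_FILE.

```rust
use std::env;
use std::process::Command;

/// Hypothetical helper: picks the value of LLVM_PROFILE_FILE for a test run.
/// A user-provided value always wins; otherwise a unique pattern is built.
fn profile_file_value(existing: Option<&str>, crate_name: &str) -> String {
    match existing {
        // Already set by the user: leave it untouched.
        Some(user_value) => user_value.to_string(),
        // Assumed naming scheme: %p and %m are expanded by the LLVM runtime,
        // so each test process writes its own .profraw file.
        None => format!("{}-%p-%m.profraw", crate_name),
    }
}

/// Hypothetical helper: applies the variable to the command that launches
/// one test executable.
fn configure_test_command(cmd: &mut Command, crate_name: &str) {
    let existing = env::var("LLVM_PROFILE_FILE").ok();
    cmd.env(
        "LLVM_PROFILE_FILE",
        profile_file_value(existing.as_deref(), crate_name),
    );
}
```

Keeping the decision in a pure function such as profile_file_value makes the "respect the user's value" rule easy to test without touching the real process environment.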
These updates to Cargo would be sufficient to give a Rust developer control over which crates are instrumented and which crates code coverage results are generated for. The Rust developer would also no longer have to set environment variables manually to ensure crates are instrumented for gathering coverage data.
Drawbacks
A drawback of this feature is that Cargo would need to set the LLVM_PROFILE_FILE environment variable in order to ensure unique profile data is generated for each test executable. I am not aware of any other Cargo features that set environment variables today, so this would be a new potential problem introduced for Cargo.
Rationale and alternatives
Rationale
This design provides a simple mechanism for collecting code coverage results for a given crate. Allowing Cargo to be part of the coverage process would reduce the need for setting environment variables manually. Simply running cargo test --coverage would automatically run a build with the -C instrument-coverage Rust flag set and set the LLVM_PROFILE_FILE environment variable to ensure each test run produces a unique profraw file.
This design does not break any existing usage of Rust or Cargo. This new feature would be strictly opt-in. A Rust developer would still be able to manually set the -C instrument-coverage Rust flag and instrument all binaries within the dependency graph. Since this is a Cargo-specific feature, the Rust compiler will not need any updates.
Alternatives
Alternative 1: leave the existing feature
Supporting this alternative would mean that no changes would be necessary to either Cargo or Rust. Getting instrumentation-based code coverage is still supported and would continue to work as it does today.
The drawback for this option is that it would require setting the flag for all crates in the dependency graph, including upstream dependencies. This would also instrument all binaries and report on coverage for functions and branches that are not defined by the current crate with the potential of skewing coverage results.
Alternative 2: use a new manifest key instead of a CLI option
Supporting this alternative would mean that changes would need to be made to the existing manifest format to possibly include a new section and/or key. A new coverage key could be added to the target section, coverage = true. This still has the added benefit of not requiring any changes to the Rust compiler itself, and the feature could be scoped to Cargo only.
The drawback of this option is that it could add clutter to the Cargo.toml files. Adding this new section to every crate that a Rust developer wants to have instrumented would only add to all the data that is already contained within a TOML file.
Alternative 3: use a RUSTC_WRAPPER program to selectively choose which crates to profile
Supporting this alternative would mean that there wouldn't need to be any changes to Cargo at all. This would require creating a RUSTC_WRAPPER program specifically for selecting which crates to profile. This means more boilerplate code for each Rust developer who simply wants to profile their own crate. I believe the feature this RFC proposes would be both a cleaner solution long term and more in line with the Cargo workflow of potentially reading these kinds of behaviors from the Cargo.toml manifest file.
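To give a sense of the boilerplate this alternative involves, here is a rough sketch of what such a wrapper could look like. It relies on the documented RUSTC_WRAPPER behavior that Cargo runs the wrapper with the path to the real rustc as the first argument, followed by rustc's own arguments. The COVERAGE_CRATE environment variable and the function names are hypothetical, introduced only for this illustration.

```rust
use std::env;
use std::process::{Command, ExitCode};

/// Returns the arguments to forward to rustc, appending the coverage flag
/// only when the invocation compiles `target_crate`.
fn forwarded_args(rustc_args: &[String], target_crate: &str) -> Vec<String> {
    let mut out = rustc_args.to_vec();
    // Cargo passes the crate name as `--crate-name <name>`.
    let compiles_target = rustc_args
        .windows(2)
        .any(|w| w[0] == "--crate-name" && w[1] == target_crate);
    if compiles_target {
        out.push("-C".to_string());
        out.push("instrument-coverage".to_string());
    }
    out
}

/// Body of the wrapper; a real wrapper binary would call this from `main`.
fn run_wrapper() -> ExitCode {
    // argv[0] is the wrapper itself, argv[1] is the real rustc.
    let mut args = env::args().skip(1);
    let Some(rustc) = args.next() else {
        return ExitCode::FAILURE;
    };
    let rest: Vec<String> = args.collect();
    // COVERAGE_CRATE is a hypothetical variable naming the crate to instrument.
    let target = env::var("COVERAGE_CRATE").unwrap_or_default();
    match Command::new(rustc).args(forwarded_args(&rest, &target)).status() {
        Ok(status) if status.success() => ExitCode::SUCCESS,
        _ => ExitCode::FAILURE,
    }
}
```

Every developer who wants per-crate coverage would need to build and configure a program like this, which is the overhead the proposed --coverage flag is meant to avoid.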
Prior art
VSInstr
Visual Studio ships with a tool, vsinstr.exe, which supports instrumenting binaries after they have already been built. Since LLVM's instrumentation-based code coverage hooks into each object file it generates, this scenario is a bit different from the feature this RFC proposes. vsinstr does allow for excluding namespaces of functions to skip over, so that not everything within a binary gets instrumented.
gcov based coverage
Rust also has support for another type of code coverage, a GCC-compatible, gcov-based coverage implementation. This can be enabled through the -Z profile flag. This uses debuginfo to derive code coverage. This differs slightly from source-based code coverage, which allows for more precise instrumentation.
Unresolved questions
- Are there any drawbacks from having Cargo set the LLVM_PROFILE_FILE environment variable that LLVM uses to name each of the generated profraw files? This is used to ensure each test run generates unique profiling data as opposed to overwriting the previous run.
Future possibilities
Specifying multiple crates to instrument for Code Coverage
This would allow Rust developers to specify in advance each of the crates they want to instrument, and Cargo would pass on the -C instrument-coverage flag for only the crates specified. This would allow a more targeted approach to getting code coverage results, without developers having to instrument the entire dependency graph. This could either take the form of a manifest key in the TOML file, which would take a comma-separated list of crates to include in the code coverage analysis, or be done by specifying each crate at the command line using --coverage:crate_name.
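As a rough illustration of this possibility, the two input forms mentioned above could be parsed along these lines. All function names are hypothetical, and the exact syntax is not settled by this RFC; the sketch only assumes a comma-separated list for the manifest key and the --coverage:crate_name form for the command line.

```rust
use std::collections::BTreeSet;

/// Parses a comma-separated crate list such as "foo, bar" (the assumed
/// form of the manifest key value). Empty entries are ignored.
fn parse_crate_list(raw: &str) -> BTreeSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(String::from)
        .collect()
}

/// Extracts the crate name from a `--coverage:crate_name` argument.
/// Returns None for any other argument, including a bare `--coverage`.
fn parse_cli_arg(arg: &str) -> Option<&str> {
    arg.strip_prefix("--coverage:")
        .filter(|name| !name.is_empty())
}

/// Decides whether a crate should receive `-C instrument-coverage`.
fn should_instrument(selected: &BTreeSet<String>, crate_name: &str) -> bool {
    selected.contains(crate_name)
}
```

With a selection of foo and bar, Cargo would pass the coverage flag when compiling those two crates and leave regex and other dependencies uninstrumented.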