Pre-RFC: Cargo profile setting to sanitise host-dependent absolute paths, enabled by default for release builds

Summary

There should be a new profile setting, named something like sanitize-paths, that sanitises all host-filesystem-dependent paths (e.g. $PWD and /home/username/.cargo) that may be embedded in the compilation output. It would be backed by rustc's --remap-path-prefix and enabled by default for the release profile.

Motivation

  1. Privacy. Most of the time the paths will include the username, and release binaries built on a crate maintainer's machine may be distributed. Additionally, some CI systems (such as GitLab CI) check out the repo under a path that may include things that really aren't meant to be public. Unless paths are sanitised by default, this information may be inadvertently leaked.
  2. Facilitating binary reproducibility. With the example below, I was able to produce binary-equivalent executables on two different x86_64 Linux machines. Some very non-trivial programs, such as ogham/dog, can also be made reproducible easily with the same RUSTFLAGS. Path remapping alone will not guarantee reproducible builds for everything (for instance, cargo built from source is still different on those two machines), but this simple change could probably cover a lot of cases.

The Problem

Currently, binaries (and libraries) built with cargo have many paths that depend on the local filesystem baked in, mostly inside panic messages.

For example:

Cargo.toml:

[package]
name = "rfc"
version = "0.1.0"
edition = "2018"

[dependencies]
rand = "0.8.0"

src/main.rs:

use rand::prelude::*;

fn main() {
    let r: f64 = rand::thread_rng().gen();
    println!("{}", r);
}

Then run

$ cargo build --release
$ strings target/release/rfc | grep $HOME
could not initialize thread_rng: /home/cbeuw/.cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.8.3/src/rngs/thread.rs
/home/cbeuw/.cargo/registry/src/github.com-1ecc6299db9ec823/rand_chacha-0.3.0/src/guts.rsdescription() is deprecated; use Display
/home/cbeuw/.cargo/registry/src/github.com-1ecc6299db9ec823/getrandom-0.2.2/src/util_libc.rs
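
The same check can be scripted, which is handy for a CI step that fails when a release artifact still contains a host path. What follows is a minimal sketch that only approximates `strings FILE | grep NEEDLE`; the function name `leaked_paths` and the 4-byte minimum run length are assumptions made for illustration, not part of the proposal.

```rust
/// Returns every run of printable ASCII in `bytes` that contains `needle`.
/// This only approximates `strings | grep`: runs are split on any byte that
/// is not a graphic ASCII character or a space.
fn leaked_paths(bytes: &[u8], needle: &str) -> Vec<String> {
    // Assumed minimum run length, chosen to mirror the usual `strings` default.
    const MIN_LEN: usize = 4;
    bytes
        .split(|b| !(b.is_ascii_graphic() || *b == b' '))
        .filter(|run| run.len() >= MIN_LEN)
        .map(|run| String::from_utf8_lossy(run).into_owned())
        .filter(|s| s.contains(needle))
        .collect()
}
```

A caller would typically pass the contents of the built binary (for example, read with `std::fs::read`) and the value of `$HOME`, and treat a non-empty result as a leak.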

However, we can manually remap the working directory and the .cargo folder by supplying --remap-path-prefix to rustc. (Note that this won't fully work if the rust-src component is installed until #73167 is fixed; a fix is in progress. This post assumes rust-src isn't installed.)

$ RUSTFLAGS="--remap-path-prefix=$PWD=. --remap-path-prefix=$CARGO_HOME=cargo_home" cargo build --release
$ strings target/release/rfc | grep registry

The path to the cargo registry is then remapped:

could not initialize thread_rng: cargo_home/registry/src/github.com-1ecc6299db9ec823/rand-0.8.3/src/rngs/thread.rs

With a new profile setting called sanitize-paths or similar, this can be done automatically, so users need not manually discover which paths should be remapped. It is also logical to do this by default when the user runs cargo build --release.

Reference-level explanation

With sanitize-paths = true, the following paths will be remapped, in this order:

  1. The working directory under which the cargo command is invoked will be remapped to ".".
  2. Cargo's home directory will be remapped to /cargo on Unix and \\cargo on Windows.
  3. Rust's sysroot will be remapped to /sysroot on Unix and \\sysroot on Windows.
  4. Any other package's root will be remapped to /[package name] on Unix and \\[package name] on Windows.

For instance:

  • /home/cbeuw/rfc/src/main.rs -> ./src/main.rs
  • /home/cbeuw/.cargo/registry -> /cargo/registry
  • /home/cbeuw/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib -> /sysroot/lib
  • /home/cbeuw/path/dependencies/foo/src/lib.rs -> /foo/src/lib.rs
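
The ordered rules above could be turned into rustc flags roughly as follows. This is a sketch under stated assumptions, not how Cargo would necessarily implement it: the function `remap_flags` and its parameters are hypothetical, only the Unix targets are handled, and the caller is assumed to already know each directory. The only real interface used is rustc's `--remap-path-prefix=FROM=TO` flag.

```rust
use std::path::Path;

/// Builds `--remap-path-prefix` flags in the order listed in the proposal.
/// `other_packages` holds hypothetical (package name, package root) pairs for
/// packages that live outside the working directory and Cargo's home.
fn remap_flags(
    cwd: &Path,
    cargo_home: &Path,
    sysroot: &Path,
    other_packages: &[(&str, &Path)],
) -> Vec<String> {
    let mut flags = Vec::new();
    let mut push = |from: &Path, to: String| {
        flags.push(format!("--remap-path-prefix={}={}", from.display(), to));
    };
    push(cwd, ".".to_string()); // 1. working directory
    push(cargo_home, "/cargo".to_string()); // 2. Cargo's home
    push(sysroot, "/sysroot".to_string()); // 3. Rust's sysroot
    for &(name, root) in other_packages {
        push(root, format!("/{}", name)); // 4. any other package root
    }
    flags
}
```

One thing a real implementation would have to settle is what happens when two prefixes overlap (for example, a Cargo home inside the working directory). As far as I know rustc lets the flag given later take precedence in that case, but that should be verified rather than taken from this sketch.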

(TODO: should we ignore the path separator/prefix differences on the last two and make them the same across the board?)

Drawbacks

For source files under Cargo's home or Rust's sysroot, panic messages in release builds will no longer print a valid path. However, source files in the working directory, presumably ones the user is working on, will still have the correct relative path. The overall impact should be small and confusion unlikely.


I'm actually fairly positive towards doing this. For each directory that cargo uses, it already uses or creates an environment variable for the base directory. So "all" that's needed is to map $CARGO_HOME to the literal string $CARGO_HOME, $CARGO_TARGET_DIR to its literal string, and $CARGO_MANIFEST_DIR to its literal string. (Using the --manifest-path of the invoked workspace avoids the build being dependent on where you actually invoke cargo from.) Rustup has $RUSTUP_HOME, etc.

Some thought will need to be given to path dependencies (as opposed to workspace members or registry dependencies), along with [patch] overrides, but these could be solved later or declared out of the initial scope.
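
To make the effect of this literal-variable scheme concrete, here is a small sketch of what an embedded path would look like after such a remapping. It simulates the rewrite rather than invoking rustc; the function `remap_to_variable` and the example directories are hypothetical, and the first matching prefix wins, which is an assumption of this sketch.

```rust
use std::path::{Path, PathBuf};

/// Rewrites `path` so that the first matching directory in `vars` is replaced
/// by the literal text `$NAME`. Paths that match nothing are returned as-is.
fn remap_to_variable(path: &Path, vars: &[(&str, &Path)]) -> PathBuf {
    for &(name, dir) in vars {
        // `strip_prefix` compares whole path components, so `/home/cb` does
        // not match `/home/cbeuw`.
        if let Ok(rest) = path.strip_prefix(dir) {
            return Path::new(&format!("${}", name)).join(rest);
        }
    }
    path.to_path_buf()
}
```

With `CARGO_HOME` set to `/home/cbeuw/.cargo`, a registry source path would then show up in panic messages starting with the literal `$CARGO_HOME/registry/...`, which stays the same across machines and can be expanded again by anyone who knows their own `$CARGO_HOME`.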


Somewhat related, but going in the opposite direction:

Some paths embedded in the executables are ambiguous. For example, at the moment I am looking at the following stack trace:

   0: std::panicking::begin_panic
   1: ide_assists::tests::check
             at ./src/tests.rs:156:13
   2: ide_assists::tests::check_assist
             at ./src/tests.rs:36:5

The problem is, this is a workspace, so there's more than one ./src/tests.rs file. As a result, I can't ctrl+click on the path to open the file.


This is better than nothing, but I think this should be done for debug builds as well, providing ways for debuggers and tools to still work (e.g. for gdb you can create a .gdbinit with "set substitute-path" commands, provide a patched gdb or provide a wrapper that bind mounts the paths).
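
As an illustration of the .gdbinit route, the reverse mapping could be generated from the same table used for remapping. This is only a sketch: the function `gdbinit_lines` is hypothetical, it relies on gdb's `set substitute-path FROM TO` command, and it does no quoting, so directories containing spaces are not handled.

```rust
/// Produces `.gdbinit` contents that map each sanitised prefix back to the
/// real directory on this machine. Each pair is (sanitised prefix, real path).
fn gdbinit_lines(mappings: &[(&str, &str)]) -> String {
    mappings
        .iter()
        .map(|(from, to)| format!("set substitute-path {} {}\n", from, to))
        .collect()
}
```

For example, passing the pair `("/cargo", "/home/cbeuw/.cargo")` yields a single `set substitute-path /cargo /home/cbeuw/.cargo` line that gdb can read at startup.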

In the current situation debugging would be broken if you move the source elsewhere, which is simply wrong since project directories and home directories must always be movable without causing issues.

I'm not sure that "." is the best prefix for the current project, though; something like "//rust/work/[crate name]/[git tree hash for current directory contents]" might be better. I also think we should add "//rust" (with two slashes, which is semantically the same as one) to all paths so that they don't conflict with actual paths and work fine in a multi-language environment.

It would be fine if all paths were workspace-relative for crates compiled as part of a workspace (as opposed to cargo-manifest-relative).

"project directories and home directories must always be movable without causing issues."

This seems like a bizarre requirement. Why would you move the source directory while you're actively working on the code? This seems like a very tiny benefit, and it breaks Ctrl+click in IDEs.

I wouldn't move a source directory while working on it. But if I decide to reorganize my entire home directory, I do expect that project directories that aren't in use right now can be moved around wholesale without breaking anything.

Please let's not make up top-level directories or UNC paths (what if there exists a CIFS server addressable as \\sysroot on someone's LAN?). @CAD97's suggestion of beginning all the remappings with $CARGO_SOMETHING_OR_OTHER/ seems safer to me.
