Pre-RFC: Split `CARGO_HOME`

Summary

Today, cargo stores all user-global files in CARGO_HOME (~/.cargo), including

  • Caches
  • Config
  • Installed binaries

Similarly, rustup stores all user-global files in ~/.rustup.

As an incremental, transitional step toward eventually supporting platform-specific paths, this RFC would let users emulate them by providing an environment variable to control each of these types of paths.

Motivation

Benefits include:

  • Using a .cargo directory violates the recommendations/rules on most operating systems (Linux, Windows, macOS). It is especially painful on Windows, where dotfiles are not hidden.
  • Putting caches in designated cache directories allows backup tools to ignore them. (Linux, Windows, macOS. Example: Time Machine ignores the cache directory on macOS)
  • It makes it easier for users to manage, share and version-control their configuration files, as configuration files from different applications end up in the same place, instead of being intermingled with cache files. (Linux and macOS)
  • Cargo contributes to the slow cleanup of the $HOME directory by no longer adding its application-private clutter to it. (Linux)
  • Using a standard directory for binary outputs can allow the user to execute Cargo-installed binaries without modifying their PATH variable. (Linux)

See the Arch Wiki for a non-exhaustive list of software that has native support for XDG paths and software that allows emulating it, as this proposal does.

While providing full support for XDG paths, and the Windows equivalent, would be ideal, we are going with this scaled-back solution for now because:

  • We'll likely want a transition window anyway, and this just splits each phase into its own RFC, rather than deciding on every phase's details up front
  • We can get users benefits now without getting bogged down in details like whether macOS users should use XDG or OS-native paths
  • This allows us to collect feedback and iterate, rather than predict how important each use case is (e.g. how to look up the paths)

Guide-level explanation

Users who want to split up ~/.cargo or ~/.rustup into platform-specific paths can set:

  • CARGO_CONFIG_HOME
  • CARGO_DATA_HOME
  • CARGO_BIN_HOME
  • CARGO_CACHE_HOME
  • RUSTUP_CONFIG_HOME
  • RUSTUP_CACHE_HOME

Previously, we had CARGO_HOME and RUSTUP_HOME which allowed moving the directories as a monolith.

A Linux user could run the following migration script:

#!/usr/bin/env bash
set -eu

CARGO_HOME=$(realpath ~/.cargo)
CARGO_CONFIG_HOME=$HOME/.config/cargo
mkdir -p "$CARGO_CONFIG_HOME"
echo "export CARGO_CONFIG_HOME=\"$CARGO_CONFIG_HOME\"" >> ~/.bashrc
CARGO_DATA_HOME=$HOME/.local/share/cargo
mkdir -p "$CARGO_DATA_HOME"
echo "export CARGO_DATA_HOME=\"$CARGO_DATA_HOME\"" >> ~/.bashrc
CARGO_BIN_HOME=$HOME/.local/share/cargo/bin  # created by migrate_cargo_bin
echo "export CARGO_BIN_HOME=\"$CARGO_BIN_HOME\"" >> ~/.bashrc
CARGO_CACHE_HOME=$HOME/.cache/cargo
mkdir -p "$CARGO_CACHE_HOME"
echo "export CARGO_CACHE_HOME=\"$CARGO_CACHE_HOME\"" >> ~/.bashrc

RUSTUP_HOME=$(realpath ~/.rustup)
RUSTUP_CONFIG_HOME=$HOME/.config/rustup
mkdir -p "$RUSTUP_CONFIG_HOME"
echo "export RUSTUP_CONFIG_HOME=\"$RUSTUP_CONFIG_HOME\"" >> ~/.bashrc
RUSTUP_CACHE_HOME=$HOME/.cache/rustup
mkdir -p "$RUSTUP_CACHE_HOME"
echo "export RUSTUP_CACHE_HOME=\"$RUSTUP_CACHE_HOME\"" >> ~/.bashrc

# Move $old_home/$item to $new_home/$item and leave a symlink behind so that
# older cargo/rustup versions still find it. Missing items and items that are
# already symlinks (i.e. already migrated) are skipped.
function migrate_item {
    local old_home=$1 new_home=$2 item=$3
    [ -e "$old_home/$item" ] && [ ! -L "$old_home/$item" ] || return 0
    mv "$old_home/$item" "$new_home/$item"
    ln -s "$new_home/$item" "$old_home/$item"
}

function migrate_cargo_config { migrate_item "$CARGO_HOME" "$CARGO_CONFIG_HOME" "$1"; }
function migrate_cargo_data { migrate_item "$CARGO_HOME" "$CARGO_DATA_HOME" "$1"; }
function migrate_cargo_cache { migrate_item "$CARGO_HOME" "$CARGO_CACHE_HOME" "$1"; }
function migrate_rustup_config { migrate_item "$RUSTUP_HOME" "$RUSTUP_CONFIG_HOME" "$1"; }
function migrate_rustup_cache { migrate_item "$RUSTUP_HOME" "$RUSTUP_CACHE_HOME" "$1"; }

function migrate_cargo_bin {
    # CARGO_BIN_HOME does not exist yet, so mv renames bin/ into place
    [ -e "$CARGO_HOME/bin" ] && [ ! -L "$CARGO_HOME/bin" ] || return 0
    mv "$CARGO_HOME/bin" "$CARGO_BIN_HOME"
    ln -s "$CARGO_BIN_HOME" "$CARGO_HOME/bin"
}

migrate_cargo_config config.toml
migrate_cargo_data env
migrate_cargo_data .crates.toml
migrate_cargo_data .crates2.json
migrate_cargo_data credentials.toml  # avoid backing up secrets
migrate_cargo_bin
migrate_cargo_cache registry
migrate_cargo_cache git
migrate_cargo_cache target  # used by "cargo script"
migrate_cargo_cache .package-cache
migrate_cargo_cache .package-cache-mutate

migrate_rustup_config settings.toml
migrate_rustup_cache downloads
migrate_rustup_cache tmp
migrate_rustup_cache toolchains
migrate_rustup_cache update-hashes

Reference-level explanation

We'll add the following to the (confusingly named) home package:

  • cargo_config_home: Returns the first match
    1. CARGO_CONFIG_HOME, if set
    2. home::cargo_home()
  • cargo_data_home: Returns the first match
    1. CARGO_DATA_HOME, if set
    2. home::cargo_home()
  • cargo_bin_home: Returns the first match
    1. CARGO_BIN_HOME, if set
    2. home::cargo_home().join("bin")
  • cargo_cache_home: Returns the first match
    1. CARGO_CACHE_HOME, if set
    2. home::cargo_home()
  • rustup_config_home: Returns the first match
    1. RUSTUP_CONFIG_HOME, if set
    2. home::rustup_home()
  • rustup_cache_home: Returns the first match
    1. RUSTUP_CACHE_HOME, if set
    2. home::rustup_home()
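A minimal sketch of the lookup rules above, assuming the variables are read with `std::env::var_os` and the fallback comes from `home::cargo_home()` or `home::rustup_home()`. The helper names `resolve_home` and `cargo_bin_home_from` are hypothetical, and treating an empty value as unset is only one possible answer to the open question about empty variables listed under Unresolved questions. The inputs are passed in explicitly so the logic can be tested without mutating the process environment.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Returns the path from a `*_HOME` override if it is set, otherwise the
/// monolithic fallback (e.g. the result of `home::cargo_home()`).
/// Assumption: an empty value is treated the same as an unset one.
fn resolve_home(var: Option<OsString>, fallback: PathBuf) -> PathBuf {
    match var {
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => fallback,
    }
}

/// `cargo_bin_home` is the one case whose fallback is not the old home
/// itself, but its `bin` subdirectory.
fn cargo_bin_home_from(var: Option<OsString>, cargo_home: PathBuf) -> PathBuf {
    resolve_home(var, cargo_home.join("bin"))
}

fn main() {
    // Stand-in for home::cargo_home(); a real implementation would call it.
    let cargo_home = PathBuf::from("/home/user/.cargo");

    let config = resolve_home(std::env::var_os("CARGO_CONFIG_HOME"), cargo_home.clone());
    let bin = cargo_bin_home_from(std::env::var_os("CARGO_BIN_HOME"), cargo_home.clone());
    let cache = resolve_home(std::env::var_os("CARGO_CACHE_HOME"), cargo_home);

    println!("config: {}", config.display());
    println!("bin:    {}", bin.display());
    println!("cache:  {}", cache.display());
}
```

With none of the new variables set, all three collapse back onto the old layout, which is what makes the change backwards compatible.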

Cargo will be modified to look up each file through the more specific home function for its category, following the file-to-category mapping in the migration script above.
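To make that mapping concrete, here is a hypothetical table (not cargo's actual code) assigning each top-level entry of the old ~/.cargo to the home it would be read from, mirroring the migration script. The `CargoHome` enum and `home_for` function are illustrative names only.

```rust
/// Which of the split homes a file or directory belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CargoHome {
    Config,
    Data,
    Bin,
    Cache,
}

/// Hypothetical mapping from an entry in the old monolithic CARGO_HOME to its
/// new home, following the migration script. Unknown entries return `None`.
fn home_for(entry: &str) -> Option<CargoHome> {
    use CargoHome::*;
    match entry {
        "config.toml" => Some(Config),
        // credentials.toml is program-managed and should not end up in a
        // (possibly public) config repo, so it is treated as data
        "env" | ".crates.toml" | ".crates2.json" | "credentials.toml" => Some(Data),
        "bin" => Some(Bin),
        "registry" | "git" | "target" | ".package-cache" | ".package-cache-mutate" => Some(Cache),
        _ => None,
    }
}

fn main() {
    for entry in ["config.toml", "credentials.toml", "bin", "registry", "unknown"] {
        println!("{entry:<20} -> {:?}", home_for(entry));
    }
}
```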

Each of these new environment variables will be blocked from being set in config.toml's [env] table.
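One way such a check might look, as a hypothetical validation pass over the keys of an `[env]` table. Cargo's real config loading is more involved; `check_env_table`, the constant name, and the error text are illustrative only.

```rust
/// The variables that may not be set from a config.toml [env] table.
const BLOCKED_ENV_KEYS: &[&str] = &[
    "CARGO_CONFIG_HOME",
    "CARGO_DATA_HOME",
    "CARGO_BIN_HOME",
    "CARGO_CACHE_HOME",
    "RUSTUP_CONFIG_HOME",
    "RUSTUP_CACHE_HOME",
];

/// Hypothetical validation pass: returns an error naming the first blocked
/// key found among the keys of an `[env]` table.
fn check_env_table<'a>(keys: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    for key in keys {
        if BLOCKED_ENV_KEYS.contains(&key) {
            return Err(format!("`{key}` cannot be set in the [env] table"));
        }
    }
    Ok(())
}

fn main() {
    // Keys as they might appear in a config.toml [env] table
    println!("{:?}", check_env_table(["RUST_LOG", "OPENSSL_DIR"]));
    println!("{:?}", check_env_table(["RUST_LOG", "CARGO_CACHE_HOME"]));
}
```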

Drawbacks

Why should we not do this?

Relocating ~/.cargo/bin and related files requires updating both rustup and cargo.

rustup assumes complete control of ~/.cargo/bin, and people might map it to ~/.local/bin, which would cause unexpected behavior.

Rationale and alternatives

credentials.toml was put under CARGO_DATA_HOME because it is program-managed data.

  • CARGO_CONFIG_HOME might cause it to get backed up to public git repos, exposing secrets
  • Ideally people will start migrating to OS-native credential stores

Existing issues:

Previous proposals:

An alternative approach is to rely on caches being throw-away and instead:

  • Read both configs (like git)
  • Hard split for cache

Reasons we didn't go with this:

  • This doesn't work for CARGO_BIN_HOME
  • Rustup doesn't support a layered config for reading from two locations at once

Prior art

git

  • Layers both the old ~/.gitconfig and the new ~/.config/git/config on top of each other
  • No user cache

ansible

  • ANSIBLE_HOME
  • ANSIBLE_CONFIG (config file path)
  • ANSIBLE_GALAXY_CACHE_DIR

asdf

  • ASDF_CONFIG_FILE
  • ASDF_DATA_DIR

For more, see the Arch Wiki entry for XDG Base Directory.

Unresolved questions

  • Does config fallback work for rustup?
  • What should be done for env variables that are empty (treat as unset?) or relative (treat as unset? error?)?
  • Is rustup update-hashes cache, state, data, or config?
  • Is rustup toolchains cache, state, data, or config?
  • Which paths should rustup self uninstall delete?

Future possibilities

Platform-specific paths

Note: The information here is for illustrative purposes and discussing the details is only relevant so far as it might affect a decision being made within this RFC.

Update the home package with the following:

  • cargo_config_home: Returns the first match
    1. CARGO_CONFIG_HOME, if set
    2. CARGO_HOME, if set
    3. cargo_home(), if present
    4. Platform default:
       • Linux or macOS: $XDG_CONFIG_HOME/cargo if XDG_CONFIG_HOME is set, else ~/.config/cargo
       • Windows: AppData\Roaming\Cargo
  • cargo_data_home: Returns the first match
    1. CARGO_DATA_HOME, if set
    2. CARGO_HOME, if set
    3. cargo_home(), if present
    4. Platform default:
       • Linux or macOS: $XDG_DATA_HOME/cargo if XDG_DATA_HOME is set, else ~/.local/share/cargo
       • Windows: TBD
  • cargo_bin_home: Returns the first match
    1. CARGO_BIN_HOME, if set
    2. $CARGO_HOME/bin, if CARGO_HOME is set
    3. cargo_home().join("bin"), if present
    4. Platform default:
       • Linux or macOS: TBD
       • Windows: TBD
  • cargo_cache_home: Returns the first match
    1. CARGO_CACHE_HOME, if set
    2. CARGO_HOME, if set
    3. cargo_home(), if present
    4. Platform default:
       • Linux or macOS: $XDG_CACHE_HOME/cargo if XDG_CACHE_HOME is set, else ~/.cache/cargo
       • Windows: AppData\Local\Temp\Cargo
  • rustup_config_home: Returns the first match
    1. RUSTUP_CONFIG_HOME, if set
    2. RUSTUP_HOME, if set
    3. rustup_home(), if present
    4. Platform default:
       • Linux or macOS: $XDG_CONFIG_HOME/rustup if XDG_CONFIG_HOME is set, else ~/.config/rustup
       • Windows: AppData\Roaming\Rustup
  • rustup_cache_home: Returns the first match
    1. RUSTUP_CACHE_HOME, if set
    2. RUSTUP_HOME, if set
    3. rustup_home(), if present
    4. Platform default:
       • Linux or macOS: $XDG_CACHE_HOME/rustup if XDG_CACHE_HOME is set, else ~/.cache/rustup
       • Windows: AppData\Local\Temp\Rustup
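The layered lookup for one of these functions, cargo_config_home, can be sketched as a pure function. Everything here is hypothetical: the `Inputs` struct, the `Platform` enum, and returning `None` when %APPDATA% is unknown are illustration choices, and the paths simply follow the illustrative list above. Since step 2 already covers a set CARGO_HOME, step 3 ("cargo_home(), if present") reduces to checking whether ~/.cargo exists.

```rust
use std::path::{Path, PathBuf};

/// The platform families distinguished by the list above.
#[derive(Debug, Clone, Copy)]
enum Platform {
    LinuxOrMacOs,
    Windows,
}

/// Everything the lookup depends on, gathered up front so the lookup itself
/// is pure. In practice these would come from std::env::var_os and a
/// filesystem existence check.
#[derive(Clone, Copy)]
struct Inputs<'a> {
    cargo_config_home: Option<&'a Path>, // CARGO_CONFIG_HOME
    cargo_home: Option<&'a Path>,        // CARGO_HOME
    legacy_home: &'a Path,               // what cargo_home() falls back to (~/.cargo)
    legacy_home_exists: bool,            // the "if present" check
    xdg_config_home: Option<&'a Path>,   // XDG_CONFIG_HOME
    home: &'a Path,                      // the user's home directory
    app_data: Option<&'a Path>,          // %APPDATA%, i.e. AppData\Roaming
}

/// Returns the first match, in the order listed for cargo_config_home.
fn cargo_config_home(i: &Inputs, platform: Platform) -> Option<PathBuf> {
    if let Some(dir) = i.cargo_config_home {
        return Some(dir.to_path_buf());
    }
    if let Some(dir) = i.cargo_home {
        return Some(dir.to_path_buf());
    }
    if i.legacy_home_exists {
        return Some(i.legacy_home.to_path_buf());
    }
    match platform {
        Platform::LinuxOrMacOs => Some(match i.xdg_config_home {
            Some(xdg) => xdg.join("cargo"),
            None => i.home.join(".config").join("cargo"),
        }),
        Platform::Windows => i.app_data.map(|dir| dir.join("Cargo")),
    }
}

fn main() {
    let inputs = Inputs {
        cargo_config_home: None,
        cargo_home: None,
        legacy_home: Path::new("/home/user/.cargo"),
        legacy_home_exists: false,
        xdg_config_home: None,
        home: Path::new("/home/user"),
        app_data: None,
    };
    // A fresh Linux/macOS install with no overrides lands in ~/.config/cargo
    println!("{:?}", cargo_config_home(&inputs, Platform::LinuxOrMacOs));
}
```

Keeping the existing ~/.cargo ahead of the platform default is what lets current users keep working untouched while new installs get the platform-specific layout.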

Windows: Roaming app data is no longer supported on Windows 11. We'll still use these paths to communicate intent to Windows, for future compatibility.

macOS: This currently favors XDG over the platform-specific application directories to be more consistent with other CLI developer tooling. The platform-specific application directories can always be emulated by setting the appropriate environment variables. The final decision will be left to the relevant RFC.

Open questions

  • The paths for macOS
  • Local vs roaming on Windows
  • Should the higher-precedence-with-presence check be used for cargo_home or platform-specific paths?
  • How do we want to handle the fact that rustup proxies set CARGO_HOME? Once users stop explicitly setting the more specific variables, the CARGO_HOME set by rustup will win out over the platform-specific defaults.

Automated migration

Add a rustup subcommand to migrate people to the new layout.

Compat symlinks

Have rustup manage symlinks when installing older toolchains.

Cargo CLI for reading the values

While applications can call into home to get the values, sometimes a user will want them interactively.

Potential ideas include:

  • A cargo dirs built-in command
  • Reuse cargo config
  • cargo --print <target> much like rustc --print <target>

Allow setting CARGO_CACHE_HOME from .cargo/config.toml

For sharing caches between the host and Docker, rust-lang/cargo#6452 requested config control over the global cache location. Being a cache, its contents should all be safe to recalculate, unlike some of the other CARGO_*_HOME locations.


I would love to see this. Programs ignoring XDG is a pet peeve of mine (I use chezmoi + git for configs, and I use btrfs with automated time-based snapshots and would prefer not to snapshot caches).

With regard to the bin directory and associated files (.crates2.json, ...), the question can perhaps be answered by thinking about the use cases for the split-up data:

  • It is not configuration files (would I like to manage it in git and sync it between computers?)
  • It is not cache (would I mind if it is automatically removed by a tool like bleachbit that tries to free up disk space?)
  • It is maybe state (~/.local/cargo/bin etc?). At least I can't find an obvious reason why it wouldn't fit here

I actually currently sync ~/.cargo/bin and the associated metadata files between computers using syncthing, so I don't have to build everything with cargo install on my weaker laptop. I would prefer that this still be possible in whatever we end up with in the future. So either deferring it or putting all of those files in one place sounds good to me.

So, to explain some of my decisions in the script I recently posted to partition CARGO_HOME up via symlink:

When I was looking at the files I saw four categories of files:

  1. User edited config files:
     • config.toml
  2. Cargo managed database:
     • credentials.toml
     • (.crates.toml and .crates2.json probably, though I am avoiding cargo install on this system so didn't really think about them)
  3. Less important databases (useful for cargo --offline usage, but does not need backup):
     • git/db/
     • registry/index/
     • registry/cache/
  4. Short term caches (could be deleted between every cargo run without issue):
     • git/checkouts/
     • registry/src/

I then mapped those to my understanding of the XDG Base Directories specification:

  • XDG_CONFIG_HOME contains user-edited config files (1)
  • XDG_DATA_HOME contains application managed persistent data (2)
  • XDG_STATE_HOME contains less important data that doesn't require backup (3)
  • XDG_CACHE_HOME contains low priority caches (4)

Since posting that script I did also run into target/, which I have put into XDG_CACHE_HOME too.

Though, re-evaluating my understanding now, I think I might actually demote 3 and 4 a level. git/checkouts/ and registry/src/ really don't need to be kept at all, so they could go in XDG_RUNTIME_DIR and get cleaned up automatically. (I would not be disappointed to see cargo get the option to use fully transient directories for this set of caches, created just before rustc is run and deleted again afterwards; I don't see any reason to keep them on disk.) My instinct to put 3 into XDG_STATE_HOME was likely because I initially put 4 into XDG_CACHE_HOME and then needed somewhere more persistent for 3; now 3 sounds closer to the intent of XDG_CACHE_HOME to me.

That results in:

XDG_CONFIG_HOME
  config.toml
XDG_DATA_HOME
  credentials.toml
XDG_CACHE_HOME
  git/db/
  registry/index/
  registry/cache/
  target/
XDG_RUNTIME_DIR (or doesn't need to exist at all)
  git/checkouts/
  registry/src/
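For reference, the XDG Base Directory spec's fallback rules for these variables are simple enough to sketch: an unset or empty variable means "use the default under $HOME", and a relative path is invalid and ignored. This is a hypothetical helper, not part of cargo or the home crate; placing each entry under a cargo subdirectory is an assumption, and XDG_RUNTIME_DIR is left out because the spec gives it no fixed default.

```rust
use std::path::{Path, PathBuf};

/// The XDG base directories discussed above (XDG_RUNTIME_DIR omitted).
#[derive(Debug, Clone, Copy)]
enum XdgDir {
    Config,
    Data,
    State,
    Cache,
}

impl XdgDir {
    fn var(self) -> &'static str {
        match self {
            XdgDir::Config => "XDG_CONFIG_HOME",
            XdgDir::Data => "XDG_DATA_HOME",
            XdgDir::State => "XDG_STATE_HOME",
            XdgDir::Cache => "XDG_CACHE_HOME",
        }
    }

    /// Default location relative to $HOME when the variable is unusable.
    fn default_relative(self) -> &'static str {
        match self {
            XdgDir::Config => ".config",
            XdgDir::Data => ".local/share",
            XdgDir::State => ".local/state",
            XdgDir::Cache => ".cache",
        }
    }
}

/// Unset, empty, or relative values all fall back to the default.
fn resolve(dir: XdgDir, value: Option<&str>, home: &Path) -> PathBuf {
    match value {
        Some(v) if Path::new(v).is_absolute() => PathBuf::from(v),
        _ => home.join(dir.default_relative()),
    }
}

fn main() {
    let home = std::env::var_os("HOME").map(PathBuf::from).unwrap_or_default();
    // The final layout from the post above (hypothetical code, not cargo's)
    let layout = [
        ("config.toml", XdgDir::Config),
        ("credentials.toml", XdgDir::Data),
        ("registry/index/", XdgDir::Cache),
        ("target/", XdgDir::Cache),
    ];
    for (entry, dir) in layout {
        let value = std::env::var(dir.var()).ok();
        let base = resolve(dir, value.as_deref(), &home);
        println!("{entry:<18} -> {}", base.join("cargo").join(entry).display());
    }
}
```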

The credentials.toml file feels too important to me to go in XDG_CACHE_HOME; my expectation of that directory is that it would be valid for a periodic cleanup task to delete things out of it.


That is quite a performance hit if we have to expand them on every build, especially for Windows users. "Jar" for Rust: single file crate support for `rustc` would allow us to bypass this completely.

Yeah, something like that where the files never need to exist at all would be preferable, I doubt my idea is worth pursuing over that; it was mostly just some flavor of how non-persistent I see these files as being.

(I did notice that currently cargo will expand out all the sources even if the build is fully up to date, it seems like it should only expand them out once it has determined that that crate does need building.)

We at least stopped checking the mtime of the files, but I wouldn't be surprised if there are other reasons it works better to have things expanded to determine what to build. For example, the set of targets that exist to build is taken from a Cargo.toml on disk. It's possible to stop expanding, but the end benefit seems relatively low, especially since someone deleting the files without holding one of the cargo locks can likely mess up a build.

I like this proposal! For me personally, a key motivation is putting CARGO_CACHE_HOME on a faster filesystem than CARGO_HOME (today I have rust/cargo installed on a very slow filesystem, which has seriously deleterious effects on compile times)

I'm really excited about this! Hopefully taking this incrementally will make it easier to migrate the defaults in the future.

Another potential reason for moving the cache: putting the cache into a directory that gets preserved between CI runs, without needing special-case CI logic for Cargo.

I like this proposal -- thanks for working on this!

As a macOS user, I don't love this choice -- mostly for the caches. I imagine that your reasoning about caches being throw-away applies on macOS just as much as it does on Windows and Linux, and I don't think the consistency-with-other-CLI-developer-tooling reasoning is very strong for the cache side of things. It makes more sense for config, particularly @Nemo157's category 1 of user-edited config files.

In short, IMO CARGO_CACHE_HOME should default to ~/Library/Caches/cargo on macOS, and RUSTUP_CACHE_HOME should default to ~/Library/Caches/rustup. (Alternatively, Cargo and Rustup or org.rust-lang.cargo and org.rust-lang.rustup? Seems to be a decent mix of all three variants in my ~/Library/Caches at least.)

All this does bring back another age-old topic: are there fundamental good reasons that rustup's duties shouldn't move into cargo?


Every rustup version supports all older rustc versions and newer rustc versions up to a certain point. Cargo only works with 3 rustc versions at a time, so you basically need a cargo matching the rustc version you want to use, while you can always use the latest rustup version. Also, rustup is so much simpler that you can be much more confident that a rustup update doesn't break anything, while for cargo, supporting every rustc version ever produced would result in a lot of back-compat hacks which are bound to break. Modern cargo depends on a lot of features that older rustc versions don't support: everything from --cap-lints and JSON diagnostics to --emit metadata, --edition, -Cincremental, -Cembed-bitcode and -Clinker-plugin-lto. And I probably forgot a whole lot of options cargo depends on now that didn't exist yet in rustc 1.0.


Discussions like this are a large part of why that is under Future possibilities. I mapped out a solution, but we don't have to decide it now.

Honestly, if we're going to change this stuff it makes more sense to me to try to get it right in one go rather than ending up with a series of transitions/migrations, which has worse ergonomics for users.


The question is whether that future decision affects a decision now. If it's a matter of which path we map a CARGO_*_HOME to in the future, then it doesn't need to be decided now. However, if it's a matter of which paths map to a CARGO_*_HOME, then that needs to be decided now.

I am undecided about environment variables. My .foorc file has more environment variables defined than I want or expected, and I seldom look at the file. For CI and manifest files, environment variables are great.

For local machines, a rust.toml would be nice for specifying the Rust environment.

The problem then is where do you store rust.toml?

I would hope the end-state is that we can have something like: Cargo probes first <PLATFORM_CONFIG_DIR>/config.toml then ~/.cargo/config.toml, using either platform conventions or the existing layout based on which it finds; these environment variables are just an intermediate state on our path there (which hopefully can be used to provide evidence that people really want this transition to happen, rather than becoming a long-term "good enough" solution).


Your OS (Linux/Windows/macOS) could ship a TOML file that specifies where to put things in your home directory. Users could still override the default with their own TOML file.

I totally agree with the RFC to split up the paths to give users more flexibility and control.

Minor note on Windows platform paths:

Roaming data and settings is no longer supported as of Windows 11. If there's a logical split between "roaming" and "local" app configuration/settings (those that don't and do potentially refer to local file paths), then the split still makes sense as a sort of hint to people who might manually sync app data. But otherwise just use the local data folder.

Additionally, I believe local temp is supposed to be data that has no/little reason to persist between app starts, and local cache is a separate storage location. But the docs around that aren't super consistent and I can't find how those symbolic paths map to actual filesystem paths anyway.


My intuition for the eventually desired behavior would be (example cargo config path):

  • if CARGO_CONFIG_HOME set, use that; else
  • if CARGO_HOME set, use that; else
  • if ~/.cargo exists, use that (legacy support); else
  • if XDG_CONFIG_HOME set, use that; else
  • use a platform specific default:
    • Windows: %LocalAppData%\Cargo
    • macOS: ~/Library/Application Support/Cargo
    • Linux: ~/.config/cargo

Notably, no matter the specific choices of paths, CARGO_CONFIG_HOME should be the highest priority. So adding split home environment variables should easily be forwards compatible with any future default home migration.


I don't see this mentioned in the RFC text, but Cargo has the extra consideration that deliberately mixing use of old and new versions of Cargo is fairly common, by way of rustup. Maybe it's not particularly relevant, but it should probably be mentioned.

A rustup config to say "never use a toolchain older than X" might be useful, even if it's just a hard conflict when hitting a toolchain file asking for a too old toolchain. This would make it possible to guarantee only cargos new enough to respect the new home paths will run, and old config directories won't get accidentally created.

That would be great. Rust/Cargo using the wrong directories on macOS is a pain not only because of backups, but also because full-disk search indexes modified files in real time. Spotlight thinks everything in the user's home is super important and has to be indexed urgently, so it runs at full blast while Cargo is working, slowing builds down (this is an even bigger issue for target/, but I'll take fixes one step at a time).

BTW, credentials.toml is not a cache — it takes manual labor to recreate this file. I'd be annoyed if I had to log in again every time I reboot or run low on disk space. Ideally this file should be replaced by code-signed cargo binary using system keychain directly (specifically not via an external credential helper binary that breaks the chain of trust).


It doesn't respect the CACHEDIR.TAG files created by cargo?
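For context: under the Cache Directory Tagging Specification, a directory is marked as a cache by a CACHEDIR.TAG file whose contents begin with a fixed 43-byte signature, and tools that honor the convention skip such directories. A minimal sketch of that check (a hypothetical helper, not the logic of Time Machine, Spotlight, or any particular backup tool):

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The 43-byte header required by the Cache Directory Tagging Specification.
const SIGNATURE: &[u8; 43] = b"Signature: 8a477f597d28d172789f06886806bc55";

/// Returns true if `dir` contains a CACHEDIR.TAG file starting with the signature.
fn is_tagged_cache_dir(dir: &Path) -> io::Result<bool> {
    let mut file = match File::open(dir.join("CACHEDIR.TAG")) {
        Ok(file) => file,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    let mut header = [0u8; 43];
    match file.read_exact(&mut header) {
        Ok(()) => Ok(&header == SIGNATURE),
        // A file shorter than the signature cannot be a valid tag
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // e.g. a cargo target directory, which cargo tags as a cache
    let tagged = is_tagged_cache_dir(Path::new("target"))?;
    println!("target/ is tagged as a cache: {tagged}");
    Ok(())
}
```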


Just because roaming is no longer supported doesn't mean we should use anything other than %AppData% for persistent application state.

Windows may in the future decide to rename AppData/Roaming to something else, like Persistent, or even layer it over Local. We should still use the %AppData% environment variable regardless; where that points is up to Microsoft.