Giving each toolchain/build target its own target directory

Hi, I recently ran into an issue when trying to compile and run two Rust binaries from the same workspace using two toolchain versions (one stable and one nightly). I could not compile both at the same time because Cargo locks the whole target directory while compiling.

This led me to the following question: why doesn't Cargo create one target subdirectory for each toolchain version/triple?

The fact that compiling a program with a nightly toolchain, and then compiling it again with a stable toolchain, only redoes the linking step (building the final binary file) makes me think that the build artifacts are not shared between the two toolchain versions: each toolchain seems to keep its own copies side by side in the same directory.

If so, wouldn't it make more sense to separate each toolchain's artifacts into different folders? This would allow a more granular locking mechanism (so multiple targets could be built in parallel), and would prevent the final binary from being replaced by each toolchain in turn. It would also play well with compiling on Windows (targeting windows-msvc) and in WSL (targeting linux-gnu, for example) at the same time. Finally, it would allow running cargo clean on a single target, so you don't have to recompile everything for the other targets afterwards.

I skimmed through the RFCs and IRLO posts mentioning the target folder but couldn't find anything about this specific problem.

I was thinking about something like this:

target/
├─ stable-1.57-x86_64-unknown-linux-gnu/
├─ stable-1.57-x86_64-pc-windows-msvc/
├─ stable-1.56-x86_64-pc-windows-msvc/
├─ nightly-2021-12-07-unknown-linux-gnu/

There would be one "target folder" for each toolchain version/target combination.

If little is shared between two toolchains, this shouldn't cause any problems: Cargo already has to know which toolchain version/target to use in order to execute its commands (run, test, build, check, etc.).
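
As a rough illustration of that point, the release and host triple that rustc reports via rustc -vV could be turned into a per-toolchain subdirectory name. This is a minimal sketch, not something Cargo does: toolchain_dir_name is a hypothetical helper, it uses <release>-<host> names (e.g. 1.57.0-x86_64-unknown-linux-gnu) rather than the channel-based names above, and because it reads the host triple it ignores cross-compilation with --target.

```shell
# Hypothetical helper: read `rustc -vV` output on stdin and print a
# per-toolchain directory name of the form <release>-<host>.
toolchain_dir_name() {
  awk -F': ' '
    $1 == "release" { release = $2 }
    $1 == "host"    { host = $2 }
    END { print release "-" host }
  '
}

# Example (the +toolchain syntax needs rustup):
#   CARGO_TARGET_DIR="target/$(rustc +nightly -vV | toolchain_dir_name)" cargo +nightly build
```

One limitation: every nightly of a release cycle reports the same release string (e.g. 1.59.0-nightly), so a real scheme would need something extra, such as the commit-date line, to tell nightlies apart.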

Of course, this is all based purely on my limited understanding of the build artifacts produced by rustc. If a large amount of data is shared between multiple targets, then it might be better to keep using a single target directory.

This would be a breaking change. The location of the final artifact is guaranteed to be target/<profile>. Your proposal would change it to target/<toolchain>/<profile> (or keep only the final artifact in target/<profile> and the dependencies in target/<toolchain>/<profile>, but that would still require a lock on target/<profile>). You can already emulate it by passing an explicit --target-dir argument to every cargo invocation.
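
A minimal sketch of that emulation, assuming rustup's +toolchain syntax: the hypothetical cargo_tc wrapper sets CARGO_TARGET_DIR (the environment-variable equivalent of --target-dir) to target/<toolchain>, so a stable and a nightly build no longer contend for the same lock.

```shell
# Hypothetical wrapper: run cargo with the given rustup toolchain and a
# target directory dedicated to that toolchain.
# Usage: cargo_tc TOOLCHAIN SUBCOMMAND [ARGS...]
cargo_tc() {
  local tc=$1
  shift
  CARGO_TARGET_DIR="target/$tc" cargo "+$tc" "$@"
}

# e.g. in two terminals at once:
#   cargo_tc stable run --release
#   cargo_tc nightly run --release
```

The trade-off is the one described above: the binaries end up in target/stable/release/ and target/nightly/release/ instead of target/release/.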

Producing per-toolchain output in addition to the current output wouldn't be breaking, would it?

That would duplicate every file on systems that don't support either symbolic or hard links, though.

I would be fine with this as long as build products (executables, static libraries, and cdylibs) remained where they are. target/release/foo is easy to hardcode in scripts, but toolchain-specific paths can't be.
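
The arrangement discussed in the last few replies (per-toolchain trees, with build products still at target/<profile>/) could be approximated as below. This is a hypothetical sketch, not a Cargo feature: publish_artifact hard-links the product into place and falls back to cp, which is exactly where the duplication on filesystems without link support comes from.

```shell
# Hypothetical helper: expose ROOT/TOOLCHAIN/PROFILE/NAME at ROOT/PROFILE/NAME.
# Usage: publish_artifact ROOT TOOLCHAIN PROFILE NAME
publish_artifact() {
  local root=$1 tc=$2 profile=$3 name=$4
  local src="$root/$tc/$profile/$name"
  local dest="$root/$profile/$name"
  mkdir -p "$root/$profile"
  # Prefer a hard link (no extra disk space); fall back to a full copy.
  ln -f "$src" "$dest" 2>/dev/null || cp -f "$src" "$dest"
}

# e.g. after: cargo +nightly build --release --target-dir target/nightly
#   publish_artifact target nightly release foo   # -> target/release/foo
```

Each toolchain still overwrites target/release/foo, so writes to that shared path would still need coordination (the lock on target/<profile> mentioned earlier).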

And then I'd go a step further and move all of Cargo's private build temporaries into subdirectories of /var/tmp/cargo/ (or an equivalent platform-appropriate cache dir).

This is already a bad assumption because one might not be building in release mode, might have passed --target-dir, etc. cargo run, cargo install, and coordination with --target-dir are the only ways to get "stable" paths out of an arbitrary build, and I'd push any tools making such assumptions to provide escape hatches for these cases. For example, I use target/ale for vim-ale to avoid stalling my main builds, and target/tarpaulin for cargo-tarpaulin to avoid complete rebuilds when collecting coverage.

Please no. Keeping the artifacts is nice and saves CPU time (sccache can too, but it can only store so much), but moving them off into some far-away place is just asking to be suddenly surprised by a random ENOSPC error, given the sizes of these directories (at least I'm more aware of space limitations in $HOME than in /var/tmp on my machines). Also, since /var/tmp is world-writable, squatting is possible unless mkdtemp is used, which means the directories aren't predictable and some "where is the build tree" artifact needs to be dropped into the source tree so it can be found again. Maybe $XDG_CACHE_HOME and friends, but that still has the "what is the build tree path for this source tree?" problem unless the source hierarchy is recreated under the cache directory (and source trees under that directory are rejected with an error).

FWIW, I have a "wonky" setup, but it works for me:

- name-of-project/            # directory for everything related to this project
  - builds -> $HOME/misc/builds/name-of-project/
                              # symlink off to backup-ignored directory
                              # with all build trees across the system
  - target -> builds/target/  # symlink to a build tree under the build forest
  - src/                      # the source repository (`src-$reason/` for worktrees)
    - target -> ../target/    # symlink to make cargo build bits into the "build tree forest"

This "build tree forest" can then be perused with ncdu to prune excessively large directories as needed (instead of them being lost in the noise of monstrous git repositories or whatever). This is also why I end up submitting a handful of "fix .gitignore to support symlinked target" PRs a year.
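
For anyone wanting to try this layout, here is a minimal sketch that recreates it with symlinks. The setup_build_forest name and its two arguments are hypothetical; the link targets mirror the tree above.

```shell
# Hypothetical helper recreating the "build tree forest" layout.
# Usage: setup_build_forest PROJECT_DIR BUILDS_ROOT
#   PROJECT_DIR  e.g. ~/code/name-of-project
#   BUILDS_ROOT  absolute path, e.g. $HOME/misc/builds (excluded from backups)
setup_build_forest() {
  local project_dir=$1 builds_root=$2
  local name
  name=$(basename "$project_dir")

  # Real storage for this project's build trees.
  mkdir -p "$builds_root/$name/target"
  # name-of-project/builds -> $BUILDS_ROOT/name-of-project/
  ln -sfn "$builds_root/$name" "$project_dir/builds"
  # name-of-project/target -> builds/target/ (relative link)
  ln -sfn builds/target "$project_dir/target"
  # name-of-project/src/target -> ../target/, so cargo builds into the forest
  mkdir -p "$project_dir/src"
  ln -sfn ../target "$project_dir/src/target"
}
```

On the .gitignore point: a pattern with a trailing slash such as /target/ only matches directories, and git records a symlink as a file rather than a directory, so an entry without the trailing slash (/target) is likely what those PRs need.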

This is even more the case now that custom cargo profiles can be declared/used.

I tried to see how many of the target files were shared between toolchains by building one of my projects once with a stable toolchain and once with a nightly toolchain, then comparing the resulting target folders (using each file's MD5 hash and path):

# Hash every file, strip the top-level directory name (./target or
# ./target_nightly) so the paths are comparable, and sort for comm.
find ./target -type f -exec md5sum "{}" + | cut -d/ -f1,3- | sort > target.chk
find ./target_nightly -type f -exec md5sum "{}" + | cut -d/ -f1,3- | sort > target_nightly.chk
# Print only the lines (hash + path) present in both listings.
comm -12 target.chk target_nightly.chk

This resulted in only a few files:

./CACHEDIR.TAG
./release/deps/project.d
./release/project.d
./debug/project.d
./debug/.fingerprint/project-ffffffffffff/dep-bin-project
./release/.fingerprint/project-ffffffffffff/dep-bin-project
./debug/.fingerprint/project-ffffffffffff/output-bin-project
./release/.fingerprint/project-ffffffffffff/output-bin-project
./debug/.cargo-lock
./release/.cargo-lock
./debug/deps/project.d
./debug/.fingerprint/project-ffffffffffff/invoked.timestamp
./release/.fingerprint/project-ffffffffffff/invoked.timestamp

Which, if I'm not mistaken, are all just bookkeeping files (dep-info, fingerprints, locks) rather than build artifacts. It seems the toolchains share very little of the target directory's contents (these files take up a few kilobytes of disk space). If that is indeed the case, giving each toolchain its own directory would add only about 10 KB of overhead per target, due to these bookkeeping/lock files.
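
The overhead estimate can be checked rather than eyeballed by summing the sizes of the files comm reports. A minimal sketch, assuming the .chk files were produced by the commands above (each line is an MD5 hash, two spaces, then a path starting with ./); shared_bytes is a hypothetical helper.

```shell
# Hypothetical helper: sum the sizes (in bytes) of the files listed on stdin
# in md5sum format ("<hash>  ./<path>"), resolving paths against TARGET_DIR.
# Usage: comm -12 target.chk target_nightly.chk | shared_bytes ./target
shared_bytes() {
  local target_dir=$1 total=0 line path size
  while IFS= read -r line; do
    path=${line#*"  ./"}               # drop the "<hash>  ./" prefix
    size=$(wc -c < "$target_dir/$path")
    total=$((total + size))
  done
  echo "$total"
}
```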

Except that such a script can decide not to use a custom profile or a custom --target-dir, so the existence of these options is backwards compatible. What it can't do is choose a version of Cargo that uses a shared target dir once newer versions of Cargo switch to separate target dirs, so switching to separate target dirs is a breaking change.

In that case, wouldn't it be possible to tie this change to a Rust edition? Cargo needs to know the current edition at some point in order to pass it to rustc.

Another use of such a feature would be running cargo build while rust-analyzer is running cargo check in the background (which currently blocks on the target directory lock).

I'm going to second @mathstuf on this; I strongly prefer having all the build artifacts in target (or wherever I choose to put it) rather than having them dribbled around my filesystem. I know that /var/tmp gets cleaned up automatically on its own, but when you're running low on disk space, knowing that you can write a quick bash script to run cargo clean on all your projects (or, more dangerously, git clean -xdf) and get back some space is handy.
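
Such a script could look like the following minimal sketch (clean_all is a hypothetical name; cargo clean --manifest-path is a standard Cargo option). It only cleans directories that have a target/ next to their Cargo.toml, so in a workspace only the root is cleaned rather than every member.

```shell
# Hypothetical helper: run `cargo clean` for every Cargo project under ROOT
# that currently has a target/ directory next to its Cargo.toml.
# Usage: clean_all ROOT
clean_all() {
  find "$1" -name Cargo.toml ! -path '*/target/*' | while IFS= read -r manifest; do
    dir=$(dirname "$manifest")
    if [ -d "$dir/target" ]; then
      echo "cleaning $dir"
      cargo clean --manifest-path "$manifest"
    fi
  done
}
```

Pointing it at one directory at a time (e.g. clean_all ~/projects/experiments) also allows cleaning the least important projects first.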

A workspace doesn't have an edition. Editions are per-package.

You can tell rust-analyzer to pass --target-dir to cargo using rust-analyzer.checkOnSave.extraArgs.

The problem is that /tmp is cleared every time you shut down your system. It should rather be in ~/.cache. This has the problem of being less discoverable than a target dir in the workspace root, though.

But that is also a problem. Suppose I have several directories full of projects, roughly segregated by how important each project is to me. I can choose which directory to run cargo clean on, cleaning up from the least to the most important, and stop as soon as I have enough space. Cargo has no idea what I consider important, so if I forget to run cargo clean before deleting a project directory, all the stuff in .cache/cargo related to that (now defunct) project will remain. The only easy fix is to delete the .cache directory and let things get rebuilt slowly over time, which is less than ideal.

That is in addition to your point about it being less discoverable.

Reading your previous reply again I now understand you were arguing against /var/tmp, not in favor. My bad. I was also arguing against.

I don't get how you find it easier to write a script that finds all Cargo projects and runs cargo clean, instead of calling rm -rf /var/tmp/cargo. Subdirectories there could be named after crates (plus a hash), so you could also do rm -rf /var/tmp/cargo/junkproject*. The bonus would be that it only deletes temporary files and not build products, so your target/release/exe remains unharmed.

Also, instead of adding micromanagement chores, Cargo should be smarter about the size of its caches. For example, web browsers have non-trivial logic for figuring out how much space they can reasonably use, and clean up automatically before they fill the whole disk.

It has other benefits too; e.g. macOS would know not to run Spotlight indexing on these files. Currently it goes bananas when it sees thousands of new files to index, and slows down compilation.

Mostly it's a muscle memory thing, and a 'compactness' thing. I like having everything that I build in one place, so I know it's all related. When stuff gets dribbled around in other places, it just bothers me.

100% agree with you on this! That said, it's still easier for me to think about everything being clustered in the same spot.

A hash of…what? It needs to be unique to the workspace and discoverable from it. But also, to avoid squatting and inter-user shenanigans (ln -s /place/other/users/but/not/me/have/access/to /var/tmp/cargo), it needs to be non-guessable. Would there just be an artifact dropped in .cargo/ that says "where mkdtemp sent intermediate artifacts"?

Note that things like --target-dir need to be taken into account so that we don't end up with the same "changing build tool bits block/wipe each other" problem.
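
One possible answer to the question above, as a minimal sketch rather than a description of anything Cargo does: create the build directory with mktemp -d (the shell counterpart of mkdtemp: random name, created exclusively, mode 0700) under $XDG_CACHE_HOME, and record its path in a file inside the workspace. The build_dir_for function and the .cargo/build-dir-location file name are hypothetical, and the ownership check uses bash's -O test.

```shell
# Hypothetical sketch: find (or create) an unguessable per-workspace build
# directory under the user's cache dir, recording its path in the workspace.
# Usage: CARGO_TARGET_DIR="$(build_dir_for .)" cargo build
build_dir_for() {
  local ws=$1
  local record="$ws/.cargo/build-dir-location"   # invented file name
  local dir cache

  # Reuse the recorded directory only if it still exists and is ours.
  if [ -f "$record" ]; then
    dir=$(cat "$record")
    if [ -d "$dir" ] && [ -O "$dir" ]; then
      echo "$dir"
      return 0
    fi
  fi

  # mktemp -d picks a random name and creates the directory exclusively
  # with mode 0700, so its location cannot be predicted and squatted.
  cache="${XDG_CACHE_HOME:-$HOME/.cache}/cargo-builds"
  mkdir -p "$cache"
  dir=$(mktemp -d "$cache/build.XXXXXXXX") || return 1
  mkdir -p "$ws/.cargo"
  printf '%s\n' "$dir" > "$record"
  echo "$dir"
}
```

This still ignores --target-dir: a tool passing its own --target-dir bypasses the record entirely, and two tools sharing the recorded directory would still block each other.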