I feel we should set --compress-debug-sections=zstd in the link args by default for Linux Rust targets. I would like others' thoughts on this before I raise a proper MCP.
Reasoning
Currently the target folder can get very big for large Rust projects, and every once in a while this pops up on Reddit as well.
By setting --compress-debug-sections=zstd in the link args we can clearly see the size difference even for a hello world program, and it feels like simple low-hanging fruit.
RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zstd" cargo build --target-dir=compress_zstd_del && du -sh compress_zstd_del
Compiling hello_world v0.1.0 (/mnt/g/rust_projects/hello_world)
Finished dev [unoptimized + debuginfo] target(s) in 1.20s
1.5M compress_zstd_del
cargo build --target-dir=base_del && du -sh base_del
Compiling hello_world v0.1.0 (/mnt/g/rust_projects/hello_world)
Finished dev [unoptimized + debuginfo] target(s) in 1.60s
4.7M base_del
Not only is there a reduction in folder size; in my WSL2 environment I also see wall-time improvements while compiling hello world.
RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zstd" hyperfine --prepare "cargo clean --target-dir=compress_zstd" "cargo build --target-dir=compress_zstd" -w 3
Benchmark 1: cargo build --target-dir=compress_zstd
Time (mean ± σ): 1.247 s ± 0.070 s [User: 0.157 s, System: 0.383 s]
Range (min … max): 1.122 s … 1.319 s 10 runs
hyperfine --prepare "cargo clean --target-dir=base" "cargo build --target-dir=base" -w 3
Benchmark 1: cargo build --target-dir=base
Time (mean ± σ): 1.651 s ± 0.027 s [User: 0.152 s, System: 0.470 s]
Range (min … max): 1.599 s … 1.693 s 10 runs
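For what it's worth, projects that want this today don't have to wait on a default change: the flag can go in a per-project .cargo/config.toml instead of RUSTFLAGS. A sketch (the target triple here is an example and would differ per host):

```shell
# Sketch: per-project opt-in via .cargo/config.toml instead of RUSTFLAGS.
# The target triple is an example; adjust it for your host.
mkdir -p .cargo
cat >> .cargo/config.toml <<'EOF'
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-Wl,--compress-debug-sections=zstd"]
EOF
```

With that in place a plain `cargo build` picks the flag up, and you avoid the full rebuilds that changing the RUSTFLAGS environment variable triggers.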
Other Misc things
On Windows the only resource I can find relating to this is "Shrink my Program Database (PDB) file" on the C++ Team Blog, but I am not seeing any effect from passing /PDBCompress in the link args. Maybe others who are well versed there can help.
I don't own a macOS machine, so others can pitch in here.
I'd love to see this happen. The best way to make a case for it would be to document which debuggers (or other tools relying on debug information) do and don't support it, in which versions, and thus which debuggers/tools we'd be breaking if we did this. If this means breaking decade-old debuggers in favor of making the experience better for everyone else, that's an easy case to make. If it means breaking quite recent debuggers that are still in contemporary use, that's probably not going to happen. If it's somewhere in between, we'll need to make a judgement call.
Why would this be? I expect a modern Gen 3 or Gen 4 NVMe SSD would be more than quick enough that compression would be the bottleneck rather than the SSD (or, more realistically in this case, something else in the linker entirely would be the bottleneck).
On an actual hard drive, sure, compression would help, but I doubt you are using one?
Maybe with a faster compressor like LZO those results would make sense, but I have found zstd quite slow except at very low compression levels. Is that what is going on here, perhaps?
Also, is that only on WSL or on native Linux too? The former would perhaps indicate something strange with the IO performance on the Windows host.
It's also possible that the way the format works means it gives 50%+ compression even on low levels. (Like how rust code easily gets about 75% compression even on "fastest" zip.)
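The speed/ratio question at different levels can be checked directly with zstd's built-in benchmark mode. A sketch — sample.bin stands in for a real object file with debug sections; the highly repetitive input generated here is only for illustration and will compress far better than real DWARF data:

```shell
# Sketch: benchmark zstd at several compression levels on local hardware.
# sample.bin is a placeholder; substitute a real .o or binary with debug info.
yes 'fn main() { println!("hello"); }' | head -c 8M > sample.bin
zstd -b1 -e6 sample.bin   # benchmark levels 1 through 6, printing ratio and MB/s
```

Comparing the MB/s column at level 1 against the higher levels shows where the throughput cliff is on a given machine.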
If you have enough RAM all the data just goes to the page cache, not immediately to disk. We don't fsync anything and I hope we're not triggering auto_da_alloc heuristics too often. So you shouldn't be IO-limited at all.
I guess it just means the linker has to shovel fewer bytes from A to B and that means it finishes faster.
Unless he's using the 9p remote filesystem on WSL (anything accessing the c drive). That would indeed be terribly slow.
The question is how it affects runtime performance. If debug parsers have to decompress the data into memory instead of mmapping it, that could hurt perf for anything that prints stack traces.
I have no thoughts on the solution, but I feel the motivation itself is lacking. When discussing target/ size, we need to discuss why it's big so we focus on the right problems that will make a meaningful difference. Without that context, I see this only being about reducing the size of what gets distributed, which for me means I would only want this for release builds so it doesn't impact my debug build times.
Which debuggers should I check? Is checking the Rust-provided debuggers (i.e. rust-gdb and rust-lldb, present in ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/) enough?
One data point: binutils and gdb in Ubuntu 22.04 LTS don't support zstd. Based on the package dependencies, it looks like dwarfdump in Ubuntu 23.10 doesn't support zstd either.
Even with cargo GC this is helpful, as the size of the libs themselves is smaller.
I don't think this is a bad solution for addressing target folder sizes. If I am not mistaken, the rustc driver also uses compressed debug sections. And there are projects that can legitimately make use of this even for debug builds: for example, with the arrow-rs project I am able to save 300 MB of space on a library-only build without tests.
Ouch, my condolences! But that is not really a realistic scenario for most developers any more. Consider also measuring on a machine with an NVMe SSD to get the other extreme. My intuition is that the overhead of compression would be a slowdown, due to taking CPU cycles from other parts of the build process. The disk interface on a modern PCIe Gen 4 SSD is insanely fast and I doubt it is the bottleneck.
I see, but we still need to look at both ends of the spectrum to determine what is realistic across the world and what the performance impact would be.
Yes, for HDDs for sure. But just measure some different scenarios (or get help doing it if you don't have access to such hardware yourself). That will make a much more solid case when you submit the MCP.
Also consider measuring what the effect is in Github CI on a large rust project with this change. In my experience CI nodes are pretty anemic when it comes to CPU, few CPU cores. But storage seems not that much slower. And CI is also an important use case.
That seems like a showstopper for making this default. Many developers I know of (myself included) use Ubuntu LTS or some other enterprise distro at work. Until tooling in Ubuntu LTS and whatever the relevant version of Red Hat Enterprise Linux is has support I don't see this going anywhere.
(At home I run Arch Linux, so that wouldn't be affected, but yeah I have to use Ubuntu LTS at work.)
No, rust-gdb is a wrapper around the host gdb, so until that has support as well, you can't. I don't know if the plan for rust-lldb is to ship lldb or wrap the host one.
And there is still all the other host tooling like valgrind and perf to consider.
No I'm not confusing them. The linker needs to be able to produce a binary with compressed debug info, and the debuggers need to be able to consume it. Thus both sides need proper support.
If the support isn't there in LTS/Enterprise distros at this time, I don't see this suggested change going anywhere for now.