Pre-MCP: set --compress-debug-sections=zstd as the default for Linux

I feel we should set --compress-debug-sections=zstd in the link args by default for Linux Rust targets. I would like others' thoughts on this before I raise a proper MCP.

Reasoning

Currently the target folder can get very big for large Rust projects, and every once in a while this pops up on Reddit as well.

By setting --compress-debug-sections=zstd in the link args we can clearly see the size difference even for a hello-world program, and it feels like simple low-hanging fruit.

RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zstd" cargo build  --target-dir=compress_zstd_del && du -sh compress_zstd_del
   Compiling hello_world v0.1.0 (/mnt/g/rust_projects/hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 1.20s
1.5M    compress_zstd_del

cargo build  --target-dir=base_del && du -sh base_del
   Compiling hello_world v0.1.0 (/mnt/g/rust_projects/hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 1.60s
4.7M    base_del
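
As a sanity check (paths assume the default cargo layout and the hello_world binary from the runs above), readelf can confirm that the .debug_* sections really ended up compressed:

# Compressed sections carry the "C (compressed)" flag and should be much smaller
# than their counterparts in the uncompressed build.
readelf -S --wide compress_zstd_del/debug/hello_world | grep -i debug
readelf -S --wide base_del/debug/hello_world | grep -i debug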

Even Go has done the same thing.

Not only is there a reduction in folder size; in my WSL2 environment I see wall-time improvements as well while compiling hello world.

RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zstd" hyperfine --prepare "cargo clean --target-dir=compress_zstd" "cargo build  --target-dir=compress_zstd" -w 3
Benchmark 1: cargo build  --target-dir=compress_zstd
  Time (mean ± σ):      1.247 s ±  0.070 s    [User: 0.157 s, System: 0.383 s]
  Range (min … max):    1.122 s …  1.319 s    10 runs

hyperfine --prepare "cargo clean --target-dir=base" "cargo build  --target-dir=base" -w 3
Benchmark 1: cargo build  --target-dir=base
  Time (mean ± σ):      1.651 s ±  0.027 s    [User: 0.152 s, System: 0.470 s]
  Range (min … max):    1.599 s …  1.693 s    10 runs

Other Misc things

  • On Windows the only resource I can find relating to this is Shrink my Program Database (PDB) file - C++ Team Blog, but I am not seeing any effect from passing /PDBCompress in the link args. Maybe others who are well versed there can help.
  • I don't own a macOS machine, so others can pitch in here.
4 Likes

I'd love to see this happen. The best way to make a case for it would be to document which debuggers (or other tools relying on debug information) do and don't support it, in which versions, and thus which debuggers/tools we'd be breaking if we did this. If this means breaking decade-old debuggers in favor of making the experience better for everyone else, that's an easy case to make. If it means breaking quite recent debuggers that are still in contemporary use, that's probably not going to happen. If it's somewhere in between, we'll need to make a judgement call.

3 Likes

Why would this be? I expect a modern Gen 3 or Gen 4 NVMe SSD would be more than quick enough that compression would be the bottleneck rather than the SSD (or, more realistically in this case, something else entirely in the linker would be the bottleneck).

On an actual hard drive, sure, compression would help, but I doubt you are using one?

Maybe with a faster compression algorithm like LZO those results would make sense, but I have found zstd quite slow except at very low compression levels. Is that perhaps what is going on here?

Also, is that only on WSL or on native Linux too? The former would perhaps indicate something strange with the IO performance on the Windows host.

It's also possible that the way the format works means it gives 50%+ compression even at low levels. (Like how Rust code easily gets about 75% compression even on "fastest" zip.)

1 Like

If you have enough RAM all the data just goes to the page cache, not immediately to disk. We don't fsync anything and I hope we're not triggering auto_da_alloc heuristics too often. So you shouldn't be IO-limited at all. I guess it just means the linker has to shovel fewer bytes from A to B and that means it finishes faster.

Unless he's using the 9p remote filesystem on WSL (anything accessing the c drive). That would indeed be terribly slow.

The question is how it affects runtime performance. If debug parsers have to decompress the data into memory instead of mmapping it, that could hurt perf if anything prints stack traces.
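
One way to put a rough number on that would be to time backtrace printing against binaries built with and without the flag, e.g. (just a sketch; "panicker" stands for any small program that panics, and the paths are placeholders):

# Compare how long printing a full backtrace takes for the zstd-compressed and the
# uncompressed build. --ignore-failure because the program exits non-zero on panic.
hyperfine --ignore-failure \
  "RUST_BACKTRACE=full ./compress_zstd_del/debug/panicker" \
  "RUST_BACKTRACE=full ./base_del/debug/panicker"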

1 Like

I have no thoughts on the solution, but I feel the motivation itself is lacking. When discussing target/ size, we need to discuss why it's big so we focus on the right problems that will make a meaningful difference. Without that context, I see this only being about reducing the size of what gets distributed, which for myself means I would only want this for release builds so it doesn't impact my debug build times.

For example, possible scenarios

Sorry for the late reply.

I am getting the perf numbers on a dev drive backed by an HDD.

The perf improvement is most likely from the time taken to write the data to disk. I don't own a Linux machine, so I had to use WSL.

1 Like

Sorry for the late reply,

Which debuggers should I check? Is checking the Rust-provided debuggers (i.e. rust-gdb and rust-lldb, present in ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/) enough?
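
Something like this quick functional check is what I had in mind (reusing the hello-world binary from earlier; this assumes the host gdb/lldb are actually installed):

# If the host gdb can't handle zstd it will error out or only show non-debugging symbols.
gdb -batch -ex 'info functions main' compress_zstd_del/debug/hello_world
# Newer gdb builds may also mention zstd support in their configuration output.
gdb --configuration | grep -i zstd
# Same functional check for lldb.
lldb -b -o 'image lookup -n main' compress_zstd_del/debug/hello_world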

One data point: binutils and gdb in Ubuntu 22.04 LTS don't support zstd. Based on the package dependencies, it looks like dwarfdump in Ubuntu 23.10 doesn't support zstd either.

Edit: based on zstd compressed debug sections | MaskRay, support for zstd is very recent. It won't really be widespread for a few years.

1 Like

Consider checking other common tools that consume debug data, such as:

  • Valgrind
  • perf --call-graph=dwarf (no idea if perf works under WSL)

There are probably other debuggers and profilers, but those are the ones that come to mind (apart from gdb and lldb).
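
Quick smoke tests for those could reuse the hello-world binary from earlier (exact output will vary by version; this only shows whether the tools can still read the debug info at all):

# Valgrind needs the debug info to report file/line information.
valgrind --tool=memcheck compress_zstd_del/debug/hello_world
# DWARF-based call graphs in perf also have to read the debug sections.
perf record --call-graph=dwarf compress_zstd_del/debug/hello_world
perf report --stdio | head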

1 Like

Sorry for the late reply,

To be honest, fix: stop emitting .debug_pubnames and .debug_pubtypes was the motivation for me to write this post.

Even with cargo GC this is helpful, as the size of the libs themselves is smaller.

I don't think this is a bad solution for addressing target folder sizes. If I am not mistaken, the rustc driver also uses compressed debug sections. Also, there are projects which can legitimately make use of this even for debug builds. For example,

with the arrow-rs project I am able to save 300 MB of space for a library-only build without tests.

$ RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zstd" cargo build --lib  --target-dir=compress_zstd_del && du -sh compress_zstd_del
1.8G    compress_zstd_del
$ cargo build --lib --target-dir=base_del && du -sh base_del
2.1G    base_del
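
To tie this back to the question of why target/ is big, a rough breakdown of the two trees (directory names are just the default cargo layout) shows which part actually shrinks:

# Compare the main subdirectories of the two target dirs.
du -sh base_del/debug/deps compress_zstd_del/debug/deps
du -sh base_del/debug/build compress_zstd_del/debug/build
du -sh base_del/debug/incremental compress_zstd_del/debug/incremental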

Ouch, my condolences! But that is not really a realistic scenario for most developers any more. Consider also measuring on a machine with an NVMe SSD to get the other extreme. My intuition is that the overhead of compression would be a slowdown due to taking CPU cycles from other parts of the build process. The disk interface on a modern PCIe Gen 4 SSD is insanely fast and I doubt it is the bottleneck.

I do think this is a realistic scenario, because in my country a lot of students end up buying HDD-based machines because SSD ones are pretty expensive.

Also, modern processors are pretty fast; the CPU overhead is lower than writing to disk. For example, mine is a 10th-gen i5 processor.

2 Likes

rust-lld has support for it.

$ ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld/ld.lld --help | grep compress
  --compress-debug-sections=[none,zlib,zstd]
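
So one way to get this today without relying on the system GNU ld would be to link through lld (a sketch; it assumes clang and lld are installed, and uses the system lld rather than the bundled rust-lld, but the flag support is the same):

# Drive the link through clang + lld so --compress-debug-sections=zstd is available
# even when the distro's GNU ld is too old for it.
RUSTFLAGS="-C linker=clang -C link-arg=-fuse-ld=lld -C link-arg=-Wl,--compress-debug-sections=zstd" \
    cargo build --target-dir=compress_zstd_lld && du -sh compress_zstd_lld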

I see, but we still need to look at both ends of the spectrum to determine what is realistic across the world and what the performance impact would be.

Yes, for HDDs for sure. But just measure some different scenarios (or get help doing it if you don't have access to such hardware yourself). That will make a much more solid case when you submit the MCP.

Also consider measuring what the effect is in GitHub CI on a large Rust project with this change. In my experience CI nodes are pretty anemic when it comes to CPU (few cores), but storage doesn't seem that much slower. And CI is also an important use case.

That seems like a showstopper for making this the default. Many developers I know of (myself included) use Ubuntu LTS or some other enterprise distro at work. Until the tooling in Ubuntu LTS and whatever the relevant version of Red Hat Enterprise Linux is has support, I don't see this going anywhere.

(At home I run Arch Linux, so that wouldn't be affected, but yeah I have to use Ubuntu LTS at work.)

Not exactly a showstopper, since we can use zlib instead of zstd. Or once rust-lld becomes the default linker we can use zstd as the default.
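
The zlib variant is the same flag with a different value, e.g.:

# zlib-compressed debug sections, which older binutils/gdb already understand.
RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib" \
    cargo build --target-dir=compress_zlib_del && du -sh compress_zlib_del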

No, rust-gdb is a wrapper around the host gdb, so until that has support as well, you can't. I don't know if the plan for rust-lld is to ship it or wrap the host one.

And there is still all the other host tooling like valgrind and perf to consider.

I think you are confusing rust-lldb with rust-lld.

rust-lldb is the debugger

~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rust-lldb --help
lldb not found! Please install it.

rust-lld is the linker.

$ ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/rust-lld --help
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead

No I'm not confusing them. The linker needs to be able to produce a binary with compressed debug info, and the debuggers need to be able to consume it. Thus both sides need proper support.

If the support isn't there in LTS/Enterprise distros at this time, I don't see this suggested change going anywhere for now.