Several non-trivial (above 100k LOC) Rust projects I'm working on or have worked on before all have some kind of compile-time or bloat issue. The compile time is almost tolerable, but besides time, there is one common problem: they all generate very large debug files, for example, a PDB file larger than 1 GB.
I suspect the real cause of the debug info bloat is that there are many very long type names. My projects, and I assume many Rust projects, use very complex, nested types to model the problem, since expressing zero-cost abstractions this way is a recommended pattern (for example, iterators and futures). However, this approach tends to create very long type names when compiling.
I used a PDB parser to read the 1 GB PDB file I mentioned above and did some analysis. It turns out that type names take up 56% of that PDB file by byte size. There is not a single very long name, but hundreds of thousands of very long type names. In that project, I have already done a lot of trait boxing to avoid long type names.
Very long type names may cause the linker to fail, increase compile time, and bloat the compiler's debug output substantially. Long type names are not useful at all when debugging. In my opinion, this is an important and practical issue for Rust, because it directly hinders the most common Rust coding and architecture style.
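To make the growth concrete, here is a minimal sketch that measures how the name of an iterator type grows with each adaptor. It uses `std::any::type_name`, whose output is best-effort and not guaranteed to match the names written into PDB or DWARF, so treat it only as an approximation of the shape of the problem. The helper names `name_of` and `chain_name_lengths` are made up for this illustration.

```rust
use std::any::type_name;

// Returns the compiler-generated name of the value's type.
fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// Builds iterator chains with 0 to 4 adaptors and returns
// (number of adaptors, length of the type name in bytes) for each.
fn chain_name_lengths() -> Vec<(usize, usize)> {
    let data = vec![1u32, 2, 3];
    let a = data.iter();
    let b = data.iter().map(|x| x + 1);
    let c = data.iter().map(|x| x + 1).filter(|x| x % 2 == 0);
    let d = data.iter().map(|x| x + 1).filter(|x| x % 2 == 0).enumerate();
    let e = data
        .iter()
        .map(|x| x + 1)
        .filter(|x| x % 2 == 0)
        .enumerate()
        .skip(1);
    vec![
        (0, name_of(&a).len()),
        (1, name_of(&b).len()),
        (2, name_of(&c).len()),
        (3, name_of(&d).len()),
        (4, name_of(&e).len()),
    ]
}
```

Every adaptor wraps the full name of the type below it and adds its own path (plus a closure path where there is one), so the length only goes up. With futures and closures nested a few dozen levels deep, names of several kilobytes are not surprising.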
So I suggest adding some form of compiler configuration: for type names exceeding a given threshold, only generate names for the outer part of the type, and use an opaque ID or hash for the inner part, so that very long type names are never created at any compilation stage.
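As a rough illustration of the idea (not how rustc names types today, and not a proposal for the exact format), the sketch below keeps the outermost type constructor and replaces everything between the first `<` and the last `>` with a hash once the name exceeds a threshold. The function name `shorten_type_name`, the `#` marker, and the use of `DefaultHasher` are all assumptions made for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: if `name` is longer than `max_len` bytes, keep the
// outer type constructor and replace its generic arguments with a 64-bit hash.
fn shorten_type_name(name: &str, max_len: usize) -> String {
    if name.len() <= max_len {
        return name.to_string();
    }
    match (name.find('<'), name.rfind('>')) {
        (Some(open), Some(close)) if open < close => {
            // Hash the inner part so that distinct types still get distinct names
            // (up to hash collisions).
            let mut hasher = DefaultHasher::new();
            name[open + 1..close].hash(&mut hasher);
            format!(
                "{}<#{:016x}>{}",
                &name[..open],
                hasher.finish(),
                &name[close + 1..]
            )
        }
        // No generic arguments to elide: leave the name unchanged in this sketch.
        _ => name.to_string(),
    }
}
```

A real implementation would more likely work on the type structure than on the rendered string, and could keep more than one outer level. The point is only that the emitted name has a bounded length while staying unique enough to tell types apart.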
If I enable more optimization in the debug build, the PDB size and executable size shrink a lot starting from O1, but the all_name_size/pdb_size ratio stays the same.
If I count any name larger than 1000 bytes as a large name (5-10 lines in a typical console window), then for that 928 MB PDB, large names take 370 MB (40%) and all names together take 540 MB (59%).
I also printed the name size histogram of the large names:
Maybe PDB and DWARF need a representation for compound strings ... of course, then you end up worrying about the billion laughs attack, only it might not even be (in fact, probably isn't) an attack.
DWARF supports compressed debug info. There doesn't seem to be a way to enable it in cargo or even rustc though. That seems like a worthwhile thing to implement or at least experiment with. I have no idea if PDB supports this.
Another option is to enable compression at the file system level (btrfs, NTFS and a few more support this).
Rust's "legacy" symbol names already have a hash, and "v0" format supports backreferences, which is a form of compression.
But I'd like the problem to be tackled more at the source. Instead of generating tons of data to process and compress, generate less!
Debuginfo of zero-cost abstractions isn't zero-cost
It's preserved in full fidelity, but it's practically useless when the code for it compiles down to a single instruction or nothing. It's supposed to help debugging, but it has a net-negative value for debugging: it's tedious to step into multiple layers of tiny wrapper functions, like the < operator going through the PartialOrd trait. Sometimes std uses specialization traits, which add a ton of abstract, indirect boilerplate to the debug info, only for the code behind it to be removed.
I wish I could just completely discard debuginfo for all inlineable code. Having that debuginfo is worse than not having it.
The overly detailed, excessively inlined debuginfo also destroys code attribution in godbolt. Almost every line is technically from core, and godbolt isn't showing which assembly the lines of code I wrote compiled to. I was shocked when I compared that to C++, which doesn't attribute STL templates to its standard library, so godbolt has a 1:1 mapping between every C++ source code line and its assembly. Rust has maybe 1 in 10 lines working, and they're usually useless ones like the function prologue.
Automatically de-genericize code
The most bloated generic code usually has lots of unused parameters. Every method of a type inherits all of its parameters, but not every method uses all of them. When the types are nested (iterators, futures, closures) every needlessly varying generic argument multiplies the cost.
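A small sketch of what this looks like in practice. The type `Tagged` and its fields are invented for this example. The method `label_len` never touches `T` or `F`, but because it lives in a generic `impl`, it is still a separate generic item for each `(T, F)` pair it is used with.

```rust
// Hypothetical type with two generic parameters.
struct Tagged<T, F> {
    label: String,
    value: T,
    callback: F,
}

impl<T, F: Fn(&T) -> String> Tagged<T, F> {
    // Uses neither `T` nor `F`, yet inherits both parameters from the impl,
    // so its name and debuginfo vary with every closure type passed in.
    fn label_len(&self) -> usize {
        self.label.len()
    }

    // Genuinely depends on both parameters.
    fn render(&self) -> String {
        (self.callback)(&self.value)
    }
}
```

Since every closure has its own unique type, two `Tagged` values that differ only in their callback already give `label_len` two distinct instantiations with identical bodies.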
People sometimes fix it by hand by wrapping fragments of code in local functions with fewer (or zero) generic type arguments, but I wish the compiler could do that automatically.
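For reference, the manual fix usually looks like the sketch below: a thin generic wrapper converts its argument and forwards to a non-generic inner function. The function `count_words` is a made-up example, but the same shape is a common pattern in Rust code that takes `AsRef` arguments.

```rust
// Generic entry point: only this thin shim is instantiated per caller type `S`.
fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Non-generic inner function: it has a single instantiation no matter
    // how many different `S` types call the wrapper.
    fn inner(text: &str) -> usize {
        text.split_whitespace().count()
    }
    inner(text.as_ref())
}
```

It can be called as `count_words("a b c")` or `count_words(String::from("a b c"))`; both go through the same `inner`. The cost is that the author has to spot the opportunity and restructure the code by hand, which is exactly the part I wish the compiler did.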