Compiler should avoid generating very long type names

Some non-trivial (over 100k LOC) Rust projects that I'm working on, or have worked on before, all have some kind of compile-time or bloat issue. The compile time is almost tolerable, but beyond time there is one common problem: they all generate very large debug files, for example a PDB file larger than 1 GB.

I suspect the real cause of the debug-info bloat is the large number of very long type names. My project, and I assume many Rust projects, uses complex, deeply nested types to model the problem and express zero-cost abstractions, which is a recommended pattern (for example, iterators and futures). However, this approach tends to produce very long type names during compilation.
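To make the growth concrete, here is a small sketch (not from the original post) that prints the compiler's name for an iterator chain using `std::any::type_name`. That function is documented as best-effort diagnostic output, so the exact text differs between compiler versions, but each adapter visibly wraps the previous type in its generic arguments. `type_name_of` is a hypothetical helper.

```rust
use std::any::type_name;

// Hypothetical helper: returns the compiler's (best-effort) name for the type of `_value`.
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let data = vec![1u32, 2, 3, 4];

    // One adapter: roughly Map<Iter<u32>, {closure}>.
    let one = data.iter().map(|x| x + 1);
    // Three adapters: every layer nests the previous type inside its own generic arguments.
    let three = data.iter().map(|x| x + 1).filter(|x| x % 2 == 0).map(|x| x * 10);

    let short = type_name_of(&one);
    let long = type_name_of(&three);
    println!("{} bytes: {}", short.len(), short);
    println!("{} bytes: {}", long.len(), long);
}
```

Every closure also contributes its own path to the name, so longer chains (or futures built from many combinators) grow quickly.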

I used a PDB parser to read the 1 GB PDB files mentioned above and did some analysis. It turns out that type names take up 56% of that PDB file's size in bytes. It is not a single very long name, but hundreds of thousands of very long type names. In that project, I have already done a lot of trait-object boxing to avoid long type names.
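A minimal sketch of the boxing workaround, assuming an iterator pipeline like the one above. `pipeline` and `pipeline_boxed` are hypothetical names; the point is that code holding the boxed version only ever sees `Box<dyn Iterator<Item = u32>>`, so the nesting stops propagating into outer types.

```rust
use std::any::type_name;

// Hypothetical helper: the compiler's (best-effort) name for the type of `_value`.
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

// Opaque return type: the full adapter nesting is still part of the concrete type.
fn pipeline(data: &[u32]) -> impl Iterator<Item = u32> + '_ {
    data.iter().map(|x| x + 1).filter(|x| x % 2 == 0).map(|x| x * 10)
}

// Boxed trait object: callers see only `Box<dyn Iterator<Item = u32>>`.
fn pipeline_boxed(data: &[u32]) -> Box<dyn Iterator<Item = u32> + '_> {
    Box::new(pipeline(data))
}

fn main() {
    let data = [1u32, 2, 3, 4];
    let concrete = pipeline(&data);
    let boxed = pipeline_boxed(&data);

    println!("concrete: {} bytes", type_name_of(&concrete).len());
    println!("boxed:    {} bytes: {}", type_name_of(&boxed).len(), type_name_of(&boxed));

    // Both versions yield the same items; only the static type differs.
    println!("{:?}", boxed.collect::<Vec<_>>()); // prints [20, 40]
}
```

Boxing does not make the inner type disappear: it still exists behind the vtable and still gets its own name, and every `next()` call goes through dynamic dispatch. It only keeps the long name from being copied into every type that contains the iterator.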

Very long type names can cause the linker to fail, increase compile time, and substantially bloat the compiler's debug output. Long type names are not useful at all when debugging. In my opinion, this is an important and practical issue for Rust, because it directly hinders the most common Rust coding and architecture style.

So I suggest adding some form of compiler configuration: for type names exceeding a given threshold, only generate the name for the outer part of the type and use an opaque ID or hash for the inner part, so that very long type names are never created at any compilation stage.
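A rough sketch of the proposed abbreviation, written as an ordinary function over a type-name string. `abbreviate_type_name` and the `Outer<#hash>` format are hypothetical, not an existing rustc option. `DefaultHasher` is used only for illustration: its output is not guaranteed to be stable across Rust releases, so a real implementation would need a stable hash (and would likely work on the type itself rather than its printed name).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: names longer than `max_len` bytes keep only their outer
/// path, and the generic arguments are replaced by a hash of the full name.
fn abbreviate_type_name(name: &str, max_len: usize) -> String {
    if name.len() <= max_len {
        return name.to_string();
    }
    // Outer part: everything before the first generic argument list.
    let outer = name.split('<').next().unwrap_or(name);
    // Hash the full name so different inner types still get different names
    // (barring hash collisions).
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    format!("{}<#{:016x}>", outer, hasher.finish())
}

fn main() {
    // F1 and F2 stand in for closure types.
    let long = "core::iter::adapters::map::Map<core::iter::adapters::filter::Filter<core::slice::iter::Iter<u32>, F1>, F2>";
    // Over the threshold: prints something like `core::iter::adapters::map::Map<#...>`.
    println!("{}", abbreviate_type_name(long, 64));
    // Under the threshold: returned unchanged.
    println!("{}", abbreviate_type_name("alloc::string::String", 64));
}
```

Hashing the whole name (rather than numbering types in order of appearance) keeps the abbreviation deterministic for a given input, which matters if separately compiled crates must agree on the same name.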

Similar questions/issues:


What level of debug information are you using? Is it less bad if you use a lower level?

Exponential type names also led me to this iterator change:


I'm using the default config for a debug-mode build, with unwinding on panic disabled.

I did more experiments on a project:

If I enable more optimization in the debug build, the PDB size and executable size drop a lot, starting from O1, but the all_name_size/pdb_size ratio stays about the same.

config: debug, PDB: 928 MB (59% names), EXE: 51 MB

config: O1 + debug, PDB: 605 MB (62% names), EXE: 22 MB

config: O2 + debug, PDB: 596 MB (58% names), EXE: 25 MB

config: O3 + debug, PDB: 570 MB (59% names), EXE: 20 MB

If a name larger than 1000 bytes counts as a large name (about 5 to 10 lines in a typical console window), then for that 928 MB PDB, large names take 370 MB (40%) and all names take 540 MB (59%).
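The large-name split above is simple to reproduce once the names have been extracted. This sketch assumes the names are already available as strings (the PDB parsing step is not shown), and `name_stats`/`NameStats` are hypothetical names. It applies the same "larger than 1000 bytes" rule.

```rust
/// Totals for a set of type names, split at a "large name" threshold in bytes.
#[derive(Debug, PartialEq)]
struct NameStats {
    total_bytes: usize,
    large_bytes: usize,
    large_count: usize,
}

fn name_stats<'a>(names: impl IntoIterator<Item = &'a str>, threshold: usize) -> NameStats {
    let mut stats = NameStats { total_bytes: 0, large_bytes: 0, large_count: 0 };
    for name in names {
        stats.total_bytes += name.len();
        // Strictly larger than the threshold counts as large.
        if name.len() > threshold {
            stats.large_bytes += name.len();
            stats.large_count += 1;
        }
    }
    stats
}

fn main() {
    // Made-up stand-ins for names read from a PDB; not real data.
    let big = "x".repeat(1500);
    let names = vec!["u32", "alloc::string::String", big.as_str()];
    let stats = name_stats(names, 1000);
    // prints NameStats { total_bytes: 1524, large_bytes: 1500, large_count: 1 }
    println!("{:?}", stats);
    let share = 100.0 * stats.large_bytes as f64 / stats.total_bytes as f64;
    println!("large names: {:.1}% of name bytes", share);
}
```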

I also printed a histogram of the sizes of the large names:

# Number of samples = 78616
# Min = 1001
# Max = 61439
#
# Mean = 4924.477345578524
# Standard deviation = 6738.761048252779
# Variance = 45410900.465448886
#
# Each ∎ is a count of 710
#
 1001 ..  2210 [ 35536 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2210 ..  3419 [ 12313 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 3419 ..  4628 [  7925 ]: ∎∎∎∎∎∎∎∎∎∎∎
 4628 ..  5837 [  4203 ]: ∎∎∎∎∎
 5837 ..  7046 [  3615 ]: ∎∎∎∎∎
 7046 ..  8255 [  4155 ]: ∎∎∎∎∎
 8255 ..  9464 [  1923 ]: ∎∎
 9464 .. 10673 [  1051 ]: ∎
10673 .. 11882 [  1253 ]: ∎
11882 .. 13091 [  1157 ]: ∎
13091 .. 14300 [   422 ]:
14300 .. 15509 [   750 ]: ∎
15509 .. 16718 [   705 ]:
16718 .. 17927 [    58 ]:
17927 .. 19136 [   241 ]:
19136 .. 20345 [   153 ]:
20345 .. 21554 [   122 ]:
21554 .. 22763 [   234 ]:
22763 .. 23972 [   215 ]:
23972 .. 25181 [   253 ]:
25181 .. 26390 [   377 ]:
26390 .. 27599 [   420 ]:
27599 .. 28808 [   250 ]:
28808 .. 30017 [    37 ]:
30017 .. 31226 [   137 ]:
31226 .. 32435 [    43 ]:
32435 .. 33644 [    59 ]:
33644 .. 34853 [   175 ]:
34853 .. 36062 [   197 ]:
36062 .. 37271 [    95 ]:
37271 .. 38480 [     8 ]:
38480 .. 39689 [    95 ]:
39689 .. 40898 [     9 ]:
40898 .. 42107 [    10 ]:
42107 .. 43316 [     6 ]:
43316 .. 44525 [     4 ]:
44525 .. 45734 [     9 ]:
45734 .. 46943 [     0 ]:
46943 .. 48152 [    24 ]:
48152 .. 49361 [    14 ]:
49361 .. 50570 [     8 ]:
50570 .. 51779 [    13 ]:
51779 .. 52988 [    79 ]:
52988 .. 54197 [    21 ]:
54197 .. 55406 [   141 ]:
55406 .. 56615 [     8 ]:
56615 .. 57824 [    14 ]:
57824 .. 59033 [     3 ]:
59033 .. 60242 [     2 ]:
60242 .. 61451 [    74 ]:
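As a quick consistency check on the histogram header, the sample count times the mean name length gives the total size of the large names. The constants below are copied from the header; reading the post's "370 MB" as MiB is an inference, not something the post states.

```rust
// Figures copied from the histogram header in the post.
const SAMPLES: f64 = 78616.0;
const MEAN_BYTES: f64 = 4924.477345578524;

fn main() {
    // Total bytes in large names = number of large names * mean name length.
    let total_bytes = SAMPLES * MEAN_BYTES;
    let mib = total_bytes / (1024.0 * 1024.0);
    println!("{:.0} bytes = {:.1} MiB", total_bytes, mib);
}
```

This comes to about 387 million bytes, roughly 369 MiB, which matches the ~370 MB of large names reported for the 928 MB PDB.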

I believe this is because even a little optimization ends up doing a lot of inlining, which can remove symbols from the binary that would otherwise need names.