Good point.
Such a test corpus should not be too hard to come by.
I personally would like to stick to A-Za-z0-9_. Others in this thread have expressed a preference for allowing symbols to be UTF-8.
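To make the restricted option concrete, here is a minimal sketch of a check that a symbol stays within A-Za-z0-9_. The function name `is_restricted_symbol` is made up for illustration and is not part of any existing tool.

```python
import re

# Hypothetical helper: accepts only symbols made up of A-Za-z0-9_.
_RESTRICTED = re.compile(r"[A-Za-z0-9_]+")

def is_restricted_symbol(symbol: str) -> bool:
    """Return True if every character of `symbol` is in A-Za-z0-9_."""
    return _RESTRICTED.fullmatch(symbol) is not None

print(is_restricted_symbol("_RN7std_xxx3fooE"))  # only ASCII letters, digits, underscore
print(is_restricted_symbol("_RN4caf\u00e9E"))    # contains a non-ASCII letter
```

A UTF-8 scheme would simply drop this check, at the cost of relying on every linker and tool in the chain to handle non-ASCII symbol names.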
This is a tricky one. Here are some examples from the reference implementation test cases:
-
_RN7std_xxx3fooITNS_3BarES1_ES2_EE:
std[xxx]::foo<(std[xxx]::Bar,std[xxx]::Bar),(std[xxx]::Bar,std[xxx]::Bar)>
-
_RN7std_xxx3fooFINMNS_4QUUXE3barINS0_3BARSEEEEE:
std[xxx]::foo<std[xxx]::QUUX::bar<std[xxx]::foo::BAR>>
-
_RNXlN7foo_xxx3BarIxEE4quuxF1_Cs_IcEE:
<i32 as foo[xxx]::Bar<i64>>::quux::{closure}'2<char>
As soon as definitions get nested, it’s almost impossible to demangle the name in your head, and compression makes it worse still. So I’d say anything Itanium-based is past human-readable except for simple cases. One can still glean some useful information from a mangled name, though, which might be all we are interested in here.
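As a rough illustration of what can still be gleaned without a real demangler, here is a heuristic sketch that pulls the length-prefixed identifiers out of a mangled name. It assumes identifiers are encoded Itanium-style as a decimal length followed by that many characters, which is what the examples above appear to do. It also assumes that a number directly followed by `_` is an index (as in `S1_`) rather than a length, so identifiers that start with an underscore would be missed. The function name is made up, and this is not the reference implementation's parser.

```python
from typing import List

DIGITS = "0123456789"

def length_prefixed_idents(mangled: str) -> List[str]:
    """Heuristically collect <decimal length><identifier> runs from `mangled`."""
    idents = []
    i = 0
    while i < len(mangled):
        if mangled[i] not in DIGITS:
            i += 1
            continue
        # Read the whole decimal number starting at position i.
        j = i
        while j < len(mangled) and mangled[j] in DIGITS:
            j += 1
        n = int(mangled[i:j])
        # Assumption: "<number>_" is an index (e.g. a back reference), not a length.
        if j < len(mangled) and mangled[j] == "_":
            i = j + 1
            continue
        # Skip zero lengths and lengths that run past the end of the string.
        if n == 0 or j + n > len(mangled):
            i = j
            continue
        idents.append(mangled[j:j + n])
        i = j + n
    return idents

for name in (
    "_RN7std_xxx3fooITNS_3BarES1_ES2_EE",
    "_RN7std_xxx3fooFINMNS_4QUUXE3barINS0_3BARSEEEEE",
    "_RNXlN7foo_xxx3BarIxEE4quuxF1_Cs_IcEE",
):
    print(length_prefixed_idents(name))
```

This recovers the identifier names but none of the nesting, generics, or back references, which is roughly the amount of information a human can pick out by eye.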
I guess a good next step would be to collect a test corpus of symbol names for trying out compression schemes.
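To sketch what trying out schemes on such a corpus might look like: the harness below compares candidate schemes by total encoded size relative to leaving the names unchanged. Everything here is an assumption for illustration. The corpus is just the three names quoted above, and `toy_scheme` is a deliberately silly placeholder, not a proposed compression scheme.

```python
from typing import Callable, Dict, Iterable

def total_bytes(corpus: Iterable[str], scheme: Callable[[str], str]) -> int:
    """Total UTF-8 size of the corpus after applying `scheme` to each name."""
    return sum(len(scheme(name).encode("utf-8")) for name in corpus)

def compare_schemes(corpus, schemes: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Map each scheme label to its size ratio against the unmodified names."""
    corpus = list(corpus)
    baseline = total_bytes(corpus, lambda name: name)
    return {label: total_bytes(corpus, fn) / baseline for label, fn in schemes.items()}

# Placeholder corpus: the three example names from this post.
corpus = [
    "_RN7std_xxx3fooITNS_3BarES1_ES2_EE",
    "_RN7std_xxx3fooFINMNS_4QUUXE3barINS0_3BARSEEEEE",
    "_RNXlN7foo_xxx3BarIxEE4quuxF1_Cs_IcEE",
]

def toy_scheme(name: str) -> str:
    # Placeholder only: shortens one hard-coded substring.
    return name.replace("7std_xxx", "S")

ratios = compare_schemes(corpus, {"identity": lambda n: n, "toy": toy_scheme})
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f}")
```

With a real corpus, each entry in `schemes` would be one candidate compression scheme, and a ratio below 1.0 would mean it produces shorter symbols overall than the baseline.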