Rustc can be "tricked" into generating exponentially large debug info


#1

After reading this topic on the D forum, I was wondering how rustc would behave when prodded in the right places. I present the following program:

#[derive(Copy, Clone, Debug)]
struct S<T, U, V>(T, U, V);

fn f<T: Copy>(t: T) -> S<T, T, T> { S(t, t, t) }

fn main() {
    let val = f(f(f(f(f(f(f(f(f(f(f(5)))))))))));
    println!("{:?}", ((((((((((val.0).0).0).0).0).0).0).0).0).0).0);
}

When compiled with rustc -g --emit asm explosion.rs, this program produces a 10 MB assembly file, most of which consists of generated function and type names. One of the shorter names looks like this:

"_ZN9explosion102f<explosion::S<explosion::S<i32, i32, i32>, explosion::S<i32, i32, i32>, explosion::S<i32, i32, i32>>>E"
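The growth can also be watched without digging through the assembly. Here is a rough sketch that prints the length of the spelled-out type name after each extra call to f. The name_len helper is something I made up for illustration, and the exact text returned by std::any::type_name is not guaranteed to be stable, so treat the numbers as indicative only:

```rust
use std::any::type_name;

#[allow(dead_code)]
#[derive(Copy, Clone, Debug)]
struct S<T, U, V>(T, U, V);

fn f<T: Copy>(t: T) -> S<T, T, T> { S(t, t, t) }

// Hypothetical helper (not part of the original program):
// length of the fully spelled-out type name of a value.
fn name_len<T>(_: &T) -> usize {
    type_name::<T>().len()
}

fn main() {
    let v1 = f(5);
    let v2 = f(v1);
    let v3 = f(v2);
    let v4 = f(v3);
    // Each level mentions the previous type three times,
    // so every length is more than triple the one before it.
    println!(
        "{} {} {} {}",
        name_len(&v1),
        name_len(&v2),
        name_len(&v3),
        name_len(&v4)
    );
}
```

Since every level repeats the previous type three times, eleven levels put roughly 3^11 copies of i32 into a single name.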

When I compile with rustc -g --emit obj explosion.rs I hit an assertion:

Assertion failed: isIntN(Size * 8 + 1, Value) && "Value does not fit in the Fixup field", file C:\bot\slave\nightly-dist-rustc-win-msvc-64\build\src\llvm\lib\Target\X86\MCTargetDesc\X86AsmBackend.cpp, line 115

This is on Windows; I’m not sure how it behaves on Linux.

Should this behavior (generating exponentially large debug symbols) be considered a bug? If so, what should be done about it?


#2

I wouldn’t call that being tricked - if you write a program involving exponentially large types, it seems to me like you’d expect debuginfo to be exponentially large. The LLVM assert isn’t great though - sounds like there are some checks missing on the rustc side.


#3

With types like PhantomData, the same thing can easily be done without the values themselves taking an exponential amount of space. For example, the following program, when compiled with rustc -g main.rs, produces a binary of about 648 MB on my machine (it also takes a long time to compile and uses a lot of memory while doing so), even though no instance of S is ever larger than a byte.

use std::marker::PhantomData;

#[derive(Copy, Clone)]
struct S<T, U, V>{
    data: T,
    pd1: PhantomData<U>,
    pd2: PhantomData<V>
}

impl<T,U,V> S<T,U,V> {
    fn new(data: T) -> S<T, U, V> {
        S { data: data, pd1: PhantomData, pd2: PhantomData }
    }
    fn compound(self) -> S<T, S<T,U,V>, S<T,U,V>> {
        S::<T, S<T,U,V>, S<T,U,V>>::new(self.data)
    }
}

fn main() {
    let val = S::<u8, (), ()>::new(5);
    // Season to taste
    let val = val
        .compound().compound().compound().compound().compound().compound().compound()
        .compound().compound().compound().compound();
    println!("{}", val.data);
}
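The "never larger than a byte" part is easy to check at a smaller depth. Below is a minimal sketch that uses the same S with only three compound() calls. The stats helper is hypothetical and exists only for this illustration, and the output format of std::any::type_name is not guaranteed:

```rust
use std::any::type_name;
use std::marker::PhantomData;
use std::mem::size_of_val;

#[allow(dead_code)]
#[derive(Copy, Clone)]
struct S<T, U, V> {
    data: T,
    pd1: PhantomData<U>,
    pd2: PhantomData<V>,
}

impl<T, U, V> S<T, U, V> {
    fn new(data: T) -> S<T, U, V> {
        S { data, pd1: PhantomData, pd2: PhantomData }
    }
    fn compound(self) -> S<T, S<T, U, V>, S<T, U, V>> {
        S::new(self.data)
    }
}

// Hypothetical helper: (runtime size in bytes, length of the type name).
fn stats<T>(v: &T) -> (usize, usize) {
    (size_of_val(v), type_name::<T>().len())
}

fn main() {
    let s0 = S::<u8, (), ()>::new(5);
    let s3 = s0.compound().compound().compound();
    // The runtime size stays at one byte; only the type name grows.
    println!("{:?} {:?}", stats(&s0), stats(&s3));
    println!("{}", s3.data);
}
```

The size component stays at 1 while the name length more than doubles with each compound() call.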

My question is: is it really desirable to have debug symbols that are more than a few kB in length? Because no human is going to be able to use those symbols anyway.


#4

PhantomData/the actual runtime size of the types seems somewhat irrelevant to me: the types themselves contain an exponential amount of information. Of course, you are correct that most humans are unlikely to want to actually read every part of such a large type, but I could imagine tooling wanting to know non-corrupted details about types/functions.


#5

Could you explain to the average user how these types cause an exponential blow-up?

I mean, if we look at @Thiez’s latest example, there are only a very limited number of types:

  • type S0 = S<u8, (), ()>;
  • type S1 = S<u8, S0, S0>;
  • ...
  • type S11 = S<u8, S10, S10>;

That’s only ~12 different types, all in all, so I suspect that the debug information is somehow completely inlined and does not use aliases.
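To put rough numbers on that suspicion, here is a back-of-the-envelope sketch. It only counts how many S<...> occurrences appear when a type is spelled out in full, compared with how many distinct types exist. Both function names are made up for this illustration, and it says nothing about what rustc or LLVM actually emit:

```rust
// Number of S<...> occurrences when a type is spelled out in full (no aliases).
// Level 0 is S<u8, (), ()>; level n mentions level n-1 twice.
fn expanded_nodes(level: u32) -> u64 {
    if level == 0 {
        1
    } else {
        1 + 2 * expanded_nodes(level - 1)
    }
}

// Number of distinct types if each level refers to the previous one by name.
fn distinct_types(level: u32) -> u64 {
    u64::from(level) + 1
}

fn main() {
    for &level in &[0u32, 1, 2, 11] {
        println!(
            "level {:>2}: {} distinct, {} spelled out",
            level,
            distinct_types(level),
            expanded_nodes(level)
        );
    }
}
```

At level 11 that is 12 distinct types against 4095 occurrences of S in the fully spelled-out name, which is where I would expect the doubling to come from.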

Am I correct? Would it be possible to use aliases?