Rustc can be "tricked" into generating exponentially large debug info

After reading this topic on the D forum, I was wondering how rustc would behave when prodded in the right places. I present the following program:

#[derive(Copy, Clone, Debug)]
struct S<T, U, V>(T, U, V);

fn f<T: Copy>(t: T) -> S<T, T, T> { S(t, t, t) }

fn main() {
    let val = f(f(f(f(f(f(f(f(f(f(f(5)))))))))));
    println!("{:?}", ((((((((((val.0).0).0).0).0).0).0).0).0).0).0);
}

This program, when compiled with rustc -g --emit asm explosion.rs, produces a 10 MB assembly file, most of which consists of generated function and type names. One of the shorter function names looks like this:

"_ZN9explosion102f<explosion::S<explosion::S<i32, i32, i32>, explosion::S<i32, i32, i32>, explosion::S<i32, i32, i32>>>E"

When I compile with rustc -g --emit obj explosion.rs I hit an assertion:

Assertion failed: isIntN(Size * 8 + 1, Value) && "Value does not fit in the Fixup field", file C:\bot\slave\nightly-dist-rustc-win-msvc-64\build\src\llvm\lib\Target\X86\MCTargetDesc\X86AsmBackend.cpp, line 115

This is on Windows; I’m not sure how it behaves on Linux.

Should this behavior (generating exponentially large debug symbols) be considered a bug? If so, what should be done about it?
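A minimal sketch of how quickly the names in the program above grow, assuming the output of std::any::type_name is roughly representative of the names that end up in debug info (its exact format is not guaranteed by the standard library). The helpers name_len and name_lens are hypothetical and exist only for illustration; the nesting stops at depth 4 to keep the values small.

```rust
use std::any::type_name;

#[derive(Copy, Clone, Debug)]
struct S<T, U, V>(T, U, V);

fn f<T: Copy>(t: T) -> S<T, T, T> { S(t, t, t) }

// Hypothetical helper: length in bytes of the type name reported for a value.
fn name_len<T>(_: &T) -> usize {
    type_name::<T>().len()
}

// Type name lengths for nesting depths 0 through 4.
fn name_lens() -> [usize; 5] {
    let v0 = 5i32;
    let v1 = f(v0);
    let v2 = f(v1);
    let v3 = f(v2);
    let v4 = f(v3);
    [name_len(&v0), name_len(&v1), name_len(&v2), name_len(&v3), name_len(&v4)]
}

fn main() {
    for (depth, len) in name_lens().iter().enumerate() {
        println!("depth {}: type name is {} bytes long", depth, len);
    }
}
```

Each level spells out the previous level's name three times, so the length more than triples per call to f. Eleven nested calls therefore lead to very long names.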

I wouldn’t call that being tricked - if you write a program involving exponentially large types, it seems to me like you’d expect debuginfo to be exponentially large. The LLVM assert isn’t great though - sounds like there are some checks missing on the rustc side.

With types like PhantomData it can easily be done without making the actual type take an exponential amount of space. For example, the following program, when compiled with rustc -g main.rs, produces a binary of about 648 MB on my machine, even though no instance of S is ever larger than a byte. It also takes a long time to compile and uses a lot of memory while doing so.

use std::marker::PhantomData;

#[derive(Copy, Clone)]
struct S<T, U, V>{
    data: T,
    pd1: PhantomData<U>,
    pd2: PhantomData<V>
}

impl<T,U,V> S<T,U,V> {
    fn new(data: T) -> S<T, U, V> {
        S { data: data, pd1: PhantomData, pd2: PhantomData }
    }
    fn compound(self) -> S<T, S<T,U,V>, S<T,U,V>> {
        S::<T, S<T,U,V>, S<T,U,V>>::new(self.data)
    }
}

fn main() {
    let val = S::<u8, (), ()>::new(5);
    // Season to taste
    let val = val
        .compound().compound().compound().compound().compound().compound().compound()
        .compound().compound().compound().compound();
    println!("{}", val.data);
}

My question is: is it really desirable to have debug symbols that are more than a few kB in length? Because no human is going to be able to use those symbols anyway.
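A small sketch to check the claim that the values stay one byte large while the type names keep growing. It assumes std::any::type_name gives a reasonable approximation of the names emitted into debug info, which is not guaranteed. The helpers stats and measurements are hypothetical, and only three levels of compound() are used.

```rust
use std::any::type_name;
use std::marker::PhantomData;
use std::mem::size_of;

#[derive(Copy, Clone)]
struct S<T, U, V> {
    data: T,
    pd1: PhantomData<U>,
    pd2: PhantomData<V>,
}

impl<T, U, V> S<T, U, V> {
    fn new(data: T) -> S<T, U, V> {
        S { data, pd1: PhantomData, pd2: PhantomData }
    }
    fn compound(self) -> S<T, S<T, U, V>, S<T, U, V>> {
        S::new(self.data)
    }
}

// Hypothetical helper: (runtime size in bytes, type name length in bytes).
fn stats<T>(_: &T) -> (usize, usize) {
    (size_of::<T>(), type_name::<T>().len())
}

// Measurements for 0 through 3 applications of compound().
fn measurements() -> [(usize, usize); 4] {
    let v0 = S::<u8, (), ()>::new(5);
    let v1 = v0.compound();
    let v2 = v1.compound();
    let v3 = v2.compound();
    [stats(&v0), stats(&v1), stats(&v2), stats(&v3)]
}

fn main() {
    for (depth, (size, name_len)) in measurements().iter().enumerate() {
        println!("depth {}: size_of = {} byte(s), name length = {}", depth, size, name_len);
    }
}
```

The size stays constant because PhantomData is zero-sized, while each compound() mentions the previous type twice, so the name more than doubles per step.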

PhantomData/the actual runtime size of the types seems somewhat irrelevant to me: the types themselves contain an exponential amount of information. Of course, you are correct that most humans are unlikely to want to actually read every part of such a large type, but I could imagine tooling wanting to know non-corrupted details about types/functions.

Could you explain to the average user how these types cause an exponential blow-up?

I mean, if we look at @Thiez’s latest example, there are only a very limited number of types:

  • type S0 = S<u8, (), ()>;
  • type S1 = S<u8, S0, S0>;
  • ... (S2 through S10 follow the same pattern)
  • type S11 = S<u8, S10, S10>;

That’s only ~12 different types, all in all, so I suspect that somehow the Debug information is completely inlined and does not use aliases.

Am I correct? Would it be possible to use aliases?
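The gap between "about 12 types" and an exponential blow-up can be put into numbers with a rough model. The sketch below only models the length of a fully spelled-out, unqualified type name; it assumes each level is written as S<u8, A, A> with A expanded in full, and it says nothing about how rustc or LLVM actually lay out debug info. The function names expanded_len and distinct_types are hypothetical.

```rust
// Length of the fully expanded name at a given depth, assuming depth 0 is
// "S<u8, (), ()>" and every further level wraps two full copies of the
// previous name as "S<u8, A, A>".
fn expanded_len(depth: u32) -> u64 {
    let base = "S<u8, (), ()>".len() as u64;
    let wrapper = "S<u8, , >".len() as u64;
    (0..depth).fold(base, |prev, _| wrapper + 2 * prev)
}

// Number of distinct instantiations S0 through S<depth>, which is all that
// would need describing if each level could refer to the previous one by alias.
fn distinct_types(depth: u32) -> u64 {
    u64::from(depth) + 1
}

fn main() {
    for depth in [0, 1, 5, 11] {
        println!(
            "depth {}: {} distinct types, expanded name of {} bytes",
            depth,
            distinct_types(depth),
            expanded_len(depth)
        );
    }
}
```

Under this model the number of distinct types grows linearly with the depth, while a name that inlines its arguments roughly doubles at every level. That is consistent with the suspicion that the names are expanded in full rather than shared.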
