Gecko has opened a bug to disable LTO for builds by default, but they have pointed out that this comes at a great cost to binary size! I’ve inquired about the specifics and @froydnj provided some useful data. Turns out LTO drops the size of `libxul.so` from 81MB to 76MB, a hefty reduction in size!
To my knowledge LTO is useful for two things:
- Giving the compiler inlining opportunities that weren’t there before
- Stripping unused functions that otherwise may look “used”
While inlining can reduce code size, I wouldn’t expect 5MB of savings from inlining alone, so my guess is that the lion’s share of the savings here comes from stripping unused functions. That in turn, I think, may be a “bug” in rustc!
Today we compile all object files in Rust with the equivalent of `-ffunction-sections` and `-fdata-sections`, essentially meaning that everything that takes up space in an object file is placed in its own section. Each of these sections can then be considered independently by the linker via the `--gc-sections` argument (or the relevant equivalent for the platform at hand). This works out great for Rust executables and such where `--gc-sections` will eliminate lots of unneeded symbols. If you run `rustc -C link-dead-code` that’ll inhibit the `--gc-sections` argument and you can see the savings!
For example, let’s take a simple `fn main() {}` on `rustc 1.19.0 (0ade33941 2017-07-17)`. For each of these measurements I ran `strip -g` to remove debug information from the executable as well:
command | size (bytes) |
---|---|
`rustc` | 396208 |
`rustc -C link-dead-code` | 1363680 |
`rustc -C lto -O` | 370632 |
So clearly `-C link-dead-code` makes quite a difference here, which means `--gc-sections` is doing a lot of work by default: turning it off more than triples the size of the executable! Notably `--gc-sections` should be getting almost all the benefits of `-C lto` in terms of size wins. We see here that `-C lto` is a little smaller than the vanilla `rustc` invocation, but not “we just shaved off 6% of our 81MB library” smaller!
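As a rough sanity check on those numbers, here’s a small sketch that works out how much each configuration removes relative to the `-C link-dead-code` build. The helper name `saved` is made up for illustration, and keep in mind that the `-C lto -O` measurement also turns on optimizations, so this isn’t a pure LTO-vs-GC comparison.

```rust
/// Bytes removed by a build of size `smaller` relative to `baseline`.
/// (Hypothetical helper, only here to make the arithmetic explicit.)
fn saved(baseline: u64, smaller: u64) -> u64 {
    baseline - smaller
}

fn main() {
    // Stripped executable sizes in bytes, taken from the measurements above.
    let default_build: u64 = 396_208; // plain `rustc`, --gc-sections active
    let link_dead_code: u64 = 1_363_680; // `rustc -C link-dead-code`, no section GC
    let lto: u64 = 370_632; // `rustc -C lto -O`

    let gc_saved = saved(link_dead_code, default_build);
    let lto_saved = saved(link_dead_code, lto);

    println!("--gc-sections removes {} bytes", gc_saved);
    println!("-C lto -O removes     {} bytes", lto_saved);
    // Share of the LTO reduction that section GC already achieves on its own.
    println!(
        "section GC covers {:.1}% of the LTO reduction",
        100.0 * gc_saved as f64 / lto_saved as f64
    );
}
```

In other words, for this executable section GC on its own already accounts for about 97% of what the LTO build removes.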
Ok so now we’ve reached the question. If removing dead code at link time works so well for executables, why is Gecko running into this problem? My guess is that it has to do with dynamic libraries. Let’s try to emulate what Gecko is doing with a few commands. Let’s take a small Rust file:
    // `extern "C"` gives the function the C ABI that the C caller expects.
    #[no_mangle]
    pub extern "C" fn print_in_rust() {
        println!("hello!");
    }
and a small C file:
    /* Defined in the Rust staticlib. */
    void print_in_rust(void);

    void foo(void) {
        print_in_rust();
    }
Here we’ll compile the Rust code as a `staticlib` and then compile everything into a shared library to get the sizes (again stripping debuginfo as before):
rustc args | gcc args | size (bytes) |
---|---|---|
`--crate-type staticlib` | `-fPIC` | 1437272 |
`--crate-type staticlib` | `-fPIC -Wl,--gc-sections` | 1437096 |
`--crate-type staticlib -C lto -O` | `-fPIC` | 163312 |
`--crate-type staticlib -C lto -O` | `-fPIC -Wl,--gc-sections` | 163152 |
Whoa! Here we see that `--gc-sections` isn’t doing anything! The only thing giving us size savings here is `-C lto`; adding `--gc-sections` barely moves the numbers. The more we dig here the more we get into “how linkers work”, and this is no exception! Let’s go through what’s happening here:
- When creating a shared library, the linker initially exports the symbols given in object files on the command line. In this case this is our C file with the symbol `foo`.
- The symbol `foo` references the symbol `print_in_rust`, so the linker’s gotta find that.
- We’ve also provided `libfoo.a` to the linker (the Rust file compiled as a staticlib). Turns out there’s an object file in this archive with the symbol `print_in_rust`.
- The `print_in_rust` symbol, however, transitively references tons of symbols in the standard library (e.g. I/O printing functions). The standard library, however, is also in `libfoo.a` (that’s how staticlibs work).
- The linker now loads the one object file for the standard library.
At this point we’ve loaded (essentially) three object files: the C file with `foo`, the Rust crate with `print_in_rust`, and the Rust standard library. Now the standard library has tons of symbols we’re not going to ever use, like all those float parsing functions and such. We want the linker to strip all that out! Unfortunately, though, once the object file is loaded, the linker realizes it’s creating a shared library and all these symbols in the standard library are exported, so none of them can be stripped!
When we were creating an executable, the linker knew that we don’t actually need to export anything. Despite the standard library having exported symbols, none of them is literally called `main`, so they’re all still candidates to get GC’d, and as we saw above they do indeed get GC’d. In the case of a shared library, however, the linker doesn’t know that we don’t want to export the symbols in the standard library, so it ends up including all of them!
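To make the difference between the two cases concrete, here’s a toy model of section GC as a reachability walk over a symbol graph. This is only a sketch of the idea, not how any real linker is implemented, and the names `live_symbols`, `symbol_graph`, `std_io_print` and `std_parse_float` are invented for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of linker section GC: a symbol survives only if it is
/// reachable from one of the roots. `refs` maps a symbol to the symbols
/// it references.
fn live_symbols<'a>(
    refs: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> HashSet<&'a str> {
    let mut live = HashSet::new();
    let mut stack: Vec<&'a str> = roots.to_vec();
    while let Some(sym) = stack.pop() {
        // `insert` returns true only the first time we see a symbol.
        if live.insert(sym) {
            if let Some(deps) = refs.get(sym) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    live
}

/// Hypothetical symbol graph for the example: `foo` from the C file,
/// `print_in_rust` from the Rust crate, and two made-up libstd symbols.
fn symbol_graph() -> HashMap<&'static str, Vec<&'static str>> {
    let mut graph = HashMap::new();
    graph.insert("foo", vec!["print_in_rust"]);
    graph.insert("print_in_rust", vec!["std_io_print"]);
    graph.insert("std_io_print", vec![]);
    graph.insert("std_parse_float", vec![]); // never referenced by anyone
    graph
}

fn main() {
    let graph = symbol_graph();

    // What we want: only `foo` is exported, so only `foo` is a root.
    let wanted = live_symbols(&graph, &["foo"]);

    // What the linker assumes for a shared library: every exported
    // symbol is a root, including everything in the standard library.
    let all_exported: Vec<&str> = graph.keys().copied().collect();
    let actual = live_symbols(&graph, &all_exported);

    println!("only `foo` as root:   {} live symbols", wanted.len());
    println!("everything exported:  {} live symbols", actual.len());
}
```

With `foo` as the only root the unused float parsing symbol is dropped, while treating every exported symbol as a root keeps the whole graph alive, which is the situation described above.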
Ok so that’s a bit of a long-winded explanation of why I think that Gecko is seeing such massive wins from LTO. Basically the linker doesn’t know that it can strip tons of symbols from Rust and that none of those symbols should actually be exported from the standard library. There are even longer and more detailed explanations for why these symbols are exported from the standard library, but that’s perhaps a topic for another time!
I wanted to post this and ask others if they’ve encountered the same problem. We mitigate this problem with the `cdylib` crate type by passing a whitelist of symbols to export to the linker (so it knows to gc all unused libstd symbols). When you’re not producing dylibs through `rustc`, however, this can be difficult to do as you’ve gotta maintain a list of symbols separate from your source code.
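For the GNU ld / ELF case, one way such a whitelist can be handed to the linker is a version script. Here’s a minimal sketch that renders one from a list of symbol names; the function name `version_script` and the file name `exports.map` mentioned below are made up, and this isn’t necessarily what `rustc` itself does for `cdylib`.

```rust
/// Render an anonymous GNU ld version script that exports only the
/// symbols in `exported` and makes everything else local.
/// (Illustrative helper, assuming a GNU-style linker on an ELF platform.)
fn version_script(exported: &[&str]) -> String {
    let mut out = String::from("{\n  global:\n");
    for sym in exported {
        out.push_str(&format!("    {};\n", sym));
    }
    // Everything not listed above stops being exported, which lets
    // --gc-sections treat it as a candidate for removal.
    out.push_str("  local:\n    *;\n};\n");
    out
}

fn main() {
    // Hypothetical whitelist for the example library: only `foo` is public.
    print!("{}", version_script(&["foo"]));
}
```

The output could be written to a file such as `exports.map` and passed to the link step with `-Wl,--version-script=exports.map` alongside `-Wl,--gc-sections`. The downside mentioned above still applies: the list has to be kept in sync with the source by hand or by a build script.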
Have you run into this problem when integrating Rust into external libraries? Did you solve it without LTO? Curious to hear thoughts and suggestions!