Rust staticlibs and optimizing for size


#1

Gecko has opened a bug to disable LTO for builds by default, but they’ve pointed out that this comes at a great cost to binary size! I’ve inquired about the specifics and @froydnj provided some useful data. It turns out LTO drops the size of libxul.so from 81MB to 76MB, a hefty 5MB (about 6%) reduction!

To my knowledge LTO is useful for two things:

  • Giving the compiler inlining opportunities that weren’t there before
  • Stripping unused functions that otherwise may look “used”

While inlining can reduce code size, I wouldn’t expect 5MB of savings from inlining alone, so my guess is that the lion’s share of the savings here comes from stripping unused functions. That in turn, I think, may be a “bug” in rustc!

Today we compile all object files in Rust with the equivalent of -ffunction-sections and -fdata-sections, essentially meaning that everything that takes up space in an object file is placed in its own section. Each of these sections can then be considered independently by the linker via the --gc-sections argument (or the relevant equivalent for the platform at hand). This works out great for Rust executables and such where --gc-sections will eliminate lots of unneeded symbols. If you run rustc -C link-dead-code that’ll inhibit the --gc-sections argument and you can see the savings!
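To make the --gc-sections behavior concrete, here is a toy model (not how any real linker is implemented): each section is a node, references (relocations) between sections are edges, and the linker keeps only what is reachable from its roots. The section names in the usage below are made up for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of `--gc-sections`: a section survives only if it is reachable
/// from one of the root sections by following references (relocations).
/// Everything else is discarded from the output.
pub fn gc_sections<'a>(
    refs: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> HashSet<&'a str> {
    let mut kept = HashSet::new();
    let mut stack: Vec<&'a str> = roots.to_vec();
    while let Some(sec) = stack.pop() {
        // Already marked live: don't walk its references again.
        if !kept.insert(sec) {
            continue;
        }
        if let Some(targets) = refs.get(sec) {
            stack.extend(targets.iter().copied());
        }
    }
    kept
}
```

With only the entry point as a root, a section like a float-parsing routine that nothing references is dropped; -C link-dead-code roughly corresponds to skipping this pass entirely.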

For example, let’s take a simple fn main() {} on rustc 1.19.0 (0ade33941 2017-07-17). For each of these measurements I ran strip -g to remove debug information from the executable as well:

command                    size (bytes)
rustc                      396208
rustc -C link-dead-code    1363680
rustc -C lto -O            370632

So clearly --gc-sections is doing quite a bit here: disabling it with -C link-dead-code more than triples the size! Notably, the default invocation (which uses --gc-sections) should already be getting almost all of the size benefits of -C lto. We see here that -C lto is a little smaller than the vanilla rustc invocation, but only by about 25KB, which is nothing like the 5MB we shave off of our 81MB library!
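A quick check of the arithmetic, using only the byte counts from the table above (the helper name savings and the constant names are made up for illustration):

```rust
/// Bytes saved and percentage reduction going from `before` to `after`.
/// Assumes `after <= before`.
pub fn savings(before: u64, after: u64) -> (u64, f64) {
    let saved = before - after;
    (saved, saved as f64 * 100.0 / before as f64)
}

// Sizes in bytes from the `fn main() {}` table (rustc 1.19.0, `strip -g`).
pub const LINK_DEAD_CODE: u64 = 1_363_680;
pub const DEFAULT_BUILD: u64 = 396_208;
pub const LTO_BUILD: u64 = 370_632;
```

savings(LINK_DEAD_CODE, DEFAULT_BUILD) comes out to 967472 bytes (about 71%), while savings(DEFAULT_BUILD, LTO_BUILD) is 25576 bytes, so for an executable nearly all of the size win comes from --gc-sections rather than from LTO.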

Ok so now we’ve reached the question. If removing dead code at link time works so well for executables, why is Gecko running into this problem? My guess is that it has to do with dynamic libraries. Let’s try to emulate what Gecko is doing with a few commands. Let’s take a small Rust file:

#[no_mangle]
pub extern "C" fn print_in_rust() {
    println!("hello!");
}

and a small C file:

void print_in_rust(void);

void foo(void) {
    print_in_rust();
}

Here we’ll compile the Rust code as a staticlib (libfoo.a) and then link everything into a shared library to get the sizes (again stripping debuginfo as before):

rustc args                          gcc args                   size (bytes)
--crate-type staticlib              -fPIC                      1437272
--crate-type staticlib              -fPIC -Wl,--gc-sections    1437096
--crate-type staticlib -C lto -O    -fPIC                      163312
--crate-type staticlib -C lto -O    -fPIC -Wl,--gc-sections    163152

Whoa! Here we see that --gc-sections is doing almost nothing: it saves a mere 176 bytes. The only way to get real size savings here is -C lto. The more we dig here the more we get into “how linkers work”, and this is no exception! Let’s go through what’s happening here:

  • When creating a shared library, the linker initially exports the symbols defined in the object files given on the command line. In this case that’s our C file with the symbol foo.
  • The symbol foo references the symbol print_in_rust, so the linker’s gotta find that.
  • We’ve also provided libfoo.a to the linker (the Rust file compiled as a staticlib). Turns out there’s an object file in this archive with the symbol print_in_rust.
  • The print_in_rust symbol, however, transitively references tons of symbols in the standard library (e.g. I/O printing functions). The standard library, however, is also in libfoo.a (that’s how staticlibs work).
  • The linker now loads the one object file for the standard library.
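The steps above can be sketched as a toy model of archive loading. It assumes the simplified behavior the list describes (real linkers differ in details such as how often they rescan an archive), and the member and symbol names are invented. The key point is that a member is pulled in whole, so everything defined alongside the one needed symbol comes along too.

```rust
use std::collections::HashSet;

/// One object file inside a static archive such as libfoo.a (toy model).
pub struct Member {
    pub name: &'static str,
    pub defines: Vec<&'static str>,
    pub undefined: Vec<&'static str>,
}

/// Any member that defines a still-undefined symbol is loaded as a whole,
/// and its own undefined references join the set of symbols to resolve.
/// Repeats until no more members get loaded; returns the loaded members.
pub fn load_members(initial_undefined: &[&'static str], archive: &[Member]) -> Vec<&'static str> {
    let mut undefined: HashSet<&'static str> = initial_undefined.iter().copied().collect();
    let mut defined: HashSet<&'static str> = HashSet::new();
    let mut loaded: Vec<&'static str> = Vec::new();
    let mut progress = true;
    while progress {
        progress = false;
        for m in archive {
            if loaded.contains(&m.name) {
                continue;
            }
            if m.defines.iter().any(|s| undefined.contains(s)) {
                loaded.push(m.name);
                // Every symbol in the member comes along, needed or not.
                defined.extend(m.defines.iter().copied());
                undefined.extend(m.undefined.iter().copied());
                undefined.retain(|s| !defined.contains(s));
                progress = true;
            }
        }
    }
    loaded
}
```

Resolving print_in_rust loads the crate's member, which in turn drags in the whole standard library member (including an unused parse_float in this toy), while a member nobody references stays out.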

At this point we’ve loaded (essentially) three object files: the C file with foo, the Rust crate with print_in_rust, and the Rust standard library. Now the standard library has tons of symbols we’re not going to ever use, like all those float parsing functions and such. We want the linker to strip all that out! Unfortunately, though, once the object file is loaded, the linker realizes it’s creating a shared library and all these symbols in the standard library are exported, so none of them can be stripped!

When we create an executable, the linker knows that we don’t actually need to export anything. The standard library’s symbols still have default (exported) visibility, but the linker only has to keep what’s reachable from the entry point (main), so everything else remains a candidate to get GC’d, and as we saw above it does indeed get GC’d. In the case of a shared library, however, the linker doesn’t know that we don’t want to export the symbols in the standard library, so it ends up including all of them!
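The difference comes down to which symbols count as GC roots. In this toy sketch, an executable’s only root is main (a real executable’s entry point is usually a startup symbol such as _start that calls main), and a shared library treats every default-visibility symbol as a root; the symbol names are invented.

```rust
use std::collections::{HashMap, HashSet};

/// A symbol in the toy model: whether it has default (exported) visibility,
/// and which other symbols it references.
pub struct Sym {
    pub exported: bool,
    pub refs: Vec<&'static str>,
}

pub enum Output {
    Executable,
    SharedLibrary,
}

/// Symbols that survive GC. An executable is rooted only at `main`; a shared
/// library must keep every exported symbol, since another binary might call it.
pub fn surviving(syms: &HashMap<&'static str, Sym>, out: Output) -> HashSet<&'static str> {
    let mut stack: Vec<&'static str> = match out {
        Output::Executable => vec!["main"],
        Output::SharedLibrary => syms
            .iter()
            .filter(|(_, s)| s.exported)
            .map(|(n, _)| *n)
            .collect(),
    };
    let mut kept = HashSet::new();
    while let Some(name) = stack.pop() {
        if !kept.insert(name) {
            continue;
        }
        if let Some(s) = syms.get(name) {
            stack.extend(s.refs.iter().copied());
        }
    }
    kept
}
```

Given the same symbol table, an unreferenced exported parse_float disappears from the executable but is kept in the shared library.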


Ok so that’s a bit of a long-winded explanation of why I think Gecko is seeing such massive wins from LTO. Basically, the linker doesn’t know that none of the standard library’s symbols actually need to be exported, so it can’t strip the tons of them that go unused. There are even longer and more detailed explanations for why these symbols are exported from the standard library, but that’s perhaps a topic for another time!

I wanted to post this and ask others if they’ve encountered the same problem. We mitigate this problem with the cdylib crate type by passing a whitelist of symbols to export to the linker (so it knows to gc all unused libstd symbols). When you’re not producing dylibs through rustc, however, this can be difficult to do as you’ve gotta maintain a list of symbols separate from your source code.
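One way to avoid hand-maintaining that list would be to generate it from the source. Below is a deliberately naive, hypothetical helper that scans Rust source text for #[no_mangle] functions; it only understands an attribute line immediately followed by the fn line, and a real tool would want a proper Rust parser.

```rust
/// Hypothetical helper: collect names of functions annotated with
/// `#[no_mangle]` so an export list can be generated rather than maintained
/// by hand. Only handles `#[no_mangle]` on its own line, directly followed
/// by a line containing `fn name(`.
pub fn no_mangle_exports(src: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut pending = false;
    for line in src.lines().map(str::trim) {
        if line == "#[no_mangle]" {
            pending = true;
            continue;
        }
        if pending {
            if let Some(idx) = line.find("fn ") {
                // Take the identifier right after `fn `.
                let name: String = line[idx + 3..]
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    names.push(name);
                }
            }
            pending = false;
        }
    }
    names
}
```

Run over the print_in_rust example above, it would return just print_in_rust, which could then feed whatever export list the linker accepts.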

Have you run into this problem when integrating Rust into external libraries? Did you solve it without LTO? Curious to hear thoughts and suggestions!


#2

Is ThinLTO a possible solution to the long compile times of Rust code?

https://clang.llvm.org/docs/ThinLTO.html


#3

Perhaps! I suspect, though, that due to the architecture of ThinLTO it would not have the same code-size benefits as LTO. With ThinLTO I believe the main benefit is the same level of performance as an LTO build without the huge amount of time LTO takes today.


#4

staticlibs are explicitly described in the Rust Reference as being suitable for linking into external applications. Would it be feasible to declare the whitelist of roots as all symbols that are extern "C" (or maybe those + all symbols that have their linkage names set explicitly by Rust code), and then internalize everything not reachable from those symbols, so the linker’s dead code elimination can potentially eliminate all the bits of std that aren’t used?
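A toy model of this proposal, under simplifying assumptions: the symbol names are invented, references are given as a plain map, and the extern "C" roots are supplied by hand rather than discovered by the compiler.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Fate {
    /// An `extern "C"` root: stays visible to the outside world.
    Exported,
    /// Reachable from a root, so kept, but internalized.
    Internal,
    /// Internalized and unreachable, so dead-code elimination can drop it.
    Eliminated,
}

/// Classify every symbol in `refs` given the set of `extern "C"` roots.
pub fn classify(
    refs: &HashMap<&'static str, Vec<&'static str>>,
    extern_c: &[&'static str],
) -> HashMap<&'static str, Fate> {
    // Mark everything reachable from the roots.
    let mut live: HashSet<&'static str> = HashSet::new();
    let mut stack = extern_c.to_vec();
    while let Some(s) = stack.pop() {
        if live.insert(s) {
            if let Some(t) = refs.get(s) {
                stack.extend(t.iter().copied());
            }
        }
    }
    refs.keys()
        .map(|&s| {
            let fate = if extern_c.contains(&s) {
                Fate::Exported
            } else if live.contains(s) {
                Fate::Internal
            } else {
                Fate::Eliminated
            };
            (s, fate)
        })
        .collect()
}
```

The catch, as the reply below explains, is that rustc only gets to make this decision for code it compiles with the final output type in mind.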


#5

Don’t you need to do that anyway? With complicated dynamic libraries written in C, it’s the only way I know not to go nuts trying to keep your symbol versioning correct.


#6

Heh indeed! I even wrote that oh-so-long-ago!

Unfortunately though we actually already do what you mention! We try to internalize everything that’s not a #[no_mangle] pub extern ... style thing. This is what actually happens with -Clto (we guarantee everything is internalized) and this is also what happens when you produce a cdylib (we tell the linker to only export a few symbols). With a staticlib, however, we don’t even call the linker. The final object file has properly internalized symbols but all the upstream dependencies didn’t necessarily know they were going to become part of a staticlib (namely, libstd didn’t know that).

As a result we (afaik) don’t have a method of retroactively changing the symbol visibility in an object file. We’ve tried ld -r in the past, which (in theory) takes a bunch of object files and creates a new object file, but this doesn’t work on MSVC and it’s been buggy (for reasons I now forget).

I wonder though if there’s perhaps a way to compile everything with hidden visibility by default, and then use the linker to raise that visibility? The only case that I know of that we need to handle is that when we create libstd.so (just for rustc itself) we need everything exported, but we know precisely what we need exported so we may be able to instruct the linker to raise symbol visibility if it can do it? (not sure if it can do this)

Maybe! It looks like Gecko at least is a project not doing this?


#7

I think this is already possible on ELF-based systems, but I’m not 100% sure; it might be that you have to annotate the functions you want exported when you initially create the object file. (The linker feature that might be able to do this is called “version scripts.”) I don’t know about Mach-O or PE.
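For the ELF case, here is a small sketch that writes an anonymous version script exporting only a given list of symbols and marking everything else local. It only covers the hiding direction: as far as I know a version script can’t re-export a symbol that was compiled with hidden visibility, so this doesn’t answer the “raise visibility” question. The function name version_script is made up for illustration.

```rust
/// Build the text of a minimal anonymous ELF version script: symbols in
/// `exports` stay global, and the `local: *;` wildcard makes every other
/// symbol local, so it is no longer exported from the shared library.
pub fn version_script(exports: &[&str]) -> String {
    let mut s = String::from("{\n  global:\n");
    for sym in exports {
        s.push_str("    ");
        s.push_str(sym);
        s.push_str(";\n");
    }
    s.push_str("  local:\n    *;\n};\n");
    s
}
```

Saved to a file (say exports.map, a hypothetical name), it would be passed to GNU ld or lld with -Wl,--version-script=exports.map. With the unused standard library symbols no longer exported, --gc-sections is free to discard them, which is the effect of the cdylib whitelist described earlier.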

Probably because libxul isn’t considered to have a stable ABI so they didn’t need it for ABI versioning? It might be worth doing just for the size reduction, though.


#8

In reply to #6: “Maybe! It looks like Gecko at least is a project not doing this?”

We use a complicated mix of #pragma GCC visibility push(hidden) or -fvisibility=hidden, generated wrappers for system header files that restore default visibility, and __attribute__ ((visibility ("default"))) for symbols we explicitly want to export. AFAIK, Rust has no equivalent to these: things are either public from a crate or they’re not.

The tricky thing here is that we are going to create plenty of #[no_mangle] functions in our Rust code so that we can call them from Gecko’s C code, but those APIs aren’t meant to be public from the final binary. I suspect for our purposes an equivalent to -fvisibility=hidden would be all we’d need for now, since we’re unlikely to want to actually export public APIs written in Rust from libxul. (It’s possible we might want that in the future, but we could probably work around it at that point; we don’t have that many exported APIs.)


#9

Yeah Rust so far has mostly not exposed linkage goop at the source level, mostly opting to infer ‘hopefully what you intended’ as much as possible. This works most of the time but, as this issue clearly shows, not all the time! I don’t think it’d really help here, though, because we don’t actually want to tag the standard library as a bunch of hidden symbols. Rather we want libstd.so to have a bunch of exports and libstd.rlib to sometimes have hidden symbols and sometimes not!

That being said, I think there’s a fair amount of inference rustc could be doing here, given the proper tools in the linker and such. For example:

  • non-pub symbols (generally) should be “internal” in the sense that they’re not exported at all, this is the rough equivalent of static linkage in C
  • pub Rust-based symbols should be one of two linkages. When compiled to a dylib they need to be fully exported, but everywhere else they need to be hidden visibility. Currently they’re fully exported everywhere but conveniently gc’d for executables and manually hidden with cdylib crate types.
  • #[no_mangle] pub extern Rust-based symbols probably want to always be exported no matter what. You’d have to manually hide them from libxul.so if you wanted them not exported, but for cdylib crate types and in general transforming a staticlib to a dynamic library they need to be exported!
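The three rules above can be written down as a small decision function. This is a hypothetical model of the proposal, not anything rustc implements, and the type and variant names are made up. rlib is deliberately left out: its symbols’ visibility depends on which final crate type they end up in, which is exactly what an upstream crate like libstd can’t know when it is compiled.

```rust
/// Kinds of Rust items distinguished in the list above (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Item {
    Private,
    PubRust,
    NoMangleExtern,
}

/// Final output types considered here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CrateType {
    Dylib,
    Cdylib,
    Staticlib,
    Executable,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Visibility {
    /// Not visible outside its object file (like `static` in C).
    Internal,
    /// Linkable within the final image but not exported from it.
    Hidden,
    /// Exported from the final binary.
    Exported,
}

/// Sketch of the proposed inference rules; not what rustc does today.
pub fn proposed_visibility(item: Item, ty: CrateType) -> Visibility {
    match (item, ty) {
        (Item::Private, _) => Visibility::Internal,
        // Rust-ABI public items are only exported from Rust dylibs...
        (Item::PubRust, CrateType::Dylib) => Visibility::Exported,
        // ...and hidden everywhere else.
        (Item::PubRust, _) => Visibility::Hidden,
        (Item::NoMangleExtern, _) => Visibility::Exported,
    }
}
```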

In that sense I think we need to solve one problem, which, to put it crisply, is to give “public Rust-only symbols” hidden visibility sometimes but not all the time. When we generate a staticlib they should all have hidden visibility (not exported from shared libraries by default), and when generating a dylib crate type they should be exported.

Unfortunately I don’t actually know how to do this :(. We achieve this property with cdylib by passing a whitelist of exported symbols, effectively making all the other symbols “hidden”. For staticlib outputs, though, we don’t control the linker, so it’s not up to us what happens here.


One thing that I remembered recently as well is that I believe this is a non-Windows problem. On Windows you have to opt-in to export a symbol from a shared library, and Rust never does that at the layer of object files (we pass explicit whitelists to the linker of what symbols to export) which means that everything is gc’d as you’d expect. As a result, the example of “taking a staticlib to a dynamic library” I believe works precisely as expected on Windows, which is to say that the unused parts of the standard library are gc’d appropriately.


#10

Measurements of xul.dll on 64-bit Windows show that LTO only wins by 400k vs. non-LTO, or ~0.6%. That’s still a pretty decent size difference, though I think it shows that your hypothesis is correct!


#11

Oh yeah I don’t mean to say that LTO doesn’t have size wins for sure! Size benefits on the order of 400k sound like what I would expect from LTO in terms of wins from actual optimizations and removal of various functions.


#12

MIR-only rlibs will someday solve this problem by not giving the linker anything to accidentally pull in.