Blazing Fast Unlinking

State of the art

I believe I surprise no-one when mentioning that Rust is slow to build.

Efforts in improving Rust compilation times abound:

  • Parallel front-end,
  • Incremental builds,
  • Cranelift,
  • ...

This post is about linking. The recent switch to lld by default (on Linux) has helped, but still, linking a static binary is not fast. In fact, with incremental builds, it often dominates build times entirely.

There are alternative linkers:

  • mold (or sold) may help, depending on the platform.
  • David Lattimore aims to build an incremental linker.

Yet, it seems that one solution remains unexplored here: dynamic linking, or how to avoid painfully slow link times altogether.

Idea

The benefit of static binaries should not be underestimated. The ability to just copy/paste one single binary across hosts, and have it work, is gold. It's unclear how often such binaries are moved, but it is clear that there are cargo commands during which they are not moved: cargo test and cargo run.

A simple idea to bypass link time pains thus arises:

  • Add a new flag, --placeholder, to cargo build. When used, library dependencies are compiled as dynamic libraries, instead of static ones, and the (eventual) binary uses rpath, or appropriate platform equivalent, to reference their location so no LD_LIBRARY_PATH trick is necessary.
  • Have cargo test and cargo run default to dynamic linking.

Folks who invoke cargo build manually, without specifying the flag, will still build statically linked binaries, and may move them around as they are used to. Folks who use cargo test and cargo run will benefit from instant link times.
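To make the rpath part of the idea concrete, here is a minimal sketch of the linker arguments such a build might pass. The helper name `rpath_link_args` is hypothetical, and the `-Wl,` prefix assumes linking goes through a C compiler driver; none of this is something Cargo does today.

```rust
/// Hypothetical helper: linker arguments that record `lib_dir` as a run-time
/// library search path in the final binary.
/// Assumption: the link step goes through a C compiler driver (cc/clang),
/// hence the `-Wl,` prefix that forwards the option to the linker.
pub fn rpath_link_args(target_os: &str, lib_dir: &str) -> Vec<String> {
    match target_os {
        // ELF linkers and the macOS linker both accept `-rpath <dir>`.
        // On macOS the entry only matters for libraries referenced via `@rpath/`.
        "linux" | "freebsd" | "macos" => vec![format!("-Wl,-rpath,{}", lib_dir)],
        // Windows (PE) has no rpath equivalent, so nothing can be embedded.
        _ => Vec::new(),
    }
}
```

A build script could forward each returned argument by printing a `cargo:rustc-link-arg=...` line; whether a built-in flag would work the same way is an open design question.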

Q&A

But Rust doesn't have a stable ABI!

A stable ABI is only necessary when building against a version V with compiler C and set of flags F then attempting to link with a version V', built with a compiler C', and set of flags F'.

This use case would remain unsupported (no guarantees), but is of no consequence to this proposal. In our case, cargo build --placeholder will build an up-to-date version of each dynamically linked dependency, with the same compiler & flags as the binary.

The very architecture which guarantees incremental builds work (same compiler, flags, deps, etc...) will underpin the stability of this arrangement.
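The argument can be sketched as a simple equality check: a dylib is only linked against when it was produced from exactly the same inputs as the binary. The `BuildKey` type and its fields are hypothetical stand-ins for whatever fingerprint the build system already tracks.

```rust
/// Hypothetical fingerprint of everything that can influence the ABI of a
/// compiled dependency. The field set is an assumption for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BuildKey {
    pub rustc_version: String,
    pub flags: Vec<String>,
    pub source_hash: u64,
}

/// A dylib may be reused only if it was built from exactly the inputs the
/// current build wants; any mismatch means "rebuild", never "hope it works".
pub fn can_reuse(dylib_built_with: &BuildKey, wanted: &BuildKey) -> bool {
    dylib_built_with == wanted
}
```

Because a mismatch always triggers a rebuild, no cross-version ABI compatibility is ever relied upon.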

Can I call the binary myself?

Yes, of course. Indeed, by using rpath, the loader will know where to look for the dynamic libraries of a binary (where they were built) without any further hint.

The only restriction is that such a binary is no longer self-contained, so it cannot easily be moved to another host -- though with absolute rpaths it could be moved around on the same host -- and that the dynamic libraries may be removed, leaving the loader dangling.

How are generic and inline methods handled?

Just like today. This means they may require codegen work in the final binary, even when dynamically linking the dependency. Thus a dependency with a heavily generic interface (and a slim or no non-generic core) would not benefit much. All other dependencies will, however, so hopefully the speed-up will still be notable on most projects.
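A small illustration of the "generic interface, non-generic core" distinction, using made-up function names: the generic shim is instantiated in whichever crate calls it, while the non-generic core can be compiled once into the dependency's library.

```rust
/// Generic shim: monomorphized for each concrete `S` in the crate that calls
/// it, so its codegen cost lands in the downstream crate.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    count_words_inner(text.as_ref())
}

/// Non-generic core: compiled once with the dependency, so it is the part
/// that could live in a dylib and be reused without further codegen.
fn count_words_inner(text: &str) -> usize {
    text.split_whitespace().count()
}
```

A crate that is mostly shims of this kind over a substantial core would still benefit; a crate that is generic all the way down would not.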

And what if ...?

Please ask questions. I don't promise to have all the answers -- I'm not a specialist on the subject -- but if we all come together, we'll figure them out!

I do agree that utilizing dynamic linking by default for debug builds could be an interesting avenue for tightening the edit/test cycle.

However, I find it reasonably important that cargo build $flags && cargo run $flags only builds once. So rather than change run/test, change the dev profile to prefer dynamic linking between crates.

How does this differ from -Cprefer-dynamic?

Bevy has the dynamic_linking feature which dynamically links the entirety of Bevy to your program. Currently only a single crate can safely use this hack at a time though. If two dylibs include a common crate, rustc will refuse to use both dylibs as dependencies of your project. Cargo has to be involved to ensure this common dependency is dynamically linked itself.

That only makes rustc prefer a dylib over an rlib. It doesn't force any dependencies to be compiled as dylibs.

I think all non-local dependencies should be compiled into a single dylib to reduce overhead of using many dylibs. However this may get tricky with crates being allowed to not use all of their dependencies.

Ah, okay. I'm not super aware of how static and dynamic library artifacts differ in practice beyond how they're used.

If we're treating this as an optimization over full static linking for incremental development builds, then it shouldn't be that big of a deal. Relinking the dependency artifact bundle when dependency edges change shouldn't be meaningfully more work than would be done linking them directly into the final artifact.

Somewhat more interesting is workspaces with multiple binary targets, especially if they have different dependency trees. Linking the dependency bundle separately from the local crates should still be faster for rebuilds, but might share less than might be otherwise hoped.

I thought about this originally, but this doesn't scale as well.

Let's take a "common" environment:

  • A workspace for the project.
  • With 50 crates.
  • A set of 3rd-party dependencies, not all of which are used in all crates.
  • And local crates depending on other local crates in a tree.

And imagine that you want to run cargo test.

If you build one dylib per library, you can share many of the dylibs across the 50 cargo test runs and 50 cargo doctest runs.

However, if you build a single "fat" dylib per target binary, you may have to build 50 dylibs, each including most of the dependencies, and once again it'll take forever to build them.

So, in the end, I believe building 1 dylib for each static lib that would have been built is likely to be fastest, on top of being simplest. There'll be a slight load time overhead, but it should be well below the cost of relinking.
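The scaling argument can be made tangible with a toy count of how many times a library has to be linked under each strategy. The function names and the "one link unit per library" cost model are simplifying assumptions, not measurements.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One dylib per library: each distinct library is linked once and then
/// shared by every test binary that needs it.
pub fn per_library_link_units(graph: &BTreeMap<&str, BTreeSet<&str>>) -> usize {
    graph.values().flatten().collect::<BTreeSet<_>>().len()
}

/// One "fat" dylib per binary: every library is linked again into the bundle
/// of each binary that depends on it.
pub fn per_binary_link_units(graph: &BTreeMap<&str, BTreeSet<&str>>) -> usize {
    graph.values().map(|deps| deps.len()).sum()
}
```

Here `graph` maps each test binary to the set of libraries it uses; the gap between the two counts widens as more binaries share the same dependencies.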

Me neither, unfortunately, so thanks for bringing the flag to my attention, and thanks @bjorn3 for clarifying :)

Cargo doesn't know which dependency edges exist until after compilation, but it needs to know whether to build dependencies as rlib or as dylib before that.

The issue is that adding extra dependencies may actually cause the build to fail. For example, if two crates both define a #[global_allocator] or both define the same unmangled symbol name.
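As a rough sketch of the failure mode, the hypothetical check below looks for unmangled symbol names exported by more than one crate that would end up in the same bundle. How the symbol lists are obtained is left out and would be an assumption in any case.

```rust
use std::collections::BTreeMap;

/// Hypothetical check: given each crate's unmangled exported symbols, return
/// the symbol names defined by more than one crate (sorted alphabetically).
pub fn symbol_conflicts(crates: &[(&str, Vec<&str>)]) -> Vec<String> {
    // symbol name -> crates that define it
    let mut owners: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (krate, symbols) in crates {
        for &sym in symbols {
            owners.entry(sym).or_default().push(*krate);
        }
    }
    owners
        .into_iter()
        .filter(|(_, definers)| definers.len() > 1)
        .map(|(sym, _)| sym.to_string())
        .collect()
}
```

Two crates that never meet in a normal build would produce no conflict there, but would collide once both are forced into one bundle.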

It is more complex, but what about grouping them: Often you have 2 crates with shared dependencies, which are not used by others in the workspace:

Example: crates A and B depend on X, Y, Z, while A and C additionally depend on M, N, O, and B and D additionally depend on W.

Instead of building one dylib with X, Y, Z, M, N, O, W, cargo could build the following:

  • X,Y,Z (needed by A and B)
  • M,N,O (needed by A and C)
  • W (needed by B and D)

Thus you minimize the number of dylibs without having to build M, N, O or W when building crate B.

If X and M now depend on another crate (T), you would probably have to compile T into its own dylib in addition to that, so I'm not sure how big the difference from one dylib per crate would be in reality.
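The grouping above can be sketched as "bucket each dependency by the exact set of local crates that use it". The function name is made up, and transitive dependencies such as T are deliberately ignored in this minimal version.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups dependencies by the exact set of local crates that use them.
/// Each resulting group is a candidate for one shared dylib.
pub fn group_by_users<'a>(
    uses: &[(&'a str, Vec<&'a str>)],
) -> BTreeMap<BTreeSet<&'a str>, BTreeSet<&'a str>> {
    // dependency -> set of local crates using it
    let mut users: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (krate, deps) in uses {
        for &dep in deps {
            users.entry(dep).or_default().insert(*krate);
        }
    }
    // set of users -> dependencies sharing exactly that set
    let mut groups = BTreeMap::new();
    for (dep, set) in users {
        groups.entry(set).or_insert_with(BTreeSet::new).insert(dep);
    }
    groups
}
```

Fed the example (A uses X, Y, Z, M, N, O; B uses X, Y, Z, W; C uses M, N, O; D uses W), this yields the three groups listed.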

This is a very ELF-centric solution. On Windows, one is left with PATH environment supplements (or binary-patching hacks which embed full paths in DLL name entries; the linker won't help you here and pins a build to a specific machine to boot; no $ORIGIN here). On macOS, using @rpath/ forces use of rpath; there's no "but also look in the system paths" fallback. But it is inherited, so one can just make a list of rpaths in the executable and be fine. Modern ELF no longer inherits, so any dynamic library needs a set of rpaths for all of its dependencies and cannot rely on the loading executable to prepare a set of paths (but $ORIGIN exists and is probably sufficient here).

Dynamic linking is indeed nicer for things like this, but a lot of platform-specific-detail demons start poking their noses out when it is used.

Were you still replying to CAD97's bundle solution? Just converting existing static libs into dylibs shouldn't add any dependency.

I'd treat the number of dylibs as an implementation detail, and propose we start with blindly making each static lib a dylib then move on from there.

We're mostly talking about local builds and local interactions, I wouldn't worry about optimizing load times from the start. Let's go simple, measure, and iterate if it's really terrible.

Indeed. I've never developed for Windows, or macOS, or iOS, or Android, etc... so I have no idea what's available in those scenarios. I do hope the idea could be adapted, even if it means the exact restrictions change from one platform to another.

The primary goal should be to have cargo test and cargo run work, and those can tweak environment variables, etc... when launching a binary. Making it easy for a human to launch the binary themselves and have it work is a somewhat secondary goal. It's ideal if it works, but there's still substantial benefits even if it doesn't.
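A minimal sketch of the "tweak environment variables when launching" fallback: pick the loader search variable for the platform and put the dylib directory in front of whatever is already there. The helper names are hypothetical; the variable names are the ones Cargo's documentation lists for its dynamic library path handling.

```rust
/// Environment variable the dynamic loader consults on each platform.
pub fn dylib_path_var(target_os: &str) -> &'static str {
    match target_os {
        "windows" => "PATH",
        "macos" => "DYLD_FALLBACK_LIBRARY_PATH",
        // Assumption: every other target behaves like a typical ELF Unix.
        _ => "LD_LIBRARY_PATH",
    }
}

/// Hypothetical helper: put `dir` in front of the existing search path.
/// `sep` is ':' on Unix-like systems and ';' on Windows.
pub fn prepend_dir(dir: &str, existing: Option<&str>, sep: char) -> String {
    match existing {
        Some(rest) if !rest.is_empty() => format!("{}{}{}", dir, sep, rest),
        _ => dir.to_string(),
    }
}
```

The resulting value could be handed to the child process with `std::process::Command::env` just before spawning the test or binary.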

Yes. So let's start with ELF-only if that's what it takes, then incrementally work on similar solutions for other platforms.

Just like PGO, BOLT, lld, etc... started on Linux before moving out to other platforms.

On macOS you can use @executable_path, which is similar to ELF's $ORIGIN (i.e. it expands to the directory containing the main executable). There is also @loader_path, which expands to the directory containing the executable/dylib that specified the dependency. Both of these can be specified either as part of the path that an executable/dylib uses to reference a dylib (i.e. in the LC_LOAD_DYLIB command), or within an rpath entry for a second level of indirection. So macOS is pretty flexible and should be able to replicate whatever approach is adopted for ELF.
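For illustration, a relocatable search path could be assembled per platform roughly as follows. The helper is hypothetical. Note that $ORIGIN resolves relative to the object that contains the entry (executable or library), which makes @loader_path its closest macOS counterpart; for an entry stored in the main executable, @loader_path and @executable_path coincide.

```rust
/// Hypothetical helper: an rpath entry pointing at `rel_dir`, relative to the
/// object (executable or library) that contains the entry.
/// Returns `None` where no such mechanism exists.
pub fn relative_rpath(target_os: &str, rel_dir: &str) -> Option<String> {
    match target_os {
        // ELF: `$ORIGIN` expands to the directory of the object holding the entry.
        "linux" => Some(format!("$ORIGIN/{}", rel_dir)),
        // Mach-O: `@loader_path` expands to the directory of the referencing image.
        "macos" => Some(format!("@loader_path/{}", rel_dir)),
        // Windows has no rpath, relative or otherwise.
        _ => None,
    }
}
```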

I don't know anything about how Windows handles this, though.

--placeholder is an unhelpful name. Something like --dynamic-deps would be much more descriptive while not requiring much more typing.

I'm aware of those. The thing is that if @rpath/ is how the library reference is stored, the path must be in an rpath entry (though macOS can use rpaths of loading libraries, so an executable can just say @executable_path/../Libraries and you're probably fine). However, if you don't use @rpath/, the rpath entries are useless.

The answer is, basically, one of:

  • use @rpath/ everywhere and hope you can flow rpath entries to all of the link lines that you care about
  • use @executable_path or @loader_path library ids and don't worry about rpaths

What I do to get the latter is:

  1. all builds use absolute path library ids. This makes sure that builds work, tests run, etc. without having to figure out how to flow rpath entries to umpteen build systems (this project isn't Rust)
  2. after all projects are installed, have a post-processing step that finds libraries, computes relocatable package-relative library ids and use install_name_tool to edit everything (this does require -headerpad_max_install_names, but that is a static flag to inject which is far easier)
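The post-processing step in point 2 might look something like the sketch below, which turns an absolute library path into a package-relative id and builds the argument list for `install_name_tool -id`. The helper names and the assumption that executables live one directory below the install prefix (for example in `<prefix>/bin`) are mine, not the poster's.

```rust
use std::path::Path;

/// Hypothetical helper: rewrite an absolute library path under `prefix` into
/// an id relative to the executable.
/// Assumption: executables live one level below `prefix`, e.g. `<prefix>/bin`.
pub fn relocatable_id(prefix: &Path, lib: &Path) -> Option<String> {
    // Libraries outside the install prefix cannot be made package-relative.
    let rel = lib.strip_prefix(prefix).ok()?;
    Some(format!("@executable_path/../{}", rel.to_str()?))
}

/// Arguments for `install_name_tool` that set the library's own id.
pub fn install_name_tool_args(new_id: &str, lib: &str) -> Vec<String> {
    vec!["-id".to_string(), new_id.to_string(), lib.to_string()]
}
```

References held by other binaries would need a matching `-change <old> <new>` invocation for each dependent file.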

Anyways, off topic for here. Yes, on macOS it is possible, but it is not nearly as simple as ELF's "look here too" rpath mechanism.

Further entrenching this approach would likely run counter to efforts by testing-devex to stabilize a generic solution for test discovery, see Global Registration (a kind of pre-rfc)

While filling a different role than this thread, I've wondered about a more formal approach to dynamic linking in Rust. In the C++ world, you have header-only libraries and dynamic libraries. Our static linking fills a role similar to header-only libraries but we don't have something more akin to dynamic libraries. I could see us allowing a dependency sub-tree to be "independent" with its own lockfile and feature unification, so that it could be built as a dynamic library for development builds (at least). By having its own lockfile, we'd also get better cross-project caching results with per-user caching.

If added, this should be enabled by a compiler flag/cargo profile option, not by default. There is already plenty of code out there that assumes you can just grab a binary from target/ and run it wherever, both for dev and release builds. We do that in CI. If the defaults change, at the very least it should happen over an edition, but patching CI scripts can be a hassle.

And I second @CAD97 in that I expect cargo build && cargo run to build only once. This shouldn't be an issue if dyn linking is enabled via a profile option.

I worry that this addition could cause a de facto stabilization of Rust's dynamic linking ABI, as well as cause people to pin rustc version.

Ok, maybe that last part is unfounded. After all, existing static linking builds didn't cause people to hackily link Rust object files statically.

Would it?

#[test] are not, today, collected amongst the dependencies of a crate, only within the crate itself, as far as I know. I only skimmed the proposal so I may be missing something, but for now it seems completely orthogonal to me.

I wouldn't even try.

There is a single statically compiled language+toolchain combination which successfully combines generics & dynamic linking: it's called Swift, and it comes with an entire spectrum of ABI guarantees, and multiple representations of objects & generics depending on the selected ABI guarantees.

This is a multi man-years project.

Which is precisely why this proposal doesn't even attempt to formally define an ABI for dynamic linking, and doesn't even attempt to offer a guarantee that the dynamic libraries so created could be swapped with other versions.

In a sense, this proposal treats dynamic linking as an implementation detail rather than a feature of its own.

Do note that this proposal does NOT propose to change the default for cargo build, only for cargo test and cargo run, which compile a binary and immediately run it in place.

Do people use cargo test to compile & run the binary in place, then move the binary and run it elsewhere? I would naively expect that if the goal is to move the binary, one would use cargo build --test to build it.

I'm not necessarily averse to using a cargo profile option, but I am averse to everyone having slow tests unless they specifically enable this option -- it's a bad default experience -- and I am averse to the idea of having to wait 2-3 years to switch the default with edition 2027, and keeping the bad default experience all this time.

I wonder if there's a way to turn a static library into a dynamic one.

If this is possible, then cargo build would build the static library, and cargo run would just need to convert this static library into a dynamic one, not doing a full rebuild.

I think this would be an acceptable trade-off, no?

Depending how it's done, perhaps. If just taking the current static library and making it dynamic, with the symbols embedding a hash of toolchain version + flags + source code, it would be unusable for any other purpose, so I don't think an issue would occur.

Maybe. But it will likely still take some time, and I often use cargo run to benchmark binary execution time.

I'd say my real problem is that the --out-dir option isn't stable, so I can't just build artifacts and run them directly. Hardcoding paths in target/ is a poor experience and interacts poorly with builds for non-default targets and with the debug/release switch. If I could set --out-dir, or at least get the path to the binary directly from cargo, I probably wouldn't need this two-step process.

I don't see how switching a single option in Cargo.toml is that much of a burden. People already do much more complex things to optimize their builds. It could even be automatically set when creating a project with cargo new, and in existing projects changing this option can cause breakage.

Even in this case, I expect people to be confused when they try to move the resulting binary and it doesn't work. Rust is widely known to do static builds and not even support dynamic linking (except via FFI). And you certainly can't expect people to keep track of all release notes. I don't know what a smooth transition to a different default would look like, but it's certainly not just "let's change it overnight for everybody".

The point of global registration is making this available to other usecases as well, many of which care about collecting registrations from multiple crates.

That is for #[test] today. With the design, we're trying to keep in mind the registration of fixtures which may come from dependencies.

Like with this, ABI doesn't have to be a part of it.

Well, there would be a workaround -- building with dynamic dependencies or running without -- but it would be something to remember indeed.

Ah, I definitely missed this. I'll be rooting for global registration, I've got a log library which would definitely appreciate not having to muck with .ctor sections just to gather log statements metadata on start-up.

With that said, I don't see any conflict with the particular proposal at hand. This proposal is about using a dynamic library as an optimization over using a static library, the items can still be enumerated directly in the binary from compiler metadata in the worst case.

No, the current plan is to specifically design-out dynamic libraries because of challenges with them.