Blazing Fast Unlinking

Vorpal · June 30, 2024, 7:51am

Speaking as a rust user, I'd rather have dynamic libraries than global registration if I have to choose. Dynamic linking is waay more useful to me, even without a stable ABI.

binarycat · June 30, 2024, 7:56pm

but if you're doing cargo build && cargo run, then you're not building a library, you're building an executible.

even if it was possible to compile a statically linked executible, then turn it into a dynamically linked one, this would always be slower.

additionally, this transformation would probably result in a different binary than if you just compiled directly to a dynamic library.

it's going to be stuck behind a nightly flag at first anyways, so it's probably gonna be implemented as profile configuration option.

Vorpal · June 30, 2024, 8:01pm

@epage even if you want to do global registration before dynamic linking it is important to not design yourself into a corner so that dynamic linking never becomes possible.

Of the two features, dynamic linking is the more important feature by far to some users (such as me). I'm sure the reverse is true for others. Some probably consider them equally important.

So "designing out dynamic linking" sounds downright dangerous to me, and I'm deeply concerned about this.

CAD97 · June 30, 2024, 10:21pm

Then you should already be complaining about #[global_allocator] which has the exact same issue. I fully expect that global registration would have the same relationship with dynamic linking as the global allocator, since it's fundamentally the same problem, just 1 versus N objects.

If they aren't used across dylib borders, there is of course no issue with dynamic linking. (But this isn't on topic here so open a new thread if you want to discuss this further.)

Last I saw, anyway, the intent was to expose an iterator rather than a slice directly, which would be able to support dynamic linking at minor cost per participating dylib. (See the existing other thread.)

zackw · July 1, 2024, 2:33pm

The global allocator is always going to be a special case, since lower-level platform libraries may impose a hard requirement of one allocator per process regardless of what Rust does. (It is very common, for instance, for Unix C libraries to assume that nobody but themselves will call sbrk with a nonzero argument. Yes, even today.)

bjorn3 · July 1, 2024, 3:00pm

You can have multiple rust global allocators in a single process already if you are using cdylib. What you can't do is define your own global allocator when depending on libstd as dylib as the global allocator to use is fixed when linking a dylib.

zackw · July 1, 2024, 3:26pm

I confess that I have never understood the difference between a dylib and a cdylib. That said, sure, if Rust's "global allocator" is layered on top of the platform's allocator (usually but not necessarily C malloc), then there's no problem having it not be a process singleton. But what I'm saying is that if Rust's global allocator bypasses or replaces the platform's allocator, then it may be a platform requirement to have it be a process singleton regardless of anything Rust does.

bjorn3 · July 1, 2024, 3:30pm

The difference between dylib and cdylib is whether the resulting dylib is usable as regular rust crate or exclusively exports a C abi. In the former case you are effectively forced to dynamically link libstd, while in the latter case libstd is almost certainly statically linked.

It depends on the specific global allocator what it is layered on top of.

zackw · July 1, 2024, 3:46pm

I think we're saying the same thing in different words at this point.

matthieum · July 1, 2024, 4:13pm

Sorry for the short-hand.

What I meant was that cargo build would prepare static libraries for all (most) dependencies, and cargo run would just have to turn those libraries into dynamic ones.

It depends which 1 vs N you're speaking about.

For example, if we're talking #[global_allocator] vs #[test] then they are completely different:

#[global_allocator]: defining a new global allocator should override the previous one.
#[test]: defining a new test should add to the previous ones.

The latter is completely uncomplicated, I do it today with an intrusive linked list of global variables, and a new library getting linked in would just add itself to the list. The former is quite complicated.

One difficulty with dynamic libraries is libraries that are loaded after start-up. Fortunately, it's not a problem for this proposal.

binarycat · July 1, 2024, 4:26pm

but that's not what cargo build does! cargo build builds all the dependencies, then builds the executable, statically linking the dependencies. the whole point of this proposal is to eliminate the overhead of that static linking.

currently, cargo run is very simple wrapper that calls cargo build then runs the resulting executable. it is equivalent to cargo build && ./target/debug/MYBIN, to the point where some users may use them interchangeably. making cargo run into something other than a trivial wrapper is a very bad idea imo.

Vorpal · July 1, 2024, 5:36pm

Having this simply be a profile flag would solve this issue, without weird magic behaviour depending on which way cargo is invoked. Then I can as a developer make an informed decision about which way I want to do my debug builds based on how I intend to use the build

Just locally from the build tree: Dynamic with rpath is fine on Linux (and possibly OS X?).
Copy to another machine (e.g. when cross compiling and testing on an Aarch64 system): Do a traditional static build. Or if all .so files are in the target directory along with the binary, I can easily extend my scp command to include those files as well.

The weird one (as usual) is Windows, but perhaps we can begin by not supporting this on Windows (as already suggested by someone, can't find it in the thread right now). But I thought Windows would happily load DLLs from the same directory as the executable (as long as they weren't one of the "well known" built in to Windows ones)? If so it should just work there too.

(If we place the shared libraries in the target output directory (which would make sense to me), we need to version them (to support multiple semver incompatible versions of a single crate). This seems like an obvious and uncontroversial result of putting them all in the same directory to me, so just mentioning it for completeness. Here I'm not talking about SO-versions of traditional ELF systems, just put the version in the name part to avoid any strange behaviour.)

binarycat · July 2, 2024, 3:33pm

yes, and i think that is how this should be implemented, glad to be in agreement.

matthieum · July 2, 2024, 5:17pm

We really are talking past each others, it seems.

I'll remind you that this proposal was specifically about speeding up cargo test and cargo run in the common case, hence my comments are focused on those commands.

My point about converting the static libraries created by cargo build into dynamic libraries is thus about saving time in executing subsequent cargo test and cargo run commands

Not for cargo build, no. Not by default at least.

I have specifically left cargo build out of the picture because while cargo test and cargo run create a binary and execute it immediately, hence completely controlling its execution environment, cargo build doesn't and who knows what the user wishes to do with that binary.

I would argue that cargo build --placeholder && ./target/debug/MYBIN is still just a trivial wrapper.

I disagree, for now. Though not too strongly.

I think the default experience matters a lot, and I argue that the default experience means:

cargo build builds a statically linked binary that you can move around as much as you want.
cargo test runs tests as quickly as possible.
cargo run runs the target binary as quickly as possible.

Current users may have evolved different expectations, but habits can be unlearned and relearned, and if we offer them a 10x performance improvement, I'm pretty convinced they'll be happy to do so.

With that being said, profiles would partially solve the issue, because we could stick a default placeholder = true in the test profile, but there's unfortunately no equivalent run profile.

Perhaps it's time cargo gained a run profile and used it for cargo run by default. Perhaps it's not worth modifying cargo run and much would be gained speeding up cargo test already.

I don't know. But I do really think we need to keep "default experience" in mind. Let's not ask people to become gurus just to get decent compile-times.

binarycat · July 2, 2024, 5:42pm

you cannot leave cargo build out of the picture, since that is the command that actually does the building.

sure, but then you're not doing the whole static to dynamic conversion, you're just building a dynamic executable from the start.

you're forgetting the possibility of making this the default for new crates, but keeping the existing default in place for old crates, either through the edition system, or with cargo new.

Vorpal · July 2, 2024, 6:07pm

My issue with this is that I often switch between cargo run and cargo build + manual run. The former is more convenient but spits out extra text (including a bunch of warnings about not yet used functions). The latter avoids that. I don't want to have to rebuild when switching between those two ways of running.

I feel quite strongly (probably a 4.5 out of 5) that cargo build in the default dev profile should produce the same binary cargo run does. I am however quite happy if that is a dynamically linked binary:

I work on Linux (so don't really care about the Windows situation as long as it builds in CI and I don't need to think about it).
I have over a decade of experience in C++ before coming to Rust. I'm no stranger to rpath and other trickery.

That said, having a binary that can't be moved by default might be a barrier to new people with a different background. Especially those that are new to compiled languages or programming in general.

Here we have to weigh the pros and cons of having this on by default or off by default. Or even some hybrid mode like you want by default, as long as there is the ability to change to the way I and @binarycat prefer this to work.

CAD97 · July 2, 2024, 9:44pm

...huh? How is doing more work saving any time? Currently, if you cargo build && cargo run, the run will see that an up-to-date build has already been produced and run that build. Converting the statically linked build into one using some dynamically linked components is doing extra work you don't have to do.

There's at least some merit for cargo test, since the test executables still need to be built and linked.

matthieum · July 3, 2024, 4:33pm

I have the impression that you completely forgot the original proposal, so just for reminder:

It means:

cargo build gains the ability to produce dynamically linked binary, somehow, but does NOT do so by default.
cargo test and cargo run now default to dynamically linked binaries.

The idea of the static to dynamic conversion was to maximize the amount of common work being down between cargo build & cargo build --placeholder. One of the domains of application of the proposal would be game development, where even in Debug it's common to optimize code (sometimes O3 for dependencies and O1 for local code, or some mix) in which case compiling twice with O3 could be quite a time sink.

I'm not sure whether it's worth it, or not. Specifically, I'm not sure whether people would switch between static & dynamic linking often enough to merit it. Hell, I'm not even sure if it's technically possible...

Yes, and no.

It'd definitely help to make it a default for new code, but in the meantime we'd still have a lot of code in the wild which wouldn't default to it, including code by people who are not following Rust news close enough, and won't realize it's available.

I have the opposite opinion.

Mostly because the workflow at work regularly involves building locally and executing remotely -- due to specificities of the target host which cannot be emulated locally -- and thus for my team it's critical that our dev and release builds keep producing statically linked binaries.

On the other hand, we run unit-tests/integration-tests most often (unsurprisingly), and also regularly work on binaries which can be run locally (yeah!) in which cases the faster the better.

This is why a dev or release profile tweak is the least desirable solution for us, whereas a test profile tweak (and run profile tweak if it existed) would definitely be a non-problem.

I can imagine we're not the only ones in such a situation, but I have no idea what are the most common usecases.

It's saving time compared to building each codegen unit from scratch again, yes. The idea of static-to-dynamic conversion is to always compile statically, then convert to dynamic if necessary, hoping that converting to dynamic is much quicker than re-compiling from scratch.

Obviously it's not saving time compared to just always compiling statically or always compiling dynamically.

And I've got no idea if it's worth the complexity, or what the conversion overhead would be (hopefully small, especially due the embarrassingly parallel nature of the problem).

afetisov · July 3, 2024, 5:07pm

The concept of "run profile" makes no sense. Of course you build a binary to run it. What else are you gonna do with it? The dev/release/test/bench profile split differs by the type of binary which is built and its purpose. What's run vs dev/release supposed to mean?

It feels very much like you're overfitting this feature proposal for your specific use case, with relatively little weight given to different use cases.

The difference in options and behaviour between cargo build and cargo test/cargo run is quite annoying, and I'd rather avoid it. First, it's needlessly confusing on its own. I don't want to keep all defaults in the head for all build options which have no business of being different. Second, those are not the only 3 cargo commands in existence. You make a split in them, what the hell should be chosen for all other possible commands? What about cargo bench? What about cargo nextest, or cargo criterion, or cargo afl, or cargo valgrind? You force everyone to make a choice, and whatever they choose, it will be mostly a coin flip, and a burden to remember.

At the moment alternative commands can just passthrough all unknown options to cargo, but what should they do if cargo build and cargo run differ in their option handling?

CAD97 · July 3, 2024, 7:14pm

As a bit of a summary and overview:

Especially in large programs with many crates, linking is a significant portion of the compilation time for Rust, and essentially all of that work gets fully redone each time a binary is built.
Utilizing dynamic linking instead of static linking is typically meaningfully faster for build plus execute, when the binary will be run a single digit number of times (e.g. in an edit/test cycle).
In theory it should be an identical amount of work, but dynamic linking can easily defer or skip work that the static linker does (e.g. how unused symbols get handled), static linking needs to do more filesystem work to save the built executable, and static linkers that work at the speed of the filesystem are rare, although possible.
On the other hand, dylib loading is complicated and highly platform dependent. Cargo can make it work transparently when running directly in the workspace, but nothing can beat single file portable executables for distribution convenience.
- But also, outside the world of CLI tooling, a project will typically require further assets not bundled into the executable. This is currently deliberately out of scope for Cargo, but shouldn't be fully ignored.
This has almost no impact on monomorphic API and other inlining candidates. It can have some when generic sharing is on, but this doesn't occur beyond opt-level=1 for performance reasons.
Library crates specify their crate type, which controls which artifacts^[1] it generates when built. All selected crate types are always built.
If a lib crate has both static and dynamic type, the static library is used unless the build is set to prefer dynamic.

That should cover the mostly objective situation. As for what can be done to improve the status quo, more subjectively:

The lib crate-type is the default for libraries, and is not specified to be a static library. Formally, at least, this means that making lib resolve to dylib in some cases is an allowed change.
However, it's a well known fact that Rust/Cargo produces (almost^[2]) fully statically linked binaries by default, so changing that is potentially ill advised.
At least on Unix platforms, it is straightforward to repack a static object library (win .lib, nix .a) into a dynamic object library (win .dll, nix .so). Windows is a bit more interesting since .lib files are still used for dynamic linking as a sort of binary analog to code header files.
Modulo the use of linker ~~crimes~~ scripts, some of the relinking cost can be saved even in fully static builds, by linking together the rarely changing crates (nonlocal dependencies) into a pre-linked bundled artifact that can be reused between builds of more commonly changing crates (workspace packages), assuming the platform linker is willing to do such a "partial" linking.
- It should, since that's effectively what a shared library is, only differing in how that library exposes its ABI for consumption / linkage. But I'm far beyond expecting linkers to act predictably.
- Specifically, AIUI .a are just archive bundles of the various .o without any linking work done yet. There's some object resolution rule differences, but linking to .a is otherwise no different from directly using the individual .o object files.
That is a good application of incremental concepts to linkage, but still requires relinking a lot when the bundled crates do change. Dynamically linking all dependencies means adding new dependencies will not need to redo that work, instead smearing it over every dynamic linker/loader runtime.
Changes to behavior should strive to avoid making it so that cargo build/cargo run/cargo test in the default configuration can't share compilation work, since that will increase the amount of rebuilds, which goes against the purpose of making improvements.
Additionally, there are various desirable features which work seamlessly and "zero cost" if in a single statically linked bundle, but are less seamless in a dynamically linked environment, if they work at all.
- Example: #[global_allocator] and #[panic_handler] are essentially unspecified for dynamic linking. In practice, IIRC, a non-bundled artifact will keep a dependency on the used extern symbol name(s), and bundled artifacts (i.e. staticlib, cdylib) will bundle in whatever they have visible and ignore the rest of the world that might show up later.
- Example: #[test] collection is currently done by placing all found test items in a slice in the binary. This is currently restricted to collection within a single binary crate target, but can almost trivially be made available across static
- Features like this should never cross bundled artifact boundaries by design. They can cross Rust dynamic library boundaries in one of two main ways: either by just monomorphizing that part into the final artifact anyway or by embracing dynamic linking and doing work at library load time. The more dynamic approach isn't "zero cost" anymore, though, and likely requires the use of load-time hooks (i.e. (limited^[3]) life before main), where Rust's lack of such is treated as a feature by a significant chunk of the user base.

I, personally, am fully in support of one specific change (assuming it can get appropriately gated):

Add a profile setting that acts as a stronger form of today's -Cprefer-dynamic and additionally makes the lib crate-type produce a dynamic library instead of a static library. (Honestly, there's an argument that -Cprefer-dynamic should do that anyway.)

This will allow gathering experimental evidence as to the benefits (and costs) of this approach. However, it's important to note that the use of Rust dynamic libraries still does not permit swapping out a Rust dependency without recompiling its full downstream. The use of dynamic libraries in this way is solely for the more efficient incremental usage of the platform linker; providing Rust dylibs at runtime other than the ones output by the build process^[4] is entirely at the developer's risk and likely to not function at all.

They are:
- lib — Default "compiler recommended" library. Almost no guarantees beyond being usable for downstream compilation.
- rlib — Rust static library. Loadable as a lib crate dependency that will be statically linked. Retains load time dependency on upstream Rust dependencies.
- dylib — Rust dynamic library. Loadable as a lib crate dependency that will be dynamically linked. Retains load time dependency on upstream Rust dependencies.
- staticlib — System static library. All statically linked objects are bundled into the artifact as much as is possible.
- cdylib — System dynamic library. All statically linked objects are bundled into the artifact as much as is possible.
↩︎
The OS system libraries are typically dynamically linked. At some level this is obviously required, since the OS isn't bundled into the executable, but exactly what this means is highly target specific, potentially even depending on target features like +crt-static. ↩︎
To be clear, all of the load time code that runs would be under compiler control, and limited to doing the kind of work that is typical of dynamic loading, such as swapping in resolved pointers/references for placeholders, just in ways slightly more involved than the system loader can be coerced into doing for us. There would still be no user-defined or extensible life before main. If only startup dylib loading is allowed, it could even be logically placed in the existing runtime startup before user main is called instead of during library load time hooks. ↩︎
If we can coerce system loaders into it, we should consider using a distinct ".rdylib" format to make it clear that the rules aren't necessarily the same as with typical system dynamic library bundles. ↩︎

Topic		Replies	Views
Cargo dylib-run/dylib-build	5	166	August 24, 2024
Dynamic linking for compilation speed improvement? compiler	10	2387	March 19, 2021
Shared library for faster builds (research) compiler	11	494	June 25, 2024
Slow Linking with External Crates (trying to investigate internal cause)	3	955	March 25, 2019
Static binary support in rust	61	30758	March 25, 2019

Blazing Fast Unlinking

Related topics