Dynamic linking for compilation speed improvement?

Since linking is a huge part of incremental compilation time, would it be possible to instead dynamically link all dependencies and the main crate together?

I'm imagining that it could work like an additional compilation mode, in addition to --debug and --release, perhaps --dynamic.

When compiling in --dynamic mode, the compiler would generate dynamically loaded libraries for the main crate, as well as all dependencies. Linking would only require doing symbol table stuff, and not processing a bunch of machine code.

When running, the main binary could be a simplified main that would call the main function of the generated library for the main crate.
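To make the launcher idea concrete, here is a toy, single-file sketch. It does not load a real shared library: the `HashMap` stands in for a dynamic symbol table, and `app_main`, `exported_symbols`, and `launch` are hypothetical names invented for illustration. In the actual proposal `app_main` would live in a separately compiled library for the main crate.

```rust
use std::collections::HashMap;

// Stand-in for the main crate compiled as a dynamic library (assumption:
// in the proposal this would be a separate shared object, not a local fn).
fn app_main() -> i32 {
    println!("hello from the application library");
    0
}

// Toy "dynamic symbol table": exported names mapped to function pointers.
fn exported_symbols() -> HashMap<&'static str, fn() -> i32> {
    let mut table: HashMap<&'static str, fn() -> i32> = HashMap::new();
    table.insert("app_main", app_main);
    table
}

// Stand-in for the thin launcher binary: resolve the entry point by name
// and call it. Returns None if the symbol is missing.
fn launch(symbols: &HashMap<&'static str, fn() -> i32>) -> Option<i32> {
    let entry = symbols.get("app_main")?;
    Some(entry())
}

fn main() {
    let code = launch(&exported_symbols()).expect("entry point not found");
    std::process::exit(code);
}
```

The point of the sketch is only that the launcher does a name lookup and an indirect call; it never has to copy or rewrite the library's machine code.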

This could be supported purely as a way to do fast incremental compiles, and there would be no support for doing anything but running the generated main binary and library with cargo run. (I.e. no way to deploy the collection of libs and main binary together somewhere else.)

It seems to me that this would make incremental compilation very fast.

Would this be a good approach to speeding up incremental compilation?

The obvious problem here is that a significant chunk of compile time (the majority?) is spent compiling generic code, and you can't dynamically link generics (short of JIT-compiling them on the fly, which would actually be fun).
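A minimal sketch of why generics get in the way (the `Shape` trait and both functions are made up for illustration). The generic version is compiled once per concrete type it is used with, typically in the crate that uses it, so a prebuilt library cannot already contain the code for types it has never seen. The trait-object version is compiled once and dispatches through a vtable, which is the kind of function a dynamic library can export.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Generic: the compiler emits a separate copy of this function for every
// concrete `T` it is called with (monomorphization).
fn total_area_generic<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Trait objects: a single copy of machine code; each call to `area` goes
// through the vtable at run time.
fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    let rects = [Rect(1.0, 2.0)];

    // Two instantiations are generated here: one for Square, one for Rect.
    println!("{}", total_area_generic(&squares));
    println!("{}", total_area_generic(&rects));

    // One function handles both types.
    let mixed: [&dyn Shape; 2] = [&squares[0], &rects[0]];
    println!("{}", total_area_dyn(&mixed));
}
```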

TBH, at least in rust-analyzer's case, I haven't observed the "linking is the bottleneck" effect. It seems that, even for incremental compiles, most of the time is spent compiling the chain of deps, and not in the final linking.

Is there some way I can instruct Cargo to do "everything but the final linking step", so that I can double-check linking times?

EDIT: for the curious, here's the compile time profile of a rust-analyzer build, after adding a comment to one of the middle-end crates: https://gist.github.com/matklad/3fc87d13549b3810fe08dbb3e50228cc. 25 seconds in total, 7 seconds for the leaf binary crate, of which 5 seconds is linking.

EDIT: redoing the bench more carefully, it seems like the total time is more like 10s, which makes linking take 50% of that (see the thread "Intuition for rustc current incremental compilation?").

In the case of Bevy, linking takes a significant amount of time. This is probably because a simple game uses generics only a little, and there are a lot of dependencies for the graphics, which don't expose a generic interface to the game itself.

On nightly you can use cargo rustc -- -Ztime-passes |& grep link_binary.

That makes sense. As an alternative, you could save all the MIR to disk and then interpret it with Miri, which wouldn't really involve any linking.

Yes, but since Miri is an interpreter, execution would be really slow.

That's true, but if it were as slow as Python (or even a bit worse) it would still be great for development.

You could, for example, run a small number of simple tests very quickly, which is a common loop.

That's what I thought too until I tried it. Unfortunately it did not pay off, even for cases with a small(-ish) test suite.

Miri is way slower than Python. I believe a 100x slowdown over native execution is expected.

As a side note, I am currently working on improving the JIT mode of cg_clif in ways that can reduce compilation time (lazy compilation, and maybe re-using individual functions when the source changes). I got the former mostly working, but the latter is likely much harder.

Another thing to look at, perhaps as an alternative to dynamic linking, would be incremental linking, which is supported by at least two linkers: MSVC and GNU gold. It essentially reserves padding in the final output, so you wouldn't actually have to change the linkage model to dynamic, which may be less invasive.

I don't think I've ever encountered much use of it with gold on Linux, however. Part of this seems to be the difficulty of integrating it into Makefiles, where the linker args are rarely reflective of the full dependency tree. For MSVC there is already this issue: #37543
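The padding idea can be illustrated with a toy model. This is not how MSVC or gold are actually implemented; `Slot`, `Image`, and `Relink` are hypothetical names, and real linkers also deal with relocations, sections, and debug info. The sketch only shows the core trade: reserve spare bytes after each function so that a small change can be patched in place without moving anything else.

```rust
// One function's slot in the output image: bytes in use plus reserved padding.
struct Slot {
    name: String,
    offset: usize,
    used: usize,
    capacity: usize,
}

struct Image {
    slots: Vec<Slot>,
}

#[derive(Debug, PartialEq)]
enum Relink {
    PatchedInPlace,
    FullRelink,
}

impl Image {
    // Lay out functions back to back, reserving `padding` spare bytes after each.
    fn link(funcs: &[(&str, usize)], padding: usize) -> Image {
        let mut offset = 0;
        let mut slots = Vec::new();
        for &(name, size) in funcs {
            let capacity = size + padding;
            slots.push(Slot { name: name.to_string(), offset, used: size, capacity });
            offset += capacity;
        }
        Image { slots }
    }

    // Update one function. If the new code still fits in its slot, only that
    // slot changes and every other offset stays put. Otherwise (or if the
    // name is unknown) the whole image is laid out again.
    fn update(&mut self, name: &str, new_size: usize, padding: usize) -> Relink {
        if let Some(slot) = self.slots.iter_mut().find(|s| s.name == name) {
            if new_size <= slot.capacity {
                slot.used = new_size;
                return Relink::PatchedInPlace;
            }
        }
        let sizes: Vec<(String, usize)> = self
            .slots
            .iter()
            .map(|s| (s.name.clone(), if s.name == name { new_size } else { s.used }))
            .collect();
        let refs: Vec<(&str, usize)> = sizes.iter().map(|(n, s)| (n.as_str(), *s)).collect();
        *self = Image::link(&refs, padding);
        Relink::FullRelink
    }
}

fn main() {
    let mut image = Image::link(&[("a", 100), ("b", 50)], 16);
    println!("{:?}", image.update("a", 110, 16)); // fits in the padding
    println!("{:?}", image.update("a", 200, 16)); // outgrows the slot
    for s in &image.slots {
        println!("{} at offset {} ({} of {} bytes used)", s.name, s.offset, s.used, s.capacity);
    }
}
```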

Or use an interpreter: write pure Rust code and run it directly, without compiling.