Blazing Fast Unlinking

Vorpal · July 3, 2024, 7:45pm

Having an optionally separate run and dev profiles is interesting. It is key though that you don't have to build everything twice normally (or that you at least can opt out of that behaviour).

Can't the remote case be solved by also copying the so files? Something like: scp target/debug/{*.so,mybin} remote:path/?

You can also define custom profiles, which inherit from one of the standard profiles. I use this here for a profiling profile (release with full debug info): paketkoll/Cargo.toml at 1ae24f8c9009d02e29642fb445e4e4716d3ed866 · VorpalBlade/paketkoll · GitHub

For you a remote profile inheriting from dev could make sense. For me your use case seems niche: when I do remote deployment it is also usually cross compiled (either to ARM or to some embedded no-std thing). Not using the same binaries as the host anyway in other words.

When I'm not doing embedded work I do command line programs, that can be run locally, and I like quick iteration. See my previous arguments as to why switching between run and build shouldn't trigger a rebuild there.

Vorpal · July 3, 2024, 8:02pm

I seem to remember that this is a case where behaviour between ELF (Nix excluding MacOS) and PE (Windows) will differ.

ELF does symbol resolution in a global namespace for all public symbols across all shared libraries (and the executable as well). This is why things like LD_PRELOAD works, and can be used to replace malloc with an alternative implementation. This includes (for C at least) public symbols where the caller and callee are in the same library. A call site simply says "I need symbol foo".
PE works differently, here a call site will say "I need symbol foo from library bar". So tricks like LD_PRELOAD won't work.

This distinction is likely to have an effect on global allocators. On Unix it is trivial to make them just resolve to the same one (in the order of resolution the first one wins, don't remember the exact resolution order of the top of my head, something like LD_PRELOAD, executable, linked shared libraries in some order or other). On Windows you instead need a common dependency to define a shared symbol for everyone else to use. So alloc could define a shared global (static with interior mutability?) that others can set and read.

I have no clue about MacOS and their MachO format.

Source for this: ELF I know a lot myself, PE I watched a conference talk comparing PE and ELF (in the context of C++) some time ago. I can't seem to find it any more.

bjorn3 · July 4, 2024, 12:30pm

Mach-O defaults to two level namespaces, which behave similar to PE. You can also force usage of a single global namespace to make it behave like ELF using an env var at execution time as is necessary for using the macOS equivalent of LD_PRELOAD.

matthieum · July 4, 2024, 4:48pm

I would note that it's already the case. That is cargo test uses the test profile which inherits from the dev profile, while cargo bench uses the bench profile which inherits from the release profile.

And yes, this means that other tools also need to pick their default profile. nextest likely picks test, criterion likely picks bench, no idea what afl and valgrind aim for as I don't use them.

I would expect that given a choice between consistency and 10x build time improvements for cargo test, most users would pick 10x build time improvements. Sometimes, pragmatism wins.

I am really not certain that's the case.

Looking at my own workflow the most frequent commands I invoke are, in order:

cargo clippy.
cargo fmt.
cargo test.

And then way down I may finally build a binary to run it locally or push it to a remote host.

Thus, for my own workflow, cargo build rebuilding from scratch is not a problem, whether dev or release.

There's dynamic linking -- with the expectation that a "similar" library can be swapped -- and there's dynamic linking as an implementation detail like here.

A pragmatic take on the singleton issues (#[global_handler], #[panic_handler]) would be to cheat, and put them in the final binary regardless, thus bypassing the problem altogether. When dynamic linking is an implementation detail, after all, it's really up to the compiler to decide what to dynamic link.

For #[test] and other "collections", it's "just" a matter of moving from a single slice to a collection of slices that can be iterated over. It may require a bit of leg work, but is eminently doable. (Using the .ctor section, you can have the slices form an intrusive linked list on their own)

There's life before main and life before main

Coming from the C++ world, the fact that every user can arbitrarily schedule work before (and after) main, work which may take an arbitrarily long time, and work which may crash (static initialization/destruction order fiasco, anyone?) is a real pain and arriving in Rust is a breath of fresh-air.

But that's not what we're talking about, here. We're talking about a language feature implementation, which may happen to require executing a minimum of amount of code at load-time, depending on the platform.

For #[test], for example, we're talking about a way for the language to allow walking through the list of #[test]. One implementation would be to create a slice of tests in a dedicated (and reserved) section of each library, and then walk over those sections at runtime, iterating each slice -- that's how .ctor works, the section is a slice of function pointers.

And if that's not possible on the platform, then an alternative implementation is used instead. Perhaps an intrusively linked list of slices. Perhaps something else. It's an implementation detail of the runtime.

That is, contrary to the C++ situation, I'd expect that even when load-time is involved (1) it's minimal and (2) it won't crash under my feet.

It may be niche. There's a combination of special hardware (more CPUs/RAM, higher-end GPUs, FPGAs, ...), special location (close to data, firewalled area, ...), etc...

I doubt I'm the only one with these constraints, but I have no idea how common they are.

Vorpal · July 4, 2024, 6:23pm

That is way different to me. I use cargo run a lot currently, I'm making a command line program with an embedded script interpreter in it (currently testing out rhai and rune to decide which one to actually go with).

Formatting the file is a key binding in vscode, I only really run it as a command just before a git commit to make sure I didn't miss anything (I usually did).

Clippy is similar, after I'm done with a bunch of things and am ready to commit I go through and clean up the lints.

I run cargo nexttest quite often though, almost as much as cargo run.

I believe you are focusing too much on your workflow and over-specialising for it. Now there are at least three people who have said so, please consider that you might not represent the majority here.

While that would work for ELF, I'm pretty sure you need the exact opposite for PE (and possibly MachO if I understood @bjorn3 correctly.

bjorn3 · July 4, 2024, 6:51pm

As I understand it Chrome strongly prefers if we don't add more global constructors: Stdlib contains a static initializer on Linux, without any way to opt out · Issue #111921 · rust-lang/rust · GitHub

For PE and Mach-O defining symbols in both the main executable and a dylib will cause both to disagree about which one to use. For the panic handler this would merely mean that the wrong panic handler is used when panicking inside a dylib, but for the global allocator this can mean attempting to allocate with one allocator and deallocating with another, which causes a crash at best and can cause an exploitable memory safety issue at worst.

Vorpal · July 4, 2024, 9:10pm

Perhaps I was unclear: with exact opposite I meant that the symbol needs to be declared in a common dependency (e.g. Core or Alloc). As I understand it, a library on PE can't resolve a symbol in the binary? PE dependencies form a DAG, right?

CAD97 · July 4, 2024, 10:32pm

Note that there are some classes of errors which are caught by a build but not a check. In situations where those are less rare than typical for most Rust development, occasionally doing build as a more thorough check is beneficial, even if you haven't yet done enough to also want to rerun tests yet. (When you have enough tests that running tests contributes meaningfully to the build+test time, anyway.)

binarycat · July 5, 2024, 3:47pm

this seems like the most logical option, and something that could be gated behind a nightly feature flag at first.

matthieum · July 5, 2024, 4:32pm

I am afraid you are terrible at reading minds. I certainly am.

I am not trying to focus on my own workflow, and I am certainly not pretending it's "the" workflow -- I did specify it was my workflow or my team workflow, and never even tried to pass it of as "the" workflow.

It just so happens that this is the only workflow I am intimate with, and I can only suspect that most others people are similarly only really familiar with their own (or perhaps a variant or two).

Thus I share mine, and can only hope that others share theirs, so that we can get an overview of the usecases that need be covered. No more, no less.

Interesting, so for you speeding up cargo run would be crucial.

Do you use cargo build much? And if so in which circumstances?

I'd certainly prefer it too -- I'm very much of the "You Don't Pay For What You Don't Use" mindset -- which is why I would favor if platform-specific "reflection" methods would be used to implement the features instead.

Still, I do note that in this case we are specifically talking about opt-in global constructors: if one doesn't use global registration, then there's no need for a global constructor to register anything. Given the wording of the bug report (no way to opt out), it seems even Chrome would be okay with such as a scheme (they'd simply never opt in).

Is it possible to simply NOT define the symbol in the dylib, and only define it in the executable?

Are you prescient? I just got it by this today, with check/clippy passing, but test failing to compile because I hadn't activated the const_generics feature.

In this case it's just poor user experience -- I mean, check can see the code is using const_generics, so should flag it immediately -- but I do seem to remember other such cases.

In any case, I don't see the issue. Users who want a fast cargo build (because they use it often) can easily opt in to dynamic linking even if it's not the default, either permanently (through configuration) or on an ad-hoc basis.

Interestingly, my cargo test is pretty slow not due to the tests themselves being slow, just due to them being slow to build (link)

bjorn3 · July 5, 2024, 7:30pm

Your dylib may be used by both a cdylib and the main program (or alternatively two independent cdylibs). A cdylib has to be standalone and as such would need to define the symbol. The main program doesn't know that the cdylib defines the symbol and as such has to define it too. The end result will be that you get two definitions of the same symbol. Depending on how things work this will result in either a linker error, dynamic linker error or either the cdylib or the main program disagreeing about it's value with the dylib that imports the symbol.

Vorpal · July 5, 2024, 7:54pm

Yes, often when I work on command line programs. Cargo adds its own output. This is at least a distraction (yes I know half the functions are currently unused!) and at worst a show stopper (I'm piping the output into another command and it is NOT expecting extra output from cargo).

So due to this I quite often do cargo build followed by some cmd | target/debug/something -flags | other cmd. I obviously want dynamic linking here.

Even when cross compiling/running elsewhere (I'm currently working on a crate that interacts with the package manager on Linux, for several different distros. (A personal configuration management^[1] system to be specific)) I would prefer dynamic linking and just copy the so files along as well.

Personal as in "I have too many computers, I want to sync system config and dot files between them". As opposed to tools like Ansible that are targeted at sysadmins managing company computers. ↩︎

matthieum · July 6, 2024, 10:55am

Have you tried --quiet? (The description is unclear as to exactly what it hides, I'm not sure it would not display warnings)

From what I've gathered so far, I think there's significant interest from most thread participants to using dynamic linking to speed-up build/test times, but:

It's not clear what defaults should be used, and how switching the ecosystem to such default should be achieved.
There may be technical hurdles, either on some platforms, or in conflicts with other features (global_allocator, panic_handler, ...).

Considering the above two, I believe that:

The focus should be put on creating the feature, leaving it purely opt-in for now, and postpone any discussion about defaults. Anyway, it would be to be opt-in during implementation & validation phase, so no huge loss.
The feature should NOT be about enforcing dynamic linking, but strictly about using dynamic linking as an implementation detail, leaving unspecified which parts are dynamically linked, and which are not. In the extreme, for some targets it could do nothing (wasm, for example).

And thus I would suggest that the feature be named -Coffload-dynamic, which instructs the compiler to offload as much as technically feasible to dynamic libraries -- in order to reduce link-times -- but leaves open exactly what is offloaded, and what is not, so that it may change based on targets, flags, compiler versions, etc...

Vorpal · July 7, 2024, 4:18pm

I have not.

I have now yet another use case for running cargo build separately: I'm now working on a command that must be run as root to work (no, it doesn't work in a container either, it talks directly with real ACPI vendor and model specific hardware over interfaces in /sys). For probably obvious reasons sudo cargo run is not a good idea. Can I mock this for tests? Yes obviously, but I also need to test it against the real thing.

Vorpal · July 11, 2024, 6:15pm

Update: --quiet doesn't remove the warnings anyway.

Topic		Replies	Views
Cargo dylib-run/dylib-build	5	179	August 24, 2024
Dynamic linking for compilation speed improvement? compiler	10	2448	March 19, 2021
Shared library for faster builds (research) compiler	11	575	June 25, 2024
Slow Linking with External Crates (trying to investigate internal cause)	3	955	March 25, 2019
Static binary support in rust	61	30995	March 25, 2019

Blazing Fast Unlinking

Related topics