Idea: Light-weight reusable dependencies

crlf0710 · July 29, 2022, 4:20pm

Coming from a C++ background, I see basically two different kinds of dependency libraries. Most dependencies come with many different *.cpp translation units, which get compiled and archived into libraries and then linked into the final executables. This is very similar to Rust crates and Cargo lib dependencies.

On the other hand, there is a different kind of dependency library, called a "header-only" library. It contains zero translation units of its own. Instead it is inlined into downstream translation units and compiled together with them. Today Rust has no equivalent.

Recently, reading some libraries' code and some PRs that remove dependencies from packages (because the authors feel the burden isn't worth it), I've been feeling that it might be nice to have an "inlined-package manager" parallel to cargo, which could fetch a library crate and inline it as a module into the current package/crate, keeping track of versions, dealing with updates, etc.

Curious what people think about it.

jjpe · July 29, 2022, 5:19pm

The first thing that popped into my mind after reading this is "whoa that's a lot of extra complexity". So what exactly is it we'd be getting for this high price?

Consider that pretty much every rust project in existence already uses cargo, and can thus already resolve dependencies.

I think I'm failing to see any kind of upside to this proposal, while I do see a whole lot of downside. That might be a lack of imagination on my part though, so if some cool stuff is enabled by this, please expand on that.

epage · July 29, 2022, 5:28pm

I think this is to help with build times. A common way to reduce build times is to drop a dependency and to re-implement the subset of what you need. This helps in two ways:

- You get only what you need.
- The compiler builds your dependency as part of your crate, trading the overhead for tracking a distinct crate's output for the compiler having to process more within the crate.

I'm unsure how much the latter will help. While a dependency might extend out the longest path for your build times, it still has to be built either way and without the compiler doing more processing in parallel it'll be serialized as compared to building two dependencies in parallel.

BearOve · July 29, 2022, 9:42pm

Isn't the way cargo works closer to the header-only approach than to multiple static libs? Everything is potentially inlined and optimization happens with all the crates in one big soup? What is missing from a traditional C/C++ workflow is the ability to reuse already built static libraries across different builds.

CAD97 · July 30, 2022, 2:05am

The main reason for header-only libraries in C++ is one of:

- "Fake" single-header libraries, where it's a single .h for the purpose of being copied into your source tree so you can decide how to build it.
  - "Fake" because most of the time you have to do a special #define LIB_DEFINITION; #include "lib.h" to actually compile the non-header part of the library.
- Pure header libraries, where everything is in headers because it has to be, because it's all templates and maybe inline functions.

Nearly all Rust code uses cargo, so the first point (of trivially integrating into your flavor of build system) is moot. The second one is also moot; generic code can be provided in .rlib just fine, and we can even theoretically do MIR optimization on pre-monomorphization generic functions. (Though we probably won't unless it's very fast and gives a reduction in compile-time of anyone using it, as this delays pipelining.)
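To illustrate the point about generics, here is a minimal sketch of a generic function as it might appear in a small library crate. The name `join_with` is made up for this example. Unlike a C++ template, it does not need to be textually included downstream: the compiler checks the body once against the `Display` bound, stores the generic body in the crate's metadata, and the downstream crate instantiates it for whatever concrete types it uses.

```rust
use std::fmt::Display;

/// Hypothetical library function (the name is an assumption for this sketch).
/// It is type-checked once here, against the `Display` bound, and only
/// instantiated for concrete `T` in whichever crate ends up calling it.
pub fn join_with<T: Display>(items: &[T], sep: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        // Put the separator between items, not before the first one.
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(&item.to_string());
    }
    out
}
```

A downstream crate calling `join_with(&[1, 2, 3], ", ")` and `join_with(&["a", "b"], "-")` would get two separate instantiations, one for `i32` and one for `&str`, without the library being copied into its source tree.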

And even then, these in C++ aren't typically versioned. They're copied into your source tree and forgotten about. Copying "nanodependencies" into your source tree as-is exactly replicates what the C++ practice is for such single-file dependencies.

It used to be that cargo was failing to properly pipeline dependencies, but now it can start on downstream when upstream has finished its cargo check but before it's finished (or maybe even started) its codegen. The benefit of bringing a dependency into your same crate is minimal, and we can continue to shrink it.

What actually could potentially be interesting, though, is detecting when a crate passes some "small" criteria to bundle its compilation (the codegen part which does not block downstream compilation) automatically with another small dependency into a single codegen unit. Doing this transparently should be able to even further decrease the pipelining cost of splitting crates into multiple smaller crates.

Achieving the identical compile time for a clean build of multiple crates as if they were all a single crate isn't possible, because there is overhead in enforcing the orphan rule and other properties of the crate boundary. But we can get close, and identifying concrete places where the separation of crates currently has a nonintrinsic cost is the first step to improving the overhead.

Heliozoa · July 30, 2022, 3:22am

I've had a similar thought in the past, imagine something like cargo gist url-to-leftpad.rs that just inserts the code contained to src/gist/leftpad.rs and has another command for updating such files to the latest versions. The (not very serious) thought was motivated by wanting more transparency when it comes to dependencies. Of course, they're already transparent and you can go and look at the code you're including as part of your project, but I do wish it was easier to vet what's actually happening when I run cargo add or cargo upgrade.
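As a rough sketch of what "updating such files" would need to track, the following models one inlined file and classifies its state by comparing fingerprints. Everything here is hypothetical: no such cargo subcommand exists, and the names `InlinedFile`, `Status`, `fingerprint` and `status` are assumptions made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical record a tool could keep for each inlined file.
pub struct InlinedFile {
    pub source_url: String,
    pub local_path: String,
    /// Fingerprint of the content as it was when last fetched.
    pub fetched_fingerprint: u64,
}

#[derive(Debug, PartialEq)]
pub enum Status {
    UpToDate,
    UpstreamChanged,
    LocallyModified,
    /// Both sides changed since the last fetch; needs a manual merge.
    Diverged,
}

/// Content fingerprint. `DefaultHasher` is only used to keep the sketch
/// dependency-free; its output is not guaranteed stable across Rust
/// releases, so a real tool would want a stable hash instead.
pub fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Compare the local copy and the current upstream text against the
/// fingerprint recorded at fetch time.
pub fn status(entry: &InlinedFile, local: &str, upstream: &str) -> Status {
    let local_changed = fingerprint(local) != entry.fetched_fingerprint;
    let upstream_changed = fingerprint(upstream) != entry.fetched_fingerprint;
    match (local_changed, upstream_changed) {
        (false, false) => Status::UpToDate,
        (false, true) => Status::UpstreamChanged,
        (true, false) => Status::LocallyModified,
        (true, true) => Status::Diverged,
    }
}
```

The `Diverged` case is where such a tool gets harder than a normal dependency: once the inlined copy has been edited locally, an update is a merge rather than a replacement.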

What I think I really want is some kind of advanced diff UI for previewing the changes that happen as a result of adding/upgrading, but I thought I'd mention this whole thing as one angle for wanting a feature like this.

hyeonu · July 30, 2022, 6:23am

The point of inlining dependency code is to avoid keeping track of versions and dealing with updates.

Adding a dependency means more than trusting the current snapshot of its code and that of its transitive dependencies. Dependencies keep getting updated. A new version may add bugs, accidental breakage, unnecessary features, or even malicious code (potentially via stolen credentials). Adding a dependency means trusting its author not to do so in the future. It also means trusting the author's decisions about which subdependencies to trust.

Sometimes, especially for small pieces of code, that cost is too high compared to the benefit of potential future bugfixes and improvements. If the license allows it, copying the code directly lets you think less: just trust the code as it is, with no more updates. But you then need to maintain it yourself.

felix.s · July 30, 2022, 7:21am

The primary reason for header-only libraries in C++ is that C++ has no standard package manager, or even a standard build system, and as such it is not always obvious how to integrate an external dependency that is any more complex than that into your build.

Cargo exists, so that particular motivation is absent here.

crlf0710 · July 30, 2022, 6:04pm

Exactly. And it's simply cumbersome to perform this inlining by hand, and even more cumbersome to update the inlined code later. That's why I think some tooling is needed here, though I'm not sure about the details.

Well, the motivation is comparatively weaker, but absolutely not absent here. There is a whole category of crates, "Rust patterns" on lib.rs, that seem too heavy to add as a dependency but are nonetheless useful, that people go back and forth on whether to include as dependencies, and that people manually reinvent again and again: 100 pages in search results.

Imo this phenomenon will NOT go away any time soon (std is not a code gallery), but will only become more severe over time, and should be taken seriously.

CAD97 · July 30, 2022, 9:16pm

The thing is, if they're copying rather than using the dependency, this isn't going to be changed by a different kind of dependency. And perhaps what you're looking for is git submodule.

felix.s · July 31, 2022, 7:01am

The question of whether to add a leftpad-style dependency hinges on whether it is worth the maintenance burden (of having to put up with incompatible upgrades) and the security risk (of the dependency being a potential vector for malicious code, or for supply-chain DoS like the trope namer). The way such a dependency is managed by the build system is pretty immaterial: this is primarily a problem of policy, not of technology. Trying to find a technological solution is futile.

kpreid · July 31, 2022, 2:48pm

Different technological capabilities make different policies feasible (or attractive) to implement.

mathstuf · August 1, 2022, 10:31pm

The main issue is that Rust source code is all (usually) written with crate:: for its cross-module imports. This does not translate well when embedded. I suppose code exclusively using super:: could fare better, but this is…non-standard to say the least. Probably less of an issue for single-source file embeddings, but that seems awfully restrictive too.
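A minimal sketch of the path problem, using a made-up library pasted in as a module of the host crate (the names `vendored_pad`, `util`, `api`, `missing` and `left_pad` are all hypothetical):

```rust
// Hypothetical vendored library, embedded as a module of the host crate.
pub mod vendored_pad {
    pub mod util {
        /// Number of padding characters needed to reach `width`.
        pub fn missing(len: usize, width: usize) -> usize {
            width.saturating_sub(len)
        }
    }

    pub mod api {
        // As a standalone crate this would typically read
        // `use crate::util::missing;`. After embedding, `crate` refers to
        // the host crate's root, which has no `util` module, so that path
        // would fail to resolve. A relative path keeps working no matter
        // how deeply the library is nested.
        use super::util::missing;

        /// Left-pads `s` with `fill` up to `width` characters.
        pub fn left_pad(s: &str, width: usize, fill: char) -> String {
            let pad = missing(s.chars().count(), width);
            let mut out = String::with_capacity(s.len() + pad);
            out.extend(std::iter::repeat(fill).take(pad));
            out.push_str(s);
            out
        }
    }
}
```

The host crate would then call `vendored_pad::api::left_pad("ab", 4, ' ')`. An inlining tool would either have to rewrite every `crate::` path in the embedded source to point at the new module, or only accept libraries that are already written with relative paths.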

FWIW, it's a very similar problem in Python where from mypackage import X is common but does not scale well when embedded.

system · October 30, 2022, 10:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
