Idea: Light-weight reusable dependencies

Coming from a C++ background, I see basically two kinds of dependency libraries. Most dependencies come with many *.cpp translation units, which get compiled and archived into libraries and then linked into the final executables. This is very similar to Rust crates and Cargo library dependencies.

On the other hand, there is another kind of dependency library, the "header-only" library. Such a library contains zero translation units of its own. Instead, it is inlined into downstream translation units and compiled together with them. Today Rust has no equivalent.

After recently reading some libraries' code, and some PRs removing dependencies from packages (because the authors feel the burden isn't worth it), I feel it might be nice to have an "inlined-package manager" parallel to cargo that can fetch a library crate and inline it as a module into the current package/crate, keeping track of versions, dealing with updates, etc.

Curious what people think about it.

The first thing that popped into my mind after reading this is "whoa that's a lot of extra complexity". So what exactly is it we'd be getting for this high price?

Consider that pretty much every rust project in existence already uses cargo, and can thus already resolve dependencies.

I think I'm failing to see any kind of upside to this proposal, while I do see a whole lot of downside. That might be a lack of imagination on my part though, so if some cool stuff is enabled by this, please expand on that.

I think this is to help with build times. A common way to reduce build times is to drop a dependency and re-implement the subset of it that you need. This helps in two ways:

  • You get only what you need
  • The compiler builds your dependency as part of your crate, trading the overhead of tracking a distinct crate's output for the compiler having to process more within the crate.

I'm unsure how much the latter will help. While a dependency might lengthen the longest path of your build, the code still has to be built either way, and unless the compiler does more of its processing in parallel within a crate, that work gets serialized, whereas two separate dependencies can be built in parallel.

Isn't the way cargo works closer to the header-only approach than to multiple static libs? Everything is potentially inlined, and optimization happens with all the crates in one big soup. What is missing compared with a traditional C/C++ workflow is the ability to reuse already-built static libraries across different builds.

The main reason for header-only libraries in C++ is one of:

  • "Fake" single header libraries, where it's a single .h for the purpose of being copied into your source tree so you can decide how to build it.
    • "fake" because most times you have to do a special #define LIB_DEFINITION followed by #include "lib.h" in exactly one source file to actually compile the non-header part of the library.
  • Pure header libraries, where everything is in headers because it has to be, because it's all templates and maybe inline functions.

Nearly all Rust code uses cargo, so the first point (of trivially integrating into your flavor of build system) is moot. The second one is also moot; generic code can be provided in .rlib just fine, and we can even theoretically do MIR optimization on pre-monomorphization generic functions. (Though we probably won't unless it's very fast and gives a reduction in compile-time of anyone using it, as this delays pipelining.)
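To make the point about generics concrete, here is a minimal sketch. The module `upstream` stands in for a separately compiled dependency crate, and the function names are hypothetical. The generic function is defined once in the "dependency", and each concrete instantiation is produced for the code that uses it, without the source having to live in a header.

```rust
// `upstream` stands in for a separately compiled dependency crate
// (hypothetical name; in a real project this would be its own package).
mod upstream {
    /// Returns the largest element of `items`, or `None` if it is empty.
    /// Generic over any copyable, partially ordered type.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut iter = items.iter().copied();
        let first = iter.next()?;
        Some(iter.fold(first, |best, x| if x > best { x } else { best }))
    }
}

// Downstream code instantiates the generic at two different types.
pub fn largest_int() -> Option<i32> {
    upstream::largest(&[3, 9, 4])
}

pub fn largest_float() -> Option<f64> {
    upstream::largest(&[1.5, 0.25])
}
```

Nothing here requires the downstream crate to textually include the upstream source, which is the difference from a C++ template library.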

And even then, these in C++ aren't typically versioned. They're copied into your source tree and forgotten about. Copying "nanodependencies" into your source tree as-is exactly replicates what the C++ practice is for such single-file dependencies.

It used to be that cargo failed to properly pipeline dependencies, but now it can start on a downstream crate once the upstream crate has finished the equivalent of cargo check, before it has finished (or maybe even started) its codegen. The benefit of bringing a dependency into your own crate is minimal, and we can continue to shrink it.

What actually could potentially be interesting, though, is detecting when a crate passes some "small" criteria to bundle its compilation (the codegen part which does not block downstream compilation) automatically with another small dependency into a single codegen unit. Doing this transparently should be able to even further decrease the pipelining cost of splitting crates into multiple smaller crates.

Achieving the identical compile time for a clean build of multiple crates as if they were all a single crate isn't possible, because there is overhead in enforcing the orphan rule and other properties of the crate boundary. But we can get close, and identifying concrete places where the separation of crates currently has a nonintrinsic cost is the first step to improving the overhead.

I've had a similar thought in the past: imagine something like cargo gist url-to-leftpad.rs that just inserts the contained code into src/gist/leftpad.rs, with another command for updating such files to their latest versions. The (not very serious) thought was motivated by wanting more transparency when it comes to dependencies. Of course, they're already transparent and you can go and look at the code you're including as part of your project, but I do wish it was easier to vet what's actually happening when I run cargo add or cargo upgrade.
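As a rough sketch of what such a tool might drop into src/gist/leftpad.rs (the function name, signature, and behavior are assumptions for illustration, not any real crate's API):

```rust
// Hypothetical contents of src/gist/leftpad.rs, as an imagined `cargo gist`
// command might insert it. Nothing here comes from a real crate.

/// Pads `s` on the left with `fill` until it is at least `width` characters
/// long. Counts Unicode scalar values, not grapheme clusters.
pub fn left_pad(s: &str, width: usize, fill: char) -> String {
    let len = s.chars().count();
    // Number of fill characters needed; zero if `s` is already long enough.
    let missing = width.saturating_sub(len);
    let mut out = String::with_capacity(s.len() + missing * fill.len_utf8());
    out.extend(std::iter::repeat(fill).take(missing));
    out.push_str(s);
    out
}
```

The host crate would then declare the file as a module and call it like any other local function, which is what makes the code easy to read and vet in place.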

What I think I really want is some kind of advanced diff UI for previewing the changes that happen as a result of adding/upgrading, but I thought I'd mention this whole thing as one angle for wanting a feature like this.

The point of inlining dependency code is to avoid keeping track of versions and dealing with updates.

Adding a dependency is more than trusting the current snapshot of its code and that of its transitive dependencies. Dependencies keep getting updated. A new version may add bugs, accidental breakage, unnecessary features, or even malicious code (potentially via stolen credentials). Adding a dependency means trusting its author not to do so in the future. It also means trusting the author's decisions about which subdependencies to trust.

Sometimes, especially for small pieces of code, that cost is too high compared to the benefit of potential future bugfixes and improvements. If the license allows it, copying the code directly lets you think less: just trust the code as it is, with no more updates. But you need to manage it yourself.

The primary reason for header-only libraries in C++ is that C++ has no standard package manager, or even a standard build system, and as such integrating any external dependency less trivial than that into your build is not always obvious.

Cargo exists, so that particular motivation is absent here.

Exactly. And it's simply cumbersome to perform this inlining by hand, and it's even more cumbersome to update the inlined code later. So that's why I think some tooling is needed here, though I'm not sure about the details.

Well, the motivation is comparatively weaker, but absolutely not absent here. There's a whole category of crates, rust patterns on lib.rs, that seem too heavy to add as dependencies yet are nonetheless useful, that people go back and forth on whether to include as dependencies, and that people manually reinvent again and again and again - 100 pages in search results

IMO this phenomenon will NOT go away any time soon (std is not a code gallery); it will only become more severe over time, and it should be treated seriously.

The thing is, if they're copying rather than using the dependency, this isn't going to be changed by a different kind of dependency. And perhaps what you're looking for is git submodule.

The question of whether to add a leftpad-style dependency hinges on whether it is worth the maintenance burden (of having to put up with incompatible upgrades) and the security risk (of the dependency being a potential vector for malicious code, or for supply-chain DoS like the trope namer). The way such a dependency is managed by the build system is pretty immaterial: this is primarily a problem of policy, not of technology. Trying to find a technological solution is futile.

Different technological capabilities make different policies feasible (or attractive) to implement.

The main issue is that Rust source code is usually written with crate:: for its cross-module imports. This does not translate well when embedded, because crate:: then refers to the host crate rather than the embedded library. I suppose code exclusively using super:: could fare better, but this is…non-standard to say the least. It is probably less of an issue for single-source-file embeddings, but that seems awfully restrictive too.
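A small sketch of the path problem, using made-up names (`vendored`, `tinylib`, `util`, `api`) for a library that has been copied into a host crate as a nested module:

```rust
// `vendored::tinylib` stands in for a small library copied into this crate.
// All module and function names here are hypothetical.
mod vendored {
    pub mod tinylib {
        pub mod util {
            pub fn double(x: u32) -> u32 {
                x * 2
            }
        }

        pub mod api {
            // As a standalone crate this would typically be written as
            // `use crate::util::double;`. Once embedded, `crate` names the
            // host crate, which has no top-level `util`, so that path would
            // fail to resolve. A relative path keeps working wherever the
            // module tree is mounted.
            use super::util::double;

            pub fn quadruple(x: u32) -> u32 {
                double(double(x))
            }
        }
    }
}

// The host crate calls into the embedded library through its new path.
pub fn host_uses_vendored(x: u32) -> u32 {
    vendored::tinylib::api::quadruple(x)
}
```

An inlining tool would therefore either have to rewrite crate:: paths to the new mount point or only accept sources written with relative paths.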

FWIW, it's a very similar problem in Python where from mypackage import X is common but does not scale well when embedded.
