Using crates as dynamic libraries + native package managers' relation with Cargo

Today I encountered a person who was not at all impressed with Rust; they got that impression largely from Fedora's crate packaging guidelines, which make Rust software buildable offline within Fedora's own package management system.

They were convinced that, because Rust does not allow dynamic linking by default (i.e. at the crate level), it had no future in systems programming and "couldn't be treated seriously".

I personally wanted to ask about the dynamic linking bit. While I somewhat agree that Rust in native package managers seems... odd, I also can't really respond to it, mainly because I don't know what the story is between Cargo's package management and native package management.

They went on to argue that this was a huge flaw of Rust, comparing Cargo to the dependency management of Golang, Ruby, PHP, and Node.js, shunning all of them while insisting that native dependency managers were better suited for the future.

Now, I'm not here to perpetuate the argument (I'm just repeating it for context), but I am rather curious about a few points:

  1. What is the story between native package managers and Cargo? How well do the two work with each other, and what are some big topics or articles I can read for background on their history? What is the community consensus on the future of native package managers vs. Cargo: to embrace them, or to ignore them?

  2. Are there any plans to work on, or treat, crates as individual dynamic libraries and have them distributed as such? I imagine discussion happened early on about whether Rust should use native package managers or have its own; if such a discussion exists, can I have some links to it? What are some general "modern common sense" arguments for or against dynamic libraries in development? What is the Rust community's consensus on this?

2 Likes

Hmmm, interestingly, just after I posted this thread I saw Why are Rust shared libraries not shipped with popular OS? - libs - Rust Internals (rust-lang.org). Sorry about that :sweat_smile:

1 Like

Here's a Gentoo packager's perspective on the topic. There are links to related articles in both the post and the comments.

2 Likes

I'd just like to give a shout out to the Fedora Rust SIG and rust2rpm.

I'd personally consider them (and, by extension, the wider Red Hat family that benefits from their work) the leading Linux distro when it comes to producing packages both of crates and of Rust itself, but I'm curious to hear other opinions.

5 Likes

Native package managers never worked well with C++ libraries. Traditionally, C++ devs copy the source/header files from FTP servers or web pages and put them in their local directory.

3 Likes

Setting aside the general arguments for static/early binding and against dynamic/late binding of dependencies, the main one is:

Dynamic libraries are HARD to IMPOSSIBLE in the face of generic API surfaces.

Gankra's article on how Swift achieved stable dynamic linking while allowing library evolution is probably the key reading link here.

The key points are that

  • Generic APIs are implementable in one of two main ways: monomorphization (Rust's impl Trait, C++'s template<>; where code is "copy-pasted" for each instantiating type) or polymorphization (Rust's dyn Trait, C++'s virtual; where a single code path is used for every instantiated type). See the sketch after this list.
    • Ahead-of-time monomorphization cannot work across late-bound library boundaries. If you copy implementation code out of the library, that code knows about and relies on details of the library that are not public and stable, and could change in a future version. (JITs effectively implement a sort of runtime monomorphization for the hot paths!)
    • Polymorphization works across late-bound library boundaries, but comes with a number of restrictions and performance penalties. Swift has the @frozen attribute to opt out of polymorphization and promise a type won't change in incompatible ways, for exactly this reason. Think roughly of the restrictions on object-safe traits in Rust: everything that isn't object-safe either cannot be used across a dynamic library boundary, or silently pays even more dynamic costs (e.g. alloca, hidden accessors for field access, etc.) to be usable.
  • And oh boy, don't forget about the other issues with keeping libraries not just API compatible but ABI stable, which means you can never change the size or layout or name or ... of any public item, ever. Basically only private implementation details completely hidden from the consumer can ever be changed; thus the PIMPL pattern.
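To make the distinction concrete, here is a minimal Rust sketch (the function names are illustrative, not from any real library): the generic function is monomorphized into a separate copy for every concrete type it is called with, while the trait-object function compiles to a single shared code path.

```rust
use std::fmt::Display;

// Monomorphization: the compiler emits a separate copy of this body for
// every concrete `T`, so the body itself must be visible to (and is
// effectively copied into) each calling crate.
fn print_static<T: Display>(value: T) {
    println!("{value}");
}

// Polymorphization: one compiled body is reused for every type behind the
// `dyn Display` vtable, so only the trait's "shape" has to stay stable.
fn print_dynamic(value: &dyn Display) {
    println!("{value}");
}

fn main() {
    print_static(42);      // instantiates print_static::<i32>
    print_static("hi");    // instantiates print_static::<&str>
    print_dynamic(&42);    // the same machine code serves both calls
    print_dynamic(&"hi");
}
```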

Use Cargo dependencies for library dependencies. :slightly_smiling_face:

If you want to amortize some of the compilation cost of libraries between projects (a big draw of system-wide shared libraries), you can either just set $env:CARGO_TARGET_DIR to some shared directory (and let Cargo handle it), or use a tool like sccache.

(In fact, in the future it might make sense for system package managers to provide a sccache-like service, where they provide pre-compiled library artifacts in a way more amenable to sharing for Rust, rather than the globally shared dylibs that they provide for the platform C ABI. Maybe. I don't know.)


That's not to say that you can't write a library that can be used as a dynamic library! The abi_stable crate exists specifically to make this (effectively the PIMPL pattern) somewhat reasonable to implement while maintaining reasonable type safety within Rust's compilation model.
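For a feel of what that pattern looks like without any helper crate, here is a minimal hand-rolled sketch of the opaque-handle (PIMPL-style) approach: only a C-compatible surface crosses the shared-library boundary, and the Rust type stays hidden behind a raw pointer. All names are illustrative, not taken from abi_stable or any real library.

```rust
// The struct's layout is never exposed to consumers; only the extern "C"
// functions below cross the shared-library boundary.
pub struct Parser {
    input: String,
}

/// Create a parser; callers only ever see an opaque pointer.
#[no_mangle]
pub extern "C" fn parser_new() -> *mut Parser {
    Box::into_raw(Box::new(Parser { input: String::new() }))
}

/// Feed one byte into the parser.
#[no_mangle]
pub extern "C" fn parser_push_byte(p: *mut Parser, byte: u8) {
    // SAFETY: the caller must pass a pointer obtained from parser_new.
    let parser = unsafe { &mut *p };
    parser.input.push(byte as char);
}

/// Destroy the parser and reclaim its memory.
#[no_mangle]
pub extern "C" fn parser_free(p: *mut Parser) {
    if !p.is_null() {
        // SAFETY: the pointer came from Box::into_raw in parser_new.
        unsafe { drop(Box::from_raw(p)) };
    }
}
```

Because the consumer only ever holds a pointer, the Parser struct's fields, size, and layout can change freely between versions without breaking the ABI; that is exactly the flexibility that crates like abi_stable try to preserve with less boilerplate.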

Rust just makes the costs of supporting a stable ABI in this way (painfully) obvious. It's not worth the effort for most library authors when Cargo source dependencies are just as (if not more) convenient to use, and not providing a stable ABI means you can provide a much richer API.

16 Likes

I'm not sure about distro packaging. I think there are many signs that it's not working well, and it will either have to adapt, or be sidelined:

  • Developers are adopting containers. To me this is a sign of totally giving up on package management. It's too hard to install more than one application on top of a system package manager, so developers would rather ship a snapshot of an entire operating system than to deal with dependencies.

  • There's a push for snaps, flatpaks, appimage, etc. which also bypass/fight traditional package managers.

  • Major programming languages other than C/C++ ended up building their own packaging (npm, maven, packagist, bundler, pypi, cargo, gopm [go modules]), and they're thriving. You can laugh at left-pad, but npm is by far the biggest package manager with most users and most packages.

    • It shows that traditional packaging systems didn't work well (if they weren't broken, then users would stay on them instead of adopting other package managers)

    • The worst argument "against" new language-specific package managers is basically that they're too easy to use: developers publish too many libraries (even very small ones), and are too keen to add lots of dependencies to their programs. To me this sounds like a great success in removing pain and barriers that were holding developers back.

  • Linus: the packaging model used by Linux distributions makes Linux a poor target for application developers. Linus Torvalds on why desktop Linux sucks - YouTube

19 Likes

(FTR, Golang as a language officially adopted Go modules; gopm is one of those community package managers that had bitten the dust by then, along with (IIRC) five others.)

Thanks for the elaborate responses though, this is interesting.

There are three distinct cases, and it's worth talking about them separately:

  1. Shipping an entire OS / desktop environment. This tends to include a lot of binaries (unless using the busybox approach) which, with static linking, adds up to large sizes (install size, update size, even memory usage). This is also usually a highly-controlled environment, thus dealing with ABI stability is less problematic.

  2. Shipping $product to $users across multiple platforms. In this case static linking makes a lot more sense. Dynamic libraries can still be useful if $product includes multiple executables.

  3. Dynamically-linked plugins. In this case dynamic linking is a must and plugins are likely to be built with a very specific compiler/environment.

8 Likes

Against Packaging Rust Crates by firstyear is a great read, and I fully agree with it. It is written from the perspective of someone who works on a distro, maintains important packages/libraries, and knows the historical limitations of distro packages.

My own "Rust does not have a stable ABI" is from the viewpoint of another distro person, who maintains a shared library with a stable C ABI, all written in Rust - namely librsvg.

Some disconnected thoughts, all related to dynamic libraries and distros:

  • Even within distros, there is bundling happening. As an example, both Inkscape and gnome-shell embed slightly differently patched versions of libcroco (an old C library to parse CSS).

  • I worry a bit that if other platform libraries like librsvg get (even partially) ported to Rust, we'll have more copies of the Rust standard library and other low-level crates in memory. But so what? The same happens for C++ libraries with templates, or header-only libraries, and people don't seem to complain. Now, if all the platform libraries got ported to Rust and managed to preserve their C ABI... you know a single big libplatform.so written in Rust sounds pretty damn appealing.

  • I've seen mentions that Apple had a hard requirement to have shared binaries for Swift libraries, to avoid multiple copies in low-memory phones. I would love to see an actual analysis of what happens either way - with their presence or with their absence. Are we talking gigabytes of wasted memory? Does it have repercussions on battery life or whatever?

  • Flatpak apps seem to take a large amount of time to start up the first time, compared to apps linked against the system's libraries, which are already in memory by the time you log in. Is this because the runtime takes a long time to load into memory? Is it comparable to the first boot's system libraries? Is there overhead from the container foo? Can we solve this with a bit of preloading (say, at login time before you have had a chance to launch Flatpak apps) - something that was hugely successful for system libraries years back?

7 Likes

Speaking theoretically:

  • can Linux package managers not already be used to supply multiple versions of libjpeg.so?
  • imagine each app on the system declared which minor version of libjpeg.so it wanted and linked against that specific minor version
  • would the package managers not be able to support this?
    • install all required versions of libjpeg.so
    • delete them when the last app using them is gone?

I understand this goes against social norms and there are reasons humans will not want it. But technically, are the package managers not already able to do this? And

  • does it not alleviate (all of?) the pain associated with C++ and Rust ABI instability?
  • while also allowing system to save RAM and disk when exactly the same version of an .so is used by multiple applications?

Even if package managers have robust support for multiple versions being "live" at once, there is not one canonical current version of any language's compiler. Generally speaking there will be a fairly large number of users who are using a recent but not perfectly up to date version of the compiler. Sometimes projects can't update to the latest compiler version for some time, due to required changes or bugs.

Given that multiple versions of the compiler almost certainly need to be supported, you now need to compile each new version of your library n times to support the n most recent compiler versions (assuming each compiler version can't guarantee its ABI is the same as the previous one's). You might be able to abuse semver to make that work, but it isn't going to be easy or elegant. If we don't want this to be a huge mess, package managers would need to support both a semantic version and a compiler version.

Note that all of this still ignores the problems around monomorphization. You'd probably be restricted to exposing only non-generic structs, enums, functions, and dyn Trait types in public interfaces. (Possibly you could expose a fixed set of monomorphizations of generic types too.)
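As a rough illustration of that restriction (everything here is an invented example, not a proposal), this is the kind of public surface that can still cross a dynamic-library boundary: concrete types and trait objects are fine, while a generic signature would force its body to be monomorphized into every caller.

```rust
// Concrete data and trait objects can cross a shared-library boundary;
// the caller never needs the implementation, only the symbols.
#[repr(C)]
pub struct Config {
    pub verbosity: u32,
}

pub trait Handler {
    fn handle(&self, event: u32);
}

// OK as an exported entry point: no generics in the signature.
pub fn run(config: &Config, handler: &dyn Handler) {
    for event in 0..config.verbosity {
        handler.handle(event);
    }
}

// Not OK across a stable boundary: every concrete `H` would need its own
// monomorphized copy of this body inside the calling binary.
// pub fn run_generic<H: Handler>(config: &Config, handler: H) { ... }

struct Logger;
impl Handler for Logger {
    fn handle(&self, event: u32) {
        println!("event {event}");
    }
}

fn main() {
    run(&Config { verbosity: 3 }, &Logger);
}
```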

1 Like

Doesn't relying on a specific minor version of a crate solve this?

  • export/import whatever you like
  • it's okay for the crate's fns to be inlined into the .exe
  • it's okay for the .exe to include monomorphized code from the crate
  • the crate's .so can include some pre-generated, commonly used monomorphizations - or else that .so may end up being empty :slight_smile:

You are right. It's possible that some kind of a hex build hash on top of semver and compiler version would also be desirable for each "shared crate" (.so)

But if package managers did this, it would solve the issue completely, wouldn't it?


Alternative Plan II:

  • each app is supplied as a mix of .exe and .so files
  • package manager keeps a map from .so md5sum to all matching .so files
  • upon discovering that two apps are supplied with the same .so (byte-for-byte identical), a hard link is used, sharing the file and its inode

This would mean space is wasted on downloads but saved on disk and in RAM, right?
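Purely as a toy illustration of that plan (the hash, paths, and function names are all stand-ins; a real packager would use a proper content hash like the md5sum mentioned above), the dedup step might look something like this:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

// Hash a file's full contents; stands in for the md5sum in the plan above.
fn content_hash(path: &Path) -> io::Result<u64> {
    let mut hasher = DefaultHasher::new();
    hasher.write(&fs::read(path)?);
    Ok(hasher.finish())
}

// Replace byte-for-byte identical .so files with hard links to one inode,
// so identical copies shipped by different apps share disk and page cache.
fn dedup_shared_objects(files: &[PathBuf]) -> io::Result<()> {
    let mut seen: HashMap<u64, PathBuf> = HashMap::new();
    for file in files {
        let hash = content_hash(file)?;
        if let Some(original) = seen.get(&hash) {
            // Compare the actual bytes too, in case of a hash collision.
            if fs::read(original)? == fs::read(file)? {
                fs::remove_file(file)?;
                fs::hard_link(original, file)?;
            }
        } else {
            seen.insert(hash, file.clone());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical invocation: in practice the paths would come from the
    // package manager's install database, not the command line.
    let files: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    dedup_shared_objects(&files)
}
```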

TL;DR

It's possible to do this safely, but it is very hard in practice. Bug fixes in "patch" versions of your library are extremely risky, as they could easily break monomorphized items in binaries compiled with previous versions of your library, in ways that are non-obvious and very hard to debug.

More Detail

In general, no. Monomorphization means that the contents of all generic, public items become part of the API. If a "patch" release of your library changes the body of a public generic function in an incompatible way, programs compiled with the old version won't see the change, while programs compiled against the new version will. This imposes some extra requirements on public generic functions, but that's not the end of the world. The more significant problem is that monomorphization will almost always require inlining the bodies of generic functions that aren't public. It is incredibly difficult to do a "patch" or "fix" release of a library that does not inadvertently alter the behavior of a function that ends up inlined by monomorphization. Thus the only way to monomorphize safely is to do so with a single build of all dependencies.
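A minimal sketch of why that is (the crate layout and names are invented for illustration): everything a public generic function touches, including private helpers, gets monomorphized, effectively copied, into each downstream binary at that binary's build time, so a later "patch" to those bodies never reaches binaries that aren't rebuilt.

```rust
mod mylib {
    // Private helper: a hypothetical v1.0.1 bug fix here only changes the
    // behavior of binaries rebuilt against the new source; binaries that
    // already carry the old monomorphized copy keep the old behavior.
    fn normalize(s: &str) -> String {
        s.trim().to_lowercase()
    }

    // Public generic function: its body, and the private helper it calls,
    // end up compiled into every consumer that instantiates it.
    pub fn normalize_and_store<F: FnMut(String)>(input: &str, mut store: F) {
        store(normalize(input));
    }
}

fn main() {
    // This call site bakes in whatever `normalize` did when *this* binary
    // was compiled, regardless of later patch releases of the library.
    mylib::normalize_and_store("  Hello ", |s| println!("{s}"));
}
```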

Swift solved this problem by... not solving it. Swift's generics can be specialized but are not guaranteed to be; Swift can always fall back on polymorphism. In order to avoid "stable" dynamic libraries being hilariously slow compared to static and "fragile" dynamic libraries, Swift added the ability to mark private items as "usable from inline", which means those items (and, critically, their contents) can be treated as public and stable. This indicates that the author has promised the compiler can safely inline the item into a specialization in the final binary that will eventually link to the dynamic library. If a generic item isn't usable from inline, Swift will simply not specialize it and it will always be polymorphic. If an item marked "usable from inline" changes in an incompatible way, your library has broken its ABI. Under your versioning scheme this would mean you need to bump the minor or major version.

Technically you only need the implicit polymorphism (and a compatible ABI for your vtables, of course) to do dynamic linking with generics, but performance will suffer, particularly for value types.

The reasons Rust can't currently do the same thing are:

  1. Rust never implicitly makes generics polymorphic, and currently doesn't support all traits being used polymorphically (they must be object safe). There's no real reason Rust couldn't do this in theory, but it would wildly change performance characteristics, something Rust generally tries to avoid. (It would also be quite a lot of work, I imagine.)
  2. Rust has no analogous concept to "usable from inline" to help reason about when monomorphization is safe.

I strongly recommend reading Gankra's previously mentioned article about Swift ABI stability in contrast to Rust. It's thorough and covers why ABI stability is so crucial to dynamic linking.

5 Likes

Certainly. Maybe I spelled it wrong. I meant to say

  • every change to a crate
  • no matter how minor it is
  • every patch
  • every bug fix

results in a new version number. It is this most fine-grained of .so version numbers that we link our .exe against.

The claim is that doing this retains some of the benefits of package managers/shared dll-s:

  • saves RAM and disk space
  • speeds up application startup

while ditching others - indeed

  • there is no opportunity to apply a security fix to an .so
  • only to an app as a whole

In that case, yes, you could do that today. Unfortunately, the odds of a single machine installing two binaries which happen to depend on the exact same version and compiler version of a dependency are quite small. There's very little benefit to implementing such a system in a package manager. In the vast majority of cases it would be no different from just building your app, forcing the library to link as a dynamic library, and shipping them together.

If you ship two programs that share dependencies and benefit from not duplicating code, you could simply make the dependencies their own package and version it in lockstep with both tools. You don't need explicit support from the package manager.

2 Likes

Hmm.. but how correct is this assessment?

I'm running Ubuntu. I obtain my software via apt-get install and apt-get update. Surely this environment is highly controlled by distro maintainers. They should have no problem switching compiler versions in a controlled predictable manner so that the universe of software I update to is all built with the same compiler.

I would also like to hope that in many cases, when app X uses library Y, the "latest" version of Y would be suitable. The only challenge then for maintainers is to ensure that, while the hundreds of packages using Y are built, the crate repo is "frozen" such that the "latest" version of Y keeps resolving to the same value.

They should be able to achieve that by using a private copy of crate repo or some sort of repo proxy and/or a tool like sccache.

Exceptions are inevitable but isn't there reason to hope the most common case would be that of sharing .so-s?

P.S. One problem I see with this scheme is that it creates a strong incentive to rarely update the Rust compiler; it would certainly be switched on major Ubuntu version changes but might stay glued for too long on LTS... I hope this problem is more social than technical and that the right balance can be found here.

1 Like

You also have the issue of feature flags. Every crate would need to be built with all of its (positive) features so that every consumer would have whatever API it expects available. This means that you'll pay for extra libraries being loaded for things like diesel, gfx, or other abstraction layers loading backends that aren't actually cared about.
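As a small illustration at the source level (the feature name is hypothetical), an item gated on a Cargo feature simply isn't compiled into the artifact unless that feature was enabled at build time, so a single pre-built system-wide library has to be built with every such feature turned on to satisfy every possible consumer:

```rust
// Only present in the compiled library when the (hypothetical) "postgres"
// feature was enabled at build time; a consumer expecting this module from
// a pre-built system-wide .so needs that .so to have been built with the
// feature on, whether or not other consumers care about it.
#[cfg(feature = "postgres")]
pub mod postgres_backend {
    pub fn connect(url: &str) -> Result<(), String> {
        // Backend-specific code would live here.
        let _ = url;
        Ok(())
    }
}

// Always present, regardless of enabled features.
pub fn backend_count() -> usize {
    usize::from(cfg!(feature = "postgres"))
}
```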

The main issue with having libjpeg.so.1 and libjpeg.so.2 on the same machine is that these libraries both provide the same symbols. If both are loaded at the same time, things…don't usually end well (and they're usually the most "fun" kind of crashes or bugs you'll ever see). It's just easier to have everyone use the same copy of the library.

4 Likes

Indeed.

You could build crate Y several times - as needed - each time with different feature/compiler flags.
And you could roll this exact set of flags somewhat inelegantly - and carefully avoiding clashes into

I'm starting to fear this doesn't match the way package managers work today. But I'm also wondering if it's a vague hint at a possible path forward.

My understanding is that distro maintainers are responsible for a core set of packages and that most other packages are built by either the developers of the software, or a third party. I am in no way an expert on this though.

This document certainly gives me the impression that in general Debian packages are not published directly by the Debian maintainers.