Using crates as dynamic libraries + native package managers' relation to Cargo

When it comes to dynamic libraries and monomorphization, I'm curious whether some sort of opt-in, cross-product-of-crates dynamic library scheme could work, partly because the orphan rules exist and the dependency graph must be acyclic.

For example: serde_json defines a type Error. If mycrate uses thiserror for its own error type and includes a variant Json(#[from] serde_json::Error), then the impl From<serde_json::Error> for mycrate::Error can be monomorphized and included in a library rust1.54+std+serde_json+mycrate when compiling and packaging mycrate for a distro.
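For concreteness, a minimal sketch of the mycrate error type described above (the crate and variant names come from the example, not from a real crate):

// mycrate's error type: the derived impl From<serde_json::Error>
// for Error is exactly the kind of function a distro could
// precompile into the rust1.54+std+serde_json+mycrate library.
use thiserror::Error;

#[derive(Debug, Error)]
pub enum Error {
    #[error("JSON error: {0}")]
    Json(#[from] serde_json::Error),
}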

This is where the orphan rules come into play: even with specialization, only mycrate, serde_json, or std would ever be allowed to provide that implementation, without even knowing whether mycrate depends on serde_json or the other way around. So given a set of enabled crates that a distro would like to provide this "early monomorphization" for, and a set of rules for which functions to generate (e.g. "no generics on structs", "no cfg'd functions", "only these features for that crate"), it should be possible to determine whether any particular function should be pre-compiled, both when packaging and when using mycrate.

So a new cargo subcommand that scans crates (the way rustdoc/cargo doc does) could invoke rustc to precompile all qualifying functions from these enabled crates, and a hook could be added to rustc that checks whether a function about to be codegen'd matches the crate list & rules, and skips codegen if so.

That way, functions like

serde_json::to_string(value: &SmallVec<[i32; 4]>) -> Result<String, serde_json::Error>

(in std+smallvec+serde_json) or

ring::hmac::sign(key: &ring::hmac::Key, data: &[u8]) -> ring::hmac::Tag

(in std+ring) might someday be commonly available in distros.
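For concreteness, a call site that would resolve to the first of those precompiled instances might look like this (a sketch; it assumes smallvec is built with its serde feature so that SmallVec implements Serialize):

use smallvec::SmallVec;

fn main() -> Result<(), serde_json::Error> {
    let v: SmallVec<[i32; 4]> = SmallVec::from_slice(&[1, 2, 3]);
    // Under the scheme above, this instantiation would be resolved
    // against the precompiled std+smallvec+serde_json library
    // instead of being codegen'd into the application.
    let s = serde_json::to_string(&v)?;
    println!("{s}");
    Ok(())
}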

Once that is the case, it might become feasible to configure rustc to use, by default, the intersection of one distro's enabled-crates list and rules with those of the other major distros.

The idea is for this to be opt-in and easily automated. A distro would have a single "sysroot-precompile-config", and rustc would write out a list of required libs while packaging a crate. If that list is empty, great! If it includes std, std+serde and a handful of others, it should be easy to have a script add those to the dependency spec of the published app package.
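As a purely hypothetical illustration (no such file exists today; the format and all keys here are invented), such a config might look like:

# sysroot-precompile-config: crates the distro precompiles,
# the features they get, and rules restricting what is generated.
[crates.serde_json]
features = ["std"]

[crates.smallvec]
features = ["serde"]

[rules]
skip-generic-structs = true
skip-cfg-functions = true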

1 Like

The premonomorphisation discussion seems highly relevant, doesn't it?
There, @mcy suggested taking inspiration from C++, which can

// Explicitly instantiate the template.
template void Foo<int>(int);

explicitly include particular template instantiations into a given .so

I'm extremely interested to know how the set of packages for Ubuntu 20.04.1 LTS is compiled.
I find it hard to believe this isn't done centrally in one big batch.

1 Like

Yes, thanks for pointing that out! Since dynamic libraries don't appear to be linked in when they're not actually used (just tested with libz-sys and libz) – if there were a way to write

// Hypothetical syntax: the real #[link] attribute takes name = "...",
// and path-qualified fn names are not valid in extern blocks today.
#[link(name = "std+smallvec+serde_json")]
extern "Rust" {
    fn serde_json::to_string(value: &SmallVec<[i32; 4]>) -> Result<String, serde_json::Error>;
}

in a synthetic std+smallvec+serde_json crate (?) that links a std+smallvec+serde_json.so generated by the scanning tool I talked about, using that explicit premonomorphisation, and that gets picked up by -Zshare-generics, then that would work. In that case, it might even make sense to have a -Zshare-generics=extern mode that only uses these external instances.

1 Like

The main reason for wanting dynamic linking in many cases is to be able to fix security issues in applications even when you don't have the source and can't reach the vendors, or, if you can, without having to rebuild the whole universe.

The best example for this is OpenSSL. As bugs are found in OpenSSL, the package maintainer can just patch that, ship a new shared library and all the applications are patched. Even the ones where the authors went out of business and the source code is long lost.

My understanding is that this is the reason Debian policy requires dynamic linking unless not possible. It is also an explicit requirement in my current project at work, because the team does not have the time to rebuild the application for all customers whenever some important dependency changes.

Red Hat also goes to great lengths in OpenShift to be able to rebuild containers on new base images, and I've seen a tool that does it by reassembling the layers of existing containers.


Now, this makes sense for some kinds of libraries. Cryptography is an important example. But it also applies to other system services. E.g. a network library like curl can transparently add support for HTTP/2, DNS-over-HTTPS or IPv6 to existing clients, and a GUI library like Qt or Gtk can switch rendering backends.

These kinds of libraries have a relatively small, high-level API with a lot of functionality behind it, so the overhead of polymorphisation and the resulting dynamic dispatch is negligible relative to the time spent doing the actual work, while the benefit of being able to replace large parts of the implementation without recompiling the clients is big.

On the other hand, libraries that provide data structures or computation algorithms benefit greatly from monomorphisation, inlining and link-time optimization, while there is very little in them that can be usefully changed without touching the interface anyway.
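To make the contrast concrete, here is a minimal sketch (names invented) of the same entry point in monomorphised and polymorphised form:

use std::io::Read;

// Monomorphised: a separate copy is generated in every client
// crate, enabling inlining but forcing clients to carry the code.
pub fn parse_generic<R: Read>(mut input: R) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?;
    Ok(buf)
}

// Polymorphised: exactly one copy can live in the shared library;
// clients pay one dynamic dispatch per call into it.
pub fn parse_dyn(input: &mut dyn Read) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?;
    Ok(buf)
}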

So my suggestion would be to explicitly handle this as two separate cases:

  • For the low-level libraries, continue to default to static linking with all the monomorphisation.
  • For the high-level libraries, provide explicit opt-in support for polymorphised API and encourage the libraries where it makes sense to use it. abi_stable can be considered a prototype of this.
    • As a bonus for this subset it should be mostly easy to define bindings for other languages (C, C++, Python, JS from webassembly etc.).
8 Likes

I find this rationale so compelling that I feel it deserves its own thread and thoughtful proposals for support, both in tooling (e.g., Cargo) and in high-level documentation guidance for developers.

@Shnatsel

1 Like

Concurred, this sounds very interesting as a potential component or feature of Rust: the ability to dedicate an API surface to more limited rules in exchange for shared-library linking. This could help a lot with the original argument, not by yielding to "we've always done it this way" feelings, but by addressing the actual problem here.

The biggest problem I see here is ensuring the authenticity of the dynamic library while still permitting upgrades by responsible parties other than an absent, perhaps deceased, developer. Otherwise this just creates an easier way to mount MITM and other security attacks via fraudulent, Trojaned versions of a library.

Would distro (Ubuntu, Debian) maintainers not be the right party to make the call?

1 Like

For system components: if an attacker can replace them, they already own the system in many other ways anyway. So operating-system protections are the right tool, and the distro maintainers are the ones to ensure they ship sound versions. The system admin has to trust the OS vendor already.

For protecting applications from each other—when the applications shall not be considered fully trustworthy because the user installing them does not have enough information or skill to decide how trustworthy they really are (like Android or Windows Store apps)—the linking should only be allowed from the application sandbox to system components or system add-ons with higher trust level. This has to be enforced at the sandbox level (Flatpak, Snap etc.).

Language-level protection like signed DLLs in .NET is mainly for protecting the software author against their software being used in unauthorized ways, and I don't think Rust needs to handle that in the standard. The author can almost always opt to link statically anyway, at the cost of having to provide more security support.

I would also add that the sandboxing mechanisms may be limited to inter-process APIs. Windows RT went with DCOM, Android has its own API based on the binder and, unfortunately, Java. Linux is supposed to use D-Bus, and it does in systemd and I think also in Snap and Flatpak, though I fear it might need some performance improvements to be really viable for mid-level work like network access. So when defining the API subsets for dynamic libraries, easy wrapping (via an appropriate bindgen-style tool with minimal extra input needed) should be included in the design goals.

2 Likes

This is a great cut. Perhaps the explicit restriction to only support polymorphisation-appropriate APIs will help shake out which libs would be good candidates (no serde, yes ripgrep, etc.).

There is a storied history of necessary security considerations for using dynamically linked libraries; the long and short of it is that if you load a compromised shared library you're screwed, so... don't do that. :sweat_smile: The "how" of "just don't do that" comes down to policy and system design, which lie squarely on the OS vendor and shouldn't really affect Rust's implementation.

I wonder if the ABI could be designed in such a way that it would allow libraries to be unloaded and new ones loaded live on the fly, without restarting the process. I guess this is typically not done because a process often entangles itself pretty deeply with its libraries, and cleanly disentangling them is not worth the headache, so restarting the process when a library changes is common. However, with lifetimes it might be possible to have a process pause, cleanly unload a library, reload a new version, and resume, knowing that nothing was missed because use of the old library is tied to its expired lifetime. Maybe with tech like Lt<'a> (lifetimes for fn!) being discussed elsewhere right now. This is also kinda similar to how Theseus OS is designed to reload and relink crates/cells at runtime.
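Today's closest analogue is probably the libloading crate, where a Symbol already borrows from its Library, so the borrow checker ties every use of a loaded function to the library still being loaded (a sketch; the symbol name plugin_entry is made up):

use libloading::{Library, Symbol};

fn run_then_reload(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let lib = unsafe { Library::new(path)? };
    {
        // Symbol<'_, T> borrows from `lib` and cannot outlive it.
        let f: Symbol<unsafe extern "C" fn() -> u32> =
            unsafe { lib.get(b"plugin_entry")? };
        unsafe { f() };
    } // the borrow of `lib` ends here
    drop(lib); // unload; a newer version could now be loaded
    Ok(())
}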

1 Like

I've been thinking about this kind of use-case lately. I think this conversation is a bit pointless without actual, real-world numbers, which I'm not sure how to get. Stuff like "how often does the average crate publish a new patch version?" or "how much of the average Rust binary could be shared with another average Rust binary?".

I suspect most of the potential to share would be in large popular crates like clap, regex, tokio, etc. For these crates, the ecosystem could agree on one "canonical" pin per Rust compiler version; kind of like a fixed Cargo.lock that changes every three months.

That way, package managers can ship the canonical pins as dependencies for Rust binaries, and get the benefits of symbol sharing. It would be on a best-effort basis, though, since Rust binaries might want to use their own lock instead of the canonical one (for instance, if there's a more recent version with a critical security fix).

That being said... I'm not sure this would actually be useful? Like federicomenaquintero said:

We'd need some actual numbers to know how important this is.

1 Like

The closest thing available might be Haskell. It supports shared libraries, does monomorphization and such, and has generic code that needs to be handled across ABI boundaries. The details might differ from Rust, but "how many resources are taken by a statically built stack" versus a shared-library stack can probably at least be estimated. There are super-generic Haskell libraries that probably have almost zero code until monomorphized (this might be close to serde) and others that are mostly concrete and probably act more like C libraries in their ABI surface (I don't know of one off-hand, maybe parts of XMonad?).

Again, I would look to Haskell here. Stackage curating a set of "blessed" versions has made getting a deptree to agree on compatible versions far easier.

4 Likes

I don't think sharing is a compelling reason except for maybe a few system libraries. Even on iOS, the only libraries that are actually shared are the standard runtime and the GUI framework, as the package manager does not support 3rd-party library packages anyway.

The real use-case is being able to update the library without having to recompile all the applications that use it—whether to fix a security-critical bug (think TLS implementations), add compatibility support (think newer version of network protocol, newer version of TLS being important sub-case) or general tweaks (UI library changing look&feel of standard widgets). And honestly, those are the actual reasons why Apple wanted shared libraries in swift too—a lot of changes are done to the system libraries that the applications are supposed to pick up automatically.

I don't think that's super useful.

For another language, sure, but one of the points of Rust is that it's pretty hard to accidentally break compilation for your reverse dependencies. Package managers could just maintain a fork where they regularly update Cargo.lock files and do no other work, and get most security updates that way.

Patching .lock files is tedious and annoying. There are perpetual conflicts and it's just better to let cargo do what it needs.

I think it would be easiest if it were possible to populate a local cargo index of crates. Installation involves building, testing, and then installing the crate (and any binaries, doc files, licenses, etc.). Then distros could install packages into the registry, update it with a post-install script, patch their packaging rules to only pull from there, and let the package manager control what is available and necessary that way.
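Cargo's existing source replacement could serve as a starting point here; a sketch of a .cargo/config.toml that points builds at a distro-managed local registry (the registry name and path are illustrative):

[source.crates-io]
replace-with = "distro"

[source.distro]
local-registry = "/usr/share/cargo/registry"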

I think it is justified to say that one needs the ability to create dynamically linkable system libraries, and a few user-facing libraries for complex tasks as well. However, just asking for a stable dll-ABI feels a little bit short-sighted. Shouldn't such libraries also be FFI-friendly and support 3rd-party languages? Should the dll-loading mechanism really just copy-paste its basic design from C, when we know that it has quite some flaws?

I think Rust could support a design quite well where a dynamic library written in Rust exposes a C ABI (which is also very FFI-friendly) and this ABI is then wrapped in Rust code again. One could probably build some tooling to simplify this process. Possibly one could also define a superset of plain C that supports some more features, like simple trait objects and maybe some kind of algebraic datatypes, that could still be called from C, but with some more hassle. (Notice that C++ also does not have a dll-ABI that supports all its features, like generics etc.)
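A minimal sketch of that pattern (function names invented; in practice the exporting library and the wrapper would be separate crates):

// Library side: export a C-ABI entry point with a stable symbol name.
#[no_mangle]
pub extern "C" fn mylib_add(a: u32, b: u32) -> u32 {
    a + b
}

// Client side: declare the foreign symbol and wrap it in safe Rust.
mod wrapper {
    extern "C" {
        fn mylib_add(a: u32, b: u32) -> u32;
    }

    pub fn add(a: u32, b: u32) -> u32 {
        unsafe { mylib_add(a, b) }
    }
}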

All the more ambitious cases could probably be handled with 3rd-party tooling (like abi_stable), which can respond more flexibly to the particular application.

1 Like

The C++ language doesn't, but the primary implementations make this guarantee. Nothing stops a hypothetical Rust compiler from saying "we guarantee that we will always lay out a type the same way, now and forever more" and actually making stable ABI guarantees like the C++ compilers do. It'll be hard, but it's doable. Note that things such as struct packing and niche creation then become impossible to add (or remove) in the future without explicit attributes on the types (and adding such attributes breaks the ABI of the crate that does so). It seems that rustc is not willing to commit to that today (or probably ever :confused: ). But this path is possible (as C++ has shown), though it also leads to the "did you compile everything with the same compiler?" debugging questions that inevitably crop up in C++ when wacky behaviors rear their head.
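For what it's worth, Rust already lets a crate opt individual types into a defined layout with an explicit attribute, which is the kind of per-type commitment described above:

// The default repr(Rust) layout may change between compiler versions
// (field reordering, niche filling); #[repr(C)] pins it down.
#[repr(C)]
pub struct Header {
    pub version: u32,
    pub flags: u16,
    pub kind: u16,
}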

Citation needed; C++ is having a lot of issues at present precisely because it is not allowed to break ABI. MSVC isn't going to support [[no_unique_address]], std::unique_ptr is more expensive than a raw pointer, and std::regex can never be fixed, all because fixing them would require breaking the ABI.

All this will not be generally possible at the same time as allowing inlining and monomorphisation to smuggle code across crate boundaries, right? It seems there is a choice:

  1. support shared libraries for the sake of sharing (only precise version match supported)
  2. build a brick wall between crates and support bug fixes, new protocols, look&feel changes ...

This brick wall could well look like this:

...but it's still a brick wall; it's not the normal Rust dependency relationship between crates

Are you sure? I am under the impression that the C++ standard does not mandate anything at all here, but that the implementations agree on a convention that supports most, but not all, features. In particular, templates cannot be shared; they must be statically linked via the header file.