Dynamically-linked stable shared subset of the Rust standard library, using the C ABI?

Recently, I've noticed that the overhead added by Rust's standard library to every Rust binary is too small! Every Rust executable already contains a symbolicator and a parser for external debug symbol archives. The archives may be compressed, so every Rust executable also includes a gzip decompressor. However, the archives also support zstandard compression, which doesn't work with libstd's backtraces! It's because the normal build of the standard library for some reason chose not to also bundle a full zstandard decompressor into every Rust binary.


It might take a while before Rust has a stable ABI, especially one that is complete enough to support all of libstd, and has mechanisms for long-term forward- and backward- compatibility.

However, dynamically linking all of libstd using a Rust-native ABI isn't necessary to get the benefits.

How about creating (an optional build of) libstd that depends on a simpler libstd-private dynamic library just for its common bulky code?

libstd would continue to be statically linked, but it could call to a shared library for some of its functionality, like the backtrace support with gzip and zstd, and select stable and boring parts of libstd that could be put behind a C ABI (rng, helpers for fmt, unwinding, Unicode routines). The compatibility risk for this is low, because newer versions of libstd can simply choose to stop using some no-longer-useful parts of the private library, or abandon it altogether.

This would allow operating systems to start shipping base libraries for Rust. Even if it won't amount to much in the beginning, it will help OS vendors and software distributors establish processes for doing so.

It could also be a way for operating systems to start adapting Rust-written software for the OS, e.g. instead of every Rust binary having its own copy of miniz_oxide, the OS-provided libstd helper could use the OS's preferred zlib implementation.

It may also help the Rust project to have its own motivation and real-world testing for features like #[export]/#[repr(crabi)].

3 Likes

For backtrace symbolication I've had the same idea for a while now. For the rest of the things you mention, I think there is either not much benefit: rng (that is just a couple hundred bytes for open("/dev/urandom") + read(), or getrandom(), right?), unwinding (I don't expect this to amount to much code if you exclude backtrace symbolication); or they are unstable enough that we would have to abandon their use again within a year: fmt (sharing it would also make changes to reduce the formatting size overhead harder), Unicode routines (a new version of Unicode is released about every year).

Dynamically linking the libstd.so for the rustc version with which the libraries got compiled should already be possible, right?

It's already possible for a single compiler version, but that is too narrow. Not only does it have a 6-week expiration date, it doesn't even guarantee that a slightly different build of the same compiler will still be compatible. That's very unappealing, even in tightly controlled environments.

I imagine that some libstd-basics.so would have compatibility across a wide range of Rust versions, so if e.g. Debian shipped one, I could use almost any compiler version I want to target it.

I know there's a risk of it becoming outdated when libstd changes, but if it's not an API for Rust binaries directly, only a libstd back-end, then libstd can stop using obsolete parts of it whenever it wants. For example if the .so ships fmt_v1, and the next version of libstd refactors std::fmt, it can just stop calling fmt_v1 and go back to using its own statically linked code. That may happen, but the cost of the .so being in the OS is nearly free, so it's still a benefit while it works, and no worse than the static linking status quo when it doesn't.

Evolution of the shared .so should also be possible, if there's a way to specify minimum API level required (e.g. tell Rust 1.99 to target .so v81 because that may be what Debian has, and libstd would use the .so only for functions that were in that version and are still relevant, and statically bundle the rest). The .so could get a new fmt_v2 in a new Rust version, and when libstd is built targeting new-enough .so, it can start relying on it.

The only thing I can realistically see us keeping compatible for several years is backtrace symbolication. That is just address -> list<(demangled symbol, filename, line, column)>. For anything else, I can almost guarantee that by the time Debian does a release, the latest stable rustc can no longer use half of the APIs in libstd-basic.so that you listed as suggestions, and it only gets worse over time until the next Debian release.

That would make the binary bigger again, which may well stop us from making genuine improvements because they would inflate binary size too much.

Distros can just ship the libstd of whichever rustc version they use, right? And if they update rustc, they can provide multiple packages for the different libstd versions until all packages get recompiled with the new compiler.

1 Like

Why not? One libstd.so.1.xx.0-<compiler hash> in the library lookup path every six weeks, prune after dependents don't need it any longer, doesn't sound like all that much. I wouldn't expect it to accumulate forever.

And "tightly controlled environments" makes me think of rebuild-the-world things like embedded distros, which should make sticking to a single version even easier... or your control is not tight enough :stuck_out_tongue:

I'm not so pessimistic. I don't see that many deep changes in libstd any more. It does tweak things here and there regularly, but I don't think the changes are dramatic enough to need ABI breaks and render previous implementations completely unusable. Debian's frozen-in-time version will miss out on recent bug fixes and performance improvements, but that's Debian's definition of stability. The Rust project can still release the best new build of libstd-basics.so every 6 weeks, only needs to tolerate older versions.

For example, std's remove_dir_all_recursive had a small bugfix a year ago, but the rest of it was mostly unchanged for 6 years, and the same API would have worked for at least 9 years.

Back-end of std::net is full of functions created 8-10 years ago, with an ABI break 3 years ago (which barely missed getting into Debian stable).

I'm not sure I understand your reply. With what I have in mind, the binaries would never be larger than they are in the current setup. The status quo we have today, of having nothing shared, is already the worst possible scenario.

Do you mean that in the future, once people are used to smaller binaries, going back to statically linking libstd would be an unacceptable overhead and hold back improvements?

I'm assuming the cost of the shared lib is amortized, so it doesn't matter if it's 2x or even 10x larger. C devs don't even know how large libc is.

An issue with this idea is maintenance of stable versions. Currently the Rust project maintains nightly, beta and the most recent stable. Found a security issue or critical bug in stable - 1? It's not maintained. This seems like a reasonable approach for not creating too much work.

With a libstd-basic that should be stable over several years, there would be a need to backport fixes to one or more LTS branches of it, and release new point releases. That is extra maintenance burden. It seems like a waste of time and effort to me (as does the LTS approach to software in general).

(I have had more issues with LTS distros than I have ever had with Arch Linux that I have now been running for several years. With Ubuntu LTS or Debian there would frequently be critical bugs with suspend/resume or graphics that I reported that were just ignored and never fixed until the next LTS happened to fix it. With Arch I have never even seen those types of bugs, and what bugs I have had were fixed within days or weeks as the next version came out. LTS is a sham.)

1 Like

Yes

2 Likes

The problems arise when one tries to use an LTS distro to run modern software. I don't know how many times I've seen a project want to use bleeding-edge GCC but policies require everything else to come from an LTS distro and then complain that CMake isn't up-to-date. It really boggles my mind how CMake isn't considered eligible for an update in such a situation when the freaking compiler is, but that's what happens with checkbox compliance :confused: .

Well, updating to a new version of Unicode could be achieved by updating the shared library, without recompiling the program, right? If so, that would be a win.

Having self-contained binaries is actually a huge plus most of the time, I feel. For terminal programs on Linux my preferred method of distribution these days is (outside of an Arch package / AUR package) a statically linked musl binary. This is the simplest for me as upstream, and when I'm the user it is also simple. The only issue is how to handle additional files (tab completion, man pages, systemd units etc.).

So I would hope that this idea (if it is ever realised) would be opt-in (or at least opt-out).

1 Like

> Every Rust executable already contains a symbolicator and a parser for external debug symbol archives. The archives may be compressed, so every Rust executable also includes a gzip decompressor. However, the archives also support zstandard compression, which doesn't work with libstd's backtraces! It's because the normal build of the standard library for some reason chose not to also bundle a full zstandard decompressor into every Rust binary.

It sounds like the main motivation is to dynamically link backtrace-rs (or some subset of it). Which seems much more achievable in any reasonable time frame than trying to stabilize all of std's internal interfaces.

1 Like

Wouldn't build-std also solve this? If you know you will use zstd for debug info, build std with that feature flag. If you want to just get address and symbolicate afterwards, you can build without most of that code. Etc.
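For reference, a build-std invocation along those lines might look something like this on nightly. `-Zbuild-std` and `-Zbuild-std-features` are real but unstable cargo flags (the defaults include `backtrace` and `panic-unwind`); a `zstd`-style feature for debug-info decompression is hypothetical:

```sh
# Nightly-only sketch: rebuild std without the backtrace feature to shrink
# the binary. The flags are unstable and may change.
cargo +nightly build --release \
    -Zbuild-std=std,panic_abort \
    -Zbuild-std-features=panic-unwind \
    --target x86_64-unknown-linux-gnu
```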

How is that effort going? Wasn't that a project goal or something?

Stabilizing an MVP build-std is a goal but I'm told that setting std features won't be part of that so as to keep the initial scope as narrow as possible.

Leaving dynamic loading aside, I'd be thrilled with an option to simply not link in the backtrace symbolication code at all, and to instead ship a separate executable with the Rust toolchain that accepts a non-symbolic backtrace and a Rust debug binary and produces a symbolic backtrace.

3 Likes

I feel that's the same conclusion as the other thread. :slightly_smiling_face:

Ok, I made a very quick PR that should allow simply omitting symbolisation from printed traces: Add experimental `backtrace-trace-only` std feature by ChrisDenton · Pull Request #143910 · rust-lang/rust · GitHub