Perfecting Rust Packaging - The Plan

Based on requirements from the Perfecting Rust Packaging thread, I’ve outlined a set of tasks, divided between Rust and Cargo, that must be completed to make packaging Rust easier.

I’m very interested in feedback, especially from distros other than Debian - many of the requirements here came from Debian.

This is mostly focused on packaging Rust and Cargo themselves, though there are some steps here that serve to make packaging Cargo crates easier.

Once this has gotten review, I’ll make sure there is an issue filed for each and begin working on them myself (though I am happy to have help).

References

Rust

Task: Compiler command line customization in the Rust makefile

Issue: https://github.com/rust-lang/rust/issues/29554

Distros often have a standard, custom set of flags that should be passed to all compiler invocations when building the binaries distributed as packages for that platform (think things like hardening options). For example, Debian wants to pass -Wl,-z,relro during the link step.

To support this, every compiler invocation in the makefile needs to include the appropriate CFLAGS, CXXFLAGS, LDFLAGS, or RUSTFLAGS variable. See the Debian patch for guidance.
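As a rough illustration of the idea (not the actual makefile logic), the sketch below appends distro-supplied flags from an environment variable to a single compiler invocation. The helper names `split_flags` and `with_distro_flags` are hypothetical, and splitting on whitespace is a simplifying assumption that ignores shell quoting.

```rust
use std::env;

/// Split a flags string such as "-O2 -Wl,-z,relro" on whitespace.
/// Assumption: shell quoting is not handled here.
fn split_flags(raw: &str) -> Vec<String> {
    raw.split_whitespace().map(str::to_string).collect()
}

/// Append the distro's extra flags to the base arguments of one compiler invocation.
fn with_distro_flags(base: &[&str], extra: &str) -> Vec<String> {
    let mut args: Vec<String> = base.iter().map(|s| s.to_string()).collect();
    args.extend(split_flags(extra));
    args
}

fn main() {
    // LDFLAGS comes from the environment; if it is unset, no extra flags are added.
    let ldflags = env::var("LDFLAGS").unwrap_or_default();
    let args = with_distro_flags(&["-o", "rustc", "main.o"], &ldflags);
    println!("cc {}", args.join(" "));
}
```

The same pattern would apply to CFLAGS, CXXFLAGS, and RUSTFLAGS for their respective compilers.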

Task: Bootstrapping from previous releases

Issue: https://github.com/rust-lang/rust/issues/29555

Bootstrapping from arbitrary commits is too difficult for downstream distributors that want to bootstrap from their own binaries.

We’ll retire the current snapshot system, and bootstrap off of the previous stable compiler. This means that upstream will need to wait six weeks between making changes that are incompatible with the bootstrap compiler. At present, the snapshot compiler is regenerated rarely, so this is expected to be ok.

I expect that smaller distros will not be able to keep up with our six week release schedule, but a simple script should allow them to catch up on the bootstrap by rebuilding the intermediate compilers.
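A minimal sketch of what such a catch-up script might compute, assuming releases are numbered 1.N and each release is built by the one before it. `BuildStep` and `catch_up_plan` are hypothetical names used only for illustration.

```rust
/// One step of the catch-up: build release 1.`target` using compiler 1.`stage0`.
#[derive(Debug, PartialEq)]
struct BuildStep {
    stage0: u32, // minor version of the compiler doing the build
    target: u32, // minor version of the compiler being built
}

/// List the intermediate builds needed to go from the packaged compiler
/// (1.`have`) to the desired release (1.`want`). Empty if already caught up.
fn catch_up_plan(have: u32, want: u32) -> Vec<BuildStep> {
    (have..want)
        .map(|minor| BuildStep { stage0: minor, target: minor + 1 })
        .collect()
}

fn main() {
    for step in catch_up_plan(3, 6) {
        println!("build 1.{} with 1.{}", step.target, step.stage0);
    }
}
```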

We’ll set up CI to ensure that the distro scenario of locally providing the previous compiler works for each release.

Make sure to consider what happens for stable point releases, which we have never done. We probably need to guarantee that all point releases from series N can bootstrap all point releases from series N+1.
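The point-release guarantee could be stated as a small predicate. This is only a sketch of the rule as worded here, with a hypothetical `Release` type; it says nothing about other combinations that might also happen to work.

```rust
/// A stable release; for example 1.5.2 is `Release { minor: 5, patch: 2 }`.
#[derive(Clone, Copy, Debug)]
struct Release {
    minor: u32,
    patch: u32,
}

/// Proposed guarantee: any point release of series N can bootstrap
/// any point release of series N+1, so the patch numbers are ignored.
fn may_bootstrap(stage0: Release, target: Release) -> bool {
    target.minor == stage0.minor + 1
}

fn main() {
    let stage0 = Release { minor: 5, patch: 1 };
    let target = Release { minor: 6, patch: 0 };
    println!(
        "1.{}.{} bootstraps 1.{}.{}: {}",
        stage0.minor,
        stage0.patch,
        target.minor,
        target.patch,
        may_bootstrap(stage0, target)
    );
}
```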

Details TBD.

Task: Re-bootstrapping from the current release

Issue: https://github.com/rust-lang/rust/issues/29556

In some scenarios (I’ve forgotten offhand), distros want to rebuild the compiler using itself. This is more-or-less equivalent to starting the build at stage 2.

We’ll provide a way in the Makefile to do this easily.

Task: Bootstrapping unstable code from a stable compiler

Issue: https://github.com/rust-lang/rust/issues/29557

The stable Rust compiler requires the nightly Rust compiler to build. This is no good for distros that want to bootstrap from their own toolchain and don’t want to maintain a nightly toolchain just for building Rust. We’ll teach the Rust makefiles to deal with bootstrapping off the stable compiler transparently (the stable compiler already contains nightly features, just inaccessible).

Task: Dynamic LLVM

Issue: https://github.com/rust-lang/rust/issues/29558

Distros generally want to use their own system LLVM. Rust’s support for this has traditionally been spotty, but we currently do CI builds with a statically-linked 3.7. There’s demand for using dynamic linking for LLVM as well (Gentoo), so we’ll get that working and set up CI for it too.

We’ll guarantee in the future that Rust always builds with some recent release of LLVM.

See the existing PR.

Task: Add an i586-unknown-linux-gnu target spec

Issue: TODO

Debian’s 32-bit x86 distro uses i586-level CPU features while ours is i686. We should be able to teach rustc about the i586 target so Debian can configure it as the host triple.

FIXME: per @eefriedman this may not be the right solution.

Task: Teach the makefile to mix in additional ‘extra filename’ information

Issue: https://github.com/rust-lang/rust/issues/29559

Some systems, like Gentoo, want to package multiple versions of the Rust compiler alongside each other. There are several obstacles to this, but the obvious one is that the installed Rust crates need to not have conflicting names. While we have a mechanism for this (-C extra-filename), the extra strings appended by the current makefile are not sufficient to discriminate between arbitrary compiler revisions.

Most likely we will add a configure switch that specifies an additional string to hash into the filename extra, in addition to what we’re already hashing.
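To make the idea concrete, here is a sketch of mixing a packager-supplied string into a hashed filename suffix. It is not how the makefile computes the value today: `extra_filename` is a hypothetical helper, and `DefaultHasher` is used only for brevity (its output is not guaranteed stable across Rust versions, so a real implementation would need a fixed hash function).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a filename suffix from the release string plus an extra
/// string chosen by the packager (for example via a configure switch).
fn extra_filename(release: &str, packager_extra: &str) -> String {
    let mut hasher = DefaultHasher::new();
    release.hash(&mut hasher);
    packager_extra.hash(&mut hasher);
    // A dash followed by 16 hex digits.
    format!("-{:016x}", hasher.finish())
}

fn main() {
    // Two builds of the same release with different packager strings
    // get different suffixes, so their libraries do not collide.
    println!("libstd{}.so", extra_filename("1.5.0", "slot-a"));
    println!("libstd{}.so", extra_filename("1.5.0", "slot-b"));
}
```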

Task: Ensure that SxS installation of crates from multiple compilers works reliably.

Issue: https://github.com/rust-lang/rust/issues/29560

For SxS installations of arbitrary Rust compilers, there will be multiple copies of the standard library residing in the same path. Without care, rustc will see them as duplicates.

Neither @alexcrichton nor I am confident that the current crate resolver correctly rejects crates that weren’t generated by the same compiler, though we suspect it works correctly in most scenarios.
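The behaviour we would want can be modelled roughly as follows. This is a toy model under stated assumptions, not rustc's resolver: `Candidate`, its `built_by` field, and `resolve` are all hypothetical, and real crate metadata is more involved than a version string.

```rust
/// A candidate library found on the search path (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    path: String,
    built_by: String, // version string of the compiler that produced it
}

/// Keep only candidates built by the current compiler; exactly one must remain.
fn resolve<'a>(candidates: &'a [Candidate], current: &str) -> Result<&'a Candidate, String> {
    let matching: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| c.built_by == current)
        .collect();
    match matching.as_slice() {
        [only] => Ok(*only),
        [] => Err("no crate built by this compiler".to_string()),
        _ => Err("multiple matching crates".to_string()),
    }
}

fn main() {
    let libs = vec![
        Candidate { path: "/usr/lib/libstd-a.rlib".to_string(), built_by: "1.4.0".to_string() },
        Candidate { path: "/usr/lib/libstd-b.rlib".to_string(), built_by: "1.5.0".to_string() },
    ];
    println!("{:?}", resolve(&libs, "1.5.0"));
}
```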

Tightening this up further could have negative implications for e.g. a distributed Cargo cache, depending on how strict rustc is about rejecting crates it didn’t produce.

I’m not sure how real this problem is, but it probably needs some thought.

Task: Fix --libdir issues

Issue: https://github.com/rust-lang/rust/issues/29561

The way rust and rust-installer handle --libdir is broken, and at least in Gentoo’s case is unusable without patching.

See Gentoo’s rust-installer patch. And an old rust-installer PR.

Task: Disambiguate system-installed crates during resolution

Issue: https://github.com/rust-lang/rust/issues/16402

At the moment, if a Rust crate is installed to the system, then rustc builds that depend on that crate will fail with duplicate crate errors - rustc won’t tolerate multiple crate matches. This causes horrible problems for distros that are installing binary Rust crates.

Task: Update Homebrew packages

Issue: https://github.com/rust-lang/rust/issues/29562

The Homebrew packages don’t have a dedicated maintainer, and the current recipe has flaws: the obvious one is that there is no cargo source tarball for it to use.

While it builds Rust from source it appears to install Cargo from binaries.

It also removes the uninstall script. I’m not clear on why. Perhaps Homebrew has its own uninstall mechanism?

I think we don’t actually need to take over maintenance of the Homebrew package, just update it for best practices once we get them sorted out.

Task: Produce packaging guidelines

Issue: https://github.com/rust-lang/rust/issues/29563

Summarize what we’ve learned into some general guidelines for packagers.

Potential topics:

  • Maintaining independently-bootstrapped Rust compilers.
  • Packaging Cargo libraries / applications
  • Generating offline docs

Cargo

Task: Publish source tarballs of Cargo releases

Issue: https://github.com/rust-lang/cargo/issues/2107

Although Cargo does have tagged releases, we don’t publish source tarballs (the GitHub auto-tarballs are of course broken because submodules). Distros generally prefer tarballs to git.

Update the make dist rules to publish source tarballs consistently with Rust’s own.

Task: Pair Cargo releases with Rust releases

Issue: https://github.com/rust-lang/cargo/issues/2108

While Cargo does have tagged releases, we don’t actually use those in the Rust releases, instead pairing rustc with an arbitrary recent revision of Cargo.

Update rust-packaging to pair rustc not with Cargo nightlies, but with Cargo stable releases.

Task: Make binary releases of Cargo

Issue: https://github.com/rust-lang/cargo/issues/2109

To pair Cargo releases with Rust our distribution servers must have versioned releases of Cargo to download. Add new stable dist builders and modify the Rust release process to also do a Cargo release.

Task: Validate that Cargo releases build with the corresponding Rust release

Issue: TODO

Distros need to build Cargo, and they want to do it with the version of Rust they are also deploying. Presently Cargo builds using some old, arbitrary version of Rust.

Set up CI to ensure that the Rust we’re releasing is capable of building the Cargo it’s paired with. Not sure the best place for this offhand.

FIXME: @alexcrichton thinks this is too much effort and I’m inclined to agree.

Task: Bootstrap Cargo without Cargo

Issue: https://github.com/rust-lang/cargo/issues/2110

Cargo is self-hosting and difficult to bootstrap on new platforms, more so than rustc. This causes the most problems for systems that upstream doesn’t directly provide binaries for, like the BSDs.

Create some mechanism that can reliably bootstrap Cargo without running Cargo itself and add testing to validate that it continues to work. Needs design.

@dhuseby has a script he uses to bootstrap on OpenBSD that may be useful.

Task: Enable Cargo to work without any network access

Issue: https://github.com/rust-lang/cargo/issues/2111

Build farms need to build Cargo projects without hitting the network. Assuming that distros rewrite their Cargo.tomls to convert dependencies to local paths, then as far as I know the other source of network access is just updating the index.

Task: Compiler command line customization

Issue: https://github.com/rust-lang/cargo/issues/2112

As with the Rust build itself, when distros build and package binaries of Cargo applications, they want to be able to customize all command lines to all compilers.

In the Cargo case this means at least customizing the rustc command line; it’s not clear whether Cargo itself needs to provide facilities for customizing CFLAGS, etc. or if that’s the responsibility of build scripts.

This requirement seems to be at odds with design goals of the Cargo developers. Design work is needed.

Non-goals

Non-goal: Dynamic-linking support for ‘anti-bundling’

Many distros strongly prefer to use dynamic libraries, and have policies against static linking (see a recent Fedora thread).

While Rust does support dynamic linking, the default is to link statically, and almost all Rust crates do so - the obvious exception being rustc plugins.

The strongest reason for preferring dynamic linking is so that distros can provide security updates without recompiling downstream reverse dependencies.

This use case is not generally supported by current Rust, even with dynamic libraries. Details of Rust’s unstable ABI currently require all downstream code to be rebuilt when upstream changes at all. This is strongly enforced by the compiler itself.

Because of this, I do not see any practical advantage to promoting dynamic linking in Rust - it is likely to surface weird discrepancies between the dynamically-linked distro world and the statically linked upstream world.

Distros will need to adopt mechanisms to handle security updates by recompiling reverse dependencies. Barring major work on the security-update use case, this would be necessary even with dynamically linked Rust libraries. (TODO: any examples of how distros already deal with this - e.g. with Go)

This applies most acutely to the packaging of 3rd-party crates, not the standard library itself, so it will not become problematic until Rust applications begin to be packaged by distros.

I recognize that dylibs-for-security-updates is an important use case for distros, but solving it is a large problem that is out of scope for now.

Non-goal: Redirect crate dependencies to local source installations

Distros that want to package Rust applications built with Cargo generally want to package all their dependencies themselves as well.

This means that, when a distro is building a packaged crate, it wants to use its own package manager to install all the dependencies locally; then when Cargo runs it downloads no additional source code, instead building the deps from the local source.

Presently, we believe an acceptable solution to this problem is that distros that package crates will rewrite the Cargo.toml files, replacing the crates.io dependencies with local path dependencies, according to their own scheme.
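A deliberately naive sketch of such a rewrite is shown below. It assumes simple `name = "version"` entries under `[dependencies]` and a distro-chosen crate root (the `/usr/src/cargo/crates` used in `main` is only an example); `localize_deps` is a hypothetical helper, and a real tool would use a proper TOML parser.

```rust
/// Rewrite simple `name = "version"` entries under `[dependencies]`
/// into path dependencies below `crate_root`.
/// Assumption: a line-based pass; inline tables, sub-tables and other
/// dependency sections are left untouched.
fn localize_deps(manifest: &str, crate_root: &str) -> String {
    let mut in_deps = false;
    let mut out = Vec::new();
    for line in manifest.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Track which section we are in.
            in_deps = trimmed == "[dependencies]";
            out.push(line.to_string());
            continue;
        }
        match trimmed.split_once('=') {
            Some((name, value)) if in_deps && value.trim().starts_with('"') => {
                out.push(format!("{0} = {{ path = \"{1}/{0}\" }}", name.trim(), crate_root));
            }
            _ => out.push(line.to_string()),
        }
    }
    out.join("\n")
}

fn main() {
    let manifest = "[package]\nname = \"app\"\n\n[dependencies]\nlibc = \"0.2\"";
    println!("{}", localize_deps(manifest, "/usr/src/cargo/crates"));
}
```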

Note: Gecko also wants to be able to use its own sources and prevent Cargo from hitting crates.io, but their use case is different - they aren’t repackaging their deps; they just don’t want to depend on external resources. There is still work to do here.

Thanks for synthesizing all this @brson, it's quite the plan!

I think that this issue is still valid for those who invoke the compiler manually, but this shouldn't be an issue affecting Cargo. Any native libraries picked up in system directories should have native= on their -L paths and the compiler will otherwise not look for Rust dependencies in these directories. In that sense I don't see this as a particularly pressing issue, I don't think rustc -L /usr/lib foo.rs is that common.

I believe Homebrew does, actually: they build everything into a local /usr/local/Cellar/rust directory and then symlink everything into place inside /usr/local. In that sense uninstalling is just removing symlinks and then deleting the data.

I wouldn't personally consider this too high priority, Cargo currently builds with 1.2.0 and I keep quite a close eye on the beta/nightly builds to make sure we don't regress. I don't expect that Cargo will start requiring super new versions of Rust any time soon. Always good to have CI though!

Wouldn't that be less convenient and more error-prone than to have Cargo configured to look for installed crates in e.g. /usr/src/cargo/crates as a substitute for crates.io in some sort of distro-build mode?

Not to argue with this, just a FYI: Tests (at least one in libcoretest) will fail once you get rust to compile on i586 hardware. i686 is the earliest hardware with SSE2. That's not simply a performance boost, it also affects the correctness of floating point calculations: pre-SSE2, basic arithmetic operations are sometimes rounded incorrectly. There has been some discussion about how to deal with this and similar problems, but so far there is neither consensus nor an implementation.

I'm going to say one last time that this is only hard because of your weird decision to unpack and bundle archives found in /usr/lib.

The way that i586 vs. i686 is getting discussed is really confusing. i586 usually refers to something like gcc’s -march=i586. i686 usually refers to something like gcc’s -march=i686. Rust’s default is something like gcc’s -march=pentium4.

I think this is almost possible with paths in .cargo/config -- it just needs a specification to look at all subdirectories under .../crates/ rather than having to enumerate all individual crate paths. Maybe just write this as crates = "/path/to/crates/" and ignore crates.io entirely if that's present.

This idea is still a bit of a kludge, but perhaps nicer than rewriting Cargo.toml.

Certainly nicer! Then a drop-in .cargo/config with the system-wide crate root should work in the source tree of any package that is shippable on crates.io.

You could have a helper script, or a packaging macro, to install that config file into the CWD; I suggest naming it cargo-embargo :smiley:
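Until Cargo can scan a directory itself, a helper along these lines could enumerate the crate directories and write them out under the existing `paths` key. This is a sketch only: `render_paths_config` is a hypothetical name, the crate root is an assumed location, and it takes for granted that `paths` overrides behave as described in the comments above.

```rust
use std::fs;

/// Render a `.cargo/config` body listing every crate directory as a path override.
fn render_paths_config(crate_dirs: &[&str]) -> String {
    let quoted: Vec<String> = crate_dirs.iter().map(|d| format!("\"{}\"", d)).collect();
    format!("paths = [{}]\n", quoted.join(", "))
}

fn main() {
    // Assumed system-wide crate root; a distro would pick its own location.
    let root = "/usr/src/cargo/crates";
    let mut dirs = Vec::new();
    // A missing or unreadable root simply yields an empty list.
    if let Ok(entries) = fs::read_dir(root) {
        for entry in entries.flatten() {
            if entry.path().is_dir() {
                dirs.push(entry.path().display().to_string());
            }
        }
    }
    dirs.sort();
    let refs: Vec<&str> = dirs.iter().map(String::as_str).collect();
    print!("{}", render_paths_config(&refs));
}
```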

I haven’t forgotten about this thread. Just working on other infrastructure stuff the last week. Sorry for the delays.

I am wondering if something like make dist would be useful to add to cargo that basically creates an archive of the crate and all its build deps in source form. This "full" source archive can then be distributed on crates.io and used as an upstream source archive that distros would then be able to cache and use to offline cargo build with whatever compiler flags and install for packaging.

I think this is similar to what go does which seems the easiest way to go until dynamic linking is working properly.

I don’t think a full-source bundle is what we want for distros. That makes it really hard to track what crates are built where, in case there’s ever an urgent need to patch and update one.

I agree that it makes patching libfoo used in multiple crates much harder, as your CVE checking and the like now needs to understand cargo dependencies, and patches must be done to all of those crates until upstream releases an update with the fix.

That said, speaking as a person who spends time watching CVE fixes flow into my distro, quite often upstream releases are the ones making these fixes anyway, so it may not be so bad.

Getting cargo install to know a /usr/share/ type location to store rust crates that cargo can use to build for the package install would be much easier to directly patch but I wanted to give other options too.

I’ve just realised, for maximum compatibility on 32-bit x86, the stage0 rustc binary could be made into the generic i686 version. It’s only needed for stage0 and doesn’t affect anything that happens later so it’s a no-brainer really.

In other words, if the target is left unchanged the build process would yield the standard rustc exactly as before whereas a modified i686 target could be built natively on anything pre-P4.

Yes it will be less convenient.

Can you explain this? I don't believe I've ever heard you say this before and I don't know what you mean.

Is this something we are doing wrong? Does bundling an i586-unknown-linux-gnu target not make sense in your understanding?

The linked issue describes the use case from @jauhien, and passing rustc -L /usr/lib is exactly what they are doing. A point was also made on that thread that rustc can't even be compiled when it's already installed (though that is surprising to me...). This is apparently blocking Gentoo though I don't fully understand their installation model.

Edit: I've modified the text to state this is a rustc problem, not cargo.

There are two ways you can look at it. One is that our current "i686" target is buggy because it doesn't actually work on all i686-class machines. The other way of looking at it is that using target triples to specify CPU features is a bad idea because we try to attach too many different meanings to the triple, so the right solution is not to add an "i586" triple, but rather to add some other mechanism to distinguish CPU features.

Either way, adding an "i586" target to rustc without any other changes would be extremely confusing.

I’ve filed issues on everything but the i586 triple, which @eefriedman has concerns about, and CI for building Cargo with its own release of Rust (since @alexcrichton is cool on the idea and it’s a lot of work for marginal gain).

@gus What do you think of @eefriedman’s concerns that encoding the i586-ness of the platform in the target triple is inconsistent with gcc, and not the right way to encode CPU features?

My understanding of the problem is that there are three cases for linking LLVM: static bundled copy, static system copy, and dynamic system copy. The first is easy to handle because it’s decided by the configure script so we know when we’re in that case and can do whatever we need to, so the only difficulty is handling the two system library cases.

The current solution involves using #[link(..., kind = "static")] for static libraries and #[link(..., kind = "dylib")] (or usually leaving it off because it’s the default) for dynamic ones, which is hard because pkg-config and similar tools don’t tell you whether the library you’re linking against is static or dynamic. The reason they don’t tell you is because in the C world you basically don’t care: in either case you just pass -lfoo when running the linker (there are edge cases that don’t work, but those don’t work with Rust’s method either). As far as I can tell there’s nothing preventing Rust from doing the same thing.

That behavior is what kind = "dylib" produces. kind = "static" searches the system for the archive file, unpacks it, and then bundles all the objects into the resulting rlib (or whatever you’re producing). This is useful if the archive is some bundled library that you don’t want to install separately (like the bundled LLVM case), but it’s a weird thing to do with a system library. Just using the “dylib” behavior for any system library should just work because it matches the C behavior that linking was built up around.
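The suggestion above amounts to a simple rule for choosing the `kind`, which can be sketched like this. `LlvmSource` and `link_kind` are hypothetical names for illustration; the sketch only encodes the proposal from this comment, not what the build system currently does.

```rust
/// Where the LLVM libraries come from (the three cases described above).
#[derive(Clone, Copy, Debug, PartialEq)]
enum LlvmSource {
    BundledStatic,
    SystemStatic,
    SystemDynamic,
}

/// Pick the `kind` for `#[link(..., kind = "...")]` under the proposal:
/// only the bundled copy is unpacked into the rlib, while any system
/// library is handed to the linker as a plain `-lfoo`.
fn link_kind(source: LlvmSource) -> &'static str {
    match source {
        LlvmSource::BundledStatic => "static",
        LlvmSource::SystemStatic | LlvmSource::SystemDynamic => "dylib",
    }
}

fn main() {
    let cases = [
        LlvmSource::BundledStatic,
        LlvmSource::SystemStatic,
        LlvmSource::SystemDynamic,
    ];
    for source in cases.iter() {
        println!("{:?} -> kind = \"{}\"", source, link_kind(*source));
    }
}
```

With this rule the build no longer needs to know whether a system library is an archive or a shared object, which is the information pkg-config does not provide.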

(As a disclaimer I can only comment from a Linux point of view. As I understand it things are more complicated in Windows. There are some more details and discussion of this whole thing (including some Windows stuff) in a previous internals thread.)