Perfecting Rust Packaging

Noted.

Huh. I'm sad to say I've looked at this patch for a while and still don't see what the bug is. Can you explain in more detail what the patch is working around? Is there a filed issue? (I know that --libdir has always been in some state of broken)

This is a common theme!

cc @alexcrichton

Ah, interesting point. I've opened an issue: Option to control -Werror in makefiles · Issue #28599 · rust-lang/rust · GitHub

Ah, yes, that makes sense. Although we (try to) tie that extra bit to the version number. Perhaps we're just not capturing enough metadata to differentiate the versions of Rust you are installing together?

In fact, the code looks wrong. It's hashing an undefined variable...

Would hashing the version number as intended be enough to distinguish your sxs installs?

Edit: After investigation, this does actually hash the version number correctly before CFG_RELEASE is defined. I don't understand why offhand.

Thank you, my fault, I hadn't seen it. Maybe adding a link to the signature on the download page would be a good idea. The same goes for the sha256 file.

The problem with these lines is that they cause a failure when the libdir used during compilation starts with lib/ (our case). Changing libdir during installation is broken anyway, so it makes no sense: the value gets hardcoded into the compiler during the build by capturing an environment variable.
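To illustrate how a libdir starting with lib/ could trip up an installer, here is a minimal sketch. It is only one plausible failure mode under my own assumptions, not the installer's actual code: the `remap` function, the rule "anything under lib/ moves to the install-time libdir", and the `lib/rust-1.3` name are all hypothetical.

```rust
/// Hypothetical installer step (an assumption, not the real installer code):
/// any manifest entry under `lib/` is moved to the libdir chosen at install time.
fn remap(manifest_path: &str, install_libdir: &str) -> String {
    match manifest_path.strip_prefix("lib/") {
        Some(rest) => format!("{}/{}", install_libdir, rest),
        None => manifest_path.to_string(),
    }
}

fn main() {
    // Default build: libraries sit directly under lib/, so the remap is harmless.
    println!("{}", remap("lib/libstd.so", "/usr/lib64"));

    // Build configured with a libdir of lib/rust-1.3 (hypothetical name):
    // the custom component survives the strip and is appended a second time.
    println!("{}", remap("lib/rust-1.3/libstd.so", "/usr/lib/rust-1.3"));
}
```

In this sketch the second call yields a path with `rust-1.3` doubled, which is the kind of mismatch that arises when the build-time libdir and the installer's layout assumptions disagree.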

It could work for some environments, but not for Gentoo. We have two Rust packages: binary and source. Now

  1. you install the binary package with already defined suffixes (these hashes)

  2. you want to install the source package (e.g. for testing or for whatever other reason, or maybe you were bootstrapping using the binary package). If the suffixes are the same version hashes as in 1, your installation will not work properly. That's why we change the suffixes to custom ones for source packages, so that we have a guarantee that only one package with a given suffix can be installed on the system (we have one suffix per slot, which corresponds to an upstream release channel).

I have created an issue with detailed explanation and steps to reproduce: Installation is broken when custom relative libdir path starts with lib/ or when libdir is changed during installation · Issue #28627 · rust-lang/rust · GitHub.

OK. Ugh. The way I want this to work is that setting --libdir during configure doesn't mess at all with the installer's layout (so this case wouldn't trip incorrectly with custom-named lib- prefix files), and passing --libdir to the installer puts it in the right place (this is how --mandir works). The obvious problem here is that the relative path to libdir is hard-coded into rustc. Making this work may be more trouble than it's worth so I left a comment on the bug about a simpler stop-gap.

Hm, I still think this scenario should work if the release channel were hashed, since a default source build is on the 'dev' channel, not 'stable'. Would hashing the channel help? If not, we could add a configure option to mix in more extras. Feel free to submit a PR in this area if it would help you.
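As a rough sketch of the idea of mixing the channel (and optionally a packager-supplied extra) into the hashed suffix: the function names, the choice of FNV-1a, and the 16-hex-digit format below are my assumptions for illustration and do not reflect how the Rust build system actually derives its filename extra.

```rust
/// 64-bit FNV-1a, written out by hand so the result does not depend on
/// any library's hashing defaults.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical "filename extra": a hex suffix derived from every input
/// that should tell two installs apart. The NUL separators keep
/// ("1.3", "0dev") and ("1.30", "dev") from hashing the same bytes.
fn filename_extra(version: &str, channel: &str, packager_extra: &str) -> String {
    let key = format!("{}\0{}\0{}", version, channel, packager_extra);
    format!("{:016x}", fnv1a(key.as_bytes()))
}

fn main() {
    // Same version, different channel: the suffixes should differ.
    println!("stable: {}", filename_extra("1.3.0", "stable", ""));
    println!("dev:    {}", filename_extra("1.3.0", "dev", ""));
    // A packager-chosen extra (hypothetical value) separates otherwise identical builds.
    println!("slot:   {}", filename_extra("1.3.0", "stable", "gentoo-slot"));
}
```

With the version alone, a binary package and a source build of the same release would share a suffix; adding the channel or an explicit extra to the hashed key is what separates them.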

On the issue of side-by-side installs, neither @alexcrichton nor I am clear on what actually happens when two copies of std, from two different toolchains, are installed to the same location. We believe this should be workable, but don't know any reason why it would work reliably with the current crate resolver.

My next steps here are to summarize what we’ve learned into a plan to resolve the upstream problems in Q4, as well as some guidelines for packagers. Stay tuned.

@gus @jauhien @cuviper @alexcrichton et al.

I’m thinking about the requirements for teaching Cargo to allow the local system to override dependencies, and specifically wondering how a packager might extract the library artifacts from a Cargo build in order to put them at the desired location locally (presumably somewhere under /usr/lib). Are you able today to get the precise information out of Cargo to identify which files to copy somewhere? Does Cargo need to do more to make it clear which of its outputs are the packagable artifacts?

Another question: if we modify Cargo so that it can use locally-available rlibs, as packaged by the distro, then the distro is going to accumulate a bunch of rlibs in /usr/lib. Because of the way rustc works now those will all be invalidated when rustc is upgraded - rustc will not be able to use them at all, and will likely just pretend they don’t exist because they ‘belong’ to another compiler. The obvious way I see to fix this is to make all Rust crate packages depend on a specific version of the compiler. I know that sucks, but is it doable?
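A minimal sketch of what "depend on a specific version of the compiler" would mean in practice, assuming the distro records the exact compiler build alongside each rlib package. The struct, the package names, and the version strings are hypothetical.

```rust
/// Hypothetical metadata a distro could record for a packaged rlib.
struct RlibPackage {
    name: &'static str,
    /// Full compiler build identifier, including the distro revision.
    built_with: &'static str,
}

/// An rlib is treated as usable only by the exact compiler build that produced it.
fn usable(pkg: &RlibPackage, installed_rustc: &str) -> bool {
    pkg.built_with == installed_rustc
}

/// Names of the packages that must be rebuilt for the installed compiler.
fn needs_rebuild(pkgs: &[RlibPackage], installed_rustc: &str) -> Vec<&'static str> {
    pkgs.iter()
        .filter(|p| !usable(p, installed_rustc))
        .map(|p| p.name)
        .collect()
}

fn main() {
    let pkgs = [
        RlibPackage { name: "rust-dependency-a", built_with: "1.3.0-1" },
        RlibPackage { name: "rust-dependency-b", built_with: "1.3.0-2" },
    ];
    // After moving to compiler build 1.3.0-2, only the stale package is listed.
    println!("{:?}", needs_rebuild(&pkgs, "1.3.0-2"));
}
```

The check is deliberately an exact string match: under this assumption even a distro-level rebuild of the same upstream release counts as a different compiler.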

Edit: oh, @gus I see that in your previous thread, you said that rust packages would include the source in Debian, not just the rlibs, and I also recall this was a subject of the anti-vendoring thread in Fedora. I would not expect distro packages of crates to include the source, but instead the rlibs, just globally invalidated in some way when the compiler version changes. Is this difficult to do?

If Debian packages simply dropped the source code onto the system with the expectation that Cargo rebuilds it every time it's needed, then there would be lots of extra builds.

Does Debian really want to package the source code, and not the rlibs? If it packages the source code where would it put it on the local system for Cargo to reuse? Either way is going to require cargo and/or rustc changes.

Any other distro maintainers have opinions about - given Rust’s compilation model and ABI limitations - whether their packages of Cargo crates will simply install the source code, or the rlibs?

@gus Your use case for distributing packaged cargo apps I think works like this:

  • User says ‘apt-get install rust-application’
  • apt installs the source of all deps somewhere locally (where?)
  • apt installs rust-application, and in the process of doing so compiles it and all its deps locally. Cargo automatically knows to use the local source, not the crates.io source.

So the distro package manager, which is usually distributing binaries and not building locally, is actually distributing source code and building locally. Is this right? Is this really what you want to have? Can we skip straight to distributing rlibs and not do this?

The ABI issues are exactly what makes shipping source-based libraries seem like a lesser evil. As I mentioned, this appears to be what golang packagers are doing in Fedora already.

Yes, you could make a Rust crate package (with rlibs) depend on a specific rustc version. Then you effectively still have to rebuild them all to get a rustc update out into the distro, so I'm not sure it's really any help. Even then, is there any guarantee that an rlib from rustc-x.y-1 (original release) is compatible with rustc-x.y-2 (perhaps a recompile with some security patches)?

No, that's Gentoo -- I think these would be statically-built binaries. They would have a build-time requirement on the crate-dev package, with its crate source, but the final product would stand alone.

No, the opposite is guaranteed :slight_smile: They will be forcibly incompatible.

Right, so if any tweak to rustc breaks all rlibs, there's really no use in shipping rlibs at all.

Hm, ok. So under this scheme the binary packages for Rust libraries would basically be useless right? IOW, the builders when building 'rust-application' will use the 'rust-dependency-dev' package, but there is no practical use for the 'rust-dependency' package from the perspective of users of 'rust-application'; installing 'rust-application' will not install 'rust-dependency'.

Will there still then be binary (rlib) packages for 'rust-dependency' and what role do they play?

Right. The user could still install the crate-dev package for their own local use, if they like. But it wouldn't be needed for rust-application. This also simplifies rustc upgrades a little, as only the leaf packages with static binaries have to be rebuilt. (If you even want to rebuild them at all -- if there's no security aspect, you may not need to.)

I think these shouldn't be packaged at all until there is some ABI stability. Maybe that will come to dylib first -- my intuition says that may be easier -- then we'd stop static-linking application binaries and use dylibs instead. Or if some library crate exports stable C-FFI, that could be packaged too.

@cuviper I’ve reread @gus’s previous proposal and it reads like library packages themselves install Rust source locally, not the -dev packages. @gus can you clarify?

Exact naming is probably just a bikeshed color. I would generally expect a subpackage that’s only used at build time to be called -dev on Debian or -devel on Fedora. If the crate has any kind of [[bin]] of its own I’d put that in the base package, otherwise there doesn’t have to be an installable base package at all.

@jauhien I’ve been thinking more about side-by-side install and I have one more question for you.

So I understand how having control over the ‘filename extra’ helps you keep multiple copies of Rust’s libraries in /usr/lib, but I’m wondering about the contents of /usr/lib/rustlib. This directory contains artifacts that are not Rust crates, and so do not contain the hash, most notably compiler-rt.a. Without special care these will clobber each other if multiple instances of Rust are overlaid on each other. Are you dealing with that in any way? Should we?
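To make the clobbering concern concrete, here is a small sketch that compares the file lists of two toolchains and reports the paths both would install. The file names and hash suffixes are made up for illustration; only the idea that hashed crate files differ while unhashed artifacts collide comes from the discussion.

```rust
use std::collections::BTreeSet;

/// Paths that appear in both toolchains' file lists and would overwrite each other.
fn clobbered<'a>(a: &BTreeSet<&'a str>, b: &BTreeSet<&'a str>) -> Vec<&'a str> {
    a.intersection(b).copied().collect()
}

fn main() {
    // Hypothetical file lists: crate files carry a per-toolchain suffix,
    // the compiler-rt archive does not.
    let stable: BTreeSet<&str> = [
        "/usr/lib/libstd-aaaaaaaa.so",
        "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler-rt.a",
    ]
    .iter()
    .copied()
    .collect();

    let dev: BTreeSet<&str> = [
        "/usr/lib/libstd-bbbbbbbb.so",
        "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler-rt.a",
    ]
    .iter()
    .copied()
    .collect();

    // Only the unsuffixed archive is reported as a collision.
    println!("{:?}", clobbered(&stable, &dev));
}
```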

FWIW, I noted in bugzilla that Fedora’s clang already ships the equivalent to compiler-rt.a as libclang_rt.builtins-x86_64.a. I hope to configure Rust to just use that, but I haven’t researched how yet.

(Sorry if I’m replying too much. I even got a popup warning not to monopolize the discussion. I’ll try to step back for a while…)

There is no automatic way to deduce this currently, but I think this is obvious enough for humans and that's ok. Eg: for a "source based" library you just copy the entire source tree; for some executable app you just copy the (few) output binaries; for some hypothetical "binary rlib/dylib library" package you would just copy the (few) rlibs/dylibs.

If/when packages start including complex application data as well, then yeah we'll need to make a more powerful "cargo install" to work out what goes where - but right now I think this is a low priority.

Yep, this is technically easy to do in the packaging metadata. As you go on to discuss, the implications of this are that we'd need to rebuild every Rust library package when a new rustc was released.

Just to complete this line of thought, we could in fact package and reuse dylibs by using similar tight restrictions at the distro packaging metadata level and it would work just fine with the same caveat that we have to rebuild everything (this time applications too) with each rustc release.

My proposal (and current plan) is to package libraries as source and not distribute rlibs/dylibs at all, for the sole reason that this format is more portable across compiler revisions. The downsides are additional CPU cycles at (application package) build time, the need to rebuild all affected application packages whenever a library is updated, and the fact that there will be some library out there somewhere with a license that won't let us ship source (but I don't care about that right now).

I haven't thought too much about "plugins" yet (both rustc compiler plugins and any project where the "deliverable" is a .so library, like a hypothetical pam module written in Rust). I have a suspicion they might require a tight version requirement on the compiler or std dylibs and perhaps need to be rebuilt on every compiler release. Provided the number of such packages is small, we can deal with that.

Does Debian really want to package the source code? No. We're just looking for what looks like the best tradeoff within the current limitations of the Rust toolchain and ecosystem. I expect/hope this will evolve quite a bit as we get more Rust applications "in production" and the ABI stability story matures. (My beard is showing, but yes I remember the a.out -> ELF transition that C-on-Linux went through for basically all the same reasons :wink:)

As @cuviper also clarified, no, this isn't correct. Debian (and just about every other distro, notably not Gentoo) has a clear distinction between "source" packages and "binary" packages (Debian nomenclature, but the idea is the same in Red Hat, etc).

Note in particular that "binary packages" often include libraries - it's anything that is the "output" of the package build process. The separation of pre- and post- build also results in a sharp distinction of "build-time" ("build-deps" in Debian speak) and "run-time" dependency relationships between packages. I expect the "binary package" jargon and the fact that they might include "libraries" is confusing, and I wish I had a whiteboard handy to draw boxes with arrows.

Each upstream project is bundled up as a "source" package (which typically contains source, duh), and the distro machinery centrally compiles that into (possibly multiple) "binary" packages (which typically contain binary executables, shared libraries, or data/config files of some sort). Regular end users download and install binary packages only. This is good because it doesn't require CPU on the user end, and the run-time dependencies are typically much fewer/simpler than the build-time dependencies.

In "source based distros" (Gentoo is the major example, but also OpenEmbedded/buildroot/etc), users download the source packages and do the compile locally - with the help of the packaging tools. Requires lots of CPU, but allows them to have enormous flexibility in exactly how that gets built. The embedded folks like this because they can get the ultimate size and flexibility in their output. I've never understood why Gentoo users do it :wink:

So: my "source-based Rust libraries" plan is to have:

  • Debian "source" packages that include whatever library/application Rust source.
  • The Debian "binary" package for a Rust library will just be the source, installed in a known directory somewhere (ie: the Debian package build step is basically a no-op).
  • The Debian package for a Rust application will build-depend on (probably several) Rust library packages. The build-deps will ensure the Rust library packages (sources) are installed before building.
  • The application package build step will run rustc (via cargo) to compile the application and all the relevant library sources. The resulting (statically linked) executable goes into the application's Debian binary package.
  • Note the application package has no run-time dependency on the libraries. When the end user "apt-get installs" the application package, they get just the statically linked executables, with no need to ever know about the library packages.
  • A security fix in one of the library packages requires a rebuild of any application packages that build-depend (at whatever depth) on the library package (this is visible in the Debian metadata).
  • A new rustc doesn't require anything to be rebuilt, but we do need to ensure that all applications are able to be rebuilt.

This means: The rust-library "binary" packages will be basically identical to the rust-library "source" packages, except with different path prefix, and probably patches to Cargo.toml applied, etc. rlibs won't be used anywhere, except libstd.
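The "rebuild anything that build-depends at whatever depth" rule from the plan above can be sketched as a walk over the build-dependency edges. The package names and the edge map below are hypothetical; in Debian the edges would come from the packaging metadata rather than a hand-written table.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Every package that build-depends, directly or transitively, on `changed`.
/// `build_deps` maps a package to the packages it build-depends on.
fn transitive_dependents<'a>(
    build_deps: &BTreeMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> BTreeSet<&'a str> {
    let mut affected: BTreeSet<&'a str> = BTreeSet::new();
    let mut queue = vec![changed];
    while let Some(current) = queue.pop() {
        for (pkg, deps) in build_deps {
            // Visit each dependent once, so cycles cannot loop forever.
            if deps.iter().any(|d| *d == current) && affected.insert(*pkg) {
                queue.push(*pkg);
            }
        }
    }
    affected
}

fn main() {
    let mut build_deps: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    build_deps.insert("rust-application", vec!["rust-dependency-dev"]);
    build_deps.insert("rust-dependency-dev", vec!["rust-base-dev"]);
    build_deps.insert("other-application", vec!["rust-other-dev"]);

    // A fix in rust-base-dev reaches rust-application through rust-dependency-dev.
    println!("{:?}", transitive_dependents(&build_deps, "rust-base-dev"));
}
```

Library packages that show up in the result are effectively no-op rebuilds under this plan, since their "binary" package is just the source; the application packages in the set are the ones that actually need recompiling.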

(Apologies for my posts being so lengthy, it's hard to gauge how much background is already understood without body language.)

Re: shipping source. Others already commented this is how Go is packaged. This is also how Common Lisp is packaged in Debian, because there are too many Common Lisp implementations to make binary packages for all of them. Common Lisp libraries are packaged in source form, and compiled to binary form (Common Lisp has many native code compilers) locally at install time in a post-installation script.