Perfecting Rust Packaging


#21

No, I meant that the binary package naming/dependency metadata should be such that you can install librust-std for the host platform using just its short name without the target triple (using Fedora as an example):

dnf install rust-std

Or, if side-by-side installation is supported:

dnf install rust-1.4-std

rust-std may then be a virtual package name to pull in the package for the current stable version (and the appropriate target triple suffix, if the actual binary package name is ramified with it).

Same for host-targeting rustc and cargo, if separate tool binaries are needed for different targets. The general principle is, packages for host tools and libraries are available by short names, but naming of cross-compilation tools has to designate the target triple.
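A minimal sketch of that naming rule, assuming a hypothetical helper and a made-up suffix layout for cross packages (neither is an actual Fedora or Debian convention):

```python
def package_name(component, target, host, version=None):
    """Build a binary package name for a Rust component.

    Packages that target the host get a short name; anything else carries
    the target triple as a suffix. The suffix layout is an assumption.
    """
    base = "rust" if version is None else f"rust-{version}"
    name = f"{base}-{component}"
    if target != host:
        name = f"{name}-{target}"
    return name


HOST = "x86_64-unknown-linux-gnu"
print(package_name("std", HOST, HOST))                 # rust-std
print(package_name("std", HOST, HOST, version="1.4"))  # rust-1.4-std
print(package_name("std", "arm-unknown-linux-gnueabihf", HOST))
```

The last call yields a name with the triple appended, which is the only case where the user has to spell out a target.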


#22

It appears there is precedent for this. There’s an exception noted under Fedora’s guidelines for static linking for all OCaml programs:

Programs written in OCaml do not normally link dynamically to OCaml libraries. Because of that this requirement is waived. (OCaml code that calls out to libraries written in C should still link dynamically to the C libraries, however.)

It appears golang packages are also static linking everything, though I don’t see an explicit exception for that. I’ll have to look closer at how those languages are dealing with this. At a quick glance, it looks like golang libraries are just shipping sources, like @gus suggests here. Maybe it’s not as strict as I thought.

If there’s a system version of a library, those crates should definitely be dynamically linking that, not bundling their own. Those crates will have to be fixed as part of their initial packaging process.

It’s not just about new point releases, but also in case rust needs to be rebuilt for any other reason. It could be a bugfix or security patch we want to apply, or it could be a simple rebuild due to other changed dependencies, say a new LLVM soname.

Once we get bootstrapped, the Fedora build root will have whatever rustc package is currently stable. So if we’re on rustc-1.4.0-1, and we want to build rustc-1.5.0-1, that’s fine, it’s the previous release. Then a point release appears, we need to build rustc-1.5.1-1 using the buildroot’s 1.5.0 we made earlier. Then if we need to rebuild for any reason, now rustc-1.5.1-2 will be built from 1.5.1-1.
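The chain above can be sketched as a small check. This is only an illustration: the `RustcPackage` type and `can_build` function are hypothetical, and the rule "same release series or the immediately preceding one" is an assumption drawn from the examples in this post, not a statement of what upstream guarantees.

```python
from typing import NamedTuple


class RustcPackage(NamedTuple):
    major: int
    minor: int
    patch: int
    release: int  # distro release number, the "-1" in rustc-1.5.0-1


def can_build(buildroot: RustcPackage, target: RustcPackage) -> bool:
    """Return True if the buildroot compiler may build the target package."""
    if buildroot.major != target.major:
        return False
    if buildroot.minor == target.minor:
        # Same series: any strictly older package (point release or rebuild).
        return (buildroot.patch, buildroot.release) < (target.patch, target.release)
    # Otherwise only the immediately preceding release is accepted (assumed).
    return buildroot.minor == target.minor - 1


print(can_build(RustcPackage(1, 4, 0, 1), RustcPackage(1, 5, 0, 1)))  # True
print(can_build(RustcPackage(1, 5, 0, 1), RustcPackage(1, 5, 1, 1)))  # True
print(can_build(RustcPackage(1, 5, 1, 1), RustcPackage(1, 5, 1, 2)))  # True
print(can_build(RustcPackage(1, 3, 0, 1), RustcPackage(1, 5, 0, 1)))  # False
```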

There are ways to override the buildroot, but I don’t think you can tag obsolete packages that way. Usually overrides are used to bring in some new package to chain build others, before sending them all to stable.

It sounds like @gus is also describing the same thing about jumping to stage2, but I hadn’t even considered the problem of feature use. I don’t think multiple --release-channel builds should be necessary – there ought to be an option for even stable builds to use features. There’s no need to be draconian, just let advanced use cases be advanced. IMHO :smile:

It doesn’t need to be an environment variable, but we really do need control of compiler flags for cargo build of crates. Even just for simple stuff, like we usually want distro packages built with optimization and debuginfo. The crate author can set their profile preferences in Cargo.toml already, but IMO this shouldn’t be treated as final. (Never mind any -C options we might also want…)

Right, this is just policy for distro packages. Alice and Bob are free to make their own choices of compiler options. (But we still need Cargo to give them that control!)
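To make the intended precedence concrete, here is a rough sketch of the policy only. The key names (`opt-level`, `debug`, `lto`) are the ones Cargo profiles use in Cargo.toml, but the `effective_profile` helper and the idea of a distro-level override table are hypothetical; nothing in this thread says Cargo offers such a hook.

```python
def effective_profile(crate_profile, distro_policy):
    """Merge profile settings, letting the distro policy win on conflicts."""
    merged = dict(crate_profile)
    merged.update(distro_policy)
    return merged


# What the crate author put in Cargo.toml (illustrative values).
crate_profile = {"opt-level": 0, "debug": False, "lto": True}
# What the distro wants for every package (illustrative values).
distro_policy = {"opt-level": 2, "debug": True}

print(effective_profile(crate_profile, distro_policy))
```

Settings the distro does not care about, such as `lto` here, stay as the crate author chose them.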


#23

For bootstrapping (or maybe even for regular builds), one could throw known-good stage0 compiler binaries for all supported build architectures into the source package.


#24

We can bundle stage0 for bootstrapping, yes, but not after. See the policy here: https://fedoraproject.org/wiki/Packaging:Guidelines#Exceptions

Some software (usually related to compilers or cross-compiler environments) cannot be built without the use of a previous toolchain or development environment (open source). If you have a package which meets this criteria, contact the Fedora Packaging Committee for approval. Please note that this exception, if granted, is limited to only the initial build of the package. You may bootstrap this build with a “bootstrap” pre-built binary, but after this is complete, you must immediately increment Release, drop the “bootstrap” pre-built binary, and build completely from source. Bootstrapped packages containing pre-built “bootstrap” binaries must not be pushed as release packages or updates under any circumstances.


#25

Note that Cargo has automation to ensure it builds on stable now, and I plan to have this continue to work relatively far back (~1.1 right now I believe) into the foreseeable future. Almost all of Cargo’s deps compile on 1.0.0 (and have automation to ensure that) and only a few require 1.1.0.

This was the purpose of the links attribute in manifests as it allows packagers to override build scripts with whatever they like. I do imagine, however, that some crates will need modification to work in all cases, but in principle we have all the machinery in place, just gotta make sure it’s in use.

We actually haven’t discussed the current meatiest patch (NullCheckEliminationPass) with upstream, but there’s definitely nothing blocking us from using stock LLVM. I compile and test with it from time to time (especially whenever I upgrade LLVM), and I would be surprised to learn if stock LLVM 3.5-7 didn’t work. @brson is right in that we need automation for this, but this should all have worked for some time now actually!

Note that build scripts are designed with this kind of use case in mind. For example, the main OpenSSL bindings in Rust specify a links key, which allows you to override the build script entirely. By default the build script also uses pkg-config, which allows even further customization of the build process while still assembling some custom shims rust-openssl uses. This sort of control should allow you to have any of the hooks you need to globally tweak how native libraries are linked into Rust programs through Cargo.

I’m a little confused by this, are you looking to turn crates into Debian packages? I would be curious to hear why this would be necessary, and otherwise I think I may be missing the motivation here for why this is happening.

One thing I’ve always been confused about with points like this is that at some point there has to be network activity, right? For example Cargo (and other Rust projects) fundamentally has dependencies which need to be fetched from the network at some point. When is the best time that this is normally done for packages in Debian? Is there a point where a “source tarball” is built and that source tarball is intended to contain the entire state of the world? It’s pretty plausible to add a command to Cargo to do something like this!

In general an “offline mode” doesn’t make a whole lot of sense in Cargo because Cargo already works totally fine offline, so long as all the dependencies are downloaded. You just have to arrange for Cargo at some point to have already downloaded the dependencies and placed them somewhere. In that sense Cargo only ever talks to the network when absolutely necessary, and the problem is timing exactly when this happens to be when you expect it to happen.


#26

Ignoring build scripts, I don’t see where this is the case. All of them can be overridden by using a .cargo/config with the appropriate paths.
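As a sketch of what a packaging helper might generate, the snippet below renders a .cargo/config with a top-level `paths` list. The `paths` key is Cargo's path-override setting; the helper name and the directories under /usr/share/rust-src are hypothetical.

```python
import json


def render_cargo_config(override_dirs):
    """Render a .cargo/config that points Cargo at local source directories."""
    # json.dumps gives a double-quoted string, which is also a valid
    # TOML basic string for plain paths like these.
    quoted = ", ".join(json.dumps(path) for path in override_dirs)
    return f"paths = [{quoted}]\n"


print(render_cargo_config([
    "/usr/share/rust-src/libc",
    "/usr/share/rust-src/log",
]), end="")
```

Each listed directory is expected to contain the crate (with its own Cargo.toml) that should replace the copy Cargo would otherwise fetch.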


#27

By “libraries” I meant to also include Rust libraries, not just native libraries: We may need to patch a Rust library and rebuild downstream Rust packages using the patched version. Actually, since almost all native libraries will be dynamically linked (and so won’t need applications to be rebuilt), I guess most of these examples are going to be Rust libraries.

So: I don’t think links or build scripts help here - unless I’m misunderstanding something.

I attempted to describe the motivation in the very section you replied to, so clearly something didn’t work :stuck_out_tongue:

I’m not sure how to describe the rationale more clearly - after you’ve reread that post, perhaps you could describe how you see the distro security workflow proceeding for a fix to a Rust package?

There’s a number of concerns driving such a simply worded policy:

  • Avoiding network at build time reduces a significant source of flakiness.
  • Allows offline development of packages.
  • Maintaining our own copy of upstream sources means we don’t have to worry about upstream going away, or moving.
  • Some licenses (GPL, under an earlier interpretation) require the person giving you the binary to also make the source available
  • Some upstream sources contain unredistributable pieces (requiring repacking), or aren’t easily available at all from a well-connected upstream location.
  • Just making the sources available on disk in a standard location/format transcends whatever favourite build tool/language/version-control/archive/verification of the day is in use, allowing greater experimentation and freedom with such tools. For example, rustc upstream releases tarballs and make install whereas cargo uses git and a cargo-based build, yet both Debian packages can be processed with the exact same build machinery.

So the way this works for Debian, and most other distros, is that someone obtains, verifies, and possibly modifies the upstream source, adds whatever packaging details, and then uploads that to the distro archive. From that point on, the distro uses that copy of the upstream source and never contacts the original upstream location.

Re “entire state of the world”: the packaging metadata describes versioned build-dependencies between packages, so the build environment for a particular package is assembled on-demand from individual packages prior to building that package. There is never a “whole world” source assembled because it would be too big, unnecessary with incremental package builds, couldn’t be shared effectively between similar-but-different environments, and more importantly not everything can coexist all at once (llvm and gcc both want to provide “cc”, for example).
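Roughly, assembling a build environment on demand amounts to following build-dependency edges from the package being built. The sketch below ignores version constraints for brevity, and the package names and dependency table are invented for illustration rather than taken from any real archive.

```python
def build_environment(package, build_deps):
    """Return every package needed to build `package`.

    `build_deps` maps a package name to the names it build-depends on;
    dependencies are followed transitively.
    """
    needed = set()
    stack = list(build_deps.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep not in needed:
            needed.add(dep)
            stack.extend(build_deps.get(dep, ()))
    return needed


# Hypothetical metadata, not real Debian control files.
BUILD_DEPS = {
    "cargo": ["rustc", "libssl-dev", "pkg-config"],
    "rustc": ["llvm-dev", "gcc"],
    "llvm-dev": ["gcc"],
}

print(sorted(build_environment("cargo", BUILD_DEPS)))
```

Only the packages reachable from the one being built are installed, which is why no "whole world" tree ever needs to exist.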

Yeah, I have a feeling that cargo can already do quite a bit (perhaps all!) of what we need - and I just need to work out what files to construct and place where. I (or someone else) should probably start putting together a straw-man so we can talk about the non-obvious bits… It would be useful to have a cargo --offline flag that threw errors so we knew when we had failed to construct the right environment, but I guess we can fake that up with some iptables rules or other acts-of-sysadmin initially.


#28

The override is a neat start, especially if we could specify the rustc-flags we want anytime (whether or not it has links). That would let us use a global %optflags-like behavior after all! But if a build script is meant for other tasks too, say code generation, then we don’t really want to inhibit that!

Aside, I find it odd to refer to foreign (non-rust) libraries as “native”. It only serves to present rust as the outsider. If rust is to be a systems language, pervasive throughout the distro, then it has to “go native” too, no?

We don’t want the whole world of crate dependencies bundled together, for the same reason we try to avoid “native” library bundling. Each crate that ends up in the distro should be maintained in one place, so they can be easily tracked as necessary. Realistically this means each is its own package.


#29

For library crates, that means the “binary” package (the unit of package installation in the system) has to provide the library in a form that can be used to build dependent crates. For now, the only format that can be expected to work is installing the sources in some system-wide location and having a way to tell Cargo to use those sources instead of fetching from the network. A more efficient form of distribution would be to provide .rlib archives, but that means maintaining backward compatibility on the static library metadata and ABI between Rust releases, ideally up to the next major version of Rust, or having to recompile and update all Rust packages at once when the compiler and the standard library are updated. Actually, once ABI backward compatibility is maintained, it should be a small change to switch to packaging dylib crates by default, resolving all concerns with bundling statically linked code.


#30

Ok, it wasn’t clear reading originally whether references were referring to C or Rust libraries, but with the mindset of both this makes more sense. Note that I wouldn’t consider myself an expert at all in packaging, I’m just trying to understand the problem space!

So with all that in mind, I’m still a little confused about how you might be expecting things to work out. It sounds like you want to mirror Cargo.toml as normal Debian build dependencies and not use Cargo for dependency management at all. This would allow you to understand the structure, have each source tree be independent, and if you need to you can patch any crate in the ecosystem. On the other hand it also sounds like you want to use Cargo to build everything without needing to modify any sources and have it just pick up all the dependencies which happen to already be on the system. Does that sound right? Reconciling these two desires will be difficult to do, but it may indeed be possible.

We may also want to continue this discussion on a new thread (as I believe @brson wants to focus this primarily on packaging Rust/Cargo itself at least from the start).

True! I just use it to colloquially refer to “things normally found in a package manager” vs “normal rust crates on crates.io” kinda

While I think this makes sense, I think that something may need to budge somewhere on this. Unless all package managers are explicitly willing to entirely duplicate everything Cargo does for dependency management we may need to start assuming this may not happen and strive to find some other solutions.

For example, why do distros want to duplicate Cargo’s dependency management? Or is it fair to say that distros want to do this? Is this only for security updates? If that’s the only reason, then we can probably reach a more targeted solution, but it’d be good to explore this space first. (although like above we may want to continue off-thread to avoid getting too far in the weeds!).

The reason I mention perhaps finding another solution is that I’m not sure that distros really want a package-per-crate. Projects like servo have hundreds of dependencies, many of which are tiny and the likelihood that distros keep up with the rate of change of the entire Rust ecosystem seems… unlikely? The other reason I feel distros don’t want to do this is that it doesn’t currently really make sense to “install a Rust library”. Cargo explicitly will not look at the system for dependencies (e.g. this is how it guarantees reproducible builds), and managing dependencies for a Rust project is idiomatically done through Cargo, not the system package manager.


#31

I think this is right. Cargo can still verify dependencies, similar to the way a configure script would, and after that it just needs to act like make.

Patching sources for packaging is fine in some cases, but if we would end up repeating the same modification on every single one, that’s probably something the tool should support directly. Overriding opt/debug flags with distro choices, for instance.

Packaging just Rust and Cargo is a great start, of course. And while Cargo itself uses many crates, I think we can get away with bundling those together for the moment, at least while this is the only place they are used in the distro.

Security is probably the most important, and generic bugfixing also follows.

Keep in mind also that a distro is tracking dependencies of the entire world; Cargo only knows Rust crates. In a world where Rust programs and libraries are integrated more deeply with the rest of the system, it would be bad if the distro package manager had only a Cargo blackbox for the Rust corner of the world.

We don’t have to package everything on crates.io, just those that prove useful for a program we want in the distro. That still may be a lot of tiny crate packages, yes. But how many of those hundreds used by servo are frequently updated? My hunch is that most of these are probably for one-off functionality that won’t update much.

Compare to a few other languages, with a quick glance at Fedora I see 176 golang, 669 ruby, 1482 python, and 2677 perl packages. And this is just searching at the level of source rpms; some may create multiple installable packages. (Searching this way also includes all historical packages, so some might be retired by now.)

Even though perl has the most, that’s a drop in the bucket of the 154,917 modules on cpan. Of the 3,013 crates on crates.io, I’m sure far fewer will find their way into distros.

Anyway, yes, distros keep up with lots of packages. :smile:

Surely part of that idiomatic way is because these things just don’t exist yet in the system package manager.

I think you’re sort of getting at the disconnect between distro packaging and containers here. You may not be explicitly containerizing, but building all your own exact dependencies into a monolithic static binary is pretty close. I definitely don’t think we should dig into that disconnect here, but let’s allow that both methodologies should be made possible with Rust. If Rust is to be the new systems language, it needs to be able to integrate into systems too.


#32

Just curious, do you know how many of these are truly external dependencies? At a glance I see a lot of relative paths, which would be fine to keep in the same source package if these are just pieces of servo.


#33

Thanks for the link. I’ve noted it.

Thanks. Filed an issue.

Thanks for bringing this to my attention. Here’s another issue.

I’ve made a note of this self-re-bootstrapping requirement. We can bake this functionality into the makefiles, including bypassing the feature checks so you don’t need to package nightly.

Thanks for all this explanation and the link to your proposal.

Thanks for the clarification.

Thanks for the description, and agree that rebootstrapping shouldn’t require nightly. We’ll solve that in the makefiles.

Edit: Heh, I just realized how terse this message was. ‘Thanks!’. It’s late here.


#34

Ok, cool, thanks for all the info @cuviper! I think for now we can hold off on more details for a later thread (and focus on rustc/cargo themselves here), but it’s really helpful to learn about this space!

Specifically on the topic of Servo I believe they’ve got a handful of external libraries (e.g. skia, harfbuzz, png, etc), but they’d probably know more than I!


#35

This has been the rule for most other module ecosystems in the Linux distros (Python, CPAN, Eclipse modules to name just a few; Fedora even repackages node.js packages despite the strong preference for bundled dependency installation in npm). They do want to reflect both build dependencies and binary dependencies in the respective source/binary packages, avoid downloading in builds, and discourage library bundling.

I think the main reason for that is maintainability on the distribution level. If a library crate X is found to have a critical bug, they don’t have practical ways to comb through the entire package universe searching for Cargo build manifests listing that library as a dependency, let alone projects that don’t even use Cargo. All they have is their package database, so the bundling of crates should be reflected as build dependencies in the packages.
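A minimal sketch of that query, assuming the package database can be read as a mapping from each source package to its build dependencies (the package names below are made up). Because Rust crates are linked statically, the search follows reverse dependencies transitively: a fix in X means rebuilding everything that embeds X, directly or indirectly.

```python
from collections import defaultdict


def affected_packages(fixed, build_deps):
    """Return all packages that build-depend on `fixed`, directly or not."""
    # Invert the dependency table: dependency -> packages that use it.
    users = defaultdict(set)
    for package, deps in build_deps.items():
        for dep in deps:
            users[dep].add(package)

    affected = set()
    stack = [fixed]
    while stack:
        current = stack.pop()
        for user in users.get(current, ()):
            if user not in affected:
                affected.add(user)
                stack.append(user)
    return affected


# Hypothetical package database contents.
PACKAGE_DB = {
    "rust-y": ["rust-x"],
    "tool-a": ["rust-y"],
    "tool-b": ["rust-z"],
}

print(sorted(affected_packages("rust-x", PACKAGE_DB)))  # ['rust-y', 'tool-a']
```

If a bundled crate is not declared as a build dependency, it never shows up in `PACKAGE_DB` and the query silently misses it, which is the maintainability concern described above.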


#36

components/servo/Cargo.lock currently lists 181 packages, 118 of which are from crates.io and 34 from other git repositories. This leaves 31 crates in the same repository.
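A breakdown like this can be approximated with a short script. The sketch below is an assumption-laden illustration, not the method used for the numbers above: it uses a hand-rolled line scan instead of a TOML parser, assumes each `[[package]]` entry carries at most one `source = "kind+url"` line, and treats entries without a `source` as crates from the same repository. The sample lockfile is invented.

```python
from collections import Counter


def classify_lock_entries(lock_text):
    """Count [[package]] entries in a Cargo.lock by source kind."""
    counts = Counter()
    current = None  # source kind of the [[package]] entry being read
    for raw in lock_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new table starts, so close the previous [[package]] entry.
            if current is not None:
                counts[current] += 1
            current = "local" if line == "[[package]]" else None
        elif current is not None and line.startswith("source"):
            value = line.split("=", 1)[1].strip().strip('"')
            current = value.split("+", 1)[0]  # "registry" or "git"
    if current is not None:
        counts[current] += 1
    return counts


SAMPLE_LOCK = """\
[root]
name = "demo"
version = "0.1.0"
dependencies = [
 "libc 0.1.10 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "libc"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "helper"
version = "0.0.1"

[[package]]
name = "png"
version = "0.1.0"
source = "git+https://example.invalid/png.git#0000000"
"""

print(sorted(classify_lock_entries(SAMPLE_LOCK).items()))
```

The `[root]` entry is deliberately not counted, so the totals cover dependencies only.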


#37

Hi,

I didn’t read the whole thread, sorry (had no time, going to do so), so maybe I’ll repeat someone’s thoughts. Some points from the Gentoo point of view:

  • System-wide LLVM: it worked for us, but now we’ve switched to shipping shared LLVM libraries only, so we need support for linking with shared libs from Rust upstream (see rust-lang/rust#27937).

  • Installation of multiple rusts. It works for us without multirust, the only thing we need is the ability to install rust libs/binaries to custom dirs. There is one bug in the rust installer that we temporarily fix with a patch.

  • Cargo binary package support: we really need versioned cargo binary releases similar to those versioned rust binary releases (I failed to find any versioned ones for cargo).

  • Cargo/rust infrastructure support in general: the main problem is how to make cargo use system libraries during a production build (it fetches everything and uses the fetched versions). This, plus a not very clean linking model (or maybe I just do not understand it well enough), makes shipping cargo-based packages in Gentoo nearly impossible at the moment. The ideal solution would be a ‘production’ mode for cargo in which it is used just as a nice build system and uses only packages already installed in the system. If a necessary package is not found, it should just fail.

  • -Werror switched on during Rust build, ideally it would be good to have a possibility to switch it off (I’m just sedding mk scripts at the moment)

  • Additional libraries suffix. At the moment I’m sedding with sed -i -e "s/CFG_FILENAME_EXTRA=.*/CFG_FILENAME_EXTRA=${postfix}/" mk/main.mk, it would be good to have some switch in the build system to do this (it is necessary for reliable side-by-side installation of libs).

  • Signing of rust release tarballs, so their authenticity can be checked.

  • Rust bootstrapping: at this point of stability we can switch to bootstrapping by the previously installed compiler I think. There is a switch in the build system that can be used for it (local rustc). But we need guarantees from upstream that compiler will be self compilable by, say, N previous versions.

So far that’s all from my side. I’m going to read the thread and maybe I’ll have more points.


#38

-Werror switched on during Rust build, ideally it would be good to have a possibility to switch it off (I’m just sedding mk scripts at the moment)

I agree, but if -Werror is causing any problem for you, I think we are interested in bug reports.

Signing of rust release tarballs, so their authenticity can be checked.

I think we already do this, e.g. https://static.rust-lang.org/dist/rustc-1.3.0-src.tar.gz.asc. Perhaps we should document this better? What would be needed?


#39

Noted.

Huh. I’m sad to say I’ve looked at this patch for a while and still don’t see what the bug is. Can you explain in more detail what the patch is working around? Is there a filed issue? (I know that --libdir has always been in some state of broken)

This is a common theme!

cc @alexcrichton

Ah, interesting point. I’ve opened an issue: https://github.com/rust-lang/rust/issues/28599

Ah, yes, that makes sense. Although we (try to) tie that extra bit to the version number. Perhaps we’re just not capturing enough metadata to differentiate the versions of Rust you are installing together?

In fact, the code looks wrong. It’s hashing an undefined variable…

Would hashing the version number as intended be enough to distinguish your sxs installs?

Edit: After investigation, this does actually hash the version number correctly before CFG_RELEASE is defined. I don’t understand why offhand.


#40

Thank you, my fault, I hadn’t seen it. Maybe adding a link to the signature on the download page would be a good idea. The same for the sha256 file.
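For the checksum half of this, a packaging script could do something like the sketch below. It assumes the sha256 file uses the common `sha256sum` layout (hex digest, whitespace, file name), which is not confirmed in this thread, and the helper name is made up. It checks integrity only; verifying the .asc signature still needs a GPG step.

```python
import hashlib


def matches_sha256(data, checksum_line):
    """Check downloaded bytes against a line from a sha256 checksum file."""
    parts = checksum_line.split()
    if not parts:
        return False  # empty or whitespace-only checksum file
    expected = parts[0].lower()
    return hashlib.sha256(data).hexdigest() == expected


# Stand-in for the tarball contents and its published checksum line.
tarball = b"example tarball contents"
line = hashlib.sha256(tarball).hexdigest() + "  rustc-1.3.0-src.tar.gz\n"

print(matches_sha256(tarball, line))           # True
print(matches_sha256(b"tampered data", line))  # False
```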