Perfecting Rust Packaging - The Plan

@wthrowe thanks for the explanation and link to the other thread. I think I basically understand your critiques of how Rust handles linkage.

I'm afraid I don't really understand the concern. As far as I understand it, triples are just a handy way to encode a bunch of cpu architecture / ABI / platform options into a single string. My understanding (please correct me) was that gcc can indeed be configured with separate i386/i486/i586/i686-linux-gnu triples, and yes they just map to different default values for -march, -msse, etc. I'm not sure if @eefriedman is arguing that adding more triples is bad, or that adding triples is not sufficient/scalable and we need to add cargo support for passing arbitrary codegen compiler flags. I'd like a triple for my platform, and I'd like the ability to pass arbitrary codegen flags :wink:

I'm fine with adding a more specific i586-debian-linux-gnu, if we want to make it clear that it means only "the abi/cpu features that Debian assumes". I'm also fine with adding that just to the Debian rustc.deb if there's resistance to carrying it upstream. Really, I just need something I can pass to rustc/LLVM to get it to produce output that fits the Debian "i386" architecture definition (basically gcc's i586-linux-gnu), and doesn't assume the existence of pentium2 instructions. I can pass a bunch of compiler flags around, or I can create the triple that represents those same flags - either way I need enough support/hooks in my cargo executable to be able to do that, and right now that means a new triple.

My primary point is just that a system where "i686-pc-linux-gnu" means pentium4, and "i586-pc-linux-gnu" means pentium1, is really confusing. Also, if Debian retires i586 in favor of i686 sometime in the next few years, you're going to run into trouble because i686 won't mean what you need it to mean.

On a side note, I'm pretty sure the testsuite will fail on an "i586" target because of differences between SSE and x87 math.

i686 should actually be Pentium Pro?

I've filed an issue against Rust to apply -Wl,-z,relro by default. This is one of the reasons for wanting to apply custom command line arguments to rustc (which we still plan to do). I don't know how feasible it is but it seems like the sort of option Rust might like.

Just to update everyone here, I tried to put together a quick proof-of-concept of getting cargo to use local packages. It doesn't work yet, but I've learnt a few things along the way.

In no particular order (or difficulty):

  • Even something "simple" like cargo requires a lot of crates :stuck_out_tongue: I think I've built 37 "library" packages in order to build cargo, and that's with me cheating a bit and collapsing some dependency chains that specified supposedly incompatible versions of the same crate (but I hack the Cargo.toml dependency and reuse the one version for both).
  • Lots of over-specified version requirements across crates.io
  • Several crates declare blanket dependencies on winapi/kernel32 that aren't actually required unless you're building for a windows target (for example, cargo itself)
  • Often the only indication of copyright is the keyword in Cargo.toml entry (no explicit LICENSE text nor copyright comments in readme/source)
  • Many upstream git repos don't actually make releases (or declare git tags, etc) other than the snapshot that happens to get uploaded to crates.io. This effectively means we need to build distro packages from what's in crates.io or else we have no hope of matching the semver dependencies used between packages.
  • A number of *-sys crates ship a full upstream C library source, which they only use as a fallback for when the local platform version couldn't be found. I hadn't actually noticed that before, and it makes packaging (only slightly) interesting because now we have a bunch of extra files to either strip out, or audit licenses.
  • The good news with distro packaging is that we can use the cross-language distro package dependencies to just make sure we always have pkg-config, the required C library, headers, etc installed giving us a much simpler and more predictable result. In just about all cases, we hit that first pkg-config line in build.rs and the rest is skipped entirely.
  • Repackaging from crates.io rather than upstream github isn't great because:
      • Can't share work between upstream repos that contain multiple crates. Probably can't do this anyway (easily), since the "sub crates" often declare wildly different crate versions so we wouldn't be able to do a 1:1 crate version and distro package version.
      • A number of files are often missing from what gets uploaded to crates.io. In particular, documentation, examples, license files: things that aren't actually cargo buildables.
      • Several "sub crates" assume the original source subdirectory layout, and derive their crate name from the directory. I needed to patch several of these to add explicit Cargo.toml name=... directives once I started shipping them in my own (differently named) $name-$version directories.
  • crates.io doesn't make it easy to verify sources. There are checksums buried away in the metadata (retrieved via a git checkout), but ideally there'd be a simple http-accessible signature alongside the source download (like there is for rustc source itself).
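For the checksum point, the check can at least be done by hand. A minimal sketch in Python, assuming each index entry is a JSON line whose `cksum` field holds the SHA-256 hex digest of the crate tarball; the function names and the sample data are made up for illustration:

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of a downloaded crate tarball."""
    return hashlib.sha256(data).hexdigest()


def verify_crate(tarball: bytes, index_line: str) -> bool:
    """Compare a tarball against the checksum recorded in one index line.

    Assumption: the index line is JSON with a "cksum" field.
    """
    entry = json.loads(index_line)
    return sha256_hex(tarball) == entry["cksum"]


# Made-up data standing in for a real download and a real index entry.
fake_tarball = b"not a real crate"
fake_index_line = json.dumps(
    {"name": "libc", "vers": "0.1.12", "cksum": sha256_hex(fake_tarball)}
)
print(verify_crate(fake_tarball, fake_index_line))
```

This only proves the tarball matches the index, not that the index itself is trustworthy, which is why a signature next to the download would still be nicer.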

Some implementation details (all open for discussion, this is mostly just a strawman POC):

I hacked up a quick/horrible python script to fetch source from crates.io and it autogenerates most of the bare minimum debian/* required to package up a library. The end result is (currently) a debian package named "librust-$crate-dev" that contains the crate source, with a patched Cargo.toml. The astute here will notice this package naming scheme implies we can only support a single version of the crate at once, without spawning off more explicitly versioned Debian package names (this is probably something that needs to be changed). The patched Cargo.toml has all dependencies rewritten from libc = "0.1" to libc = { path = "/usr/share/rustsrc/libc-0.1" }. I also rudely truncate all more explicit semver dependencies to 2-digit x.y (to prevent having to update all the Cargo.toml paths all the time), and use the Debian package metadata to preserve the original more-specific version requirement.
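To make the rewrite concrete, here is a rough sketch of the idea in Python. It is not the actual script: the function names are hypothetical, it works on an already-parsed name-to-version mapping rather than a real Cargo.toml, and it only copes with simple version requirements.

```python
import re

SRC_ROOT = "/usr/share/rustsrc"  # path prefix used in the example above


def truncate_version(req: str) -> str:
    """Reduce a requirement such as "^0.1.12" to its first two components.

    Assumption: the requirement is a single simple version; ranges and
    wildcards would need real semver handling.
    """
    numbers = re.findall(r"\d+", req)
    return ".".join(numbers[:2])


def rewrite_dependencies(deps: dict) -> dict:
    """Map each `name = "version"` dependency to a local path dependency."""
    return {
        name: {"path": f"{SRC_ROOT}/{name}-{truncate_version(req)}"}
        for name, req in deps.items()
    }


def to_toml_line(name: str, spec: dict) -> str:
    """Render one rewritten dependency as a Cargo.toml line."""
    return f'{name} = {{ path = "{spec["path"]}" }}'


deps = {"libc": "0.1", "log": "0.3.1"}
for name, spec in rewrite_dependencies(deps).items():
    print(to_toml_line(name, spec))
```

The original, more specific requirement ("0.3.1" here) is dropped from the path on purpose; in the scheme described above it lives on in the Debian package metadata instead.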

When you install the package, the crate source gets dumped into (rather arbitrary) /usr/share/rustsrc/$crate-$x.$y.$z with a symlink from $crate-$x.$y for Cargo.toml path convenience. I made the guess that x.y would be good enough to at least verify the approach, and so far I've hit other problems first. Note that I completely ignore Cargo.lock - I'm sort of unclear on whether that's a bad thing, or expected.
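A small sketch of that layout in Python, using a scratch directory in place of /usr/share/rustsrc. The `install_crate` helper is hypothetical, and a real package would create the directory and symlink through the usual Debian packaging machinery rather than a script like this:

```python
import os
import tempfile


def install_crate(root: str, crate: str, version: str) -> str:
    """Create root/crate-x.y.z and a crate-x.y symlink pointing at it."""
    full = f"{crate}-{version}"
    short = f"{crate}-{'.'.join(version.split('.')[:2])}"
    os.makedirs(os.path.join(root, full), exist_ok=True)
    link = os.path.join(root, short)
    if os.path.islink(link):
        os.remove(link)  # repoint the link when a newer x.y.z is installed
    os.symlink(full, link)  # relative target, so the tree can be relocated
    return link


# Scratch directory standing in for /usr/share/rustsrc.
root = tempfile.mkdtemp()
link = install_crate(root, "libc", "0.1.12")
print(os.readlink(link))
```

Because the x.y link can only point at one x.y.z at a time, this has the same single-version limitation as the package naming scheme.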

So! I eventually have an example non-library Rust thing that I'm trying to build in this environment (in my case "cargo" itself). The debian/rules packaging script currently tries to build this with CARGO_HOME set to a temporary (empty) local directory to isolate from the local user's settings, and I run ./configure --prefix=/usr (works fine), then make (and cargo build --release fails).
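Roughly, the build step amounts to something like the following, sketched in Python rather than as the actual debian/rules makefile. The helper names are hypothetical; only the CARGO_HOME isolation and the two commands come from the description above.

```python
import os
import subprocess
import tempfile


def isolated_env(base_env: dict) -> dict:
    """Copy the environment and point CARGO_HOME at a fresh empty directory."""
    env = dict(base_env)
    env["CARGO_HOME"] = tempfile.mkdtemp(prefix="cargo-home-")
    return env


def build_steps() -> list:
    """The commands run in the source tree, in order."""
    return [
        ["./configure", "--prefix=/usr"],
        ["make"],  # this is the step that ends up invoking cargo build --release
    ]


def run_build(source_dir: str) -> None:
    """Run each step with the isolated environment, stopping on failure."""
    env = isolated_env(os.environ)
    for step in build_steps():
        subprocess.run(step, cwd=source_dir, env=env, check=True)
```

Calling `run_build("/path/to/cargo-source")` would run the two steps; the path is a placeholder.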

Some issues:

  • As far as I can see, cargo insists on updating the crates.io registry, even though there's no dependency (afaics) which is using crates.io. Should I create a dummy registry checkout in order to prevent this?
  • Cargo goes ahead and ignores all my Cargo.toml path-rewriting work, and downloads all the crates from crates.io again, including things like winapi that shouldn't even be in the dependency chain after my edits! Any suggestions on where cargo is getting this idea of the dependency graph from?

Note that all of this is entirely unrelated to the cargo package recently added to Debian unstable. That package uses a simpler (working!) build approach - I'm trying to explore the more general challenge of building cargo-using apps without vendoring all the dependent crates.


I'd be interested in hearing more about this.... if anything, we've had an under specified problem, I'd think.

Just wanted to say thanks for all the work you've put into blazing the trail here!

This is being worked on in "Need ability to add dependencies based on `#[cfg()]`" (rust-lang/cargo issue #1007) and https://github.com/rust-lang/rfcs/pull/1361

This was a reason I suggested the possibility of adding a make dist style cargo command, especially if the other option on the table is each distro manipulating the Cargo.toml which I have no interest in managing.

I use -Wl,-z,relro,-z,now in my build of Rust for Yocto. It's a fork from a previous repo and is a bit of a mess but I'm working to clean it up. The repo is called meta-rust.

I recently landed a patch in Rust master so that mk/platforms.mk handles i386, i486, i586, and i686. It had already handled i386 and i686. Yocto also targets i586, and the meta-rust I cloned my repo from had been patching that spot for some time.

Sorry for interfering. Any plans for two specific things:

  1. Binding to custom crate repositories?
  2. Allowing to release pre-built artifacts for crates?

Thanks

@gus Thanks for working through all that and giving us the details. Next week the Rust team is meeting in person and we'll try to regroup to understand the problems you are dealing with, and how Cargo can ease them.

@target_san

  1. It is possible to use other repos than crates.io, though not super tested in the wild. See this unit test, which frobs registry.index in .cargo/config. @alexcrichton says you might also want to look at the implementation of cargo-vendor.

  2. I don't think there are specific plans for releasing binaries, though it's known to be a desirable feature.

It'd be helpful to have something that can reproduce some of the issues mentioned -- like Cargo talking to crates.io despite everything being tied to paths. @gus, is the work you've done something you could easily send us in a tarball?

Some updates.

@gus, @alexcrichton, @lucab and I had a quick video call today to discuss Debian's progress. The notes are available though I guess they will be pretty hard to understand.

Some of the major points:

  • Cargo really needs better support for sourcing registry crates from the local filesystem; path rewriting is too painful. Issue.
  • Debian's tools for scraping version updates from websites don't understand crates.io's pages. I've filed an issue, but don't know enough about it to say what needs to be done yet. Somebody more knowledgeable might fill in the details.
  • The Rust trademark policy's inclusion of the sentence "This document supplements the official Mozilla trademark policy which governs use of all Mozilla trademarks" is a huge source of confusion. We consider this a bug and are trying to fix it, but working with the lawyers is slow.
  • We think that sourcing packages from crates.io tarballs is the way to go, not to try to trace the source back to git.

While I haven't been doing a lot of coding yet on the bugs to come out of these discussions, I have started a patch to teach cargo to interpret a RUSTFLAGS environment variable, a la CFLAGS.
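For anyone unfamiliar with the CFLAGS convention, the idea is roughly the following, sketched in Python. This is not the patch itself: the whitespace splitting and the position of the extra flags are assumptions, and `rustc_command` is a made-up helper.

```python
def rustc_command(args: list, env: dict) -> list:
    """Build a rustc command line with extra flags taken from RUSTFLAGS.

    Assumption: flags are separated by whitespace and placed before the
    arguments cargo would pass anyway.
    """
    extra = env.get("RUSTFLAGS", "").split()
    return ["rustc"] + extra + list(args)


# Example value only; any codegen flags could be supplied this way.
print(rustc_command(["src/main.rs"], {"RUSTFLAGS": "-C target-cpu=pentium"}))
```

A distro build could then export the variable once in its packaging rules instead of patching each crate's build.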


A somewhat tangential point, but I've seen two instances of people who needed to be able to locally mirror the index and packages. I've also personally set up a local mirror of the index because constantly hitting the network (and the occasional outage) are a tremendous pain in the ass.

The major problem is that this makes it impossible to work on public binary packages, because changing the index alters the paths in Cargo.lock. It would be really nice if whatever is worked out here helps with the above.


I've just realised, for maximum compatibility on 32-bit x86, the stage0 rustc binary could be made into the generic i686 version.

There's been a new stage0 snapshot release (2015-12-18) and the i686 version is unchanged. Have it your way.

Thanks for bringing this up again. If there is a change you are looking for here perhaps you can open an issue to suggest it. I don't recall all the context, and changing the snapshot configuration hasn't been on my radar.

FWIW though, we are planning on removing snapshots completely in favor of using the releases, so any change in snapshots may be short lived.

Ok, thanks for replying. There was never an issue about it so it must have slipped @alexcrichton's mind.

The idea was to enable the snapshot to be run on pre-P4 hardware too, keeping the default code-generation setting unchanged. Completely transparent but still preserving the option of modifying -C target-cpu= at build-time.