The State of Rust Tarballs

Several friends have bugged me about this on and off for a while now, so I decided to write up my experience with non-rustup methods of installing Rust. The following is what I found after looking around the Rust Forge and the main website, going through the contents of the tarballs, and comparing them to the contents of the ~/.rustup directory.

The state of non-rustup installation methods

The download page

The Rust installation tarballs seem a bit neglected. The download page (https://forge.rust-lang.org/other-installation-methods.html) only lists tarballs for the latest versions, and only for full distributions (not individual components or targets). The link to past releases (https://static.rust-lang.org/dist/index.html) returns a 404. There is no visible way to download the targets that ship without rustc/cargo (I am looking at x86_64-unknown-linux-musl in particular), and no way to browse for them because of that 404.

The tarballs themselves

The tarballs are in a layout that requires running an ./install.sh script to place the files in a configurable prefix (default /usr/local). The script goes beyond copying files into place, however, and there are a few stateful bits. The installer writes manifest files with absolute paths, which seem to be used only by the uninstall script. On Linux it will also try to create an ld.so.conf.d file (but only if you installed to the standard prefix?) and then run ldconfig (regardless of where you installed), which will fail unless you’re running with sudo. This part seems extra weird to me, as the rustup version seems to live with user-only installs just fine and doesn’t have to mess with ldconfig.

Why should we care?

Rustup is handy, right? I think rustup is a great and useful tool for a Rust developer. But there are cases where we should have a better story around installing one particular Rust version and leaving it at that. Builds and CI spring immediately to mind; if you care about reproducibility, you want to pin an exact version of the toolchain in addition to an exact version of the codebase. I believe the current state is also an extra barrier for people who find a Rust project, get interested, and try to compile it from source.

I’ve also talked to several people who balk at the standard curlbash installation method for rustup and are also then nonplussed at having to click through several steps (while being reminded that rustup is a thing you should use) to get to the tarballs. Hopefully they then don’t notice the broken links to the past versions or the inability to add targets after that.

Why can’t you just use X?

Docker

Using Docker to get a specific Rust version to build with is like using a (hard-to-debug) sledgehammer to kill a fly. Besides, if the tarballs were simplified as proposed below, it would be easy to install Rust yourself inside the container if you wanted to start from your own base image. I prefer the model of starting with something I know and adding to it, rather than the Docker approach of starting with someone else’s base and then adding and removing things.

Distro Packages

  1. No guarantee they will have the version you request to build with
  2. No guarantee the version that you build with on distro version X will exist on distro version Y
  3. No guarantee the compiler versions will be the same across distros, because of vendor patchsets
  4. No guarantee the components/targets you want will be available via the system package manager
  5. Juggling package managers and package names across distros

The tarballs that exist today

It would be nice if I could glue things together in such a way that I can do the minimal amount of bash-calling and privilege-escalation before starting my build, which is the only place I’m hoping or expecting to see things fail. If I can download a tarball (or stash it locally), hash it to verify I got what I expected, and just un-tar and build, I’d save a lot of boilerplate. I’m also paranoid about whether or not that ldconfig thing is actually necessary so I can’t just install it to /tmp/rust and then re-tar.
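The download, hash, un-tar flow described above can be sketched in a few lines. This is a minimal illustration, not an official installer: the function names are hypothetical, and it assumes you already have a pinned SHA-256 value recorded for the tarball you expect.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(tarball: Path, expected_sha256: str, dest: Path) -> None:
    """Extract `tarball` into `dest` only if its hash matches the pinned value."""
    actual = sha256_of(tarball)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {tarball}: got {actual}")
    # Nothing is written to disk until the hash check has passed.
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        archive.extractall(dest)
```

With a self-contained tarball, this would be the entire install step; with today's tarballs you would still have to run install.sh afterwards.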

Rustup

See previous. If I don’t like running an install step from a tarball that I can hash I’m not going to want to introduce non-determinism by curlbashing rustup and having it install things.

What if tarball users aren’t real and we made packaging better for nothing?

Even if I’m 100% wrong about people being put off by rustup or tarball hunting, I think we could make a couple changes and benefit both users of rustup and of tarballs and come closer to unifying the two so we aren’t bifurcating our build process or splitting our attention.

  1. Tarballs should be self-contained and relocatable. Their structure should be something like

    rust-1.31.1/
      bin/
      lib/
      share/
      etc/
    

    so you could un-tar it somewhere, update $PATH to point at the rust/bin dir, and away you go. This is how the Go tarballs work, and it is rather handy. Want to uninstall? rm -rf. Want to update? Either rm -rf and un-tar the new one, or un-tar to a new path and update a symlink. Want to have multiple installs? Go ahead!

  2. Components/targets are also tarballs that have the same structure. Targets would basically be a tarball with rust-<version>/lib/rustlib/<target>. Adding components is overlaying tarballs. I assume this is pretty close to how components actually look today as this is roughly what rustup does (I can’t check due to the web directories being unbrowsable).

  3. The downloads page should have an *-all.tar.gz tarball that contains the same components as today's tarball (cargo, rustc, rls, docs, clippy, rustfmt, std, llvm-tools, rust-analysis), except self-contained as in #1. (I also wouldn’t mind a *-min.tar.gz version, but I’m not sure what constitutes a minimum useful set.)

  4. Rustup can now use these tarballs as its components to add/remove. Removal would have to be done by manifest-tracking files upon install (un-tar’ing) but that appears to be how it does it today anyway?

  5. (Moon Shot) Rustup could assume control of an existing install. If someone installed via tarballs and later wanted to get on the rustup train, you could point rustup at the existing install and it could just assume ownership and start tracking that version. It would no longer be necessary to have someone blow away a Rust installed via tarballs before they could use rustup.
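The "un-tar to a new path and update a symlink" workflow from item 1 can be sketched as follows. This is only an illustration under the proposed layout, which does not exist today: the `activate` function, the `current` link name, and the version directory names are all hypothetical.

```python
import os
from pathlib import Path


def activate(install_root: Path, version_dir: str, link_name: str = "current") -> Path:
    """Point `install_root/link_name` at `install_root/version_dir`."""
    target = install_root / version_dir
    if not (target / "bin").is_dir():
        raise FileNotFoundError(f"{target} does not look like an unpacked toolchain")
    link = install_root / link_name
    tmp_link = install_root / f".{link_name}.tmp"
    # Clear any leftover temporary link from an interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # A relative link keeps the whole tree relocatable.
    os.symlink(version_dir, tmp_link)
    # Renaming over the old link swaps versions in one step on POSIX systems.
    os.replace(tmp_link, link)
    return link
```

With $PATH containing `<install_root>/current/bin`, switching toolchains is one call to `activate`, and uninstalling a version is removing its directory.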

What about people currently using tarballs?

If this is a concern (and I’m not sure if people are using tarballs by scraping for latest and just following the same install procedure), leave the install.sh script in the tarball. It can do exactly what it does today, moving things to different prefixes or paths. Maybe remove the ldconfig thing though.

Hello

I’d like to add another use case.

I work at Avast (though in a different department than the one responsible for this). What Avast wants to do is treat Rust binaries as a „trusted source“. Basically, there’s a big database of „likely malware“ and „likely clean“ binaries, and the chance of flagging rustc.exe as malware is smaller if it gets into the „likely clean“ set (rustc.exe does some things that make it suspicious to an AV, like writing other binaries). Flagging it as malware is not good for Avast, and it is not good for Rust.

Some time ago there was a setup that crawled and downloaded the releases and added them to the „likely clean“. But it recently stopped working, probably because of that 404 there. I wanted to report that 404, but I didn’t have the time to find out where exactly to send the report yet.

So, it would be nice if that page (or any other page) started to work again. The tarballs/zips (?) as they are now are fine for this use case (as is probably any other archive), though.

I agree that having tarballs work is a good idea, and that the tarball 404 should be fixed…

About a year ago, it was possible to use Rust by extracting the tarballs and pointing PATH to the right place.

Since codegen backends arrived, you also need to have the lib directory on your library search path (e.g. LD_LIBRARY_PATH). It might be a good idea to fix this.

How does this work with rustup then? Does it somehow fiddle with your LD_LIBRARY_PATH when you call cargo/rustc?

I’m not quite sure what it does.

EDIT: extracting the tarball does work, because the rust-libs component contains all the libs in rust-nightly-x86_64-unknown-linux-gnu/rust-std-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/, and has codegen backends in rust-nightly-x86_64-unknown-linux-gnu/rustc/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends.

Yes. For rustup users, the cargo and rustc binaries in the PATH are wrappers that set up the environment based on the selected toolchain, then call out to the actual cargo/rustc binaries.

Actually, if you set up things "properly", using the tarball in dist does work. The problem is when you use the component tarballs.

What kind of setup is involved? Because it's starting to sound like the rustup wrappers are actually required to use things like components/targets, which would be a troubling state of affairs.

No setup, just combine all the bin and lib directories and call bin/rustc.

What's this referring to then? I feel like I'm more confused than when I started this thread. Does manually combining the bin and lib dirs work for all components, including other compile targets? Does it require messing with LD_LIBRARY_PATH?

On a separate note, why is rustup's wrapper messing with LD_LIBRARY_PATH instead of leaving that up to cargo/rustc, which seem like the tools that should actually be concerned with the location of their libs? Isn't that a concern that users of non-rustup Rust also have to contend with, and one that would be better solved for everyone?

NB: dated directories still have indexes. They even “work” for directories that do not exist yet.

You may be interested in my rustbud toolchain manager, or maybe a future version of it once I've finished porting it to Rust. It does downloading and hashing and verifying like rustup does, but it lets you put a file in your codebase that specifies an exact, specific version of Rust - by URL, if necessary, so your CI doesn't depend on Rust's hosted binaries.

Basically, rustc looks for its various libraries in a particular path relative to the binary location. If you take all the component tarballs that rustup/rustbud download, figure out which files are metadata and which files need to be installed, then copy them all into a common directory prefix, they Just Work and nobody needs to set LD_LIBRARY_PATH or run ldconfig or anything.
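A rough sketch of that "copy everything into a common prefix" step, for illustration only. It is not how rustup or rustbud actually implement it. It assumes the unpacked tarball holds one directory per component, and that per-component metadata can be recognised by file name; the `manifest.in` default below is an assumption to check against the real tarballs.

```python
import shutil
from pathlib import Path

# Assumption: names of per-component metadata files that should not be installed.
METADATA_NAMES = frozenset({"manifest.in"})


def overlay_components(unpacked: Path, prefix: Path,
                       metadata_names=METADATA_NAMES) -> list[Path]:
    """Copy every component directory under `unpacked` into one shared `prefix`."""
    installed = []
    # Top-level files (installer script, version info) are skipped: only
    # directories are treated as components.
    for component in sorted(p for p in unpacked.iterdir() if p.is_dir()):
        for src in sorted(component.rglob("*")):
            if src.is_dir() or src.name in metadata_names:
                continue
            # bin/, lib/, etc. from every component land in the same tree.
            dest = prefix / src.relative_to(component)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            installed.append(dest)
    return installed
```

The returned list doubles as a record of what was installed, which is enough to support removing a component later.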

This seems like a good argument to just fix the tarballs. Since relative paths to the libs work, just changing the directory tree in the tarball would obviate the need for rustbud or for large swaths of code in rustup.

Discussing these issues is now on the Infra team agenda for their next weekly meeting, which will be on the 15th; however, it is relatively low priority.

Related: Dist index gone · Issue #56971 · rust-lang/rust · GitHub

We could also get torrent tarballs, to be completely sure the server didn’t tamper with them since the last time you downloaded them.

Torrents seem overwrought for this; tar + hash is just as strong. If you really want to make sure it’s repeatable/reproducible, you only trust the server’s signature the first time you build with a tarball. You then save that hash and, on every later build, check that the tarball still matches the hash you used previously.
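The save-the-hash-and-compare-next-time idea can be sketched as a small trust-on-first-use check. This is a minimal illustration: the `check_pinned` function and the JSON lockfile format are hypothetical, not an existing tool.

```python
import hashlib
import json
from pathlib import Path


def check_pinned(tarball: Path, lockfile: Path) -> bool:
    """Return True if the hash was newly pinned, False if it matched an existing pin."""
    digest = hashlib.sha256()
    with tarball.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()

    pins = json.loads(lockfile.read_text()) if lockfile.exists() else {}
    known = pins.get(tarball.name)
    if known is None:
        # First time this tarball is seen: record its hash for later builds.
        pins[tarball.name] = actual
        lockfile.write_text(json.dumps(pins, indent=2, sort_keys=True) + "\n")
        return True
    if known != actual:
        raise ValueError(f"{tarball.name}: expected {known}, got {actual}")
    return False
```

Committing the lockfile next to the codebase means a changed tarball fails the build instead of silently changing the toolchain.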

I don't have Discord myself, so I can't go see the arguments, but is the problem really that we can't generate static indexes for an S3 bucket?

Yes :(

And the server can just DoS you then. Torrents are DoS-resistant.