Should Cargo be written in Rust?


#1

I imagine it won’t be too long before Cargo is used to build rustc itself. The advantages of doing so will grow over time: the libraries available via crates.io will become more attractive, and as the Rust compiler infrastructure matures, it may become desirable for non-rustc projects to independently fetch and use different versions of compiler helper libraries. (It’s easy to imagine an external tool wanting to use, say, libsyntax, and also wanting to use Cargo’s version management to track it.) Over time, I can imagine that Cargo will become the preferred way to build Rust code, including the Rust compiler itself.

This is likely to make bootstrapping Rust on a new platform a much bigger challenge than it is currently, when we can get Rust working first and then worry about building the package manager. There has already been a lot of complaining about the difficulty in tracking Cargo on FreeBSD (which has thankfully gotten quite a bit easier as the language has started to stabilize). I know there aren’t that many platforms to support these days, but isn’t it a pain to keep up with all our target platforms already?

And to what end? Rust is going to be the best tool for a lot of jobs for which C and C++ are currently the best tools. But it isn’t the best tool for all jobs. I so far haven’t figured out a compelling reason to believe the language is better suited to package management than, say, Ruby is. Can anyone enlighten me? To me, a scripting language seems like a better fit for Cargo’s task than Rust is. In particular:

  • That scripting languages are in plain-text means that, if Cargo doesn’t work on your platform for some reason, you can hack the scripts to make it work. You can insert debugging print-statements to see what’s going on. There is a level of transparency and hackability that cannot be matched by a compiled program.
  • A scripting-language Cargo would likely rely more on external tools to operate, rather than being library driven. (For example, it would likely just run git from the command-line, rather than link against libgit2.) This would greatly improve the transparency into what Cargo is doing, and why it’s doing it, and probably stabilize Cargo quite a bit, too (command-line arguments seem to have breaking changes much more rarely than library interfaces do). This would, of course, have performance implications, but:
  • It’s highly unlikely that the bottleneck in a big project build would be Cargo. It’s much more likely to be I/O as packages are downloaded from the net, and CPU when those packages actually get built. We don’t need to optimize the package manager for efficiency in the same way as we do most Rust applications.
  • It should help Cargo’s portability, since it could rely more directly on OS vendors’ package managers for things like git or curl, rather than having to roll our own adapters to the host platform.
  • Finally, it will make it much easier to build tools like rustc and rustdoc using Cargo, if we can avoid a circular dependency between Cargo and the compiler itself.
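
To make the "just run git from the command-line" idea concrete, here is a minimal sketch of what shelling out looks like, using only the Rust standard library. The helper names git_clone_command and render_command are made up for illustration and are not taken from Cargo’s source; the point is only that the whole interface to git is a list of command-line words that can be printed and inspected.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `git clone` invocation for a dependency.
/// Hypothetical helper for illustration; not taken from Cargo's source.
pub fn git_clone_command(url: &str, dest: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("clone").arg("--quiet").arg(url).arg(dest);
    cmd
}

/// Render a command as a single line, the way one might log it
/// when debugging what the tool is about to do.
pub fn render_command(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}
```

Calling status() or output() on the returned Command would actually run git, and that only works if a git binary is on the user’s path.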

I don’t yet understand why Cargo is implemented in Rust, other than perhaps a desire to exploit the language to support its own ecosystem. It still strikes me as quite possible that Cargo in Rust will become an issue some day. (Actually, it’s already been an issue for a long time; witness the occasional FreeBSD griping about keeping up with Cargo builds. I expect this is dying down quite a bit now, but it did show that these concerns can be a significant headache.)

I expect this may be a can of worms… I’d like to be persuaded that there’s a good reason that it’s better for Cargo to be implemented in Rust than in, say, Ruby? But I just can’t convince myself.


#2

As you say, I think much of this unfortunate pain is instability forcing things to be rebuilt often. I believe the goal is to have cargo using only stable features, so it can be built with the fixed stable compilers.

As the cross-compilation story improves (if it doesn’t work already), it should be possible to get rustc/cargo for other platforms with the appropriate --target=... on some host platform that currently works. I don’t think this step would be any harder if rustc bootstraps using cargo, since the host with rustc will presumably also have cargo (or can have cargo built for it).

Further, I believe @brson is interested in allowing others to provide build machines that hook into our testing, nightly & snapshot infrastructure, so there could be (close to) first class FreeBSD support. If this was set up, there would only have to be a single initial cross-compile to get the nightlies/snapshots started and from there it should be smooth sailing for everyone.

We do not want to force users to have these programs installed and their path configured to find them; at the very least, they can be non-trivial to install on Windows.

This is similarly not a good thing. Cargo switched its support for build scripts from shell to Rust, since the only thing guaranteed across platforms is that a Rust compiler exists, e.g. nothing about the behaviour of the shell. Switching the whole of Cargo to a similar (it’s not exactly the same) setup would be a step backwards. (Of course, the build scripts can try to call binaries in the same way or just shell out to a script, but at least they have a chance to be self-contained, not imposing a hard constraint from the beginning.)
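
As a rough illustration of the self-contained case, here is a minimal sketch of the kind of work a Rust build script can do without assuming anything about the host shell. The generate helper, the file name generated.rs and its contents are made up for illustration; OUT_DIR is the directory Cargo passes to build scripts through the environment.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write a small generated source file into `out_dir`, the way a build
/// script might. The file name and contents are made up for illustration.
pub fn generate(out_dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(out_dir)?;
    let dest = out_dir.join("generated.rs");
    fs::write(&dest, "pub const GREETING: &str = \"hello from build.rs\";\n")?;
    Ok(dest)
}

// In an actual build.rs, `main` would look roughly like this:
//
// fn main() {
//     let out_dir = std::env::var_os("OUT_DIR").expect("Cargo sets OUT_DIR");
//     generate(std::path::Path::new(&out_dir)).unwrap();
// }
```

Nothing here depends on sh, make or any other external program, which is the property that is hard to guarantee on Windows.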

AIUI, most of the instability problems are the Rust language changing, not the (non-Rust) libraries it uses changing their interfaces.


#3

OK, it seems the main thing you point out that I didn’t consider is that this gives Cargo a much better deployment story, is that fair? That it’s a single executable that requires hardly any outside infrastructure makes deployment much easier. This is compelling, to me.

Most, but not all, and this will have to do with the fact that Rust was fundamentally changing so much faster than its non-Rust dependencies, and with the fact that Cargo and crates.io are still young. This balance will change when the language stabilizes, and I worry that doing things like including libgit2 and libssh2 with our crate distributions (which is done, today) will end up meaning that Rust will have to re-solve portability and security issues in attached libraries. And we’ll have to deal with these issues while tightly-coupled to the wide interface of a C library, instead of being loosely-coupled to the narrow interface of the command-line. I strongly suspect this will become a significant drag, over time…


#4

I’ve thought more about this… While this is still a concern to me, it’s a concern for any crates.io C-library wrapper, and is not specific to how Cargo is implemented. And on the other hand, a crate that ran git or curl externally would be coupled to the narrow command-line interface, and would insulate its clients in the same way as I suggested a scripting-language would. The concern here is (almost) orthogonal to Cargo’s implementation language, since Cargo could be written to invoke these tools externally.


#5

Bootstrapping the compiler is difficult enough, and it’s fairly self-contained. Cargo though has a number of native dependencies and is a complex bootstrapping problem in its own right. I personally do not want to use Cargo to bootstrap rustc for the reasons you state.

I have projects where I want to be able to reproduce Cargo builds but can’t count on Cargo being available, and for that I was thinking of having Cargo spit out some kind of build plan, and using a python script to interpret it.


#6

I think it could output a bash/posix shell script. cargo build --verbose almost does that already. When I was porting cargo to arm-unknown-linux-gnueabihf, instead of cross compiling cargo with cargo build --target= (which I tried, but couldn’t get to work due to the C dependencies), I built cargo directly on the target ARM device by simply copy-pasting the output of cargo build --verbose (a bunch of rustc calls, plus calls to Rust binaries for the build.rs stuff) into the shell. I had to supply some extra stuff though:

  • The source code of the cargo dependencies. In my case, I simply copied ~/.cargo onto the ARM device with scp -r - but a hypothetical shell script could call git or curl.
  • Create some extra directories (mkdir -p target/$PKGNAME/out)
  • Supply some env variables: TARGET, HOST, OUT_DIR, CARGO_MANIFEST_DIR, etc

It was tedious doing it by hand, but it was something I only needed to do once. Afterwards, I used the manually compiled cargo, to bootstrap newer cargo versions.
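
For anyone wanting to automate those manual steps, here is a minimal sketch of preparing a build-script invocation by hand: it creates the out directory and sets the four environment variables listed above. The function name build_script_command and its parameters are my own invention, and a real build will likely need more variables than these (hence the "etc" above).

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Prepare (but do not run) an already-compiled build script, supplying
/// the directory and environment variables Cargo would normally provide.
/// Hypothetical helper; a real build may need further variables.
pub fn build_script_command(
    script: &Path,
    target_root: &Path,
    pkg_name: &str,
    manifest_dir: &Path,
    host: &str,
    target: &str,
) -> io::Result<Command> {
    // Equivalent of `mkdir -p target/$PKGNAME/out`.
    let out_dir = target_root.join(pkg_name).join("out");
    fs::create_dir_all(&out_dir)?;

    let mut cmd = Command::new(script);
    cmd.env("TARGET", target)
        .env("HOST", host)
        .env("OUT_DIR", &out_dir)
        .env("CARGO_MANIFEST_DIR", manifest_dir);
    Ok(cmd)
}
```

When building natively on the device, as I did, host and target would be the same triple.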

(Sorry, if I went too off-topic)


I don’t think cargo will replace the current make build system we use. Even if, in the far future, we end up publishing some crates like core and syntax on crates.io, that doesn’t necessarily mean that we’ll have to use cargo to bootstrap rustc. To me, making cargo support the bootstrapping feature that rustc needs seems like too much work for little gain, because (1) probably only rustc would use such a feature, and (2) I don’t think it will (significantly) improve rustc development. I’d prefer if those efforts were instead directed to features like incremental compilation/codegen for rustc, which would speed up rustc development and development in the cargo ecosystem at the same time.


#7

Regarding point 2, consider the things that currently need to be provided by rustc, but which are general purpose and could benefit from being decoupled from rustc proper:

  • parsing will probably eventually rely on a parser generator.
  • rustc’s error-reporting could be generalized to support compilers for non-Rust languages.
  • interacting with host time, for things like -Z time-passes.
  • trace logging facilities.
  • json generation.

Some of these already are decoupled from rustc, and rustc needs to include alternative implementations to maintain the build separation (liblog, libflate, etc.). The ultimate vision would be that the rustc distribution include only those things that are unique to rustc, while the rest are externalized probably via Cargo and crates.io. Isn’t it somewhat compelling that, for each decoupled part of building the compiler, there’s a dedicated crate that does that specific thing, and that crate can be easily integrated into anything that wants to use that piece?

I’ve argued elsewhere that Cargo should define a bootstrap target, that could build a tarball including all the dependencies used to run a build, and maybe a Ninja file, or a shell-script for actually running the build.
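
To sketch what the shell-script half of such a bootstrap target might emit: given a recorded list of build steps, write out a POSIX script that replays them. The Step type and render_script function are hypothetical, this is not an existing Cargo feature, and the tarball and Ninja parts are left out entirely.

```rust
/// One recorded build step: extra environment variables, a program,
/// and its arguments. Hypothetical type for illustration.
pub struct Step {
    pub env: Vec<(String, String)>,
    pub program: String,
    pub args: Vec<String>,
}

/// Quote a word for a POSIX shell by wrapping it in single quotes and
/// escaping any embedded single quote as '\''.
fn sh_quote(word: &str) -> String {
    format!("'{}'", word.replace('\'', "'\\''"))
}

/// Render the steps as a shell script that stops at the first failure.
pub fn render_script(steps: &[Step]) -> String {
    let mut out = String::from("#!/bin/sh\nset -e\n");
    for step in steps {
        // Leading NAME='value' words set variables for that one command.
        let mut words: Vec<String> = step
            .env
            .iter()
            .map(|(k, v)| format!("{}={}", k, sh_quote(v)))
            .collect();
        words.push(sh_quote(&step.program));
        words.extend(step.args.iter().map(|a| sh_quote(a)));
        out.push_str(&words.join(" "));
        out.push('\n');
    }
    out
}
```

The resulting script needs only a shell and rustc on the target, which is the whole point for bootstrapping.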


#8

Note that the choice of using libgit2 was very intentionally done (it used to be implemented by shelling out to git) for two primary reasons:

  1. When shelling out to git, the behavior varies greatly among the various versions of git. This means that the commands cargo executes are quite hard to get right in a “git-portable” sense (and sometimes impossible!). By using libgit2 we are guaranteed the exact same git interface across all platforms and we know precisely what’s happening. I’ve found this to be quite a nontrivial benefit!
  2. Shelling out to git really is slow. When you’re actually compiling the entire world the runtime is definitely insignificant, but the edit-compile cycle I found was hugely slowed down by shelling out to git so often. I found the average call took ~5ms and there were 3 minimum per git dependency. At the time everyone used git dependencies and it meant that projects like Cargo or Servo took nearly 1s from cargo build to “ok, nothing to run”. I’m sure there were more pieces we could optimize, but the argument for “oh the compiler surely is the slowest” isn’t quite always right.
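
The arithmetic behind that figure, as a tiny sketch: the per-call cost (~5ms) and the calls per dependency (3) are the numbers above, while the dependency count in the usage note is an assumed value picked only to show the scale.

```rust
/// Total time spent shelling out to git, in milliseconds.
pub fn git_overhead_ms(git_deps: u32, calls_per_dep: u32, ms_per_call: u32) -> u32 {
    git_deps * calls_per_dep * ms_per_call
}
```

With a hypothetical 60 git dependencies, git_overhead_ms(60, 3, 5) comes to 900ms, which is in the "nearly 1s" range before any compiler has been invoked.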

I mentioned this above as well, but Cargo really is the bottleneck when you’re dealing with fresh builds (ones where little or nothing needs to be rebuilt). Figuring out that a dependency does not need to be rebuilt can take a nontrivial amount of work sometimes. I also suspect that resolution of dependencies will only get more complicated over time and performance may actually seriously come into play there (e.g. having cargo update not take 1min).
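
To give a feel for the "does this need to be rebuilt" question, here is a deliberately simplified sketch that compares modification times. This is not Cargo’s actual freshness logic, which takes more into account; is_fresh is a made-up helper showing only the basic shape of the check, which has to touch every input file.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `output` exists and is at least as new as every
/// input. Simplified mtime-based check, not Cargo's real algorithm.
pub fn is_fresh(output: &Path, inputs: &[&Path]) -> io::Result<bool> {
    let built = match fs::metadata(output) {
        Ok(meta) => meta.modified()?,
        // A missing output always means a rebuild is needed.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(err) => return Err(err),
    };
    for input in inputs {
        if fs::metadata(input)?.modified()? > built {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Even this naive version costs one stat call per source file per dependency, so the work grows with the size of the dependency graph.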

All that being said, you are 100% correct that in a first-build situation Cargo is nowhere near the bottleneck due to rustc and downloading.

I alluded to this above as well, but I’ve found it actually hurts portability to use system tools instead of bundled tools.

To me personally I found it a great opportunity to exercise Rust in writing another “real application”, so I personally see that as a piece of the decision. It is far outweighed, however, by the benefits of using Rust in general which include:

  • Static type checking. I have caught so many bugs in Cargo when I’ve implemented 90% of a feature and then one match statement forgot to handle a variant (e.g. it wasn’t exhaustive) and I had to scrap all the work up to that point because I forgot about that case. This is such a great boon for refactoring code and maintaining robustness.
  • Deployment of a binary is actually super easy. As @huon alluded to, relying on a system tool being installed is not fun on Windows, and it’s not always easy elsewhere either. This does have the downside of bootstrapping being hard, but it has the upside that once you’re bootstrapped it’s a breeze.
  • I believe that speed will indeed eventually come into play (I alluded to a few examples above), and using Rust makes it that much easier to optimize in the future.
  • Writing it in Rust forced me to fill out a small niche of dependencies such as tar-rs, flate2-rs, git2-rs, ssh2-rs, and toml-rs as well as giving feedback to other dependencies like curl-rust and docopt.
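
For readers who haven’t run into the exhaustiveness check mentioned in the first bullet, here is a minimal sketch. The Source enum and describe_source function are made up for illustration and are not Cargo’s real types.

```rust
/// Where a dependency comes from. A made-up enum for illustration.
pub enum Source {
    Registry,
    Git { url: String },
    Path { dir: String },
}

pub fn describe_source(source: &Source) -> String {
    // There is no wildcard arm, so adding a new variant to `Source`
    // makes this match fail to compile until the new case is handled.
    match source {
        Source::Registry => "registry".to_string(),
        Source::Git { url } => format!("git repository at {}", url),
        Source::Path { dir } => format!("local path {}", dir),
    }
}
```

In a scripting language the equivalent forgotten branch would typically only show up at runtime, and only if that branch happened to be exercised.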

#9

Thank you for the detailed response. I’m basically convinced… If I can characterize this point:

It seems that the portability problem is resolved by, essentially, lugging a platform along with you in library form? (Granted, libgit2 is a much smaller platform than the Windows git CLI distribution, which actually seems to come with a whole Unix environment, gah!) This actually did (probably still does) cause trouble on FreeBSD, where current libgit2 requires a patch to compile (or at least did last month), and FreeBSD Ports is still a version behind. (As an aside, I think the proper solution to this type of issue, where it’s impossible to have a single crate adequately capture the changing versions of a wrapped native dependency, is to have multiple crates for the different versions, for example, a crate like libgit2-sys-0_22 mapping to libgit2.so.0.22, or a git-cli-1_6 crate that can speak the git 1.6 CLI. Of course, that’d require some ability for a host-query phase to affect Cargo dependencies, not sure if that’s possible, yet?) I will say that I don’t think this invalidates my point: it’s possible to conceive of having a single wrapper around the git CLI that tracks git versions back many years (if one only uses a subset of git’s capabilities). It’s more difficult to think that a single library wrapper could as easily cover libgit2 0.21 and 0.22.
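
As a sketch of what the first step of such a single CLI wrapper might look like: read the output of git --version and gate behaviour on the result. The helper names and the minimum-version gate are my own assumptions, not an existing crate’s API.

```rust
/// Parse the output of `git --version` into (major, minor, patch).
/// Hypothetical helper; a missing or non-numeric patch is treated as 0.
pub fn parse_git_version(output: &str) -> Option<(u32, u32, u32)> {
    // Expected shape: "git version 2.39.2", possibly with a vendor suffix.
    let rest = output.trim().strip_prefix("git version ")?;
    let number = rest.split_whitespace().next()?;
    let mut parts = number.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next().and_then(|p| p.parse().ok()).unwrap_or(0);
    Some((major, minor, patch))
}

/// True when the installed git is at least `minimum`.
/// Tuples compare lexicographically, which matches version ordering here.
pub fn at_least(version: (u32, u32, u32), minimum: (u32, u32, u32)) -> bool {
    version >= minimum
}
```

A wrapper could call at_least before using any flag that older gits lack, and fall back to a more conservative invocation otherwise.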

But I do find these arguments compelling… They make me feel better about the effort I had to go to in getting Cargo to run on FreeBSD. :smile:


#10

Sorry, I wasn’t clear. I actually meant development speed. I think that building the compiler with make vs building it with cargo is going to be pretty much the same in terms of time, hence there’s no significant improvement in terms of development speed, as the bottleneck is compiling + running the test suite, either by the developers or by bors.

I do agree that it’s a good idea to decouple stuff from rustc, and in particular to try to break it down into as many pieces as possible to improve build times by having many crates being built in parallel. Right now the slowest part of the bootstrap is compiling rustc, which takes several minutes and is done in a single thread.


#11

This is currently a problem for the Illumos platform. While Joyent’s pkgsrc has rustc that I tried to use as stage0, it doesn’t have cargo.

When setting up a Continuous Integration setup for golang (which is written in golang), I was able to first build golang 1.4.3 using gcc, and then use golang 1.4.3 to build all later versions of golang.

I’m going to poke about a bit more to understand the various components involved, but I can tell you that it may be strategically important to know how to port Rust to a new platform, and how to bootstrap without having a pre-packaged stage0 rustc or cargo.


#12

The Rust way is to use cross-compilation.


#13

mrustc is now another way to bootstrap Rust without a stage0 binary.