Brainstorming improvements for new rustc devs

As a pretty new rustc dev, I wanted to share some of my experiences here and see if we can brainstorm ways to help improve things for other people who want to contribute.

Before I get started: I know that “compilers are hard”. While some complexity is inherent, I suspect that some of this might be fixable with better docs, improvements to the makefiles, etc.

Makefile confusion

Trying to use the current makefiles can get frustrating if you need to build/test specific things. I can learn some of the basic incantations like “make rustc-stage1 -j4” to get a working compiler I can play with rather than having to wait for the whole build cycle. It’s great that we let people stop at a particular stage and only build certain artifacts.

That said, it’s difficult to predict how to run parts of the test suite. There’s a great command, “make tips” (yay!), that helps you get started. Unfortunately, the abbreviations are unpredictable and the Makefile doesn’t give a master list. E.g., it’s not “make check-stage1-run-pass”, it’s “make check-stage1-rpass”. What about running the core tests? “make check-stage1-coretesttest” or “make check-stage1-coretest”? Or the syntax tests? “make check-stage1-syntaxtests”, “make check-stage1-syntax”, or “make check-stage-stest”? You get the picture. There are some easy fixes for this, e.g., putting the master list in the Makefile or naming the test groups predictably.
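To make the “master list” idea concrete, here is a minimal Rust sketch of a single table that maps descriptive suite names to the abbreviations used in make targets, and that lists every known suite when asked for an unknown one. Only run-pass → rpass comes from the example above; the other rows and the `check_target` helper are hypothetical.

```rust
/// Master list: (descriptive name, abbreviation used in `check-stageN-*` targets).
/// Only "run-pass" -> "rpass" comes from the post; the other rows are hypothetical.
const TEST_SUITES: &[(&str, &str)] = &[
    ("run-pass", "rpass"),
    ("run-fail", "rfail"),     // hypothetical
    ("compile-fail", "cfail"), // hypothetical
];

/// Maps a suite (by full name or abbreviation) to its make target. Unknown
/// names produce an error that lists every known suite instead of guessing.
fn check_target(stage: u32, suite: &str) -> Result<String, String> {
    TEST_SUITES
        .iter()
        .find(|(name, abbrev)| *name == suite || *abbrev == suite)
        .map(|(_, abbrev)| format!("check-stage{}-{}", stage, abbrev))
        .ok_or_else(|| {
            let known: Vec<&str> = TEST_SUITES.iter().map(|(name, _)| *name).collect();
            format!("unknown suite `{}`; known suites: {}", suite, known.join(", "))
        })
}

fn main() {
    for suite in ["run-pass", "syntax"] {
        match check_target(1, suite) {
            Ok(target) => println!("make {}", target),
            Err(msg) => eprintln!("{}", msg),
        }
    }
}
```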

Equally puzzling is that “make check” doesn’t appear to run all the tests. This may be related to another issue listed here (see “Debugging issues” below), which may be making “make check” look complete when it isn’t. If that’s the case, we may want to have “make check” print a full summary even when run with -j4.
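A minimal Rust sketch of the “full summary” idea, assuming each test group can run as an independent job: run the groups in parallel, wait for all of them, and only then report, so one failing group can’t make the run look finished early. The group names and `run_group` are hypothetical stand-ins; `run_group` simply pretends that any group with “lldb” in its name fails.

```rust
use std::thread;

/// Outcome of one test group.
#[derive(Debug, Clone, PartialEq)]
struct GroupResult {
    name: String,
    passed: bool,
}

/// Hypothetical stand-in for running one test group; a real runner would
/// spawn the group's test command and look at its exit status. For
/// illustration it pretends that any group with "lldb" in its name fails.
fn run_group(name: &str) -> GroupResult {
    GroupResult { name: name.to_string(), passed: !name.contains("lldb") }
}

/// Runs all groups in parallel and waits for *every* one to finish, so an
/// early failure can't cut the run short. Results keep the input order.
fn run_all(groups: &[&str]) -> Vec<GroupResult> {
    thread::scope(|s| {
        let handles: Vec<_> = groups.iter().map(|g| s.spawn(move || run_group(g))).collect();
        handles.into_iter().map(|h| h.join().expect("test runner panicked")).collect()
    })
}

/// One-line summary printed at the very end of the run.
fn summary(results: &[GroupResult]) -> String {
    let failed: Vec<&str> = results.iter().filter(|r| !r.passed).map(|r| r.name.as_str()).collect();
    format!(
        "{} of {} groups passed; failed: [{}]",
        results.len() - failed.len(),
        results.len(),
        failed.join(", ")
    )
}

fn main() {
    let results = run_all(&["rpass", "debuginfo-lldb", "cfail"]);
    println!("{}", summary(&results));
}
```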

Some tests, like the check-docs tests, don’t run via “make check” by design. If that’s still the case, it would make sense to bring all the tests under one umbrella. Basically, the command would be “make check-everything-travis-will-complain-about” :wink:

Debugging issues

There’s a known issue that lldb tests don’t appear to work on the latest OS X (https://github.com/rust-lang/rust/issues/32520). This seems to stop “make check” from completing successfully, so it misses important tests that travis later complains about. Until it’s resolved, we may want to reorder the tests so that the test groups we know are broken on some platforms run last.
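The reordering could be as simple as a stable sort on “is this group known to be fragile on the current platform?”. A minimal Rust sketch; the `TestGroup` type, the group names, and the per-platform fragility list are hypothetical, and platforms are named the way `std::env::consts::OS` reports them (e.g. "macos").

```rust
use std::env::consts::OS;

/// A test group plus the platforms (as named by `std::env::consts::OS`) where
/// it is known to be broken or flaky.
struct TestGroup {
    name: &'static str,
    fragile_on: &'static [&'static str],
}

/// Returns group names with the groups that are fragile on `os` moved to the
/// end. The sort is stable, so the order within each part is unchanged.
fn ordered_for(os: &str, groups: &[TestGroup]) -> Vec<&'static str> {
    let mut sorted: Vec<&TestGroup> = groups.iter().collect();
    // false (not fragile here) sorts before true (fragile here).
    sorted.sort_by_key(|g| g.fragile_on.iter().any(|&p| p == os));
    sorted.into_iter().map(|g| g.name).collect()
}

fn main() {
    let groups = [
        TestGroup { name: "debuginfo-lldb", fragile_on: &["macos"] },
        TestGroup { name: "rpass", fragile_on: &[] },
        TestGroup { name: "doc", fragile_on: &[] },
    ];
    println!("{:?}", ordered_for(OS, &groups));
}
```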

The next issue, assuming this has been fixed, is that compiling a debuggable compiler can be slow if you don’t pass the right set of flags. Here again, I think the Makefile (or rustbuild) could come to the rescue. If ‘make rustc-stage1’ built a debuggable compiler by default, optimized just enough to be “fast enough”, that seems like a win-win.

Unfortunately, between the problems putting together a debuggable compiler and the issues with lldb, I ended up resorting to println debugging. As you can imagine, that plus the long compile cycle makes for a very slow process.

Assuming rustc could be debugged by default, something I liked about working on the Chapel compiler is that you can start debugging the compiler in one step. If you pass an additional command-line option, the compiler starts itself inside gdb and loads helper commands that let you inspect known compiler data structures. As you can imagine, this shaves off lots of time when you’re trying to ferret out tricky bugs.
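For rustc, the Chapel-style option might look roughly like this minimal Rust sketch: if a hypothetical `--in-gdb` flag is present, the compiler re-runs itself as `gdb -x <helpers> --args <same command line minus the flag>`, where `-x` makes gdb execute a file of commands at startup. The flag name and the `rustc-gdb-helpers.gdb` script are assumptions, not existing rustc features.

```rust
use std::env;
use std::process::Command;

/// Hypothetical flag; not an existing rustc option.
const IN_GDB_FLAG: &str = "--in-gdb";
/// Hypothetical script of gdb helper commands for compiler data structures.
const HELPER_SCRIPT: &str = "rustc-gdb-helpers.gdb";

/// If `args` (program name first) contains the flag, returns the gdb command
/// line that restarts the same program under gdb with the same arguments,
/// minus the flag. Returns `None` when the flag is absent.
fn gdb_relaunch_args(args: &[String]) -> Option<Vec<String>> {
    if !args.iter().skip(1).any(|a| a == IN_GDB_FLAG) {
        return None;
    }
    let mut cmd = vec![
        "gdb".to_string(),
        "-x".to_string(),
        HELPER_SCRIPT.to_string(),
        "--args".to_string(), // everything after this is the debuggee's command line
    ];
    cmd.extend(args.iter().filter(|a| *a != IN_GDB_FLAG).cloned());
    Some(cmd)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if let Some(cmd) = gdb_relaunch_args(&args) {
        // Run the gdb session and exit with its status.
        let status = Command::new(&cmd[0]).args(&cmd[1..]).status().expect("failed to start gdb");
        std::process::exit(status.code().unwrap_or(1));
    }
    println!("normal compiler run with {:?}", args.iter().skip(1).collect::<Vec<_>>());
}
```

Dropping the flag from the relaunched command line keeps the process running under gdb from starting gdb again.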

Build times

Yup, I know this is a tricky problem to solve, but it’s still worth a mention. If you want to fix a bug and see whether you got it right, you may have to wait 8-10 minutes if it shows up in stage 1 (say, if you’re editing libsyntax) or tens of minutes if it shows up in stage 2. From asking around, I gather this isn’t going to get better with incremental compilation because, as I understand it, you can’t save off any of the previous compiler build; you have to start from scratch each time.

The issue here is basic mechanics: waiting that long to see whether a bug is fixed limits you to something like 8 fix attempts a day once you count the full compile/test time.

I mentioned this to a coworker and they said “just work on multiple branches at a time”. In my experience, that advice is only feasible with multiple computers, as running make with -j4 on this laptop slows it to a crawl.

Like I said, I totally get that working on a compiler is very much a “working in the mud” endeavor. But I get a sense that some of these rough edges can be fixed, and it’ll help bring new people on-board.


Great post. As another new rustc dev, I’ve hit these same pain points.

“Make confusion” is a symptom of “build times,” IMHO. Coming from LLVM (I realize this is an apples-and-oranges comparison) I don’t even care about any targets besides make and make check, because it only rebuilds what I changed and does it quickly. When I’m hacking on my LLVM passes my rebuilds are under 1 minute.

To be honest, I don’t understand why the build is so slow. At a high level, I understand it’s rebuilding everything each time, but then I don’t understand why it needs to rebuild everything each time. And then whatever the answer is to that I would probably ask “why?” again.

I realize it’s subjective, but I’m also not a fan of printf debugging, especially with the long build times. Again, in LLVM I can single-step the whole compiler.

I’ve actually become somewhat of an accidental expert at making LLVM’s build as fast as possible. The big “wins” are:

  1. Replacing make with ninja
  2. Only building targets you care about (e.g., just X86)
  3. Building shared libs

ninja is basically a drop-in replacement for make if you use CMake (which the LLVM project has migrated to).
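Those three wins correspond to standard LLVM CMake settings: `-G Ninja` selects the Ninja generator, `LLVM_TARGETS_TO_BUILD` limits which backends get built, and `BUILD_SHARED_LIBS=ON` builds LLVM as shared libraries. Here is a minimal Rust sketch of a build helper assembling that configure step with `std::process::Command`; the source path is an assumption, and the helper only constructs the command without running it.

```rust
use std::process::Command;

/// Builds (but does not run) a CMake configure command tuned for a fast
/// edit-compile-debug LLVM build.
fn llvm_dev_configure(src_dir: &str, targets: &[&str]) -> Command {
    let mut cmd = Command::new("cmake");
    cmd.arg(src_dir)
        // 1. ninja instead of make
        .arg("-G")
        .arg("Ninja")
        // 2. only the backends you care about (semicolon-separated list)
        .arg(format!("-DLLVM_TARGETS_TO_BUILD={}", targets.join(";")))
        // 3. shared libraries instead of static ones
        .arg("-DBUILD_SHARED_LIBS=ON");
    cmd
}

fn main() {
    // "../llvm" is an assumed checkout location.
    let cmd = llvm_dev_configure("../llvm", &["X86"]);
    println!("{:?}", cmd);
}
```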

I would like to get “lldb tests” working on OSX but I don’t have much to add beyond that :smile:

At least rustbuild has a ninja option and I’d almost force it on by default, or at least warn when ninja is not available, because it’s so much better. cc @alexcrichton
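The “warn when ninja is not available” part could be a simple PATH lookup. A minimal Rust sketch; `find_in_path` and `choose_generator` are hypothetical helpers (not rustbuild’s actual code), and “Unix Makefiles” is CMake’s name for the plain make generator.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Looks for a file named `name` in each directory of a PATH-style list.
/// It only checks that the file exists: no execute-permission check, and no
/// `.exe` suffix handling for Windows.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Chooses the CMake generator, warning instead of failing when ninja is absent.
fn choose_generator(ninja: Option<&Path>) -> &'static str {
    match ninja {
        Some(_) => "Ninja",
        None => {
            eprintln!("warning: ninja not found on PATH; falling back to make (rebuilds will be slower)");
            "Unix Makefiles"
        }
    }
}

fn main() {
    let path = env::var_os("PATH").unwrap_or_default();
    let ninja = find_in_path("ninja", &path);
    println!("generator: {}", choose_generator(ninja.as_deref()));
}
```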


For me, better support for an out-of-tree LLVM (i.e., my distro’s package) would be nice. I don’t work on the compiler regularly, so every time I try to fix a bug or test out a small change, make check rebuilds LLVM from scratch.
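Using a distro LLVM mostly comes down to asking its `llvm-config` for a version and deciding whether it’s new enough, instead of building the bundled copy. A minimal Rust sketch; the minimum version in `main` is a placeholder, not rustc’s actual requirement.

```rust
use std::process::Command;

/// Extracts (major, minor) from `llvm-config --version` output such as
/// "3.8.0" or "3.9.0svn"; returns None if it doesn't look like a version.
fn parse_llvm_version(output: &str) -> Option<(u32, u32)> {
    let mut parts = output.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: String = parts.next()?.chars().take_while(|c| c.is_ascii_digit()).collect();
    Some((major, minor.parse().ok()?))
}

/// True if the `llvm-config` binary at `path` runs and reports at least `min`.
fn system_llvm_usable(path: &str, min: (u32, u32)) -> bool {
    Command::new(path)
        .arg("--version")
        .output()
        .ok()
        .and_then(|out| parse_llvm_version(&String::from_utf8_lossy(&out.stdout)))
        .map_or(false, |version| version >= min)
}

fn main() {
    // Placeholder minimum version; not rustc's actual requirement.
    let min = (3, 7);
    if system_llvm_usable("llvm-config", min) {
        println!("using the system LLVM");
    } else {
        println!("no suitable system LLVM; building the bundled copy");
    }
}
```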

At this point, I often just don’t bother testing locally for minor changes. I just make a pull request and let travis make sure I haven’t broken anything (which is decidedly not the correct way to do things).


Incremental compilation should definitely help when only a single stage needs to be rebuilt. @eddyb how does rustbuild use ninja? I thought it was lil scripts + rust executable + cargo.

For building LLVM, that’s what we were talking about.

Oh, gotcha

I was also trying to make an analogy between building LLVM and rustc.

There is a configuration for building LLVM that greatly reduces the edit-compile-debug cycle and then there is a totally different configuration for building an actual compiler one would install.

IMHO, there should be a similar “dev build” config for rustc that’s way faster than the normal build and optimizes the edit-compile-debug cycle. AFAIK this config is just running make rustc-stage1 instead of make, but that’s not nearly as fast as I would hope for :wink:

If your changes are limited, it is possible (but not officially supported) to only build what you changed, since rustc puts most of its crates in a shared object. Here are some scripts to get you started:

I’ve successfully used these to rebuild part of rustc for use in conjunction with the official binary distribution. Note: it’s been a while since I used either of these, so they might require some modifications.

Yeah - I had also heard there were some alternate build approaches people could take when I asked around. Though the “not officially supported” part makes me go “hmmm”.

Definitely helpful to know what tricks are available. We should also think about how these might lead to a happy path for both new and seasoned developers. Having a decently fast edit-compile-debug cycle, as Scott points out, sounds like goodness.

Maybe we can take the tricks and turn them into something that’s better supported?


I agree with @scottcarr that many of these can boil down to “slow builds”, but as @jntrnr points out they aren’t going away any time soon so it’d be best to buckle down for the long haul and figure out how to deal with it in the meantime.

One of my main goals with rustbuild was to create a readable build system that was far easier to contribute to than the makefiles and even just to read and understand. Unfortunately I haven’t quite had the time to push it over the finish line just yet, but I think that it’s the best vector to solving much of the “makefile confusion” above. Once rustbuild is on by default it should be pretty easy to get some docs up and going as well as perhaps some reorganization of the rule names to be easier to call.

The problem with LLDB on OSX is unfortunate, but @michaelwoerister indicates that the latest Xcode should work. Is this not the case though? This should in theory come up pretty rarely; the problem here was that doing the “right thing” wrt debuginfo ended up doing the “wrong thing” wrt the debuggers. This passed the bots because they’re using a different (slightly out of date) version of Xcode than most people. Ideally we’d have a bot for every version of Xcode, but unfortunately we’re already bursting at the seams with the amount of automation we have.

Finally, I feel that one of the biggest pain points today is that the “fast paths” (e.g. using a stage1 compiler) don’t actually work (as you’ve mentioned) because tests are likely to fail. I would love to start setting up some bots for workflows like this to ensure that they always pass. These should be pretty cheap bots that just fill in the gaps around the rest of the automation (when it’s idle). This way we can ensure these shortcuts always work reliably for everyone; everyone wants them, but today only “seasoned devs” know which failures to ignore.

It doesn't work for me.

$ lldb --version
lldb-350.0.21.9

Yeah… I think I ran into most of these two years ago when I submitted my one and so far only code contribution to rustc. Except on my netbook-class processor (probably slower than a smartphone these days?) those “tens of minutes” build times were 2-3 hours (I can’t remember if that was including the testsuite, or just the build itself). The patch was something like 10 lines and I started working on it Friday and managed to submit it sometime on Sunday. It didn’t make me want to repeat the experience…

Since so much of this is about how disorganized the makefiles are, it seems like the entire picture will change once we’ve moved fully to rustbuild.

As far as bootstrap times go, every time we talk about it I don’t see any way to cut out any part of it, particularly without dropping dylib plugins. But we still might be able to create a better default workflow for developers that short-circuits parts of the bootstrap in the right scenarios. For example, there’s a clear difference between the possible workflows of compiler devs and std devs: std devs can generally short-circuit everything but the stage2 stdlib.

So in that light, one way we could redesign the build process is around ‘roles’. In the ‘std-developer’ role, after the initial full build it only rebuilds std (we could additionally make stage2 builds stop right at std/test and not build librustc_*). Partial rebuilds like this do, however, affect which parts of the test suite can be expected to work correctly, and not having that relationship well coordinated is one of our problems here.

Since plugins are the main reason builds need to go all the way through stage2, we might be able to stop by default at the end of stage1+std (no librustc) and just turn off the parts of the test suite that deal with plugins (printing a warning that you aren’t running the full thing). That means everybody would build std + rustc once, then std one more time to get one compatible with the new rustc. I think that’s basically the minimum amount of building we can reasonably expect for a working compiler.

One problem the build system traditionally has here is that it doesn’t differentiate between the set of libraries necessary for std, and the set of all libraries we provide, it just considers every crate a ‘target library’, so even if you just need to build std, you are still building all the way through librustc, which in the final stage is only used by plugins. rustbuild has a more fine-grained opinion about which libraries are grouped together and will let us more easily just stop building at std.
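A minimal Rust sketch of the ‘roles’ idea, tying each role to the library set built in the final stage and to whether plugin tests can run. The `Role` and `Plan` types are hypothetical, and the crate lists are a simplified, assumed grouping rather than the real list of target libraries.

```rust
/// Hypothetical developer roles; not an existing rustbuild setting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    /// Working on std: the final stage stops at std/test.
    StdDev,
    /// Full bootstrap, including the compiler libraries plugins need.
    Full,
}

/// Simplified, assumed grouping of target libraries.
const STD_LIBS: &[&str] = &["core", "alloc", "std", "test"];
const COMPILER_LIBS: &[&str] = &["syntax", "rustc", "rustc_driver"];

/// What a role builds in the final stage and which tests can be expected to pass.
#[derive(Debug)]
struct Plan {
    final_stage_libs: Vec<&'static str>,
    /// Plugin tests need the compiler libraries in the final stage.
    plugin_tests: bool,
}

fn plan_for(role: Role) -> Plan {
    match role {
        Role::StdDev => Plan { final_stage_libs: STD_LIBS.to_vec(), plugin_tests: false },
        Role::Full => Plan {
            final_stage_libs: STD_LIBS.iter().chain(COMPILER_LIBS).copied().collect(),
            plugin_tests: true,
        },
    }
}

fn main() {
    for role in [Role::StdDev, Role::Full] {
        let plan = plan_for(role);
        println!("{:?}: build {:?}", role, plan.final_stage_libs);
        if !plan.plugin_tests {
            println!("warning: skipping plugin tests; this is not the full test suite");
        }
    }
}
```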


Just a few more thoughts on the minimal set of crates you need to build to do any useful development. For the minimal development build one could cut out most of stage0 too. Imagine you are just fixing a one-liner in rustc_driver that doesn't impact the ABI. The only thing that actually needs to happen to validate that it works is to rebuild rustc_driver (and the few other crates that depend on it); the new rustc should still be able to use the exact same std it was built with. So you could imagine a really clever build system reusing all the stage0 artifacts from the snapshot up to rustc_driver, rebuilding just rustc_driver, promoting that combination of snapshot stage0 bins and the new rustc to stage1, then testing the stage1 compiler using the stage0 sysroot. It should all work, as long as you haven't broken any ABIs that the test suite cares about. But automatically understanding the impact on the test suite is super hard.
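Finding “the changed crate plus the crates that depend on it” is a reverse-dependency walk over the crate graph. A minimal Rust sketch over a toy, simplified graph (not rustc’s real crate graph), under the same no-ABI-change assumption as above.

```rust
use std::collections::{BTreeSet, HashMap};

/// `deps` maps each crate to the crates it depends on. Returns the changed
/// crate plus every crate that (transitively) depends on it, i.e. the crates
/// that would need rebuilding, assuming the change doesn't alter any ABI.
fn rebuild_set<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    // Invert the edges: crate -> crates that depend on it.
    let mut rdeps: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&krate, ds) in deps {
        for &d in ds {
            rdeps.entry(d).or_default().push(krate);
        }
    }
    // Depth-first walk over the reverse edges, starting at the changed crate.
    let mut out = BTreeSet::new();
    let mut stack = vec![changed];
    while let Some(k) = stack.pop() {
        if out.insert(k) {
            if let Some(users) = rdeps.get(k) {
                stack.extend(users.iter().copied());
            }
        }
    }
    out
}

fn main() {
    // Toy graph, loosely modeled on rustc's crates; not the real dependency graph.
    let mut deps = HashMap::new();
    deps.insert("std", vec![]);
    deps.insert("syntax", vec!["std"]);
    deps.insert("rustc", vec!["syntax", "std"]);
    deps.insert("rustc_driver", vec!["rustc", "syntax"]);
    println!("{:?}", rebuild_set(&deps, "syntax"));
}
```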

As an aside, I've long disliked the way the bootstrap uses stageN host artifacts to build stageN target artifacts, then copies those target artifacts to stageN+1 (iow that the compiler for one stage builds its own libraries for that stage; that artifacts from two different stages are mixed together). I created this scheme, but I'd rather have it so the stageN compiler builds the stageN+1 libs and then copies those to become the stageN+1 host compiler (both schemes have complications though). The more time passes though the more impossible it seems that we could ever make such a major change.

Ugh, so as usual every time this comes up, I don't have any concrete plan as to how to fix it; the issues are tiny, interrelated and complex. But here are some scenarios with their minimal builds and their impact on the test suite:

  • std-basic (stage0 host + stage0 sysroot)

    • rebuild stage0 std
    • do not rebuild rest of stage0 libraries
    • reuse snapshot rustc to test std
    • any test that exercises plugins will fail because librustc hasn't been built
  • rustc-basic (stage1 host + stage0 sysroot)

    • Minor change to rustc that affects no function signatures, metadata or other ABI matters:
    • reuse stage0 std
    • reuse stage0 libs up to the rustc lib with the code change
    • rebuild stage0 rustc
    • promote stage0 target to stage1 host
    • do not rebuild stage1 std, instead use stage0 sysroot
    • all tests should pass because no ABIs have changed; everything under test comes directly from the snapshot except the minor rustc change

Well, I'm bored of this now. But you could imagine that everybody starts out in either the std-basic or rustc-basic "roles", then when they run into weird test failures could fallback to the full build.

OK, another suggestion to speed things up: observe that in the stage0 build if the source of std has not changed then rebuilding it will result in the exact same crate that is in the snapshot. In stage0 the build system could short-circuit the build of every crate that hasn't actually been touched by copying them from the snapshot.
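One way to implement that short-circuit: hash each crate’s sources and compare against a hash recorded when the snapshot was built, copying the snapshot artifact on a match. A minimal Rust sketch; it assumes the snapshot records such hashes (a hypothetical manifest) and that builds are reproducible. `DefaultHasher` is fine for a sketch, but its output isn’t guaranteed stable across Rust releases, so a real manifest would want a stable hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a crate's source files as (path, contents) pairs, sorted by path
/// so the result doesn't depend on directory iteration order.
fn source_hash(files: &[(&str, &str)]) -> u64 {
    let mut sorted = files.to_vec();
    sorted.sort();
    let mut hasher = DefaultHasher::new();
    for (path, contents) in &sorted {
        path.hash(&mut hasher);
        contents.hash(&mut hasher);
    }
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum Stage0Action {
    CopyFromSnapshot,
    Rebuild,
}

/// `snapshot_hash` is the (hypothetical) source hash recorded for this crate
/// when the snapshot was built; `None` means nothing was recorded.
fn stage0_action(files: &[(&str, &str)], snapshot_hash: Option<u64>) -> Stage0Action {
    match snapshot_hash {
        Some(recorded) if recorded == source_hash(files) => Stage0Action::CopyFromSnapshot,
        _ => Stage0Action::Rebuild,
    }
}

fn main() {
    let std_sources = [("lib.rs", "pub mod io;"), ("io.rs", "pub fn stdin() {}")];
    let recorded = Some(source_hash(&std_sources));
    println!("{:?}", stage0_action(&std_sources, recorded));
}
```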

Sorry for rambling.

OK, some more rambling. For people that really care about this, I think it would be worthwhile to prototype a completely different build system that is focused on developer speed (hopefully reusing the rustbuild libs but maybe not), while sacrificing correctness in some scenarios. The existing build system works well for validating that the entire system works, but most devs don’t care about that at all until right before (or after) submitting a PR. Maybe a fresh start would shed some light on the problem. It would not be a bad thing at all if active developers use a less full-featured build system for day-to-day work if it lets them do some large percentage of what they need more quickly than the complete bootstrapping build.

I’d suggest that before doing this it would be good to get a complete list of all the tricks that active developers use to short-circuit the existing build in particular scenarios, as well as understanding more thoroughly the various types of maintainers we have (std, rustc, docs, etc).

Right. This is basically what I was doing with the scripts linked above.

I don't think this makes sense - if anyone has made an ABI-breaking change in this release, stage0 and stage1 will produce incompatible libraries.

What I sometimes do is to:

  1. pre-build stage1 sysroot for master (make rustc-stage1).

  2. modify the compiler in a way that does not change the ABI

  3. build rustc only (make x86_64-unknown-linux-gnu/stage1/bin/rustc)

  4. manually perform the tests I need.

  5. if not done, goto step 2

  6. run the stage2 test-suite locally/on travis.
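A small dry-run helper for the loop above could encode “do step 1 only once, then only ever run the step 3 target”, which also guards against accidentally running plain make (the second problem in the list that follows). A minimal Rust sketch; `dev_loop_commands` is hypothetical, and the stage1 sysroot path checked in `main` is an assumed layout.

```rust
use std::path::Path;

/// Returns the make invocations for one trip around the loop: step 1
/// (`make rustc-stage1`) only if there is no stage1 sysroot yet, then step 3
/// (`make <host>/stage1/bin/rustc`). It never returns a plain `make`.
fn dev_loop_commands(host: &str, have_stage1_sysroot: bool) -> Vec<Vec<String>> {
    let mut cmds = Vec::new();
    if !have_stage1_sysroot {
        cmds.push(vec!["make".to_string(), "rustc-stage1".to_string()]);
    }
    cmds.push(vec!["make".to_string(), format!("{}/stage1/bin/rustc", host)]);
    cmds
}

fn main() {
    let host = "x86_64-unknown-linux-gnu";
    // Assumed location of the stage1 sysroot's libraries; adjust to your build tree.
    let have_sysroot = Path::new(&format!("{}/stage1/lib", host)).exists();
    // Dry run: print the commands instead of executing them.
    for cmd in dev_loop_commands(host, have_sysroot) {
        println!("{}", cmd.join(" "));
    }
}
```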

This is basically as optimal as it can be (when we get incremental compilation, it will be even faster, as a 1-liner change to librustc will not require any recompilations) unless you do evil things like manually building some .so-s and copying them over.

However, this strategy is hackish and has several problems:

  1. if I do change the ABI, I can expect a mess.

  2. if I accidentally run make at steps 3/4/5, it ruins the stage1 sysroot and I have to rebuild it.

  3. if I git commit and then recompile rustc_metadata, the metadata guard will complain.

A less-hackish infrastructure will fix (2), and disabling the metadata guard for development builds will fix (3).

I think the most annoying issue with the normal build process is that make/make check builds a full set of stage3 libraries, which should be identical to the stage2 libraries. OTOH I rarely ever do it locally – make rustc-stage2 builds the stage3 sysroot, which is fairly fast.

The second problem with that is that the --test tests are also a part of stage3. This means that if I have a syntax error in a test, I have to go back all the way to stage1. If I am actively developing something, I can compile it with --test using the stage0 compiler (and stage1 dependencies), but if a refactoring breaks something, this can be very annoying.

Unless there is some underlying issue, we can just copy the stage2 binaries to the stage3 binaries, and do the --test tests using the stage1 compiler.
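A minimal Rust sketch of the “copy instead of rebuild” step, assuming the stage2 and stage3 libraries live in flat directories; the directory layout is an assumption, and subdirectories and symlinks are skipped rather than copied.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Fills `stage3` by copying the regular files in `stage2` instead of
/// rebuilding them, on the assumption that the two stages' libraries would be
/// identical. Returns how many files were copied; subdirectories and symlinks
/// are skipped.
fn promote_stage2_libs(stage2: &Path, stage3: &Path) -> io::Result<usize> {
    fs::create_dir_all(stage3)?;
    let mut copied = 0;
    for entry in fs::read_dir(stage2)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            fs::copy(entry.path(), stage3.join(entry.file_name()))?;
            copied += 1;
        }
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    // The lib directory paths depend on your build tree; pass them explicitly.
    let args: Vec<String> = env::args().skip(1).collect();
    if args.len() != 2 {
        eprintln!("usage: promote <stage2-lib-dir> <stage3-lib-dir>");
        return Ok(());
    }
    let n = promote_stage2_libs(Path::new(&args[0]), Path::new(&args[1]))?;
    println!("copied {} files", n);
    Ok(())
}
```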

I think a possible fix for everybody would be:

  • make stage3 libxyz.so a copy of stage2 libxyz.so
  • have the --test tests be a part of stage2 (i.e. built by stage1)
  • add rustbuild lock stage1 / rustbuild unlock stage1 commands.

rustbuild lock stage1 makes sure stage1 is up-to-date, and prevents any further calls to rustbuild from recompiling it. rustbuild unlock stage1 reverts.

This means that ABI changes while stage1 is locked will cause stage2 and stage3 to differ - so this is unsafe, and works best if we weaken the metadata guard.

Apart from that, everything works nicely: the std tests now only depend on the stage2 libstd and its dependencies, and if you change a part of the compiler, it only builds the compiler libs (incrementally, once we have that) and promotes the old stage2 to stage3.

If you change the ABI, you just need to rustbuild unlock stage1; rustbuild lock stage1 to recompile stage1 (maybe combine it: rustbuild relock stage1).
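A minimal Rust sketch of how the lock could work, using a marker file per stage that every build step checks before rebuilding. The marker file name and the helpers are hypothetical, and `lock` assumes the stage was brought up to date just before it is called.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical marker file meaning "stage N is locked".
fn lock_path(build_dir: &Path, stage: u32) -> PathBuf {
    build_dir.join(format!("stage{}.lock", stage))
}

/// `rustbuild lock stageN`; assumes stage N was just brought up to date.
fn lock(build_dir: &Path, stage: u32) -> io::Result<()> {
    fs::write(lock_path(build_dir, stage), b"")
}

/// `rustbuild unlock stageN`; unlocking an unlocked stage is not an error.
fn unlock(build_dir: &Path, stage: u32) -> io::Result<()> {
    match fs::remove_file(lock_path(build_dir, stage)) {
        Err(e) if e.kind() != io::ErrorKind::NotFound => Err(e),
        _ => Ok(()),
    }
}

/// Every build step would check this before rebuilding anything in `stage`.
fn may_rebuild(build_dir: &Path, stage: u32) -> bool {
    !lock_path(build_dir, stage).exists()
}

/// `rustbuild relock stageN`: unlock, rebuild (via the caller's closure), lock.
fn relock(build_dir: &Path, stage: u32, rebuild: impl FnOnce() -> io::Result<()>) -> io::Result<()> {
    unlock(build_dir, stage)?;
    rebuild()?;
    lock(build_dir, stage)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("rustbuild-lock-demo");
    fs::create_dir_all(&dir)?;
    lock(&dir, 1)?;
    println!("may rebuild stage1: {}", may_rebuild(&dir, 1));
    relock(&dir, 1, || {
        println!("rebuilding stage1...");
        Ok(())
    })?;
    unlock(&dir, 1)?;
    println!("may rebuild stage1: {}", may_rebuild(&dir, 1));
    Ok(())
}
```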

I think we will still want a separate stage3 on bors, to assure us that our compiler can compile itself correctly.