Help test parallel rustc!

It's that time of year again and we've got a small gift for all y'all for the holidays! The parallel compiler working group has implemented a plan for you to test out a build of rustc which has far more parallelism than the current rustc does today. To cut straight to the chase, the perf improvements are looking great and we're curious to compare two nightly compilers against each other:

  • nightly-2019-12-18 - this compiler has more parallelism
  • nightly-2019-12-17 - this compiler has less parallelism

You can acquire, test and run these compilers with:

$ rustup update nightly-2019-12-18
$ rustup update nightly-2019-12-17
$ cargo +nightly-2019-12-18 build
$ cargo +nightly-2019-12-17 build

(etc)

What is parallel rustc?

But wait, you may be saying, isn't rustc already parallel? You're correct: rustc already has internal parallelism when it comes to codegen units and LLVM. The compiler, however, is not parallel at all when it's typechecking, borrow-checking, or running other static analyses on your crate. These frontend passes are completely serial today. In development for quite some time now, though, is a mode of the compiler that can run nearly every single step in parallel.

Enabling parallelism in rustc, however, drastically changes internal data structures (think using Arc instead of Rc). For this reason, previous builds of rustc cannot support frontend parallelism. A special nightly build, nightly-2019-12-18, has been prepared with support for parallelism compiled in. This is experimental support we're still evaluating, though, so the commit has already been reverted, and subsequent nightlies will go back to behaving as they did before.

What information to gather?

The parallel compiler working group is keen to get widespread feedback on the parallel mode of the compiler. We're interested in basically any feedback you have to offer, but to help you get started, here are some specifics we're particularly interested in:

  • Have you found a bug? Please report it!
    • For example, did rustc crash?
    • Did it deadlock?
    • Did it produce a nondeterministic result?
    • Did it exhibit any other weirdness when compiling?
  • Is parallel rustc faster?
    • When comparing, please compare nightly-2019-12-18 (parallel) and nightly-2019-12-17 (not parallel)
    • Is a full build faster?
    • Is a full release build faster?
    • Is a check build faster?
    • How about incremental builds?
    • Single-crate builds?
  • How does parallelism look to you?
    • Did rustc get slower from trying to be too parallel?

Time-measuring tools like the time shell built-in, as well as /usr/bin/time, are extra useful here because they give insight into a number of statistics we're interested in watching: for example kernel time, user time, wall time, context switches, etc. If you've got info, we're happy to review it!

Some example commands to compare are:

# full build
$ cargo clean && time cargo +nightly-2019-12-18 build
$ cargo clean && time cargo +nightly-2019-12-17 build

# full release build
$ cargo clean && time cargo +nightly-2019-12-18 build --release
$ cargo clean && time cargo +nightly-2019-12-17 build --release

# full check
$ cargo clean && time cargo +nightly-2019-12-18 check
$ cargo clean && time cargo +nightly-2019-12-17 check

# ... (etc)
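
If you want the extra statistics mentioned above (context switches, maximum resident memory, etc.), GNU time's verbose flag is handy. A rough sketch, assuming GNU time on Linux (macOS's /usr/bin/time uses -l instead of -v):

# verbose resource stats: wall/user/kernel time, context switches, max RSS
$ cargo clean && /usr/bin/time -v cargo +nightly-2019-12-18 build
$ cargo clean && /usr/bin/time -v cargo +nightly-2019-12-17 build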

When you report data, it'd also be very helpful if you indicated what your system looks like (a few commands for gathering this are sketched below the list). For example:

  • What OS do you have? (Windows/Mac/Linux)
  • How many CPUs do you have? (cores/threads/etc)
  • How much memory do you have?
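
A few commands that can gather this quickly (a sketch for Linux; other platforms have their own equivalents):

# OS/kernel, CPU topology, and memory, respectively
$ uname -srm
$ lscpu
$ free -h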

We've already seen some widely varying data across different system layouts; for example, 28-thread machines have shown very different performance characteristics than 8-thread machines. Most testing has happened on Linux so far, so we're very interested in getting more platforms into the pipeline too!

Known issues

  • The compiler will max out at 4 threads of parallelism. We've hit some issues with rustc scaling to many threads causing slowdowns for various reasons. We're working on a solution and have a number of ideas for how to solve this. If you've got a 128-core system and only 4 cores are in use, fear not! We'll soon be able to make use of everything :slight_smile:
  • If you pass -jN (which defaults to the number of cores you have), Cargo may end up spawning more than N rustc processes. No more than N should actually be doing work, but more processes may be spawned. We plan to fix this before shipping parallel rustc.

Thanks in advance for helping us out! We hope to turn at least some parallelism on by default early next year (think January) with full parallelism coming soon after. That all depends on the feedback we get from this thread, though, and we'd like to weed out any issues before we turn this on by default!

26 Likes

@josh reported a few improvements so far.

@Eh2406 reports not much difference for Cargo itself on Windows.

1 Like

I wouldn't call it "not much difference" for smaller crates; that was a solid ~7% improvement, which is quite welcome.

8 Likes

Building all of Servo (which includes a lot of C++) in release mode on a 14-core / 28-thread Linux desktop went from 6m47s to 6m18s, improving by 7%.

Time for the script crate excluding codegen went from 64.8 to 42.9 seconds, improving by 34%!

That crate is by far the largest, and the shape of the dependency graph is such that not a lot else is happening while it is being compiled. In the output of cargo build -Z timings, the CPU usage graph is very telling:

(CPU usage graph screenshot omitted.)

I think we see this happen in this CPU usage graph around the 240s mark. It sounds promising that even more wins are within reach!

There are still times where only one CPU thread is being used, though. Are some parts of the frontend still sequential?
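
For anyone wanting to reproduce this kind of per-crate CPU usage graph, the unstable timings flag mentioned above writes an HTML report. A rough sketch, assuming a recent nightly Cargo:

# generate Cargo's HTML timing report, including a CPU usage graph
$ cargo clean && cargo +nightly-2019-12-18 build --release -Z timings
# then open the generated cargo-timing.html in a browser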

6 Likes

I think the same thing happens with wasmtime when compiling syn, which normally ends up in a single-threaded compile for a little while in the middle of the build. Now I see a few CPUs in use at that point.

1 Like

Timings for async-std 1.3.0:

[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 build

real	0m7,978s
user	0m31,223s
sys	0m2,658s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 build

real	0m8,853s
user	0m30,087s
sys	0m2,182s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 build --release

real	0m11,329s
user	1m4,118s
sys	0m2,573s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 build --release

real	0m12,421s
user	1m5,124s
sys	0m2,218s

[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-18 check

real	0m5,248s
user	0m18,615s
sys	0m2,014s
[skade@Nostalgia-For-Infinity async-std]$ cargo clean && time cargo +nightly-2019-12-17 check

real	0m6,084s
user	0m16,230s
sys	0m1,606s

It's roughly 10% on debug builds, slightly less in release, which is not a big surprise: we spend a lot of our time linking.

Edit: Sorry, forgot the machine info:

Carbon X1 6th gen, 4-core i7 @ 2GHz, 16GB RAM.

1 Like

Yes, there are a number of portions of the compiler that are still sequential. Others can speak more to the specifics, but I think the high-level ones are:

  • Codegen (sorta). Only one thread performs translation from MIR to LLVM IR, so it takes some time to "ramp up" and get parallelism. Once parallelism is on by default, we plan to refactor this to have truly parallel codegen.
  • Parsing
  • Name resolution

The compiler isn't perfectly parallel, and we've found it's increasingly difficult to land more parallelism unless it's all on by default. The thinking is that what we currently have is the next big step forward, but it's certainly not the end!

I also agree that the little bump in the middle of the graph you're looking at is the 4 cores getting active. Looks like that rate limiting is actually working! You can also experiment with the -Zthreads value (such as -Zthreads=28) if you'd like to test higher numbers. You may experience slowdowns at the beginning of the compilation but are likely to experience speedups for the script crate itself.

It may be worthwhile to try out just the script crate compilation with a high -Zthreads limit. You may also be able to get some mileage with measureme to see where the sequential bottlenecks are, so we can plan to work on those too!
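
For reference, one way to pass these flags through Cargo is via RUSTFLAGS. A sketch; -Zthreads is the unstable flag discussed above, and the self-profile flag that feeds measureme's tools is also unstable, so exact spellings may differ:

# try the parallel nightly with a higher thread count on every crate
$ RUSTFLAGS="-Zthreads=28" cargo +nightly-2019-12-18 build

# collect self-profiling data that measureme's tools can summarize
$ RUSTFLAGS="-Zself-profile" cargo +nightly-2019-12-18 build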

3 Likes

Overall, this change is a great improvement! The stats below were obtained from building tokio-rs/tracing at commit fc3ab4. The VM is c3.8xlarge (32 Virtual CPUs, 60.0 GiB Memory, 640 GiB SSD Storage) running Amazon Linux 2.

cargo build: A speedup of 33.6%.

  • cargo clean && time cargo +nightly-2019-12-17 build: 21.42 seconds
  • cargo clean && time cargo +nightly-2019-12-18 build: 14.23 seconds
131.11user 7.97system 0:21.42elapsed 649%CPU (0avgtext+0avgdata 587096maxresident)k
0inputs+1127952outputs (333major+1828747minor)pagefaults 0swaps
157.53user 9.38system 0:14.23elapsed 1172%CPU (0avgtext+0avgdata 656468maxresident)k
0inputs+1109600outputs (0major+2030716minor)pagefaults 0swaps

cargo build --release: A speedup of 11.7%.

  • cargo clean && time cargo +nightly-2019-12-17 build --release: 34.29 seconds
  • cargo clean && time cargo +nightly-2019-12-18 build --release: 30.28 seconds
421.71user 8.69system 0:34.30elapsed 1254%CPU (0avgtext+0avgdata 738148maxresident)k
0inputs+460136outputs (0major+1976122minor)pagefaults 0swaps
463.09user 10.84system 0:30.29elapsed 1564%CPU (0avgtext+0avgdata 758324maxresident)k
0inputs+459600outputs (0major+2191324minor)pagefaults 0swaps

cargo check: A speedup of 26.1%.

  • cargo clean && cargo +nightly-2019-12-17 check: 18.56 seconds

  • cargo clean && cargo +nightly-2019-12-18 check: 13.71 seconds

90.01user 5.22system 0:18.56elapsed 513%CPU (0avgtext+0avgdata 555520maxresident)k
0inputs+461128outputs (0major+1384224minor)pagefaults 0swaps
109.42user 6.85system 0:13.71elapsed 847%CPU (0avgtext+0avgdata 593012maxresident)k
0inputs+461072outputs (0major+1633770minor)pagefaults 0swaps
1 Like

I’ll try to dig in further in a bit, but I got a very surprising result trying to build Volta: it has some small but noticeable improvements in debug builds… and crashed the laptop hard, twice in a row, when running the parallel release build on battery. The regular release build was fine. Once I’m on power again I’ll see what happens and figure out whether this was a fluke!

Just tested wasmtime with -Zthreads=72, and the time went from 1m9s (parallel rustc without -Zthreads) to 2m1s (parallel rustc with -Zthreads=72). The user time massively increased, from 16m50s to 50m.

Watching htop, it looks like the jobserver bits between cargo and rustc aren't actually limiting to 72 jobs at a time, because I see a load average around 300-500 and many hundreds of running (not blocked) rustc processes across many crates.

Yes, this is a known bug we're working on; on my 28-thread system locally, -Zthreads=28 makes compile times quite bad. We'll be sure to reach out to you when we think we have a fix for this, though; 72 is the highest number we've seen so far!

Basically, it's expected that -Zthreads=72 performs pretty badly right now.

1 Like

Results from compiling the spurs crate...

I ran each command twice.

Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz

16GB RAM

# full build, serial compiler
real	0m10.321s
user	0m36.706s
sys	0m2.307s

real	0m10.215s
user	0m36.790s
sys	0m2.333s


# full build, parallel compiler
real	0m11.396s
user	0m38.184s
sys	0m2.716s

real	0m12.243s
user	0m38.848s
sys	0m2.724s

# full build, release, serial
real	0m23.591s
user	1m27.648s
sys	0m2.180s

real	0m23.220s
user	1m27.705s
sys	0m2.254s

# full build, release, parallel

real	0m23.931s
user	1m29.144s
sys	0m2.432s

real	0m24.661s
user	1m28.921s
sys	0m2.435s

# check, serial
real	0m7.703s
user	0m20.652s
sys	0m1.827s

real	0m7.712s
user	0m20.718s
sys	0m1.816s

# check, parallel
real	0m7.921s
user	0m22.267s
sys	0m1.986s

real	0m8.068s
user	0m22.192s
sys	0m1.987s

So, in summary:

  • full build: parallel is 15% (~1.4 seconds) slower on average
  • full release build: parallel is 3% (<1 second) slower on average
  • check: parallel is 3% (<1 second) slower on average

On crates.io's codebase at 1fc6bfa on macOS Catalina 10.15.1, 3.1 GHz Dual-Core Intel Core i7, 16 GB RAM:

Full build

$ cargo clean && time cargo +nightly-2019-12-18 build
real	10m1.042s
user	18m47.605s
sys	1m50.826s
$ cargo clean && time cargo +nightly-2019-12-17 build
real	8m33.007s
user	17m14.295s
sys	1m38.205s

Soooo parallelism made it slower?

Full release build

$ cargo clean && time cargo +nightly-2019-12-18 build --release
real	18m58.546s
user	44m1.749s
sys	1m48.858s
$ cargo clean && time cargo +nightly-2019-12-17 build --release
real	16m54.551s
user	43m2.278s
sys	1m38.579s

Also slower for me :frowning:

Full check

$ cargo clean && time cargo +nightly-2019-12-18 check
real	6m35.014s
user	12m18.504s
sys	1m22.026s
$ cargo clean && time cargo +nightly-2019-12-17 check
real	5m16.316s
user	11m27.044s
sys	1m17.450s

:frowning:

I'm happy to run anything else that would be useful, or provide any other info, let me know what!

2 Likes

I tested on Fuchsia. Everything built and all tests passed correctly, so no problems there.

Here are three timings from builds that were run without parallel rustc enabled:

5284.35user 696.95system 3:00.15elapsed 3320%CPU (0avgtext+0avgdata 2963072maxresident)k
1704inputs+61728288outputs (173653major+94432200minor)pagefaults 0swaps

5337.72user 679.63system 3:07.12elapsed 3215%CPU (0avgtext+0avgdata 2962480maxresident)k
6217992inputs+61728224outputs (174415major+97968950minor)pagefaults 0swaps

5279.77user 705.58system 3:01.22elapsed 3302%CPU (0avgtext+0avgdata 2968028maxresident)k
14080inputs+61729144outputs (182088major+96475006minor)pagefaults 0swaps

And here are 3 runs with parallel rustc enabled:

5313.74user 689.57system 2:59.98elapsed 3335%CPU (0avgtext+0avgdata 2982788maxresident)k
1048inputs+60992440outputs (155345major+96491042minor)pagefaults 0swaps

5318.76user 692.16system 2:58.39elapsed 3369%CPU (0avgtext+0avgdata 2964200maxresident)k
1137304inputs+61014216outputs (160991major+98495705minor)pagefaults 0swaps

5330.31user 691.24system 2:57.67elapsed 3389%CPU (0avgtext+0avgdata 2965328maxresident)k
1040inputs+60995288outputs (159364major+97233184minor)pagefaults 0swaps

So it looks like we saw a small but significant improvement in build time. I suspect that enabling >4 threads would lead to further improvements.

Note that there are some non-Rust steps in those builds, so the gain might be higher percentage-wise. (Only the Rust targets were invalidated, but we have some targets that depend on everything at the very end.)

This wasn't very scientific, so should probably be taken with a grain of salt. :slight_smile:

On the build of specifically our third party code (built with cargo; takes around 20s; included in the above builds) I did not notice any significant change.

1 Like

Okay, after getting the machine back on power, results:

  • cargo clean && time cargo +nightly-2019-12-18 build: 80.14 real 485.11 user 48.62 sys

  • cargo clean && time cargo +nightly-2019-12-17 build: 75.14 real 456.43 user 37.09 sys

  • cargo clean && time cargo +nightly-2019-12-18 build --release: 136.43 real 1227.13 user 43.18 sys

  • cargo clean && time cargo +nightly-2019-12-17 build --release: 138.98 real 1214.59 user 33.23 sys

(Best guess: I hit some odd condition with maxing the cores on low battery earlier.)

1 Like

Quick followup: I did some perf runs on -Zthreads=72 builds, and it looks like the substantial user-time overhead (going from 16m50s to 50m, or 40m with a kernel patch to improve pipe wakeup fairness) consists heavily of attempts to do work-stealing:

  13.07%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] std::thread::local::LocalKey<T>::try_with
  10.95%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::internal::Global::try_advance
   6.93%  rustc            [unknown]                                    [k] 0xffffffff91a00163
   5.86%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_deque::Stealer<T>::steal
   4.14%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold
   3.02%  rustc            ld-2.29.so                                   [.] _dl_update_slotinfo
   2.18%  rustc            ld-2.29.so                                   [.] __tls_get_addr
   1.65%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::default::HANDLE::__getit
   1.39%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::default::pin
   1.14%  rustc            ld-2.29.so                                   [.] update_get_addr
   1.14%  rustc            libc-2.29.so                                 [.] __memmove_sse2_unaligned_erms
   1.10%  rustc            [unknown]                                    [k] 0xffffffff91a00b27
   1.05%  rustc            rustc                                        [.] free
   0.93%  rustc            rustc                                        [.] malloc
   0.70%  rustc            ld-2.29.so                                   [.] __tls_get_addr_slow
   0.66%  rustc            libLLVM-9-rust-1.41.0-nightly.so             [.] combineInstructionsOverFunction
   0.64%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] __tls_get_addr@plt

I dug further, and the calls to std::thread::local::LocalKey<T>::try_with come from crossbeam_deque::Stealer<T>::steal.
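
A rough sketch of how such a profile can be gathered, assuming Linux perf is available (the exact invocation here is illustrative, not necessarily what was run above):

# record a call-graph profile of the whole build, then inspect it
$ RUSTFLAGS="-Zthreads=72" perf record -g -- cargo +nightly-2019-12-18 build
$ perf report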

1 Like

I used my latest project multi_file_writer and compared it with stable too.

The CPUs look more occupied in the load graph when 2019-12-18 is running. However, running the benchmark three times shows me that the variance is bigger than the improvement. And for a laptop, cooling has a greater effect than the optimization.

  • Hardware: Lenovo T495, AMD Ryzen 7 PRO 3700U (8 cores), 24GB RAM, 1TB SSD
  • Software: Kubuntu Linux 19.10, Kernel 5.3.0-24-generic
# full build
cargo clean && time cargo +nightly-2019-12-18 build
real    0m17.877s   0m16.555s   0m18.536s
user    1m35.093s   1m32.151s   1m39.613s
sys     0m4.695s    0m4.615s    0m5.139s

cargo clean && time cargo +nightly-2019-12-17 build
real    0m18.620s   0m17.727s   0m18.014s
user    1m25.832s   1m23.918s   1m24.582s
sys     0m3.884s    0m3.891s    0m3.960s

cargo clean && time cargo +stable build
real    0m18.954s   0m18.702s   0m19.721s
user    1m31.474s   1m29.693s   1m31.224s
sys     0m4.240s    0m3.892s    0m4.166s



# full release build
cargo clean && time cargo +nightly-2019-12-18 build --release
real    0m45.455s   0m40.779s   0m41.285s
user    4m42.960s   4m27.842s   4m27.474s
sys     0m5.058s    0m4.540s    0m5.088s

cargo clean && time cargo +nightly-2019-12-17 build --release
real    0m43.404s   0m43.381s   0m41.122s
user    4m35.760s   4m30.659s   4m25.856s
sys     0m4.376s    0m4.173s    0m4.233s

cargo clean && time cargo +stable build --release
real    0m40.928s   0m40.848s   0m42.138s
user    4m22.785s   4m24.846s   4m28.641s
sys     0m4.290s    0m4.274s    0m4.358s


# full check
cargo clean && time cargo +nightly-2019-12-18 check
real    0m12.896s   0m13.526s   0m12.870s
user    0m59.298s   0m59.617s   0m58.271s
sys     0m3.352s    0m3.519s    0m3.672s

cargo clean && time cargo +nightly-2019-12-17 check
real    0m13.727s   0m14.880s   0m13.569s
user    0m49.721s   0m53.186s   0m51.119s
sys     0m2.922s    0m2.983s    0m3.023s

cargo clean && time cargo +stable check
real    0m14.428s   0m14.235s   0m14.488s
user    0m52.424s   0m52.405s   0m53.769s
sys     0m3.142s    0m2.962s    0m3.148s

My CPU layout: (screenshot omitted)

If there is another benchmark, I would be happy to run that too.

Testing against reso_dd at commit 2d21506, a soon-to-be-released crate that is 2.2 MB of serde-annotated structs. The only functions it contains are custom serde serialization/deserialization. Dependencies are serde and chrono.

For this particular type of workload, the speedup seems quite significant.

Operating system: macOS 10.15.1. Hardware: MacBook Pro (16-inch, 2019), 2.4 GHz 8-core Intel Core i9, 32 GB 2667 MHz DDR4.

# full build
cargo +nightly-2019-12-18 build -p reso_dd  63.89s user 3.26s system 208% cpu 32.208 total
cargo +nightly-2019-12-17 build -p reso_dd  58.59s user 2.94s system 134% cpu 45.696 total

# full release build
cargo +nightly-2019-12-18 build -p reso_dd --release  136.12s user 3.42s system 390% cpu 35.720 total
cargo +nightly-2019-12-17 build -p reso_dd --release  129.22s user 3.11s system 289% cpu 45.679 total

# full check
cargo +nightly-2019-12-18 check -p reso_dd  53.91s user 2.51s system 212% cpu 26.513 total
cargo +nightly-2019-12-17 check -p reso_dd  49.01s user 2.22s system 134% cpu 38.152 total

Using commit 80066510b54f1ae05d51f65b52e18bdd5357016c of differential dataflow and compiling some of the examples, I found:

  • a noticeable speedup when doing a full build, i.e. dependencies are being compiled (~10% for debug, ~5% for release)
  • no speedup (or: lost in the noise) when just compiling the final binaries, which makes sense: rustc timings indicate most time is spent in LLVM, which also explains the debug/release difference (I don't know how much type re-checking etc. the touch will cause)
# Parallel debug builds:

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 31.48s
110.99user 4.06system 0:31.50elapsed 365%CPU (0avgtext+0avgdata 920740maxresident)k
440inputs+1249848outputs (7major+1433125minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-18 build --example pagerank
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 11.17s
28.30user 0.69system 0:11.17elapsed 259%CPU (0avgtext+0avgdata 804092maxresident)k
0inputs+444360outputs (0major+307311minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-18 build --example monoid-bfs
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.13s
25.70user 0.81system 0:10.14elapsed 261%CPU (0avgtext+0avgdata 770656maxresident)k
0inputs+413664outputs (0major+290452minor)pagefaults 0swaps

# Non-parallel debug builds

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 36.61s
101.69user 3.60system 0:36.62elapsed 287%CPU (0avgtext+0avgdata 960348maxresident)k
32inputs+1249832outputs (0major+1290682minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-17 build --example pagerank
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.59s
28.35user 0.87system 0:10.60elapsed 275%CPU (0avgtext+0avgdata 845212maxresident)k
0inputs+444344outputs (0major+322688minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-17 build --example monoid-bfs
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.11s
26.29user 0.71system 0:10.12elapsed 266%CPU (0avgtext+0avgdata 755828maxresident)k
0inputs+413624outputs (0major+277129minor)pagefaults 0swaps

# Parallel release builds 

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 2m 11s
297.09user 4.98system 2:11.90elapsed 229%CPU (0avgtext+0avgdata 2827380maxresident)k
0inputs+1018456outputs (0major+2283912minor)pagefaults 0swaps

$ touch examples/progress.rs

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 1m 35s
132.45user 1.34system 1:35.33elapsed 140%CPU (0avgtext+0avgdata 2835180maxresident)k
0inputs+424168outputs (0major+993830minor)pagefaults 0swaps

# Non-parallel release builds

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 2m 20s
289.44user 4.20system 2:20.92elapsed 208%CPU (0avgtext+0avgdata 2828940maxresident)k
0inputs+1042776outputs (0major+2153859minor)pagefaults 0swaps

$ touch examples/progress.rs

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 1m 37s
134.14user 1.39system 1:37.55elapsed 138%CPU (0avgtext+0avgdata 2845420maxresident)k
0inputs+448848outputs (0major+1003126minor)pagefaults 0swaps

Timing data for the turtle crate.

This is using a fairly powerful CPU. Full system info at the end of this comment.

cargo clean && time cargo +nightly-2019-12-18 build

real	0m23.471s
user	4m30.381s
sys	0m17.490s

cargo clean && time cargo +nightly-2019-12-17 build

real	0m26.377s
user	4m3.021s
sys	0m13.118s

cargo clean && time cargo +nightly-2019-12-18 build --release

real	0m43.751s
user	10m41.284s
sys	0m17.831s

cargo clean && time cargo +nightly-2019-12-17 build --release

real	0m47.365s
user	10m21.766s
sys	0m13.473s

The turtle crate has a lot of examples, so I thought it would be worth it to run those too:

cargo clean && time cargo +nightly-2019-12-18 build --examples

real	0m33.796s
user	6m52.774s
sys	0m36.885s

cargo clean && time cargo +nightly-2019-12-17 build --examples

real	0m36.875s
user	6m27.881s
sys	0m31.876s

System info: (screenshot omitted)