Help test parallel rustc!

Yes, there are a number of portions of the compiler that are still sequential. Others can speak more to specifics, but I think the high-level ones are:

  • Codegen (sorta). Only one thread performs the translation from MIR to LLVM IR, so it takes some time to "ramp up" and get parallelism. Once parallelism is on by default we plan to refactor this to have truly parallel codegen.
  • Parsing
  • Name resolution

The compiler isn't perfectly parallel, and we've found it's increasingly difficult to land more parallelism unless it's all on by default. The thinking is that what we currently have is the next big step forward, but it's certainly not the end!

I also agree that the little bump in the middle of the graph you're looking at is the 4 cores getting active. Looks like that rate limiting is actually working! You can also experiment with the -Zthreads value (such as -Zthreads=28) if you'd like to test higher numbers. You may see slowdowns at the beginning of the compilation, but you're likely to see speedups for the script crate itself.

It may be worthwhile to try out just the script crate compilation with a high -Zthreads limit. You may also be able to get some mileage with measureme to see where the sequential bottlenecks are, so we can plan to work on those too!
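
For example, something like this (a sketch on my part: the self-profile flag and the summarize tool come from the measureme repo, and the exact emitted file names vary by nightly):

# raise the thread limit for every rustc that cargo spawns
RUSTFLAGS="-Zthreads=28" cargo +nightly-2019-12-18 build

# record self-profile data for just the final crate
cargo +nightly-2019-12-18 rustc -- -Zself-profile

# inspect where the time went (<crate-name>-<pid> is the emitted file prefix)
summarize summarize <crate-name>-<pid>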


Overall, this change is a great improvement! The stats below were obtained from building tokio-rs/tracing at commit fc3ab4. The VM is c3.8xlarge (32 Virtual CPUs, 60.0 GiB Memory, 640 GiB SSD Storage) running Amazon Linux 2.

cargo build: A speedup of 33.6%.

  • cargo clean && time cargo +nightly-2019-12-17 build: 21.42 seconds
  • cargo clean && time cargo +nightly-2019-12-18 build: 14.23 seconds
131.11user 7.97system 0:21.42elapsed 649%CPU (0avgtext+0avgdata 587096maxresident)k
0inputs+1127952outputs (333major+1828747minor)pagefaults 0swaps
157.53user 9.38system 0:14.23elapsed 1172%CPU (0avgtext+0avgdata 656468maxresident)k
0inputs+1109600outputs (0major+2030716minor)pagefaults 0swaps

cargo build --release: A speedup of 11.7%.

  • cargo clean && time cargo +nightly-2019-12-17 build --release: 34.29 seconds
  • cargo clean && time cargo +nightly-2019-12-18 build --release: 30.28 seconds
421.71user 8.69system 0:34.30elapsed 1254%CPU (0avgtext+0avgdata 738148maxresident)k
0inputs+460136outputs (0major+1976122minor)pagefaults 0swaps
463.09user 10.84system 0:30.29elapsed 1564%CPU (0avgtext+0avgdata 758324maxresident)k
0inputs+459600outputs (0major+2191324minor)pagefaults 0swaps

cargo check: A speedup of 26.1%.

  • cargo clean && cargo +nightly-2019-12-17 check: 18.56 seconds

  • cargo clean && cargo +nightly-2019-12-18 check: 13.71 seconds

90.01user 5.22system 0:18.56elapsed 513%CPU (0avgtext+0avgdata 555520maxresident)k
0inputs+461128outputs (0major+1384224minor)pagefaults 0swaps
109.42user 6.85system 0:13.71elapsed 847%CPU (0avgtext+0avgdata 593012maxresident)k
0inputs+461072outputs (0major+1633770minor)pagefaults 0swaps

I’ll try to dig in further in a bit, but I got a very surprising result trying to build Volta: it has some small but noticeable improvements in debug builds… and crashed the laptop hard, twice in a row, when running the parallel release build on battery. The regular release build was fine. Once I’m on power again I’ll see what happens and figure out whether this was a fluke!

Just tested wasmtime with -Zthreads=72, and the time went from 1m9s (parallel rustc without -Zthreads) to 2m1s (parallel rustc with -Zthreads=72). The user time massively increased, from 16m50s to 50m.

Watching htop, it looks like the jobserver bits between cargo and rustc aren't actually limiting to 72 jobs at a time, because I see a load average around 300-500 and many hundreds of running (not blocked) rustc processes across many crates.
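
For context on what "working" should look like: cargo creates a jobserver with a fixed number of tokens, and each compiler worker is supposed to hold a token while it runs, capping total parallelism. Here's a minimal sketch of that token protocol using the jobserver crate (the crate both cargo and rustc build on; this is an illustration of the protocol, not rustc's actual code):

use std::thread;

// Requires the `jobserver` crate as a dependency.
fn main() -> std::io::Result<()> {
    // The parent (cargo's role) creates a jobserver holding 4 tokens.
    let client = jobserver::Client::new(4)?;

    let mut handles = Vec::new();
    for i in 0..16 {
        let client = client.clone();
        handles.push(thread::spawn(move || {
            // Each worker must acquire a token before doing work, so at
            // most 4 of these 16 threads run at once. The token returns
            // to the pool when `_token` is dropped.
            let _token = client.acquire().expect("acquire failed");
            println!("job {} running", i);
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}

A load average of 300-500 suggests workers are running without holding tokens, which matches the known bug mentioned in the reply below.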

Yes, this is a known bug we're working on; on my 28-thread system, -Zthreads=28 makes compile times quite bad locally. We'll be sure to reach out to you when we think we have a fix for this. 72 is the highest number we've seen so far!

Basically, it's expected that -Zthreads=72 performs pretty badly right now.


Results from compiling the spurs crate...

I ran each command twice.

Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz

16GB RAM

# full build, serial compiler
real	0m10.321s
user	0m36.706s
sys	0m2.307s

real	0m10.215s
user	0m36.790s
sys	0m2.333s


# full build, parallel compiler
real	0m11.396s
user	0m38.184s
sys	0m2.716s

real	0m12.243s
user	0m38.848s
sys	0m2.724s

# full build, release, serial
real	0m23.591s
user	1m27.648s
sys	0m2.180s

real	0m23.220s
user	1m27.705s
sys	0m2.254s

# full build, release, parallel

real	0m23.931s
user	1m29.144s
sys	0m2.432s

real	0m24.661s
user	1m28.921s
sys	0m2.435s

# check, serial
real	0m7.703s
user	0m20.652s
sys	0m1.827s

real	0m7.712s
user	0m20.718s
sys	0m1.816s

# check, parallel
real	0m7.921s
user	0m22.267s
sys	0m1.986s

real	0m8.068s
user	0m22.192s
sys	0m1.987s

So, in summary:

  • full build: parallel is 15% (~1.4 seconds) slower on average
  • full build, release: parallel is 3% (<1 second) slower on average
  • check: parallel is 3% (<1 second) slower on average

On crates.io's codebase at 1fc6bfa on macOS Catalina 10.15.1, 3.1 GHz Dual-Core Intel Core i7, 16 GB RAM:

Full build

$ cargo clean && time cargo +nightly-2019-12-18 build
real	10m1.042s
user	18m47.605s
sys	1m50.826s
$ cargo clean && time cargo +nightly-2019-12-17 build
real	8m33.007s
user	17m14.295s
sys	1m38.205s

Soooo parallelism made it slower?

Full release build

$ cargo clean && time cargo +nightly-2019-12-18 build --release
real	18m58.546s
user	44m1.749s
sys	1m48.858s
$ cargo clean && time cargo +nightly-2019-12-17 build --release
real	16m54.551s
user	43m2.278s
sys	1m38.579s

Also slower for me :frowning:

Full check

$ cargo clean && time cargo +nightly-2019-12-18 check
real	6m35.014s
user	12m18.504s
sys	1m22.026s
$ cargo clean && time cargo +nightly-2019-12-17 check
real	5m16.316s
user	11m27.044s
sys	1m17.450s

:frowning:

I'm happy to run anything else that would be useful, or provide any other info; just let me know!


I tested on Fuchsia. Everything built and all tests passed correctly, so no problems there.

Here are three timings from builds that were run without parallel rustc enabled:

5284.35user 696.95system 3:00.15elapsed 3320%CPU (0avgtext+0avgdata 2963072maxresident)k
1704inputs+61728288outputs (173653major+94432200minor)pagefaults 0swaps

5337.72user 679.63system 3:07.12elapsed 3215%CPU (0avgtext+0avgdata 2962480maxresident)k
6217992inputs+61728224outputs (174415major+97968950minor)pagefaults 0swaps

5279.77user 705.58system 3:01.22elapsed 3302%CPU (0avgtext+0avgdata 2968028maxresident)k
14080inputs+61729144outputs (182088major+96475006minor)pagefaults 0swaps

And here are 3 runs with parallel rustc enabled:

5313.74user 689.57system 2:59.98elapsed 3335%CPU (0avgtext+0avgdata 2982788maxresident)k
1048inputs+60992440outputs (155345major+96491042minor)pagefaults 0swaps

5318.76user 692.16system 2:58.39elapsed 3369%CPU (0avgtext+0avgdata 2964200maxresident)k
1137304inputs+61014216outputs (160991major+98495705minor)pagefaults 0swaps

5330.31user 691.24system 2:57.67elapsed 3389%CPU (0avgtext+0avgdata 2965328maxresident)k
1040inputs+60995288outputs (159364major+97233184minor)pagefaults 0swaps

So it looks like we saw a small but significant improvement in build time. I suspect that enabling >4 threads would lead to further improvements.

Note that there are some non-Rust steps in those builds, so the gain might be higher percentage-wise. (Only the Rust targets were invalidated, but we have some targets that depend on everything at the very end.)

This wasn't very scientific, so should probably be taken with a grain of salt. :slight_smile:

On the build of specifically our third party code (built with cargo; takes around 20s; included in the above builds) I did not notice any significant change.


Okay, after getting machine on power, results:

  • cargo clean && time cargo +nightly-2019-12-18 build: 80.14 real 485.11 user 48.62 sys

  • cargo clean && time cargo +nightly-2019-12-17 build: 75.14 real 456.43 user 37.09 sys

  • cargo clean && time cargo +nightly-2019-12-18 build --release: 136.43 real 1227.13 user 43.18 sys

  • cargo clean && time cargo +nightly-2019-12-17 build --release: 138.98 real 1214.59 user 33.23 sys

(Best guess: I hit some odd condition with maxing the cores on low battery earlier.)


Quick followup: I did some perf runs on -Zthreads=72 builds, and it looks like the substantial user-time overhead (going from 16m50s to 50m, or 40m with a kernel patch to improve pipe wakeup fairness) consists heavily of attempts to do work-stealing:

  13.07%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] std::thread::local::LocalKey<T>::try_with
  10.95%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::internal::Global::try_advance
   6.93%  rustc            [unknown]                                    [k] 0xffffffff91a00163
   5.86%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_deque::Stealer<T>::steal
   4.14%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold
   3.02%  rustc            ld-2.29.so                                   [.] _dl_update_slotinfo
   2.18%  rustc            ld-2.29.so                                   [.] __tls_get_addr
   1.65%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::default::HANDLE::__getit
   1.39%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] crossbeam_epoch::default::pin
   1.14%  rustc            ld-2.29.so                                   [.] update_get_addr
   1.14%  rustc            libc-2.29.so                                 [.] __memmove_sse2_unaligned_erms
   1.10%  rustc            [unknown]                                    [k] 0xffffffff91a00b27
   1.05%  rustc            rustc                                        [.] free
   0.93%  rustc            rustc                                        [.] malloc
   0.70%  rustc            ld-2.29.so                                   [.] __tls_get_addr_slow
   0.66%  rustc            libLLVM-9-rust-1.41.0-nightly.so             [.] combineInstructionsOverFunction
   0.64%  rustc            librustc_driver-0d78d9a30be443c5.so          [.] __tls_get_addr@plt

I dug further, and the calls to std::thread::local::LocalKey<T>::try_with come from crossbeam_deque::Stealer<T>::steal.
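
That's consistent with idle worker threads spinning through steal attempts instead of sleeping: each steal pins a crossbeam-epoch guard through a thread-local, which is where the LocalKey and __tls_get_addr samples come from. A minimal sketch of the crossbeam-deque pattern involved (an illustration of the library's API, not rustc's actual scheduler loop):

use crossbeam_deque::{Steal, Worker};
use std::thread;

fn main() {
    // One thread owns a deque and pushes work; other threads hold Stealers.
    let worker = Worker::new_fifo();
    let stealer = worker.stealer();
    for job in 0..4 {
        worker.push(job);
    }

    let thief = thread::spawn(move || loop {
        match stealer.steal() {
            Steal::Success(job) => println!("stole job {}", job),
            // A thread that keeps retrying here busy-waits on the deque
            // (and on crossbeam-epoch's thread-local state), which shows
            // up as user time even though no useful work is being done.
            Steal::Retry => continue,
            Steal::Empty => break,
        }
    });
    thief.join().unwrap();
}

With 72 threads mostly fighting over a few non-empty deques, that spinning plausibly accounts for much of the extra user time.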


I used my latest project multi_file_writer and compared it with stable too.

The CPUs look more occupied in the load graph when 2019-12-18 is running. However, running the benchmark three times shows me that the variance is bigger than the improvement, and on a laptop, cooling has a greater effect than the optimization.

  • Hardware: AMD Ryzen 7 PRO 3700U (8 cores), 24GB RAM, 1TB SSD, Lenovo T495
  • Software: Kubuntu Linux 19.10, Kernel 5.3.0-24-generic
# full build
cargo clean && time cargo +nightly-2019-12-18 build
real    0m17.877s   0m16.555s   0m18.536s
user    1m35.093s   1m32.151s   1m39.613s
sys     0m4.695s    0m4.615s    0m5.139s

cargo clean && time cargo +nightly-2019-12-17 build
real    0m18.620s   0m17.727s   0m18.014s
user    1m25.832s   1m23.918s   1m24.582s
sys     0m3.884s    0m3.891s    0m3.960s

cargo clean && time cargo +stable build
real    0m18.954s   0m18.702s   0m19.721s
user    1m31.474s   1m29.693s   1m31.224s
sys     0m4.240s    0m3.892s    0m4.166s



# full release build
cargo clean && time cargo +nightly-2019-12-18 build --release
real    0m45.455s   0m40.779s   0m41.285s
user    4m42.960s   4m27.842s   4m27.474s
sys     0m5.058s    0m4.540s    0m5.088s

cargo clean && time cargo +nightly-2019-12-17 build --release
real    0m43.404s   0m43.381s   0m41.122s
user    4m35.760s   4m30.659s   4m25.856s
sys     0m4.376s    0m4.173s    0m4.233s

cargo clean && time cargo +stable build --release
real    0m40.928s   0m40.848s   0m42.138s
user    4m22.785s   4m24.846s   4m28.641s
sys     0m4.290s    0m4.274s    0m4.358s


# full check
cargo clean && time cargo +nightly-2019-12-18 check
real    0m12.896s   0m13.526s   0m12.870s
user    0m59.298s   0m59.617s   0m58.271s
sys     0m3.352s    0m3.519s    0m3.672s

cargo clean && time cargo +nightly-2019-12-17 check
real    0m13.727s   0m14.880s   0m13.569s
user    0m49.721s   0m53.186s   0m51.119s
sys     0m2.922s    0m2.983s    0m3.023s

cargo clean && time cargo +stable check
real    0m14.428s   0m14.235s   0m14.488s
user    0m52.424s   0m52.405s   0m53.769s
sys     0m3.142s    0m2.962s    0m3.148s

My CPU layout: [image omitted]

If there is another benchmark, I would be happy to run that too.

Testing against reso_dd at commit 2d21506, a soon-to-be-released crate that is 2.2 MB of serde-annotated structs. The only functions it contains are custom serde serialization/deserialization. Dependencies are serde and chrono.

For this particular type of workload, the speedup seems quite significant.

Operating system: macOS 10.15.1
Hardware: MacBook Pro (16-inch, 2019), 2.4 GHz 8-Core Intel Core i9, 32 GB 2667 MHz DDR4

# full build
cargo +nightly-2019-12-18 build -p reso_dd  63.89s user 3.26s system 208% cpu 32.208 total
cargo +nightly-2019-12-17 build -p reso_dd  58.59s user 2.94s system 134% cpu 45.696 total

# full release build
cargo +nightly-2019-12-18 build -p reso_dd --release  136.12s user 3.42s system 390% cpu 35.720 total
cargo +nightly-2019-12-17 build -p reso_dd --release  129.22s user 3.11s system 289% cpu 45.679 total

# full check
cargo +nightly-2019-12-18 check -p reso_dd  53.91s user 2.51s system 212% cpu 26.513 total
cargo +nightly-2019-12-17 check -p reso_dd  49.01s user 2.22s system 134% cpu 38.152 total

Using commit 80066510b54f1ae05d51f65b52e18bdd5357016c of differential dataflow and compiling some of the examples, I found:

  • a noticeable speedup when doing a full build, i.e. dependencies are being compiled (~10% for debug, ~5% for release)
  • no speedup (or: lost in the noise) when just compiling the final binaries (makes sense: rustc timings indicate most time is spent in LLVM, which also explains the debug/release difference; I don't know how much type re-checking etc. the touch will cause); see the sketch right after this list
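
One way to confirm where the time goes for a single binary (a sketch; -Ztime-passes is a nightly flag and its output format changes between versions):

touch examples/progress.rs
cargo +nightly-2019-12-18 rustc --example progress -- -Ztime-passes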
# Parallel debug builds:

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 31.48s
110.99user 4.06system 0:31.50elapsed 365%CPU (0avgtext+0avgdata 920740maxresident)k
440inputs+1249848outputs (7major+1433125minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-18 build --example pagerank
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 11.17s
28.30user 0.69system 0:11.17elapsed 259%CPU (0avgtext+0avgdata 804092maxresident)k
0inputs+444360outputs (0major+307311minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-18 build --example monoid-bfs
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.13s
25.70user 0.81system 0:10.14elapsed 261%CPU (0avgtext+0avgdata 770656maxresident)k
0inputs+413664outputs (0major+290452minor)pagefaults 0swaps

# Non-parallel debug builds

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 36.61s
101.69user 3.60system 0:36.62elapsed 287%CPU (0avgtext+0avgdata 960348maxresident)k
32inputs+1249832outputs (0major+1290682minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-17 build --example pagerank
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.59s
28.35user 0.87system 0:10.60elapsed 275%CPU (0avgtext+0avgdata 845212maxresident)k
0inputs+444344outputs (0major+322688minor)pagefaults 0swaps

$ /usr/bin/time cargo +nightly-2019-12-17 build --example monoid-bfs
[...]
    Finished dev [unoptimized + debuginfo] target(s) in 10.11s
26.29user 0.71system 0:10.12elapsed 266%CPU (0avgtext+0avgdata 755828maxresident)k
0inputs+413624outputs (0major+277129minor)pagefaults 0swaps

# Parallel release builds 

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 2m 11s
297.09user 4.98system 2:11.90elapsed 229%CPU (0avgtext+0avgdata 2827380maxresident)k
0inputs+1018456outputs (0major+2283912minor)pagefaults 0swaps

$ touch examples/progress.rs

$ /usr/bin/time cargo +nightly-2019-12-18 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 1m 35s
132.45user 1.34system 1:35.33elapsed 140%CPU (0avgtext+0avgdata 2835180maxresident)k
0inputs+424168outputs (0major+993830minor)pagefaults 0swaps

# Non-parallel release builds

$ rm -rf target

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 2m 20s
289.44user 4.20system 2:20.92elapsed 208%CPU (0avgtext+0avgdata 2828940maxresident)k
0inputs+1042776outputs (0major+2153859minor)pagefaults 0swaps

$ touch examples/progress.rs

$ /usr/bin/time cargo +nightly-2019-12-17 build --example progress --release
[...]
    Finished release [optimized + debuginfo] target(s) in 1m 37s
134.14user 1.39system 1:37.55elapsed 138%CPU (0avgtext+0avgdata 2845420maxresident)k
0inputs+448848outputs (0major+1003126minor)pagefaults 0swaps

Timing data for the turtle crate.

This is using a fairly powerful CPU. Full system info at the end of this comment.

cargo clean && time cargo +nightly-2019-12-18 build

real	0m23.471s
user	4m30.381s
sys	0m17.490s

cargo clean && time cargo +nightly-2019-12-17 build

real	0m26.377s
user	4m3.021s
sys	0m13.118s

cargo clean && time cargo +nightly-2019-12-18 build --release

real	0m43.751s
user	10m41.284s
sys	0m17.831s

cargo clean && time cargo +nightly-2019-12-17 build --release

real	0m47.365s
user	10m21.766s
sys	0m13.473s

The turtle crate has a lot of examples, so I thought it would be worth it to run those too:

cargo clean && time cargo +nightly-2019-12-18 build --examples

real	0m33.796s
user	6m52.774s
sys	0m36.885s

cargo clean && time cargo +nightly-2019-12-17 build --examples

real	0m36.875s
user	6m27.881s
sys	0m31.876s
System info: [image omitted]

Testing Beamium (a Prometheus to Warp10 metrics forwarder) at commit 819ef8c:

cargo +nightly-2019-12-18 build
534,97s user 30,78s system 604% cpu 1:33,65 total

cargo +nightly-2019-12-17 build
517,03s user 26,19s system 577% cpu 1:34,11 total

cargo +nightly-2019-12-18 build --release
1319,95s user 29,98s system 515% cpu 4:22,00 total

cargo +nightly-2019-12-17 build --release 
1356,64s user 28,18s system 496% cpu 4:39,15 total

cargo +nightly-2019-12-18 check
341,53s user 26,41s system 689% cpu 53,353 total

cargo +nightly-2019-12-17 check 
338,01s user 22,66s system 654% cpu 55,137 total

On Rust stable, just in case (1.40.0):

cargo build
526,18s user 27,23s system 619% cpu 1:29,33 total

cargo build --release 
1322,03s user 27,14s system 520% cpu 4:19,12 total

cargo check
354,53s user 23,55s system 630% cpu 1:00,01 total

Tested on Dell XPS 15 (9550):

  • Intel® Core™ i7-6700HQ CPU @ 2.60GHz
  • 16GB of RAM
  • Ubuntu 19.10 64bits

Timing for grayarea-runtime on a 40-CPU Intel machine with 32GB RAM running Ubuntu 19 shows about a 25% perf boost for the debug build and check, and about 10% for release builds:

+nightly-2019-12-17 build
real    1m22.457s
user    8m46.634s
sys     0m37.100s
+nightly-2019-12-18 build
real    1m2.301s
user    10m8.246s
sys     0m51.905s
+nightly-2019-12-17 build --release
real    1m51.374s
user    26m32.222s
sys     0m42.592s
+nightly-2019-12-18 build --release
real    1m37.700s
user    29m30.259s
sys     0m58.299s
+nightly-2019-12-17 check
real    1m5.725s
user    5m40.797s
sys     0m26.380s
+nightly-2019-12-18 check
real    0m44.395s
user    6m37.955s
sys     0m39.464s

P.S. Do a dry run first to download the deps.

Building substrate 5dece712a on an AMD EPYC Rome machine (64 cores/128 threads, 256 GB RAM).

cargo clean && time cargo +nightly-2019-12-17 build
    Finished dev [unoptimized + debuginfo] target(s) in 6m 15s

real    6m17.934s
user    135m2.361s
sys     17m1.909s
cargo clean && time cargo +nightly-2019-12-18 build
    Finished dev [unoptimized + debuginfo] target(s) in 6m 29s

real    6m32.575s
user    228m14.238s
sys     32m0.462s
cargo clean && time cargo +nightly-2019-12-17 build --release
    Finished release [optimized] target(s) in 8m 38s

real    8m41.842s
user    411m59.930s
sys     23m30.786s
cargo clean && time cargo +nightly-2019-12-18 build --release
    Finished release [optimized] target(s) in 8m 45s

real    8m48.119s
user    537m58.740s
sys     45m5.127s
cargo clean && time cargo +nightly-2019-12-17 check
    Finished dev [unoptimized + debuginfo] target(s) in 4m 46s

real    4m49.572s
user    154m7.090s
sys     20m57.464s
cargo clean && time cargo +nightly-2019-12-18 check
    Finished dev [unoptimized + debuginfo] target(s) in 4m 26s

real    4m29.848s
user    210m40.669s
sys     39m16.688s

Running cargo +nightly-2019-12-18 check with RUSTFLAGS="-Zthreads=64" I am getting:

    Finished dev [unoptimized + debuginfo] target(s) in 20m 54s

real    20m57.729s
user    746m42.182s
sys     1532m35.727s

The second time I ran it, I got:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', src/libcore/result.rs:1188:5
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1025
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: rustc_driver::report_ice
  11: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:475
  12: rust_begin_unwind
             at src/libstd/panicking.rs:375
  13: core::panicking::panic_fmt
             at src/libcore/panicking.rs:84
  14: core::result::unwrap_failed
             at src/libcore/result.rs:1188
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.41.0-nightly (3ed3b8bb7 2019-12-17) running on x86_64-unknown-linux-gnu

note: compiler flags: -Z threads=64 -C debuginfo=2 --crate-type lib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack

Testing sv-parser at 6e887e2. The testing environment is an Intel Xeon Gold 6134 CPU @ 3.20GHz (16 cores/32 threads) with 1.5TB of DRAM. About a 15-20% improvement.

cargo clean && time cargo +nightly-2019-12-17 build

real    2m35.962s
user    5m6.225s
sys     0m21.336s

cargo clean && time cargo +nightly-2019-12-18 build

real    2m10.055s
user    5m40.738s
sys     0m26.435s

cargo clean && time cargo +nightly-2019-12-17 build --release

real    2m49.236s
user    16m11.879s
sys     1m29.441s

cargo clean && time cargo +nightly-2019-12-18 build --release

real    2m24.627s
user    17m15.008s
sys     1m26.060s

cargo clean && time cargo +nightly-2019-12-17 check

real    2m5.321s
user    2m55.150s
sys     0m13.223s

cargo clean && time cargo +nightly-2019-12-18 check

real    1m41.439s
user    3m26.564s
sys     0m15.532s

On my 2-CPU, 8-thread server (an old Xeon E5), all of my projects were faster with the 2019-12-18 nightly (across all projects, the best was 1.51x faster for a check --release, the worst was 1.01x faster for another check --release). In no case was it slower.

Adding another set of data points: I just grabbed a few repos people were using in this thread. I'm seeing awesome improvements; nice job to the team working on this.

Hardware

  • Processor: AMD 3970X
  • Memory: 64GB DDR4 @ 3000MHz C15
  • Storage: PCIE-Gen4 Corsair MP600

Results

sv-parser @ 6e887e:
  2019-12-17 Build: 1:57.42
  2019-12-18 Build: 1:37.97

crates-io @ e0d149:
  2019-12-17 Build: 0:57.62
  2019-12-18 Build: 0:42.93

turtle @ 9efd23:
  2019-12-17 Build: 0:22.69
  2019-12-18 Build: 0:12.23