Yes, there are a number of portions of the compiler that are still sequential. Others can speak more to specifics, but I think the high-level ones are:
- Codegen (sorta). Only one thread performs translation from MIR to LLVM IR so it takes some time to "ramp up" and get parallelism. Once parallelism is on-by-default we plan to refactor this to have truly parallel codegen.
- Name resolution
The compiler isn't perfectly parallel, and we've found it's increasingly difficult to land more parallelism unless it's all on by default. The thinking is that what we currently have is the next big step forward, but it's certainly not the end!
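As a rough illustration of why those sequential portions matter, here is a minimal Amdahl's-law sketch. The function name `estimated_speedup` and the 60% parallel fraction are made up for illustration; they are not measured rustc numbers.

```rust
/// Amdahl's-law estimate of the speedup from running part of the work in parallel.
/// `parallel_fraction` is the share of single-threaded time that can be spread
/// across threads. It is a hypothetical input here, not a measured rustc figure.
fn estimated_speedup(parallel_fraction: f64, threads: u32) -> f64 {
    let sequential_fraction = 1.0 - parallel_fraction;
    1.0 / (sequential_fraction + parallel_fraction / f64::from(threads))
}

fn main() {
    // Assumption for illustration only: 60% of a crate's compile time parallelizes.
    let parallel_fraction = 0.6;
    for threads in [1u32, 4, 28] {
        println!(
            "{threads:>2} threads: {:.2}x",
            estimated_speedup(parallel_fraction, threads)
        );
    }
}
```

Under that assumed 60% figure the estimate can never exceed 2.5x no matter how many threads are used, which is why shrinking the sequential parts (codegen ramp-up, name resolution) matters as much as raising the thread count.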
I also agree that the little bump in the middle of the graph you're looking at is the 4 cores getting active. Looks like that rate limiting is actually working! You can also experiment with the `-Zthreads` value (such as `-Zthreads=28`) if you'd like to test higher numbers. You may experience slowdowns at the beginning of the compilation but are likely to experience speedups for the script crate itself.
It may be worthwhile to try just the `script` crate compilation with a high `-Zthreads` limit. You may also be able to get some mileage out of measureme to see where the sequential bottlenecks are, so we can plan to work on those too!