Taking a quick look at the “core structures that will require synchronization”, I noticed some low-hanging fruit for synchronization performance. Wishing I had more time to dedicate to this…
perf-stats
- these are just hacky little counters and things
As far as I can tell, this is just a bunch of accumulators, and rustc does not need to know their transient values during compilation, only the final result at the end. Whenever you find data that follows this general pattern, you can use a simple trick to bring your synchronization overhead close to zero:
- Each thread accumulates the statistics in a local variable
- All the local variables are merged into a global result at the end
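The two steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: `PerfStats`, its fields, and `run_workers` are hypothetical names, not the actual rustc structures, and the loop body stands in for real compilation work.

```rust
use std::thread;

// Hypothetical stand-in for the perf-stats counters; field names are made up.
#[derive(Default, Debug, PartialEq)]
struct PerfStats {
    queries_run: u64,
    symbols_hashed: u64,
}

impl PerfStats {
    // Fold another thread's counters into this one.
    fn merge(&mut self, other: &PerfStats) {
        self.queries_run += other.queries_run;
        self.symbols_hashed += other.symbols_hashed;
    }
}

// Hypothetical driver: each worker counts locally, results are merged once at the end.
fn run_workers(num_threads: u64, items_per_thread: u64) -> PerfStats {
    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            thread::spawn(move || {
                // Thread-local accumulator: no lock, no atomics on the hot path.
                let mut local = PerfStats::default();
                for _ in 0..items_per_thread {
                    local.queries_run += 1;
                    local.symbols_hashed += 2;
                }
                // Hand the local counters back through the join handle.
                local
            })
        })
        .collect();

    // The only synchronization point: one join (and one merge) per thread.
    let mut total = PerfStats::default();
    for handle in handles {
        total.merge(&handle.join().expect("worker panicked"));
    }
    total
}
```

Calling `run_workers(4, 1000)` from `main` gives one merged `PerfStats`. The hot loop never touches shared state, so the synchronization cost is one join per thread rather than one lock or atomic operation per counter update.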
tx_to_llvm_workers2
- this is just a channel, seems fine
It seems like the fix is easy here: just give each thread its own clone of the mpsc::Sender instead of keeping a single one in the shared state. If the locking solution is correct (i.e. threads do not depend on the order in which data is sent down the pipe), this solution will be correct too, and it avoids double synchronization, since the channel already synchronizes internally and does not need a lock on top.
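A minimal sketch of the cloning approach, assuming a made-up `WorkItem` message type and a hypothetical `send_from_workers` helper (the real payload sent to the LLVM workers will look different):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type, for illustration only.
#[derive(Debug)]
struct WorkItem {
    worker_id: usize,
    payload: u32,
}

// Hypothetical helper: every worker sends through its own Sender clone.
fn send_from_workers(num_workers: usize, items_per_worker: u32) -> Vec<WorkItem> {
    let (tx, rx) = mpsc::channel::<WorkItem>();

    let handles: Vec<_> = (0..num_workers)
        .map(|worker_id| {
            // Each thread owns a clone, so no lock around the sender is needed.
            let tx = tx.clone();
            thread::spawn(move || {
                for payload in 0..items_per_worker {
                    tx.send(WorkItem { worker_id, payload })
                        .expect("receiver dropped");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver sees the channel close
    // once every worker has finished and dropped its clone.
    drop(tx);

    // Blocks until all senders are gone, then yields everything received.
    let received: Vec<WorkItem> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    received
}
```

As with the locked version, messages from different threads can interleave in any order. Messages sent through a single clone still arrive in the order that thread sent them.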