Clang's new `-ftime-trace` flag

Clang has recently added a `-ftime-trace` flag that outputs a flame chart showing how long each part of the code being compiled took to compile. I wasn't sure whether passing flags to rustc would produce the same information for a given Rust program, and searching didn't directly answer that, so I tried it.

The first step was running `rustc -C llvm-args="--version"` to check whether rustc's LLVM version matched the Clang version that has the new flag. The output was as follows:

  LLVM version 9.0.0-rust-1.38.0-stable
  DEBUG build.

Along with some information about the target and host CPU. (By the way, I hope that "DEBUG build." doesn't mean that rustc is using an unoptimized build of LLVM!) The new flag came with Clang version 9.0.0, so that looked promising. However, neither `rustc -C llvm-args="--help"` nor `rustc -C llvm-args="--help-list-hidden"` listed `-ftime-trace`, and just trying it anyway produced an error.

Confusingly, `--time-trace-granularity` was listed, which appears to be a companion flag to `-ftime-trace`. Running `cargo rustc -- -C llvm-args="--time-trace-granularity=1000000"` successfully built the project I tried it on, but no .json file was produced in the target/debug directory.

If this feature is part of the Clang C/++ frontend, and not part of the underlying LLVM language-agnostic compiler backend, then rustc cannot (easily) use it.

You may, however, be interested in rustc's own self-profiling features, possible improvements to which are being discussed elsewhere on this forum as we speak.

Ah, I had not seen that thread, thanks for pointing it out! I'm glad that making data available to answer the question "What changes can I make to this code to make it compile faster?", rather than just "What changes can be made to rustc to make this compile faster?", is being discussed. The `-ftime-trace` flag itself was already linked from there.

"What changes can I make to this code to make it compile faster?" is a difficult question in Rust, because the language has a rather sophisticated compilation model:

  • Monomorphization and macros may generate many copies of the "same" code, but the compiler is smart enough to reuse a given generic instantiation multiple times in a crate.
  • Incremental compilation means that code which was built before may not get rebuilt, and the analysis that decides what gets built has sub-source file granularity.
  • The emerging lazy "query" model, which was introduced as part of the incremental compilation effort, could also reduce the compilation impact of unused code...
    • Not sure if it does today, but pretty sure it will in the future, once MIR-only rlibs are a thing.

It's not like in C where each function in a .c file will get built exactly once with rather predictable overhead.

However, there are two classic tricks that can greatly speed up your Rust builds:

  • Audit your dependencies. Cargo makes it trivial to bring in plenty of external crates to speed up your development workflow, but with great power comes great responsibility.
    • If initial build times matter to you, consider dropping dependencies which you do not use much (e.g. replacing clap with raw std::env for basic CLI argument needs).
    • If your dependencies have cargo features, only enable those that you need.
  • Use cargo-bloat to figure out what takes up room in your output binary. Code size is a surprisingly effective indicator of compile-time overhead, considering how indirect it is, and it can be an easy way to spot classic macro and monomorphization mistakes that can also hurt your compile times.
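On the cargo features point, the mechanism looks like this in Cargo.toml (the crate and feature names below are just a common example; check each dependency's documentation for its actual feature list):

```toml
[dependencies]
# Turn off the default feature set, then opt back in to only what you use.
serde = { version = "1", default-features = false, features = ["derive"] }
```

Note that `default-features = false` only takes effect if no other crate in your dependency graph enables the default features of the same dependency, since cargo unions feature sets across the graph.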

Here's another interesting discussion of some common causes of compile times and binary bloat in Rust.


I don't see any option that we could use with `rustc -C llvm-args`, but Clang just calls `llvm::timeTraceProfilerInitialize()` etc. from `llvm/Support/TimeProfiler.h`. We could add bindings for that API in `src/rustllvm` to enable a custom `-C`/`-Z` option.


The problem with only adding `llvm::timeTraceProfilerInitialize()` was that the LLVM code is not thread safe, so you also need to use `-Z no-parallel-llvm`.

I also made a hack to add the traces to `-Z self-profiling` to get thread safety. But then I needed to filter down to only the functions that were optimized, since I bypassed `--time-trace-granularity` (one of the parts that was not thread safe): self-profiling does not support filtering by duration, and some crates failed to compile because the 1 GB event store was not enough.

See more in the thread.


If you're inclined, that could be an LLVM contribution opportunity to make that API thread safe.