Profile-guided optimization: How well does it work for you?

michaelwoerister · October 14, 2019, 10:10am

Since Rust 1.37 the compiler supports profile-guided optimization (PGO), a feature that many power users have been asking for. However, so far the performance improvements enabled by PGO have been a bit underwhelming, even after fixing a bad interaction with Cargo. On the other hand, my sample of test applications is rather small.

So my question is: Did anybody try out PGO with their projects? Or would anyone here like to try it? You'd have to use at least Rust 1.39 (currently beta) in order to get a working Cargo version and then just follow the instructions in the official docs: https://doc.rust-lang.org/rustc/profile-guided-optimization.html#a-complete-cargo-workflow
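As a rough sketch of what that workflow involves: the two `rustc` flags are `-Cprofile-generate` and `-Cprofile-use`, and they are passed to Cargo through `RUSTFLAGS`. The helper below only assembles the flag strings for each phase. The names `PgoPhase` and `pgo_rustflags` and the `/tmp/pgo-data` paths are made up for illustration, and the linked docs remain the authoritative reference.

```rust
use std::path::Path;

/// The two build phases of a PGO workflow (hypothetical helper type).
enum PgoPhase<'a> {
    /// Build an instrumented binary that writes `.profraw` files into `dir`.
    Generate { dir: &'a Path },
    /// Rebuild using a merged `.profdata` file.
    Use { profdata: &'a Path },
}

/// Returns the value to place in the `RUSTFLAGS` environment variable.
fn pgo_rustflags(phase: &PgoPhase) -> String {
    match phase {
        PgoPhase::Generate { dir } => format!("-Cprofile-generate={}", dir.display()),
        PgoPhase::Use { profdata } => format!("-Cprofile-use={}", profdata.display()),
    }
}

fn main() {
    // Assumed locations, chosen only for this example.
    let dir = Path::new("/tmp/pgo-data");
    let merged = dir.join("merged.profdata");

    // 1. Instrumented build: RUSTFLAGS=<first line> cargo build --release
    println!("{}", pgo_rustflags(&PgoPhase::Generate { dir }));
    // 2. Run the instrumented binary on representative inputs.
    // 3. Merge the raw profiles:
    //    llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
    // 4. Optimized build: RUSTFLAGS=<second line> cargo build --release
    println!("{}", pgo_rustflags(&PgoPhase::Use { profdata: &merged }));
}
```

The quality of the result depends mostly on step 2: the profile only helps if the recorded runs resemble real usage.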

In theory, LLVM's PGO can noticeably improve runtime performance (e.g. up to 10% for Firefox).

kornel · October 14, 2019, 11:28am

I don't like using RUSTFLAGS (seems too fragile), so I'm waiting for PGO to get some first-class support in Cargo before trying it.

tkaitchuck · October 23, 2019, 10:07pm

One thing I am looking for would be PGO for libraries.

If, for example, we ran PGO on the collections in the standard library and on other common crates, checked in the generated profile data, and pointed to it in the toml file, then downstream packages could depend on them and get more optimized code without having to run profiling themselves (at least if they are compiling for one of the optimized architectures).

I gather that isn't simply going to work out of the box. Are the obstacles surmountable? I would be willing to help if someone can provide direction.

michaelwoerister · October 24, 2019, 8:03am

I think there's a bit of a conceptual conflict here, because PGO works by optimizing for a specific usage profile, while libraries are usually general-purpose and have no knowledge of the usage profile yet.

vehls · October 24, 2019, 8:08am

I tried it in my application at work and the result was a slowdown. But I guess that was probably because of my lack of knowledge here.

michaelwoerister · October 24, 2019, 11:46am

Thanks for giving it a try, @vehls!

kentnl · October 25, 2019, 11:10am

I gave it a shot, but I saw minimal benefit (beneath noise floor).

That said, my application was already incredibly tiny, and I made substantially larger savings (at least memory-wise) by constructing a smaller BufReader. The files I was reading were under 200 bytes long, averaging 10 bytes per line, and I was only using BufReader to get nice linewise semantics, so I didn't need 8K of heap to read them.

Tools like Valgrind's DHAT can make this under-utilization of allocated memory more obvious; I'm not sure if PGO can cut corners here.
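For anyone curious what the smaller BufReader looks like, here is a minimal sketch using `BufReader::with_capacity`. The 256-byte capacity and the `read_short_lines` name are assumptions picked for files under 200 bytes, not the exact code from the application.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

/// Reads all lines from `source` through a deliberately small buffer.
/// The 256-byte capacity is an assumption sized for inputs under 200 bytes;
/// `BufReader::new` would allocate its larger default buffer instead.
fn read_short_lines<R: Read>(source: R) -> std::io::Result<Vec<String>> {
    let reader = BufReader::with_capacity(256, source);
    // `lines()` yields `io::Result<String>`; collecting stops at the first error.
    reader.lines().collect()
}

fn main() -> std::io::Result<()> {
    // An in-memory stand-in for a tiny file; a `std::fs::File` works the same way.
    let input = Cursor::new("alpha\nbeta\ngamma\n");
    for line in read_short_lines(input)? {
        println!("{}", line);
    }
    Ok(())
}
```

The linewise semantics are unchanged; only the size of the heap allocation behind the reader differs.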

jonh · October 26, 2019, 1:10pm

For a private app of mine LTO gave a few percent improvement. PGO didn't give any improvement over LTO.

vorner · October 26, 2019, 3:59pm

While I didn't really try it much, my understanding of PGO these days is that one generally shouldn't expect much improvement from it.

The compiler needs to decide which if-branch is more likely in order to choose which one to optimize more, possibly at the cost of the other. There are many heuristics by which the compiler can make an educated guess.

PGO only provides the real data so it doesn't have to guess.

But for this to have any effect on the end application, the original guess would have to be wrong. That would mean either that the application does something very unusual that confuses the heuristics, or that the compiler is bad at guessing. And compilers by now have several decades of research on how to guess better built into them.
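To make the "which branch is more likely" point concrete, here is a small sketch of supplying that information by hand instead of through a profile. It uses the `#[cold]` attribute to mark an error path as unlikely; the function names and the digit-summing task are invented for illustration, and whether the hint changes the generated code at all depends on the compiler version and optimization level.

```rust
/// Error path that we assume is rarely taken. `#[cold]` tells the compiler
/// that calls to this function are unlikely.
#[cold]
#[inline(never)]
fn report_invalid(byte: u8) -> u32 {
    eprintln!("skipping non-digit byte {:#04x}", byte);
    0
}

/// Sums the ASCII digits in `input`; any other byte takes the cold path.
fn sum_digits(input: &[u8]) -> u32 {
    let mut total = 0;
    for &b in input {
        total += if b.is_ascii_digit() {
            u32::from(b - b'0') // expected hot path
        } else {
            report_invalid(b) // marked unlikely by hand
        };
    }
    total
}

fn main() {
    println!("{}", sum_digits(b"12a3"));
}
```

With PGO the same likelihood information would come from recorded executions rather than from an annotation.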

rpjohnst · October 27, 2019, 1:37am

That's not really accurate - I regularly see substantial wins from PGO in C++ codebases.

HadrienG · October 27, 2019, 8:04am

Note that the C++ build process differs significantly enough from that of Rust (in particular wrt compilation unit granularity, which has a strong effect on inlining even with LTO on) that the increased effectiveness of PGO in C++ could be caused by this difference. But that's just a possibility.

michaelwoerister · October 28, 2019, 8:32am

I agree, in Firefox the improvement for C/C++ code is 5-10%.

Huh, that is a really interesting theory! And one that can even be tested :) I'll give it a try for the regex benchmark suite when I find the time.

HadrienG · October 28, 2019, 9:27am

Cool! Feel free to ping me about the results if you do.

The background behind this theory is that I recently tried LTO on my C++ builds and was disappointed at how little cross-unit inlining the compiler (in this case GCC) would actually perform. At the time, I speculated that PGO metadata about hot call paths might hint the compiler in the right direction, but I never got around to actually testing this hypothesis (it's on my to-do pile somewhere).

michaelwoerister · November 22, 2019, 3:52pm

So I finally got around to testing this, and indeed PGO makes much more of a difference when compiling with a higher number of compilation units. In my tests PGO improved performance by 0.3% with 1 CGU and by 1.2% with one CGU per Rust module.

Maybe even more interesting: Tuning ThinLTO via the -import-instr-limit parameter made the effect even more pronounced. There I was able to get a 4% improvement over the best non-PGO configuration which is well within expectations for a PGO build. It seems that the default "brute-force" ThinLTO settings actually negatively interfere with PGO.

So, I basically consider PGO as "working as expected" now.

system · February 20, 2020, 3:52pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
