To expand on thin lto:
I think the way thin lto works (a global, but parallel, analysis of all TUs, which facilitates cross-TU inlining without requiring analyzing literally everything in one chunk) is, on a very fundamental level, the way Rust optimizing compilers should work. We can't do separate compilation (because zero-cost abstractions only become zero-cost after cross-TU inlining) and we can't merge everything into one TU (because we need scalability across CPUs or, ideally, machines). The map-reduce of thin lto is what's left then.
Even if in the future we replace thin lto with something like mir-only rlibs, I think the overall feeling (compile time and runtime performance) should be roughly the same for the outside observer.
This makes me think that lto=thin is just the natural, neutral thing to do for --release, and that it should ideally have been the default. I think it is not the default because lto=thin postdates Rust. But "In Rust 2024, default for release profile is lto=thin" seems like a great thing to have on a roadmap.
For builds of rust-analyzer, I get the following (with -Clink-arg=-fuse-ld=lld on Linux):
// lto=false
real 85.77s
cpu 1457.66s (1415.48s user + 42.18s sys)
rss 983.97mb
// lto="thin"
real 96.07s
cpu 1538.36s (1491.24s user + 47.12s sys)
rss 1382.45mb
Compile time hit here seems reasonable (about 12% more wall-clock time and about 5.5% more CPU time): of course, doing more global analysis is going to be slower than not doing that, and, from the mechanics of the language, that seems to be a more-or-less mandatory analysis for reasonable runtime behavior.
Memory hit is quite a bit worse (rss grows by about 40%). I think the reason why we didn't enable lto for rust-analyzer is that the default GitHub builders started to OOM, actually? (we do lto=thin when building ra) OTOH, it doesn't seem like an unreasonable memory requirement, and memory is generally "cheaper" than time.
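For anyone who wants to try the same comparison on their own project, here is a minimal sketch of opting in today, assuming a standard Cargo project. It only shows the documented `lto` profile setting and is not rust-analyzer's actual build configuration.

```toml
# Cargo.toml (workspace root) -- illustrative sketch, not taken from rust-analyzer

[profile.release]
# "thin" enables ThinLTO across all crates in the dependency graph.
# Other accepted values, per the Cargo profile documentation:
#   false        - the default; "thin local" LTO within the local crate only
#   true / "fat" - full LTO, everything optimized as one unit
#   "off"        - disable LTO entirely
lto = "thin"
```

Building with `cargo build --release` before and after this change, and timing both runs, should give numbers comparable in spirit to the ones above, though the absolute values will depend on the machine, the linker, and the project.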