Is "`#[inline(const)]`" possible?

#[inline] has a lot of implications for both runtime and compile-time cross-crate performance.

The runtime cost is obvious -- if a function is both non-generic and not #[inline], then (cross-crate) every call to it is an actual, out-of-line function call (until LTO, anyway).

There's a compile-time cost to #[inline] as well, though -- AIUI, #[inline] gives LLVM a hint that the function is a good candidate for inlining[1], which can make LLVM inline more aggressively than it otherwise would. Additionally, as part of that, the function gets instantiated separately in every CGU that uses it, in every crate.

Thus the idea for #[inline(const)] -- the point is to make the function available for inlining, but not apply any inlining hints, and ideally to keep using the canonical instantiation in the defining crate wherever inlining isn't done.

The reason to spell it #[inline(const)] is the intended application -- for a crate like bitvec, most functionality isn't an unambiguous candidate for inlining[2] when the parameters are unknown. At the same time, though, they really want it to be able to constant fold.

Thus where #[inline] roughly communicates "this is a good candidate for inlining," #[inline(const)] is meant to roughly communicate "this is a good candidate for constant folding," importantly without impacting the default inlining heuristics.
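To make the intent concrete, here is a sketch; the attribute spelling is the hypothetical one under discussion, and bit_index is an illustrative stand-in rather than actual bitvec API:

```rust
// #[inline(const)]  // <- the proposed, hypothetical spelling
#[inline] // <- today's closest approximation, which carries the unwanted hint
pub fn bit_index(bit: usize) -> (usize, u32) {
    // The kind of "theoretically simple" arithmetic that really wants to
    // constant fold when `bit` is known at compile time.
    (bit / 64, (bit % 64) as u32)
}

pub fn caller() -> (usize, u32) {
    // With the body visible cross-crate, this call can fold to `(2, 5)`;
    // without it, it stays an out-of-line call despite the literal argument.
    bit_index(133)
}
```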

Is this even a distinction we can communicate to LLVM?


  1. And because inlining is bottom-up, whether a function is a good inlining candidate is rarely locally obvious to the developer; it's usually better to let the optimizer apply its own heuristics.

  2. This is, AIUI, why bitvec 1.0.0 significantly reduced its usage of #[inline] -- but as a result, theoretically simple operations can often fail to constant fold.


My color: #[inlinable]

I feel that, ideally, we'd solve this in a more fundamental way: it's very arcane knowledge that, if you publish a crate, you need to slap #[inline] on non-generic "getters" to avoid tanking performance.
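A minimal sketch of the pattern, with a made-up library type:

```rust
// In a published library crate. Without #[inline], downstream crates
// (compiled without LTO) emit a real call for this one-instruction getter,
// because non-generic functions don't export their MIR.
pub struct Millis {
    value: u64,
}

impl Millis {
    // The "arcane knowledge": remember to add this attribute, or downstream
    // callers pay for a whole function call just to load one field.
    #[inline]
    pub fn value(&self) -> u64 {
        self.value
    }
}
```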

I wonder: would enabling lto=thin by default in the release profile be enough of a fix to let libraries stop sprinkling #[inline] everywhere?
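(The knob itself already exists; a top-level crate can opt in today in its Cargo.toml:)

```toml
# Cargo.toml of the top-level binary crate
[profile.release]
lto = "thin"
```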


Wouldn't that become irrelevant once MIR optimizations become sufficiently powerful and/or MIR libraries become possible? Also, how is that meaningfully different from lto=thin, other than more busywork for library authors?

Bikeshed: the notation is bad. I would expect it to mean something like "inlinable if called in const context." The inner syntax of the #[inline] attribute is its own private concern, so why not make it something more directly relevant, like #[inline(maybe)] or #[inline(enable)]?


I had a similar reaction to @afetisov -- that inline(const) means "inlined when called with a const argument (which may or may not be in a const context)".

(Digression) But that might be because I added this feature to PyPy's JIT many years ago -- the ability to add a custom callback to decide whether a function should be inlined, based on whether the JIT knew something was const at runtime. See https://foss.heptapod.net/pypy/pypy/-/commit/59d154eed759336cabd2e4a5493e4939af5f4978 and https://foss.heptapod.net/pypy/pypy/-/commit/f4ec7ef2672e841d1bb81f8bb8fcff05c9f29fa9 if anyone is curious.

No, because without #[inline] of some kind the MIR inliner won't be able to inline cross-crate: the MIR for the callee simply isn't exported.

But generic functions can be inlined regardless of that, right? Maybe this could be changed to make all functions inlinable cross-crate.

True. The only thing blocking LLVM from inlining them is that generic instantiations end up in a separate CGU. Thin local LTO, which is enabled by default when optimizations are enabled, should already allow inlining them, though.
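A sketch of the asymmetry, with illustrative functions: the generic one ships its MIR in the crate metadata and is instantiated (and thus inlinable) in the caller's crate, while its non-generic twin is just an opaque symbol downstream:

```rust
// No #[inline] needed: each downstream use monomorphizes this in the
// calling crate, so LLVM can see the body there (modulo CGU placement).
pub fn first_or<T: Copy>(slice: &[T], default: T) -> T {
    slice.first().copied().unwrap_or(default)
}

// The non-generic twin: without #[inline], downstream crates see only an
// opaque symbol, and no MIR is exported for it.
pub fn first_byte_or(slice: &[u8], default: u8) -> u8 {
    slice.first().copied().unwrap_or(default)
}
```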

It is possible, but it would increase the size of the crate metadata by a non-trivial amount, and as such would make rustc slower even for debug builds.

Very much this.

The nuances around CGUs are a royal pain right now. We sometimes end up needing to #[inline] things in core that already have MIR available -- so they are essentially inline-available, but only to one of the CGUs -- and thus random CGU partitioning choices can have a huge impact on some things.

It would be wonderful to get out of worrying about that. (cc @mara, who has expressed similar things.)


To expand on thin LTO:

I think the way thin LTO works (a global but parallel analysis of all TUs, which facilitates cross-TU inlining without requiring that literally everything be analyzed in one chunk) is, at a very fundamental level, the way optimizing Rust compilers should work. We can't do fully separate compilation (because of zero-cost abstractions), and we can't merge everything into one TU (because we need scalability across CPUs or, ideally, machines). The map-reduce shape of thin LTO is what's left.

Even if we eventually replace thin LTO with something like MIR-only rlibs, I think the overall feel (compile time and runtime performance) should be roughly the same for the outside observer.

This makes me think that lto=thin is just the natural, neutral thing to do for --release, and that it ideally would have been the default. I think it isn't the default only because thin LTO postdates Rust. But "in Rust 2024, the default for the release profile is lto=thin" seems like a great thing to have on a roadmap. For builds of rust-analyzer, I get the following (with -Clink-arg=-fuse-ld=lld on Linux):

```
// lto=false
real 85.77s
cpu  1457.66s (1415.48s user + 42.18s sys)
rss  983.97mb

// lto="thin"
real 96.07s
cpu  1538.36s (1491.24s user + 47.12s sys)
rss  1382.45mb
```

The compile-time hit here seems reasonable: of course, doing more global analysis is going to be slower than not doing it, and, given the mechanics of the language, that analysis seems more-or-less mandatory for reasonable runtime behavior.

The memory hit is quite a bit worse. I think the reason we didn't enable LTO for rust-analyzer was actually that the default GitHub builders started to OOM? (We do use lto=thin when building rust-analyzer.) OTOH, it doesn't seem like an unreasonable memory requirement, and memory is generally "cheaper" than time.


Rust 2024 is the next edition. Or do you mean changing it on some non-edition boundary?

Update: the original comment has been updated to "2024", so this has been answered 🙂


I've had OOM problems in the past even with thin LTO on large projects. I'd hate to have this on for lccc or rustc, for example.

