Method inlining on the Iterator trait


I've noticed that many iterator methods are marked as #[inline], but some are not (e.g. sum, even though its whole body is just a call into another function). Who decides what should and should not be inline?

I'm asking for a specific use case. I'm playing with autovectorization and function multiversioning (using the multiversion crate). The thing is, when a function gets inlined, it inherits the allowed instructions of the parent function, but if it is called as a function, it uses only the basic instruction set even if it is the only place that function is being called.

I've noticed sum does this to me a lot: it refuses to get inlined, so I have to write the summation out manually as a for loop. Shouldn't the one-line method (and similar ones) be marked inline, leaving it up to the Sum::sum implementation to „decide“ whether it wants to be inlined or not?
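To make the workaround concrete, here is a minimal sketch of the two forms. It uses plain f32 as a stand-in for my vector type, and both function names are made up for illustration. Whether either one actually gets inlined or vectorized is still up to LLVM.

```rust
/// Sums a slice through `Iterator::sum`, which delegates to `Sum::sum`.
pub fn sum_with_method(data: &[f32]) -> f32 {
    data.iter().copied().sum()
}

/// Same result, but the loop is spelled out in this function's own body,
/// so there is no separate `Sum::sum` call that would need to be inlined.
pub fn sum_with_loop(data: &[f32]) -> f32 {
    let mut total = 0.0;
    for &x in data {
        total += x;
    }
    total
}
```

The second form is what I end up writing by hand inside the multiversioned function.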

There's some recent discussion on this in #t-libs on Zulip, and this part seems relevant:

#[inline] on a generic function is usually not the right idea, these are already available for inlining to LLVM

It might be a bit more subtle than that, but it's not as simple as inlining everything with a single-line body.


The following tidbit is also interesting, and it is a clear reason for using #[inline] judiciously, including on generic functions: Additionally in release mode #[inline] causes translation into every single CGU. It is also a reminder that the story of inline has been changing all the while, as LLVM has changed and as Rust has changed, for example by now using codegen units.

Specifically for Iterator, I think it could make sense to remove #[inline] on methods that implement entire traversal loops by themselves, i.e. .sum() loops over the whole iterator, so it likely has a significant method body in itself.

I know there are downsides to inlining. That's why I'm asking who makes the decisions and on what basis.

Furthermore, it is not Iterator::sum that loops over the iterator, it's Sum::sum.

Iterator::sum is just a convenience method that delegates the call to another function with the same parameters and passes the result through unchanged. Inlining such a function should make it disappear completely.
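As a rough sketch of that delegation, here is the same shape rebuilt outside of std. The `SumExt` trait, its `my_sum` method, and the `Lanes` type are all hypothetical names invented for this example; only `std::iter::Sum` is the real trait. The point is that the loop lives in the `Sum` implementation, while the iterator-side method is a pure forwarder.

```rust
use std::iter::Sum;

/// Hypothetical extension trait with the same shape as `Iterator::sum`:
/// the body is nothing but a forwarding call.
trait SumExt: Iterator + Sized {
    #[inline]
    fn my_sum<S: Sum<Self::Item>>(self) -> S {
        Sum::sum(self)
    }
}

impl<I: Iterator> SumExt for I {}

/// Hypothetical 4-lane vector type standing in for a real SIMD wrapper.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lanes([f32; 4]);

impl Sum for Lanes {
    // The actual traversal loop is here, so this is where an inlining
    // hint about the loop itself would naturally belong.
    #[inline]
    fn sum<I: Iterator<Item = Lanes>>(iter: I) -> Lanes {
        let mut acc = [0.0f32; 4];
        for Lanes(v) in iter {
            for (a, x) in acc.iter_mut().zip(v.iter()) {
                *a += *x;
            }
        }
        Lanes(acc)
    }
}
```

With this in scope, `iter.my_sum::<Lanes>()` and `iter.sum::<Lanes>()` both end up in the same `Sum::sum` body; the `#[inline]` attributes are hints only, not guarantees.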

Furthermore, I think it is the job of the Sum::sum implementation to decide whether it makes sense to inline. In my case it is reasonable to do so, because my type is supposed to help with vectorization, but for that to work I need to get the right instructions enabled with #[target_feature(enable = "avx2")] or similar. If a call doesn't get inlined, the annotation „stops“ at the nearest non-inlined function call. Right now, I can't ask for inlining from inside (because then Sum::sum gets inlined into Iterator::sum and stops there), and I don't think there's a way to ask for inlining from the caller side.
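For anyone unfamiliar with the pattern, this is a minimal sketch of what I mean, written with plain std instead of the multiversion crate. All three function names are invented for the example. The helper knows nothing about AVX2; it can only use those instructions if LLVM inlines it into the annotated wrapper, and #[inline] is just a hint toward that.

```rust
/// Feature-agnostic helper. It is compiled with the baseline instruction
/// set unless it gets inlined into a caller that enables more features.
#[inline]
fn add_slices(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}

/// AVX2-enabled copy: code inlined into this body may use AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_slices_avx2(dst: &mut [f32], src: &[f32]) {
    add_slices(dst, src)
}

/// Run-time dispatch between the AVX2 copy and the baseline helper.
pub fn add_slices_dispatch(dst: &mut [f32], src: &[f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at run time.
            unsafe { add_slices_avx2(dst, src) };
            return;
        }
    }
    add_slices(dst, src)
}
```

If `add_slices` were instead reached through a call that LLVM declines to inline, the loop would be compiled once with the baseline features and the AVX2 wrapper would gain nothing, which is exactly the situation I hit with sum.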

Or is there an alternative way to ask LLVM to please inline this, or to pass the enabled instructions inside? I'd be OK with a proper function call, as long as the callee were compiled with the right instructions enabled.

And no, I can't just stick #[target_feature(enable = "avx2")] onto Sum::sum :-(. I'm creating vector types that rely on autovectorization and allow using multiversioning, so I really do want a single piece of code that doesn't know which instructions it'll be used with, and let the compiler use whatever is available to make it as fast as possible.


This sounds like a case where it would be nice if Rust supported the LLVM flatten attribute, which tries to inline everything within a function.


That sounds exactly like the thing. I guess for that to come to Rust, one would have to start with an RFC.

This sounds like it could be the basis for two feature requests?

  1. Optionally inherit target-feature settings all the way down to all used items? (In this case, automatically create multiple copies with different features, as needed. This makes multiversioning recursive, and potentially extra expensive).

  2. Automatically inherit target-feature settings on an instantiation of a function, if all the callers (or the single caller) use the same setting.

That is probably harder than it looks at first glance. The enabled instruction set would have to become part of the mangled function name to begin with, so the symbols don't collide. And I guess touching the definition of how mangling is done for Rust is a Big Deal, with consequences all the way to debuggers, profilers, ...

That would be a nice thing to have and seems to mostly be a conservative optimization of the current state. But I don't think it would help me a lot, because the idea of multiversioning is to actually have multiple copies of the same function in the binary, each with a different level of enabled instructions, so in my case this situation would probably never happen.
