To fight sort-generated binary bloat and more

leonardo · June 7, 2020, 1:53pm

If in my code I have to perform a sort on many small slices that contain different types, in many different parts of the code, I think my binary grows a bit too much, because the compiler inlines too much sorting code, monomorphizing it too many times. To solve this problem we can add a small_stable_sort function in the Rust stdlib, that contains only a little amount code (like a selection sort). Or we can add in the stdlib a sort function prefixed by inline(never). Or we can add a stub function that calls another sort function that uses type erasure (as with the qsort of C language). An even simpler solution is to introduce in Rust the possibility to add an inline(never) at the call point (but this gets tricky if you are using iterator chains like itertools sorted):

#[inline(never)]
my_slice.sort();

elidupree · June 7, 2020, 2:02pm

I'm not sure why inlining would affect the monomorphization; regardless of whether the functions are inlined, you need many different monomorphizations, because the comparison functions are different.

To implement sorting on multiple arbitrary types without multiplying the amount of code generated, the only thing I can think of is to implement a sort function that uses a trait object for the comparison function. This would have significant performance cost, because it triggers dynamic dispatch on every comparison. As something with a significant performance cost that is not commonly needed, it would be appropriate to implement such a thing in a separate crate, rather than the standard library.

leonardo · June 7, 2020, 2:08pm

The binary bloat comes from both sources: the numerous monomorphizations and the numerous times the same monomorphized instance of the function gets inlined. The type erasure and the never-inline avoid each of them. Sorry for not clearly telling apart the two cases in my original message.

The point I'm trying to express is that the current design of the stdlib sort has a different kind of cost (I mean the binary bloat, and I guess also the compilation time) that I've seen in my codebase it's often not worth paying. I am not sure if it's a good idea to design the stdlib thinking only about one cost, ignoring the other ones.

josh · June 7, 2020, 3:10pm

I just wrote a few test programs, and confirmed that sort does not seem to get inlined directly into the code, and that multiple calls to sort on different vectors seem to call the same function.

Can you give some examples of code where you're finding sorting inlined into your code?

leonardo · June 7, 2020, 3:12pm

Then we can consider this thread essentially done, thank you.

cuviper · June 7, 2020, 3:15pm

You could use libc::qsort if you really want to avoid adding code at all. You would just need a small shim to get the arguments right.

mjbshaw · June 7, 2020, 3:27pm

Is this just speculation or do you have concrete evidence supporting this? If it's just speculation then I recommend collecting concrete evidence before jumping to conclusions. If you do have concrete evidence then I suggest sharing it (e.g., a rust.godbolt.org link).

system · September 5, 2020, 3:27pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Idea: polymorphic baseline codegen compiler	22	3281	March 25, 2019
To write faster code as the compiler devs compiler	4	1218	September 30, 2020
Interior mutability reflection language design	5	944	April 26, 2023
Add back polymorphization compiler	16	917	May 11, 2025
Explicit monomorphization for compilation time reduction compiler	32	5391	April 14, 2022

To fight sort-generated binary bloat and more

Related topics