To fight sort-generated binary bloat and more

If in my code I have to perform a sort on many small slices that contain different types, in many different parts of the code, I think my binary grows a bit too much, because the compiler inlines too much sorting code, monomorphizing it too many times. To solve this problem we can add a small_stable_sort function in the Rust stdlib, that contains only a little amount code (like a selection sort). Or we can add in the stdlib a sort function prefixed by inline(never). Or we can add a stub function that calls another sort function that uses type erasure (as with the qsort of C language). An even simpler solution is to introduce in Rust the possibility to add an inline(never) at the call point (but this gets tricky if you are using iterator chains like itertools sorted):

#[inline(never)]
my_slice.sort();

I'm not sure why inlining would affect the monomorphization; regardless of whether the functions are inlined, you need many different monomorphizations, because the comparison functions are different.

To implement sorting on multiple arbitrary types without multiplying the amount of code generated, the only thing I can think of is to implement a sort function that uses a trait object for the comparison function. This would have significant performance cost, because it triggers dynamic dispatch on every comparison. As something with a significant performance cost that is not commonly needed, it would be appropriate to implement such a thing in a separate crate, rather than the standard library.

2 Likes

The binary bloat comes from both sources: the numerous monomorphizations and the numerous times the same monomorphized instance of the function gets inlined. The type erasure and the never-inline avoid each of them. Sorry for not clearly telling apart the two cases in my original message.

The point I'm trying to express is that the current design of the stdlib sort has a different kind of cost (I mean the binary bloat, and I guess also the compilation time) that I've seen in my codebase it's often not worth paying. I am not sure if it's a good idea to design the stdlib thinking only about one cost, ignoring the other ones.

I just wrote a few test programs, and confirmed that sort does not seem to get inlined directly into the code, and that multiple calls to sort on different vectors seem to call the same function.

Can you give some examples of code where you're finding sorting inlined into your code?

1 Like

Then we can consider this thread essentially done, thank you.

You could use libc::qsort if you really want to avoid adding code at all. You would just need a small shim to get the arguments right.

1 Like

Is this just speculation or do you have concrete evidence supporting this? If it's just speculation then I recommend collecting concrete evidence before jumping to conclusions. If you do have concrete evidence then I suggest sharing it (e.g., a rust.godbolt.org link).

4 Likes