Reducing generics bloat


#1

While I was reading this Reddit thread about the design trade-offs of Swift language generics (that also contains an awesome comment by Gankro):

I’ve seen the Reddit user jrmuizel express a common desire:

It would be nice if you could optionally compile rust code with this model to avoid having to monomorphize all the time and therefore improve build times.

This reminds me of a paper I’ve read few years ago, from 2009, Dan Tsafrir, Robert W. Wisniewski, David F. Bacon, and Bjarne Stroustrup: Minimizing Dependencies within Generic Classes for Faster and Smaller Programs. ACM OOPSLA’09:

http://www.stroustrup.com/SCARY.pdf

I don’t know how much generally applicable are the ideas of this paper, if they could be used to reduce the generics bloat in Rust code.

It explains the collections contain methods that depend only a subset of the template arguments:

the size method that returns the number of elements stored by the std::set

template<typename T, typename C, typename A>
size type set<T,C,A>::size() const { return this->count; }

This method just returns an integer data member (the alias size type is some integer type) and so its implementation is independent of T, C, and A. Yet, for every T/C/A combination, the compiler emits another identical instantiation.

Then the paper suggests template hoisting and generalized template hoisting as possible solutions.

Later the paper proposed a language-level solution, that in Rust-y language could look like this:

struct C<X, Y, Z> {
    x: X,
    y: Y,
    z: Z,
}

impl<X, Y, Z> C<X, Y, Z> {
    // For backward compatibility, this is
    // equivalent to: fn f2() use X,Y,Z
    fn f1(self) {
    }

    // Only allowed to use X or Z, not Y
    fn f2(&self) use X, Z {
    }
}

The f2 method is not tied to the generic Y type.


#2

f2 at the very least depends on the size of Y.


#3

My interpretation of the C++ code isn’t very good, take a look at the paper :slight_smile:


#4

We have one thing I’m interested in, and that’s helping the compiler figure out when a generic parameter is not used at all. It happens often (A closure defined in a function that has type parameters, for example, depends on those type parameters) but sometimes that’s a dependency too much.


#5

I’ve just seen this: https://github.com/rust-lang/rust/issues/46477


#6

Seeing how many times some generic functions get instantiated, the feature proposed here (the “use X, Z”) could speed up Rust compilation a little too:


#7

I wonder if something similar could be used to simplify compilation of functions that doesn’t strictly use the generic argument, like some of the memory functions E.g ptr::null, which really just translates to returning null pointer value regardless of the type will generate one instantiation, and a lot of calls for the optimizer to inline for pretty much every type that is used with heap allocation somewhere, which can be a lot.


#8

There may be another optimization possible for generic functions using T: AsRef<Type>. For such functions all of the code after .as_ref() call will be identical.

fn foo<T: AsRef<[u8]>>(arg: T) {
  // a number of generic implementations
}

could be deduped as:

#[inline]
fn foo<T: AsRef<[u8]>>(arg: T) {
   foo_u8(arg.as_ref())
}
fn foo_u8(arg: &[u8]) {
  // one non-generic implementation
}

#9

This requires as_ref to be referential transparent (e.g. the result must not be random or depend on global value).

This transformation should be applicable to many conversion traits such as Into and IntoIterator and futures::IntoFuture


#10

I think it should be possible even if as_ref is a black box, as long as it’s called at the beginning of the function (or can be safely hoisted there), which I think is a realistic expectation for this type of functions.


#11

As a C++ user, I’ve had some experience fighting template-induced code bloat in the past, and there are definitely “patterns” emerging. I thought it could be helpful to share the techniques we know about, whatever the original language, as they may translate to Rust patterns… or even better rustc optimizations.


Here are 3 that I can think of, off the top of my head.

Hoisting

As already mentioned, one pattern is to “hoist” the definition of items outside of the main template code to reduce the number of instantiations. For example, in:

template <typename T, std::size_t N>
class fixed_capacity_vector {
public: /**/
private:
    std::size_t size = 0;
    std::aligned_storage_t<sizeof(T) * N, alignof(T)> data;
};

A lot of the methods will NOT depend on N, the iterator and const_iterator types will not depend on N, … In this case, the use of “mixin” is pretty natural: define a base class which stores all the non-N stuff!

Note: LLVM’s SmallVector is an example of this strategy.

As already mentioned by @kornel with AsRef, this can also be applied to functions where a “monomorphic” core can be extracted from a “generic” sandwich.

Shims

When there is no immediate monomorphic core, it can be possible to create one by introducing light-weight type-erasure.

The best C++ example I have here is a printf-like replacement:

template <typename Writer, typename... Args>
void printf(Writer& writer, char const* format, Args const&... args);

Writing all the formatting code there is going to lead to a large generated function, and it’s going to be repeated for each slight variation of the argument types (and each permutation!). Furthermore, the actual meat of the formatting is likely to be significant (formatting an integer in ASCII costs, etc…), so a small performance penalty would be unnoticeable…

void printf_impl(
    IWriter& writer,
    char const* format,
    std::initializer_list<IArgument const*> args
) __attribute__((flatten));

template <typename Writer, typename... Args>
void printf(Writer& writer, char const* format, Args const&... args) {
    auto iWriter = make_iwriter(writer);
    printf_impl(
        iWriter,
        format,
        { &static_cast<IArgument const&>(make_iargument(args))... }
    );
}

The code can still be inlined, if at least via LTO, but the bloat is considerably reduced.

Total Type Erasure

More extreme, laying a thin layer of strongly-typed code over a void* core. This obviates the need to instantiate shims, albeit only provides very limited functionality obviously. Still, qsort does works with void*, so not all is lost.


#12

I’ve created this thread with the hope that adding a compiler feature (with a specific syntax) could help “significantly” reduce the generics bloat :slight_smile:


#13

I hope that the compiler could one you up here, and automagically reduce bloat in some circumstances.

For example, the hoisting strategies seem amenable to automation.

In any case, thanks for creating the thread.


pre-RFC: Automatic generic-to-dynamic conversion
#14

An idea I had a while back closely related to this: