That is, macro-generated code will contain only simple forwarding and no actual logic.
Will this refactoring help to make code slightly faster to compile, because we don’t duplicate functions? Or is deduplication happening anyway?
Does it make sense to actually do this refactoring, or, given that the current code works and is highly optimized, is it better not to touch it?
As IterRaw<T> is generic, it’ll be monomorphized for each <T>; if you use both Iter<T>(IterRaw<T>) and IterMut<T>(IterRaw<T>) with the same T, the trivial forwarding will be monomorphized for both wrappers, but the actual iteration logic only once, for the raw iterator.
Even if dedupe happens, it happens at the LLVM level and giving LLVM less to do will almost always be “better”.
As the existing implementation is currently macro-copy-pasted, achieving the reuse through the type system seems better to me (though I suppose it introduces another forwarding Iterator before aggressive inlining kicks in, so it isn’t a surefire win).
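The wrapper-around-raw-iterator idea above can be sketched roughly like this. To be clear, `IterRaw`, `next_ptr`, and `collect_via_wrapper` are made-up names for illustration, not the real liballoc definitions, and this sketch ignores ZSTs and the other subtleties the real code handles:

```rust
use std::marker::PhantomData;

// Shared core: all the pointer-bumping logic lives here, so it is
// monomorphized once per T, not once per wrapper.
struct IterRaw<T> {
    ptr: *const T,
    end: *const T,
}

impl<T> IterRaw<T> {
    #[inline(always)]
    fn next_ptr(&mut self) -> Option<*const T> {
        if self.ptr == self.end {
            None
        } else {
            let p = self.ptr;
            // NOTE: the real implementation special-cases ZSTs; this sketch does not.
            self.ptr = unsafe { self.ptr.add(1) };
            Some(p)
        }
    }
}

// Shared-ref wrapper: only trivial forwarding, no iteration logic of its own.
struct Iter<'a, T> {
    raw: IterRaw<T>,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    #[inline(always)]
    fn next(&mut self) -> Option<&'a T> {
        self.raw.next_ptr().map(|p| unsafe { &*p })
    }
}

// Small demo helper (not part of the proposal): drain a slice through the wrapper.
fn collect_via_wrapper(v: &[i32]) -> Vec<i32> {
    let it = Iter {
        raw: IterRaw {
            ptr: v.as_ptr(),
            end: unsafe { v.as_ptr().add(v.len()) },
        },
        _marker: PhantomData,
    };
    it.copied().collect()
}

fn main() {
    assert_eq!(collect_via_wrapper(&[1, 2, 3]), vec![1, 2, 3]);
    assert_eq!(collect_via_wrapper(&[]), Vec::<i32>::new());
}
```

An `IterMut` wrapper would forward to the same `IterRaw<T>` and only differ in handing out `&'a mut T`, which is exactly the "simple forwarding and no actual logic" shape described above.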
As a warning: last time this was touched, Ralf found that even seemingly trivial changes had a material perf impact, because slice iteration is so fundamental to most Rust programs. (Hence the use of the normally-unneeded-and-bad #[inline(always)] and such.)
So while I support code simplification in general, in this particular case I might prefer the less-indirection of macros to make sure that LLVM continues to do as good a job as possible with it.
Welp, the best way to figure out if perf will be impacted is to profile, right? It doesn’t seem like this would be a large change, so I think the cost of experimentation would be low here.
The cost of making the code change is small; the cost of the profiling may be large.
My point isn't that it should or shouldn't be done -- I honestly don't know -- but that the tradeoffs for things as core as Vec can be different from the ones one might normally make. For example, Vecs are monomorphized into so many crates that it might turn out that arguably-cleaner code in liballoc is worse because it makes for more code for LLVM to have to churn through, and thus compilation slower for users. It's just hard to measure and weigh such things.