Vec! lack of optimization for "zeroed" types


#1

There is a thread on the users forum where someone asked about the relative performance of “vec!” with a zeroed complex number as the default value versus creating a vec! of i16s (twice the length) and then transmuting it to a Vec of complex values. There seems to be about a 5x difference in performance for a large vec. Is this something the compiler could optimize better? Is this something that the complex or vec libraries could handle better?
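For readers who have not seen the linked thread, the two approaches being compared can be sketched roughly as follows. This is only an illustration under stated assumptions: Complex16 is a hypothetical stand-in for the num-complex type, the function names are made up, and no timings are claimed here.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for a complex number with two `i16` fields.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex16 {
    re: i16,
    im: i16,
}

/// Straightforward version: fill the vector with copies of a zero value.
fn zeros_by_value(n: usize) -> Vec<Complex16> {
    vec![Complex16 { re: 0, im: 0 }; n]
}

/// Workaround: build a zeroed `Vec<i16>` of twice the length and reinterpret it.
fn zeros_by_reinterpret(n: usize) -> Vec<Complex16> {
    // Keep the `Vec<i16>` from freeing the buffer we are about to hand over.
    let mut raw = ManuallyDrop::new(vec![0i16; n * 2]);
    let (ptr, len, cap) = (raw.as_mut_ptr(), raw.len(), raw.capacity());
    // Both counts must split evenly into pairs of `i16`.
    assert!(len % 2 == 0 && cap % 2 == 0);
    // SAFETY: `Complex16` is `repr(C)` with two `i16` fields, so it has size 4,
    // alignment 2 and no padding. A buffer of `cap` `i16`s therefore has the
    // same size and alignment as one of `cap / 2` `Complex16`s, and all-zero
    // bytes are a valid `Complex16`.
    unsafe { Vec::from_raw_parts(ptr as *mut Complex16, len / 2, cap / 2) }
}
```

Both functions return the same contents; the question in the thread is only about how fast each one gets there.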

See the original discussion here: https://users.rust-lang.org/t/question-about-the-efficiency-of-vec-on-num-complex-complex/19839


#2

If the type is Copy, can’t the compiler just check that the bytes are all 0?

I wonder if having a Complex::ZERO const would make any difference…


#3

vec does specialize for zero values of known types: https://github.com/rust-lang/rust/blob/master/src/liballoc/vec.rs#L1630

When specialization stabilizes we can make IsZero a public trait and the Complex type could implement it.
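To make the idea concrete, here is a rough sketch of what a public trait of this kind might look like. It is loosely modeled on the internal trait, but the exact shape shown here (an unsafe trait with a single is_zero method) and the Complex struct are assumptions for illustration, not the actual std or num-complex API.

```rust
/// Hypothetical public version of the internal trait.
/// Unsafe to implement: returning `true` promises that an all-zero byte
/// pattern is a valid representation of exactly this value.
unsafe trait IsZero {
    fn is_zero(&self) -> bool;
}

unsafe impl IsZero for i16 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

unsafe impl IsZero for f32 {
    fn is_zero(&self) -> bool {
        // Compare bits, not values: -0.0 == 0.0, but -0.0 has the sign bit set.
        self.to_bits() == 0
    }
}

/// Illustrative stand-in for a complex number type.
#[repr(C)]
#[derive(Clone, Copy)]
struct Complex<T> {
    re: T,
    im: T,
}

// A complex number is zero in memory when both components are.
unsafe impl<T: IsZero> IsZero for Complex<T> {
    fn is_zero(&self) -> bool {
        self.re.is_zero() && self.im.is_zero()
    }
}
```

With an impl like the last one, vec! could in principle take the zeroed-allocation path for a zero Complex just as it does for a zero i16.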


#4

Why not just use a function like this?

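use std::{mem, slice};

// Returns true if every byte of `v`'s in-memory representation is zero.
fn is_mem_zero<T>(v: &T) -> bool {
    // View the value as `size_of::<T>()` raw bytes.
    let bytes = unsafe { slice::from_raw_parts(v as *const T as *const u8, mem::size_of::<T>()) };
    bytes.iter().all(|&b| b == 0)
}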
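A sketch of how such a check might be used to pick the allocation path is below. The from_elem function here is a hypothetical stand-in for what vec![elem; n] does internally, restricted to Copy types, and it assumes T has no padding bytes (inspecting padding through a byte slice is not something to rely on).

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::{mem, slice};

/// Byte-wise zero check; assumes `T` has no padding bytes.
fn is_mem_zero<T>(v: &T) -> bool {
    let bytes =
        unsafe { slice::from_raw_parts(v as *const T as *const u8, mem::size_of::<T>()) };
    bytes.iter().all(|&b| b == 0)
}

/// Hypothetical stand-in for `vec![elem; n]`.
fn from_elem<T: Copy>(elem: T, n: usize) -> Vec<T> {
    let layout = Layout::array::<T>(n).expect("capacity overflow");
    if layout.size() == 0 || !is_mem_zero(&elem) {
        // General path: allocate, then write `elem` into every slot.
        let mut v = Vec::with_capacity(n);
        v.resize(n, elem);
        return v;
    }
    // Fast path: ask the allocator for memory that is already zeroed.
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: the buffer came from the global allocator with the layout of
        // `[T; n]`, and its all-zero bytes match `elem` byte for byte.
        Vec::from_raw_parts(ptr, n, n)
    }
}
```

Either path produces the same vector; only the amount of work done after allocation differs.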

#5

As the linked discussion mentions, this sort of thing would ideally be hidden behind a safe abstraction so as not to need to use unsafe directly.


#6

Is that the most useful thing to be doing? How about having an auto-derived “Zeroable” marker trait that vec! and similar things could leverage to know that a value of the type can safely be represented as size-of-the-type zero bytes? As mentioned here: https://users.rust-lang.org/t/question-about-the-efficiency-of-vec-on-num-complex-complex/19839/9?u=gbutler69
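As a rough illustration of the marker-trait idea, assuming a hand-written (not auto-derived) trait: Zeroable, zeroed_vec and Complex below are all hypothetical names, and nothing like this exists in std under these names.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::mem;

/// Hypothetical marker trait. Implementing it promises that an all-zero
/// byte pattern is a valid value of `Self`.
unsafe trait Zeroable: Sized {}

unsafe impl Zeroable for i16 {}
unsafe impl Zeroable for f64 {}

/// Illustrative stand-in for a complex number type.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex<T> {
    re: T,
    im: T,
}

// Zero bytes for both fields give a valid `Complex<T>` whenever they give a valid `T`.
unsafe impl<T: Zeroable> Zeroable for Complex<T> {}

/// Builds a vector of `n` all-zero values straight from zeroed memory.
fn zeroed_vec<T: Zeroable>(n: usize) -> Vec<T> {
    let layout = Layout::array::<T>(n).expect("capacity overflow");
    if layout.size() == 0 {
        // Empty vector or zero-sized type: there is nothing to allocate.
        return (0..n).map(|_| unsafe { mem::zeroed::<T>() }).collect();
    }
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: the buffer has the layout of `[T; n]`, and `T: Zeroable`
        // guarantees that zeroed memory holds valid values.
        Vec::from_raw_parts(ptr, n, n)
    }
}
```

Note that the marker only says the type has a valid all-zero value; vec! would still need some way to tell that the particular element it was given is that value.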


#7

Which comment in that discussion specifically? Since this is information that Vec uses internally to call alloc or alloc_zeroed, I don’t really see the point in avoiding unsafe here.


#8

That approach has issues with padding bytes.


#9

Why is that? If the compile-time size of the type is known (including padding) and you set all the bytes to zero, because it is zeroable, wouldn’t that suffice? Or are you referring to how the compiler may be using padding bytes for other purposes?


#10

Only in that it might miss an opportunity for optimization. The behavior will still be correct. The optimization will work for all currently supported is_zero types (because they don’t have padding).


#11

So, if a type had an unsafe marker trait auto-derived (or manually implemented) that indicated that the type was zeroable, you’re saying that vec could not rely on that trait to switch to __rust_alloc_zeroed like it does for i16, i32, etc.? Or are you saying something else?


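#12

use std::mem::transmute;

#[repr(C)]
struct S {
    a: u8,
    b: u16,
}

// Byte 1 is the padding between `a` and `b`; both fields are zero.
let almost_zero: S = unsafe { transmute([0u8, 13, 0, 0]) };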

almost_zero does not pass is_mem_zero as defined above, but it would presumably test as zero with a more specialized implementation. However, this just means a missed opportunity for optimization; using alloc followed by copy is still correct. My proposal for is_mem_zero would allow all current types that use is_zero, as well as many user-defined types, to use the optimized paths without requiring any public stabilization of traits or features.