Exploit the padding?

Not everything needs to be fast. For example if your priority is embedded systems with kilobytes of RAM.

Not particularly. It would break unsafe code that relies on padding. That code should be audited before using the relevant compile-time flags.

Like anything that uses std::ptr::copy and similar functions? It claims it "Copies count * size_of::<T>() bytes from src to dst", and that includes padding bytes. This means that under your proposal the current implementations of std::mem::replace & co are unsound.

7 Likes

I work with the SNES, which has 128kiB of WRAM normally (more if you exploit the fact the cartidge has control over most of the address space, and add on-cartridge DRAM). In this case, the alignment of u32 is 2, and the ((u32,u8),u8) has a total of two padding bytes. To move the inner (u32,u8) costs 3 16-bit moves, or 2 16-bit moves and 1 8-bit move. Changing the size of moves costs 3 cycles and saves 1 per access (thus, if m=0, so memory accesses are 16-bit, the former is two cycles faster, and if m=1, so memory accesses are 8-bit, the latter is two cycles faster iff the 8-bit move occurs first; note: this is an oversimplification). I'd gladly trade those two bytes for those two cycles. I can certainly understand when that may not be the case, though.

2 Likes

std::ptr::copy is generic and can thus be monomorphized to take padding into account.

it's fine.

That would make it do something different than what it claims to do. It doesn't say it will copy count Ts, it says it will copy count * size_of::<T>() bytes, which is different because it implies it will copy padding bytes.

Sometimes breaking changes are necessary.

Thankfully this one is meant to be opt-in, which makes it not really breaking.

... Altho maybe a better place to make this change would be a target 'triplet'. x86_64-pc-linux-gnu-packed ABI?

Btw: (((u32, u8), u16), u8) is not covered by size+stride, but is covered by padding.

It cannot, because there is no requirement that the type used with std::ptr::copy be the same as the type that was originally used.

This code runs successfully under Miri:

use std::mem::MaybeUninit;

fn bytewise_copy<T>(src: &mut T, dst: &mut T) {
    let src_ptr: *mut MaybeUninit<u8> = src as *mut T as *mut _;
    let dst_ptr: *mut MaybeUninit<u8> = dst as *mut T as *mut _;
    unsafe { std::ptr::copy(src_ptr, dst_ptr, std::mem::size_of::<T>()); }
}

fn main() {
    let mut src: (u32, u8) = (10, 20);
    let mut dst: (u32, u8) = (30, 40);
    bytewise_copy(&mut src, &mut dst);
    println!("Dest: {:?}", dst);
}

If dst was a pointer into a ((u32, u8), u16), then bytewise_copy would end up overwriting the value of the u16, since it will overwrite the padding bytes used to store the u16. The only type that std::ptr::copy is told about is MaybeUninit<u8> - so even if we wanted to change its behavior, we couldn't, since we don't have the necessary information available when the function is monomorphized.

Yes, but only to fix soundness bugs, or as part of specifying previously underspecified areas of the compiler (e.g. std::mem::uninitialized with uninhabited types and non-zero-initializeable types). This case is neither of those.

6 Likes

Wait isn't padding UB today?

Anyway, what's wrong with forking the stdlib for this one feature? A critical feature that every web browser written in Rust should use.

I'm not sure what you're asking. The existence of padding is exposed via std::mem::size_of, and it's safe to read uninitialized bytes as MaybeUninit.

If you mean literally forking the rust-lang/rust repo and modifying the standard library (and trying to distribute that in some way), nothing is stopping you.

This is a pretty sweeping claim - do you have any benchmarks showng that this feature will give the performance gains that you think it will?

No we mean having two slightly different stdlibs, and picking between them based on target triplets.

In particular, having ptr::read be aware of padding in one of them.

It should lower RAM usage by half. The performance hit is more than worth it. It'll be more than made up for by not hitting swap.


(would definitely be interesting if gcc-rust had a GNU extension for packed structs. the -ffast-math of Rust layout optimizations.)

I posted a concrete example of code that copies data one byte at a time, with no knowledge of the 'actual' type. How can that existing code be compliled to make your proposal work?

Where are you getting 'half' from?

I would be very surprised if any browser frequently required swap due to exhausting RAM. Do you have any concrete data about how often this happens?

You break it. Opt-in, ofc.

Yeah we don't have much RAM. The swap is almost always at least 25% full. At least the system isn't entirely unusable because of it, after a lot of tuning. But still not great.

Do you have profiles showing that a large fraction of real-world web browser RAM usage is going to struct padding?

3 Likes

Letting aside the fact that if you end up with a type like that you probably have bigger problems, can't you rewrite your program so that it uses an equivalent type but without padding? A bit of tuning on the most used structs should take much less effort than rewriting the stdlib and any other crate to be compatible with your proposal. Not to mention the ecosystem split and the double effort people will have to make to support both ways.

10 Likes

Would I be right to summarize thinking as

  • Option8 - group discriminants together
  • Zero padding - use niches
  • size/stride - remove padding

?

E.g. three very different but related ideas?

This one is different from all of those tbh. None of those handle (((u32, u8), u16), u8) the way we want.

That's fine, but why must you spell your type (((u32, u8), u16), u8)? Is there really no other spelling for that type that makes sense?

rust already has repr(C,packed) that has that behaviour. I'd hope that gcc-rs would support that, and not some random extension to achieve the same.

Also, if this applies to builtin types like tuples, this would also affect impls that provide stable abi guarantees, such as GitHub - LightningCreations/lccc: Lightning Creations Compiler Frontend for various languages. Either this has to become the default (which is a breaking change as mentioned above), or the abi changes with compiler flags/options outside of the two abi-control options -Z repr-rust-layout and -Z build-abi. Both of these are well beyond what I would want to implement, especially since it would likely apply to things like TokenStream that crosses the boundary between the compiler and compiled rust code, potentially breaking the knowledge I otherwise posses about the guaranteed layout of the type.

Because that's how you'd usually lay out your types, unless you happen to only ever use integers/floats directly, or objects with guaranteed no padding.

You wouldn't write them literally like that, but you would build them up like that, with generics, etc.