Exploit the padding?

Types like (u32, u8) have padding. Types like u8 have alignment of 1. Why not make ((u32, u8), u8) the same size as (u32, u8) by using the padding? It would lower RAM usage by a lot.
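
A quick way to see the sizes involved on current rustc. This is a sketch: `repr(Rust)` layout is unspecified, but a type's size must be a multiple of its alignment, which is what forces the padding here.

```rust
use std::mem::{align_of, size_of};

fn main() {
    // Inner tuple: 4-byte u32 + 1-byte u8 = 5 bytes, rounded up to a multiple
    // of the 4-byte alignment, so 3 bytes of padding.
    println!(
        "(u32, u8):       size {} align {}",
        size_of::<(u32, u8)>(),
        align_of::<(u32, u8)>()
    );
    // Outer tuple: the inner 8 bytes are treated as opaque, so the extra u8
    // goes after them (9 bytes), which rounds up to 12.
    println!("((u32, u8), u8): size {}", size_of::<((u32, u8), u8)>());
}
```

On current rustc this reports 8 bytes for the inner tuple and 12 for the outer one; the proposal would bring the outer tuple down to 8.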

This would make tuples not safe to memcpy, but thankfully nobody does that...

(... right?)

Every move is effectively a memcpy, including assignment, even though it doesn't have to be a literal call to libc::memcpy.

If you pass a &mut (u32, u8) somewhere, it has to be safe to write that without knowing outer context, like whether there might be data lurking in the padding.
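
A small sketch of the "no outer context" problem (the function `overwrite` is an illustrative name): the same compiled function receives both a standalone tuple and one nested inside a larger tuple, and nothing in its signature says which.

```rust
/// Overwrites a `(u32, u8)` through a reference. This function is compiled once
/// and called from everywhere, so it cannot know whether the bytes after the
/// `u8` are padding of a standalone tuple or data belonging to an outer value.
fn overwrite(slot: &mut (u32, u8), value: (u32, u8)) {
    // Today rustc is free to lower this to a single 8-byte store, padding included.
    *slot = value;
}

fn main() {
    let mut standalone = (1u32, 2u8);
    let mut nested = ((1u32, 2u8), 3u8);

    // Same function, two different surroundings.
    overwrite(&mut standalone, (10, 20));
    overwrite(&mut nested.0, (10, 20));

    println!("{:?} {:?}", standalone, nested);
}
```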

Related:

Also: "Towards even smaller structs" (post #3 by scottmcm).

The padding isn't currently defined as being copied.

In particular, rustc would never be able to emit writes that cover padding.

That's not related. That's about enums, this is about padding.

The "downside" can be solved with an optimization target: maybe an "-Olowmem" equivalent. (Yeah, changing struct sizes based on optimization targets is kinda weird, but only kinda. Maybe some other codegen flag, though?) It's also possible that the lower memory usage could improve performance by more than the extra code costs, so that's also something to keep in mind.

Use-cases include embedded systems and web browsers.

Why not? Reading from padding bytes will produce uninitialized bytes anyway. At least GCC emits code that writes to padding.

In case you're confused about why I shared C code here: the Rust object/memory model is not formally specified yet, but many of us believe it will inherit "what C does", with modifications. So it makes sense to assume "what C does" for cases like this where we haven't made many decisions (yet).

Memory in Rust (AFAIK) is not typed - only accesses are typed. So, it's legal for unsafe code to transmute a &mut (u32, u8) to a &mut [u8; 8], and write through the new pointer.

Also, as a practical consideration: forbidding writes that affect padding would mean that Rust can never emit a 64-bit move for a write to a &mut (u32, u8). Instead, it would need to emit a 32-bit move and an 8-bit move (so that it doesn't touch the three bytes of padding).

No, it's not. A &mut T is invalid if it points to an invalid T, and a u8 is invalid if it is uninitialized. The reference states that padding between fields is considered uninitialized.

https://doc.rust-lang.org/reference/behavior-considered-undefined.html

It's not about what you, the user of the inner (u32, u8), may read. It's a problem if your write clobbers the extra data of someone's outer ((u32, u8), u8) which occupied that padding.
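
To make the clobbering concrete, here is a toy byte-level model. It is entirely hypothetical: it pretends `((u32, u8), u8)` has been squeezed into 8 bytes with the outer u8 at offset 5, inside the inner tuple's padding (real `repr(Rust)` layout makes no such promise), and compares a whole-value write with a field-by-field write.

```rust
// Hypothetical layout, not real rustc behavior: u32 at offsets 0..4, inner u8
// at offset 4, and the outer u8 stored at offset 5 (inner padding is 5..8).
const OUTER_U8: usize = 5;

/// A whole-value write, like the single 8-byte store rustc may emit today.
/// Bytes 5..8 get overwritten (with zeros in this model; in reality padding
/// contents are unspecified).
fn write_whole(mem: &mut [u8; 8], v: (u32, u8)) {
    let mut buf = [0u8; 8];
    buf[0..4].copy_from_slice(&v.0.to_ne_bytes());
    buf[4] = v.1;
    *mem = buf;
}

/// A field-by-field write that only touches bytes 0..5, leaving padding alone.
fn write_fields(mem: &mut [u8; 8], v: (u32, u8)) {
    mem[0..4].copy_from_slice(&v.0.to_ne_bytes());
    mem[4] = v.1;
}

fn main() {
    let mut a = [0u8; 8];
    a[OUTER_U8] = 0xAB; // the outer u8 lives in the inner tuple's padding
    write_fields(&mut a, (7, 9));
    println!("field-wise write: outer u8 = {:#x}", a[OUTER_U8]);

    let mut b = [0u8; 8];
    b[OUTER_U8] = 0xAB;
    write_whole(&mut b, (7, 9));
    println!("whole write:      outer u8 = {:#x}", b[OUTER_U8]);
}
```

The field-wise write preserves the outer byte; the whole write silently destroys it.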

FYI, this was already proposed: "Idea: guaranteed-zero padding for niches" (rust-lang/rust issue #70230).

We can just define that:

fn assign(a: &mut (u32, u8), b: (u32, u8)) {
    *a = b;
}

will desugar to:

fn assign(a: &mut (u32, u8), b: (u32, u8)) {
    (*a).0 = b.0;
    (*a).1 = b.1;
}

That's about enums. This is about structs.

Guaranteed zero-padding makes padding unusable for extra fields. On the other hand, guaranteed not-touch-the-padding makes the padding usable for extra fields, which may include but is not limited to the enum tag.
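
The enum-tag case is easy to observe today. This sketch compares a payload that has a niche (the invalid values of `bool`) with one that only has padding; sizes are what current rustc produces, not a documented guarantee.

```rust
use std::mem::size_of;

fn main() {
    // bool only uses the values 0 and 1, so `None` can be encoded as an
    // invalid bool value (a niche): no extra space is needed.
    println!("Option<(u32, bool)>: {}", size_of::<Option<(u32, bool)>>());
    // u32 and u8 have no invalid values, and rustc does not put the tag in the
    // tuple's padding, so the tag needs room of its own.
    println!("Option<(u32, u8)>:   {}", size_of::<Option<(u32, u8)>>());
}
```

On current rustc the first is 8 bytes and the second 12, even though the second payload has 3 unused padding bytes that could have held the tag.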

That may be less efficient than what can already be generated. For (u32, u8) it's perhaps fine (that's one 32-bit read/write plus one 8-bit read/write), but for, say, (String, u8), which is 4 usize, it could easily be worse to emit a memcpy for only the first 3 words and then an 8-bit read/write, rather than simply a 4 * size_of::<usize>() memcpy. The latter can be a single AVX2 move or two SSE moves (and only one SSE move on 32-bit x86), vs. one SSE move, one qword move, and one byte move. Example codegen (Intel syntax):

; Rust signature: extern "sysv64" fn((String, u8)) -> (String, u8)
copy_string_with_padding_sse:
    movups xmm0, [rdi]
    movups [rax], xmm0
    movups xmm0, [rdi+16]
    movups [rax+16], xmm0
    ret

copy_string_no_padding:
    movups xmm0, [rdi]
    movups [rax], xmm0
    mov rsi, [rdi+16]
    mov [rax+16], rsi
    mov sil, [rdi+24]
    mov [rax+24], sil
    ret
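
The word counts behind the (String, u8) argument can be checked directly. This is a sketch against current rustc: `String` being three words (pointer, capacity, length) is how the standard library is implemented today, not a layout guarantee.

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // String wraps a Vec<u8>: pointer, capacity and length, three words.
    println!("String:       {} words", size_of::<String>() / word);
    // A single trailing u8 rounds the tuple up by a whole extra word, and
    // only 1 byte of that word is data; the rest is padding.
    println!("(String, u8): {} words", size_of::<(String, u8)>() / word);
}
```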

Good point - however, I think this would still apply to a raw pointer (*mut [u8; 8]).

I definitely think there's value to allowing Rust to exploit the padding of structs. However, I think it would need to be opt-in (e.g. an attribute to disable taking a reference to a field) to preserve performance, and to avoid breaking unsafe code.

It's all the same problem, though. The thing that currently blocks it is reading and writing through references. So the solution is to disallow those references, at which point the compiler can emit whatever code it needs to in order to handle the overlap.

struct Foo { move x: u8, move y: (u8, u16) } would allow smart overlaying of those structs in the same way it would allow arithmetic coding of multiple enum fields.
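
`move` fields are hypothetical syntax, so this can't be written directly today. As a rough hand-written approximation (the struct `Outer` and its accessors are illustrative names), the inner pair below is only ever read or written by value, so no `&mut (u32, u8)` is handed out and the extra byte can sit where the inner tuple's padding would have been.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `((u32, u8), u8)` whose inner pair is only
/// accessible by value, roughly what a `move` field would enforce.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Outer {
    inner_0: u32,
    inner_1: u8,
    extra: u8,
}

impl Outer {
    /// Read the inner pair by value.
    fn inner(&self) -> (u32, u8) {
        (self.inner_0, self.inner_1)
    }

    /// Field-by-field write: never touches `extra`.
    fn set_inner(&mut self, v: (u32, u8)) {
        self.inner_0 = v.0;
        self.inner_1 = v.1;
    }
}

fn main() {
    let mut o = Outer { extra: 3, ..Default::default() };
    o.set_inner((10, 20));
    println!("{:?} -> inner {:?}, size {} bytes", o, o.inner(), size_of::<Outer>());
    println!("((u32, u8), u8): {} bytes", size_of::<((u32, u8), u8)>());
}
```

The flattened struct is 8 bytes (6 bytes of data rounded up to the 4-byte alignment) versus 12 for the nested tuple; the cost is that callers can no longer borrow the inner pair as a `&mut (u32, u8)`.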

No, it can work with references too. It would emit more code, but the RAM savings would be more than worth it.

It should be opt-in with a compiler flag. Unsafe code should be aware of it, but that's all.

This is definitely not true in general - for example, a program that performs a large number of reads/writes from a fixed number of locations in memory.

Enabling this flag would break any unsafe code that performs reads/writes through a differently typed pointer of equivalent size (e.g. *mut [u8; 8] or *mut [MaybeUninit<u8>; 8]).
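
An example of the kind of unsafe code that is sound today but would break under such a flag (the helper `bytewise_copy` is an illustrative name): it copies a value as raw `MaybeUninit<u8>` bytes, padding included.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Copies every byte of `src` into `dst`, padding included, by viewing both as
/// runs of `MaybeUninit<u8>`. This is sound today because padding is just
/// uninitialized bytes, but it writes all `size_of::<T>()` bytes, so it would
/// clobber any field that a padding-reusing layout had placed there.
fn bytewise_copy<T: Copy>(src: &T, dst: &mut T) {
    let n = size_of::<T>();
    let s = src as *const T as *const MaybeUninit<u8>;
    let d = dst as *mut T as *mut MaybeUninit<u8>;
    // SAFETY: both pointers are valid for `n` bytes, `MaybeUninit<u8>` has
    // alignment 1, the regions cannot overlap (`&T` and `&mut T` never alias),
    // and `MaybeUninit<u8>` may hold the uninitialized padding bytes.
    unsafe { ptr::copy_nonoverlapping(s, d, n) };
}

fn main() {
    let src = (7u32, 9u8);
    let mut dst = (0u32, 0u8);
    bytewise_copy(&src, &mut dst);
    println!("{:?}", dst);
}
```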
