Exploit the padding?

Soni · June 27, 2021, 1:45am

Types like (u32, u8) have padding. Types like u8 have alignment of 1. Why not make ((u32, u8), u8) the same size as (u32, u8) by using the padding? It would lower RAM usage by a lot.
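
For concreteness, here is a minimal sketch of the sizes in question. It assumes a typical target where u32 is 4-byte aligned. Rust does not formally guarantee tuple layout, so the numbers reflect what current rustc does in practice, not a language promise. The helper names are made up for illustration.

```rust
use std::mem::{align_of, size_of};

/// Size in bytes of the inner tuple from the post.
pub fn inner_size() -> usize {
    size_of::<(u32, u8)>()
}

/// Alignment in bytes of the inner tuple.
pub fn inner_align() -> usize {
    align_of::<(u32, u8)>()
}

/// Size in bytes of the nested tuple from the post.
pub fn outer_size() -> usize {
    size_of::<((u32, u8), u8)>()
}

/// Bytes of the inner tuple that are not occupied by either field.
pub fn inner_padding() -> usize {
    inner_size() - size_of::<u32>() - size_of::<u8>()
}
```

On x86-64 the inner tuple should come out as 8 bytes with 3 bytes of padding, and the nested tuple as 12 bytes, because the extra u8 gets its own slot and the total is rounded up to the 4-byte alignment.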

This would make tuples not safe to memcpy, but thankfully nobody does that...

(... right?)

cuviper · June 27, 2021, 2:23am

Every move is effectively a memcpy, including assignment, even though it doesn't have to be a literal call to libc::memcpy.

If you pass a &mut (u32, u8) somewhere, it has to be safe to write that without knowing outer context, like whether there might be data lurking in the padding.
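
A small sketch of that situation, with hypothetical function names: the callee only sees a &mut (u32, u8) and has no way to know whether it points into a larger tuple.

```rust
/// Overwrites the pointee as a whole. The callee sees only the inner type,
/// so the compiler is free to implement this as one 8-byte store that also
/// covers the padding bytes.
pub fn overwrite(dst: &mut (u32, u8), src: (u32, u8)) {
    *dst = src;
}

/// Calls `overwrite` on the inner part of a nested tuple.
pub fn demo() -> ((u32, u8), u8) {
    let mut outer = ((1u32, 2u8), 3u8);
    overwrite(&mut outer.0, (10, 20));
    outer
}
```

Today the trailing u8 of the outer tuple survives the call because it lives outside the inner tuple's 8 bytes. If it were stored in the inner tuple's padding, a whole-value store in overwrite could clobber it.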

quinedot · June 27, 2021, 3:07am

scottmcm · June 27, 2021, 3:48am

Also Towards even smaller structs - #3 by scottmcm

Soni · June 27, 2021, 12:14pm

The padding isn't currently defined as being copied.

In particular, rustc would never be able to emit writes to padding.

Soni · June 27, 2021, 12:15pm

That's not related. That's about enums, this is about padding.

Soni · June 27, 2021, 12:23pm

The "downside" can be solved with an optimization target: an "-Olowmem" equivalent, maybe. (Yeah, changing struct sizes based on optimization targets is kinda weird, but only kinda. Maybe some other codegen flag, though?) It's also possible the gains in memory usage could increase performance beyond the losses in code size, so that's also something to keep in mind.

Use-cases include embedded systems and web browsers.

hyeonu · June 27, 2021, 1:41pm

Why not? Reading from padding bytes will produce uninitialized bytes anyway. GCC, at least, emits code that writes to the padding.

(In case you are confused about why I shared C code here: the Rust object/memory model is not formally specified yet, but many of us believe it will inherit "what C does" with modifications. So it makes sense to assume "what C does" for cases like this, where we haven't made many decisions yet.)

Aaron1011 · June 27, 2021, 2:12pm

Memory in Rust (AFAIK) is not typed - only accesses are typed. So, it's legal for unsafe code to transmute a &mut (u32, u8) to a &mut [u8; 8], and write through the new pointer.

Also, as a practical consideration: forbidding writes that affect padding would mean that Rust can never emit a 64-bit move for a write to a &mut (u32, u8). Instead, it would need to emit a 32-bit move and an 8-bit move (so that it doesn't touch the three bytes of padding).

hyeonu · June 27, 2021, 2:35pm

No, it's not. &mut T is invalid if it points to an invalid T, and u8 is invalid if uninitialized. It is mentioned that the padding between fields is considered uninitialized.

https://doc.rust-lang.org/reference/behavior-considered-undefined.html

cuviper · June 27, 2021, 2:42pm

It's not about what you, the user of the inner (u32, u8), may read. It's a problem if your write clobbers the extra data of someone's outer ((u32, u8), u8) which occupied that padding.

elichai2 · June 27, 2021, 2:55pm

FYI, this was already proposed: Idea: guaranteed-zero padding for niches · Issue #70230 · rust-lang/rust · GitHub

elichai2 · June 27, 2021, 2:56pm

We can just define that:

let a: &mut (u32, u8);
*a = b;

will de-sugar to:

(*a).0 = b.0;
(*a).1 = b.1;
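
As a rough sketch of the two forms side by side (function names are hypothetical, and this only shows the source-level meaning, not what any proposal would require of codegen):

```rust
/// Whole-value assignment: the compiler may copy all 8 bytes at once.
pub fn assign_whole(dst: &mut (u32, u8), src: (u32, u8)) {
    *dst = src;
}

/// Field-by-field assignment: only the bytes of the two fields are named,
/// so padding bytes never need to be written.
pub fn assign_fieldwise(dst: &mut (u32, u8), src: (u32, u8)) {
    dst.0 = src.0;
    dst.1 = src.1;
}
```

Both leave the destination with the same field values. They differ only in whether the padding bytes may be touched, which is exactly the part the proposal would have to pin down.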

Soni · June 27, 2021, 3:19pm

That's about enums. This is about structs.

Guaranteed zero-padding makes padding unusable for extra fields. On the other hand, guaranteed not-touch-the-padding makes the padding usable for extra fields, which may include but is not limited to the enum tag.

InfernoDeity · June 27, 2021, 3:46pm

That may be less efficient than what can already be generated. For (u32, u8) it might be fine (since that's a 32-bit read/write plus an 8-bit read/write), but for, say, (String, u8), which is 4 usize wide, it could easily be worse to emit a memcpy for only the first 3 words and then an 8-bit copy, rather than simply a 4 * size_of::<usize>() memcpy. The latter can just be one AVX2 move or two SSE moves (and only 1 on ix86), versus one SSE move, one qword move, and one byte move. Example codegen (edit: fixed to use Intel syntax instead of AT&T, but not actually):

// Rust signature: extern"sysv64" fn((String,u8))->(String,u8)
copy_string_with_padding_sse:
    movups xmm0, [rdi]
    movups [rax],xmm0
    movups xmm0,[rdi+16]
    movups [rax+16],xmm0
    ret

copy_string_no_padding:
    movups xmm0,[rdi]
    movups [rax],xmm0
    mov rsi,[rdi+16]
    mov [rax+16],rsi
    mov sil,[rdi+24]
    mov [rax+24],sil
    ret

Aaron1011 · June 27, 2021, 4:07pm

Good point - however, I think this would still apply to a raw pointer (*mut [u8; 8]).

Aaron1011 · June 27, 2021, 4:27pm

I definitely think there's value to allowing Rust to exploit the padding of structs. However, I think it would need to be opt-in (e.g. an attribute to disable taking a reference to a field) to preserve performance, and to avoid breaking unsafe code.

scottmcm · June 27, 2021, 4:34pm

It's all the same problem, though. The thing that currently blocks it is reading and writing through references. So the solution is to disallow those references, at which point the compiler can emit whatever code it needs to in order to handle the overlap.

struct Foo { move x: u8, move y: (u8, u16) } would allow smart overlaying of those structs in the same way it would allow arithmetic coding of multiple enum fields.
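
The move-field syntax above is not real Rust. As a rough approximation of the effect, here is a hand-written overlay that only hands out values, never references to y. The type and method names are invented for this sketch, and #[repr(C)] is used only to make the field order predictable.

```rust
use std::mem::size_of;

/// Ordinary layout: `y` keeps its own padding byte, so `x` needs extra space.
pub struct Plain {
    pub x: u8,
    pub y: (u8, u16),
}

/// Hand-written overlay: `x` sits where the padding byte of `y` would be.
#[repr(C)]
pub struct Overlaid {
    y0: u8,
    x: u8,
    y1: u16,
}

impl Overlaid {
    pub fn new(x: u8, y: (u8, u16)) -> Self {
        Overlaid { y0: y.0, x, y1: y.1 }
    }

    /// Returns `x` by value.
    pub fn x(&self) -> u8 {
        self.x
    }

    /// Returns `y` by value; no `&(u8, u16)` is ever exposed.
    pub fn y(&self) -> (u8, u16) {
        (self.y0, self.y1)
    }

    /// Replaces `y` field by field, leaving `x` untouched.
    pub fn set_y(&mut self, y: (u8, u16)) {
        self.y0 = y.0;
        self.y1 = y.1;
    }
}

pub fn plain_size() -> usize {
    size_of::<Plain>()
}

pub fn overlaid_size() -> usize {
    size_of::<Overlaid>()
}
```

Since nobody can obtain a reference to y as a whole, no whole-value store to y can happen behind the struct's back. That is the property the move fields would be meant to enforce.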

Soni · June 27, 2021, 5:13pm

No, it can work with references too. It would emit more code, but the RAM savings would be more than worth it.

It should be opt-in with a compiler flag. Unsafe code should be aware of it, but that's all.

Aaron1011 · June 27, 2021, 6:00pm

This is definitely not true in general - for example, a program that performs a large number of reads/writes from a fixed number of locations in memory.

Enabling this flag would break any unsafe code that performs any reads/writes through a differently typed pointer of equivalent size (e.g. *mut [u8; 8] or *mut [MaybeUninit<u8>; 8]).
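
A minimal sketch of the kind of unsafe code meant here, assuming (u32, u8) is 8 bytes on the target, which the function checks at runtime. The function name is hypothetical.

```rust
use std::mem::size_of;

/// Zeroes a `(u32, u8)` by writing all 8 of its bytes through a
/// differently typed raw pointer, padding included.
pub fn zero_via_bytes(p: &mut (u32, u8)) {
    // The cast below is only meaningful if the two types have the same size.
    assert_eq!(size_of::<(u32, u8)>(), size_of::<[u8; 8]>());
    let raw = p as *mut (u32, u8) as *mut [u8; 8];
    // SAFETY: `raw` comes from a unique reference to 8 writable bytes,
    // `[u8; 8]` has alignment 1, and all-zero bytes are a valid `(u32, u8)`.
    unsafe { raw.write([0u8; 8]) };
}
```

This is fine today because the padding belongs to nobody else. If an enclosing type kept a field in that padding, the same write would silently overwrite it.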
