Feature Idea: Allow reducing the alignment

Various discussions about memory layout and types with larger alignment (generic integers), along with a few large-alignment types in our own codebase, got me thinking more about alignment and memory layout.

Feature suggestion: Re-align

#[repr(align(512))]
struct Large(u8, u64);

enum Command {
    SomethingElse(u8),
    LargeVariant(#[realign(1)] Large),
}

impl Large {
    #[min_align(1)]
    fn access_u8(&self) -> u8 {
        self.0
    }
    #[min_align(8)]
    fn access_u64(&self) -> u64 {
        self.1
    }
    #[min_align(8)]
    fn write_u64(&mut self) {
        self.1 = 5;
    }
    /// By default, only allow this function under the "real"
    /// alignment (to avoid breaking things).
    fn something_that_needs_high_alignment(&self) {
    }
}

Instead of forcing the alignment onto the Command type, this would require moving the bytes to an aligned memory location (e.g. on the stack) before calling functions that require a higher alignment:

fn do_something(cmd: Command) {
    if let Command::LargeVariant(value) = cmd {
        value.access_u8(); // Allowed because it is already aligned
        value.align().access_u64(); // Allowed by moving the required bytes (or the entire thing)
        value.aligned(|v| v.access_u64()); // Alternative syntax
        value.aligned_mut(|v| v.write_u64()); // Copies the bytes back afterwards (v is &mut Large with alignment of 8 or higher)
        value.aligned(|v| v.something_that_needs_high_alignment()); // Needs the full alignment of 512
    }
}

aligned and aligned_mut may need some way to specify which alignment is required for the code running in its closure (or what can be called on it afterwards).

Why would/could this be useful? It makes it possible (at the cost of copying bytes around) to reduce the alignment of a type without having to put it on the heap (though you might still want to do that). It would also let functions that do not require the higher alignment be called on lower-aligned values (for example when none of the high-alignment functions are needed/used).

Motivation

Let's say I have a type with a large alignment (and no invalid bit patterns) and want to use it in the context of another struct/enum:

#[repr(align(512))]
struct Large(u8);

enum Command {
    SomethingElse(u8),
    LargeVariant(Large),
}

Due to the alignment restriction Command has a size of 1024 bytes. The same goes for something simple like Option<Large>.
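The numbers above can be checked with a small sketch. The `layout` helper is a name made up for illustration, and the exact enum size is what current rustc produces rather than something the language guarantees for the default representation.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(align(512))]
struct Large(u8);

#[allow(dead_code)]
enum Command {
    SomethingElse(u8),
    LargeVariant(Large),
}

/// Hypothetical helper: returns (size, alignment) of `T` in bytes.
fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// Collects the layouts discussed in the post.
fn report() -> [(&'static str, (usize, usize)); 3] {
    [
        // The size is rounded up to the alignment, so one payload byte costs 512.
        ("Large", layout::<Large>()),
        // The discriminant needs its own slot, which is again rounded up to 512.
        ("Command", layout::<Command>()),
        // `Large` offers no niche for `None`, so `Option` pays the same price.
        ("Option<Large>", layout::<Option<Large>>()),
    ]
}
```

Printing the entries of `report()` shows where the second 512-byte block comes from: the discriminant cannot share space with the payload.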

The best way to solve this (as far as I know) is to put the type with a large alignment into a Box:

enum Command2 {
    SomethingElse(u8),
    LargeVariant(Box<Large>),
}

This reduces the total size to 512+16 bytes (512+8 for Option<Box<Large>>) and the alignment of the enum to 8.

For the code where you need this alignment there is no way around that (as long as size must be a multiple of alignment), but I think there is a better option, especially when you don't need the alignment requirement for everything (in our codebase we need it only in one place):

// Conceptually
enum Command3 {
    SomethingElse(u8),
    LargeVariant([u8; 512]),
}

This drops the alignment down to 1 and the size to 513 bytes, at the cost of having to copy the memory when the alignment is required.
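A minimal sketch of that trade-off, assuming the payload can be treated as plain bytes. `Aligned`, `with_aligned` and `is_aligned` are hypothetical names, and `Aligned` stands in for `Large` here because `Large(u8)` contains padding that cannot be safely viewed as a byte array.

```rust
/// Hypothetical 512-byte-aligned scratch type standing in for `Large`.
#[repr(align(512))]
struct Aligned([u8; 512]);

#[allow(dead_code)]
enum Command3 {
    SomethingElse(u8),
    LargeVariant([u8; 512]),
}

/// Copies the bytes into an aligned stack slot, runs `f` on it,
/// then copies the (possibly modified) bytes back.
fn with_aligned<R>(bytes: &mut [u8; 512], f: impl FnOnce(&mut Aligned) -> R) -> R {
    let mut tmp = Aligned(*bytes); // first copy: into aligned storage
    let result = f(&mut tmp);
    *bytes = tmp.0; // second copy: back into the byte array
    result
}

/// Reports whether the reference really sits on a 512-byte boundary.
fn is_aligned(a: &Aligned) -> bool {
    (a as *const Aligned as usize) % 512 == 0
}
```

Each call pays for two 512-byte copies, which is the cost mentioned above; a read-only variant would need only the first one.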

godbolt (size+alignment computation)

I feel like this can just be done in user code (by creating a second struct that mirrors Large but with an appropriate packed representation added), and most of the ergonomics you present here (like all those align, aligned, aligned_mut methods) seem possible to achieve using a macro, don't they?
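A hand-written sketch of what such a mirror struct could look like, without the macro. `LargePacked` and its `aligned`/`aligned_mut` methods are hypothetical names chosen to match the proposal; nothing here is an existing API.

```rust
#[derive(Clone, Copy)]
#[repr(align(512))]
struct Large(u8, u64);

/// Hypothetical mirror of `Large`: same fields, alignment forced down to 1.
#[derive(Clone, Copy)]
#[repr(packed)]
struct LargePacked(u8, u64);

impl From<Large> for LargePacked {
    fn from(l: Large) -> Self {
        LargePacked(l.0, l.1)
    }
}

impl From<LargePacked> for Large {
    fn from(p: LargePacked) -> Self {
        // Reading packed fields by value is fine; taking references is not.
        Large(p.0, p.1)
    }
}

impl LargePacked {
    /// Runs `f` on a properly aligned temporary copy.
    fn aligned<R>(&self, f: impl FnOnce(&Large) -> R) -> R {
        let tmp = Large::from(*self);
        f(&tmp)
    }

    /// Like `aligned`, but copies the fields back afterwards.
    fn aligned_mut<R>(&mut self, f: impl FnOnce(&mut Large) -> R) -> R {
        let mut tmp = Large::from(*self);
        let result = f(&mut tmp);
        *self = LargePacked::from(tmp);
        result
    }
}

#[allow(dead_code)]
enum Command {
    SomethingElse(u8),
    LargeVariant(LargePacked),
}
```

The conversion is field by field, so it has to be kept in sync with `Large` manually, which is the part a macro would presumably generate.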


The problem with this is that assigning to a realign-ed value would be non-trivial.

Someone will want to assign a 512-byte-aligned Large to a Command::LargeVariant. That would require a non-trivial conversion in certain cases.


You can make a straightforward wrapper in plain code... except for internal mutability causing issues. (I wonder if the built-in derives handle this...)

#[derive(Copy, Clone, …)]
#[repr(packed(8))]
struct Realign8<T>(pub T);

impl<T> Realign8<T> {
    pub fn into_inner(self) -> T {
        self.0
    }

    pub fn with_ref<R>(&self, f: impl FnOnce(&T) -> R) -> R
    where
        T: Freeze,
    {
        let it = ManuallyDrop::new(unsafe { ptr::from_ref(self).read().0 });
        f(&it)
    }

    pub fn with_mut<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut it = guard(unsafe { ptr::from_ref(self).read().0 }, |it| {
            unsafe { ptr::from_mut(self).write(Realign8(it)) };
        });
        f(&mut it)
    }
}

EDIT: I think I fixed the soundness issues.


reminder that size must be a multiple of alignment

That looks unsound in cases where T is aligned more than 8?


Ah whoops, I meant to use the *_unaligned pointer methods.

Yes? How is that relevant here? The wrapper sets the alignment to 8 independent of the field's size/align and only provides access by temporarily moving the field to the stack, where it is properly aligned (with any trailing padding that might not be present in the wrapper). There is no attempt to avoid the copy to the stack when the value is coincidentally aligned, which would be potentially unsound.

…Wait, even with *_unaligned, read/writing T could write more bytes than Realign8<T>; the write needs to happen with type Realign8<T> instead to be sound. That's a good catch. (The compiler will handle that automatically for direct place assignments but can't when going through a pointer, which is needed in this case to avoid dropping the stale value.)

(sorry for triple post)

Alright, I think I fixed the soundness issues in that example.

Actually, no, the size of T is still the size of T even when its alignment has been forced lower by #[repr(packed)]. Using *_unaligned pointer access should be sufficient.

But I actually think doing the access with type Realign8<T> looks a little cleaner, and it makes it more obvious to optimizations that an alignment of 8 can be used for the load/store. It likely doesn't matter either way and other incidental changes likely cause a greater impact to compile time, so I guess it's just a style choice in the end. (And the only reason I choose to load as Realign8<T> in the end is that the rustfmt result is cleaner; usually I'd prefer to avoid having any state where an unwind would be unsound by loading the dupe value already protected by ManuallyDrop.)
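For comparison, here is a simplified sketch of the same idea restricted to `T: Copy`. That restriction is a deliberate simplification and not what the wrapper above does: it removes the drop and unwind concerns, so plain by-value field access (which the compiler lowers to unaligned loads and stores) is enough and no `unsafe` is needed. `Realign8Copy` is a hypothetical name, and the sketch assumes that instantiating a generic packed wrapper with an over-aligned type is accepted, as the wrapper above also does.

```rust
#[derive(Clone, Copy)]
#[repr(align(512))]
struct Large(u8, u64);

/// Hypothetical `Copy`-only variant of the wrapper: alignment is capped at 8.
#[repr(packed(8))]
struct Realign8Copy<T: Copy>(T);

impl<T: Copy> Realign8Copy<T> {
    fn new(value: T) -> Self {
        Realign8Copy(value)
    }

    /// Copies the field into an aligned local and lends it to `f`.
    fn with_ref<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        let it = self.0; // by-value read, no reference to the packed field
        f(&it)
    }

    /// Same as `with_ref`, then stores the local back into the wrapper.
    fn with_mut<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut it = self.0;
        let result = f(&mut it);
        // If `f` panics, this store is skipped; harmless because `T: Copy`.
        self.0 = it;
        result
    }
}
```

A wrapper built with `Realign8Copy::new(Large(1, 2))` can then be updated with `with_mut(|l| l.1 = 5)` while only requiring 8-byte alignment in storage.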

The built-in derives require the fields of #[repr(packed)] types to be Copy, which (at least currently) prevents them from containing shared mutability. Given the number of times I've seen "could Cell be Copy" discussed, seen the answer be "yes, but it'd be a footgun like with iterators", and not seen this implication of Copy: Freeze mentioned, the logic seems slightly fragile, but that's at least how it stands currently.


I'd say if you can't afford to waste an extra 512 bytes on the stack due to overalignment, then you likely can't afford to waste the 513 bytes required by the packed enum either. And why exactly is your struct aligned to 512 bytes? This seems like a very odd requirement. Typically you either require extra alignment due to hardware restrictions (e.g. align(16) to use SSE2 instructions efficiently), or, rarely, you align to cache line or page size. Your example is none of those, and page alignment won't work with your approach anyway.

You'd also waste quite a lot of cycles copying those big structures around, and you'd probably copy a lot due to misalignment. So, what exactly are you trying to do here?

Note that you can already achieve all your goals via wrapper types with #[repr(packed)], which is enough for occasional use.

You're probably right. Even though there isn't much copying involved in my case it likely still isn't worth inflating the size of the enum in the first place (which means using a Box regardless).

It is. The ENCLU(EREPORT) instruction on x86 requires an alignment of 512 bytes (Intel® Software Guard Extensions Programming Reference section 2.15).

I'm effectively calling this instruction once and storing the result somewhere (the reads don't require this high of an alignment). That's why this alignment only matters once and why I was more concerned about the memory footprint+alignment of the containing structure.

True, I hadn't thought about #[repr(packed)] (or manually copying the bytes using unsafe into a [u8; 512]).


I see. In that case I would probably introduce ad-hoc structures specifically for calling that instruction. Something like

#[repr(align(512))]
struct Ereport([u8; EREPORT_SIZE]);

Afterwards you can just pass the contents around as a byte array.
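A rough sketch of how that could fit together. The value of `EREPORT_SIZE` is a placeholder assumption (the thread does not give it), and `fill_report` is a hypothetical stand-in for the code that actually executes the instruction; here it just writes a byte pattern.

```rust
/// Placeholder value, assumed for illustration only.
const EREPORT_SIZE: usize = 432;

/// Aligned buffer that exists only around the call that needs the alignment.
#[repr(align(512))]
struct Ereport([u8; EREPORT_SIZE]);

/// Hypothetical stand-in for invoking the instruction on an aligned buffer.
fn fill_report(out: &mut Ereport) {
    for (i, byte) in out.0.iter_mut().enumerate() {
        *byte = i as u8; // dummy pattern instead of real report data
    }
}

/// Produces the report in aligned storage and hands it out as plain bytes.
fn get_report() -> [u8; EREPORT_SIZE] {
    let mut buf = Ereport([0; EREPORT_SIZE]);
    assert_eq!(&buf as *const Ereport as usize % 512, 0);
    fill_report(&mut buf);
    buf.0 // moving the array out drops the alignment requirement
}

/// The containing enum now stores only the byte array.
#[allow(dead_code)]
enum Command {
    SomethingElse(u8),
    Report([u8; EREPORT_SIZE]),
}
```

With this split, only the short-lived `Ereport` local carries the 512-byte alignment, and `Command` stays at alignment 1.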
