[Pre-RFC] Grouping fields in repr(Rust) for layout control


#1

Summary

Introduce the semicolon as a group separator in #[repr(Rust)] struct definitions. Groups are stored in declaration order, while the fields inside a group may be reordered to meet alignment requirements.

Motivation

The default repr(Rust) makes no guarantees about field order, while repr(C) guarantees that all fields are stored in declaration order. C-style downcasting needs a guaranteed data layout, but may not need a guarantee as strong as the one repr(C) provides. Group separators can partially specify the struct layout as described above, so Rustaceans do not have to resort to repr(C).

Guide-level explanation

If struct Base and the leading part of struct Derived have identical layouts, pointers to Base and Derived may be cast to each other.

Example

The following is an example of a base/derived struct.

// a base struct.
struct Link {
    next : *mut Link,
    prev : *mut Link,
}

// a derived struct.
struct Node<T,W> {
    next   : *mut Link,
    prev   : *mut Link,
    flag   : u8,
    elem   : T,
    weight : W,
}

Casting between *mut Link and *mut Node requires the next and prev fields to be stored at the very beginning of both structs, which currently forces the use of repr(C).
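For reference, here is a minimal sketch of that cast as it has to be written today, with repr(C) on both structs. It assumes Rust 1.77 or later for std::mem::offset_of!, and the element types u32/u8 are arbitrary choices for illustration; it is not part of the proposal itself.

```rust
use std::mem::offset_of;
use std::ptr;

// Base struct: repr(C) fixes declaration order, so `next` and `prev`
// are the first two fields at known offsets.
#[repr(C)]
struct Link {
    next: *mut Link,
    prev: *mut Link,
}

// Derived struct: repeats the base fields as its prefix.
#[repr(C)]
struct Node<T, W> {
    next: *mut Link,
    prev: *mut Link,
    flag: u8,
    elem: T,
    weight: W,
}

fn main() {
    // The shared prefix sits at the same offsets in both types.
    assert_eq!(offset_of!(Link, next), offset_of!(Node<u32, u8>, next));
    assert_eq!(offset_of!(Link, prev), offset_of!(Node<u32, u8>, prev));

    let mut node = Node {
        next: ptr::null_mut(),
        prev: ptr::null_mut(),
        flag: 1,
        elem: 42u32,
        weight: 7u8,
    };

    // Node -> Link: view the node through its Link prefix.
    let link: *mut Link = (&mut node as *mut Node<u32, u8>).cast::<Link>();
    unsafe {
        // A one-element circular list: the node links to itself.
        (*link).next = link;
        (*link).prev = link;

        // Link -> Node: sound only because this Link is known to be a Node's prefix.
        let back = &*(*link).next.cast::<Node<u32, u8>>();
        assert_eq!(back.elem, 42);
        assert_eq!(back.flag, 1);
    }
}
```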

With group separators, the structs can be written as:

struct Link {
    next : *mut Link; // a group consisting of one field
    prev : *mut Link; // another group consisting of one field
}

struct Node<T,W> {
    next   : *mut Link; // a group consisting of one field
    prev   : *mut Link; // another group consisting of one field
    flag   : u8,
    elem   : T,
    weight : W,
}

If groups with identical field declarations are guaranteed to have identical layouts, the structs can instead be written with next and prev in a single group:

struct Link {
    next : *mut Link, // first field in the group
    prev : *mut Link; // second field in the group
}

struct Node<T,W> {
    next   : *mut Link, // first field in the group
    prev   : *mut Link; // second field in the group
    flag   : u8,
    elem   : T,
    weight : W,
}

Compiler Errors and Warnings

  • Semicolons should not be used when instantiating a struct.

    error: expected one of `,`, `.`, `?`, `}`, or an operator, found `;`
      --> main.rs:13:21
       |
    13 | n = Node{ next:a, prev:b; /* omitted */ };
       |                         ^ expected one of `,`, `.`, `?`, `}`, or an operator
    
  • Only the first group can have ZSTs.

    error: only the first group can have ZSTs.
      --> main.rs:13:21
       |
    13 | struct S { a: u8, b: bool; c: () }
       |                               ^ only the first group can have ZSTs
    
  • Only the last group can have DSTs.

    error: only the last group can have DSTs.
      --> main.rs:13:21
       |
    13 | struct S { a: isize, b: [u8]; c: String }
       |                         ^ only the last group can have DSTs
    
  • Semicolons as group separators are only applicable in repr(Rust).

    warning: layout groups are not applicable to repr(C)
      --> main.rs:13:21
       |
    13 | #[repr(C)] struct S { a: u8, b: bool; c: usize }
       |                                     ^ help: consider using ',' instead
    

Reference-level explanation

The implementation is straightforward. Rather than sorting all the fields together to meet alignment requirements, the compiler splits the fields into groups at each semicolon and sorts each group separately. The groups' layouts are then concatenated in declaration order, with padding inserted after a group where the next group's alignment requires it.
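The grouping step can be sketched as a small layout calculator. This is a simulation, not compiler code: FieldInfo and group_layout are hypothetical names, and sorting each group by descending alignment is an assumed stand-in for rustc's real (unspecified) reordering heuristic.

```rust
/// A field as the layout algorithm sees it: only size and alignment matter.
/// (Hypothetical helper type for this sketch.)
#[derive(Clone, Copy, Debug)]
struct FieldInfo {
    name: &'static str,
    size: usize,
    align: usize,
}

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Lay out `groups` in declaration order; within each group, sort fields by
/// descending alignment (an assumed heuristic, not rustc's actual one).
/// Returns (field name, offset) pairs in memory order plus the total size.
fn group_layout(groups: &[Vec<FieldInfo>]) -> (Vec<(&'static str, usize)>, usize) {
    let mut offset = 0;
    let mut max_align = 1;
    let mut placed = Vec::new();
    for group in groups {
        let mut fields = group.clone();
        // Stable sort: fields with equal alignment keep their declared order.
        fields.sort_by(|a, b| b.align.cmp(&a.align));
        for f in fields {
            // Padding before the first field of a group ends up between groups.
            offset = align_up(offset, f.align);
            placed.push((f.name, offset));
            offset += f.size;
            max_align = max_align.max(f.align);
        }
    }
    // Trailing padding so the size is a multiple of the overall alignment.
    (placed, align_up(offset, max_align))
}

fn main() {
    let ptr = |name: &'static str| FieldInfo { name, size: 8, align: 8 };
    let byte = |name: &'static str| FieldInfo { name, size: 1, align: 1 };
    // Node<usize, u8> on a 64-bit target: `next, prev;` then `flag, elem, weight`.
    let node = vec![
        vec![ptr("next"), ptr("prev")],
        vec![byte("flag"), ptr("elem"), byte("weight")],
    ];
    let (fields, size) = group_layout(&node);
    for (name, offset) in &fields {
        println!("{name}: {offset}");
    }
    println!("size = {size}");
}
```

With next and prev in a leading group, the trailing group is reordered to elem, flag, weight, and Node<usize,u8> comes out at 32 bytes for a 64-bit target. Putting every field in its own group reproduces the repr(C) order and gives 40 bytes.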

The compiler should also guarantee that identical group declarations produce identical layouts, so that a group containing more than one field is suitable for pointer casting.

As mentioned above, ZSTs are allowed only in the first group and DSTs only in the last group; otherwise a compiler error is generated.

Drawbacks

Programmers who are used to C/C++ may accidentally use semicolons as field separators, potentially making the struct larger than necessary.

Rationale and alternatives

Using repr(C), the Link/Node structs from the example section can be written in two alternative ways:

  • 1. simply adding #[repr(C)]

      #[repr(C)]
      struct Link {
          next : *mut Link,
          prev : *mut Link,
      }
    
      #[repr(C)]
      struct Node<T,W> {
          next   : *mut Link,
          prev   : *mut Link,
          flag   : u8,
          elem   : T,
          weight : W,
      }
    
  • 2. using extra struct definitions to achieve a similar result

    #[repr(C)]
    struct Link {
        next : *mut Link,
        prev : *mut Link,
    }
    
    // it's repr(Rust)
    struct Data<T, W> {
        flag   : u8,
        elem   : T,
        weight : W,
    }
    
    #[repr(C)]
    struct Node<T,W> {
        link: Link,
        data: Data<T, W>,
    }
    

The group separator proposed here has advantages over both:

  • a potentially more compact layout than alternative #1

    For example, Node<usize,u8> is larger under repr(C) because field reordering is completely disabled.

  • more concise than alternative #2

    No attribute is needed, since repr(Rust) is the default, and there is no need to define struct Data.
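The proposed syntax does not compile today, so the size advantage over alternative #1 can only be checked against the two existing representations. A quick check with std::mem::size_of, where NodeC and NodeRust are hypothetical names for the two variants; the repr(Rust) size is what current rustc produces, not a language guarantee. Under the proposal, the trailing group flag, elem, weight could be reordered in the same way as the whole of NodeRust.

```rust
use std::mem::size_of;

struct Link {
    next: *mut Link,
    prev: *mut Link,
}

// Alternative #1: repr(C) keeps every field in declaration order.
#[repr(C)]
struct NodeC<T, W> {
    next: *mut Link,
    prev: *mut Link,
    flag: u8,
    elem: T,
    weight: W,
}

// Default repr(Rust): the compiler is free to reorder all fields.
struct NodeRust<T, W> {
    next: *mut Link,
    prev: *mut Link,
    flag: u8,
    elem: T,
    weight: W,
}

fn main() {
    // repr(C): next, prev, flag, [padding], elem, weight, [padding]
    // = 5 * size_of::<usize>() (40 bytes on a 64-bit target).
    println!("repr(C):    {}", size_of::<NodeC<usize, u8>>());
    // repr(Rust) is unspecified; current rustc keeps the two u8 fields
    // together, giving 32 bytes on a 64-bit target.
    println!("repr(Rust): {}", size_of::<NodeRust<usize, u8>>());
}
```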

Prior art

<???>

#2

Maybe I’m missing something because I never do the kind of low-level programming that requires this stuff, but this seems more like an argument for guaranteed “prefix layout” of one or more fields rather than grouping arbitrary subsets of fields. Especially since Node could just contain a Link directly. We would need some kind of explicit guarantee that Link has the same layout when embedded in other structs, but that seems uncontroversial (maybe we already have it?)… at least to me. Then all you need is #[repr(prefix)] or whatever on the link: Link field, which seems much less subtle than having to worry about, say, whether Link and Node consisting of two groups each with the same fields also guarantees the same padding between those two groups. My impression is that the idea of “prefix layout” is already generally accepted as a thing we’ll probably get someday (it comes up in a lot of the stuff https://github.com/rust-lang/rfcs/issues/349 links to), though I’m not sure if it’s feasible to decouple it from all the other “virtual struct” issues.

Also, this is the unimportant bikeshed part so please don’t focus on it, but a semicolon instead of a comma is way too subtle. Attributes on the fields seem much better for layout configuration like this.


#3

The syntax is waaay too subtle.


#4

Thanks for your reply. Perhaps it’s better to search previous related discussions to get more sophisticated ideas before I post it. Seems much more things worth considering than I have thought :slight_smile:


#5

I think that if this is being used it’s probably because other nuanced unsafe code is relying on it, so it should be very obvious – not something that looks like something I typo all the time when switching between rust and languages that end fields with semicolons.

Note that Rust does have a facility for grouping fields together in layout: making another struct :smile: ((u16, u8), (u32, u8)) is a very different type from (u16, u8, u32, u8).
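To make the difference concrete, a quick check with std::mem::size_of. The sizes in the comments are what current rustc produces for these repr(Rust) tuples; the language does not guarantee them.

```rust
use std::mem::size_of;

fn main() {
    // Flat tuple: rustc may reorder all four fields (e.g. u32, u16, u8, u8),
    // so it fits in 8 bytes.
    let flat = size_of::<(u16, u8, u32, u8)>();
    // Nested tuples: each inner tuple is laid out and padded on its own,
    // so (u16, u8) takes 4 bytes and (u32, u8) takes 8, for 12 in total.
    let nested = size_of::<((u16, u8), (u32, u8))>();
    println!("flat = {flat}, nested = {nested}");
}
```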

So I think things like #[repr(linear)] (first mention I can find: https://github.com/rust-lang/rust/pull/37429#issuecomment-260080723) or #[repr(first_stays_first)] or whatever is a better way to do this kind of thing.


#6

This might be a bit of an obvious question here, but if you want to downcast from a &Derived to &Base, couldn’t the Derived struct just contain a Base then implement AsRef<Base>?

  #[repr(C)]
  struct Link {
      next : *mut Link,
      prev : *mut Link,
  }

  #[repr(C)]
  struct Node<T,W> {
      link: Link,
      flag   : u8,
      elem   : T,
      weight : W,
  }

  impl<T, W> AsRef<Link> for Node<T, W> {
      ...
  }

It seems like you can already solve this problem without needing to add a language feature or have special cases in the compiler. Why not pull the common fields out into their own struct if you need to ensure the same group of fields are laid out in the same way in different types? That way you don’t need to use #[repr(C)] and use raw pointer casts, you’d just be leveraging the existing type system (e.g. use AsRef and friends to make working with the group of fields more ergonomic).
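A runnable sketch of the embedding approach described above. The AsRef body and the is_unlinked helper are illustrative assumptions, not code from the thread; repr(C) is dropped because nothing here depends on field offsets.

```rust
// Base: the fields shared by every "derived" type.
struct Link {
    next: *mut Link,
    prev: *mut Link,
}

// Derived: embeds the base instead of repeating its fields.
struct Node<T, W> {
    link: Link,
    flag: u8,
    elem: T,
    weight: W,
}

impl<T, W> AsRef<Link> for Node<T, W> {
    fn as_ref(&self) -> &Link {
        &self.link
    }
}

// Hypothetical helper: works on anything that exposes a Link.
fn is_unlinked(item: &impl AsRef<Link>) -> bool {
    let link = item.as_ref();
    link.next.is_null() && link.prev.is_null()
}

fn main() {
    let node = Node {
        link: Link { next: std::ptr::null_mut(), prev: std::ptr::null_mut() },
        flag: 0,
        elem: "payload",
        weight: 1.5_f32,
    };
    println!("{}", is_unlinked(&node));
}
```

AsRef only goes from Node to Link; recovering a Node from a &Link still relies on a layout guarantee such as repr(C).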