PhantomData, UnsafeCell, and repr(C)


#1

PhantomData is a Zero-Sized Type. In Rust’s #[repr(C)], ZSTs remain zero-sized, so we could add PhantomData to a layout-sensitive struct without damaging the layout.

But PhantomData cannot be used in types with #[repr(C)], because PhantomData itself is not #[repr(C)]. It triggers the improper_ctypes warning.

(Side note: the fact that marking a struct #[repr(C, packed)] and including a non-C-repr member is a warning by default, and not an error, is disconcerting.)

#[repr(C)] (with or without packed) is overloaded to mean two things:

  1. Lay this out like C.
  2. Lay this out exactly as I have written it without any clever field reordering.

In embedded/driver development we use meaning 2. Under this meaning, it’s reasonable (if unusual) to want a laid-out-as-written type (such as a register set) to take an otherwise unbound type parameter, requiring the use of PhantomData. (I am attempting to use this to model the distinctions between the six different timer-counter flavors on the STM32F4, for example.)
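To make the layout claim concrete, here is a minimal sketch of a laid-out-as-written register set carrying an otherwise unbound type parameter. The marker types (Basic, Advanced) and the register names are hypothetical, not taken from the STM32F4 reference manual, and the sketch uses plain #[repr(C)] rather than packed. It only checks layout; whether improper_ctypes fires depends on the compiler version.

```rust
use core::marker::PhantomData;
use core::mem::size_of;

// Hypothetical marker types standing in for two timer flavors.
pub struct Basic;
pub struct Advanced;

// Hypothetical register set; the field names are illustrative only.
#[repr(C)]
pub struct TimerRegisters<Flavor> {
    pub cr1: u32,
    pub cr2: u32,
    pub cnt: u32,
    // Zero-sized marker that binds the otherwise unused type parameter.
    _flavor: PhantomData<Flavor>,
}

// The marker adds no bytes: every flavor occupies exactly three 32-bit words.
const _: () = assert!(size_of::<TimerRegisters<Basic>>() == 12);
const _: () = assert!(size_of::<TimerRegisters<Advanced>>() == 12);
```

The const assertions fail at compile time if the marker ever started to affect the size, which is the property the layout-as-written meaning relies on.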

Because PhantomData is a lang item, I can’t define my own equivalent of it bearing #[repr(C)] (and still link with core).

This same set of problems applies to UnsafeCell, something that comes up in discussions of the right way to model “volatile” register accesses in Rust (e.g. in an earlier thread). UnsafeCell is not #[repr(C)], so it can’t be used to safely model parts of layout-sensitive aggregate types. What do I mean by this? Here is a lightly contrived example derived from @huon’s suggestions from that thread:

#[repr(C, packed)]
struct Volatile<T> {
  val: UnsafeCell<T>,
}

impl<T> Volatile<T> {
  pub fn get(&self) -> T { ... }
  pub fn set(&self, v: T) { ... }
}

#[repr(C, packed)]
struct Registers {
  a: Volatile<u32>,
  b: Volatile<u32>,
  ...
}
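For readers who want to try this, here is one way the elided bodies could be filled in. It is a sketch under assumptions, not necessarily what the thread had in mind: it uses core::ptr::read_volatile and write_volatile, adds a hypothetical new constructor and a T: Copy bound, and swaps #[repr(C, packed)] for #[repr(transparent)] and #[repr(C)], because recent compilers restrict taking references to fields of packed structs.

```rust
use core::cell::UnsafeCell;
use core::ptr;

// Assumption: repr(transparent) instead of repr(C, packed), see the note above.
#[repr(transparent)]
pub struct Volatile<T> {
    val: UnsafeCell<T>,
}

impl<T: Copy> Volatile<T> {
    // Hypothetical constructor, only needed to exercise the type off-target.
    pub const fn new(v: T) -> Self {
        Volatile { val: UnsafeCell::new(v) }
    }

    pub fn get(&self) -> T {
        // SAFETY: the pointer comes from a live UnsafeCell, so it is valid and aligned.
        unsafe { ptr::read_volatile(self.val.get()) }
    }

    pub fn set(&self, v: T) {
        // SAFETY: same pointer as above; UnsafeCell permits writes through &self.
        unsafe { ptr::write_volatile(self.val.get(), v) }
    }
}

// Only the two fields named in the example; the elided ones are left out.
#[repr(C)]
pub struct Registers {
    pub a: Volatile<u32>,
    pub b: Volatile<u32>,
}
```

On real hardware the Registers value would not be constructed at all; a reference to it would be obtained from a fixed address or a linker-provided symbol.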

UnsafeCell is more complicated than PhantomData because its ability to be #[repr(C)] depends on whether its type parameter is #[repr(C)].

Like PhantomData, UnsafeCell is a lang item, so I can’t replace it (at least if I want to link with core).

So:

  • Is there a way we could enable the use of phantom type parameters in #[repr(C)] types? For example, could we mark PhantomData as #[repr(C)] – would this hurt anything?

  • Should there be a way to indicate conditional #[repr(C)] that can degrade, e.g. “UnsafeCell<T> is #[repr(C)] iff T is #[repr(C)]”? Sort of like noexcept in modern C++. (In practice, improper_ctypes being a warning rather than an error by default almost achieves this for all uses of #[repr(C)]… but that’s not the right answer.)


#2

The point of #[repr(C)] and improper_ctypes is that you are supposed to put a clear boundary between Rust land and C land. That PhantomData is zero-sized and that UnsafeCell contains only one member are supposed to be implementation details. Reading from C a struct that contains anything other than pointers and numbers should be undefined.

Of course I’m inventing a bit here, because Rust doesn’t have well-defined rules about what is safe and unsafe in unsafe land, but what I described is as far as I know the intention behind this design.

For example in your situation you can use UnsafeCell<Registers> instead of putting the UnsafeCell inside Registers.
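A rough sketch of what that nesting might look like, with the UnsafeCell on the outside and a wrapper struct hiding it. The RegisterBlock name and the read_a/write_a accessors are invented for illustration, and the volatile accesses use core::ptr functions as an assumption about how the registers would be touched.

```rust
use core::cell::UnsafeCell;
use core::ptr;

// Plain-data register layout: only numbers, laid out as written.
#[repr(C)]
pub struct Registers {
    pub a: u32,
    pub b: u32,
}

// Hypothetical wrapper that keeps UnsafeCell's own operations private.
#[repr(transparent)]
pub struct RegisterBlock {
    inner: UnsafeCell<Registers>,
}

impl RegisterBlock {
    pub const fn new(regs: Registers) -> Self {
        RegisterBlock { inner: UnsafeCell::new(regs) }
    }

    pub fn read_a(&self) -> u32 {
        let p = self.inner.get();
        // SAFETY: p points at a live Registers; addr_of! avoids creating a reference.
        unsafe { ptr::read_volatile(ptr::addr_of!((*p).a)) }
    }

    pub fn write_a(&self, v: u32) {
        let p = self.inner.get();
        // SAFETY: as above; UnsafeCell permits mutation through &self.
        unsafe { ptr::write_volatile(ptr::addr_of_mut!((*p).a), v) }
    }
}
```

Note that every register needs its own accessor pair here, so the volatile access logic is repeated per field rather than written once.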


#3

Ask for the reason, and if it’s not good enough, ask for an ER.


#4

As I noted, #[repr(C)] is currently overloaded. My program in question contains no C. It does, however, contain memory-mapped register banks laid out in a particular order. #[repr(C)] is the only way to ensure that struct members are laid out as-written.

Including a PhantomData should not affect the layout/size of the struct, and is the only way to include an otherwise unbound type parameter. But it strips #[repr(C)].

…no, wrapping the whole thing as UnsafeCell<Registers> just moves the improper_ctypes warning there. A struct containing UnsafeCell and an UnsafeCell containing a struct are both improper ctypes.

Edit: Okay, I should go into more detail on why I’m not happy with this option.

  1. The registers are eventually going to be represented as a #[no_mangle] extern static, because that’s how I expose a symbol to the linker script. I could suppress the improper_ctypes warning on this static item. I’m uncomfortable about that, because nothing about the current definition of UnsafeCell seems to prevent it growing a field or alignment constraint that would break my memory layout – it is not #[repr(C, packed)]. So the warning is actually doing its job.

  2. From a separation-of-concerns perspective, modeling registers using a VolatileCell type that wraps UnsafeCell and handles the unsafe volatile accesses is really quite pleasant, and is far less error-prone than nesting things the other way – a struct within an UnsafeCell, which would presumably need to be nested in another struct to keep clients from using UnsafeCell's operations directly. With the VolatileCell approach, one only has to review a few lines of code to ensure that all register accesses are using the proper volatile access intrinsics.
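One way to guard against the layout worry in point 1 while keeping the VolatileCell structure from point 2 is to assert the layout at compile time. This is only a sketch: the u32 fields and the expected sizes are assumptions for a two-register bank, and the accessors are omitted to keep the focus on the checks.

```rust
use core::cell::UnsafeCell;
use core::mem::{align_of, size_of};

// Wrapper around UnsafeCell; accessors omitted, only the layout matters here.
#[repr(transparent)]
pub struct VolatileCell<T> {
    val: UnsafeCell<T>,
}

// Assumed two-register bank of 32-bit registers.
#[repr(C)]
pub struct Registers {
    pub a: VolatileCell<u32>,
    pub b: VolatileCell<u32>,
}

// If the cell ever grew a field or a stricter alignment, the build would fail
// here instead of silently shifting the memory map.
const _: () = assert!(size_of::<VolatileCell<u32>>() == size_of::<u32>());
const _: () = assert!(align_of::<VolatileCell<u32>>() == align_of::<u32>());
const _: () = assert!(size_of::<Registers>() == 8);
```

This does not silence improper_ctypes, but it turns a silent layout change into a compile error, so suppressing the warning on the static becomes less uncomfortable.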

As for asking for the reason: fair enough! What is the reason?