Currently, an enum type is always statically sized. I propose to allow dynamically sized enum types. This proposal does not allow recursive enum (aka. infinitely sized) types. I use terminology from RFC #2580: Pointer metadata & VTable.
The strategy is to put the discriminant in the pointer metadata if the enum type is dynamically sized. In other words, a pointer of a dynamically sized enum type is a wide pointer. In this way, the subtleties of thin DSTs are circumvented.
The unsizing coercion cannot modify the value behind the reference. Thus, the value representation of the dynamically sized enum and statically sized enum must be identical. Because a statically sized enum value has no metadata, the discriminant has to be stored in the enum value. Would that mean the discriminant is duplicated in both metadata and value? Partially yes, but actually no. Let me explain.
The critical idea is to put the enum tag at the end of an enum value instead of the beginning. The enum tag is "forgotten" when coerced to the dynamically sized enum value, and the metadata is now used when matching the enum value. In this way, the unsizing coercion is always just a transmute for the pointer. It is even possible to construct an enum value by allocating just the variant size, not the maximum size of all variants!
Edit: it turns out this design is inheritly incompatible with CoerceUnsized
because the enum tag cannot be extracted from a possiblely dangling pointer. It should be possible to unsize a reference or a Box
(original layout has to be saved), but doesn't work for a Weak
(because it can be dangling).
Reference-level explanation
Consider this enum:
enum Either<L: ?Sized, R: ?Sized> {
Left(L),
Right(R),
}
enum Foo<T: ?Sized> {
First,
Second(String, T),
}
What would be the Pointee::Metadata
of Either<L, R>
? When the type is dynamically sized (either L
or R
is a DST), it should be isomorphic to Either<L::Metadata, R::Metadata>
.
However, note that the metadata of Foo<DynT>
is not Foo<DynT::Metadata>
, because the String
field of the Second
variant is not needed. The metadata is isomorphic to Option<DynT::Metadata>
.
Also, note that the metadata type should derive the needed trait implementations (Copy + Send + Sync + Ord + Hash + Unpin
) required by the trait bound on the Metadata
, even if the underlying type (Either
or Foo
) doesn't implement those traits.
A possible representation of the type Either<L,R>
is the union of following types:
#[repr(C)]
struct Left {
value: L,
pad: [u8; max(0, size_of_R - size_of_L)],
tag: u8 = 0
}
#[repr(C)]
struct Right {
value: R,
pad: [u8; max(0, size_of_L - size_of_R)],
tag: u8 = 1
}
Then, the size (not counting alignment padding) of Either<L, R>
is max(size_of_L, size_of_R) + 1
.
When a reference of &Either<L, R>
is coerced to &Either<DynL, DynR>
, the pointer is identical, but now the padding and the enum tag are not included in the size. That is, size_of_val(&Either::Left<L>(x) as &Either<DynL, DynR>)
is size_of_L
.
If the metadata type is represented in a Rust code, except that the implementation Pointee
of is language-level, it would be:
#![feature(ptr_metadata)]
#![feature(unsize)]
use std::{ptr::{self, Pointee}, alloc::Layout, marker::Unsize};
enum Either<L, R> {
Left(L),
Right(R),
}
#[derive(Clone, Copy, Ord, Hash)]
enum EitherMetadata<MetaL, MetaR> {
Left(MetaL),
Right(MetaR),
}
impl<L: ?Sized, R: ?Sized> Pointee for Either<L, R> {
type Metadata = EitherMetadata<<L as Pointee>::Metadata, <R as Pointee>::Metadata>;
}
// Specialization is needed when the type is statically sized.
// Note: `EitherMetadata<(), ()>` is not isomorphic to `()`.
impl<L, R> Pointee for Either<L, R> {
type Metadata = ();
}
// Similar to [DynMetadata in std::ptr - Rust](https://doc.rust-lang.org/nightly/std/ptr/struct.DynMetadata.html).
impl<MetaL, MetaR> EitherMetadata<MetaL, MetaR> {
pub fn layout(self) -> Layout {
match self {
EitherMetadata::Left(meta_left) => meta_left.layout(),
EitherMetadata::Right(meta_right) => meta_right.layout(),
}
}
// Similarly,
// pub fn size_of(self) -> usize;
// pub fn align_of(self) -> usize;
}
// The implemetation of the unsizing coercion
impl<L, R> Either<L, R> {
pub fn as_unsize<DynL: ?Sized, DynR: ?Sized>(&self) -> &Either<DynL, DynR>
where L: Unsize<DynL>, R: Unsize<DynR> {
let meta = match *self {
Either::Left(ref left) => EitherMetadata::Left(ptr::metadata(left as &DynL)),
Either::Right(ref right) => EitherMetadata::Right(ptr::metadata(right as &DynR)),
};
unsafe { &*ptr::from_raw_parts(self as *const _, meta) }
}
}
Interaction to the niche filling optimization
As far as I can tell, the niche filling optimization is unchanged. Option<T>
can omit the enum tag if T
has a niche value. The as_unsize
code above works even if the discriminant is stored implicitly in the niche. The critical requirement solved by "storing enum tag at the end" is that the address of the variant value is the same as the address of the enum value.
Additionally, the metadata type (EitherMetadata
in the example above) is a subject of the layout optimization.
For example, the metadata of Option<dyn Trait>
is isomorphic to Option<DynMetadata<dyn Trait>>
, and as a vtable pointer is always non-null, the size of the metadata is the size of the pointer.
Relation to enum variant types
By putting the enum tag at the end, a reference to an enum variant type can be transmuted to the reference of the original enum type (&Sum::A -> &Sum
) while allowing the enum variant type to have a smaller size than the original enum type.
Previous discussions
- Unions with unsized variants · Issue #3041 · rust-lang/rfcs
- unsized_enum - crates.io: Rust Package Registry
- Why can't an enum consist only of a ZST and a single DST? - language design - Rust Internals
- Recursive and unsized enums · Issue #1151 · rust-lang/rfcs
The sealed trait or enum trait idea is also closely related.