Motivation
I have recently become the Nth person to run into difficulties while writing an FFI binding because unions cannot be placed in a #[repr(C)] struct. My use case calls for very little interaction with the unions, but I do still need to access struct fields after the union-typed members, which requires Rust to be able to determine the size and alignment of the union. There have been quite a few union proposals in the past, but all of them have died for lack of consensus. This proposal is optimized for its ability to achieve consensus, which means doing as little as possible while remaining extensible in many ways, so that when Rust has a much more ambitious union system, this proposal will appear as just a special case rather than a deprecated wart.
My intention in sending this is to build consensus behind some proposal, ideally to the point where anyone who gets the itch can implement it and expect it to be merged (subject to quality-of-implementation checks).
Detailed design
Syntax
We add #[repr(union)] as a new attribute. It may only be used on enums, and only in conjunction with #[repr(C)].
#[repr(union)]
#[repr(C)]
enum my_ffi_union {
    branch_a { ptr: usize },
    branch_b { bits: [u16; 3] },
}
(rationale: Because this is only allowed when #[repr(C)] is already specified, this can be considered a proposal to extend FFI rather than a proposal for unions.)
Semantics
#[repr(C)] unions follow the target platform’s C ABI rules for unions. In most cases this will require the union to be as large as the largest branch, be as aligned as the most aligned branch, and have all branches at an offset of 0.
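The usual layout rule can be sketched as a small calculation. This is only an illustration of the common case described above, not a statement about any particular ABI; `BranchLayout`, `layout_of`, and `union_layout` are hypothetical names invented for this sketch. It also assumes the union's size is rounded up to a multiple of its alignment, as is typical for C aggregates.

```rust
use std::mem;

/// Size and alignment of one branch, in bytes (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct BranchLayout {
    size: usize,
    align: usize,
}

/// Layout of an ordinary Rust type, as reported by the compiler.
fn layout_of<T>() -> BranchLayout {
    BranchLayout {
        size: mem::size_of::<T>(),
        align: mem::align_of::<T>(),
    }
}

/// Common-case rule: alignment is the largest branch alignment, and size is
/// the largest branch size rounded up to a multiple of that alignment.
fn union_layout(branches: &[BranchLayout]) -> BranchLayout {
    let align = branches.iter().map(|b| b.align).max().unwrap_or(1);
    let size = branches.iter().map(|b| b.size).max().unwrap_or(0);
    let size = (size + align - 1) / align * align;
    BranchLayout { size, align }
}

fn main() {
    // The two branches of `my_ffi_union`: `usize` and `[u16; 3]`.
    let l = union_layout(&[layout_of::<usize>(), layout_of::<[u16; 3]>()]);
    println!("size = {}, align = {}", l.size, l.align);
}
```

Note that on a target with 4-byte `usize`, the 6-byte `[u16; 3]` branch is the largest, so the rounding step matters there.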
Because #[repr(C)] unions have a layout constrained by the C specification and precisely specified by platform-specific ABI documentation, they can validly be used with raw pointer casts and mem::transmute_copy. In fact, this is necessary for many common tasks due to the minimalism of this proposal.
#[repr(C)] unions do not implement Drop, and their branches must be Copy. Adding a type with nontrivial drop semantics to a union results in an ill-formed type. (rationale: There are again several proposals here. Forbidding it at compile time maximizes forward compatibility.)
(rationale: The rule on requiring Copy on branches is quite strict, and it could possibly be avoided in some cases, especially if we were to add linear types. In a world with mem::forget and half a dozen ways to leak memory, linear types are primarily a lint. C++11’s unrestricted unions patch appears to specify that if a union contains a branch with a nontrivial destructor, the union becomes a linear type, and must be destructured to a specific branch before being deleted; I would be quite fine with that, except that we don’t have linear types now and making a type affine now but linear tomorrow would be a breaking change. So, for now I propose to forbid unions where any branch has a nontrivial destructor. Copy versus !Drop remains as a question, but seems low enough impact that it can be left to the implementor’s discretion.)
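To make the Copy versus !Drop distinction concrete, here is a minimal sketch of the two candidate rules applied to ordinary types. The payload types and the two `accepts_*` functions are hypothetical, and the second check assumes a toolchain that provides std::mem::needs_drop.

```rust
use std::mem;

// Hypothetical branch payload types, named only for illustration.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct PlainCopy(u32); // Copy, and therefore cannot implement Drop

#[allow(dead_code)]
struct PlainNoCopy(u32); // not Copy, but has no destructor either

#[allow(dead_code)]
struct Owning(Vec<u8>); // owns heap memory, so dropping it runs code

/// The stricter rule: this only compiles for types that are `Copy`.
fn accepts_copy_branch<T: Copy>() {}

/// The looser rule: true when dropping a `T` would run no code.
fn accepts_no_drop_branch<T>() -> bool {
    !mem::needs_drop::<T>()
}

fn main() {
    accepts_copy_branch::<PlainCopy>();
    accepts_copy_branch::<[u16; 3]>();
    // accepts_copy_branch::<PlainNoCopy>(); // rejected at compile time: not Copy

    println!("PlainNoCopy ok under !Drop: {}", accepts_no_drop_branch::<PlainNoCopy>());
    println!("Owning ok under !Drop: {}", accepts_no_drop_branch::<Owning>());
}
```

`PlainNoCopy` is the interesting case: it is rejected by the Copy rule but would be accepted by a !Drop rule.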
Union constructors may not be used for pattern matching under any circumstance. This includes derive-generated code, so most derivations are not applicable to unions. (rationale: There are viable proposals for pattern matching unions, but they add quite a bit of compiler complexity that is not needed for a first pass.)
Union constructors may be used as functions, and fill the remainder of the union with mem::uninitialized(). (rationale: mem::transmute_copy cannot be used to expand a value, so otherwise constructing a value of FFI union type would require an awkward dance with mem::uninitialized and raw pointer casts. Still, this can be omitted if it proves difficult.)
Extracting data from a union, if it is to be done, must use raw pointer casts or mem::transmute_copy as no other method is specified at this time.
use std::mem;

fn mk_branch_a(ptr: usize) -> my_ffi_union {
    my_ffi_union::branch_a { ptr: ptr }
}
fn mk_branch_a_manual(ptr: usize) -> my_ffi_union {
    // not using the constructor, to demonstrate that constructors are a severable part of this proposal
    unsafe {
        let mut tmp: my_ffi_union = mem::uninitialized();
        // the pointer must come from a mutable borrow to be written through
        *(&mut tmp as *mut my_ffi_union as *mut usize) = ptr;
        tmp
    }
}
unsafe fn unmk_branch_a(un: my_ffi_union) -> usize {
    // transmute_copy takes its source by reference
    mem::transmute_copy(&un)
}
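The same pointer-cast pattern can be tried without the proposed attribute by substituting a hand-sized stand-in for the union. The sketch below is only an approximation under stated assumptions: `MyFfiUnionStandIn`, `Event`, and `read_branch_a` are hypothetical names, the single u64 of storage is assumed to be at least as large and as aligned as either branch on the target, and zeroed storage replaces mem::uninitialized to keep the example well-defined. It also shows the motivating case of a struct field that follows the union-typed member.

```rust
use std::mem;

/// Hypothetical stand-in for `my_ffi_union`: storage chosen by hand to cover
/// both branches (`usize` and `[u16; 3]`) on common targets.
#[repr(C)]
#[derive(Clone, Copy)]
struct MyFfiUnionStandIn {
    storage: [u64; 1],
}

/// Hypothetical C struct with a field after the union-typed member.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
struct Event {
    tag: u32,
    payload: MyFfiUnionStandIn,
    sequence: u32,
}

fn mk_branch_a(ptr: usize) -> MyFfiUnionStandIn {
    let mut tmp = MyFfiUnionStandIn { storage: [0; 1] };
    unsafe {
        // All branches live at offset 0, so a pointer cast reaches branch_a.
        *(&mut tmp as *mut MyFfiUnionStandIn as *mut usize) = ptr;
    }
    tmp
}

fn read_branch_a(un: &MyFfiUnionStandIn) -> usize {
    // Copies size_of::<usize>() bytes from the start of the storage. The
    // storage is always initialized here, and any bit pattern is a valid usize.
    unsafe { mem::transmute_copy(un) }
}

fn main() {
    // The hand-picked storage must cover both branches on this target.
    assert!(mem::size_of::<MyFfiUnionStandIn>() >= mem::size_of::<usize>());
    assert!(mem::size_of::<MyFfiUnionStandIn>() >= mem::size_of::<[u16; 3]>());
    assert!(mem::align_of::<MyFfiUnionStandIn>() >= mem::align_of::<usize>());

    let ev = Event { tag: 1, payload: mk_branch_a(0xdead), sequence: 7 };
    println!("ptr = {:#x}, sequence = {}", read_branch_a(&ev.payload), ev.sequence);
}
```

Picking the storage type by hand is exactly the ABI knowledge this proposal would move into the compiler.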
Drawbacks
If this proposal is adopted and a desire later arises to add a featureful native Rust notion of unions, there will be strong pressure for those future unions to subsume the unions proposed herein. As such, this proposal somewhat constrains the design of future unions. Efforts have been taken to make this proposal as unopinionated as possible, to minimize the impact of said constraint.
Alternatives
Some way of creating a type with the shape of a C union is necessary.
A much more ambitious proposal could exist which defines both Rust and FFI unions. In fact, several such proposals already exist.
rust-lang/rfcs#371 is an interesting example of a very minimal proposal, but it creates new syntax that has no credible evolution to a full union system, so it seems much more problematic to stabilize as a language feature.
If we had a featureful type-level constants system with const fn integration, a very crude approximation of this could be done as a library:
trait HasType { type TYPE; }
struct ForAlign<NN: i32>;
impl HasType for ForAlign<1> { type TYPE = u8; }
impl HasType for ForAlign<mem::align_of::<u16>() == 2 ? 2 : -1> { type TYPE = u16; }
impl HasType for ForAlign<mem::align_of::<u32>() == 4 ? 4 : -2> { type TYPE = u32; }
impl HasType for ForAlign<mem::align_of::<u64>() == 8 ? 8 : -3> { type TYPE = u64; }
const fn is_power_of_two(x: usize) -> bool { (x & (x - 1)) == 0 }
impl HasType for ForAlign<(mem::align_of::<usize>() > 8 || !is_power_of_two(mem::align_of::<usize>())) ? mem::align_of::<usize>() : -4> { type TYPE = usize; }
type SomethingOfAlign<NN: i32> = ForAlign<NN>::TYPE;
struct AlignedBuffer<ALIGN, MIN_BYTES> {
_array: [ SomethingOfAlign<ALIGN>; (MIN_BYTES / ALIGN).ceil() ],
}
struct Union2<BRANCH_1, BRANCH_2> {
_padding: AlignedBuffer<cmp::max(mem::align_of::<BRANCH_1>(),mem::align_of::<BRANCH_2>()),
cmp::max(mem::size_of::<BRANCH_1>(),mem::size_of::<BRANCH_2>())>,
}
However, this would have a poor integration with compiler lints, and since some C ABIs have specific requirements for unions (IIRC), it would spread C ABI knowledge between #[repr(C)] structs in the compiler and this external library. I’d much prefer to have the ABI knowledge in one place.
Unresolved questions
Copy or !Drop?
Should we allow multiple fields in branches? It would be somewhat more C-y to restrict branches to one field each, and force users to define structs if they want multiple fields. On the other hand, enforcing that seems user-hostile and complicates the compiler for no identifiable gain.