Pre-RFC: unsafe enums. Now including a poll!

I had an RFC before for unsafe enums: https://github.com/rust-lang/rfcs/pull/724

Unfortunately it was closed and postponed, but now that we’ve gone through several stable versions of Rust, maybe it is time to reconsider it, especially since there still isn’t any nice way to deal with C unions in Rust for FFI purposes. The latest core team meeting also expressed an interest in finally paying attention to people trying to do FFI from Rust. Here are the options:

  • Unsafe enums with pattern destructuring
  • Unsafe enums with direct field access
  • Repr union structs with direct field access
  • I think people who use unions deserve to suffer with their terrible macro solutions

Unsafe enums with pattern destructuring:

unsafe enum MyUnion { Foo(i32), Bar { x: i32, y: f32 } }
let thing = MyUnion::Foo(5);
let x = unsafe { let MyUnion::Foo(x) = thing; x };

Unsafe enums with direct field access:

unsafe enum MyUnion { Foo(i32), Bar { x: i32, y: f32 } }
let thing = MyUnion::Foo(5);
let x = unsafe { thing.Foo.0 };

Repr union structs with direct field access:

#[repr(union)] struct MyUnion { foo: i32, bar: (i32, f32) }
let thing = MyUnion { foo: 5 };
let x = unsafe { thing.foo };
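For comparison, here is a minimal sketch of the same `MyUnion` written with the `union` keyword that Rust later stabilized (RFC 1444), which is not any of the syntaxes proposed above. It shows direct field access and pattern destructuring side by side; the helper function names are made up for illustration.

```rust
// Sketch using stable `union`, not the proposed `unsafe enum` syntax.
#[derive(Clone, Copy)]
union MyUnion {
    foo: i32,
    bar: (i32, f32),
}

fn read_foo_by_field(thing: MyUnion) -> i32 {
    // Direct field access: reading a union field is always unsafe.
    unsafe { thing.foo }
}

fn read_foo_by_pattern(thing: MyUnion) -> i32 {
    // Pattern destructuring: a union pattern names exactly one field.
    unsafe {
        let MyUnion { foo } = thing;
        foo
    }
}

fn main() {
    let thing = MyUnion { foo: 5 };
    assert_eq!(read_foo_by_field(thing), 5);
    assert_eq!(read_foo_by_pattern(thing), 5);
}
```

Both styles coexist here too, which matches the point that the two unsafe enum options are not mutually exclusive.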

Keep in mind that if we do go with the unsafe enums approach, both pattern destructuring and direct field access can coexist.

So, let’s get this thing rolling again and try to finally support unions properly.

Overall I like this idea.

One point that doesn’t seem to be covered in the old RFC is whether it is legal to access a different variant than the one that was used for initialization. (No one can ever seem to figure out whether it’s legal in C, so I think it should be spelled out here.)

On the subject of variant bounds: first, I don’t think it is possible to make tight enough bounds to make access safe without restricting the type to uselessness, because initializing with one variant and reading a larger one would presumably read uninitialized (undef) bytes. Second, I’d actually be interested in having no restriction and allowing types that are normally dropped in variants, as I think this would allow implementation of ManuallyDrop entirely in library code. (I’ve used the equivalent construction in C++ for that purpose.)

I'm in favor of this general move. It seems clear that we will need this eventually for smooth integration. This part of the RFC gave me pause:

Due to the lack of a discriminant there is no way for Rust to know which variant is currently initialized, and thus all variants of an unsafe enum are required to be Copy or at the very least not Drop.

I'm not sure that special rules are needed regarding Copy. The usual rule for an enum is that it can be made Copy iff all the variants are Copy, and it seems like that applies equally well here -- that is, copying/moving does not require knowing which variant the enum is, so I think it works the same.
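A small sketch of that point, using the stable `union` keyword as a stand-in for the proposed unsafe enum: the union can derive Copy only because every field is Copy, and copying it duplicates the bytes without needing to know which field is active. The `Number` name is made up for illustration.

```rust
// Stand-in for an unsafe enum: Copy is derivable because every field is Copy.
#[derive(Clone, Copy)]
union Number {
    int: i64,
    float: f64,
}

fn main() {
    let a = Number { float: 2.5 };
    // A plain bitwise copy; the compiler does not need to know that
    // `float` is the active field to do this.
    let b = a;
    let (x, y) = unsafe { (a.float, b.float) };
    assert_eq!(x, y);
    assert_eq!(y, 2.5);
}
```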

Drop, on the other hand, is trickier. However, it seems to me it is OK for some of the data in the unsafe enum to be Drop if you define your own Drop impl. But we really want that impl to be the one in charge of dropping the contents. Presumably we handle this by saying that the compiler never drops the contents of an unsafe enum, and (probably) if the contents are not Copy then you must implement Drop.

Now obviously there is overlap (as has been pointed out numerous times) with the ManuallyDrop proposal. I've forgotten now -- are unsafe enums strictly more general? Are there cases they don't handle? @huon?

Hm, I suspect this isn't good enough: I imagine there are often times when there isn't enough information in the enum value itself to work out how to drop it, i.e. the discriminant is stored externally. This will be for cases when an unsafe enum is used for precise pure-Rust data layout control, rather than FFI.
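A minimal sketch of the external-discriminant case, assuming today's stable `union` (whose non-Copy fields must be wrapped in `std::mem::ManuallyDrop`). The union never drops its contents; the enclosing struct, which owns the tag, is in charge of dropping the right field. `Slot`, `Tag` and `Payload` are hypothetical names.

```rust
use std::mem::ManuallyDrop;

// The union itself carries no discriminant and never drops anything.
union Payload {
    text: ManuallyDrop<String>,
    number: u64,
}

#[derive(Clone, Copy, PartialEq)]
enum Tag {
    Text,
    Number,
}

// The discriminant lives outside the union, in the owning struct.
struct Slot {
    tag: Tag,
    payload: Payload,
}

impl Slot {
    fn text(s: &str) -> Slot {
        Slot { tag: Tag::Text, payload: Payload { text: ManuallyDrop::new(s.to_owned()) } }
    }

    fn number(n: u64) -> Slot {
        Slot { tag: Tag::Number, payload: Payload { number: n } }
    }

    // Safe accessor: only reads the field that the tag says is active.
    fn as_text(&self) -> Option<&str> {
        match self.tag {
            Tag::Text => Some(unsafe { self.payload.text.as_str() }),
            Tag::Number => None,
        }
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        // Only the owner knows which field is live, so it drops it explicitly.
        if self.tag == Tag::Text {
            unsafe { ManuallyDrop::drop(&mut self.payload.text) }
        }
    }
}

fn main() {
    let a = Slot::text("hello");
    let b = Slot::number(42);
    assert_eq!(a.as_text(), Some("hello"));
    assert_eq!(b.as_text(), None);
    // Dropping `a` frees the String; dropping `b` has nothing extra to free.
}
```

Forgetting the Drop impl here would only leak the String, which fits the point that skipping destructors is not itself unsafe.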

We may just want to lint, or even do nothing about this sort of thing, rather than disallow it at the language level (skipping destructors isn't unsafe, after all).

There are some subtleties around data representation that may not be desirable for unsafe enums to handle, e.g. if you have unsafe enum Foo { X(&'static u8) } or even unsafe enum Foo { X(&'static u8), Y(Box<u8>) }, can the compiler apply the null pointer optimization (NPO) to Option<Foo>?

I don't see the problem there -- just define Drop as a no-op and handle it elsewhere. But yeah maybe a lint is better. The language rules to prevent it seem pretty hokey.

Seems... clear to me that it CAN, since the user has given us the range of possible values. Now unsafe enum Foo { X(&u8), Y(usize) } would be another thing. But I guess the question is WILL we? (I'm not sure whether we optimize analogous cases with enums today) Put another way, this seems like a case that is not really different between safe-and-unsafe enums, but perhaps has more importance to unsafe enum users (since they are likely to be interacting with FFI).
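One way to see what the compiler actually does is to compare sizes with `std::mem::size_of`. A sketch, using the stable `union` keyword as a stand-in for the proposed unsafe enum: `Option<&u8>` gets the null pointer optimization, while, as far as I know, rustc currently gives unions no niche at all, so an `Option` around a union holding only a reference still needs a separate tag. The second assertion reflects that current behaviour rather than any language guarantee.

```rust
use std::mem::size_of;

// Stand-in for `unsafe enum Foo { X(&'static u8) }`.
#[derive(Clone, Copy)]
union Foo {
    x: &'static u8,
}

fn main() {
    // The null pointer optimization: None is represented as a null pointer.
    assert_eq!(size_of::<Option<&'static u8>>(), size_of::<&'static u8>());
    // Unions expose no niche, so Option<Foo> needs room for its own tag.
    assert!(size_of::<Option<Foo>>() > size_of::<Foo>());
}
```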

Probably not, but it'd be pretty neat to be able to do something like that, even reordering the fields of variants to get NonZeros to all line up.

Hm, this brings up a question: how undefined should the representation of unsafe enums be? Should they be automatically implicitly #[repr(C)] to disallow moving the fields of variants around? Having the attribute (explicit or implicit) is presumably necessary to get guarantees for FFI (although in practice I imagine most unsafe enums used for FFI have variants with a single field, either a primitive or some type/struct from the C library), and people doing some seriously crazy stuff in pure Rust may assume that fields are laid out in source order?

Well, it's more important for unsafe enums being a replacement for ManuallyDrop, since theoretically an Option<Foo> would never appear in FFI as Option itself isn't FFI-safe. ManuallyDrop<T> is explicitly designed/intended for storing possibly invalid instances of T (my new preferred name is Opaque<T>, which reflects this better), and so the NPO cannot happen.

This, I don't think, is really true in practice. That is, I think people use Option<&T> in extern "C" fn declarations and the like. @wycats argues explicitly (and persuasively) for this pattern, in fact.

An Option explicitly wrapping a pointer is a special case in my mind, I specifically don’t think Option<NonTrivialType> will (well, should) appear in FFI.

Agreed.

Regarding the Copy and !Drop requirements, I’d be okay with making them lints, but possibly deny by default. If Drop was allowed, then it would basically be a form of ManuallyDrop.

As for whether to use #[repr(C)], I think you should still have to specify #[repr(C)] if you intend to have explicit control over the layout or for FFI purposes, otherwise it would be inconsistent with structs. Maybe have a warning if you don’t put #[repr(C)].
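For reference, this is what `#[repr(C)]` gives a stable Rust `union` today, sketched with the `MyUnion` type from the opening post: every field starts at offset 0, and the size is the largest field rounded up to the largest alignment. The `Pair` struct is an assumption that replaces the `(i32, f32)` tuple, because tuples have no guaranteed layout.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    x: i32,
    y: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
union MyUnion {
    foo: i32,
    bar: Pair,
}

fn main() {
    let thing = MyUnion { foo: 5 };
    let base = addr_of!(thing) as usize;
    // Only the field addresses are taken here; no field is read.
    let (foo_addr, bar_addr) =
        unsafe { (addr_of!(thing.foo) as usize, addr_of!(thing.bar) as usize) };
    assert_eq!(foo_addr, base);
    assert_eq!(bar_addr, base);
    // Largest field is Pair (8 bytes); largest alignment is 4.
    assert_eq!(size_of::<MyUnion>(), 8);
    assert_eq!(align_of::<MyUnion>(), 4);
}
```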

Accessing a variant other than the one you initialized is a tricky question. On one hand I don’t want any sort of guarantee that you can do that, since it would prevent things like not initializing unused space in a variant for performance reasons. On the other hand people might have valid use cases for it, just like people have valid use cases for transmute. Perhaps define it as having the same defined/undefined behavior as transmute? Although with a bit of an extension since this would allow converting between different sized types.
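As a point of comparison for the transmute analogy, today's stable Rust `union` lets you write one field and read another, and for same-sized fields this behaves like `transmute` as long as the bytes read are a valid value of the target type. A minimal sketch with a made-up `Bits` union:

```rust
// Writing one field and reading another reinterprets the same bytes,
// much like std::mem::transmute::<f32, u32>.
union Bits {
    float: f32,
    int: u32,
}

fn main() {
    let b = Bits { float: 1.0 };
    // Every bit pattern is a valid u32, so this read is defined.
    let raw = unsafe { b.int };
    assert_eq!(raw, 1.0f32.to_bits());
    assert_eq!(raw, 0x3f80_0000);
    assert_eq!(raw, unsafe { std::mem::transmute::<f32, u32>(1.0) });
}
```

The different-size case is where the analogy stretches: reading a larger field than the one written would touch uninitialized bytes, which is undefined behaviour for integer types.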

Sorry if this is off topic, but has anyone thought about FFI for the case of “discriminated unions” like:

typedef struct {
    uint8_t tag;
    union {
        bar_t bar;
        baz_t baz;
    };
} foo_t;

#[repr(C)]
unsafe enum foo_t_union {
    bar(bar_t),
    baz(baz_t),
}
#[repr(C)]
struct foo_t {
    pub tag: u8,
    pub u: foo_t_union,
}

It is a bit more verbose unfortunately.
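A sketch of how the `foo_t` layout could be consumed from Rust with the stable `union` keyword in place of the proposed unsafe enum, converting to an ordinary enum once the tag has been checked. `Foo`, `FooUnion`, `Bar` and `Baz` are Rust-style renamings of the C types, and their field definitions and the tag values 1 and 2 are assumptions made for illustration.

```rust
// Hypothetical payload types; real definitions would come from the C header.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar { a: u32 }

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Baz { b: u16, c: u16 }

// Mirrors the anonymous union inside foo_t.
#[repr(C)]
#[derive(Clone, Copy)]
union FooUnion { bar: Bar, baz: Baz }

// Mirrors foo_t: a tag followed by the union.
#[repr(C)]
#[derive(Clone, Copy)]
struct Foo { tag: u8, u: FooUnion }

// Hypothetical tag values; the C library defines the real ones.
const TAG_BAR: u8 = 1;
const TAG_BAZ: u8 = 2;

// A safe Rust view of the same data.
#[derive(Debug, PartialEq)]
enum FooView { Bar(Bar), Baz(Baz) }

impl Foo {
    // Trusts the tag, assuming the C code keeps it consistent with the union.
    fn view(&self) -> Option<FooView> {
        match self.tag {
            TAG_BAR => Some(FooView::Bar(unsafe { self.u.bar })),
            TAG_BAZ => Some(FooView::Baz(unsafe { self.u.baz })),
            _ => None,
        }
    }
}

fn main() {
    let f = Foo { tag: TAG_BAZ, u: FooUnion { baz: Baz { b: 3, c: 4 } } };
    assert_eq!(f.view(), Some(FooView::Baz(Baz { b: 3, c: 4 })));
    let unknown = Foo { tag: 9, u: FooUnion { bar: Bar { a: 0 } } };
    assert_eq!(unknown.view(), None);
}
```

Pattern matching then happens on `FooView`, so the unsafe reads stay confined to one method.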

@retep998 Well, that represents the foreign type correctly, but I want pattern matching and other nice things, trusting the C code to use its tag correctly.

I was thinking something like

#[repr(CManual(u8))]
enum Foo {
    bar(Bar) = 1,
    baz(Baz) = 2,
}
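For what it's worth, something close to this later became possible without a new attribute: RFC 2195 defines the layout of a data-carrying enum marked `#[repr(C, u8)]` as a `#[repr(C)]` struct holding a `u8` tag followed by a union of the variants, and explicit discriminants on such variants are accepted since Rust 1.66. A sketch under those assumptions, with `u32`/`u16` payloads chosen arbitrarily:

```rust
use std::mem::size_of;

// Layout per RFC 2195: struct { tag: u8, payload: union { u32, u16 } }.
#[repr(C, u8)]
#[allow(dead_code)]
enum Foo {
    Bar(u32) = 1,
    Baz(u16) = 2,
}

fn tag_of(foo: &Foo) -> u8 {
    // The tag is the first field of a repr(C) struct, so it sits at offset 0.
    unsafe { *(foo as *const Foo as *const u8) }
}

fn main() {
    let foo = Foo::Baz(7);
    assert_eq!(tag_of(&foo), 2);
    // Ordinary pattern matching still works, with no unsafe code.
    match foo {
        Foo::Bar(x) => println!("bar {x}"),
        Foo::Baz(y) => println!("baz {y}"),
    }
    // u8 tag, padding to the union's 4-byte alignment, then 4 bytes of payload.
    assert_eq!(size_of::<Foo>(), 8);
}
```

This only covers the simple "tag then union" shape; it does not address structs with several unions or a tag buried among other fields.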

@Ericson2314 There are a lot of structs out in the C world that have a bunch of fields, a tag somewhere in there, and a union, or maybe even multiple unions, or even nested unions. Supporting all those cases with all your sugar would be massively complicated. While it would be nice to have sugar like that, and maybe one day we can have such sugar, at the moment I just want to get something that lets this be done at all in a flexible manner, without manually calculating alignment + size and calling transmute all the time.

That’s fair. Having enums desugar to unsafe unions + structs + magic safety assertions could be a good way to bridge the gap. Last I heard, the MIR’s desugaring of match into switch would dovetail somewhat with that.

One quick point about sugar for pattern matching and so on: as @retep998 points out, there are lots of ways you might be encoding a tag, and more generally, there are many cases where you’d like to provide pattern matching for non-traditional enums, separate from C interop.

If we wanted to address these cases, I’d suggest looking into something like Scala’s unapply, which provides a very slick way to do custom pattern matching in general.

This seems to me like the worst of both worlds: You've lost any guarantees you had about cleaning up the contents (since the Drop impl can be defined as a no-op) and you can't use the type in places that forbid destructors (which would limit the usefulness of a ManuallyDrop built on top of this).

Being a Scala developer, I can’t +1 unapply support for generic pattern matching enough. I think something like a special lang-item Match trait would be really nice:

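// hypothetical lang item; not part of Rust
#[lang = "pattern_match"]
trait Match {
  // generic over the borrow so the result can hold references into self
  type Result<'a> where Self: 'a;
  fn unapply(&self) -> Option<Self::Result<'_>>;
}

struct AStruct {
  name: String,
  count: usize
}

impl Match for AStruct {
  type Result<'a> = (&'a str, usize);
  fn unapply(&self) -> Option<Self::Result<'_>> {
    Some((&*self.name, self.count))
  }
}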

// later
let s = AStruct { name: "Teddy".into(), count: 127 };
match s {
  // calls AStruct::unapply(&s)
  AStruct { name, count } => {
    // types:
    // name: &str, count: usize
  }
}

Update: this would be even more useful for pattern matching on opaque structs with private fields provided by third-party libraries.
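Until something like this exists, a rough approximation in today's Rust is an ordinary method returning the extracted tuple, matched explicitly at the call site. A sketch reusing the `AStruct` example; here `unapply` is just a normal inherent method and `describe` is a made-up caller, not a lang item or any proposed API:

```rust
struct AStruct {
    name: String,
    count: usize,
}

impl AStruct {
    // Plays the role of Scala's `unapply`: exposes a view of the fields.
    fn unapply(&self) -> Option<(&str, usize)> {
        Some((self.name.as_str(), self.count))
    }
}

fn describe(s: &AStruct) -> String {
    match s.unapply() {
        // name: &str, count: usize
        Some((name, count)) if count > 100 => format!("{name} (many: {count})"),
        Some((name, count)) => format!("{name} ({count})"),
        None => String::from("no match"),
    }
}

fn main() {
    let s = AStruct { name: "Teddy".into(), count: 127 };
    assert_eq!(describe(&s), "Teddy (many: 127)");
}
```

The cost compared with the proposal is that the call to `unapply` is explicit and the pattern cannot be nested inside other patterns.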

If you want to be able to Drop an unsafe enum, you should probably encapsulate it in a struct (that provides a “safe” interface).

I wouldn’t want to encourage the use of unsafe enums for other things than low-level stuff.