[Pre-RFC] Opaque Structs


#1

While writing bindings to XPCOM’s string types for Rust code in Gecko, I’ve been feeling the need for something like this: a way to hold opaque references to C/C++ types from Rust code using standard Rust syntax (like &T and &mut T).

The idea behind this pre-RFC would be to provide a minimal feature which should hopefully be forward compatible with future work in the space of custom fat pointer layouts, which I get the impression is very far away.

Summary

Extend Rust’s type system with the concept of an “Opaque Struct”, which is !Sized and has thin references and pointers. This type would be useful to represent FFI and other unsafe pointers to objects which have variable or unknown sizes, such as C++ base objects.

This could be seen as a backwards-compatible, minimal, FFI focused step towards flexible dynamically sized types such as arbitrary custom fat pointer layouts like those proposed in rust-lang/rfcs#1524.

This RFC is not intended for use for defining unsized types in rust code, and is instead focused on FFI to languages like C and C++. However, it should be fairly straightforward (I think) to make it backwards compatible with any sufficiently powerful custom unsized type system which is developed for rust in the future.

Motivation

Currently when representing a type for which the definition is opaque to Rust code, the option of choice is to use an empty enum behind a raw pointer. For example, bindings may look something like the following:

enum Foo {}

extern "C" {
    fn GetAFoo() -> *const Foo;
}

This type has the advantage that it cannot be created in Rust, but it has a serious disadvantage: references to it cannot be easily managed through Rust’s lifetime system. A reference &'a Foo is very unsafe, because it can be dereferenced by safe code to get a value of type Foo, which is equivalent to !, generating the unreachable LLVM intrinsic when matched against.
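As a minimal sketch of the hazard described above (the function names observe and foo_size are made up for illustration), the following compiles without any unsafe. Nothing here calls observe, because no valid &Foo can exist; if FFI code ever fabricated one, the compiler would be free to treat the call as unreachable.

```rust
// Empty enum used as an opaque FFI type, as in the bindings above.
enum Foo {}

// Hypothetical helper: safe code can "prove" anything once it holds a `&Foo`,
// because matching on an uninhabited value needs no arms.
#[allow(dead_code)]
fn observe(r: &Foo) -> ! {
    match *r {}
}

// `Foo` has no values, so it occupies no space at all.
fn foo_size() -> usize {
    std::mem::size_of::<Foo>()
}
```

The zero size is a second hint that the type does not describe the foreign object it stands in for.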

Other options, such as providing a dummy struct representation, also fall down in front of methods such as std::mem::swap. For example:

mod f { pub struct Foo(u8); /* ... */ }
use f::Foo;

// ...

fn foo_user(a: &mut Foo, b: &mut Foo) {
    std::mem::swap(a, b); 
    // The first byte in the representations of a and b were just swapped. UB heyo!
}
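To make the swap problem concrete, here is a small sketch that simulates the "real" foreign object with a 4-byte buffer (an assumption made purely for illustration; swap_through_dummy is a hypothetical helper). Viewing each buffer through the 1-byte dummy Foo and calling swap exchanges only the first byte, leaving both objects in a mixed state.

```rust
mod f {
    // Dummy representation of a foreign type whose real size Rust does not know.
    #[allow(dead_code)]
    #[repr(C)]
    pub struct Foo(u8);
}
use f::Foo;

/// Views two 4-byte "foreign" objects as `&mut Foo`, swaps them, and returns
/// both buffers so the caller can inspect what actually moved.
fn swap_through_dummy(mut a: [u8; 4], mut b: [u8; 4]) -> ([u8; 4], [u8; 4]) {
    {
        // SAFETY: `Foo` has size 1 and alignment 1, and each pointer refers to
        // the first byte of a distinct, live buffer.
        let fa: &mut Foo = unsafe { &mut *(a.as_mut_ptr() as *mut Foo) };
        let fb: &mut Foo = unsafe { &mut *(b.as_mut_ptr() as *mut Foo) };
        // Only size_of::<Foo>() == 1 byte is exchanged.
        std::mem::swap(fa, fb);
    }
    (a, b)
}
```

With a real C++ object behind the pointers, the half-swapped state could break the object's invariants, which is the problem the comment in foo_user points at.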

Using a zero-sized type also has hurdles, due to the improper_ctypes lint which complains about using these types anywhere within FFI function signatures, including behind raw *const and *mut pointers.

Instead, the safest option currently is to define unique pointer types which wrap *const and *mut pointers and are used in place of the standard Rust & and &mut types. This is very verbose and error prone, and it forces consumers of your library to write ugly code. For example, instead of writing:

fn use_some_foos<'a, 'b>(a: &'a Foo, b: &'b mut Foo) { /* ... */ }

The user would have to write:

fn use_some_foos<'a, 'b>(a: FooRef<'a>, b: FooRefMut<'b>) { /* ... */ }

Here FooRef and FooRefMut are custom wrapper types which act like the usual & and &mut types. This also means that much generic Rust code will not work on these types, as it expects normal references. As an explicit example, Deref cannot produce one of these references, because Deref::deref must return a plain &Self::Target.
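For readers who have not written such wrappers, a rough sketch of what they tend to look like follows. Everything beyond the names Foo, FooRef, FooRefMut and use_some_foos is an assumption: foo_get and foo_set stand in for real extern "C" functions, and a Foo is pretended to be a u32 so the example can run without a C library.

```rust
use std::marker::PhantomData;

// Opaque foreign type: never constructed or dereferenced on the Rust side.
enum Foo {}

// Stand-ins for `extern "C"` accessors (hypothetical). In this sketch a `Foo`
// is secretly a `u32`.
unsafe fn foo_get(p: *const Foo) -> u32 {
    unsafe { *(p as *const u32) }
}
unsafe fn foo_set(p: *mut Foo, v: u32) {
    unsafe { *(p as *mut u32) = v }
}

// Shared-reference lookalike: a raw pointer plus a borrowed lifetime.
#[derive(Clone, Copy)]
struct FooRef<'a> {
    ptr: *const Foo,
    _life: PhantomData<&'a Foo>,
}

// Unique-reference lookalike.
struct FooRefMut<'a> {
    ptr: *mut Foo,
    _life: PhantomData<&'a mut Foo>,
}

impl<'a> FooRef<'a> {
    fn from_backing(b: &'a u32) -> Self {
        FooRef { ptr: b as *const u32 as *const Foo, _life: PhantomData }
    }
    fn get(self) -> u32 {
        // SAFETY: `ptr` came from a live `&'a u32`.
        unsafe { foo_get(self.ptr) }
    }
}

impl<'a> FooRefMut<'a> {
    fn from_backing(b: &'a mut u32) -> Self {
        FooRefMut { ptr: b as *mut u32 as *mut Foo, _life: PhantomData }
    }
    fn set(&mut self, v: u32) {
        // SAFETY: `ptr` came from a live `&'a mut u32`.
        unsafe { foo_set(self.ptr, v) }
    }
    // Manual reborrow, which `&mut T` to `&T` coercion gives for free.
    fn as_ref(&self) -> FooRef<'_> {
        FooRef { ptr: self.ptr, _life: PhantomData }
    }
}

fn use_some_foos<'a, 'b>(a: FooRef<'a>, mut b: FooRefMut<'b>) -> u32 {
    b.set(a.get() + 1);
    b.as_ref().get()
}
```

Even this stripped-down version needs two structs, hand-written reborrowing, and PhantomData bookkeeping for every opaque type, which is the verbosity the paragraph above complains about.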

The proposed type has the properties which you would want for a type which represents these opaque data structures. It is !Sized, so operations like swap cannot be called on it. It is unconstructable in Rust code, and must instead be passed in by reference from unsafe code such as C++ bindings. It does not cause the optimizer to assume that code is dead when it sees a reference to it. Finally, it has a thin pointer representation, which is desirable because for many of these pointers there is no extra word of information which we would want to store alongside the pointer.

This type also has the advantage of not affecting unsize coercion, which is a thorny subject, as there is no base type which can unsize coerce to this type. References to objects which contain Opaque must be created manually by unsafe code.

Detailed design

This design is open to heavy debate. It is a strawman implementation of one way in which types like this could be implemented.

This RFC introduces a new lang item, opaque_struct, and a struct implementation in core::marker to match it:

#[lang = "opaque_struct"]
#[repr(C)]
pub struct Opaque(()); // strawman implementation

This type would be !Sized, and references to it would be represented as thin pointers. Consumers of Opaque would use the type as follows:

use std::marker::Opaque;
#[repr(C)]
struct MyOpaqueStruct(Opaque);

This type would also be !Sized as it contains an unsized member, and would be unconstructable, because it is impossible to construct an object of type Opaque.

The Opaque type may also be used as the last object in a #[repr(C)] struct definition, which represents a known type prefix, followed by some unsized opaque data. This type would also be unconstructable, as its Opaque member could not be constructed.
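The "known prefix followed by opaque data" shape can be approximated in today's Rust with a zero-length array as the last field. This is only an approximation and not the proposal: the resulting type is still Sized, so it has none of the protections Opaque would add. The names Packet, Backing, payload_sum and demo are hypothetical, and Backing simulates the foreign allocation so the sketch can run.

```rust
use std::mem::size_of;

// Known prefix followed by data whose size Rust does not know.
// The zero-length array stands in for the proposed `Opaque` tail.
#[repr(C)]
struct Packet {
    len: u32,
    _tail: [u8; 0],
}

// Simulated "foreign" allocation that actually backs a `Packet`.
#[repr(C)]
struct Backing {
    len: u32,
    payload: [u8; 4],
}

/// Sums the payload bytes that follow the known prefix.
///
/// # Safety
/// `p` must point at a live object laid out as a `Packet` prefix followed by
/// at least `len` payload bytes, and must be valid for reads of that range.
unsafe fn payload_sum(p: *const Packet) -> u32 {
    unsafe {
        let len = (*p).len as usize;
        // The tail starts right after the prefix in a #[repr(C)] layout.
        let first = (p as *const u8).add(size_of::<Packet>());
        std::slice::from_raw_parts(first, len)
            .iter()
            .map(|&b| u32::from(b))
            .sum()
    }
}

fn demo() -> u32 {
    let backing = Backing { len: 4, payload: [1, 2, 3, 4] };
    // Keep a raw pointer derived from the whole allocation, so reads past the
    // prefix stay inside the memory the pointer is allowed to access.
    let p = &backing as *const Backing as *const Packet;
    unsafe { payload_sum(p) }
}
```

The pointer is deliberately kept raw here: a &Packet would only cover the 4-byte prefix, which is one of the mismatches a true unsized Opaque tail is meant to remove.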

It would be a compile time error to include a member of type Opaque within a non #[repr(C)] struct, or as any member other than the last member of a #[repr(C)] struct, as the layout cannot be known, and thus any code which uses it is inherently unsafe.

The Opaque type would not raise an improper_ctypes warning when included in an extern "C" function signature.

size_of_val() is defined on all references, including references to Opaque. To get around this issue, size_of_val() for Opaque will be implemented as:

fn size_of_val(_: &Opaque) -> usize {
    panic!("size_of_val is meaningless on Opaque objects")
}

See Unresolved Questions for alternatives.
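For context, this sketch shows why size_of_val works for the unsized types Rust has today: each of them uses a fat pointer whose second word carries a length or a vtable, and the size is computed from that. A thin pointer to Opaque would carry no such word. The helper names pointer_words and dynamic_sizes are made up for the example.

```rust
use std::fmt::Debug;
use std::mem::{size_of, size_of_val};

// Width of `&T` measured in machine words: 1 for thin, 2 for fat pointers.
fn pointer_words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

// Sizes recovered at run time from the metadata stored in each fat pointer.
fn dynamic_sizes() -> (usize, usize, usize) {
    let slice: &[u16] = &[1, 2, 3]; // length 3 lives in the pointer
    let text: &str = "hello"; // byte length 5 lives in the pointer
    let object: &dyn Debug = &7u64; // size comes from the vtable
    (size_of_val(slice), size_of_val(text), size_of_val(object))
}
```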

This feature would be gated behind the opaque_struct feature gate.

Drawbacks

This adds complexity to the compiler in the form of an additional lang item, and a special case of a thin pointer to an unsized object.

Alternatives

We could not implement this. If this is the case, creators of libraries which wrap third party libraries will have to perform more work to write safe wrappers of opaque types in C++ and similar languages.

We could relax the improper_ctypes lint to allow the use of ZSTs in FFI signatures behind raw pointers. This would allow the development of safe APIs, but those APIs would be pretending that their references point to objects of zero size rather than to unsized objects, which is closer to the truth. This makes the behavior of swap(), for example, surprising, as the function would do nothing.

Many of the goals of this RFC could be achieved by rust-lang/rfcs#1524, which is a much larger and more complicated RFC providing the possibility to implement custom fat pointer types. That RFC is definitely more flexible than this one, but adds more complexity to the language. If that RFC is accepted, Opaque could be implemented on top of it as a utility for developers of wrappers around FFI types.

Unresolved questions

  • Can this be shaped as a stabilizable subset of something like rust-lang/rfcs#1524 in order to make thin references to dynamically sized types possible, in a backwards compatible way, before having to figure out how custom dynamically sized types affect things such as unsizing rules?

  • Instead of a lang item, a #[opaque_struct] attribute could be used. Applied to a #[repr(C)] struct, it would make the struct !Sized and give it a thin pointer representation.

  • Instead of a lang item, the unsizedness could come from an impl !Sized for ... {} implementation.

  • What should the definition of the Opaque type in std::marker look like? There is no good analogue to the type in question.

  • Is there a situation where we would want to allow an Opaque type within a non-#[repr(C)] struct, and what would that imply in terms of layout?

  • What traits should Opaque implement? It could be reasonable for the type to implement Debug, such that it is possible to derive Debug on headers of objects which end with an Opaque object. Are there others which should be implemented?

  • What should the value of size_of_val be for one of these types? Should it be user-definable (perhaps through an impl !Sized for ... {})? If we choose a number, say 0, instead of panicking, should the answer be 0 or 1 for #[repr(C)] struct CStr(u8, Opaque);?

  • A new OIBIT, strawman-named DynamicSized, could be introduced which is implemented for all types which do not contain Opaque. size_of_val would then require that DynamicSized be implemented for types which are passed to it, to avoid the "what is size_of_val for a type we know nothing about" problem. This is very unfortunate though, as it means that many functions would need to have their trait bounds changed manually from ?Sized to ?DynamicSized to be able to work with opaque references.


#2

I like this idea, since I’m currently working on something that could use it. I personally dislike the panic in size_of_val. It’d be better solved by adding a SizeOf trait, which would be automatically implemented for everything except Opaque. This would NOT break backward compatibility, because size_of_val is already generic and adding a requirement which all existing types satisfy would be OK.

In addition to statically preventing panic, it’d be useful in this scenario:

#[repr(C)]
struct MyStruct {
    size: usize,
    opaque: Opaque,
}

impl SizeOf for MyStruct {
    fn size_of(&self) -> usize {
        self.size
    }
}

I’m currently working on something very similar.
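A rough, runnable approximation of that idea in today's Rust might look like the following. SizeOf is the hypothetical trait from this post, not something in std; the two hand-written impls stand in for the automatic implementation, a zero-length array stands in for Opaque, and report is a made-up generic consumer.

```rust
use std::mem::size_of_val;

/// Hypothetical trait from the post above; not part of std.
trait SizeOf {
    fn size_of(&self) -> usize;
}

// Stand-ins for the automatic implementation: ordinary types simply defer to
// the existing size_of_val.
impl SizeOf for u32 {
    fn size_of(&self) -> usize {
        size_of_val(self)
    }
}
impl SizeOf for [u8] {
    fn size_of(&self) -> usize {
        size_of_val(self)
    }
}

#[repr(C)]
struct MyStruct {
    size: usize,
    _opaque: [u8; 0], // placeholder for the proposed Opaque member
}

// The opaque type reports the size recorded in its own header.
impl SizeOf for MyStruct {
    fn size_of(&self) -> usize {
        self.size
    }
}

// Generic code opts in with an explicit bound instead of risking a panic.
fn report<T: SizeOf + ?Sized>(x: &T) -> usize {
    x.size_of()
}
```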


#3

@Kixunil Also see the discussion at the related RFCs:

https://github.com/rust-lang/rfcs/pull/1993

https://github.com/rust-lang/rfcs/pull/1861

The second RFC in particular appears to be close to merging.


#4

That is only backwards compatible if SizeOf becomes a default bound for all type parameters (like Sized), and furthermore is also assumed in trait objects (i.e., &Trait needs to mean &(Trait+SizeOf)). Otherwise, these functions will break:

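use std::{fmt::Debug, mem::size_of_val};
fn my_sizeof<T>(x: T) -> usize { size_of_val(&x) } // relies on T's implicit bounds
fn my_sizeof_dyn(x: &dyn Debug) -> usize { size_of_val(x) } // relies on the trait object's bounds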

#5

In https://github.com/rust-lang/rfcs/pull/1993 this trait is called DynSized - you can read about the implications of its backwards-compatible implementation there. The amount of complexity which it adds to the language is one of the biggest drawbacks of adding this feature.