I’ve been feeling the need for something like this, to allow opaque references to C/C++ types from Rust code with standard Rust syntax (like &T and &mut T), while writing bindings to XPCOM’s string types for Rust code in Gecko.
The idea behind this pre-RFC would be to provide a minimal feature which should hopefully be forward compatible with future work in the space of custom fat pointer layouts, which I get the impression is very far away.
Summary
Extend Rust’s type system with the concept of an “Opaque Struct”, which is !Sized and has thin references and pointers. This type would be useful to represent FFI and other unsafe pointers to objects which have variable or unknown sizes, such as C++ base objects.
This could be seen as a backwards-compatible, minimal, FFI focused step towards flexible dynamically sized types such as arbitrary custom fat pointer layouts like those proposed in rust-lang/rfcs#1524.
This RFC is not intended for defining unsized types in Rust code; it is instead focused on FFI to languages like C and C++. However, it should be fairly straightforward (I think) to make it backwards compatible with any sufficiently powerful custom unsized type system which is developed for Rust in the future.
Motivation
Currently when representing a type for which the definition is opaque to Rust code, the option of choice is to use an empty enum behind a raw pointer. For example, bindings may look something like the following:
enum Foo {}
extern "C" {
    fn GetAFoo() -> *const Foo;
}
This type has the advantage that it cannot be created in Rust, but has a serious disadvantage: references to it cannot be easily managed through Rust’s lifetime system. A reference &'a Foo is very unsafe, because it can be dereferenced by safe code to get a value of type Foo, which is equivalent to !, generating the unreachable LLVM intrinsic when matched against.
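To make the hazard concrete, here is a minimal sketch of the safe code in question. The function name assume_unreachable is hypothetical and chosen only for illustration; Foo is the empty enum from the bindings above.

```rust
/// Uninhabited stand-in for the foreign type, as in the bindings above.
pub enum Foo {}

/// Entirely safe code. The empty match type-checks because `Foo` has no
/// variants, so the compiler treats everything after the dereference as
/// unreachable. If unsafe code ever handed out a `&Foo` pointing at a real
/// foreign object, calling this function would be undefined behavior.
/// (`assume_unreachable` is a hypothetical name used for illustration.)
pub fn assume_unreachable(foo: &Foo) -> ! {
    match *foo {}
}
```

Nothing in the signature warns the caller, which is why handing out &'a Foo for an empty enum is dangerous even though no unsafe block appears at the use site.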
Other options, such as providing a dummy struct representation, also fall down in front of functions such as std::mem::swap. For example:
mod f { pub struct Foo(u8); /* ... */ }
use f::Foo;
// ...
fn foo_user(a: &mut Foo, b: &mut Foo) {
    std::mem::swap(a, b);
    // The first bytes of the representations of a and b were just swapped. UB heyo!
}
Using a zero-sized type also has hurdles, due to the improper_ctypes lint, which complains about using these types anywhere within FFI function signatures, including behind raw *const and *mut pointers.
Instead, the safest option currently is to define unique pointer types, wrapping *const and *mut pointers, which are used in place of the standard Rust & and &mut types. This is very verbose and error prone, and causes consumers of your library to write ugly code. For example, instead of writing:
fn use_some_foos<'a, 'b>(a: &'a Foo, b: &'b mut Foo) { /* ... */ }
The user would have to write:
fn use_some_foos<'a, 'b>(a: FooRef<'a>, b: FooRefMut<'b>) { /* ... */ }
Here FooRef and FooRefMut are custom wrapper types which act like the usual & and &mut types.
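As a rough illustration of how much boilerplate this involves, the wrappers might be sketched as follows. Only the names FooRef and FooRefMut come from the signature above; the fields, constructors and the as_shared method are assumptions made for this sketch, not an existing API.

```rust
use std::marker::PhantomData;

/// Stand-in for a foreign type whose layout Rust does not know.
pub enum Foo {}

/// Hand-written replacement for `&'a Foo` (field layout is an assumption).
#[derive(Clone, Copy)]
pub struct FooRef<'a> {
    ptr: *const Foo,
    _lifetime: PhantomData<&'a Foo>,
}

/// Hand-written replacement for `&'a mut Foo` (field layout is an assumption).
pub struct FooRefMut<'a> {
    ptr: *mut Foo,
    _lifetime: PhantomData<&'a mut Foo>,
}

impl<'a> FooRef<'a> {
    /// Safety: `ptr` must point to a live foreign object for all of `'a`.
    pub unsafe fn from_ptr(ptr: *const Foo) -> Self {
        FooRef { ptr, _lifetime: PhantomData }
    }

    /// Returns the raw pointer to pass back across the FFI boundary.
    pub fn as_ptr(self) -> *const Foo {
        self.ptr
    }
}

impl<'a> FooRefMut<'a> {
    /// Safety: as for `FooRef::from_ptr`, and nothing else may alias the
    /// object during `'a`.
    pub unsafe fn from_ptr(ptr: *mut Foo) -> Self {
        FooRefMut { ptr, _lifetime: PhantomData }
    }

    /// Returns the raw pointer to pass back across the FFI boundary.
    pub fn as_ptr(&mut self) -> *mut Foo {
        self.ptr
    }

    /// Manual stand-in for the implicit `&mut T` to `&T` reborrow.
    pub fn as_shared(&self) -> FooRef<'_> {
        FooRef { ptr: self.ptr as *const Foo, _lifetime: PhantomData }
    }
}
```

Every conversion that the built-in reference types provide for free, such as reborrowing, has to be re-implemented by hand for each wrapped type.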
This also means that much generic Rust code will not work on these types, as it expects normal references. As an explicit example, Deref cannot produce one of these references.
The proposed type has the properties which you would want for a type which represents these opaque data structures. It is !Sized, so operations like swap cannot be called on it; it is unconstructable in Rust code (having to be passed in by reference from unsafe code like C++ bindings); it does not cause the optimizer to assume that code is dead when it sees a reference to it; and it has a thin pointer representation, which is desirable because for many of these pointers there is no extra word of information which we would want to store alongside the pointer.
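For readers less familiar with the "extra word", the following sketch measures pointer widths for the unsized types Rust already has, next to a raw pointer to an empty enum. The helper width_in_words and the three wrapper functions are hypothetical names introduced for this illustration.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Stand-in for an opaque foreign type.
pub enum Foo {}

/// Number of machine words occupied by a pointer-like type `P`.
pub fn width_in_words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

/// A raw pointer to a sized type is thin: just the address.
pub fn thin_pointer_words() -> usize {
    width_in_words::<*const Foo>()
}

/// A slice reference is fat: address plus length.
pub fn slice_ref_words() -> usize {
    width_in_words::<&[u8]>()
}

/// A trait object reference is fat: address plus vtable pointer.
pub fn trait_object_ref_words() -> usize {
    width_in_words::<&dyn Debug>()
}
```

Today's !Sized types all pay for a second word in their references; the proposed Opaque would be !Sized while keeping the one-word representation that FFI signatures expect.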
This type also has the advantage of not affecting unsize coercion, which is a thorny subject, as there is no base type which can unsize coerce to this type. References to objects which contain Opaque must be created manually by unsafe code.
Detailed design
This design is up for heavy debate. It is a strawman implementation of one way in which types like this could be implemented.
This RFC introduces a new lang item, opaque_struct, and a struct implementation in core::marker to match it:
#[lang = "opaque_struct"]
#[repr(C)]
pub struct Opaque(()); // strawman implementation
This type would be !Sized, and references to it would be represented as thin pointers. Consumers of Opaque would use the type as follows:
use std::marker::Opaque;
#[repr(C)]
struct MyOpaqueStruct(Opaque);
This type would also be !Sized, as it contains an unsized member, and would be unconstructable, because it is impossible to construct an object of type Opaque.
The Opaque type may also be used as the last member of a #[repr(C)] struct definition, which represents a known type prefix followed by some unsized opaque data. This type would also be unconstructable, as its Opaque member could not be constructed.
It would be a compile time error to include a member of type Opaque within a non-#[repr(C)] struct, or as any member other than the last member of a #[repr(C)] struct, as the layout cannot be known, and thus any code which uses it is inherently unsafe.
The Opaque type would not raise an improper_ctypes warning when included in an extern "C" function signature.
size_of_val() is defined on all references, including references to Opaque. To get around this issue, size_of_val() for Opaque will be implemented as:
fn size_of_val(_: &Opaque) -> usize {
    panic!("size_of_val is meaningless on Opaque objects")
}
See Unresolved Questions for alternatives.
This feature would be gated behind the opaque_struct feature gate.
Drawbacks
This adds complexity to the compiler in the form of an additional lang item, and a special case of a thin pointer to an unsized object.
Alternatives
We could choose not to implement this. In that case, creators of libraries which wrap third party libraries will have to perform more work to write safe wrappers of opaque types in C++ and similar languages.
We could relax the improper_ctypes lint to allow the use of ZSTs in FFI signatures behind raw pointers. This would allow the development of safe APIs, except that they would be pretending that their pointers point to objects of zero size, rather than unsized objects, which is closer to the truth. This causes the behavior of swap(), for example, to be surprising (as the function would do nothing).
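The surprising swap() behavior can be sketched with today's Rust. ZeroSizedFoo, as_foo and swap_foos are hypothetical names for this illustration, and ordinary byte buffers stand in for the foreign objects.

```rust
use std::mem;

/// Zero-sized stand-in for a foreign type (hypothetical name).
#[repr(C)]
pub struct ZeroSizedFoo {
    _private: [u8; 0],
}

/// Reinterprets a buffer as the "foreign object" that lives at its address.
/// The buffer plays the role of memory owned by C or C++ code.
pub fn as_foo(buf: &mut [u8; 4]) -> &mut ZeroSizedFoo {
    // A reference to a zero-sized type only needs a non-null, aligned
    // address, and any live buffer provides one.
    unsafe { &mut *(buf as *mut [u8; 4] as *mut ZeroSizedFoo) }
}

/// Compiles and runs without complaint, but moves zero bytes, so the
/// objects behind the two references are left exactly as they were.
pub fn swap_foos(a: &mut ZeroSizedFoo, b: &mut ZeroSizedFoo) {
    mem::swap(a, b);
}
```

A caller who expects swap_foos to exchange the two foreign objects gets neither an error nor the exchange, since the type claims there is nothing to move.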
Many of the goals of this RFC could be achieved by rust-lang/rfcs#1524, which is a much larger and more complicated RFC providing the possibility to implement custom fat pointer types. That RFC is definitely more flexible than this one, but adds more complexity to the language. If that RFC is accepted, Opaque could be implemented on top of it as a utility for developers of wrappers around FFI types.
Unresolved questions
- Can this be shaped as a stabilizable subset of something like rust-lang/rfcs#1524 in order to make thin references to dynamically sized types possible, in a backwards compatible way, before having to figure out how custom dynamically sized types affect things such as unsizing rules?
- Instead of a lang item, a #[opaque_struct] attribute which could be applied to #[repr(C)] structs could be used, which makes the struct !Sized and gives it a thin pointer representation.
- Instead of a lang item, the unsizedness could come from an impl !Sized for ... {} implementation.
- What should the definition of the Opaque type in std::marker look like? There is no good analogue to the type in question.
- Is there a situation where we would want to allow an Opaque type within a non-#[repr(C)] struct, and what would that imply in terms of layout?
- What traits should Opaque implement? It could be reasonable for the type to implement Debug, such that it is possible to derive Debug on headers of objects which end with an Opaque object. Are there others which should be implemented?
- What should the value of size_of_val be for one of these types? Should it be user-definable (perhaps through an impl !Sized for ... {})? If we choose a number, say 0, instead of panicking, should the answer be 0 or 1 for #[repr(C)] struct CStr(u8, Opaque);?
- A new OIBIT, strawman-named DynamicSized, could be introduced which is implemented for all types which do not contain Opaque. size_of_val would then require that DynamicSized be implemented for types which are passed to it, to avoid the "what is size_of_val for a type we know nothing about" problem. This is very unfortunate though, as it means that many functions would need to have their trait bounds changed manually from ?Sized to ?DynamicSized to be able to work with opaque references.