Summary
Allow Rust code to define dynamically sized types with custom thick (and thin)
pointers, and define slice functions in terms of these, instead of transmute.
Also, convert the CStr type to use this functionality,
and make it a thin pointer; this will allow use with FFI.
Motivation
As of right now, the lack of custom DSTs in Rust means that we can't communicate
with C in important ways - we lack the ability to define a CStr in the
language such that &CStr is compatible with char const *,
and we lack the ability to communicate nicely with C code that uses
Flexible Array Members.
This RFC attempts to fix this, as well as introduce more correctness to existing practices.
Apart from FFI, it also has use cases for indexing and slicing 2-d arrays.
Guide-level explanation
There's a new language trait in the standard library, under std::ops:
unsafe trait DynamicallySized {
    type Metadata: 'static + Copy;
    fn size_of_val(&self) -> usize;
    fn align_of_val(&self) -> usize;
}
with an automatic implementation for all Sized types:
unsafe impl<T> DynamicallySized for T {
    type Metadata = ();
    fn size_of_val(&self) -> usize { size_of::<T>() }
    fn align_of_val(&self) -> usize { align_of::<T>() }
}
If you have a type which you would like to be unsized,
you can implement this trait for your type!
#[repr(C)]
struct CStr([c_char; 0]);
unsafe impl DynamicallySized for CStr {
    type Metadata = ();
    // strlen does not count the trailing nul, so add one byte for it
    fn size_of_val(&self) -> usize { unsafe { strlen(self.0.as_ptr()) + 1 } }
    fn align_of_val(&self) -> usize { 1 }
}
and automatically, your type will not implement Sized.
The existing DynamicallySized types will continue to work;
if one writes a DynamicallySized type T,
and then wraps T into a struct, they'll get the obvious semantics.
struct Foo {
    x: usize,
    y: CStr,
}
// size_of_val(&foo) returns size_of::<usize>() + size_of_val(&foo.y), plus any padding
// align_of_val(&foo) returns the larger of align_of::<usize>() and align_of_val(&foo.y)
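The same composition rule can already be observed with today's built-in DSTs. The sketch below is only an illustration on stable Rust, not part of the proposal: `Wrapper` and `sizes` are hypothetical names, and a byte slice stands in for CStr because custom DSTs do not exist yet.

```rust
use std::mem::size_of_val;

// Hypothetical stand-in for `Foo`: a sized field followed by an unsized tail.
// A byte slice plays the role of `CStr`, since custom DSTs do not exist yet.
struct Wrapper<T: ?Sized> {
    x: usize,
    y: T,
}

/// Returns (size of the whole value, size of the unsized tail).
fn sizes(w: &Wrapper<[u8]>) -> (usize, usize) {
    (size_of_val(w), size_of_val(&w.y))
}
```

A `&Wrapper<[u8]>` can be obtained from a `&Wrapper<[u8; N]>` by unsizing coercion. The total is at least the size of the sized field plus the size of the tail, rounded up to the alignment of the whole struct.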
More Examples
Non-trivial types
For non-trivial types (i.e., those that have a destructor), Rust generates the obvious destructor
from the definition of the type itself - i.e., if you hold a Vec<T> in your type, Rust will destroy it.
However, if your type contains additional data that Rust doesn't know about, you'll have to destroy it yourself.
#[repr(C)] // we need this to be laid out linearly
struct InlineVec<T> {
    capacity: usize,
    len: usize,
    buffer: [T; 0], // for offset, alignment, and dropck
}
unsafe impl<T> DynamicallySized for InlineVec<T> {
    type Metadata = ();
    fn size_of_val(&self) -> usize {
        Self::full_size(self.capacity)
    }
    fn align_of_val(&self) -> usize {
        Self::full_align()
    }
}
impl<T> Drop for InlineVec<T> {
    fn drop(&mut self) {
        // drop the initialized elements in place
        unsafe { std::ptr::drop_in_place(self.as_mut_slice()) }
    }
}
impl<T> InlineVec<T> {
    // internal
    fn full_size(cap: usize) -> usize {
        std::mem::size_of_header::<Self>() + cap * std::mem::size_of::<T>()
    }
    fn full_align() -> usize {
        std::mem::align_of_header::<Self>().max(std::mem::align_of::<T>())
    }
    pub fn new(cap: usize) -> Box<Self> {
        let size = Self::full_size(cap);
        let align = Self::full_align();
        let layout = std::alloc::Layout::from_size_align(size, align).unwrap();
        unsafe {
            let raw = std::alloc::alloc(layout);
            if raw.is_null() {
                std::alloc::handle_alloc_error(layout);
            }
            let ptr: *mut Self = std::raw::from_raw_parts_mut(raw as *mut (), ());
            // initialize the header; the buffer stays uninitialized
            std::ptr::addr_of_mut!((*ptr).capacity).write(cap);
            std::ptr::addr_of_mut!((*ptr).len).write(0);
            Box::from_raw(ptr)
        }
    }
    pub fn len(&self) -> usize {
        self.len
    }
    pub fn capacity(&self) -> usize {
        self.capacity
    }
    pub fn as_ptr(&self) -> *const T {
        &self.buffer as *const [T; 0] as *const T
    }
    pub fn as_mut_ptr(&mut self) -> *mut T {
        &mut self.buffer as *mut [T; 0] as *mut T
    }
    pub fn as_slice(&self) -> &[T] {
        unsafe {
            std::slice::from_raw_parts(self.as_ptr(), self.len())
        }
    }
    pub fn as_mut_slice(&mut self) -> &mut [T] {
        unsafe {
            std::slice::from_raw_parts_mut(self.as_mut_ptr(), self.len())
        }
    }
    // panics if it doesn't have remaining capacity
    pub fn push(&mut self, el: T) {
        assert!(self.len() < self.capacity());
        let ptr = self.as_mut_ptr();
        let index = self.len();
        // the slot at `index` is within capacity and uninitialized
        unsafe { std::ptr::write(ptr.add(index), el) };
        self.len += 1;
    }
    // panics if it doesn't have any elements
    pub fn pop(&mut self) -> T {
        assert!(self.len() > 0);
        self.len -= 1;
        let ptr = self.as_mut_ptr();
        let index = self.len();
        // the slot at `index` was initialized and is no longer part of the slice
        unsafe { std::ptr::read(ptr.add(index)) }
    }
}
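For comparison, the header-plus-buffer layout that `full_size` and `full_align` describe can be approximated on stable Rust with `std::alloc::Layout`. This is a sketch under assumptions, not the RFC's mechanism: `Header` and `inline_vec_layout` are hypothetical names, and unlike `full_size` above, `Layout::extend` also inserts padding between the header and the buffer when the element alignment requires it.

```rust
use std::alloc::Layout;

// Hypothetical header mirroring the two leading fields of `InlineVec`.
#[repr(C)]
struct Header {
    capacity: usize,
    len: usize,
}

/// Layout of a header followed by `cap` elements of `T`, together with
/// the byte offset at which the element buffer starts.
fn inline_vec_layout<T>(cap: usize) -> (Layout, usize) {
    let header = Layout::new::<Header>();
    let buffer = Layout::array::<T>(cap).expect("capacity overflow");
    // `extend` places `buffer` after `header`, padding as needed.
    let (layout, offset) = header.extend(buffer).expect("layout overflow");
    // Round the total size up to the overall alignment, as for a C struct.
    (layout.pad_to_align(), offset)
}
```

The resulting alignment is the larger of the header's and the element's alignment, which matches what `full_align` computes.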
Reference-level explanation
In addition to the explanation given above,
we will also introduce three functions into the standard library,
in core::raw, which allow you to create and destructure these
pointers to DynamicallySized types:
mod core::raw {
    pub fn from_raw_parts<T: DynamicallySized>(
        ptr: *const (),
        meta: <T as DynamicallySized>::Metadata,
    ) -> *const T;
    pub fn from_raw_parts_mut<T: DynamicallySized>(
        ptr: *mut (),
        meta: <T as DynamicallySized>::Metadata,
    ) -> *mut T;
    pub fn metadata<T: DynamicallySized>(
        ptr: *const T,
    ) -> <T as DynamicallySized>::Metadata;
}
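For slices, stable Rust already offers a close analogue of this create/destructure pair, which may help make the intent concrete. The sketch below is an illustration only; `into_parts` and `from_parts` are hypothetical helper names, and the slice length plays the role of `Metadata`.

```rust
use std::ptr;

/// Splits a slice reference into a thin data pointer and its length,
/// the slice equivalent of "pointer plus metadata".
fn into_parts<T>(s: &[T]) -> (*const (), usize) {
    (s.as_ptr() as *const (), s.len())
}

/// Rebuilds a raw (thick) slice pointer from the two parts.
fn from_parts<T>(data: *const (), len: usize) -> *const [T] {
    ptr::slice_from_raw_parts(data as *const T, len)
}
```

Dereferencing the rebuilt pointer is only sound while the original data is still alive and the length is unchanged.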
and we will introduce two functions into core::mem, to help people write types with Flexible Array Members:
mod core::mem {
    pub fn size_of_header<T: DynamicallySized>() -> usize;
    pub fn align_of_header<T: DynamicallySized>() -> usize;
}
These functions return the size and alignment of the header of a type, in other words the minimum possible size and alignment. For existing Sized types, they are equivalent to size_of and align_of, and for existing DSTs:
assert_eq!(size_of_header::<[T]>(), 0);
assert_eq!(align_of_header::<[T]>(), align_of::<T>());
assert_eq!(size_of_header::<dyn Trait>(), 0);
assert_eq!(align_of_header::<dyn Trait>(), 1);
// on 64-bit
assert_eq!(size_of_header::<RcBox<dyn Trait>>(), 16);
assert_eq!(align_of_header::<RcBox<dyn Trait>>(), 8);
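The slice values above can be cross-checked on stable Rust by measuring the smallest possible slice, the empty one. This is a small sketch with a hypothetical helper name (`empty_slice_layout`); it only covers `[T]`, since there is no stable way to ask for the header of other DSTs.

```rust
use std::mem::{align_of_val, size_of_val};

/// Size and alignment of the smallest possible `[T]`, the empty slice.
fn empty_slice_layout<T>() -> (usize, usize) {
    let empty: &[T] = &[];
    (size_of_val(empty), align_of_val(empty))
}
```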
Notes:
- names of the above functions should be bikeshed
- extern types do not implement DynamicallySized, although in theory one
  could choose to do this (that use case is not supported by this RFC).
- T: DynamicallySized bounds imply a T: ?Sized bound.
We will also change CStr to have the implementation from above.
On an ABI level, we promise that pointers to any type with
size_of::<Metadata>() == 0
&& align_of::<Metadata>() <= align_of::<*const ()>()
are ABI compatible with a C pointer - this is important,
since we want to be able to write:
extern "C" {
    fn printf(fmt: &CStr, ...) -> c_int;
}
Unfortunately, we won't be able to change existing declarations in libc
without a new major version.
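To see which pointers are thin today, one can compare pointer widths. The helper below is a hypothetical sketch (`is_thin` is not a proposed API); it uses pointer size as a stand-in for the zero-sized `Metadata` condition stated above, since `Metadata` is not available on stable Rust.

```rust
use std::mem::size_of;

/// True when a pointer to `T` carries no metadata, i.e. it is exactly
/// one machine pointer wide.
fn is_thin<T: ?Sized>() -> bool {
    size_of::<*const T>() == size_of::<*const ()>()
}
```

Pointers to Sized types are thin, while pointers to slices, `str` and trait objects are not. Under this RFC, a pointer to a custom DST with `Metadata = ()` would fall into the first group.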
as casts continue to allow
fn cast_to_thin<T: DynamicallySized, U: Sized>(t: *const T) -> *const U {
    t as *const U
}
so we do not introduce any new functions to access the pointer part of the thick pointer.
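The same cast already works for the built-in thick pointers. A minimal sketch for slices, with a hypothetical helper name (`data_ptr`):

```rust
/// Returns the address part of a raw slice pointer, discarding the length.
fn data_ptr<T>(s: *const [T]) -> *const T {
    s as *const T
}
```

The result points at the first element, so it compares equal to what `as_ptr()` returns for the same slice.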
Drawbacks
- More complication in the language.
- Lack of a Sized type dual to these unsized types;
  the lack of a [u8; N] to these types' [u8] is unfortunate.
- Inability to define a custom DST safely
- The size_of_val and align_of_val declarations are now incorrect;
  they should take T: DynamicallySized, as opposed to T: ?Sized.
Rationale and alternatives
This has been a necessary change for quite a few years.
The only real alternatives are those which are simply different ways of writing
this feature. We need custom DSTs.
We also likely want to deprecate the current ?Sized behavior of size_of_val and align_of_val, since people are planning on aborting/panicking at runtime when they are called on extern types. That's not great. (link)
Prior art
(you will note the incredible number of RFCs on this topic; we really need to fix this missing feature)
Unresolved questions
- How should these thick pointers be passed,
  if they are larger than two pointers? (repr(Rust) aggregate)
- Are std::raw::ptr and std::raw::ptr_mut necessary?
  You can get this behavior with as. (removed)
Future possibilities
- By overloading DerefAssign, we could add a BitReference type