eRFC: Minimal Custom DSTs via Extern Type (DynSized)

This is intended to be an eRFC, or perhaps now it would be an MCP, for a short-to-medium-term change to unstable behavior on the way to a long-term solution. cc @Ericson2314

This resulted in a design notes PR: Note design constraints on hypothetical `DynSized` by CAD97 · Pull Request #166 · rust-lang/lang-team · GitHub

Summary

(Temporarily?) Resolve the "what do size_of_val and align_of_val do" question on extern type by punting it to the developer defining that specific type.

This could in the future become an implementation of DynSized and perhaps Pointee for custom metadata. As such, and because this eRFC is specifically intended to be an experimental unstable stepping stone between the current state and eventual stable extern type, we provide two potential spellings: both with and without DynSized.

Motivation

extern type and size_of_val/align_of_val do not play well together. size_of_val is defined over T: ?Sized, so under the current rules, this means that it can be called with an extern type. However, the whole point is that the Rust world doesn't know the size of an extern type, and it may actually be unknowable!

As of the writing of this eRFC, there is weak team support for making the use of extern type in generics fail to compile via an internal default-bound DynSized trait which extern types do not implement.

The purpose of this eRFC is to allow developers to define extern types which are DynSized (i.e. the size/align is knowable dynamically), thus supporting use cases like a thin CStr or simple custom DSTs.

Guide-level explanation

By default, when you define an extern type, you cannot use this type in a generic context. This is because Rust knows nothing about the type, so it can't monomorphize code to use the type.

In order to use an extern type in a generic context, you need to tell Rust two things: how to determine the size of a pointee and the required alignment of a pointee. If these two facts about the type are known, we say that the type is dynamically sized. (If the size of the type is statically known, i.e. T: Sized holds, then the type is statically sized.)

With DynSized

In order to provide this information, you implement DynSized:

extern {
    type CStr;
}

unsafe impl DynSized for CStr {
    fn align_of_val_raw(_: ()) -> usize { 1 }
    unsafe fn size_of_val_raw(this: *const Self) -> usize {
        libc::strlen(this.cast()) + 1
    }
}

trait DynSized is defined in the implementation-level section.

Without DynSized

In order to provide this information, you supply two items at the definition of your extern type:

extern {
    type CStr {
        fn align_of_val_raw(_: ()) -> usize { 1 }
        unsafe fn size_of_val_raw(this: *const Self) -> usize {
            libc::strlen(this.cast()) + 1
        }
    }
}

The signature of align_of_val_raw is explained in the alternatives section.

By defining size_of_val_raw and align_of_val_raw, Rust now knows enough about the type to monomorphize code. To facilitate this, size_of_val_raw and align_of_val_raw are subject to a number of restrictions that normal functions are not:

  • align_of_val_raw must return a power of two.
  • size_of_val_raw must return a multiple of the value returned by align_of_val_raw.
  • size_of_val_raw must accurately represent the size of this instance of the type and the size of the allocated object[1].
  • align_of_val_raw is provided any pointer, potentially even a null pointer[2]. The pointer may be underaligned.[3]
  • size_of_val_raw is provided a raw pointer with read-only permissions (it may not be used to write to the pointee). The pointer may be underaligned[3:1].
  • size_of_val_raw and align_of_val_raw must be pure and not have any observable effects beyond the time it takes to calculate. Calls may be inserted spuriously[4] and calls may be removed[5].
  • The return value of size_of_val_raw and align_of_val_raw must not change for any given instance of the type, even as the instance gets mutated.
  • size_of_val_raw may be called with a pointer to dropped Self[6]. The author of the dynamically sized extern type is responsible for ensuring that size_of_val_raw still functions correctly after Drop::drop has been called.

If any of these requirements are not met, the program behavior is undefined, even if the type is not used.
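Neither spelling compiles on stable Rust today, so here is a minimal sketch that emulates the thin CStr case with a zero-sized `#[repr(C)]` struct used only behind raw pointers. The names `ThinCStr` and `layout_of_val_raw` are hypothetical, and a hand-rolled loop stands in for `libc::strlen` to keep the example dependency-free. The helper also checks the first two restrictions from the list above at runtime; a real implementation would have to uphold them by construction.

```rust
use std::alloc::Layout;
use std::os::raw::c_char;

/// Hypothetical stand-in for `extern { type CStr; }`: zero-sized and only
/// ever used behind a raw pointer that keeps the provenance of the buffer.
#[repr(C)]
pub struct ThinCStr {
    _opaque: [u8; 0],
}

/// The alignment must be answerable without looking at the pointee.
pub fn align_of_val_raw(_: ()) -> usize {
    1
}

/// Recovers the size by walking to the NUL terminator.
///
/// # Safety
/// `this` must point to a readable, NUL-terminated byte sequence.
pub unsafe fn size_of_val_raw(this: *const ThinCStr) -> usize {
    let bytes = this.cast::<c_char>();
    let mut len = 0;
    // SAFETY: the caller guarantees a terminator exists within the allocation.
    while unsafe { *bytes.add(len) } != 0 {
        len += 1;
    }
    len + 1 // include the terminator, mirroring `strlen(..) + 1`
}

/// Combines both answers into a `Layout` and checks two of the listed rules.
///
/// # Safety
/// Same contract as `size_of_val_raw`.
pub unsafe fn layout_of_val_raw(this: *const ThinCStr) -> Layout {
    let align = align_of_val_raw(());
    // SAFETY: forwarded from this function's contract.
    let size = unsafe { size_of_val_raw(this) };
    assert!(align.is_power_of_two(), "align must be a power of two");
    assert_eq!(size % align, 0, "size must be a multiple of align");
    Layout::from_size_align(size, align).expect("size overflows when padded")
}
```

For a buffer holding `b"hello\0"`, casting `buf.as_ptr()` to `*const ThinCStr` and calling `layout_of_val_raw` yields a layout of size 6 and alignment 1. Note that the pointer is derived from the buffer directly; going through a `&ThinCStr` reference to a zero-sized type would not carry permission to read the bytes.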

The following changes are made to existing functions in the standard library and language[7]:

  • Any generics <T: ?Sized> no longer accept extern types that are not dynamically sized.
  • std::mem::size_of_val calls DynSized::size_of_val_raw for extern types.
  • std::mem::align_of_val calls DynSized::align_of_val_raw for extern types.
  • std::mem::size_of_val_raw (and thus Layout::for_value_raw) requires that pointers to an extern type sized tail are pointers to an allocation of a valid but potentially dropped instance of the extern type tail.
  • std::mem::align_of_val_raw becomes safe... modulo concerns about other unsized tails.

Implementation-level explanation

A new default-bound lang item unsafe OIBIT is added: DynSized.

unsafe trait DynSized: ?DynSized {
    fn align_of_val_raw(metadata: <Self as Pointee>::Metadata) -> usize;
    unsafe fn size_of_val_raw(this: *const Self) -> usize;
}

We have Sized: DynSized, and ?DynSized implies ?Sized. DynSized is primarily intended as a forever-unstable implementation trait like Freeze is, but depending on its utility in user code, it may be a candidate for future exposure.

All types except for extern type (and perhaps other future custom DSTs) are DynSized. extern types are not DynSized, and DynSized can be implemented for these types by user code.

The automatic implementation of DynSized implements align_of_val_raw and size_of_val_raw using the implementation of the min_align_of_val and size_of_val intrinsics. (See their current implementation in rustc, which should now bug! when used on ty::Foreign.)

mem::align_of_val_raw and mem::size_of_val_raw are removed/deprecated in favor of DynSized::*. mem::align_of_val and mem::size_of_val forward to DynSized::*.
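To make the forwarding concrete, here is a rough stable-Rust sketch. `DynSizedSketch` is a hypothetical trait, its `Metadata` associated type stands in for the unstable `<Self as Pointee>::Metadata`, and the two impls are written by hand for `u64` and `[u16]` where the proposal would have the compiler generate them for every ordinary type. `layout_for_value` is likewise an assumed helper name.

```rust
use std::alloc::Layout;
use std::mem;

/// Hypothetical stand-in for the proposed `DynSized` lang item.
pub unsafe trait DynSizedSketch {
    /// Imitates `<Self as Pointee>::Metadata`.
    type Metadata;

    fn align_of_val_raw(metadata: Self::Metadata) -> usize;

    /// # Safety
    /// `this` must point to an allocation holding a valid instance.
    unsafe fn size_of_val_raw(this: *const Self) -> usize;
}

// What the automatic implementation amounts to for a sized type:
// both answers are compile-time constants.
unsafe impl DynSizedSketch for u64 {
    type Metadata = ();

    fn align_of_val_raw(_: ()) -> usize {
        mem::align_of::<u64>()
    }

    unsafe fn size_of_val_raw(_: *const Self) -> usize {
        mem::size_of::<u64>()
    }
}

// ... and for a slice, where the size depends on the length stored in
// the fat pointer while the alignment does not.
unsafe impl DynSizedSketch for [u16] {
    type Metadata = usize;

    fn align_of_val_raw(_len: usize) -> usize {
        mem::align_of::<u16>()
    }

    unsafe fn size_of_val_raw(this: *const Self) -> usize {
        // SAFETY: the caller guarantees `this` points to a valid slice.
        mem::size_of_val(unsafe { &*this })
    }
}

/// Generic code only talks to the trait, as `mem::size_of_val` would.
pub fn layout_for_value<T: ?Sized + DynSizedSketch>(value: &T, metadata: T::Metadata) -> Layout {
    // SAFETY: a reference always points to a valid instance.
    let size = unsafe { T::size_of_val_raw(value as *const T) };
    let align = T::align_of_val_raw(metadata);
    Layout::from_size_align(size, align).expect("invalid size/align pair")
}
```

With these impls, `layout_for_value(&7u64, ())` agrees with `Layout::new::<u64>()`, and for a `&[u16]` slice the result agrees with `Layout::for_value`. Passing the metadata separately is an artifact of the emulation; with real `Pointee` support it would be extracted from the pointer.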

Drawbacks and rationale

DynSized adds complexity to the language. However, it is the author's belief that DynSized is a simplification over the ad-hoc additional rules required to support the RFC-accepted extern type feature well. Both the current state where size_of_val returns 0 and the proposed state where size_of_val panics and a best-effort lint is provided are annoying footguns. In fact, the existence of a lint may require an implementation that looks a lot like DynSized. A later tentative proposal was to ban the use of extern type in generics altogether; this is what this eRFC is reifying. Even if DynSized is never exposed to the developer, a mechanism that looks very similar to the OIBIT will have to exist anyway, so it makes sense to implement it as one.

If/when DynSized is ever stabilized (which this eRFC is not proposing to necessarily happen), a lot of code will want to change from T: ?Sized to T: ?DynSized. To a first order of approximation, this would likely be most code dealing in just Borrowing<'_, T> and no code dealing in Owning<T>.

Disclaimer: the eRFC author is the author and maintainer of an FFI binding which optionally uses extern type and makes great use of a generic Handle<T: LibExternType>. This necessarily biases the author to the version of this proposal which allows T: ?DynSized bounds in user code.

Alternatives

This eRFC uses DynSized::align_of_val_raw(<T as Pointee>::Metadata) to be compatible with dyn Trait (and future custom fat DSTs), which requires a valid DynMetadata. The "don't say DynSized" way of specifying a dynamically sized extern type shows this in its signature for align_of_val_raw, but if DynSized is purely an implementation detail, it could be an associated const. This would allow the power-of-two alignment to be easily enforced, but would preclude future extensibility using that syntax to custom Pointee metadata.

Portions of the DynSized trait could be made const, and no consideration towards the implications of ~const DynSized has been made by the author.

Pointee and DynSized could be merged into a single trait, but this seems undesirable, since truly unsized extern types are still pointees, and it would prohibit Thin from being a simple trait alias.

Also, bikeshedding. At a minimum, since align_of_val_raw takes metadata rather than a pointer, that's probably a bad name; perhaps align_for_metadata?

Prior art

The entire purpose of extern type is to emulate the use of incomplete types in C APIs for typesafe opaque handles. When a type is incomplete in C (or C++), the compiler refuses to emit any glue which requires knowledge about the type T. The only thing which you can do with an incomplete type is talk about pointers to it, and anything you learn about the type is provided by a function. This is what is represented as ?DynSized.

There are two main ways that a C FFI library can use an incomplete type:

typedef struct LIB_SYSTEM LIB_SYSTEM;

// fully opaque; allocated by the lib
LIB_SYSTEM* lib_create_system();
void lib_release_system(LIB_SYSTEM*);

// partially opaque; allocated by the caller
size_t lib_system_size(); // NB: C alloc typically doesn't specify alignment
void lib_create_system(LIB_SYSTEM*);
void lib_release_system(LIB_SYSTEM*);

The latter is primarily used in resource-constrained or hard realtime libraries where introducing a malloc call is undesirable, to allow the caller to use their own allocation strategy, whether that be a stack buffer, arena, or just a call to malloc. With DynSized and custom allocators, a similar pattern becomes possible in Rust[8].
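A sketch of how the caller-allocated pattern might look from Rust with today's stable allocation API. The "library" here is mocked in Rust so the example is self-contained: `lib_system_size`, `lib_create_system` and `lib_release_system` mirror the C prototypes above, while `SYSTEM_ALIGN`, `OwnedSystem`, and the 16-byte size are assumptions made purely for illustration (the C API does not report an alignment, so the binding has to pick one).

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Opaque stand-in for `LIB_SYSTEM`; only used behind raw pointers.
#[repr(C)]
pub struct LibSystem {
    _opaque: [u8; 0],
}

// --- mock library side (a real binding would use an `extern "C"` block) ---
const SYSTEM_SIZE: usize = 16; // assumed size, known only to the library
const SYSTEM_ALIGN: usize = 8; // assumed alignment chosen by the binding

pub fn lib_system_size() -> usize {
    SYSTEM_SIZE
}

/// # Safety
/// `sys` must point to at least `lib_system_size()` writable bytes.
pub unsafe fn lib_create_system(sys: *mut LibSystem) {
    // SAFETY: forwarded from this function's contract.
    unsafe { sys.cast::<u8>().write_bytes(0xAB, SYSTEM_SIZE) };
}

/// # Safety
/// `sys` must have been initialized by `lib_create_system`.
pub unsafe fn lib_release_system(_sys: *mut LibSystem) {}

// --- caller side: the caller owns the storage, the library owns the contents ---
pub struct OwnedSystem {
    ptr: *mut LibSystem,
    layout: Layout,
}

impl OwnedSystem {
    pub fn new() -> Self {
        let layout = Layout::from_size_align(lib_system_size(), SYSTEM_ALIGN)
            .expect("library reported an unusable size");
        // SAFETY: the layout has a non-zero size.
        let raw = unsafe { alloc(layout) };
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        let ptr = raw.cast::<LibSystem>();
        // SAFETY: `ptr` points to `lib_system_size()` writable bytes.
        unsafe { lib_create_system(ptr) };
        OwnedSystem { ptr, layout }
    }

    pub fn size(&self) -> usize {
        self.layout.size()
    }

    pub fn first_byte(&self) -> u8 {
        // SAFETY: the mock library initialized every byte.
        unsafe { *self.ptr.cast::<u8>() }
    }
}

impl Drop for OwnedSystem {
    fn drop(&mut self) {
        // SAFETY: `ptr` was initialized in `new` and allocated with `layout`.
        unsafe {
            lib_release_system(self.ptr);
            dealloc(self.ptr.cast::<u8>(), self.layout);
        }
    }
}
```

The wrapper has to remember the `Layout` itself because nothing in the type system can recover it from a `*mut LibSystem`. That bookkeeping is what a DynSized implementation calling `lib_system_size()` would let generic containers do on the binding's behalf.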

Unresolved questions

  • Unknown unknowns

Future possibilities

  • We could allow explicit implementations of Pointee on extern types to provide custom fat pointer metadata for full custom DST support.
  • Fixing the FFI ecosystem, which is currently using various iterations of patterns for representing opaque structs. Notably, implementing object-safe traits for opaque types represented this way is a giant footgun because it creates a vtable that says the size/align is 0/1. This is probably not a soundness hole, but this relies on the absence of code which tries to do something clever with the size/align, and it is an open question whether accessing past the Sized bytes is allowed by the Rust memory model.

  1. This means that pointer reads within size_of_val_raw bytes must be valid, and reads outside size_of_val_raw bytes are assumed to be invalid. In addition, copying size_of_val_raw bytes must copy the entire object, and size_of_val_raw bytes must be the correct size to allocate/deallocate memory for this instance of this type. ↩︎

  2. Alignment needs to be known before indirection (without looking at the type instance) in order to determine the field offset of an unsized tail. This is most commonly found in the implementation of Rc/Arc, which internally store roughly a pointer to a struct RcInner<T: ?Sized> { ref_count: RefCount, data: T }. Without knowing the alignment of T, it's impossible to offset a pointer to the field in order to align_of_val it. ↩︎

  3. Consider #[repr(packed)] struct Packed { tail: ExternTail }. ↩︎ ↩︎

  4. Being able to insert spurious calls is required for common code motion optimizations, but the compiler will attempt to avoid inserting unnecessary calls, as calculating the size/align is allowed to be nontrivial (e.g. strlen walks the length of a null-terminated string). ↩︎

  5. E.g. when std::mem::size_of_val is called multiple times, it is an allowed optimization to remember the results of the first call and reuse them, rather than recalculating the size. ↩︎

  6. This is for the purpose of deallocation, and specifically for Rc/Arc. The order of deallocating a Box<T> calls drop_in_place(*mut T) first and then deallocate(*mut T). For Box, since these are done linearly in a single function, it could just calculate the Layout::for_value(&T) before dropping the T. However, Rc works differently because of Weak. When the last Rc is dropped, drop_in_place(*mut T) is called. Then, later when the last Weak is dropped, deallocate(*mut RcInner<T>) is called. This means that Weak needs some way to get the size/align of RcInner<T> after the T has been dropped. The simplest way is to allow getting size/align from a dropped T, but there are alternatives[7:1]. ↩︎

  7. When the author previously talked to T-lang about these requirements on size/align (w.r.t. Weak::as_ptr and friends) there was weak consensus that requiring statically known alignment and retrieving size from dropped T was a reasonable restriction. However, alternative libs designs were discussed such that this guarantee was not finalized yet: Rc could store a pointer directly to T and use reverse offsets for the refcount to avoid needing to know the alignment statically, and dropping the T in RcInner<T> could overwrite it with Layout::for_value(&T) such that Weak can just read that on Drop, or additional fields could be added to the header data to store size/align as required. ↩︎ ↩︎

  8. The need for this approach is limited in pure-Rust applications, because Rust is statically linked by default and does not consider changing the size of a struct as a breaking change, unlike in C where changing the size of a public struct is an ABI breaking change that can cause silent UB when upgrading dynamic libraries without recompiling the world. However, Rust is still used for FFI where this is a concern and should be able to interoperate with C libraries following this pattern. ↩︎
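As an illustration of footnote 2, the offset of an unsized tail can be computed from its alignment alone. The sketch below assumes a `#[repr(C)]`-style header of two `usize` counters, which only approximates the real `RcInner`; `data_offset` is a hypothetical helper name.

```rust
use std::alloc::Layout;

/// Offset of the `data` field in an assumed
/// `#[repr(C)] struct RcInner { counts: [usize; 2], data: T }`,
/// given nothing but the alignment of `T`.
pub fn data_offset(tail_align: usize) -> usize {
    // Assumed header: strong and weak counts.
    let header = Layout::new::<[usize; 2]>();
    // Only the tail's alignment affects where it starts, so its size is
    // irrelevant here and zero is used as a placeholder.
    let tail = Layout::from_size_align(0, tail_align)
        .expect("align must be a non-zero power of two");
    let (_combined, offset) = header.extend(tail).expect("layout overflow");
    offset
}
```

A tail with alignment 1 starts right after the header, while a tail with alignment 64 starts at offset 64. The padding depends on the tail's alignment, so code holding only a pointer to the start of the allocation cannot find the tail (and therefore cannot ask the tail anything) unless the alignment is available up front.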


These two requirements may be in conflict depending on the language for which you write the bindings.

Many foreign types may need to be aligned. In those cases I think it should rather be forbidden to construct a packed version. That is unsafe anyway.

I think it's impossible for Rust to bind to those languages already (if I recall correctly this is the whole size != stride thing), so that's not really a concern.

size_of_val_raw may be called with a pointer to dropped Self

This just still feels weird to me... Is there anything else in Rust that relies on being able to access fields of dropped structs? I guess it makes sense because it's an unsafe trait hm...

So it is not allowed to panic or abort? That seems unfortunate, since I personally would rather emulate the proposed state where size_of_val panics than the current one where it returns 0.

Otherwise, thumbs-up from me.

This is required for Rc<T: ?DynSized> and Arc<T: ?DynSized> to be sound, as they will call layout_of_val_raw on a dropped value if the last handle to go out of scope is a Weak handle.
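The ordering can be observed with today's `Rc` and `Weak` on stable Rust. This is only a sketch of the drop-then-deallocate sequence: `Noisy` and `observe_drop_before_dealloc` are hypothetical names, and the deallocation itself is not directly observable here; it is the documented behavior of `Rc` that the allocation stays alive until the last `Weak` is gone.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Records when its destructor runs.
struct Noisy<'a>(&'a Cell<bool>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Returns `(value_dropped_while_weak_alive, upgrade_fails)`.
pub fn observe_drop_before_dealloc() -> (bool, bool) {
    let dropped = Cell::new(false);
    let strong = Rc::new(Noisy(&dropped));
    let weak = Rc::downgrade(&strong);

    // Last strong handle goes away: the value is dropped in place,
    // but the allocation is kept alive for the weak handle.
    drop(strong);
    let value_dropped = dropped.get();
    let upgrade_fails = weak.upgrade().is_none();

    // Only now is the allocation freed, which requires the layout of an
    // inner struct whose payload has already been dropped.
    drop(weak);
    (value_dropped, upgrade_fails)
}
```

Both flags come back `true`: the payload's destructor has already run while the `Weak` still owns the allocation. For an ordinary `T` the layout needed at the final `drop(weak)` comes from the type or the pointer metadata; for a type that computes its size by reading the pointee, it would have to read memory whose value has been dropped.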


However, there's a more concerning issue with Mutex<T: ?DynSized>; because the proposed DynSized trait doesn't require atomic reads, calling layout_of_val_raw on the mutex while another thread modifies the contained value is potentially racy.

Since the size of a value cannot be changed at runtime (it refers to the allocation size used for the alloc/free), the size determination can be based solely on read-only parts of the object.

However, this does cause issues with e.g. CStr, since determining the size requires reading the entire value (to find the terminating null byte). On a strongly ordered architecture like x86, this wouldn't cause issues (since you'd either read the old or the new value, both of which are non-null) so long as the null terminator is never overwritten, but weak memory models (e.g. the C++ memory model, which Rust currently delegates to) say such a race is full UB (and on actual hardware, tearing means you could read a value that was never written).

What's worse is that this isn't a problem with Mutex<T: ?DynSized> (that's the lack of DynSized, and this eRFC doesn't propose any generics actually support that yet), it's a problem with Mutex<T: ?Sized + DynSized>.

This is quite unfortunate, since supporting a thin CStr is one of the primary motivators for this eRFC, and I don't think it's possible in a clean way to avoid this problem w.r.t. shared mutability.

The three resolutions I can think of:

  • &mut CStr doesn't allow mutating the string (this is actually the case currently if I read the docs correctly).
    • ... but this still runs into the restriction that &mut is specified to be a unique reference, and that means that reads of the length through &Mutex<CStr> would revoke the &mut access unless something like universal two-phase borrows is adopted.
  • Slightly weaker but also much more annoying: &mut CStr allows mutating the string but only with (unordered) atomic writes, and determining the size is done with (unordered) atomic reads (which probably also means the libc strlen can't be used) ... this resolution is compatibility hell and bad.
  • Make supporting ?Sized + DynSized types opt-in where determining the size would cause aliasing problems (and don't opt in at least for types that loan out &mut T, perhaps for all interior mutability types). Completely underspecified at best, and likely just means that it's always opt in (see the previous point about &mut) which kills most of the motivation for the eRFC anyway.

I don't think this works; you can still overwrite the metadata with itself, for example:

fn noop_write<T: ?Sized>(a: &mut T) {
    let len = std::mem::size_of_val(a);
    let ptr = a as *mut T as *mut u8;
    unsafe { std::ptr::copy(ptr, ptr, len); }
}

Your first two resolutions amount to making this code unsound for all thin DSTs.

Do you mean with interior mutability in that example? If anything aliases a in your example, that's just UB.

The problem is &Mutex<T> combined with noop_write(&mut T). More specifically,

let boxed: Box<Mutex<ThinCStr>> = make();
let mutex: &Mutex<ThinCStr> = &*boxed;

join(
    // a
    || {
        let mut lock = mutex.lock();
        let r: &mut ThinCStr = &mut *lock;
        noop_write(r);
    },
    // b
    || {
        let _size = std::mem::size_of_val::<Mutex<ThinCStr>>(mutex);
    },
);

This necessarily does a read of the ThinCStr on thread b, because that's the only way to recover the size information of a ?Sized + DynSized value. Thread a uses the fact that it has a mutable reference to overwrite the entire memory of ThinCStr with itself nonatomically.

This says that DynSized can't read from the pointee, because it's already okay to have shared mutability for ?Sized values.

So what it looks like we'd really want for the full gamut of dynamically sized types is

  • Today's Sized: statically known size/align.
  • Today's ?Sized: can determine size/align from pointer/metadata pair.
  • ?Sized + ?MetaSized: can determine size/align from the pointer/metadata/pointee[1].
  • ?Sized + ?MetaSized + ?DynSized: unknown size/align.

  1. For this to be sound, either ?DynSized has to be forbidden from UnsafeCell, or ?Freeze + ?DynSized pointees must not be allowed in the safe [size|align]_of_val. Properly specifying when it's safe to call the raw versions is quite tricky, however. ↩︎

Follow up: I PRd the conclusions from this as design notes to the lang team repo:


"T: ?Sized + MetaSized + DynSized", where the size and alignment are known from the data pointer and metadata

Is it intentional with MetaSized to allow dependence on the pointer address? I would expect that deriving size and alignment from just the metadata is the typical scenario. I can imagine some schemes where the pointer does encode the dynamic size, but those are necessarily incompatible with freely relocating the object by copying the bytes in ptr..(ptr+size).

The intent of the design doc is just to document the constraints on the design space. I see no constraint that would prevent MetaSized from knowing the address of the value, though I agree in principle that just pointee type + pointee metadata ought to be enough.

If you can suggest a succinct way to integrate that recommendation (since it's in the opinionated section already), I'd be happy to integrate it.

Well, the specific use case I have in mind is the possibility of extending [T] to e.g. T: ?Sized + MetaSized with the constraint that the Ts in the slice have the same metadata, so you only need one copy of it: The metadata of [T] would be (usize, T::Metadata) (or something equivalent). In addition to location invariance of size&align, this would require that every value in a slice would get the same metadata when unsized, or that it's at least valid to use the metadata of one value for all of them. As far as I know, these would hold for slices and trait objects, as well as structs containing them.

That extension leads to the expected meaning of multidimensional slices like [[T]] or [Block] where struct Block(Header, [T]), and likely more surprising/misleading [dyn Trait] as a slice of some single concrete type that implements the trait.

I added a short note to that point

Additionally, it could be useful to restrict MetaSized to only know the pointee metadata and not the data pointer; this would allow things like [T] where T: ?Sized + MetaSized using both slice and T metadata for an extra-fat pointer (e.g. [[T]] for 2D slices doing the obvious thing (without stride)).

but it's worth noting that for matrix slices, this isn't sufficient; those want 3×usize metadata, for (minor_len, major_len, stride) to allow you to do a proper multidimensional subslice.
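To see why the stride is needed, here is a small sketch that carries the three-field metadata next to an ordinary slice. `MatrixMeta` and `MatrixView` are hypothetical illustration types on stable Rust, not a proposed API, and the row-major layout is an assumption.

```rust
/// The 3×usize metadata a matrix slice would want.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MatrixMeta {
    pub minor_len: usize, // elements per row
    pub major_len: usize, // number of rows
    pub stride: usize,    // elements between the starts of consecutive rows
}

/// A borrowed row-major matrix: data pointer plus explicit metadata.
pub struct MatrixView<'a, T> {
    data: &'a [T],
    meta: MatrixMeta,
}

impl<'a, T> MatrixView<'a, T> {
    /// A dense matrix: the stride equals the row length.
    pub fn new(data: &'a [T], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        let meta = MatrixMeta { minor_len: cols, major_len: rows, stride: cols };
        MatrixView { data, meta }
    }

    pub fn meta(&self) -> MatrixMeta {
        self.meta
    }

    pub fn get(&self, row: usize, col: usize) -> Option<&'a T> {
        if row < self.meta.major_len && col < self.meta.minor_len {
            self.data.get(row * self.meta.stride + col)
        } else {
            None
        }
    }

    /// A rectangular window: the lengths shrink, the stride is inherited.
    pub fn subview(&self, row0: usize, col0: usize, rows: usize, cols: usize) -> Self {
        assert!(row0 + rows <= self.meta.major_len);
        assert!(col0 + cols <= self.meta.minor_len);
        let start = row0 * self.meta.stride + col0;
        let data = self.data.get(start..).unwrap_or(&[]);
        let meta = MatrixMeta { minor_len: cols, major_len: rows, stride: self.meta.stride };
        MatrixView { data, meta }
    }
}
```

Taking the central 2×2 window of a 4×4 matrix holding 0..16 gives metadata `(minor_len: 2, major_len: 2, stride: 4)`, and element (1, 1) of the window is 10. With only `(minor_len, major_len)`, the second row would be assumed to start two elements after the first, which is the "without stride" behavior of a plain `[[T]]`.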
