2024 CFP: Exotically sized types before 2027

There's no way that actual support for exotically sized types is landing in time for the 2024 edition. But I think we understand enough about the problem space that we can reserve the space for them in the 2024 edition. The goal is to make it so we can compatibly enable the use of unsized types in the language and standard library without allowing them to be provided to code not written to be aware of them.

...although after writing this out, I think this actually ends up being most of the support work, and less is deferrable than I initially thought. Namely, this is due to associated types no longer getting MetaSized bounds and requiring manual bounds for them (potentially infinitely).

I'm still posting this as a useful exploration of the minimal support reservation, but I'm thinking just implementing RFC#3396 Extern types v2 in full is easier and preferable. At the least, my list of alternatives is potentially useful (especially the "bodge" option to reduce edition difference).

The necessary changes amount to roughly:

  • All editions
    • New builtin traits for capabilities of unsized types.
      • Sized — has a statically known size and alignment.
        • The existing trait, newly gaining Aligned and MetaSized as supertraits.
        • Object unsafe. (Logically has an associated const for the size.)
      • Aligned — has a statically known alignment.
        • An empty marker trait not implementable by user code.
        • Implied by the default Sized bound.
        • Object unsafe. (Logically has an associated const for the alignment.)
      • MetaSized — has a size and alignment known from pointer metadata.
        • An empty marker trait not implementable by user code.
        • Is a default bound on generics and associated types identically to but independently from how Sized is.
        • Object safe. (Logically the provider of the size and align items in trait object vtables).
    • In the standard library:
      • PhantomData is relaxed from T: ?Sized₂₀₂₁ to T: ?Sized₂₀₂₄.
      • mem::align_of is relaxed from T: Sized₂₀₂₁ to T: ?Sized + Aligned₂₀₂₄.
      • mem::{size|align}_of_val are relaxed from T: ?Sized₂₀₂₁ to T: ?Sized + MetaSized₂₀₂₄.
      • ?Sized₂₀₂₁ in positions where removing bounds is API compatible (i.e. generics that are never implied bounds downstream) become ?Sized + MetaSized.
      • ?Sized₂₀₂₁ in positions where removing bounds is API incompatible (i.e. trait associated types) are immediately relaxed to ?Sized₂₀₂₄. (Compatibility is maintained via the edition ≤ 2021 changes.)
      • Fix insufficient bounds errors introduced by the previous two guidelines, by adding bounds to generic type projections or removing MetaSized requirements.
    • Unsized fields are required to be MetaSized.
    • For clarity, Rustdoc always renders docs using ?Sized + MetaAligned₂₀₂₄ independent of edition, never ?Sized₂₀₂₁.[1]
  • edition ≤ 2021
    • A ?Sized unbound removes the default Sized and Aligned bounds but not the MetaSized bound.
    • Unresolved: trait associated types, incl. Self (currently aren't bound by default in generic contexts).
  • edition ≥ 2024
    • A ?Sized unbound removes both default bounds, allowing non-MetaSized types to be used.
  • nightly
    • extern types are not MetaSized.
    • offset_of! only requires field types to be Aligned instead of Sized.
    • Trait upcasting to dyn MetaSized does not have a unique vtable. It uses the prefix of whatever vtable it was upcast from.
    • (certainly missed other relevancies)

†: Minimizes the immediately effective change, but intended to be weakened in the future.
‡: More than minimally necessary, but included here for relevance.
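
As a rough illustration of the hierarchy listed above, the three capabilities can be modeled on stable Rust with ordinary user-defined traits. This is only a sketch: `MetaSizedModel`, `AlignedModel`, `SizedModel`, `static_align` and `accepts_meta_sized` are hypothetical names invented here, and real builtin traits would be implemented by the compiler rather than by hand.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `MetaSized`: layout is known from pointer metadata.
/// It has no items, so it stays usable as a trait object.
trait MetaSizedModel {}

/// Hypothetical stand-in for `Aligned`: alignment is known statically.
/// The associated const is what makes it unusable as a trait object.
trait AlignedModel {
    const ALIGN: usize;
}

/// Hypothetical stand-in for `Sized` with the two new supertraits.
trait SizedModel: AlignedModel + MetaSizedModel {
    const SIZE: usize;
}

// A statically sized type has all three capabilities.
impl MetaSizedModel for u32 {}
impl AlignedModel for u32 {
    const ALIGN: usize = align_of::<u32>();
}
impl SizedModel for u32 {
    const SIZE: usize = size_of::<u32>();
}

// A slice has a static alignment, but its size lives in the pointer metadata.
impl MetaSizedModel for [u16] {}
impl AlignedModel for [u16] {
    const ALIGN: usize = align_of::<u16>();
}

// A trait object has neither statically; both come from the vtable.
impl MetaSizedModel for dyn std::fmt::Debug {}

/// Models the relaxed `mem::align_of` bound: `T: ?Sized + Aligned`.
fn static_align<T: ?Sized + AlignedModel>() -> usize {
    T::ALIGN
}

/// Models an edition ≤ 2021 `?Sized` parameter, which keeps the `MetaSized` bound.
fn accepts_meta_sized<T: ?Sized + MetaSizedModel>(_value: &T) {}
```

In this model an extern type would simply implement none of the three traits, so it would be rejected by both functions at compile time.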

Unresolved questions:

  • Exact behavior of dyn Trait. Implicit + MetaSized for edition ≤ 2021 for backwards compatibility, not for edition ≥ 2024 for extern type support, yeah, but the change is more relevant in this position than for generics, since owning a trait object effectively requires allocation, thus MetaSized.
  • Method where bounds on MetaSized probably shouldn't prevent dyn Trait + MetaSized from being object safe, even though dyn Trait wouldn't be.[2]
  • Default trait method bodies have an implicit Self: Trait + ?Sized generic. How should that be handled for an edition ≤ 2021 trait; is it a supertrait bound so edition ≥ 2024 is required to provide MetaSized for impls? And whatever the solution is, it needs to be object safe.
  • Associated types don't have any implied bounds currently, but if we want to prevent edition ≤ 2021 code from getting a non-MetaSized

I previously made most of a (just the current edition) implementation of MetaSized, but got stuck on expectations for Self (in trait default method bodies) and associated types.

Alternatives:

  • Do nothing, and delay language support for exotically (un)sized types until edition 2027 or later.
  • Do nothing, but pick some compromise that allows exotically (un)sized types to exist before edition 2027:
    • Live with external types claiming a 1ZST layout and &mut ExternType being a giant footgun.
    • Make mem::{size|align}_of_val panic for external types. (Runtime instead of compile-time checking.)
    • Forbid external types from ever appearing in generics. (Essentially this proposal, but only relaxing references and their builtin deref to support ?MetaSized and nothing else.)
  • Bodge support in the current edition, but proper in the next — same behavior w.r.t. MetaSized bounds in both, but a missing MetaSized bound in the current editions is only a warning, with asking for the layout panicking.
    • Future editions still have to soundly handle that asking for type layout could panic for provided types, but new code should always request/provide witness that it can't.
  • Sized doesn't strictly need to imply Aligned.[3]
  • More traits for more exotic layout, e.g. splitting MetaAligned from MetaSized, or DynSized which is allowed to read from the pointer to compute size/align (e.g. to call strlen for &CStr, or to fetch a vtable from behind the reference e.g. for nonfinal cxx classes).
    • I've personally come to the conclusion that these are more trouble than they're worth, even if Box<Dst> being thin is nice.
    • size_of_val doing a read is a pointer provenance validity nightmare; UnsafeCell
    • A read to determine the alignment (DynAligned) is borderline unusable; you have to store the pointer to the type directly, since to determine the alignment you need the data pointer, but to offset to the field you need to know the alignment.
    • Plus, you can make Thin<Pointer<Dst>> work just fine without language support beyond splitting data pointer and metadata. You can even already do it on stable, if you limit yourself to just slice tailed DSTs; this is what my crates erasable and slice-dst do together.

Future extensions:

  • extern type might finally have a path to stabilization.
  • Relaxing more APIs to accept non-MetaSized types.
  • Allowing users to (unsafely) implement the layout traits to communicate properties about external types.

  1. Some sort of indicator for ?Sized₂₀₂₄ might be desirable, to distinguish it from ?Sized₂₀₂₁. Always rendering it as ?MetaSized makes sense, but I don't think requiring writing that in edition ≥ 2024 is necessary. But it is how edition ≤ 2021 would be extended to be able to be generic over non-MetaSized types. ↩︎

  2. A separate thing I'd potentially like to see for edition 2024 is dyn trait item declarations which ensure the trait is object safe. Interesting combination of concepts here: it's inferred object safe or not for + MetaSized only by default, but dyn trait makes it object safe without the need for MetaSized? ↩︎

  3. (Has a fun implication then that ?Sized + Sized is not the same as the lack of ?Sized, since it unbounds Aligned.) ↩︎

Just a question: would this proposal enable 2D slices? (slices whose metadata not only have a length but also a width and probably also a stride, so you can get a 2x2 submatrix out of a matrix that is laid out contiguously in memory, for example)

I remember seeing some discussions in Github about what it would take to support this natively, but I can't find it.

This change isn't necessary for supporting dynamically sized types that are still sized based on their metadata, such as 2D slices. The change is necessary for supporting types which can't have their size determined by their metadata.
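
For reference, this is what "sized based on their metadata" looks like on stable today. A minimal sketch; `layout_of`, `slice_layout` and `dyn_layout` are helper names made up for this example, while `size_of_val` and `align_of_val` are the existing `std::mem` functions.

```rust
use std::fmt::Debug;
use std::mem::{align_of_val, size_of_val};

/// Returns `(size, align)` for any value reachable through a reference.
/// Today this works for every `?Sized` type, because the fat pointer's
/// metadata (slice length or vtable) is enough to answer the question.
fn layout_of<T: ?Sized>(value: &T) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

/// Slice case: the size is element size times the length stored in the pointer.
fn slice_layout(len: usize) -> (usize, usize) {
    let values = vec![0u16; len];
    layout_of(values.as_slice())
}

/// Trait object case: size and alignment are read from the vtable.
fn dyn_layout(value: &dyn Debug) -> (usize, usize) {
    layout_of(value)
}
```

A type whose layout cannot be answered this way (an extern type) is exactly the case the proposal is about.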

That said, this is effectively a prerequisite for extern type which don't answer layout by panicking or as if they were 1ZST, and the imho most probable path to custom dynamically sized types is building from extern type and then specifying the size information for them.

For the specific case of 2D (contiguous, non-strided) slices, I think the most straightforward way to get them is to consider [T] as “[T; dyn]”, i.e. as an array where the length is moved from being a const to the pointer metadata. With that understanding of slices, 2D slices are as theoretically simple as 2D arrays, with composition giving us that [[T; dyn]; dyn] is a 2D array where the two usize worth of length information are carried as pointer metadata instead of being const.
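
Without language support, the same idea can be approximated in a library by carrying the two lengths next to an ordinary slice. This is only a sketch under that assumption: `Slice2d` and its methods are hypothetical names, and it models a contiguous, non-strided view only.

```rust
/// Hypothetical library-level stand-in for a reference to `[[T; dyn]; dyn]`:
/// the two lengths travel beside the data pointer instead of being consts.
struct Slice2d<'a, T> {
    data: &'a [T],
    rows: usize,
    cols: usize,
}

impl<'a, T> Slice2d<'a, T> {
    /// Builds a view if `rows * cols` matches the length of the backing slice.
    fn new(data: &'a [T], rows: usize, cols: usize) -> Option<Self> {
        let expected = rows.checked_mul(cols)?;
        (expected == data.len()).then_some(Self { data, rows, cols })
    }

    /// Returns one row as an ordinary slice.
    fn row(&self, r: usize) -> Option<&'a [T]> {
        if r >= self.rows {
            return None;
        }
        let start = r * self.cols;
        self.data.get(start..start + self.cols)
    }

    /// Returns a single element, bounds-checked on both axes.
    fn get(&self, r: usize, c: usize) -> Option<&'a T> {
        self.row(r)?.get(c)
    }
}
```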

Ofc it's more involved than just that, but at a high level that seems the most likely way we get built in or std support for 2D matrices, and it currently feels closer than proper custom DST does. (Fake custom DST are possible today if you're willing to bend the rules a bit, store your metadata as a single usize, and commit pointer laundering. The most notable example being bitvec.)

I think you meant implied by Sized, not Send

Another very exotically sized type is the ARM scalable vector. There's an RFC that cannot work as-is: "RFC: Add a scalable representation to allow support for scalable vectors" by JamieCunliffe (rust-lang/rfcs pull request #3268). Not saying that it must be part of this, but if it can be considered then why not.

well, I strongly think that when Rust supports scalable vectors, it should support them inside structs so we can have an ArrayVec-like type with support for user-selectable vector length (distinct from scalable vector length, which is cpu-designer selectable). (unrelated to exotically-sized types, but important when considering portable-simd design: we should be careful to support user-selectable vector length with a fixed-size (optionally user-selectable) backing simd type, which is what Libre-SOC's SimpleV's native vector types are)

(NOT A CONTRIBUTION)

First I want to make a criticism of how you've proposed this: you don't include a very detailed description of what you mean by "exotically sized types" (which has led all of the responses so far to be off topic) or a strong motivation for all of the changes you propose.

I feel I understand the design space pretty well so I'm able to contextualize this post, but I am not clear what motivation could be assigned to all of the proposed additions except a notion of theoretical purity. What type is Aligned but not Sized and why would we want to add it?

You propose to add 2 question mark traits. This would require an extremely strong motivation, but you just vaguely refer to "enabl[ing] the use of unsized types in the language and standard library."

I've written my understanding of the design space and options; maybe you can explain how you see things differently.

Extern types from FFI

The first motivation for this exploration years ago was to solve a problem with extern types. Extern types are types provided by a foreign library which doesn't give the user layout information, so that the user cannot allocate them (on the stack or heap) by themselves. Instead, the library provides functions which return *mut ExternType as well as destructor functions that take that pointer and deallocate it.

Right now, users work around this by defining zero-sized types, but this runs some amount of risk of a user allocating it. This risk is small: if you define a zero-sized type with a private field, the risk is only in the library wrapping the FFI. For this reason, I already regard this sort of extern type as a marginal feature, not worth making big language changes to support.

The problem that arose was what to do about passing an extern type to mem::size_of_val and mem::align_of_val. One option is to make this panic or abort. Another option is to add DynSized (or MetaSized, whatever), a parent trait of Sized and bound these functions by that trait, which extern types won't implement.

The biggest example of how a user could mistakenly do this is with Box. If a library exposes *mut ExternType, a user might call (unsafe) Box::from_raw on that, which would abort when it drops if you don't add DynSized, because it needs to get the layout. This is just about making a runtime error when dealing with extern types a compile time error instead, it doesn't actually enable putting an extern type in a box.
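
To make the "needs to get the layout" step concrete, here is a small sketch of the query a `Box<T>` effectively has to answer when it is dropped. `dealloc_layout` is a helper name invented for this example; `Layout::for_value` is the existing `std::alloc` API.

```rust
use std::alloc::Layout;

/// The layout a `Box<T>` must hand back to the allocator on drop.
/// For `?Sized` types it can only be computed from the value itself,
/// which is the query an extern type cannot answer.
fn dealloc_layout<T: ?Sized>(boxed: &Box<T>) -> Layout {
    Layout::for_value::<T>(&**boxed)
}
```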

In my opinion, extern types which abort when trying to get their layout would be fine. They are strictly better than zero sized types, which instead have nonsense behavior when you try to put them in a box. I find it hard to see this as an error which is worth adding a ?Trait to prevent.

Libraries wrapping a C API with extern types would have to have 2 types:

  • ExternType, an extern type which is only ever accessed behind a reference.
  • ExternTypeOwnedRef, a wrapper around *mut ExternType which calls the FFI destructor when it's dropped and derefs to references to ExternType.

From my perspective, such an API would strongly discourage ever trying to get the layout of ExternType or move it in the heap and the language doesn't need additional protection against this. If this could be a static error at no cost to anyone that would be great, but it doesn't justify the front-loaded complexity of another ?Trait.
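
A minimal sketch of that two-type shape, using today's zero-sized workaround. The "foreign library" here is mocked in plain Rust so the example is self-contained: `mock_create`, `mock_read`, `mock_destroy` and `LIVE_OBJECTS` are hypothetical stand-ins for real `extern "C"` functions. The `Deref` to `&ExternType` is left out, because the mock reads through the stored raw pointer instead.

```rust
use std::marker::{PhantomData, PhantomPinned};
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Today's stand-in for an extern type: zero-sized, with private fields so
/// that code outside this module cannot construct a value of it.
#[repr(C)]
pub struct ExternType {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

// Mock "foreign library" (assumption: a real one would be `extern "C"`).
static LIVE_OBJECTS: AtomicUsize = AtomicUsize::new(0);

fn mock_create(value: i32) -> *mut ExternType {
    LIVE_OBJECTS.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(value)).cast::<ExternType>()
}

unsafe fn mock_read(ptr: *const ExternType) -> i32 {
    // The "library" knows the real layout; the Rust side does not.
    unsafe { *ptr.cast::<i32>() }
}

unsafe fn mock_destroy(ptr: *mut ExternType) {
    unsafe { drop(Box::from_raw(ptr.cast::<i32>())) };
    LIVE_OBJECTS.fetch_sub(1, Ordering::SeqCst);
}

/// Owning handle: callers never hold an `ExternType` by value.
pub struct ExternTypeOwnedRef(NonNull<ExternType>);

impl ExternTypeOwnedRef {
    pub fn new(value: i32) -> Self {
        Self(NonNull::new(mock_create(value)).expect("mock_create returned null"))
    }

    pub fn read(&self) -> i32 {
        // SAFETY: the pointer came from `mock_create` and is destroyed only in `drop`.
        unsafe { mock_read(self.0.as_ptr()) }
    }
}

impl Drop for ExternTypeOwnedRef {
    fn drop(&mut self) {
        // SAFETY: the handle is the unique owner, so this runs exactly once.
        unsafe { mock_destroy(self.0.as_ptr()) }
    }
}
```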

Let me know if there's some salient fact that I've missed.

Custom DSTs

The other potential application is custom DSTs. Here, the design space is less certain, though you propose a specific path through it with MetaSized (and suggest never supporting a distinct DynSized).

The goal here is to allow users to define how to get the layout of a type, possibly from its metadata, possibly by an arbitrary function call (this is the distinction between MetaSized and DynSized). Exactly what new types would be supported by MetaSized without DynSized is unclear to me, as everything I can think of being useful (e.g. CStr, thin trait objects) would require DynSized, so some clarity around this would be helpful.

However, from my perspective, supporting this doesn't require a "real" new trait, it requires a "fake trait" like Drop. Ralf Jung has complained about how bizarre Drop is in the past, and maybe this sort of behavior shouldn't be defined with trait syntax, but that's what Rust chose. You would implement DynSized or whatever for a type (probably an extern type, maybe some other definition syntax) and define how to get the layout of it.

Importantly, this wouldn't require adding DynSized or ?DynSized bounds any more than custom destructors require adding meaningful Drop bounds. The only thing that requires these as bounds is restricting FFI extern types from size_of_val/align_of_val. As I said, I don't think turning those calls into a static error is worth the additional complexity of one or even more than one ?Trait. But it seems to have no bearing whatsoever on custom DSTs, because they would be safe to call with these functions.

This part is actually unsound I think, as a lot of code doesn't guard against size_of_val panicking (why would it?).

Whoops, yes. Blame editing on mobile, I guess.

Fair. I'm coming from a base place of RFC#3396, but halfway through writing this ended up realizing that this was basically no different from that RFC. My goal was to basically pare the RFC back to just the edition changes without additional functionality, but not much could be, thus this being light on direct motivation.

While this is focused on opening space for non-MetaSized types, that's because those are the ones which need edition changes. The “custom DST” class of exotic type layout don't need any edition changes for language support.

Below is some more discussion on the design space:

It's fairly minimal risk, but a much more significant risk IMHO is &mut Dst. We've established that this means it is sound to relocate the object. For ?Sized types this requires some form of flexible allocation[1], but you can't take_mut from &mut Header, because it's implicitly relying on the rest of the object being there.

With stable extern type shims which are Sized, you can even just mem::swap them. That isn't unsound in the way take_mut is (for a ZST header; it is if actual header data gets swapped), but it's still a huge footgun that swap is a no-op. At least for unstable extern type it's ?Sized so you have to use some less common way to be able to improperly slice the object.
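
The no-op swap is easy to reproduce on stable. A small sketch, with two byte buffers standing in for the foreign objects; `OpaqueShim`, `as_shim` and `swap_via_shims` are names invented for this example.

```rust
use std::mem;

/// A `Sized`, zero-sized shim of the kind used in place of an extern type today.
#[repr(C)]
struct OpaqueShim {
    _private: [u8; 0],
}

/// Views the start of a buffer as the opaque shim, the way FFI wrappers do.
fn as_shim(buf: &mut [u8; 4]) -> &mut OpaqueShim {
    // SAFETY: `OpaqueShim` is zero-sized with alignment 1, so any non-null
    // pointer is valid for it, and the returned borrow keeps `buf` borrowed.
    unsafe { &mut *(buf as *mut [u8; 4]).cast::<OpaqueShim>() }
}

/// Compiles and runs without complaint, but moves zero bytes.
fn swap_via_shims(a: &mut [u8; 4], b: &mut [u8; 4]) {
    mem::swap(as_shim(a), as_shim(b));
}
```

After `swap_via_shims`, both buffers still hold their original contents, even though the call reads as if the two objects were exchanged.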

It's somewhat unlikely that this happens for a concrete type, but that's not the main problem. The main problem is combining generic APIs creating unsoundness in safe code.

You can mitigate the issues by using Pin<&mut Extern> instead, encoding the address sensitivity. This would even be correct for e.g. C++ types without trivial move constructors (in C++ terms), but is "morally" incorrect for data which isn't otherwise pinned (e.g. FAM slices) on top of the economic cost of Pin.

I have a library which quite thoroughly builds this out, with an unstable feature flag to switch between using extern type T or struct T(UnsafeCell<[u8; 0]>, PhantomData<*mut ()>, PhantomPinned). It works by having a Resource<'scope, Extern> handle type which acts as a foreign box and calls the appropriate destroy function, and derefs to &Extern, with all functionality defined using &Extern.

For the most part, this works, and shared mutability is the correct way to expose most of the functionality from the library. But there's some functionality that I cannot easily expose soundly; it needs to be &mut Extern to terminate derived lifetimes. But exposing &mut Extern would be unsound in combination with take_mut.

I could make the functionality self: Pin<&mut Extern> and add a "deref_pin" method to Resource, or I could define the methods on Resource<Extern> instead (everything is one crate, thus no coherence problems), but either of these are compromises making the library meaningfully ergonomically worse to use just to prevent edge case unsoundness. To the point that it's legitimately considered preferable just to make these methods unsafe instead.

If it were merely a matter of correctness, then yes. But it's not just correctness, it's a matter of soundness. The benefit of using Rust is that it's impossible to cause UB from safe code. It's not sufficient just to make it difficult and require doing weird but (otherwise) sound things. Safe code must be sound for the use of unsafe details to be correct.

And actually, layout queries panicking would be sufficient to make it sound. (Assuming, of course, that provenance isn't strictly bound, i.e. that we have a model more like TB than SB.) And if unknown layout types are rare, this could be sufficient, but I'm in the space of writing support code aiming to make the use of such types lower friction and more accessible. I'd like the fraction of unsized types which are ?MetaSized to be similar to the fraction of all types which are ?Sized.

And for every ?Sized type, I'd like it to be as simple to shim it to be ?MetaSized as composition in Indyn<T>.

The trick which I think makes DynSized unnecessary is that you can just deref into a MetaSized type by loading the appropriate metadata to rehydrate the fat pointer. This neatly completely sidesteps all of the issues with size_of_val potentially needing to read behind the reference (consider that &Mutex<CStr> mustn't, else it could alias with live &mut CStr) by doing the read when a read is obviously acceptable.

To illustrate as a sketch:
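// Sketch only: `extern type zstr;` is shorthand for the nightly spelling
// `extern "C" { type zstr; }`. Assumes `Deref`, `ptr`, `slice`, `str`,
// `c_char` and a C `strlen` binding are in scope.
extern type zstr; // thin pointer to NUL-terminated UTF-8
impl Deref for zstr {
    type Target = str;
    fn deref(&self) -> &str {
        // SAFETY (assumed invariant): `self` points at valid UTF-8 followed by a NUL.
        unsafe {
            let ptr = ptr::from_ref(self);
            // The read behind the pointer happens here, where a read is acceptable...
            let len = strlen(ptr.cast::<c_char>());
            let bytes = slice::from_raw_parts(ptr.cast::<u8>(), len);
            // ...and the result is a fat reference to a MetaSized type.
            str::from_utf8_unchecked(bytes)
        }
    }
}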
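
The same "rehydrate on deref" idea can be approximated on stable with a one-word handle instead of an extern type. This is a sketch under that substitution: `ThinCStr` is a hypothetical name, and `CStr::from_ptr` plays the role of the `strlen` call above.

```rust
use std::ffi::{c_char, CStr};
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical thin (one-word) reference to a NUL-terminated string.
#[derive(Clone, Copy)]
struct ThinCStr<'a> {
    ptr: NonNull<c_char>,
    _borrow: PhantomData<&'a CStr>,
}

impl<'a> ThinCStr<'a> {
    /// Forgets the length and keeps only the data pointer.
    fn new(s: &'a CStr) -> Self {
        let ptr = NonNull::new(s.as_ptr().cast_mut()).expect("CStr pointers are never null");
        Self { ptr, _borrow: PhantomData }
    }

    /// Rehydrates a full reference by scanning for the terminator. The read
    /// happens here, at an explicit call, rather than inside a layout query.
    fn as_c_str(self) -> &'a CStr {
        // SAFETY: `ptr` came from a live `&'a CStr`, so it is NUL-terminated
        // and valid for reads for the whole of `'a`.
        unsafe { CStr::from_ptr(self.ptr.as_ptr()) }
    }
}
```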

This is why RFC#3396 aims to make it such that there's still only one unbound, ?Sized. The difference is in how unbound that unbound makes the type; currently stable, it guarantees the layout can be retrieved from “&ManuallyDrop<T>” (even after the T is dropped, despite ManuallyDrop not actually supporting ?Sized), but it could relax all the way to unknown layout.

Critically, if we make ?Sized mean unknown layout, then adding DynSized becomes a compatible addition for the future. MetaSized just bounds things back to where they currently stand. (Relaxing associated types from MetaSized to DynSized would unfortunately still be incompatible, but I think all std associated types either require Sized or are fine being fully-?Sized.)

I think if we relax ?Sized instead of introducing new unbounds, it's actually a reasonably smooth change for unbound positions. The rough places are where there isn't a default bound currently — dyn Trait, default method body Self, and associated type projections from generic types — since we're effectively saying there was a default bound there the whole time which we want to remove.

Is it rough enough that we should just give up on compile time safety here and do runtime checks instead? Maybe, but I'm not entirely convinced yet. I definitely think it's significantly more achievable than other possible unbounds (e.g. never movable types). It's not just a matter of explicit layout queries panicking, either, it's potential post monomorphization errors if you have a field with unknown layout.

As a final note, while size_of_val panicking works to prevent incorrect manipulation of unknown layout types, it still means that the function can panic and unsafe code needs to deal with that. As much as it is preferable if unsafe is always written in PPYP style and an unwind at any point would leave everything in a sound state, the reality is that this isn't always the case. Unsafe code that would be unsound if size_of_val unwinds definitely exists. And it means more code in drop handlers for the unwinding edge.
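
To illustrate what being robust against such an unwind costs, here is a small sketch of the PPYP pattern. `map_in_place` is a hypothetical helper, and the closure `f` stands in for any call that may unwind part-way through, such as a layout query that panics.

```rust
use std::ptr;

/// Replaces every element with `f(element)` in place.
fn map_in_place<T>(v: &mut Vec<T>, mut f: impl FnMut(T) -> T) {
    let len = v.len();
    // PPYP: declare the vector empty *before* touching its elements, so an
    // unwind out of `f` leaks the remaining elements instead of double-dropping.
    unsafe { v.set_len(0) };
    let base = v.as_mut_ptr();
    for i in 0..len {
        // SAFETY: `i < len`, the slot is initialized, and nothing else can
        // observe it while the length is 0.
        unsafe {
            let slot = base.add(i);
            let old = ptr::read(slot);
            ptr::write(slot, f(old));
        }
    }
    // Only restore the length once every slot is initialized again.
    unsafe { v.set_len(len) };
}
```

Without the first `set_len(0)`, a panic in `f` would leave one slot logically moved-out but still counted by the vector, which is the kind of latent unsoundness a newly panicking `size_of_val` could expose in existing code.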


  1. But not necessarily heap allocation; a stack buffer is just as acceptable. ↩︎

(NOT A CONTRIBUTION)

I agree that mem::swap is a compelling reason to add extern types, it doesn't seem relevant to this discussion though. FWIW, an option that could avoid the problem with mem::swap is to make them a custom DST with a nonsense layout, e.g. a newtype around a slice of (); anyway, proper extern types would still be better than this, I'm just mentioning it.

Of course! The current behavior is grossly unacceptable, which is why the debate was always between a runtime error (a panic or an abort) and adding complexity to the sized trait hierarchy.

This is why I've said "panic or abort;" if making it possible for size_of_val to unwind is considered an unacceptably disruptive change for unsafe code to have to be compatible with, always aborting is an alternative option. However, I don't know about unwinding edges since unwinding should only occur when these types have been monomorphized to an extern type; this is always incorrect so the performance shouldn't matter. I'd like to see the runtime error option explored more seriously, to discover if there really are serious problems that rule it out.

I don't think writing ?Sized + MetaSized instead of ?MetaSized mitigates the problem with adding these traits very much. The fundamental problem is that users, especially from a non-systems background, are surprised and confused by the concept of "unsized types." Having a simple binary (some types don't have a layout known at compile time) is a lot more accessible than having a complex lattice of possibilities. Keeping extern types a "secret third thing" by having their interaction with ?Sized only a runtime behavior documented on size_of_val avoids front-loading the additional complexity they bring.

[T] is. We would want to add it because it allows offset_of on the unsized field (which doesn't work on a dyn Trait field since that could have more alignment than the rest of the struct and so needs a greater offset). iirc it also allows some other things that I can't recall at the moment.
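
The difference can be observed on stable by measuring the tail offset at runtime. A sketch only: `Tailed`, `tail_offset` and `dyn_tail_offset` are names made up for this example, and `#[repr(C)]` is used so the field order is fixed.

```rust
use std::fmt::Debug;
use std::ptr;

/// A struct with a possibly unsized tail.
#[repr(C)]
struct Tailed<T: ?Sized> {
    head: u8,
    tail: T,
}

/// Measures the byte offset of `tail` for one particular value.
fn tail_offset<T: ?Sized>(value: &Tailed<T>) -> usize {
    let base = (value as *const Tailed<T>).cast::<u8>() as usize;
    let tail = ptr::addr_of!(value.tail).cast::<u8>() as usize;
    tail - base
}

/// For a `dyn Trait` tail the offset depends on the erased type's alignment,
/// so it can differ between two values of the same static type.
fn dyn_tail_offset(value: &Tailed<dyn Debug>) -> usize {
    tail_offset(value)
}
```

For a `[u16]` tail the offset is the same for every length, which is why a static `offset_of` answer is possible there; for `dyn Debug` it is 1 with a `u8` behind it and the alignment of `u32` with a `u32` behind it.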

This doesn't really help not introduce additional complexity though, or I guess it pushes it to everywhere that uses extern types, which I suppose might be a benefit in avoiding excess breaking changes. This idea doesn't really isolate breaking changes though, as it introduces a semver hazard everywhere generics are used, as something might assume MetaSized implicitly and introduce a size_of_val call (directly or indirectly), so one update might break code that assumed it could use an extern type somewhere.

Just to summarize the different type categories mentioned in this thread until now:

| Type | exist on stack | exist on heap | exist in compound types | support size_of | support align_of | support size/align_of_val | support size/align_of_raw |
|---|---|---|---|---|---|---|---|
| Sized types | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Slices | ❌ | ✅ | (as tail) | ❌ | ✅ | ✅ | ✅ |
| dyn Trait | ❌ | ✅ | (as tail) | ❌ | ❌ | ✅ | ✅ |
| thin CStr types | ❌ | ✅ | (as tail) | ❌ | ❔ | ✅ | ❌ |
| extern types | ❌ | ✅ | (as tail) | ❌ | ❌ | ❌ | ❌ |
| extern with #[repr(align(n))] | ❌ | ✅ | (as tail) | ❌ | ✅ | ❌ | ❌ |
| scalable vector types | ✅ | ❌ | ❌? | ❌ | ❌? | ❔ | ❔ |

This proposal would enable

  • using align_of on slices (and in consequence offset_of on slice tailed DSTs),
  • extern types with no size and alignment information,
  • extern types with no size information but statically known alignment.

With a way to implement MetaSized/Pointee on external types in user code it could easily be extended to support

  • Custom DSTs with fat pointers (e.g. 2D slices and bit slices).

With an additional implementable DynSized trait, that allows size/align_of_val but not _of_raw, it could be extended to support

  • thin CStr, thin trait objects and similar types.

Scalable vector types are completely out of scope for this proposal.

Did I forget anything?

I have an open RFC for this that includes some motivation.