Pre-eRFC: Let's fix DSTs

glaebhoerl · January 31, 2018, 8:00am

(I wasn’t talking about ergonomics personally, but the relationship (if any) between the sizedness traits and ownership, which seems like a more fundamental matter)

mikeyhew · January 31, 2018, 8:49am

Thin<T> is a generic struct so that it can work for any type T: SizeFromMeta, and in particular for any trait object type. Thin as a trait, if it works at all, would only work for traits that have Thin as a supertrait. That would be fine for object-oriented style hierarchies like Widget, but it isn’t general enough to support most traits in Rust.

parched · January 31, 2018, 9:56pm

How might the "sizeless types" defined in the Arm C Language Extensions for SVE fit in here?

Informally, sizeless types can be used in the following situations:

as the type of an object with automatic storage duration;

as a function parameter or return type;

...

as the target of a pointer or reference type; and

as a template type argument.

Sizeless types may not be used in the following situations:

as the type of a variable with static or thread-local storage duration (regardless of whether the variable is being defined or just declared);

as the type of an array element;

as the operand to a new expression; and

as the type of object being deleted by a delete expression.

In all other respects, sizeless types have the same restrictions as the standard-defined incomplete types. This specifically includes (but is not limited to) the following:

The argument to sizeof and _Alignof cannot be a sizeless type, or an object of sizeless type.

It is not possible to perform arithmetic on pointers to sizeless types. (This affects the +, -, ++ and -- operators.)

Members of unions, structures and classes cannot have sizeless type.

_Atomic variables cannot have sizeless type.

It is not possible to throw or catch objects of sizeless type.

Lambda expressions cannot capture sizeless types by value, although they can capture them by reference. (This is a corollary of not allowing member variables to have sizeless type.) Standard library containers like std::vector cannot have a sizeless value_type.

kornel · January 31, 2018, 11:54pm

When is it useful to have alignment determined at run time?

I might be lacking imagination, but I need DSTs for things like multidimensional slices and custom fat pointers, and for all of them I’d be OK with just hardcoded usize alignment.

mikeyhew · February 2, 2018, 6:14am

@kornel The obvious case is trait objects – they have the alignment of the erased type.

std::mem::align_of_val(&5i32 as &::std::any::Any) // => 4
std::mem::align_of_val(&5i64 as &::std::any::Any) // => 8

kornel · February 2, 2018, 3:14pm

In that case, would this work?

Thin<dyn Trait> {
     meta: <dyn Trait>::Meta, // stored at alignment of Meta
     variable_padding: [u8; ?],
     data: Trait,  // stored at alignment of trait's implementation
}

The Thin struct would have compile-time constant alignment of its Meta and a variable length. The data alignment would be stored in Meta and you’d have to read it to compute the data pointer.
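To make the proposed layout concrete, here is a minimal sketch of the offset computation, assuming a hypothetical `Meta` that stores the erased type's size and alignment directly (a real `Meta` would more likely be a vtable pointer). The names `Meta`, `round_up`, `data_offset` and `total_size` are made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical metadata for this sketch only: size and alignment of the
/// erased type. The alignment is assumed to be a power of two.
#[derive(Clone, Copy)]
struct Meta {
    size: usize,
    align: usize,
}

/// Round `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Offset of `data` from the start of the thin object, derived from the
/// metadata alone: the header followed by the variable padding.
fn data_offset(meta: &Meta) -> usize {
    round_up(size_of::<Meta>(), meta.align)
}

/// Total length of the thin object: header, padding, then the data.
fn total_size(meta: &Meta) -> usize {
    data_offset(meta) + meta.size
}
```

With this scheme the data offset is only correct if the thin object itself starts at an address aligned to `meta.align`, which is the assumption the next reply questions.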

kennytm · February 2, 2018, 4:08pm

Consider the type Thin<u64x2> where u64x2 has 128-bit (16-byte) alignment. The Thin type according to your scheme would be arranged as

struct Thin {
    meta: &'static Vtable,   // size = 8 bytes
    padding: [u8; 8],
    data: u64x2,             // offset = 16
}

However, the alignment of Thin would only be 8,

struct Foo {
   a: u8,
   b: Thin<u64x2>,   // offset = 8
}

here foo.b's offset can only be 8, which means the offset of foo.b.data would be 24, making access to foo.b.data misaligned.

We could fix it by making that “variable padding” depend on the run-time pointer value of self in addition to the alignment derived from meta. The drawback is that ptr::copy will require two separate calls to memcpy.
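The arithmetic behind that fix can be sketched as follows. The helper names (`data_addr`, `padding`) and the example addresses are assumptions for illustration; the 8-byte header and 16-byte data alignment come from the Thin<u64x2> example above.

```rust
/// Round `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Address of `data` when the padding depends on where the thin object
/// itself lives, not only on the alignment recorded in the metadata.
fn data_addr(thin_addr: usize, meta_size: usize, data_align: usize) -> usize {
    round_up(thin_addr + meta_size, data_align)
}

/// Padding between the end of the header and the start of `data`.
fn padding(thin_addr: usize, meta_size: usize, data_align: usize) -> usize {
    data_addr(thin_addr, meta_size, data_align) - (thin_addr + meta_size)
}
```

For a thin object at the 16-aligned address 0x1000 the padding is 8 bytes, but at 0x1008 (the `foo.b` case, with `foo` at 0x1000) it is 0. Because the gap between header and data differs between source and destination, a copy has to move the two parts separately.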

mikeyhew · February 2, 2018, 7:42pm

@kennytm Your use of a non-trait-object type with Thin reminded me of something that I’ve been meaning to point out: the definition of Thin that I wrote earlier needs another type parameter in order to support unsizing:

struct Thin<T: SizeFromMeta, U: Unsize<T> + SizeFromMeta = T> {
    meta: <T as Referent>::Meta,
    data: U
}

mikeyhew · February 2, 2018, 9:54pm

Are you sure? I think this may be a counter-example:

let foo: RefCell<Option<&i32>> = RefCell::new(Some(&34));

// bar has pointer metadata for an `Option<Any>`, including discriminant
// EDIT: updated to replace `&Any` with `Any`
let bar: &RefCell<Option<Any>> = &foo as &RefCell<Option<Any>>;

// allowed because `bar` is only a shared borrow of the `RefCell`,
// and not the contained `Option`
foo.replace(None);
println!("{}", bar.borrow().is_some());
// prints "true" because the pointer metadata still says that
// `Some` is the active variant
// It's easy to cause UB here

eddyb · February 2, 2018, 10:09pm

Not the discriminant, but mem::discriminant::<Enum<T>> the function (as a fn pointer, to be exact).

As in, how to obtain the discriminant. You don’t need to know how to change it, since you can’t do that after unsizing through the unsized type - as you show, by using foo instead of bar (I think you don’t need RefCell, btw, Cell should work fine).

Also, did you mean to write Any instead of &Any?

mikeyhew · February 2, 2018, 10:12pm

Oh, I see. And yes, Any should be there in place of &Any, good point. EDIT: fixed the error in the original comment

arielb1 · February 3, 2018, 1:41pm

Won't you also need to be able to get the field offsets? e.g. for Option<T>, the T might be in both offsets 0 and 8.
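A small sketch of why the payload offset depends on T, using only sized types that exist today. The helper `payload_offset` is made up for illustration. `Option<&i32>` is guaranteed to have the same size as `&i32`, so its payload must sit at offset 0; `Option<u64>` has no spare bit pattern to use, so it needs a separate tag and is larger than `u64`. Where exactly the `u64` payload lands is not guaranteed by the language.

```rust
use std::mem::size_of;

/// Byte offset of the payload inside a `Some`, measured from the start of
/// the `Option` itself. Returns `None` for a `None` value.
fn payload_offset<T>(opt: &Option<T>) -> Option<usize> {
    opt.as_ref().map(|inner| {
        inner as *const T as usize - opt as *const Option<T> as usize
    })
}

/// True if `Option<T>` needs extra space beyond `T` (for example a tag).
fn needs_extra_space<T>() -> bool {
    size_of::<Option<T>>() > size_of::<T>()
}
```

Calling `payload_offset(&Some(7u64))` on a current 64-bit rustc typically reports a non-zero offset, while `payload_offset(&Some(&x))` reports 0, which is the "both offsets" situation described above.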

eddyb · February 3, 2018, 11:02pm

Okay, yeah, you probably need something closer to a vtable for “virtual fields”, I was oversimplifying when I said “a function pointer”.

Diggsey · February 4, 2018, 4:58pm

Can’t you just disable layout optimisations for enums containing potentially dynamically sized variants?

eddyb · February 4, 2018, 5:17pm

I mentioned this as the original trade-off. It would mean we can’t ever allow T: ?Sized for Option<T> (because we guarantee layout optimizations for it to unsafe code etc.).

mikeyhew · February 8, 2018, 11:58pm

glaebhoerl:

Once upon a time we also had ideas about nifty things like “nested DSTs”, e.g. &[[T]] would contain two pieces of metadata in the fat pointer - the size of the outer and of the inner - resulting in a regular (rectangular) 2D array. Likewise, &[Trait] would be a homogeneous slice of an unknown type that implements Trait. And even just (this is no longer nested, only nifty) unsizing Vec<Type> into Vec<Trait>, with the Vec itself being a kind of smart pointer (just, to multiple values), and fattening up to store the additional vtable pointer. (Vec<Trait> itself may not be that useful, because you won’t be able to push any new items onto it, so maybe it’s not the best possible example; but the point is that anything which doesn’t store T unboxed, rather only through pointers, could/should/would be able to behave “like a smart pointer” from the perspective of DSTs.)

Can these kinds of things be accommodated by the plan (even if just to the extent of not ruling them out)?

Hopefully the Custom DST implementation that we come up with would allow experimenting with stuff like that in a library, without needing to add special support and syntax to the compiler. I can definitely see something like [Trait] being doable – I would call it ErasedSlice<T> where T: SizeFromMeta, with its metadata being a (usize, T::Meta) tuple. I'm not so sure how useful a nested slice [[T]] would be, since it would have to be contiguous, but it should be doable in a third-party library.
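As a rough approximation of the [Trait] idea in today's Rust, here is a sketch of a homogeneous erased slice fixed to the `Debug` trait. It is not the proposed ErasedSlice<T>: the type `ErasedDebugSlice` and the hand-rolled "vtable" fields are assumptions for illustration, standing in for the (usize, T::Meta) metadata pair.

```rust
use std::fmt::Debug;
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical stand-in for `&[dyn Debug]`: every element has the same
/// erased type, so one set of per-type metadata serves the whole slice.
struct ErasedDebugSlice<'a> {
    data: *const u8,
    len: usize,                               // slice half of the metadata
    elem_size: usize,                         // per-type half of the metadata
    fmt_elem: unsafe fn(*const u8) -> String, // a single "vtable" entry
    _borrow: PhantomData<&'a ()>,
}

/// Formats the element at `p`. Caller must ensure `p` points to a valid `T`.
unsafe fn fmt_as<T: Debug>(p: *const u8) -> String {
    let elem: &T = unsafe { &*(p as *const T) };
    format!("{:?}", elem)
}

impl<'a> ErasedDebugSlice<'a> {
    /// Erase the element type of `s`, keeping only what is needed to index
    /// into it and format an element.
    fn new<T: Debug>(s: &'a [T]) -> Self {
        ErasedDebugSlice {
            data: s.as_ptr() as *const u8,
            len: s.len(),
            elem_size: size_of::<T>(),
            fmt_elem: fmt_as::<T>,
            _borrow: PhantomData,
        }
    }

    /// Debug-format element `i`, or `None` if `i` is out of bounds.
    fn get(&self, i: usize) -> Option<String> {
        if i >= self.len {
            return None;
        }
        // SAFETY: `i < len`, so the offset stays inside the original slice,
        // and `fmt_elem` was instantiated for the slice's element type.
        unsafe { Some((self.fmt_elem)(self.data.add(i * self.elem_size))) }
    }
}
```

Indexing needs the element size at run time, which is why the per-type metadata has to carry it alongside the length.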

parched · February 9, 2018, 11:22am

Having now skimmed the unsized rvalues RFC, it seems these will be very close to DynSized. The difference is the size is a runtime constant (it is unknown at compile time but is constant at runtime) so the following rules for DynSized can be relaxed.

These types weren't in upstream LLVM yet the last time I checked, so maybe nothing needs to be done now. Nevertheless, it should be kept in mind that another trait between Sized and DynSized, say RuntimeSized, will probably need to be added at some stage.

mikeyhew · February 9, 2018, 9:24pm

A couple of questions:

What do you mean by “runtime constant”? Would the Pixels example from above be an instance of this or not, and why?
What would the difference between DynSized and RuntimeSized be?

parched · February 9, 2018, 10:00pm

Basically it means std::mem::size_of exists for the type but it isn’t const.

mikeyhew · February 9, 2018, 10:04pm

nikomatsakis:

it feels very “significant” to me that T: Referent / T: ?DynSized basically corresponds to a non-owned T

Do you have any further thoughts on this? Just as a contrary data point, to me it feels like an accident - though I don’t have any specific, concrete thoughts about it (either).

Well, I don’t think it’s an accident, but the idea may indeed be a sort of dead-end. On the one hand, the main reasons I can think of to know the size of something all correspond to ownership:

You need to free it (i.e., you own it), and hence want to call dealloc

You want to copy it to your stack frame (i.e., you want to take ownership of it) and need to know how much stack space to allocate

However, there is one other place where we really need to know the size: if you have a type like &[A], you are borrowing those A values, but you still need to know their size to compute their offset. Similar things could apply for other types – the main reason this isn’t true for structs is that we restrict DST types to being in the final position layout-wise, so we only need to know their alignment to compute their offset.
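The difference between the two cases can be checked in today's Rust. In this sketch the struct `WithTail` and the helper functions are made up for illustration, and `#[repr(C)]` is an added assumption so that the field order is deterministic. The tail offset is computed from the tail's run-time alignment only, while a slice element's offset needs the element size.

```rust
use std::mem::{align_of_val, size_of};

/// A struct whose last field may be dynamically sized.
#[repr(C)]
struct WithTail<T: ?Sized> {
    head: u8,
    tail: T,
}

/// Round `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Offset of `tail`, computed from its run-time alignment alone.
fn tail_offset<T: ?Sized>(s: &WithTail<T>) -> usize {
    round_up(size_of::<u8>(), align_of_val(&s.tail))
}

/// Offset of `tail`, measured from the actual addresses.
fn measured_tail_offset<T: ?Sized>(s: &WithTail<T>) -> usize {
    (&s.tail as *const T as *const u8 as usize)
        - (s as *const WithTail<T> as *const u8 as usize)
}

/// Offset of element `i` in a slice of `A`: this needs the size of `A`.
fn elem_offset<A>(i: usize) -> usize {
    i * size_of::<A>()
}
```

The same `tail_offset` works for a `&WithTail<[u32]>` and a `&WithTail<dyn Debug>` without ever asking for the tail's size.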

So I know I said that I wanted to talk about this in the other issue, but the ownership stuff is at least somewhat on-topic, and it's a lot easier to reply here, so that's what I'm gonna do.

My opinion is that any correspondence between ownership (whether you can pass something by value or just by reference) and sizedness (whether something's size is known at compile time, run time or not at all) is an accident of the way things are in Rust today. OK, that's not entirely true – you will never be able to pass something by value without knowing its size. However, the fact that you can't pass ownership of something to a function without knowing its size is, in my opinion, an unnecessary restriction, and one of my DST-related goals is to have this restriction lifted.

Currently, there are two main ways of passing ownership of a value to a function:

By value. This works for Sized + Move types, and with the unsized rvalues RFC would work for DynSized + Move types as well, by using a &move-reference under the hood.
By boxed value. This works for any DynSized type, but would not work for every Referent type because there is no way to deallocate the boxed value without knowing its size.
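For reference, the two existing ways look like this in current stable Rust. This is only a sketch with made-up function names; `Move` and `DynSized` are proposed traits and do not appear here, so ordinary `Sized` types and a `dyn Display` trait object stand in for them.

```rust
use std::fmt::Display;

/// By value: the callee must know the size at compile time, so `T` is
/// implicitly `Sized`.
fn consume_by_value<T: Display>(value: T) -> String {
    value.to_string()
}

/// By boxed value: works for the unsized `dyn Display`. The box can free
/// the allocation because the vtable records the erased type's size.
fn consume_boxed(value: Box<dyn Display>) -> String {
    value.to_string()
}
```

Both functions take ownership: the argument is dropped when the function returns, and in the boxed case the heap allocation is released as well.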

There is a potential third way of passing ownership that works for any Referent type: explicit &move-references. With &move T, we can pass ownership of a value, without needing to allocate it on the heap, and without needing to know its size. Whereas the unsized rvalue sugar would presumably only support types that are Move + DynSized, &move T works for any T.
