Storing the size in the box?

Soni · July 11, 2021, 8:37pm

We use Rc<Box<dyn T>> and Box<Box<dyn T>> a lot, because the outer box has a size suitable for FFI but the inner one does not. It would be nice if either Rust had a highly-optimized allocator for (ptr,size) pairs (to reduce the cost) or if one could somehow shove the size in the box itself (perhaps with a wrapper type? maybe as a lang item?).
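A minimal sketch of the double-boxing pattern described above, assuming a hypothetical trait `Greet` in place of `dyn T`. The inner `Box<dyn Greet>` is a two-word fat pointer (data + vtable), while the outer box is a single word and can travel as a `*mut c_void`. The helper names `into_thin`/`from_thin` are made up for illustration.

```rust
use std::ffi::c_void;
use std::mem::size_of;

// Hypothetical trait standing in for the `dyn T` in the post.
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

/// Turn a fat `Box<dyn Greet>` into one thin pointer by boxing it again.
fn into_thin(inner: Box<dyn Greet>) -> *mut c_void {
    // The outer allocation holds the (data, vtable) pair, so the pointer to it
    // is only one word wide.
    Box::into_raw(Box::new(inner)) as *mut c_void
}

/// Safety: `p` must come from `into_thin` and must not be used afterwards.
unsafe fn from_thin(p: *mut c_void) -> Box<dyn Greet> {
    // Reclaim the outer box and move the inner fat box out of it.
    *Box::from_raw(p as *mut Box<dyn Greet>)
}

fn main() {
    // A trait-object pointer carries two words: data pointer and vtable pointer.
    assert_eq!(size_of::<Box<dyn Greet>>(), 2 * size_of::<usize>());
    // Boxing it again yields a thin pointer that fits in a `void*`.
    assert_eq!(size_of::<Box<Box<dyn Greet>>>(), size_of::<*mut c_void>());

    let p = into_thin(Box::new(English));
    // ... `p` could cross an FFI boundary here as a plain `void*` ...
    let back = unsafe { from_thin(p) };
    println!("{}", back.greet());
}
```

The price is a second heap allocation and an extra pointer chase per object, which is the cost the post would like to avoid.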

CAD97 · July 11, 2021, 10:18pm

With custom allocator generics, you can provide this while still using the standard types. You can already do this with custom owning types.

For the specific case of array DSTs, this is already possible using custom types and ptr::slice_from_raw_parts.
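For slices, such a custom type might look roughly like the sketch below. It assumes `T: Copy` to keep drop handling trivial, and `ThinSlice` is a hypothetical name, not a type from std or `erasable`. The length lives in a header at the start of the allocation, and `ptr::slice_from_raw_parts` rebuilds the fat `*const [T]` from the thin pointer plus that stored length.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::{self, NonNull};

/// A one-word owning pointer to `[T]` that keeps the length in the allocation.
pub struct ThinSlice<T: Copy> {
    base: NonNull<u8>, // points at the `usize` length header
    _owns: PhantomData<T>,
}

impl<T: Copy> ThinSlice<T> {
    /// Layout of `[len: usize][elements...]` and the byte offset of the elements.
    fn layout(len: usize) -> (Layout, usize) {
        let (layout, offset) = Layout::new::<usize>()
            .extend(Layout::array::<T>(len).expect("slice too large"))
            .expect("slice too large");
        (layout.pad_to_align(), offset)
    }

    pub fn new(items: &[T]) -> Self {
        let (layout, offset) = Self::layout(items.len());
        unsafe {
            // Never zero-sized: the layout always contains the header.
            let base = alloc(layout);
            if base.is_null() {
                handle_alloc_error(layout);
            }
            ptr::write(base as *mut usize, items.len());
            ptr::copy_nonoverlapping(items.as_ptr(), base.add(offset) as *mut T, items.len());
            ThinSlice { base: NonNull::new_unchecked(base), _owns: PhantomData }
        }
    }

    fn len(&self) -> usize {
        unsafe { ptr::read(self.base.as_ptr() as *const usize) }
    }

    pub fn as_slice(&self) -> &[T] {
        let len = self.len();
        let (_, offset) = Self::layout(len);
        unsafe {
            let data = self.base.as_ptr().add(offset) as *const T;
            // Rebuild the fat pointer from the thin one plus the stored length.
            &*ptr::slice_from_raw_parts(data, len)
        }
    }
}

impl<T: Copy> Drop for ThinSlice<T> {
    fn drop(&mut self) {
        // `T: Copy` means the elements have no destructor; just free the block
        // with the same layout that was used to allocate it.
        let (layout, _) = Self::layout(self.len());
        unsafe { dealloc(self.base.as_ptr(), layout) }
    }
}

fn main() {
    let thin = ThinSlice::new(&[1u32, 2, 3]);
    assert_eq!(size_of::<ThinSlice<u32>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<[u32]>>(), 2 * size_of::<usize>());
    println!("{:?}", thin.as_slice()); // [1, 2, 3]
}
```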

Once ptr::metadata stabilizes, it will be possible to do the same for arbitrary DST types (and I intend to add library support for such into erasable).

I still need to do a bit of thinking as to how to acquire a Thin<P<InlinePtrMetadata<dyn Tr>>> (name obviously subject to bikeshedding), but the scheme otherwise works in library code without new language functionality (beyond #![feature(ptr_metadata)]).

CAD97 · July 12, 2021, 4:03am

It appears I may have backed myself into a corner, actually:

#![feature(ptr_metadata)]

use {
    erasable::Erasable,
    std::{any::Any, marker::PhantomData, ptr},
};

#[repr(C)]
pub struct Indyn<Dyn: ?Sized, T: ?Sized = Dyn> {
    phantom: PhantomData<Dyn>,
    metadata: <Dyn as ptr::Pointee>::Metadata,
    inner: T,
}

unsafe impl<Dyn: ?Sized> Erasable for Indyn<Dyn> {
    unsafe fn unerase(this: erasable::ErasedPtr) -> ptr::NonNull<Self> {
        let metadata = ptr::read::<<Dyn as ptr::Pointee>::Metadata>(this.as_ptr() as *mut _);
        let this: *mut Dyn = ptr::from_raw_parts_mut(this.as_ptr() as *mut _, metadata);
        ptr::NonNull::new_unchecked(this as *mut Indyn<Dyn>)
    }

    const ACK_1_1_0: bool = true;
}

This works, and miri is happy to accept it.

Example

macro_rules! indyn {
    ($t:expr; as $d:ty) => {{
        let t = $t;
        let p: &$d = &t;
        Indyn {
            phantom: PhantomData,
            metadata: ptr::metadata(p),
            inner: t,
        }
    }};
}

fn main() {
    let b: Box<Indyn<dyn Any>> = Box::new(indyn!(0usize; as dyn Any));
    println!("type_name: {}", std::any::type_name_of_val(&b));
    println!("size_of  : {}", std::mem::size_of_val(&b));

    let thin = erasable::erase(ptr::NonNull::new(Box::into_raw(b)).unwrap());
    println!("type_name: {}", std::any::type_name_of_val(&thin));
    println!("size_of  : {}", std::mem::size_of_val(&thin));

    let b: Box<Indyn<dyn Any>> = unsafe { Box::from_raw(Indyn::unerase(thin).as_ptr()) };
    println!("type_name: {}", std::any::type_name_of_val(&b));
    println!("size_of  : {}", std::mem::size_of_val(&b));

    dbg!(b.downcast_ref::<usize>());
}

type_name: alloc::boxed::Box<indyn::Indyn<dyn core::any::Any>>
size_of  : 16
type_name: core::ptr::non_null::NonNull<erasable::priv_in_pub::Erased>
size_of  : 8
type_name: alloc::boxed::Box<indyn::Indyn<dyn core::any::Any>>
size_of  : 16
[src\main.rs:63] b.downcast_ref::<usize>() = Some(
    0,
)

Unfortunately...

error[E0119]: conflicting implementations of trait `erasable::Erasable` for type `Indyn<_>`
  --> src\main.rs:17:1
   |
17 | unsafe impl<Dyn: ?Sized> Erasable for Indyn<Dyn> {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: conflicting implementation in crate `erasable`:
           - impl<T> Erasable for T;

The blanket impl of Erasable for any sized T conflicts with the specific impl for Indyn. I still stand by the blanket impl, so I guess arbitrary DST metadata support for erasable::Thin will need to wait for both ptr_metadata and min_specialization.

(I'm sorry-not-sorry; Indyn is a pun on "inline")
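The overlap can be reproduced without `erasable`. In this sketch, `Describe` and `Wrapper` are hypothetical stand-ins for `Erasable` and `Indyn`. A blanket `impl<T>` carries an implicit `T: Sized` bound, so an impl for a type that can never be sized (like `str`) is accepted next to it, while an impl for a type generic over a `?Sized` parameter overlaps as soon as that parameter is instantiated with a sized type.

```rust
// Hypothetical trait standing in for `erasable::Erasable`.
trait Describe {
    fn describe(&self) -> &'static str;
}

// Blanket impl: the implicit `T: Sized` bound means it only covers sized types.
impl<T> Describe for T {
    fn describe(&self) -> &'static str {
        "sized"
    }
}

// Accepted: `str` can never be `Sized`, so this cannot overlap the blanket impl.
impl Describe for str {
    fn describe(&self) -> &'static str {
        "str"
    }
}

// Rejected with E0119 if uncommented: `Wrapper<u8>` is sized, so it would be
// covered by both this impl and the blanket impl.
// struct Wrapper<Dyn: ?Sized>(Dyn);
// impl<Dyn: ?Sized> Describe for Wrapper<Dyn> {
//     fn describe(&self) -> &'static str { "wrapper" }
// }

fn main() {
    println!("{}", 5u8.describe());
    println!("{}", <str as Describe>::describe("hi"));
}
```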

Soni · July 12, 2021, 1:36pm

Hmm. This feels more suitable as a lang item, just saying. Good attempt tho.

(Specifically, isn't the goal that Box<Indyn<dyn Foo>> would have a size of 8?)

CAD97 · July 12, 2021, 7:28pm

That's not possible in general until custom DSTs. erasable::Thin is a wrapper around any erasable pointer that stores it in its erased (thin) form without losing type safety. I would've used it here if not for the coherence issue.

I think this is worth digging into: why?

Box is highly special, down to being a unique kind of type in the compiler (or at least it was at one point; I don't know if that's been unified), and Box's specialness is usually seen as a historical accident (but a useful one) that the lang/compiler teams would like to reduce in the future.

Cell/RefCell/Mutex etc. are not language items, they're all library features built on top of one language feature, UnsafeCell.
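As a rough illustration of that layering, here is a stripped-down `Cell` built only on `UnsafeCell`. `MyCell` is a hypothetical name; the real `std::cell::Cell` offers more (e.g. `replace`, `take`) and also accepts non-`Copy` values in `set`.

```rust
use std::cell::UnsafeCell;

/// A minimal `Cell`-like type implemented purely in library code.
pub struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    pub const fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    pub fn get(&self) -> T {
        // Sound because `UnsafeCell` makes `MyCell` !Sync and we never hand out
        // references into the interior, so no reference can be invalidated.
        unsafe { *self.value.get() }
    }

    pub fn set(&self, value: T) {
        // Mutation through `&self` is only permitted because it goes through
        // the one language-level primitive, `UnsafeCell`.
        unsafe { *self.value.get() = value }
    }
}

fn main() {
    let c = MyCell::new(1);
    let (r1, r2) = (&c, &c); // two shared references to the same cell
    r1.set(r2.get() + 41);
    assert_eq!(c.get(), 42);
}
```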

What makes Indyn special that it needs to be implemented as a compiler item rather than a regular library type?

It's unfortunate that as of current rustc (plus #![feature(ptr_metadata)]) Indyn can't always be a thin DST, but that's solved in the future by custom DSTs. Any language feature for Indyn is going to look a lot like Indyn (though keep in mind, the proof-of-concept is just that, a proof that it works, not necessarily the best API); if the language feature can be implemented strictly in library code, why shouldn't it just be a library feature?

Making Indyn a language feature isn't going to magically make it stably work as a thin DST without stabilizing ptr_metadata and custom DSTs. In fact, I'd give you 90+% odds that the way the lang team would implement Indyn would be as a library feature using ptr_metadata and custom DSTs.

Plus, Thin<P<Indyn<dyn Trait>>> works on today's nightly. (I'll be adding nightly-only feature gated support to erasable and indyn this coming weekend.) Once custom DSTs are available, P<Indyn<dyn Trait>> will (hopefully) also be thin.

I fail to see the value-add of rejecting the library implementation and waiting even longer for a potential language implementation.

Soni · July 12, 2021, 8:18pm

Our thought process was that making it a lang item would make it work sooner. At the very least it could be implemented as a custom DST (under the hood) before custom DSTs get a defined syntax and semantics, thus also helping shape those syntax and semantics. (Indeed, just like Box. See below.)

Additionally, the "problem" with Box is simply one of ?Uninit types, as has been discussed before. It would stop being a lang item if we had ?Uninit types. It doesn't look like we'll have those anytime soon tho, and this thread isn't about that issue.

CAD97 · July 13, 2021, 2:22am

Syntax, sure. But semantics, not really. As a new kind of DST, thin DSTs would require deciding on the semantics of new DST kinds throughout the language and compiler.

Sure, you could sidestep a little bit of complexity because it doesn't introduce a new pointer metadata type, but not enough to make the problem significantly easier, imo.

Also, a language thin DST Indyn would want to always be Indyn<T>, not Indyn<Interface, ActualT> like I've written. That would mean that it's always unsized, which would mean requiring unsized_locals to be usable. (My Indyn abuses the second parameter to be conditionally unsized to get behind an indirection at which point it can be unsized.) unsized_locals is hard blocked on custom DSTs being fully designed and workable, if not stable, such that custom DSTs can also be held as locals.
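The two-parameter trick leans on ordinary struct unsizing, which already works on stable Rust: a struct whose last field is a `?Sized` type parameter can be built as a sized value and then coerced to a trait-object form behind a pointer. A minimal sketch with a hypothetical `Header` struct (it stores no inline metadata; it only shows the coercion mechanism):

```rust
use std::fmt::Display;
use std::mem::size_of;

// Only the last field mentions `T`, so `Header<i32>` can unsize to
// `Header<dyn Display>` behind a pointer.
struct Header<T: ?Sized> {
    tag: u32,
    value: T,
}

fn main() {
    // Created as a sized value...
    let sized: Box<Header<i32>> = Box::new(Header { tag: 7, value: 5 });
    // ...then unsized behind the indirection via a coercion.
    let unsized_box: Box<Header<dyn Display>> = sized;

    // The vtable now lives in the (fat) Box pointer, not in the allocation.
    assert_eq!(size_of::<Box<Header<i32>>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<Header<dyn Display>>>(), 2 * size_of::<usize>());
    println!("{} {}", unsized_box.tag, &unsized_box.value);
}
```

A single-parameter, always-unsized `Indyn<T>` has no sized starting point to coerce from, which is why creating one directly would need unsized locals.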

"Make it a lang item" isn't a magic bullet to push features through to stabilization faster. For one, the first question is "why can't this be a library item?" Plus, language extensions are under a much higher burden of proof for addition, for good reason.

Box digression

The ability to talk about (partially) uninitialized types in the type system isn't enough to demagic Box; you also need typestate. You need the type of existing bindings to change based on the initialization state of the value.

This is much more complicated than "just" supporting (partially) maybe uninitialized types.

Soni · July 13, 2021, 4:53am

Think of it this way: trying to make Indyn<T> work would lead to defining the semantics of custom DSTs, which would then lead to defining the syntax. That doesn't necessarily mean stabilizing it sooner, but it does make it easier to reason about with an actual implementation.

As for the Box digression, we consider those inseparable. We've already argued about it.

Sometimes you just need to let the implementation shape the syntax/features you wanna create. Box is special in that you can move in and out of it, despite it being a Drop type. So one should use Box to shape ?Uninit types and the stuff around it. Make an Indyn<T> and let it shape custom (thin) DSTs.
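For reference, the Box-specific move-out under discussion looks like this; the same `*ptr` move from a user-defined smart pointer implementing `Deref` would be rejected with a "cannot move out of" error. This only illustrates current behavior, not any proposed `?Uninit` design.

```rust
fn main() {
    let b: Box<String> = Box::new(String::from("moved out"));
    // Moves the String out of the heap allocation. The allocation is still
    // freed when `b` goes out of scope, without dropping the String twice.
    let s: String = *b;
    // Using `b` here would be a "use of moved value" error: the compiler tracks
    // that `*b` is uninitialized even though `b`'s type is still `Box<String>`.
    println!("{}", s);
}
```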

Soni · July 17, 2021, 12:51am

Are we making any sense here? Are these good, valid points? Any feedback? .-.

CAD97 · July 17, 2021, 2:33am

My stance remains the same. An Indyn that is always thin requires solving all of the barriers between custom DSTs and stabilization. There is next to no way a std Indyn is stabilized before custom DSTs. As I said previously, a std Indyn would want to always be a trait object, which requires unsized locals, which is another huge far-future feature to block on. In order to always be thin, it potentially even requires this; my implementation allows you to e.g. create Indyn<dyn Tr1, dyn Tr2> via unsizing, which can't be thin, since it's storing the incorrect metadata inline. And you can't unsize from Indyn<T> to Indyn<dyn Tr>, because the whole point is storing metadata inline, which necessarily changes if you unsize the type.

The syntax is not the hard part of a feature; the semantics are.

Soni · July 17, 2021, 3:45am

Alright. And wouldn't it make sense to design unsized locals, custom DSTs, etc around an Indyn rather than the other way around?

SkiFire13 · July 17, 2021, 8:31am

What do unsized locals have to do with this feature? AFAIK currently the main problems are with alignment and interactions with async/generators.

Soni · July 17, 2021, 1:09pm

Who knows. @CAD97 keeps bringing up unsized locals.

CAD97 · August 11, 2021, 9:36pm

The reason unsized locals come into it is actually interacting with an always-unsized always-thin Indyn type.

My library type works by being

struct Indyn<Dyn: ?Sized, Data: ?Sized = Dyn> {
    meta: <Dyn as Pointee>::Metadata,
    data: Data,
}

which means you can construct and box Indyn<dyn Trait, impl Trait> and then unsize that to Indyn<dyn Trait, dyn Trait>. While this works, it's a bit of a hack, and it gets in the way of making Indyn an always-thin type, because there's no restriction that Dyn and Data match.

Specifically, because we rely on unsizing coercions to get from sized to unsized, nothing prevents the creation of Indyn<dyn Trait1, dyn Trait2>, which obviously can't be thin (it's storing the metadata for dyn Trait1 but holds a dyn Trait2 value). The case where Data can't even unsize to Dyn can be handled (stably by macros, unstably by an Unsize bound), but unsizing to the "wrong" type is unavoidable without changing how Unsize is (automatically) implemented for the type. And even if you prevent construction of a badly typed Indyn, it's still a valid type (all you need is an unsized type that implements Unsize non-reflexively, which is not currently prohibited, and is actually desired for multi-trait objects). That means the compiler has to support generating code for a non-thin Indyn, even if it's believed impossible to create one, because you can write an unused function that takes a reference to one (the same way you can take statically uninhabited types as parameters today).

So that brings us to the type definition that we'd actually want for a lang item, that meaningfully justifies being special in the compiler:

struct Indyn<T: ?Sized> {
    meta: <T as Pointee>::Metadata,
    data: T,
}

The problem is that this type cannot unsize, because it already (and only) stores the correct pointer metadata. This allows it to always be thin, but it also requires that if you want to have an unsized value in it, it be created as an unsized value. Thus, unsized_locals (but the subset with Metadata=()).

You could say that Indyn is not usable as a local and always has to be heap-allocated and initialized manually, but then we're back to asking why it's a language feature if it's just as unwieldy to use as a library solution.

My main point is that we have designed and implemented enough language features to implement thin pointers to arbitrary DSTs in library code. The library solution also sidesteps the potential issues around alignment by reusing the existing unsizing.

If someone comes up with a design that can be stabilized separately, great! But I think the library solution is the best and easiest option, and it has the quickest path to stabilization.

system · November 9, 2021, 9:37pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
