I read in this article that using a trait in a generic when creating, say, a vector has complications relating to the compiler not necessarily knowing how big the trait object is. So I propose that there should be syntax to tell the compiler the maximum size of an object in the generic. The compiler can use this information for allocation, and it can also ensure that no types of greater size are used in that generic.
The vector type may specify one maximum size, and the user of a vector may specify a smaller maximum size knowing more about what they expect the vector to contain. Alternatively, the vector type may not specify a maximum size, forcing the user to specify one either with a concrete type as is already supported or with a generic type specifying a maximum size.
One issue with this idea is that types are usually allowed to grow in size without that being considered a breaking change. It is of course already possible to rely on the size_of of a type in a way that breaks when the size changes, but a language feature that behaves that way would put this whole convention into question. The alternative to a strict limit would be a soft limit of the form “this type will allocate if the trait object surpasses size XYZ”. This kind of abstraction has already been implemented in some third-party crates, e.g. smallbox, but some things that Box can offer (e.g. unsizing coercions) cannot yet be entirely replicated in third-party crates.
There are a few crates out there that do seem to have the strict limits today: inline_dyn, stack_dst. I haven’t evaluated their quality myself, but I do think it’s cool that this can be done in a library! (The underlying feature that makes this possible is the ptr_metadata APIs.)
I just started picking up Rust three days ago, lol. How does one change the size of a type? My understanding is most types are static and cannot change in size, and DSTs don't really change their size at runtime either because it's more like their sizes are simply unknown at compile time. Are you referring to developers increasing the sizes of their types over the course of development? In that case, could they not use the syntax I proposed to reference those types whose sizes may increase?
What I was talking about is some crate foo defining some struct Bar (with private fields x, y) in version 1.0.0, and the authors then publishing a new version 1.1.0 of foo in which Bar gains an additional field z while the rest of the API stays the same. This new version is supposed to be compatible, without breaking changes (Rust crates follow a convention called “semantic versioning”, under which a minor version like 1.1.0, compared to 1.0.0, is not supposed to contain any “breaking” changes). Users of foo that rely on the precise value of size_of::<Bar>() could however run into compilation errors from the new (private) field of Bar, since its size got larger in the update. In this case, the user that relied on size_of::<Bar>() is probably to blame, unless Bar came with documentation guaranteeing that its size would not change. Still, it’s nice when it’s not so easy to accidentally rely too much on the precise size of a type that comes with no such guarantee, so an approach with a less efficient yet viable fallback (to allocation), such as the linked SmallBox, is often preferable.
Would that use of size_of::<Bar>() fail to return a bigger size than before when the user recompiles with foo version 1.1.0? I am not sure I understand where that would be a breaking change.
Well… if Bar used to have the same size as a u64, and implements trait Qux, then you could convert it to a dyn Qux sizeof u64, but that same conversion would fail as soon as Bar’s size becomes larger.
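To make the breakage concrete, here is a minimal sketch in stable Rust. The names BarV1, BarV2 and fits are made up for illustration (they stand in for Bar as published in foo 1.0.0 and 1.1.0), and the compile-time assertion only approximates what a strict `sizeof u64` bound would check:

```rust
use std::mem::size_of;

// Hypothetical stand-in for `Bar` as published in foo 1.0.0.
#[allow(dead_code)]
pub struct BarV1 {
    x: u32,
    y: u32,
}

// Hypothetical stand-in for `Bar` in foo 1.1.0: one extra private field.
#[allow(dead_code)]
pub struct BarV2 {
    x: u32,
    y: u32,
    z: u32,
}

/// True if a `T` fits under a strict bound of `size_of::<Bound>()` bytes.
pub const fn fits<T, Bound>() -> bool {
    size_of::<T>() <= size_of::<Bound>()
}

// Downstream code that bakes the bound in at compile time.
// This holds for the 1.0.0 layout (8 bytes)...
const _: () = assert!(fits::<BarV1, u64>());
// ...but the same line written against the 1.1.0 layout would stop compiling:
// const _: () = assert!(fits::<BarV2, u64>());
```

So a change that is invisible in the public API (a new private field) turns into a compile error for the downstream user, which is the semver hazard described above.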
I see. I would advise the user to use dyn Qux sizeof Bar if they intended Bar to remain valid in that generic, and then their code would not be broken.
But then it couldn't be passed to the code that expects dyn Qux sizeof u64. If the receiving code knew it is going to get an instance of Bar, it wouldn't need trait objects in the first place.
Merely adding a maximum size to a dyn type would not be sufficient to put them in a Vec, because when you have a value of a dyn SomeTrait type, the vtable pointer for SomeTrait is not stored in the data, but in some pointer to it. That is, completely ignoring the size issue, it's still the case that Vec<Box<dyn SomeTrait>> has a place to put the vtable pointers and Vec<dyn SomeTrait> does not.
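For reference, this is roughly what the Vec<Box<dyn SomeTrait>> version looks like today, using Debug as the trait. The function names are invented for this sketch. The size comparison reflects how rustc currently lays out pointers to trait objects (data pointer plus vtable pointer); it is shown as an observation, not as a documented guarantee:

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// A vector holding values of different concrete types behind one trait.
pub fn mixed() -> Vec<Box<dyn Debug>> {
    vec![Box::new("hello"), Box::new(7_i32), Box::new(false)]
}

/// Formats every element through its vtable.
pub fn render(items: &[Box<dyn Debug>]) -> Vec<String> {
    items.iter().map(|item| format!("{item:?}")).collect()
}

/// Sizes of a thin box and of a trait-object box, in bytes.
pub fn pointer_widths() -> (usize, usize) {
    (size_of::<Box<i32>>(), size_of::<Box<dyn Debug>>())
}
```

The second number comes out twice as large as the first: the extra word in each Box<dyn Debug> is the vtable pointer, and that is the piece a bare Vec<dyn SomeTrait> would have nowhere to store.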
We could define the language so that these maximum-sized-dyns store their vtable pointers inline, but that would give this flavor of dyn a very different behavior that isn't just setting the size. In fact, there has been discussion of just such things, with the placeholder syntax dyn*, though with the maximum size set to “a pointer”, not an arbitrary choice.
But we don't need any additional language builtin types to get maximum-size behavior; it can be done in an explicit non-allocating container. For example, it could exist like this:
let v: Vec<InlineBox<dyn Debug, 16>> = vec![
    InlineBox::new("hello"),
    InlineBox::new(7_i32),
];
where InlineBox<dyn Debug, 16> is a type consisting of a vtable pointer and 16 bytes of storage (16 rather than 8 because "hello" is a &str, which is itself a 16-byte pointer-plus-length on 64-bit targets). (You can declare a constant using size_of to get the proposed sizeof Bar result.)
This looks to be more or less what the libraries @jrose linked are trying to offer, though inline_dyn has unclear documentation so I'm not sure. Presumably it will be possible to make things tidier when ptr_metadata is stable.
The code wouldn't expect Bar specifically, that would just be the size bound. It would expect anything the size of Bar or less.
I suppose I should also mention that I was imagining syntax like dyn Qux sizeof Bar + Foo to indicate that the greater of the two sizes should be taken as the size bound.
What I'm ultimately after here is the use of polymorphism in generics, such that, for instance, a vector of Debug trait objects could contain different implementations of the Debug trait in the same vector. Would the solution you outline facilitate something like that?
A macro could enable declaring this with more concise syntax.
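As a rough sketch of what that could look like in stable Rust: the bound can be computed as a constant, and a small macro can take the largest size among several types, which approximates the proposed `sizeof Bar + Foo`. Everything named here (max_usize, max_size_of!, Bar, Foo, Slot, QuxSlot) is hypothetical, and Slot is only a placeholder for a real inline container:

```rust
/// `const` helper: the larger of two sizes.
pub const fn max_usize(a: usize, b: usize) -> usize {
    if a > b { a } else { b }
}

/// Hypothetical helper macro: the largest `size_of` among the listed types.
macro_rules! max_size_of {
    ($t:ty) => { ::std::mem::size_of::<$t>() };
    ($t:ty, $($rest:ty),+) => {
        max_usize(::std::mem::size_of::<$t>(), max_size_of!($($rest),+))
    };
}

// Illustrative types only; the field layouts are assumptions for the example.
#[allow(dead_code)]
pub struct Bar {
    x: u32,
    y: u32,
}

#[allow(dead_code)]
pub struct Foo {
    a: u64,
    b: u64,
}

/// Stand-in for `sizeof Bar + Foo`: the greater of the two sizes.
pub const BAR_OR_FOO: usize = max_size_of!(Bar, Foo);

/// Placeholder for an inline container that takes the bound as a const generic.
pub struct Slot<const MAX: usize>(pub [u8; MAX]);

/// With a real container this would read `InlineBox<dyn Qux, BAR_OR_FOO>`.
pub type QuxSlot = Slot<BAR_OR_FOO>;
```

Because the bound is an ordinary constant, it tracks the types automatically when they grow, at the cost of every user of that constant being resized along with them.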
Here's a proof of concept showing that, once ptr_metadata is stable, it will be possible to write such inline boxes as a straightforward, fully generic abstraction. This is a quick sketch; don't use it as production code.
#![feature(ptr_metadata)]
#![feature(unsize)]

use std::{mem, ptr};

#[repr(C, align(8))] // ensure a useful alignment
pub struct InlineBox<T: ?Sized, const MAX: usize> {
    storage: mem::MaybeUninit<[u8; MAX]>,
    metadata: <T as ptr::Pointee>::Metadata,
}

impl<T: ?Sized, const MAX: usize> InlineBox<T, MAX> {
    pub fn new<S>(value: S) -> Self
    where
        S: std::marker::Unsize<T>,
    {
        let size = mem::size_of::<S>();
        assert!(size <= MAX, "type size {size} must be <= {MAX}");
        assert!(mem::align_of::<S>() <= mem::align_of::<Self>());

        let mut new_self = Self {
            storage: mem::MaybeUninit::uninit(),
            metadata: ptr::metadata(&value as &T),
        };
        let p: *mut S = new_self.storage.as_mut_ptr().cast::<S>();
        // SAFETY: `p` is within the allocated memory `new_self.storage`,
        // and previous assertions checked that it is large enough and aligned enough.
        unsafe {
            ptr::write(p, value);
        }
        new_self
    }
}

impl<T: ?Sized, const MAX: usize> std::ops::Deref for InlineBox<T, MAX> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        let p = ptr::from_raw_parts(self.storage.as_ptr().cast::<()>(), self.metadata);
        // SAFETY: The invariants of this type are that this pointer will be valid.
        unsafe { &*p }
    }
}

impl<T: ?Sized, const MAX: usize> Drop for InlineBox<T, MAX> {
    fn drop(&mut self) {
        let p: *mut T =
            ptr::from_raw_parts_mut(self.storage.as_mut_ptr().cast::<()>(), self.metadata);
        // SAFETY: The invariants of this type are that this pointer will be valid.
        // Its referent will not have been dropped yet, because that is what we are doing now.
        unsafe {
            ptr::drop_in_place(p);
        }
    }
}

#[test]
fn example() {
    let v: Vec<InlineBox<dyn std::fmt::Debug, 8>> =
        vec![InlineBox::new(false), InlineBox::new(100_i32)];
    for item in v {
        println!("{:?}", &*item);
    }
}
(Those familiar with how dyn normally works may notice that the unsizing is incorporated into new() rather than being a separate operation, which is necessary because unsizing coercion is hardcoded to look for a pointer, but there's no pointer here.)
Another possibility is that if we get something like the currently-in-discussion Store trait system then Box itself could be parameterized so as to be inline, rather than needing a different type for it.
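For anyone who wants to experiment before ptr_metadata is stable, the same idea can be sketched on stable Rust if you give up genericity over the trait. The sketch below is my own illustration, not what inline_dyn or stack_dst do: it hardcodes Debug, and instead of storing pointer metadata it stores two monomorphized function pointers that play the role of the vtable. The name InlineDebug and its methods are hypothetical, and like the sketch above this is not production code.

```rust
use std::fmt::Debug;
use std::marker::PhantomData;
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Inline storage for any `Debug + 'static` value of at most `MAX` bytes.
#[repr(C, align(8))]
pub struct InlineDebug<const MAX: usize> {
    storage: MaybeUninit<[u8; MAX]>,
    // Stand-ins for the vtable: functions that remember the concrete type.
    as_debug: fn(*const u8) -> *const dyn Debug,
    drop_value: unsafe fn(*mut u8),
    // The stored value may be neither Send nor Sync, so opt out of both.
    _not_send_sync: PhantomData<*const ()>,
}

fn as_debug_impl<S: Debug + 'static>(p: *const u8) -> *const dyn Debug {
    // Reattach the concrete type, then unsize the raw pointer.
    p.cast::<S>() as *const dyn Debug
}

unsafe fn drop_value_impl<S>(p: *mut u8) {
    // SAFETY: the caller guarantees `p` points to a live, aligned `S`.
    unsafe { ptr::drop_in_place(p.cast::<S>()) }
}

impl<const MAX: usize> InlineDebug<MAX> {
    pub fn new<S: Debug + 'static>(value: S) -> Self {
        assert!(mem::size_of::<S>() <= MAX, "value does not fit in MAX bytes");
        assert!(mem::align_of::<S>() <= mem::align_of::<Self>());
        let mut this = Self {
            storage: MaybeUninit::uninit(),
            as_debug: as_debug_impl::<S>,
            drop_value: drop_value_impl::<S>,
            _not_send_sync: PhantomData,
        };
        // SAFETY: the assertions above checked size and alignment, and
        // `storage` sits at offset 0 of a `repr(C, align(8))` struct.
        unsafe { ptr::write(this.storage.as_mut_ptr().cast::<S>(), value) };
        this
    }

    /// Borrows the stored value as a trait object.
    pub fn get(&self) -> &dyn Debug {
        // SAFETY: `storage` holds a live value of the type `as_debug` expects.
        unsafe { &*(self.as_debug)(self.storage.as_ptr().cast::<u8>()) }
    }
}

impl<const MAX: usize> Drop for InlineDebug<MAX> {
    fn drop(&mut self) {
        // SAFETY: the value is live and is dropped exactly once, here.
        unsafe { (self.drop_value)(self.storage.as_mut_ptr().cast::<u8>()) }
    }
}

/// Usage: different concrete types side by side, with no heap allocation per element.
pub fn render_all(items: &[InlineDebug<16>]) -> Vec<String> {
    items.iter().map(|item| format!("{:?}", item.get())).collect()
}
```

The trade-off is that each trait needs its own wrapper (or a macro to generate one), which is exactly the boilerplate that ptr_metadata, or a Store-parameterized Box, would remove.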