Idea: Describing dynamicly sized types using const generics

I’ve noticed that a large chunk of the use cases for dynamicly sized types boil down to forgetting some information about a more specialized generic type. An example from the standard library is slices and arrays. Here an array is a generic type with a type parameter and a const parameter:

type Array<T, const N: usize> = [T; N];

Slices represent what we get if we make the size parameter dynamic and stored in the pointer rather than the type. The idea is to create a language feature to express this pattern of type construction.

Currently I think the most appropriate syntax would be overloading the dyn operator so that:

type [T] = dyn<const N: usize> [T; N];

Here dyn moves the information required to recover the specific type over to the pointer. This is consistent with the current use of dyn for trait objects. In trait objects the dyn signifies moving the type implementing the trait from the type to the pointer in the form of a vtable. In my proposed syntax we get the equivalence:

dyn Trait = dyn<T: Trait> T

This syntax also create new type constructs, such as:

Box<dyn<T: Debug> (String, T)> //A box containing both a string and a trait object.

This generalization of the dyn operator would allow libraries to only implement list extensions types for arrays and vecs, and to only implement things for the resulting slice type. In a linear algebra library, vectors and matrices could be defined as:

struct Vector<T, const N: usize>([T; N]);
struct Matrix<T, const N: usize, const M: usize>([[T; N]; M]);

Using these definitions, dyn<const N: usize> Vector<T, N> would be a dynamicly sized vector, and dyn<const N: usize, const M: usize> Matrix<T, N, M> would be a dynamicly sized matrix.

These new types could help rust’s story of !Sized types via treating them as more specific types that we simply forget some information about. Thus enabling safe stack construction of !Sized types.

11 Likes

We can already construct !Sized things on the stack.

let x = [1, 2, 3];
let y : &[u32] = &x;

Now, you can’t have a !Sized value on the stack without indirection, but that won’t change with your proposal. There is already a proposal to do that, unsized_locals. Also, your syntax doesn’t make it clear how a fat pointer to your types will be constructed, or how to make custom DSTs. If you go down that route, we already have q RFC for Custom DSTs. So I’m not sure what the point of this proposal is.

Yes you can construct !Sized types on the stack. My point is that we already use this exact pattern to do so. The syntax could be improved, but this seems like a natural extension to the language and would reduce the need to handle actually !Sized types on the stack.

1 Like

As for the fat pointer. The required metadata is the part after dyn. For types, this becomes a pointer to a vtable, and for constants it is the value of that constant.

1 Like

This feels like a logical generalization of dyn, and is a great utility for the simple case of a wrapper type around a slice.

If I’m not mistaken, this would make, for the !Sized std types,

  • dyn (Bounds) <=> dyn<T: (Bounds)> T
  • [T] <=> dyn<const N: usize> [T; N]
  • (str is still a primitive, but could at some point in the future be str([u8]) maybe)
  • (CStr is still CStr([c_char]) until it can actually be \0-terminated)

This is basically a way of describing fat pointers within the language, as the dyn<..> data is what’s in the pointer “extras”.

For cases wrapping arrays, this is invaluable as a way to get !Sized versions for free. See ndarray’s IxDyn for an example. With being able to “forget”/“outline” the const variable, Dim could theoretically just be (if you forget about Dim<Vec<_>>) Dim<const N> and Dim<IxDyn> becomes dyn<const N> Dim<N>.

More concretely, a type that wraps [_; N] gets the unsized variant in dyn<const N>, and we gain the ability to do dyn<const N, M> [[_; N]; M] where it didn’t exist before. Fewer types need to create a TRef type.

Whatever scheme this takes needs to be sure not to accidentally allow observing CStr's incorrect implementation. This is simple, and required to work with further unsized types beyond just forgetting a const variable: only allow forgetting const parameters which are actually listed.

Here’s a sketch of what this could look like, in guide-level explanation form:


Unsized types

Unsizing const parameters

The most common way to obtain an unsized type is to “forget” const parameters of a type. Let’s consider the case of a function to sum an array:

fn sum<const N: usize>(arr: &[f64; N]) {
    let acc = 0;
    for n in arr {
        acc += n;
    }
    acc
}

This is the generic, monomorphized, statically dispatched version. To instead work with arrays of unknown length until runtime (such as vectors), we can be dynamic over the length:

fn sum(arr: &dyn<const N: usize> [f64; N]) {
    let acc = 0;
    for n in arr {
        acc += n;
    }
    acc
}

This puts the length of the array on the stack with the reference, such that this single function works for any length of array.

This concept is so important that we give these a dedicated name – slices – and syntax – &[T].

6 Likes

cc @eddyb who had ideas in this direction?

I agree that dyn<...> TypeExpr seems like a natural extension to dyn BoundExpr using a sort of sigma-type explanation for &[T].

In preparation for Rust 2018 we also ensured that dyn<...> TypeExpr is viable syntactically so we should be good there.

2 Likes

This is the best proposal for unsized types I have seen so far. It clearly shows what part of the type should be stored as metadata in the pointer without requiring boilerplate. I find it interesting that it can also be used to define sized types with pointer metadata:

dyn<const META: i32> (data: i32)

although I can’t immediately think of a use for this.

You’d need to put META in a PhantomData or otherwise “use” the variable. And as a use, consider

struct StrideArr<T, const Len, Stride> {
    raw: [T; Len],
    stride: PhantomData<Stride>,
}

You can now have &dyn<const Len, Stride> StrideArr<T>.


Unrelated interesting thought: this would also now allow the very silly type of &dyn<T: (Bounds)> &T. Maybe it could also be used more practically for splittable vtables? &dyn<T: (Bounds)> dyn<T: (Bounds)> T

2 Likes

Several thoughts:

As implemented right now, const parameters don’t need to be used, I think, so you don’t need to do this whole silliness with the PhantomData.

Is the story for handling dyn<T: Tr, U: Ur> (T, U) the reasonable thing you expect? I haven’t been following unsized lvalues closely, but I assume this is an easy corollary of that work.

Also: given how wordy dyn<const N: usize> [T; N] is ([T] and other type aliases notwithstanding) I would like to propose adding the terminal dyn to both the type and the “const parameter” grammars, so we can write K<dyn> for dyn<?const T: ty|bounds> K<T>. E.g., type [T] = [T; dyn].

As a corollary, I’d suggest allowing dyn<T> T (which dyn in type position would desugar to), which is a pretty nonsense but otherwise harmless type.

Furthermore, what does dyn<T: Tr> commute with? I get the impression that dyn<T: Tr> K<T> and K<dyn<T: Tr> T> are the same thing, at least when K = &. Should they be equal as types? Does this hold for dyn<const X: T>?

2 Likes

This seems suspiciously similar to dependent types.

I like it!

2 Likes

This was also my first thought on the matter, but unfortunately dyn does not commute at all. Simple example in current code would be &&[usize]. Here the outer pointer is just that, while the inner pointer is a fat pointer containing both a pointer and the length. If we consider the equivalent type in the proposed syntax, this would become &&dyn<const N: usize> [usize; N]. If we move the dyn to the outer pointer, we instead get a fat pointer pointing to a normal pointer, which is quite different.

Yes this is correct since there's no variance to speak of.

2 Likes

That’s a good point, and I think that makes me a bit concerned for this proposal, since the reason for non-equality is extremely subtle.

This makes me want to invert my previous proposal to make bare dyn the sole primitive, potentially requiring dyn $ty for the const version, i.e.,

// Static dispatch
fn f<const N: usize>(x: [T; N]);

// Dynamic dispatch
fn f(x: [T; dyn usize]);

I think this doesn’t step on the custom DSTs aspect’s toes, since you’d now write

struct Matrix<T, const N: usize, const M: usize> { [[T; N]; M] }

type MatrixView<T> = Matrix<T, dyn usize, dyn usize>;

On a related note, how does this interact with const fns?

struct Matrix<T, const N: usize, const M: usize> { [T; N * M] }

// What is the layout of this type?
type MatrixColView<'a, T, N> = &'a Matrix<T, N, dyn usize>; 
2 Likes

This creates the same kind of problems. Consider

fn f(...) -> Box<Arc<dyn Display>>

Currently this is completely disambigues as it obviously referes to the Arc. But making dyn possible everywhere, like your matrix view, it becomes ambiguous as to what pointer the data should be stored in.

Box<Arc<MatrixView<T>>

Should we read this as the metadata being stored on the box or on the arc. Even worse, consider:

Box<Arc<Matrix<T, dyn usize, dyn usize>>>

We would like this to read the same, but that requires knowing which types are pointers and which aren’t.

This is why I think we have to move the dyn out to the pointer we want to add to.

Box<dyn<const N: usize, M: usize> Arc<Matrix<T, N, M>>> // the size is stored in the box
Box<Arc<dyn<const N: usize, M: usize> Matrix<T, N, M>>> // the size is stored in the arc
3 Likes

Let's walk before we run. Let's first fully figure out the static and dynamic semantics of dyn<...> TypeExpr before we consider giving it sugar.

4 Likes

The intention is that this makes it pretty clear that the answer is "yes, the Arc". I don't see why you would like to control where the size lives; this seems like an optimization that we should be teaching the compiler's codegen about, not having users worry about.

We already do, after a fashion: CoerceUnsized in std::ops - Rust

These are both staticly dispatched functions (atleast if I remember right). Dynamic dispatch only comes in once you call a function defined differently depending on the metadata.

I disagree that we always want the innermost pointer to store the metadata. Consider making a dual view into two arrays.

dyn<const N: usize> (&[T; N], &[U; N])

Here by moving the metadata outside the pointers themselves, we ensure at a type level the the two slices are of the same length. This could be used to unify their bound check, or allowing exact zipping of the values.

1 Like

I feel the need to ask: is this a problem people actually have? I think that we should weigh the value of trying to solve this problem, versus introducing confusing syntax (remember: dyn<> does commute past !CoerceUnsized types, which is definitely a source of confusion).

2 Likes

One potential solution to the commutitive issue would be to only allow introducing a dyn at the level of the indirection (i.e. where ?Sized is allowed, or equivalently(?) at the parameter to the CoerceUnsized). Then it's only allowed where dyn Trait is, we just allow more selective forgetfulness.

Unfortunately, that means that the entangled slices

needs to be either (using & as the target) &dyn<const N> ([T; N], [U; N]) (forcing them to be next to each other) or &dyn<const N> (&[T; N], &[U; N]) (a fat pointer to two thin pointers).

How would you work with dyn<const N> (&[T; N], &[U; N]) anyway? By the intuition that dyn unsizes a type by putting information in the pointer metadata, you'd have to put it behind an indirection. If not, you're expecting to basically get (N, &[T; N], &[U; N]) as a Sized parameter, which is entirely unprecedented (in Rust at least). The type is definitely not !Sized, but you've got the additional metadata that needs to live somewhere.


Three small questions:

  • Should this be opt-in? (dyn const _? :laughing:)
  • Would using for<..> quantification make more sense over introducing dyn<..> quantification?
  • Is "smallest bound" (dyn usize, dyn Trait) enough? (I.e. do we want &dyn<const N> &[_; N] to be expressible? Exotic cases can still use custom DST when they arrive.)

Minor note: I've been leaving type bounds off of const parameters just as a shorthand.

1 Like