(Mega-pre-RFC) Reference specialization types (DSTs, proxy-references)


#1

This is a big one. I pretty much wrote the whole RFC after getting the idea, but since this is my first contribution to Rust, I thought I’d post it here to get some feedback before making a pull request.

Rendered

The basic idea is to allow custom reference types to be defined, with a syntax something like:

struct &'a SomeType {
    wrapped_ref: &'a WrappedType,
    index: usize,
    /* Other meta-data goes here. */
}

So SomeType would be defined at the reference level instead of at the value level (there wouldn’t be a definition struct SomeType somewhere else). Of course, this means SomeType is unsized. I think this feature is useful mostly for making proxy-references for custom collection types, but it could also be used to define dynamically sized types. There’s a lot more detail about how it works in the RFC itself.
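
For comparison, here is a minimal sketch of how a proxy reference is usually written in today's Rust, as an ordinary struct carrying a lifetime. The names `Matrix`, `ColumnRef`, and `column` are hypothetical and not taken from the RFC; only the `wrapped_ref` and `index` fields mirror the snippet above.

```rust
/// A row-major matrix that owns its data (hypothetical example type).
pub struct Matrix {
    data: Vec<f64>,
    cols: usize,
}

/// Hypothetical proxy "reference" to one column of a `Matrix`.
/// It is an ordinary value, so callers hold a `ColumnRef<'_>` rather than a `&Column`.
pub struct ColumnRef<'a> {
    wrapped_ref: &'a Matrix,
    index: usize,
}

impl Matrix {
    pub fn new(rows: usize, cols: usize) -> Self {
        Matrix { data: vec![0.0; rows * cols], cols }
    }

    pub fn set(&mut self, row: usize, col: usize, value: f64) {
        self.data[row * self.cols + col] = value;
    }

    /// Builds the proxy; no column is stored contiguously in memory.
    pub fn column(&self, index: usize) -> ColumnRef<'_> {
        assert!(index < self.cols, "column index out of bounds");
        ColumnRef { wrapped_ref: self, index }
    }
}

impl<'a> ColumnRef<'a> {
    /// Reads the element of this column in the given row.
    pub fn get(&self, row: usize) -> f64 {
        self.wrapped_ref.data[row * self.wrapped_ref.cols + self.index]
    }
}
```

As I read the proposal, the difference is that the proxy would be spelled `&'a SomeType` and could therefore be returned from APIs that require a reference.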

Full disclosure: although I’ve been following the Rust community for a few years, I only started programming in Rust about a month back, so I’m still very new to the language. I might have used terms incorrectly, made up my own terms where I didn’t need to, etc. The whole RFC might not be a very good idea at all. Please give me all your negative criticism so I can improve this!


#2

Haven’t read the whole RFC yet, but first impression: I’ve never thought of that syntax before and I quite like the idea.


#3

I liked the idea. It would be pretty useful for slicing some custom types like: a string with cached indexes over grapheme clusters or chars.


#4

A previous try on “custom DSTs”:


#5

There is definitely a need for something like this. This solves the issue where you can’t implement Index and IndexMut for some types because you need to return a reference. Previous discussion about that here. The idea of only defining &T and &mut T and not T itself makes the declaration more straightforward and I like the fact that using unsafe code is not needed for this approach. That said I am not sure what the added value of the Reborrow trait would be so I think you should keep it in the possible extensions section and not as a part of this RFC as you considered in the unresolved questions section.
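
To illustrate the Index/IndexMut limitation, here is a small sketch under my own assumptions: `Nibbles` and `NibbleMut` are hypothetical names. Two 4-bit values share each byte, so there is no standalone value that `IndexMut::index_mut` could return a `&mut` to, and a proxy struct has to be used instead.

```rust
/// Stores 4-bit values, two per byte. No separate `u8` holding a single
/// nibble exists in memory, so `Index::index` (which must return
/// `&Self::Output`) has nothing to point at.
pub struct Nibbles {
    bytes: Vec<u8>,
}

/// Hypothetical mutable proxy standing in for a `&mut` to one nibble.
pub struct NibbleMut<'a> {
    byte: &'a mut u8,
    high: bool,
}

impl Nibbles {
    pub fn with_len(len: usize) -> Self {
        Nibbles { bytes: vec![0; (len + 1) / 2] }
    }

    /// Reads by value: odd indices live in the high half of a byte.
    pub fn get(&self, i: usize) -> u8 {
        let byte = self.bytes[i / 2];
        if i % 2 == 1 { byte >> 4 } else { byte & 0x0F }
    }

    /// Returns a proxy instead of `&mut u8`.
    pub fn get_mut(&mut self, i: usize) -> NibbleMut<'_> {
        NibbleMut { byte: &mut self.bytes[i / 2], high: i % 2 == 1 }
    }
}

impl NibbleMut<'_> {
    /// Writes the low 4 bits of `value` into the selected half of the byte.
    pub fn set(&mut self, value: u8) {
        let v = value & 0x0F;
        if self.high {
            *self.byte = (*self.byte & 0x0F) | (v << 4);
        } else {
            *self.byte = (*self.byte & 0xF0) | v;
        }
    }
}
```

Callers have to write `n.get_mut(i).set(v)` rather than `n[i] = v`, which is the ergonomic gap being discussed.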


#6

Would love to have custom DSTs! This is an interesting approach.

One thing that needs to be accounted for is the interaction with Unsized Rvalues. From what I understand, they rely on the fact that the compiler has deep enough knowledge of DST layouts that it can move the metadata around independent of the associated pointer.

I also see some syntactic ambiguities around syntax like &T { fields, of, an, rst }, though they don’t seem fundamental to the design.


#7

For two types to be pointer-equivalent with respect to a lifetime 'a, the members of the two types must have the same names and be in the same order.

… and, I presume, be of the same type (?).


#8

@DDOtten The purpose of the Reborrow trait is basically to let &T and &mut T RSTs have different representations. If the consensus is that &T and &mut T should be required to have pointer-equivalent representations always, even into the future, then I agree that Reborrow doesn’t really have a purpose.

@rpjohnst Thank you for bringing that RFC to my attention. I’ll have to read through it a little more carefully, but I think you are right that there could be some difficult interactions. It seems like it depends on a relationship between an unsized type and a Sized counterpart, but many RSTs would not have a Sized counterpart. EDIT: Never mind, I was just getting the VLA part of it confused with the unsized r-values part.

Also, I’m not sure where the syntactic ambiguity could come from. Do you have a more specific example in mind?

@lordan No, the members do not have to have the same type. They must be pointer-equivalent to each other though. So, for example, &'a i32 and &'a mut i32 have different types but are also pointer-equivalent. I’ll review that part to make sure it’s more clear (maybe I should provide some examples).
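
A rough way to check the &'a i32 / &'a mut i32 example in current Rust is to compare size and alignment. This is only a sketch: the RFC's exact definition of pointer-equivalence is not reproduced in this thread, so the helper below (a hypothetical name) tests a necessary condition at most.

```rust
use std::mem::{align_of, size_of};

/// Returns true when `A` and `B` have the same size and alignment.
/// Assumption: this is weaker than whatever "pointer-equivalent" means
/// in the RFC; it only rules out obviously incompatible layouts.
pub fn same_size_and_align<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}
```

For instance, `same_size_and_align::<&i32, &mut i32>()` holds, while a thin `&i32` and a fat `&[i32]` differ in size.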


#9

I would be very excited for any kind of RFC to this effect.

One thing I find interesting (or amusing/unfortunate) about the specific proposal is how it compares to the existing ability to define unsized types. For instance, today you can define a type like

struct Indexed<I: Idx, V: ?Sized> {
    _marker: PhantomData<I>,
    raw: V,
}

and then you can use types like &Indexed<Node, [T]>, but there’s no way to construct it without unsafe. With this RFC, you would be able to safely construct a dedicated reference type &IndexSlice<Node, T>, but still not &Indexed<Node, [T]>.
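
As a sketch of what that unsafe construction tends to look like when wrapping an existing slice: I have added `#[repr(transparent)]`, dropped the `Idx` bound, and invented the `from_slice` and `len` helpers, none of which appear in the snippet above.

```rust
use std::marker::PhantomData;

/// Hypothetical index marker; the `Idx` bound from the post is omitted for brevity.
pub struct Node;

/// Assumption: `repr(transparent)` is added so the pointer cast below is sound.
#[repr(transparent)]
pub struct Indexed<I, V: ?Sized> {
    _marker: PhantomData<I>,
    raw: V,
}

impl<I, T> Indexed<I, [T]> {
    /// Reinterprets a borrowed slice as a borrowed `Indexed<I, [T]>`.
    pub fn from_slice(slice: &[T]) -> &Self {
        // SAFETY: `Indexed<I, [T]>` is `repr(transparent)` over `[T]`
        // (the marker is a zero-sized type), so both pointers share the
        // same layout and the same slice-length metadata.
        unsafe { &*(slice as *const [T] as *const Self) }
    }

    pub fn len(&self) -> usize {
        self.raw.len()
    }
}
```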


#10

A minor syntactic remark: I believe it is quite unusual to introduce a generic lifetime without angle brackets (<'a>). As far as I understand, in your proposal 'a would be any lifetime but T would be a specific concrete type. Correct?

Concerning another aspect I’m wondering if the redundancy between mutable and immutable reference specialization is not once again begging for generic mutability. There was a discussion about this long ago (see also there) but I’m not aware of any proposed RFC.


#11

I believe the duplication of defining & and &mut versions of the “RST”, the restrictions (“pointer-equivalence” etc.) relating those two definitions, and other issues are due to this approach being fundamentally at odds with what makes DSTs “tick”. The whole idea of DSTs (as opposed to what Rust had before – ~str, ~[T], etc.) is completely decoupling the description of the dynamically sized data from the type of pointer used to refer to it. Rather than specifying what Box<str>, Rc<[T]>, and myriad other combinations mean, we separately and independently:

  1. specify unsized types by the metadata that is needed to interpret them (slice length, vtable, etc.)
  2. specify pointer types, which can support all unsized types equally by handling their metadata opaquely
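
A quick sketch of point 2 as it can be observed today: every standard pointer type to a slice carries the same (pointer, length) pair. The function name is made up, and the two-word size is what current compilers produce rather than something I can point to as a documented layout guarantee.

```rust
use std::mem::size_of;
use std::rc::Rc;
use std::sync::Arc;

/// Sizes of four different pointer types to the same unsized type `[u8]`.
pub fn fat_pointer_sizes() -> [usize; 4] {
    [
        size_of::<&[u8]>(),
        size_of::<Box<[u8]>>(),
        size_of::<Rc<[u8]>>(),
        size_of::<Arc<[u8]>>(),
    ]
}
```

None of these pointer types was written with slices specifically in mind; they all pick up the length metadata generically.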

This pre-RFC goes in a completely different direction and therefore loses all its advantages (in addition to being incoherent with the rest of the language). By doing so it of course side steps all problems specific to specifying the metadata, but invites many more in return.

In particular I want to highlight that the problem of &MyCustomDST vs &mut MyCustomDST also applies to any other pointer type (Box, Rc, Arc, {rc,arc}::Weak, the hypothetical Thin<T> which stores the metadata on the heap instead of next to the pointer, etc.). Even if we granted that not all custom DSTs really need support from all these pointer types, those that do wouldn’t really have a good way to do it (if any way at all) under this proposal.


#12

I don’t think this has been mentioned anywhere; how does this interact with unsizing? I think ironing that out might help to clarify how Box<DST> and friends would work. Would it be possible to define custom layouts for Box<DST>, and other CoerceUnsize types?

I question whether defining &DST and &mut DST independently is useful. Not only can we expect further future reference types like &move and &pin that have been thrown around a few times, but this results in code duplication, when in reality the only difference here is that often the mut version will have a mut pointer instead of a const one. I’m curious what situation warrants a different layout for mut references. I might imagine a more compact syntax like

struct &'a T {
    inner: &'a U,
    mut inner: &'a mut U,
    move inner: &'a move U,
}

(illustrating how this might interact with new reference types, making the churn much less obnoxious).

Also, is the lifetime in struct &'a T really required? Could I write struct &'static T to assert only immortal references are allowed, or struct &'_ T for not naming the reference, if I don’t care about it? Or, why not something like

struct &Slice {
    inner: &'self Container,
    idx: usize,
}

#13

All other pointer types like Box, Rc, Arc use a raw pointer underneath. Because *const T and *mut T are specified by &T and &mut T respectively, they would already have support. The only type we don’t have in this way is the hypothetical Thin<T>.


#14

Thanks for all of the feedback. I am trying to take it all into consideration, and I will be making modifications to the original RFC.

@burakumin In this proposal, T is a concrete type that does not have a definition. It can be used as a type, but values of type T cannot be constructed. Only the &T and &mut T can be constructed.

@rkruppe You have touched upon something which I have thought about, though I didn’t write anything about it in the proposal. If RSTs are added to the language, being able to define &T allows a Box<T> to be constructed, even though Box<T> wasn’t given a definition. Since Box<T> contains a pointer internally, using the Box::from_raw method will create a Box<T> from a *mut T. Same applies to other smart pointer types (I think). From what I understand, references are all you need.

EDIT: Looking into it further, I’m not confident that an Rc<T> can be created from the RST &mut T. This is a pretty big issue, so I’m going to have to spend a lot more time learning about this.

Also, this RFC was not meant to be primarily a DST-oriented one. Types for which the concept of “size” doesn’t even make sense can be defined using RSTs. My original motivation for this was proxy-references, and DSTs were a happy accident that fell out. So, I’m not surprised that this RFC misses some of what is essential to DSTs, but I would like to improve it so that it can capture that essence.

@drXor I agree that specifying &T and &mut T independently might not be useful enough to include in the RFC. However, there are certain situations where you might want different kinds of mutability. For instance, if more than one reference is wrapped:

struct &'a T {
    wrapped_ref_1: &'a Type1,
    wrapped_ref_2: &'a Type2,
}

// Only one reference is mutable.
struct &'a mut T {
    wrapped_ref_1: &'a Type1,
    wrapped_ref_2: &'a mut Type2,
}

I mostly just wanted to leave things open for the future, but I agree completely that a syntax such as yours could do the job just as well.
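
In today's Rust the same pair of definitions can be approximated with two ordinary structs plus an explicit conversion, which is roughly the job a Reborrow-style trait would take over. This is a sketch only: `PairRef`, `PairMut`, and the `reborrow` method are hypothetical names, and `String` and `Vec<i32>` stand in for Type1 and Type2.

```rust
/// Shared view: both wrapped references are immutable.
pub struct PairRef<'a> {
    pub wrapped_ref_1: &'a String,
    pub wrapped_ref_2: &'a Vec<i32>,
}

/// Mutable view: only the second wrapped reference is mutable.
pub struct PairMut<'a> {
    pub wrapped_ref_1: &'a String,
    pub wrapped_ref_2: &'a mut Vec<i32>,
}

impl<'a> PairMut<'a> {
    /// Produces a shared view that borrows from this mutable view.
    pub fn reborrow(&self) -> PairRef<'_> {
        PairRef {
            wrapped_ref_1: self.wrapped_ref_1,
            wrapped_ref_2: &*self.wrapped_ref_2,
        }
    }
}
```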


#15

First of all, even if that solved the problem of how to lay out these smart pointers and the like (it doesn’t for Arc and Rc), that doesn’t explain how you would construct a Box<RST>. (more on this below in reply to @duanebyer)

But there’s an even more fundamental problem with any smart pointer that isn’t a newtype around a pointer-to-T. Rc, for example, isn’t a *const T, it’s a *const RcBox<T> (I’m glossing over non-nullness for exposition), where RcBox<T> is roughly { strong_count: usize, weak_count: usize, value: T }. If you only specified what *const RST means, how do you make sense of *const RcBox<RST>? The right answer is to “float out” the metadata (storing the unsized value in the RcBox and putting the metadata into Rc), but how could you arrive at that conclusion under this proposal? If anything, I would expect that the author of RcBox can specify a completely different and incompatible “reference specialization” for &RcBox<_> (and therefore, for *const RcBox<T>), but that would completely break Rc (and Arc, btw).
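
The “floating out” can be seen with a stand-in struct in current Rust. `CountedBox` is a hypothetical approximation of the private RcBox with the fields described above, not the standard library's actual definition, and the helper functions are made up for the demonstration.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical stand-in for the standard library's private `RcBox`.
pub struct CountedBox<T: ?Sized> {
    pub strong_count: usize,
    pub weak_count: usize,
    pub value: T,
}

/// Builds a sized box, then lets the built-in unsizing coercion turn it
/// into a pointer to `CountedBox<[u8]>`.
pub fn make_unsized() -> Box<CountedBox<[u8]>> {
    let sized: Box<CountedBox<[u8; 3]>> = Box::new(CountedBox {
        strong_count: 1,
        weak_count: 0,
        value: [1, 2, 3],
    });
    sized // coerced at the return site; the length 3 moves into the pointer
}

/// Number of machine words in a reference to the unsized struct.
pub fn pointer_words() -> usize {
    size_of::<&CountedBox<[u8]>>() / size_of::<usize>()
}

/// Size of the pointed-to value, computed from the pointer's metadata.
pub fn allocation_size(b: &CountedBox<[u8]>) -> usize {
    size_of_val(b)
}
```

The slice length is not stored next to `value`; it travels with the outer pointer, which is why the wrapper's author never has to know which unsized type ends up inside.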

The only way I can see all this working out is if we analyze the definition of the RST and extract the metadata from that and apply it to every kind of pointer as usual – but

  1. if that worked, we wouldn’t need separate definitions of &RST and &mut RST either, and
  2. if we do that, we’re back to the usual custom-DST proposals, we just invented special magic “by-example” syntax for describing the metadata instead of using e.g. the trait system as previously proposed

I don’t see how that works. Box needs to own heap-allocated memory, constructing values behind references doesn’t give you that. You can of course manually and unsafely allocate some memory and initialize it, but then how do you construct the metadata to go along with the pointer? (The custom DST proposals have some way to construct a fat raw pointer from a data pointer and metadata, and it’s completely natural. I don’t see how that carries over to RSTs.)
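
For the built-in slice DST, the manual route looks roughly like the sketch below: allocate, initialize, then pair the data pointer with its metadata (the length) to form a fat raw pointer. The function `boxed_bytes` is a hypothetical helper, and the code assumes the default global allocator.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr;

/// Builds a `Box<[u8]>` of `len` bytes, each set to `fill`, by hand.
pub fn boxed_bytes(len: usize, fill: u8) -> Box<[u8]> {
    // `alloc` must not be called with a zero-sized layout.
    assert!(len > 0, "zero-length allocation is not supported in this sketch");
    let layout = Layout::array::<u8>(len).expect("length overflows a Layout");
    unsafe {
        let data = alloc(layout);
        if data.is_null() {
            handle_alloc_error(layout);
        }
        // Initialize every byte before exposing the memory as `[u8]`.
        ptr::write_bytes(data, fill, len);
        // Thin data pointer + metadata (the length) = fat raw pointer.
        let fat: *mut [u8] = ptr::slice_from_raw_parts_mut(data, len);
        // SAFETY: the memory came from the global allocator with the
        // layout of a `[u8]` of this length, as `Box::from_raw` requires.
        Box::from_raw(fat)
    }
}
```

The step that has no obvious RST counterpart is `slice_from_raw_parts_mut`, where the metadata is supplied explicitly.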

Granted, existing DSTs have a similar problem, e.g. Box<str> or Rc<str> only work because there’s unsafe code in the standard library for those specific combinations of types. But at least the DST perspective makes it obvious what AnySmartPointer<AnyDst> should mean, and the custom DST proposals make it more feasible to do such constructions in user code.

Yeah the memory layout is very different (see above).

I don’t really understand this distinction. All the examples in the RFC fall under the umbrella of custom DSTs. I guess this is referring to the briefly-mentioned “extension” of allowing &RST and &mut RST to have very different representations? I don’t see motivating use cases or even many details (the section titled “extension” instead discusses reborrowing), but that road does not seem promising to me at all. It’s the complete opposite of DSTs (which make all pointers to the same type alike), and thus suffers from all the problems DSTs solve, such as enabling a wide array of smart pointers to be used automatically.


#16

I’m pretty sure the problem you describe is already mostly solved automatically by the compiler. See std::ops::CoerceUnsized and its RFC. Currently, Unsize implementations are compiler-generated, so we’d need to modify it to allow for user-defined types.
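
For reference, these are the compiler-provided unsizing coercions as they work on stable Rust today; the function names are only for illustration.

```rust
use std::fmt::Display;
use std::rc::Rc;

/// Array-to-slice unsizing behind a `Box`.
pub fn unsize_slice() -> Box<[i32]> {
    let sized: Box<[i32; 3]> = Box::new([1, 2, 3]);
    sized // implicit coercion: Box<[i32; 3]> -> Box<[i32]>
}

/// Concrete-type-to-trait-object unsizing behind an `Rc`.
pub fn unsize_trait_object() -> Rc<dyn Display> {
    let sized: Rc<i32> = Rc::new(42);
    sized // implicit coercion: Rc<i32> -> Rc<dyn Display>
}
```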

Mind, this is not how things like Box<str> and Vec<T> -> Box<[T]> are handled, but I think this is the correct avenue to solve this problem.


#17

A DST pointer can be cast to a thin pointer:

let f: *const [u8] = &[1, 2, 3, 4, 5];
let g: *const u8 = f as *const u8;  // fat-to-thin cast
let h: *const u32 = f as *const u32;  // same pointer as above even if different type

How does this work with RSTs?

struct &'a T {
    wrapped_ref_1: &'a Type1,
    wrapped_ref_2: &'a Type2,
}

let f: *const T = ...;
let g: *const Type1 = f as *const Type1; // ??
let h: *const Type2 = f as *const Type2; // ??

#18

I assume by “the problem” you mean constructing a Box<RST>, not the other issues? If so, consider:

  1. To construct a Box<DST> by unsizing, you first need to create a Box containing something sized, and then extract the metadata from it and unsize it. For trait objects, the sized type is the concrete type implementing the trait. For slices, the sized type is a fixed-size array. What’s the thing you construct to then unsize into some arbitrary RST?
  2. If we want to allow user-defined unsizing, we still need to give people the tools to actually write the code for that, e.g., a way to construct *const SomeRST from a thin pointer and the metadata that SomeRST requires. Those tools are mostly what I was getting at. CoerceUnsized itself is just an interface; choosing to implement it (instead of doing the conversion e.g. in a function) doesn’t solve any of those problems.
  3. Besides the compiler-generated impls for trait objects and slices, the primary feature of CoerceUnsized is that it’s a coercion (as the name says), i.e., automatically and implicitly applied. Many people don’t want that for expensive conversions such as Vec<T> -> Box<[T]> (which in general needs to reallocate) or String -> Rc<str> (which always needs to reallocate).
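
The two expensive conversions mentioned in point 3 are explicit calls in current Rust rather than coercions. A small sketch with hypothetical wrapper names:

```rust
use std::rc::Rc;

/// Explicit conversion; may reallocate to drop excess capacity.
pub fn vec_to_boxed_slice(v: Vec<u32>) -> Box<[u32]> {
    v.into_boxed_slice()
}

/// Explicit conversion; copies the bytes into a new reference-counted allocation.
pub fn string_to_rc_str(s: String) -> Rc<str> {
    Rc::from(s)
}
```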

#19

I think the root of the problem is that RSTs are not strong enough to represent DSTs. A DST has got a buffer in memory somewhere where its data lives, but an RST could have its data spread throughout memory. So, @rkruppe is absolutely right that an RST cannot be stored in a Box<T>. I wish I had thought about this more carefully before making this proposal.

There might be a way to represent a DST as a special kind of RST though. What if a trait like this was added to the proposal?

unsafe trait DST {
    // Some `Copy` metadata.
    type Meta: Copy;
    unsafe fn from_raw<'a>(ptr: *const u8, len: usize, meta: Self::Meta) -> &'a Self;
    fn as_thin_ptr(&self) -> *const u8;
    fn len(&self) -> usize;
    fn meta(&self) -> Self::Meta;
}

Then the trait could be implemented by an RST, which would be guaranteeing that its underlying data was at a specific location in memory with a specific size. This could then be put into a Box.
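
To make the idea concrete, here is a sketch of such a trait implemented for the existing type str, which needs no metadata beyond its length. The trait is restated (renamed `Dst` to follow Rust naming conventions) so the example compiles on its own; the associated type, the explicit lifetime on `from_raw`, and the choice of `()` as metadata are my assumptions about the intent.

```rust
/// Restated version of the proposed trait (hypothetical, not a std API).
pub unsafe trait Dst {
    /// Extra `Copy` metadata beyond the data pointer and length.
    type Meta: Copy;
    /// Rebuilds a reference from its raw parts.
    unsafe fn from_raw<'a>(ptr: *const u8, len: usize, meta: Self::Meta) -> &'a Self;
    fn as_thin_ptr(&self) -> *const u8;
    fn len(&self) -> usize;
    fn meta(&self) -> Self::Meta;
}

unsafe impl Dst for str {
    type Meta = ();

    unsafe fn from_raw<'a>(ptr: *const u8, len: usize, _meta: ()) -> &'a str {
        // SAFETY (caller's obligation): `ptr` points to `len` bytes of valid
        // UTF-8 that stay alive and unmodified for the lifetime `'a`.
        unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(ptr, len)) }
    }

    fn as_thin_ptr(&self) -> *const u8 {
        self.as_ptr()
    }

    fn len(&self) -> usize {
        // Call the inherent method explicitly to avoid any ambiguity.
        str::len(self)
    }

    fn meta(&self) -> Self::Meta {}
}
```

Taking a `&str` apart with `as_thin_ptr`, `len`, and `meta` and feeding the parts back to `from_raw` yields an equal string.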

RSTs that are not DSTs would include things like proxy-references:

// Packages items from two different vectors together. So, it isn't a DST, since the
// data lives in two completely different locations in memory.
struct &'a SomeType {
    vec_a: &'a Vec<A>,
    vec_b: &'a Vec<B>,
    index: usize,
}
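
For readers unfamiliar with the pattern, this is approximately how that proxy would be written as a plain generic struct in current Rust. It is a sketch: the name `SomeTypeRef`, the bounds-checking constructor, and the accessors `a` and `b` are all hypothetical additions.

```rust
/// Pairs up the items at `index` in two separately allocated vectors.
pub struct SomeTypeRef<'a, A, B> {
    vec_a: &'a Vec<A>,
    vec_b: &'a Vec<B>,
    index: usize,
}

impl<'a, A, B> SomeTypeRef<'a, A, B> {
    /// Returns `None` if `index` is out of bounds for either vector.
    pub fn new(vec_a: &'a Vec<A>, vec_b: &'a Vec<B>, index: usize) -> Option<Self> {
        if index < vec_a.len() && index < vec_b.len() {
            Some(SomeTypeRef { vec_a, vec_b, index })
        } else {
            None
        }
    }

    /// Item from the first vector; it lives as long as the vector borrow.
    pub fn a(&self) -> &'a A {
        let v: &'a Vec<A> = self.vec_a;
        &v[self.index]
    }

    /// Item from the second vector.
    pub fn b(&self) -> &'a B {
        let v: &'a Vec<B> = self.vec_b;
        &v[self.index]
    }
}
```

Since the two items live in unrelated allocations, there is no single address and size that could describe the pair, which is the sense in which it is not a DST.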

#20

I’m still not sure what these “reference specialization types” are, if they aren’t DSTs. From the mechanisms described, I can infer how they differ from DSTs, but what are they? What are typical examples? What problems do they solve? How would they be used? When and why would I use one instead of a custom DST? etc.