[Pre-RFC] Custom DSTs

This RFC might want to explicitly include what’s guaranteed about the layout of custom DSTs, if anything at all.

There are no guarantees about the layout of pointers to custom DSTs; that’s the unsafe code guidelines team’s job.

Your comments in the unsafe code guidelines discussions appear to suggest that the pointer and the metadata need to be disjoint, maybe hinting that unsafe code should be able to rely on, e.g., the first field of a custom DST being a pointer value.

Do you think it might make sense to support encoding the metadata in the pointer? For example, implementing a “short slice” where the slice length is encoded in the pointer?
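Rust has no custom DSTs today, so the closest runnable approximation of the “short slice” idea is an ordinary Sized, pointer-sized wrapper. The sketch below is hypothetical (ShortSlice and its methods are made-up names) and assumes u64 elements whose base address has its three low bits clear; that is checked at runtime rather than assumed, so lengths 0 to 7 fit in those bits. Note that the stored word is not itself a dereferenceable pointer to the data, which is exactly the property the replies go on to discuss.

```rust
use std::marker::PhantomData;

/// Low pointer bits used to hold the length (so lengths 0..=7 fit).
const TAG_MASK: usize = 0b111;

/// Hypothetical "short slice": one pointer-sized word holding both the data
/// pointer of a `[u64]` and its length, packed into the low address bits.
#[derive(Clone, Copy)]
pub struct ShortSlice<'a> {
    // Base pointer plus `len` bytes; NOT a valid pointer to the first element.
    tagged: *const u8,
    _marker: PhantomData<&'a [u64]>,
}

impl<'a> ShortSlice<'a> {
    /// Returns `None` if the slice is too long or its base address uses the tag bits.
    pub fn new(slice: &'a [u64]) -> Option<Self> {
        let base = slice.as_ptr();
        if slice.len() > TAG_MASK || (base as usize) & TAG_MASK != 0 {
            return None;
        }
        // Offsetting the pointer (instead of OR-ing into an integer) keeps its provenance.
        let tagged = base.cast::<u8>().wrapping_add(slice.len());
        Some(ShortSlice { tagged, _marker: PhantomData })
    }

    /// Decodes the length from the low bits.
    pub fn len(self) -> usize {
        (self.tagged as usize) & TAG_MASK
    }

    /// Strips the tag to recover a real `&[u64]`.
    pub fn get(self) -> &'a [u64] {
        let len = self.len();
        let base = self.tagged.wrapping_sub(len).cast::<u64>();
        // SAFETY: `base` and `len` are exactly the pointer and length `new`
        // took from a live `&'a [u64]`.
        unsafe { std::slice::from_raw_parts(base, len) }
    }
}
```

Any code that treats the stored word as a plain address, the way Box’s destructor treats the data pointer of a slice, would read from the wrong location.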

If that were to be implemented, one would have to use Metadata = (). The question is then whether the pointer is allowed to be invalid, not what the representation of the DST is.

In general we wanted to support encoding information in unused pointer bits, but I don't think those discussions considered custom DSTs.

Unsafe code can already assume that a &T always points to a valid T. For DSTs, we can specify where in the layout the DST pointer lives. If we allow custom DSTs to contain invalid pointers (which is what encoding information in unused pointer bits produces), then unsafe code cannot assume, in general, that a pointer it obtained from a &T is dereferenceable. It might still do so for concrete DSTs that guarantee this is always the case, though.
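For the DSTs Rust already has (slices, str, trait objects), unsafe code relies on the data-pointer half of a &T addressing the referent. A small sketch of that assumption, with made-up helper names: the cast keeps only the data pointer and drops the metadata (the wide-to-thin pointer cast), while size_of_val and align_of_val read the metadata. A custom DST that stored tag bits in the data-pointer half would break both helpers.

```rust
use std::mem::{align_of_val, size_of_val};

/// Address, size and alignment of the referent of any `&T`, sized or not.
/// `cast` keeps only the data-pointer half of a wide pointer and discards the
/// metadata; `size_of_val`/`align_of_val` read the metadata instead.
fn footprint<T: ?Sized>(r: &T) -> (usize, usize, usize) {
    let addr = (r as *const T).cast::<u8>() as usize;
    (addr, size_of_val(r), align_of_val(r))
}

/// Reads the first element of a slice through the thin data pointer alone.
fn first_via_data_ptr(s: &[u32]) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    // SAFETY: for a built-in `&[u32]` the data pointer always addresses
    // `s.len()` initialized elements, so reading the first one is fine.
    Some(unsafe { *(s as *const [u32]).cast::<u32>() })
}
```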

Given α: !Sized and β: Sized, there’s no way to safely go from &α to &β in general. Therefore, the pointer that the DST pointer contains doesn’t necessarily need to be valid under any specific .

I don’t really follow. It seems to me custom DSTs should, in fact, follow some discipline with respect to the “data pointer” part, because the whole point of custom DSTs is that they are agnostic to the kind of pointer that refers to them and just enrich it with additional metadata in some way.

For example, if we want ThinBox<CustomDST> to work, it doesn’t make any sense to let a custom DST stash something extra in “the pointer” – ThinBox is just a newtype around a plain old *mut (), pointing to an allocation that contains first the metadata and then (after suitable padding) the actual unsized data.
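The layout described here can be sketched with std::alloc. ThinSliceBox below is a hypothetical, simplified stand-in for the ThinBox idea, restricted to [T] with T: Copy so no element destructors need to run. The handle is a single plain pointer; the length (the metadata) lives at the start of the allocation, and Layout::extend computes the padding before the element data.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::marker::PhantomData;
use std::ptr::{self, NonNull};
use std::slice;

/// Hypothetical thin owning pointer to `[T]`: one machine word pointing at an
/// allocation laid out as `[len: usize][padding][T; len]`.
/// Restricted to `T: Copy` so no element destructors need to run.
pub struct ThinSliceBox<T: Copy> {
    header: NonNull<usize>,
    _owns: PhantomData<T>,
}

impl<T: Copy> ThinSliceBox<T> {
    /// Layout of the whole allocation plus the byte offset of the elements.
    fn layout(len: usize) -> (Layout, usize) {
        let array = Layout::array::<T>(len).expect("slice too large");
        let (layout, offset) = Layout::new::<usize>()
            .extend(array)
            .expect("allocation too large");
        (layout.pad_to_align(), offset)
    }

    pub fn new(items: &[T]) -> Self {
        let (layout, offset) = Self::layout(items.len());
        // SAFETY: `layout` always has non-zero size because of the header.
        let raw = unsafe { alloc(layout) };
        let header = match NonNull::new(raw.cast::<usize>()) {
            Some(h) => h,
            None => handle_alloc_error(layout),
        };
        // SAFETY: the allocation is large enough for the header at offset 0
        // and `items.len()` elements at `offset`, both suitably aligned.
        unsafe {
            header.as_ptr().write(items.len()); // metadata first
            let data = raw.add(offset).cast::<T>(); // then data, after padding
            ptr::copy_nonoverlapping(items.as_ptr(), data, items.len());
        }
        ThinSliceBox { header, _owns: PhantomData }
    }

    pub fn as_slice(&self) -> &[T] {
        // SAFETY: `new` initialized the header and `len` elements at `offset`.
        unsafe {
            let len = self.header.as_ptr().read();
            let (_, offset) = Self::layout(len);
            let data = self.header.as_ptr().cast::<u8>().add(offset).cast::<T>();
            slice::from_raw_parts(data, len)
        }
    }
}

impl<T: Copy> Drop for ThinSliceBox<T> {
    fn drop(&mut self) {
        // SAFETY: recomputing the layout from the stored length gives the same
        // layout `new` allocated with.
        unsafe {
            let (layout, _) = Self::layout(self.header.as_ptr().read());
            dealloc(self.header.as_ptr().cast::<u8>(), layout);
        }
    }
}
```

Nothing in this design leaves room for a pointee to stash extra bits in the handle itself: every access starts from the plain address.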

When I say “doesn’t necessarily need to be valid”, I don’t mean “we won’t require it to be valid”, just that the design doesn’t require it.

I'm not even talking about invariants right now; I don't even see how "a custom DST that packs some metadata into the 'data pointer' portion of references to itself" could work in the first place. I guess the idea is to rely on those references being produced and consumed by the impl DynamicallySized for TheDST (and thereby special-casing &T references over all other kinds of pointers)?

But then what about consumers that need to get the address of the size_of_val()-sized, align_of_val()-aligned chunk of memory that is the DST value? To give just one example, Box::drop needs that address (along with size and align) to deallocate memory, and currently it gets it by extracting the "pointer" portion (discarding the metadata).
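To make that concrete, here is a sketch of freeing a Box<[u32]> by hand. The helper is hypothetical, and Box's real destructor is not literally this code, but it needs the same three inputs. The cast throws away the length and keeps the address, which is only correct because the data pointer of a built-in slice reference is the allocation's address.

```rust
use std::alloc::{dealloc, Layout};

/// Hypothetical helper that frees a `Box<[u32]>` by hand and returns the
/// layout it used. Like `Box`'s own destructor, it needs three things: the
/// address (the data-pointer half of the wide pointer) and the size and
/// alignment (derived from the metadata).
fn free_boxed_slice(b: Box<[u32]>) -> Layout {
    let raw: *mut [u32] = Box::into_raw(b);
    // SAFETY: `raw` comes from `Box::into_raw`, so it points to a live `[u32]`.
    let layout = unsafe { Layout::for_value(&*raw) };
    // Discard the length metadata; what remains is the allocation's address.
    let addr: *mut u8 = raw.cast::<u8>();
    if layout.size() != 0 {
        // SAFETY: a non-empty boxed slice was allocated by the global
        // allocator with exactly this layout, and `u32` needs no drop.
        unsafe { dealloc(addr, layout) };
    }
    layout
}
```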

There's a way out of this problem: making this conversion from "a &DST that means whatever DST wants it to mean" to "the actual address of the referent" another method of the trait. But at that point it stops being just about dynamic size and starts being closer to overloading the reference type entirely (e.g., tagged pointers that pretend to be regular references are just as attractive for some sized types).
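That way out can be sketched as a trait. Every name here is hypothetical, and addresses are modeled as plain integers so the sketch stays free of unsafe code. The element type is assumed to be 8 bytes with 8-byte alignment. The interesting part is the last function: a consumer such as a deallocator can no longer just drop the metadata and keep the pointer half; it has to call back into code supplied by the pointee.

```rust
/// Hypothetical trait: if a DST could define what the "pointer" half of its
/// references means, consumers would need methods like these to recover the
/// real address, size and alignment of the referent.
trait ReferentInfo: Copy {
    fn address(self) -> usize;
    fn size(self) -> usize;
    fn align(self) -> usize;
}

/// Assumed element size and alignment (an 8-byte, 8-aligned element type).
const ELEM: usize = 8;
/// Low bits of a packed word that hold the length (0..=7 elements).
const LEN_MASK: usize = 0b111;

/// A conventional wide reference: address and length stored separately.
#[derive(Clone, Copy)]
struct Wide {
    addr: usize,
    len: usize,
}

/// A packed reference: the length lives in the low bits of the address word.
#[derive(Clone, Copy)]
struct Packed {
    word: usize,
}

impl ReferentInfo for Wide {
    fn address(self) -> usize { self.addr }
    fn size(self) -> usize { self.len * ELEM }
    fn align(self) -> usize { ELEM }
}

impl ReferentInfo for Packed {
    // The tag must be stripped before the word can be used as an address.
    fn address(self) -> usize { self.word & !LEN_MASK }
    fn size(self) -> usize { (self.word & LEN_MASK) * ELEM }
    fn align(self) -> usize { ELEM }
}

/// What a `Box::drop`-style consumer would hand to the allocator: it must go
/// through the trait instead of taking the pointer half directly.
fn dealloc_request<R: ReferentInfo>(r: R) -> (usize, usize, usize) {
    (r.address(), r.size(), r.align())
}
```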


BTW, about this:

This is quite a fragile property. For example, once we decide what to do about padding bytes by, e.g., allowing &MaybeUninit<u8> to point to padding bytes (and deciding that this is not only valid but also upholds the safety invariant, in @RalfJung's terminology), the only reason one can't cast every &T where T: !Sized to a &[MaybeUninit<u8>; N] is that one doesn't know a priori what the right N is. But one can guess and check at runtime, so for every N there's at least a sound &T -> Option<&[MaybeUninit<u8>; N]> conversion which returns Some whenever size_of_val >= N. Admittedly there's not a lot one can do with that (because who knows which of those bytes are initialized), but I hope it illustrates that this is quite subtle.
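A minimal sketch of that guess-and-check conversion, assuming (as the post does) that a &MaybeUninit<u8> may point at padding bytes; prefix_bytes is a hypothetical name. The sketch marks the function unsafe for a reason separate from padding: a shared byte view of memory inside an UnsafeCell would be invalidated by a later write through, e.g., a Cell, so the caller must rule out interior mutability in those bytes.

```rust
use std::mem::{size_of_val, MaybeUninit};

/// Guess-and-check view of the first `N` bytes behind any `&T`: `Some` exactly
/// when `size_of_val(r) >= N`. `MaybeUninit<u8>` tolerates padding bytes.
///
/// # Safety
/// The first `N` bytes of `*r` must not lie inside an `UnsafeCell` (no interior
/// mutability), since a later write through e.g. a `Cell` would invalidate the
/// returned shared view.
unsafe fn prefix_bytes<T: ?Sized, const N: usize>(r: &T) -> Option<&[MaybeUninit<u8>; N]> {
    if size_of_val(r) >= N {
        // In bounds, alignment 1, and every byte value (even uninit) is allowed.
        Some(unsafe { &*(r as *const T).cast::<[MaybeUninit<u8>; N]>() })
    } else {
        None
    }
}
```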

Isn't that what as casts (search for cast_to_thin) are for?

That's not what this is proposing, but... would that be such a bad thing? :slight_smile:

It's somewhat off-topic in this RFC, which is about DSTs specifically.

Moreover, we already have newtypes, so if you want a reference-like thing, just make your own type and implement Deref and DerefMut. Maybe add arbitrary self types for good measure. Why would we want more than this?
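A sketch of that newtype approach, assuming a Sized pointee with alignment of at least 2 so the low address bit is free (TaggedRef is a made-up name). Method calls auto-deref through Deref, but nothing converts to or from &T implicitly, which is the ergonomic gap raised in the reply.

```rust
use std::marker::PhantomData;
use std::mem::align_of;
use std::ops::Deref;

/// Hypothetical reference-like newtype: behaves like `&'a T` but carries one
/// extra bit in the low bit of the address. Needs `align_of::<T>() >= 2`.
pub struct TaggedRef<'a, T> {
    // Address of the referent plus 0 or 1; not always a valid `*const T`.
    tagged: *const u8,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> TaggedRef<'a, T> {
    /// Returns `None` when `T`'s alignment leaves no spare low bit.
    pub fn new(r: &'a T, tag: bool) -> Option<Self> {
        if align_of::<T>() < 2 {
            return None;
        }
        // Offsetting the pointer by 0 or 1 byte keeps its provenance.
        let tagged = (r as *const T).cast::<u8>().wrapping_add(tag as usize);
        Some(TaggedRef { tagged, _marker: PhantomData })
    }

    pub fn tag(&self) -> bool {
        (self.tagged as usize) & 1 == 1
    }
}

impl<'a, T> Deref for TaggedRef<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        let tag = (self.tagged as usize) & 1;
        // SAFETY: stripping the tag recovers exactly the `&'a T` given to `new`.
        unsafe { &*self.tagged.wrapping_sub(tag).cast::<T>() }
    }
}
```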

Yes, my point is that (at minimum) that operation would have to become overloadable as well to make this line of thinking work.

While I'd have to see a specific fleshed-out proposal to say anything about its merits, anything I can imagine in this space suffers from an unconvincing reward vs complexity ratio. In addition to what @RalfJung said, there's no clear way to generalize this sort of pointer bit fiddling to library-defined smart pointers such as Box (well, almost library-defined) or Rc even when one can in principle imagine what a "tagged variant" of such a pointer would look like (which isn't true for all smart pointers).

@RalfJung

Moreover, we already have newtypes, so if you want a reference-like thing, just make your own type and implement Deref and DerefMut. Maybe add arbitrary self types for good measure. Why would we want more than this?

I don't think this approach would be very ergonomic with respect to user-defined coercions, e.g., from MyDSTref to MyDSTptr to a raw pointer, plus or minus the mutability axis. Sure, all these types could have methods that produce the other types, but this does not feel as "built in" as normal references and pointers.
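To illustrate the ergonomic difference (MyRef and MyPtr are hypothetical stand-ins for MyDSTref and MyDSTptr): built-in pointers move along the reference to raw pointer and mutability axes through implicit coercions, whereas newtypes need one From impl per edge of the conversion graph and an explicit .into() at every step.

```rust
/// Hypothetical stand-ins for a user-defined reference type and pointer type.
#[derive(Clone, Copy)]
pub struct MyRef<'a>(pub &'a u32);

#[derive(Clone, Copy)]
pub struct MyPtr(pub *const u32);

// Each edge of the conversion graph needs its own impl...
impl<'a> From<MyRef<'a>> for MyPtr {
    fn from(r: MyRef<'a>) -> Self {
        MyPtr(r.0) // built-in `&u32 -> *const u32` coercion inside
    }
}

impl From<MyPtr> for *const u32 {
    fn from(p: MyPtr) -> Self {
        p.0
    }
}

/// Built-in pointers: every step is an implicit coercion.
pub fn builtin_chain(x: &mut u32) -> *const u32 {
    let p: *mut u32 = x; // &mut T -> *mut T
    p // *mut T -> *const T
}

/// Newtypes: ...and each step at the use site is an explicit `.into()`.
pub fn newtype_chain(r: MyRef<'_>) -> *const u32 {
    let p: MyPtr = r.into();
    p.into()
}
```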

Then there is also the issue that @hanna-kruppe mentions:

there’s no clear way to generalize this sort of pointer bit fiddling to library-defined smart pointers such as Box (well, almost library-defined) or Rc even when one can in principle imagine what a “tagged variant” of such a pointer would look like (which isn’t true for all smart pointers).

Whatever solution to this problem is proposed, IMO it has to work with Box, Rc, and others. How? I don't know; maybe a finer-grained separation of concerns would help, but as @hanna-kruppe mentions, things get really complex really quickly and we get less and less value out of them.

I agree with you @RalfJung that this is tangentially off-topic, but I do think that these problems are related and that whatever solution we end up agreeing on for custom DSTs here will determine how this other problem can be solved, if at all (whether it is worth solving, is another issue).
