Pre-RFC: PhantomUnsized and thin pointers

This is very unbaked.

TL;DR: add std::marker::PhantomUnsized, an "unsized ZST" marker type.

This allows for custom thin pointers to data. For example:

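use std::ffi::c_char;
use std::marker::PhantomUnsized; // the proposed type; not in std today
use std::ops::Deref;
use std::{slice, str};

struct CStr {
    raw: c_char,
    unsize: PhantomUnsized,
} // and CStr things

struct Utf8Codepoint {
    raw: u8,
    unsize: PhantomUnsized,
}

impl Deref for Utf8Codepoint {
    type Target = str;
    fn deref(&self) -> &str {
        // `utf8_length` (not shown) decodes the sequence length from the lead byte.
        let len = utf8_length(self.raw);
        unsafe {
            // Reading past `raw` is what PhantomUnsized is meant to permit.
            let slice = slice::from_raw_parts(&self.raw as *const u8, len);
            str::from_utf8_unchecked(slice)
        }
    }
}

impl Utf8Codepoint {
    fn as_char(&self) -> char {
        // A valid codepoint always yields exactly one char.
        self.chars()
            .next()
            // `debug_unreachable!` comes from the debug_unreachable crate.
            .unwrap_or_else(|| unsafe { debug_unreachable!() })
    }
}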
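PhantomUnsized doesn't exist, so the above can't compile. As a rough idea of what `&Utf8Codepoint` would buy you, here is a sketch that emulates the thin pointer in today's Rust with a raw pointer plus `PhantomData`. `ThinCodepoint`, `first_of`, and this body for `utf8_length` are hypothetical names I made up for illustration, not part of the proposal.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::str;

/// Hypothetical stand-in for `&Utf8Codepoint`: a one-word pointer to the lead
/// byte of a UTF-8 sequence. The length is not stored; it is decoded on access.
#[derive(Clone, Copy)]
struct ThinCodepoint<'a> {
    lead: *const u8,
    _borrow: PhantomData<&'a str>,
}

/// Byte length of a UTF-8 sequence, decoded from its lead byte.
fn utf8_length(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        // 0xF0..=0xF4; continuation bytes never sit at a char boundary.
        _ => 4,
    }
}

impl<'a> ThinCodepoint<'a> {
    /// Points at the first codepoint of `s`, or `None` if `s` is empty.
    fn first_of(s: &'a str) -> Option<Self> {
        if s.is_empty() {
            None
        } else {
            Some(ThinCodepoint { lead: s.as_ptr(), _borrow: PhantomData })
        }
    }

    fn as_str(self) -> &'a str {
        unsafe {
            // Sound because `lead` came from a live, non-empty `&'a str` at a
            // char boundary, so the whole sequence is in bounds and valid UTF-8.
            let len = utf8_length(*self.lead);
            let bytes = std::slice::from_raw_parts(self.lead, len);
            str::from_utf8_unchecked(bytes)
        }
    }

    fn as_char(self) -> char {
        self.as_str().chars().next().expect("non-empty by construction")
    }
}

fn main() {
    let cp = ThinCodepoint::first_of("é!").unwrap();
    // One word, unlike `&str`, which carries a length.
    assert_eq!(size_of::<ThinCodepoint>(), size_of::<usize>());
    assert_eq!(cp.as_str(), "é");
    assert_eq!(cp.as_char(), 'é');
}
```

The cost is that every access re-decodes the length, which is the same trade a thin `&CStr` makes by scanning for the NUL.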

How is this different from extern type?

I'm not really certain. Mostly, I see the difference as this: extern type is the void in void* (i.e. "something I know nothing about"), while PhantomUnsized is for things like turning *const [T] into *const (T, PhantomUnsized), which reads more cleanly as "*const T, but unsized". PhantomUnsized is also a smaller change that could (in theory) be pushed through quickly.
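To make the "*const T, but unsized" framing concrete: a slice pointer carries its length as metadata, and casting it to a thin pointer keeps the address but throws the length away, so it has to be recovered some other way. A small sketch of that round trip in today's Rust; `to_thin` and `to_fat` are just illustrative helper names.

```rust
use std::mem::size_of;
use std::ptr;

/// Keep the address, drop the length metadata.
fn to_thin<T>(fat: *const [T]) -> *const T {
    fat.cast::<T>()
}

/// Re-attach a length that had to be tracked some other way.
fn to_fat<T>(thin: *const T, len: usize) -> *const [T] {
    ptr::slice_from_raw_parts(thin, len)
}

fn main() {
    let data = [10u32, 20, 30];
    let fat: *const [u32] = &data[..];
    let thin = to_thin(fat);

    // Fat pointer: address + length. Thin pointer: address only.
    assert_eq!(size_of::<*const [u32]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<*const u32>(), size_of::<usize>());

    // Here we simply remember the length; a thin CStr would rediscover it by
    // scanning for the NUL terminator.
    let back = unsafe { &*to_fat(thin, data.len()) };
    assert_eq!(back, &data[..]);
}
```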

[I currently have a struct Character { raw: str } in windex, but have been considering whether making &Character a thin pointer would be better. Probably not, now that I've written this out, but PhantomUnsized is still an interesting minimal addition.]


I think it would be better to introduce Custom DSTs into the language, because this can only handle the case where there is no metadata (thin pointers). But this may have its uses as a short-term solution.


As far as fat pointers go, I really liked the idea of supporting const generic erasure.

e.g. you have &MatrixSlice<4, 4> for a 4x4 matrix, and &MatrixSlice<dyn, dyn> for a matrix where the dimension data is "hoisted" from static to the fat pointer metadata. (So [T] is [T; dyn].)

Ignoring syntax issues for now (when I saw this suggestion, someone pointed out that e.g. &&[T; dyn] is ambiguous as to which reference should be fat), are there DST use cases that wouldn't be covered by one of these?

I think extern type, PhantomUnsized, and "dyn const" cover the three kinds of custom DST (unknowable, inline/thin, and fat, respectively). And each of these is (in theory) fairly simple compared to full-fledged custom DSTs.
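For the "fat" case, here is a rough emulation in today's Rust of what hoisting the MatrixSlice dimensions would look like. `Matrix` and `MatrixView` are hypothetical names for this sketch only: `&Matrix<2, 3>` keeps its dimensions in the type and is a single word, while `MatrixView` carries them at runtime, the same way `&[T; N]` becomes `&[T]`. A real `&MatrixSlice<dyn, dyn>` would presumably pack (rows, cols) into the pointer metadata rather than into a separate struct.

```rust
use std::mem::size_of;

/// Statically sized matrix: dimensions live in the type, so `&Matrix<R, C>` is thin.
struct Matrix<const R: usize, const C: usize> {
    data: [[f64; C]; R],
}

/// Hypothetical erased view, roughly `&MatrixSlice<dyn, dyn>`: the dimensions
/// are moved out of the type and stored next to the data at runtime.
#[derive(Clone, Copy)]
struct MatrixView<'a> {
    data: &'a [f64],
    rows: usize,
    cols: usize,
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    /// Analogous to the `&[T; N]` -> `&[T]` unsizing coercion.
    fn erase(&self) -> MatrixView<'_> {
        let flat = unsafe {
            // `[[f64; C]; R]` is laid out as R * C contiguous f64s.
            std::slice::from_raw_parts(self.data.as_ptr().cast::<f64>(), R * C)
        };
        MatrixView { data: flat, rows: R, cols: C }
    }
}

impl<'a> MatrixView<'a> {
    fn get(&self, r: usize, c: usize) -> f64 {
        assert!(r < self.rows && c < self.cols, "index out of bounds");
        self.data[r * self.cols + c]
    }
}

fn main() {
    let m = Matrix::<2, 3> { data: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] };
    // Static dimensions: the reference is a single word.
    assert_eq!(size_of::<&Matrix<2, 3>>(), size_of::<usize>());

    let v = m.erase();
    assert_eq!((v.rows, v.cols), (2, 3));
    assert_eq!(v.get(1, 2), 6.0);
}
```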


I wouldn't call dyn const simple; its semantics can be confusing and rather arbitrary. I would not like to see it even if we never get Custom DSTs in any other form.


So this type is unsized, but it also has size "at least 1" (similar to e.g. (c_char, [c_char])). Does that mean that, if we apply our rules for references being dereferenceable etc., an &CStr must point to at least 1 byte of valid memory? That the compiler is allowed to insert spurious loads of that byte? That it is UB to mutate that byte because it is pointed to by a shared reference?
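The "at least 1" point can be checked on today's closest analogue. Unsizing coercion for tuples is unstable, so this sketch uses a struct stand-in for (c_char, [c_char]); the name `HeadTail` and the helper `pointee_size` are made up for illustration. `size_of_val` reports 1 byte even for an empty tail, so the smallest pointee still spans one byte that a reference would have to cover.

```rust
use std::ffi::c_char;
use std::mem::size_of_val;

/// Struct stand-in for `(c_char, [c_char])`: a sized head followed by an
/// unsized tail. (Unsizing coercion for tuples is unstable; for structs it isn't.)
struct HeadTail<T: ?Sized> {
    head: c_char,
    tail: T,
}

/// Bytes covered by the pointee: the head plus however long the tail is.
fn pointee_size(p: &HeadTail<[c_char]>) -> usize {
    size_of_val(p)
}

fn main() {
    let empty: HeadTail<[c_char; 0]> = HeadTail { head: 0, tail: [] };
    let three: HeadTail<[c_char; 3]> = HeadTail { head: 1, tail: [2, 3, 4] };

    // Both coerce to the unsized `HeadTail<[c_char]>` at the call site.
    assert_eq!(pointee_size(&empty), 1); // never less than the head byte
    assert_eq!(pointee_size(&three), 4);

    let dst: &HeadTail<[c_char]> = &three;
    assert_eq!((dst.head, dst.tail.len()), (1, 3));
}
```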

I think all of that is the right semantics. (It's looser than the current fat pointer CStr as well.)

And now that I'm thinking about it, ([T; 0], PhantomUnsized) would be an interesting translation for VLAs. It basically says the same thing as the VLA "trick" in C (a trailing flexible array member): align this to T, and the struct may contain data after its "main" sized part that you have to handle unsafely.

For the most part, I think the obvious semantics of "make this ?Sized, use unsafe to track whatever data is in the unsized portion" work correctly, which is why I think this is a fairly minimal addition.
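For the VLA case, the closest thing today is the C flexible-array-member layout: a zero-length `[T; 0]` field plus manual allocation. This sketch (the names `Header` and `Vla` are hypothetical) shows the unsafe bookkeeping that the "use unsafe to track the unsized portion" approach still implies. As I read the proposal, what `([T; 0], PhantomUnsized)` would add is the ability to hand out a `&Header` that is allowed to cover the trailing elements, instead of going through raw pointers as below.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::{self, NonNull};

/// C-style header with a flexible array member. The `[u32; 0]` field adds no
/// size, but forces `u32` alignment and marks where the trailing elements start.
#[repr(C)]
struct Header {
    len: usize,
    data: [u32; 0],
}

/// Owning handle to a heap-allocated `Header` followed by `len` u32s.
struct Vla {
    ptr: NonNull<Header>,
}

impl Vla {
    fn layout(len: usize) -> Layout {
        let (layout, _offset) = Layout::new::<Header>()
            .extend(Layout::array::<u32>(len).unwrap())
            .unwrap();
        layout.pad_to_align()
    }

    fn from_slice(items: &[u32]) -> Self {
        let layout = Self::layout(items.len());
        unsafe {
            let raw = alloc(layout) as *mut Header;
            let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
            ptr::addr_of_mut!((*raw).len).write(items.len());
            // The trailing elements start where the zero-length array sits.
            let tail = ptr::addr_of_mut!((*raw).data) as *mut u32;
            ptr::copy_nonoverlapping(items.as_ptr(), tail, items.len());
            Vla { ptr }
        }
    }

    fn as_slice(&self) -> &[u32] {
        unsafe {
            let raw = self.ptr.as_ptr();
            let tail = ptr::addr_of!((*raw).data) as *const u32;
            std::slice::from_raw_parts(tail, (*raw).len)
        }
    }
}

impl Drop for Vla {
    fn drop(&mut self) {
        unsafe {
            let len = self.ptr.as_ref().len;
            dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(len));
        }
    }
}

fn main() {
    let v = Vla::from_slice(&[1, 2, 3]);
    assert_eq!(v.as_slice(), &[1, 2, 3]);
    // The header itself is just the length; the u32s live past its end.
    assert_eq!(std::mem::size_of::<Header>(), std::mem::size_of::<usize>());
}
```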