[Pre-RFC] Yet another DST proposal

It'll be good I promise :wink:

Currently the most active DST proposal seems to be #2594. After reading it, my impression is that it mainly tries to address two use cases:

  1. DSTs like C strings or C's flexible array members
  2. Custom pointers/references for things like BitVec, Ndarray, etc.

Personally, I believe the second use case can be solved with less restrictive Deref, Index and IndexMut traits (for example #2953).

This Pre-RFC addresses the first use case.

DSTs with slim pointers

The standard library exposes a new trait, which is auto-implemented for sized types:

pub unsafe trait KnownSize {
    fn get_size(&self) -> usize;
}

Implementing this trait for T: ?Sized has the following consequences:

  • Pointers and references to T are now one word wide
  • Where previously the size field of the fat pointer would be inspected, get_size is called

That's it. Here is an implementation of CStr:

use std::os::raw::c_char;

// Needed for unsafe code
#[repr(transparent)]
// Syntax is the same as for regular unsized structs
struct CStr([c_char]);

unsafe impl KnownSize for CStr {
    fn get_size(&self) -> usize {
        // Scans for the first NUL character; the count includes the terminator itself.
        // As with implementing Drop, we have to be careful not to cause infinite recursion.
        let mut ptr = self as *const Self as *const c_char;
        let mut count = 1usize;
        unsafe {
            while *ptr != 0 {
                count += 1;
                ptr = ptr.add(1);
            }
        }
        count
    }
}
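
For readers who want to try the idea on today's compiler, here is a minimal sketch. It is not the proposed language feature: KnownSize is declared locally as a stand-in, and ThinCStrRef is a hypothetical one-word struct playing the role a slim &CStr would play. It only shows that the length can be recovered by scanning instead of being stored next to the pointer.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::os::raw::c_char;
use std::ptr::NonNull;

/// Hypothetical stand-in for the proposed trait; not part of std.
pub unsafe trait KnownSize {
    fn get_size(&self) -> usize;
}

/// Hypothetical stand-in for a slim `&CStr`: one pointer, no stored length.
#[derive(Clone, Copy)]
pub struct ThinCStrRef<'a> {
    ptr: NonNull<c_char>,
    _marker: PhantomData<&'a [u8]>,
}

impl<'a> ThinCStrRef<'a> {
    /// Returns `None` unless `bytes` contains a NUL terminator.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if !bytes.contains(&0) {
            return None;
        }
        let ptr = NonNull::new(bytes.as_ptr() as *mut c_char)?;
        Some(ThinCStrRef { ptr, _marker: PhantomData })
    }
}

unsafe impl<'a> KnownSize for ThinCStrRef<'a> {
    fn get_size(&self) -> usize {
        let mut ptr = self.ptr.as_ptr() as *const c_char;
        let mut count = 1usize; // counts the terminator itself
        // SAFETY: `new` checked that a NUL lies inside the borrowed slice,
        // so the scan stops before leaving it.
        unsafe {
            while *ptr != 0 {
                count += 1;
                ptr = ptr.add(1);
            }
        }
        count
    }
}

/// Returns the scanned size and whether the handle is exactly one word wide.
pub fn demo() -> (usize, bool) {
    let r = ThinCStrRef::new(b"hello\0ignored").expect("input has a NUL");
    (r.get_size(), size_of::<ThinCStrRef<'static>>() == size_of::<usize>())
}
```

The trade-off is visible here: the handle shrinks to one word, but every size query costs a scan.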

The standard library can provide the following struct to make custom DSTs 100% safe:

struct SizeWrapper<T: ?Sized> {
    size: usize,
    contents: T,
}

unsafe impl<T: ?Sized> KnownSize for SizeWrapper<T> {
    fn get_size(&self) -> usize {
        self.size
    }
}

use std::ops::Index;

// SizeWrapper can implement this
impl<Idx, T: ?Sized + Index<Idx>> Index<Idx> for SizeWrapper<T> {
    type Output = T::Output;

    fn index(&self, index: Idx) -> &T::Output {
        // This implicitly calls get_size because it takes a reference to a ?Sized + !KnownSize field
        self.contents.index(index)
    }
}
// The same for IndexMut and Deref
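
As a rough check, the wrapper above can already be compiled on today's Rust, with the caveat that a &SizeWrapper<[u8]> is still a fat reference there, so the size field is redundant rather than load-bearing. In this sketch KnownSize is again declared locally as a stand-in, demo is a hypothetical helper, and size is assumed to hold the element count.

```rust
use std::ops::Index;

/// Hypothetical stand-in for the proposed trait; not part of std.
pub unsafe trait KnownSize {
    fn get_size(&self) -> usize;
}

pub struct SizeWrapper<T: ?Sized> {
    size: usize,
    contents: T,
}

unsafe impl<T: ?Sized> KnownSize for SizeWrapper<T> {
    fn get_size(&self) -> usize {
        self.size
    }
}

impl<Idx, T: ?Sized + Index<Idx>> Index<Idx> for SizeWrapper<T> {
    type Output = T::Output;

    fn index(&self, index: Idx) -> &T::Output {
        // On today's Rust this borrow uses the fat-pointer metadata,
        // not get_size as the proposal would.
        self.contents.index(index)
    }
}

/// Builds a sized wrapper, unsizes it, and reads through the unsized view.
pub fn demo() -> (usize, u8) {
    let sized = SizeWrapper { size: 3, contents: [10u8, 20, 30] };
    // Unsizing coercion: &SizeWrapper<[u8; 3]> to &SizeWrapper<[u8]>.
    let unsized_ref: &SizeWrapper<[u8]> = &sized;
    (unsized_ref.get_size(), unsized_ref[1usize])
}
```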

User code can then just do:

struct MyDST(SizeWrapper<str>);
// MyDST is ?Sized and derives KnownSize

impl MyDST {
    fn foo(&self) {
        println!("MyDST has a message for you: {}", self.0); 
    }
}

But it's still impossible to create and use DSTs ergonomically using safe code. For this I propose the following:

KnownSize ergonomics

Types that are ?Sized + KnownSize can be moved, but you can't assign them without binding. What this means is that:

let foo = MyDST::from_str("Hello"); // Ok
let mut foo = foo.push_str(" World"); // Also ok
let r = &mut foo; // Still ok
*r = MyDST::new("This string can't fit"); // Error: cannot assign without binding
foo = MyDST::new("Neither can this"); // Error: cannot assign without binding

The allowed moves must happen either in function calls (arguments and return values) or when binding values. In both cases the nature of the stack allows them to be unsized, just like in C. Except in async functions ... where the returned future would have to be ?Sized + KnownSize if the state machine ever contains a ?Sized value ... sigh.

Finally, what about dyn Foo? Here's what:

  • Every dyn Foo implements KnownSize (so it's really dyn Foo + KnownSize). According to my understanding, this is the case today, as Box for example has to know its size.
  • If T: KnownSize + Foo, then it's possible to get a &dyn Foo to a T
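
The claim that a dyn Foo already knows its size can be checked on today's Rust with std::mem::size_of_val, which reads the size from the vtable. A small sketch follows; the trait Foo and the helper names are placeholders for illustration.

```rust
use std::mem::{size_of, size_of_val};

/// Placeholder trait standing in for the thread's `Foo`.
pub trait Foo {}

impl Foo for u8 {}
impl Foo for [u64; 4] {}

/// The concrete type is erased here; the size comes from the vtable.
pub fn erased_size(value: &dyn Foo) -> usize {
    size_of_val(value)
}

/// Returns the sizes of two erased values and the width of `&dyn Foo` in words.
pub fn demo() -> (usize, usize, usize) {
    let small = 7u8;
    let large = [0u64; 4];
    (
        erased_size(&small),
        erased_size(&large),
        size_of::<&dyn Foo>() / size_of::<usize>(),
    )
}
```

Note that the reference itself stays two words wide today (data pointer plus vtable pointer); only the size lookup goes through the vtable.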

Unless I've overlooked something, this should be backwards compatible, since KnownSize is a weaker requirement than Sized and the types implementing it are a subset of ?Sized. The only breaking changes are Nomicon-related ones, such as unsafe code assuming that a pointer or reference to a ?Sized type must be two words wide.

Pros:

  • IMO simpler (to implement?) than the current proposals
  • Allows usable DSTs

Cons:

  • This is really a subset of #2594
  • Cannot work with async fns without compromises

I would challenge your assertion that custom pointers/references for things like BitVec and Ndarray can be solved with smarter index traits. I wrote #2594 with that in the back of my mind and thought so myself along the way, but the fact of the matter is that Rust structs approximating references are horribly verbose and difficult to use correctly, due to the lack of mutable aliasing and to reborrow pains.

A very early DST proposal explains this well. One of the big points is that IndexGet, as it currently stands, does not require &mut. IMO, adding a mutable counterpart would substantially increase the complexity of the API as well as dilute the purpose of IndexGet.

I still can't see why custom pointers/references with a lot of sugar can't work. For example the whole "Re-borrow semantics" section of your link can be solved by having an implicit (re)borrow in some places.

use std::marker::PhantomData;

struct ColMut<'a, T> {
    data: *mut T,
    len: usize,
    stride: usize,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T> ColMut<'a, T> {
    fn reborrow_mut<'b>(&'b mut self) -> ColMut<'b, T> {
        todo!()
    }

    fn split_at_mut(self, at: usize) -> (ColMut<'a, T>, ColMut<'a, T>) {
        todo!()
    }
}

fn use_col<'a>(col: ColMut<'a, f32>) {
    todo!()
}

fn example<'a>(col: ColMut<'a, f32>) -> ColMut<'a, f32> {
    let (left, right) = col.split_at_mut(34);
    left
}

fn example_2<'a>(mut col: ColMut<'a, f32>) {
    let (left, right) = col.reborrow_mut().split_at_mut(34);
    use_col(left);
    use_col(right);
    let mut_alias = col.reborrow_mut(); //Or just col
    use_col(mut_alias);
}

fn example_3<'a>(mut col: ColMut<'a, f32>) {
    let (left, right) = col.reborrow_mut().split_at_mut(34);
    use_col(right);
    let mut_alias = col.reborrow_mut(); //Or just col
    //This would error
    //use_col(left);
}
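
To make the examples above executable, the todo!() bodies can be filled in roughly as follows. This is only a sketch under assumptions of mine: the column is taken from a row-major buffer, and from_row_major, get_mut and demo are hypothetical helpers that the original snippet does not have.

```rust
use std::marker::PhantomData;

/// Mutable view of one strided column; behaves like `&'a mut` over its elements.
pub struct ColMut<'a, T> {
    data: *mut T,
    len: usize,
    stride: usize,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T> ColMut<'a, T> {
    /// Hypothetical constructor: column `col` of a row-major buffer with `ncols` columns.
    pub fn from_row_major(buf: &'a mut [T], col: usize, ncols: usize) -> Self {
        assert!(ncols > 0 && col < ncols && buf.len() % ncols == 0);
        ColMut {
            // wrapping_add: the pointer is only dereferenced after a bounds check.
            data: buf.as_mut_ptr().wrapping_add(col),
            len: buf.len() / ncols,
            stride: ncols,
            _marker: PhantomData,
        }
    }

    /// Explicit reborrow: `self` stays frozen while the shorter-lived view is alive.
    pub fn reborrow_mut<'b>(&'b mut self) -> ColMut<'b, T> {
        ColMut { data: self.data, len: self.len, stride: self.stride, _marker: PhantomData }
    }

    /// Consumes the view and returns two views over disjoint elements.
    pub fn split_at_mut(self, at: usize) -> (ColMut<'a, T>, ColMut<'a, T>) {
        assert!(at <= self.len);
        let left = ColMut { data: self.data, len: at, stride: self.stride, _marker: PhantomData };
        let right = ColMut {
            data: self.data.wrapping_add(at * self.stride),
            len: self.len - at,
            stride: self.stride,
            _marker: PhantomData,
        };
        (left, right)
    }

    /// Hypothetical accessor for element `i` of the column.
    pub fn get_mut(&mut self, i: usize) -> &mut T {
        assert!(i < self.len);
        // SAFETY: i < len keeps the access inside the borrowed buffer, and the
        // borrow checker ensures no other live view overlaps this element.
        unsafe { &mut *self.data.add(i * self.stride) }
    }
}

/// Writes through split halves, then through the original view again.
pub fn demo() -> [i32; 6] {
    let mut m = [1, 2, 3, 4, 5, 6]; // 3 rows x 2 columns
    let mut col = ColMut::from_row_major(&mut m, 1, 2); // views 2, 4, 6
    {
        let (mut top, mut bottom) = col.reborrow_mut().split_at_mut(1);
        *top.get_mut(0) = 20;
        *bottom.get_mut(1) = 60;
    }
    *col.get_mut(1) = 40; // usable again once the reborrow has ended
    m
}
```

The explicit reborrow_mut() call is the part that the implicit reborrow sugar would remove.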

For me, the ability to have reference-like semantics for any type is more powerful than having a reference-like implementation. For example, it would be possible to have all of the following:

// Have you ever sliced a range before?
let cool = (1..=10)[1..3]; //Returns (2..4)
// Or dereferenced it?
let tres = cool[1]; //Returns 3
// All without a pointer to speak of

const VIPs : [&'static str] = ["Joe", "John", "Jonathan"];
// Only three VIPs. Wouldn't it be nice to have a reference type that is only one byte large?

let nd = NdArray::new(); // This can have any number of dimensions
let slice = nd.slice(...); // This slice requires data about an arbitrary number of dimensions...
// So it can be ?Sized + KnownSize
// Or it can contain a Box. That's also fine

The last feature is something that might actually be useful. It's true that implementing custom references this way is more time consuming, but I think it's mostly less restrictive in what can be implemented and also gives the compiler more freedom to work with references and pointers.
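
The range-slicing idea can be approximated today with an ordinary trait instead of [] syntax. The sketch below borrows the name IndexGet from the discussion, but its signature is my own assumption: it returns the result by value, which std's Index cannot do.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical by-value indexing trait; the signature is assumed, not an existing API.
pub trait IndexGet<Idx> {
    type Output;
    fn index_get(&self, index: Idx) -> Self::Output;
}

impl IndexGet<Range<usize>> for RangeInclusive<i32> {
    type Output = Range<i32>;

    fn index_get(&self, index: Range<usize>) -> Range<i32> {
        // Number of elements in the inclusive range, computed without iterating.
        let len: u64 = if self.is_empty() {
            0
        } else {
            (*self.end() as i64 - *self.start() as i64 + 1) as u64
        };
        assert!(index.start <= index.end && index.end as u64 <= len, "slice out of bounds");
        let base = *self.start() as i64;
        let lo = i32::try_from(base + index.start as i64).expect("start fits in i32");
        let hi = i32::try_from(base + index.end as i64).expect("end fits in i32");
        lo..hi
    }
}

impl IndexGet<usize> for Range<i32> {
    type Output = i32;

    fn index_get(&self, index: usize) -> i32 {
        assert!(index < self.len(), "index out of bounds");
        self.start + index as i32
    }
}

/// Mirrors the `cool` and `tres` example using method calls.
pub fn demo() -> (Range<i32>, i32) {
    let source: RangeInclusive<i32> = 1..=10;
    let cool = source.index_get(1usize..3usize);
    let tres = cool.index_get(1usize);
    (cool, tres)
}
```

No pointer is involved at any point: both results are computed values, which is the property the [] examples rely on.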