`DerefMove` as two separate traits

tema2 · January 26, 2022, 6:57pm

Alternatively, we split DerefMove into two traits:

/// trait for moving out of container
trait DerefGet: Deref {
   /// This return layout info of data stored by container
   fn layout(&self) -> Layout;
   /// This returns an actual data pointer. Also, `Unique` from `core::ptr` may be used here.
   fn deref_get(&self) -> *const <Self as Deref>::Target;
   /// This is intended to dispose **empty** container
   unsafe fn drop(*mut self);
}

///trait for moving in a container
trait DerefSet: DerefMut {
   /// prepare the memory region
   fn prepare(&mut self,layout: Layout);
   /// move the value in. Here I would like to use `Unique` too.
   fn deref_set(&mut self,*mut <Self as Deref>::Target);
}

And the interesting part:

...
let data: Box<[u64]> = ...;
let mut data_moved = Box::new();
*data_moved = *data; // tada! [1]
//let data_moved = box *data //this woun't work as of there is no thing which we `DerefSet` yet. [2]
...

[1] What is actually happening is:

dereference operator calls layout, gets it, and calls recievers prepare with the desired layout. container itself decides how to manage its memory, for example, Box has two options: reallocate and reallocate_in_place... ;
after that we get the pointer from data by deref_get and pass in to deref_set of data_moved;
the final thing to do is to dispose data, but because of one of containers fields has been moved out, for soundness it needs a different drop path, which is unsafe fn drop(*mut self) from DerefGet is.

[2] If we go that far, we face the need of Placer trait and placement features, exactly as it used to be in past...

The problem of fallible allocation (can be probably be found when DerefSetting smth. like [1;10*1024*1024*1024]) is still there, however:

The vast majority of use cases deal with Sized data => Layout returned from DerefGet::layout is going to be a const.
Also, I don't see any adequate way to support fallible allocation without some crazy syntax and some kind of opinionated signaling involved; given the previous bullet I don't think that in such case we need to do anything else than a panic or handle_alloc_error.

A Sized data example

let mut b: Box<[u8;10]> = ...;
*b = [10;10]; //Here we used `DerefSet`, layout passing is optimized out.

The worst about this is that we don't have fine grained error handling in case of DerefSet allocation fails.

SkiFire13 · January 26, 2022, 8:12pm

First and foremost, what's the goal here? What's the problem you're trying to solve?
- I thought DerefMove's goal was to allow (partially) moving out of a Dereferencable container, but this has nothing to do with unsized types I see here, nor with partial moves.
DerefGet needs to be unsafe (the layout might be wrong, deref_get might not return a valid pointer)
deref_set needs to be unsafe (the caller might not provide a valid pointer)
Can't you get the layout from the raw pointers since they need to be valid anyway?
How is the snippet desugared to use these traits?
Possible allocation when dereferencing is a deal breaker for me.

tema2 · January 26, 2022, 8:43pm

That's the serious problem; any thoughts? This is designed to allow moving around unsized data, and currently, this cannot be opted out. Do we a need a way for this?

These are part of Deref trait family and, i guess, are allowed to have pretty non-trivial contract:

all Layout s that go in and out both traits must be valid.
pointers are valid; Moreover, these pointers own what they refer to, we just don't have &move T references. Maybe mentioned Unique<T> would be better?

becomes:

let data: Box<[u64]> = ...;
let mut data_moved = Box::new();
{
   let mut data = data; //consume data;
   let layout = DerefGet::layout(&data);
   DerefSet::prepare(&mut data_moved,layout); //here is where allocation error might occur
   DerefSet::deref_set(&mut data_moved,DerefGet::deref_get(&data)); //and it's not going to work - we need to change the signature of `deref_get` to reflect ownership transfer
   unsafe { DerefGet::drop(&mut data); /*call the drop glue, and no `Drop::drop` */ }
}

The motivation is to allow Box be non magical struct, improve other owning container convenience, allow them to hande unsized data.

Edit: if data contains Sized data, the desugaring is even simpler: As of T: Sized then its layout is known from both pointers and references, actual Layout that must be returned DerefGet::layout is known at compile time.

...
{
   let mut data = data;
   DerefSet::deref_set(&mut data_moved, DerefGet(&data));
   unsafe {DerefGet::drop(&mut data); /* call drop glue*/ }
}
...

This avoids risk of allocation, if it is unnecessay.

tema2 · January 26, 2022, 10:12pm

I'm not happy with deref_set being unsafe because it's called in language level desugaring...

As with DerefGet - yes, it is better for it to be unsafe.

More cases to note:

sized data & dynamically sized place:

let data = Box::new([0u64;16]);
let mut data_moved: Box<[u64]> = Box::new();
{
   let mut data = data;
   unsafe {
      DerefSet::prepare(&mut data_move, Layout::for_value(&*data));
      DerefSet::deref_set(&mut *data);
      DerefGet::drop(&mut data; /*drop glue call*/)
   }
}

Best about this is that implementor type itself decides, what to do if it is requested to store less or more bytes than it has.

unsized data & sized place

I want to forbid it, due to risk of not fitting data into place.

unsized data & the stack

In this case we use unsized_locals feature.

sized data & the stack

let data = Box::new([]);
{
   let mut data = data;
   unsafe {
      let data_ = core::ptr::read(DerefGet::deref_get(&data));
      DerefGet::drop(&mut data);
      data_
   }
}

chrefr · January 27, 2022, 1:25am

This is not special; e.g. Pin::new_unchecked() is used in the desugaring of async.

chrefr · January 27, 2022, 1:28am

While I definitely support making Box less magical, it doesn't allow (AFAIK) moving out unsized values, so a simple trait DerefMove: Deref where Self::Output: Sized { fn deref_move(self) -> Self::Output; } is enough.

And even if we want to allow unsized values, something like 1909-unsized-rvalues - The Rust RFC Book seems more appropriate.

spunit262 · January 27, 2022, 9:06am

It does and it's visible on stable with Box<dyn FnOnce>, though is important to note that it's only logically moved, not physically moved. Physically moving a DST is currently only possible with the currently unsound unsized_locals feature.

This is incompatible with the logical move that Box<dyn FnOnce> currently does as FnOnces need to modify themselves. &mut -> *mut is the minimum, but as the container will be in a fragile state after the move it should be *mut -> *mut.

This should take a Layout to allow the possible future addition of thin DST as they will need to read themselves to get their layout, but will be in a invalid state when this is called.

tema2 · January 27, 2022, 9:49am

The container will be dropped right after move out, see desugarings. Of course, it would be nice if we would be able to reuse the container, reinitialize it somehow.

Where can I read about thin DSTs?

SkiFire13 · January 27, 2022, 9:56am

I think this should not be the default behaviour. A good behaving *data_moved = *data should never allocate.

Moreover I don't see the problem this is solving. What's the problem with data_moved = data or data_moved = std::mem::take(&mut data)? You're just introducing an additional, useless, allocation.

Being a part of the Deref family doesn't magically introduce any contract. That's done through function signatures and unsafe annotations.

This isn't currently addressed, at least for the most part. Box is special because dereferencing it creates a place that you're allowed to move out from. This is different from just moving the T out of it, which can be done with a simple fn into_inner(self) -> T function, for example it allows you to:

move out of some fields, but not others, leaving the T in a partially initialized state (just like it's possible for stack allocated stuff);
reinitialize those fields, without having to allocate new Boxes or moving the fields that weren't previously moved;
dropping the other fields in place, without moving them to the stack.

Moreover you can create mutable references to different fields without saving the result of deref_mut in some variable first. See this playground for example

So, how are these snippet desugared under your proposal? How can they keep their current semantics and performance?

struct Foo {
    s1: String,
    s2: String,
}

let mut b = Box::new(Foo {
    s1: String::from("s1"),
    s2: String::from("s2"), 
});

let _s2 = b.s2; // consume only one field

// Here `b` is partially initialized. Note that the `Box` is not deallocated!

b.s2 = String::from("s3"); // You can also put stuff back!

let s1 = &mut b.s1;
let s2 = &mut b.s2;
consume((s1, s2));

consume(b); // and now you can use `b` again because it's fully initialized again.

struct Unsized {
    s: String,
    rest: [u8],
}

let mut b: Box<Unsized> = todo!();
let _s = b.s; // only moves the sized field to the stack!

What are the current problems with them? And how are they solved by your proposal?

So how do you handle an user calling DerefSet::deref_set(&mut some_box, std::ptr::null_mut())?

tema2:

let data = Box::new([0u64;16]);
let mut data_moved: Box<[u64]> = Box::new();
{
   let mut data = data;
   unsafe {
      DerefSet::prepare(&mut data_move, Layout::for_value(&*data));
      DerefSet::deref_set(&mut *data);
      DerefGet::drop(&mut data; /*drop glue call*/)
   }
}

How is this any better than data_moved = data? It even does an additional allocation! What problem is this even solving?

That's an incomplete feature that's nowhere near being done due to limitations in LLVM and other problems. Your feature should likely not depend on it.

tema2 · January 27, 2022, 12:56pm

Recently, I was writing an application where I needed a List<dyn Any>: I coudn't make it to work because of the list contents was unsized. Now I want to add a mechanism to allow underlying Node type to be moved out.

Indeed, now I see your point. What do you think about safe trait with all methods being unsafe? What would you expect from such a thing?

It works like I described only in case of moving sized value into unsized place, if both the value and the place are Sized, then its no different from what you are comparing it to.

That's really an argument against such split. I haven't come up with any scheme for splitting the work in code from playground. The best I could mind of is just making DerefGet::deref_get to produce a place... But then it's old DerefMove, for Sized types... It seems that what you have requested will require a lot ground work.

I think, moving out with DerefGet onto stack should be part of unsized_locals feature: it trivially uses this feature, but involves creation of an unsized local. E.g. this ability should be available only with both features active.

mathstuf · January 27, 2022, 1:00pm

I think they need to be unsafe because what is the state of things after this?

let layout = DerefGet::layout(&data);
target.prepare(layout);
// `deref_set` is never called.

What is:

the state of data?
the state of target?
the state of whatever was in target before?

So even if there were types to do things like "guarantee this pointer was returned from DerefGet or the like", I think the functions need to be unsafe. What if these impls panic!? Will the Drop impls be able to handle that case?

SkiFire13 · January 27, 2022, 1:25pm

That seems different than Box's magic, and I don't see what's blocking you from implementing yourself, other than not being able to use the *foo syntax.

unsafe on a method means that the caller needs to prove some invariants, but not the method;
unsafe on a trait means that the implementer needs to prove some invariants, including on the methods, but not the caller.

So DerefSet::deref_set needs to be unsafe because the caller needs to provide a valid pointer, and the DerefGet trait needs to be marked unsafe because the implementation must return a valid pointer in DerefGet::deref_get.

Then as long as data contains a Sized type then data_moved = data is better than what you're describing.

There's a reason Box is still special after all...

tema2 · January 27, 2022, 1:33pm

Not changed, just got layout;
Up to the target: most probably it should make its underlying memory be able to hold the value of data; so, realloc(_in_place), dealloc\alloc pair, erasing old value (with drop) - these should be fine.
Up to the target: forget, drop_in_place, send it somewhere, etc.

In case of panic after (of inside of) the prepare method, nothing happens - move itself hasn't been performed. The boundaries I imagine are following:

DerefGet::deref_get is assumed to deinitialize the data, so no drop shall be called.
DerefSet::deref_set is assumed to (re)initialize its target by its end, so also nothing special.
The case of deref_set panics itself is harder: probably another drop method in the trait, aka drop_empty...

Lossy boundaries for library authors, but good ones for actual consumers

tema2 · January 27, 2022, 1:45pm

Now, it appears to me that I have chosen wrong separation of concepts - instead of DerefGet and DerefSet there may be DerefMove and DerefUnsized:

trait DerefMove: Deref {
   fn deref_move(self: Unique<Self>) -> Unique<<Self as Deref>::Target>; //`Unique` is from `libcore`
   unsafe fn drop_empty(self: Unique<Self>);
}
unsafe trait DerefUnsized: DerefMove {
  fn prepare(&mut self,layout: Layout);
}

Now, we can opt out from working with unsized values, share more methods found out to be required during discussion, and have situation with unsafe clear.

Edit: DerefUnsize::get_layout got removed as of this

mathstuf · January 27, 2022, 2:06pm

So…a type implementing this has to stash the Layout somewhere then. Do I need to have an Option<Layout> in my type to track whether prepare has been called or not? If that's all it does, why is it its own method instead of "also pass Layout to what needs it"?

tema2 · January 27, 2022, 2:20pm

Imagine you store an unsized slice, to move it somewhere, you need to first notify that place, that it's expected to have certain ammount of memory cleaned.

I haven't picked this way because I wanted to have "failed to prepare => no move occurs", as the API currently stands it's impossible to handle preparation error, but I wanted a room for this.

In bad cases, when we cannot recompute layout each time we use DerefGet, yes; However the implementation for slices (and trait objects?) might be cheap enough.

SkiFire13 · January 27, 2022, 2:40pm

I still don't understand what's this problem for Layout. As I already said:

Can't you get the layout from the raw pointers since they need to be valid anyway?

A valid raw pointer allow you create a reference and use it to call Layout::for_value. Not to mention that there's an unstable Layout::for_value_raw

tema2 · January 27, 2022, 2:59pm

Thanks, I took a look at these functions in past, but thought that there may be surprising cases when these may produce wrong layout, so I, influenced by old placement feature, decided to write my own layout exchanging. Fixed the newer API.

system · April 27, 2022, 2:59pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
DerefMove without `&move` refs language design	6	1115	January 24, 2023
Place implicits for moves language design	5	502	February 8, 2024
DerefMove without &move -- why don't we have it? language design	7	782	January 11, 2024
DerefMove without move references aka `Box` magic for user types language design	5	702	March 10, 2024
`DerefMove` vs `&move` Unsafe Code Guidelines	5	2771	March 25, 2019