Location-stable into_iter()?


#1

There are some kernel interfaces (for example, aio(7)) that require certain structures to reside at the same memory location for their entire lifetimes. Unfortunately, collections of these structures can’t be used with into_iter, because it moves elements out of the collection. iter and iter_mut work, but they don’t consume the collection. Is there any way to transform a collection into an iterator that preserves the location of its elements? I suppose I could Box each element, but that would entail a substantial overhead.

Here’s a snippet that demonstrates the problem: https://play.rust-lang.org/?gist=63d65c6f0a4876e44166ca8559d03986&version=stable
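For readers who cannot open the playground link, here is a minimal sketch of the same kind of problem. It is not the linked snippet: `Cb` is a hypothetical stand-in for `aiocb`, and both helper functions are invented for illustration. It records the address of each element as seen through `iter_mut` and through `into_iter`.

```rust
/// Hypothetical stand-in for a kernel structure such as `aiocb`.
pub struct Cb {
    pub id: u32,
}

/// Records the address of every element as seen through `iter_mut`.
/// The elements stay inside the Vec's heap buffer the whole time.
pub fn addresses_via_iter_mut(cbs: &mut Vec<Cb>) -> Vec<usize> {
    cbs.iter_mut().map(|cb| cb as *mut Cb as usize).collect()
}

/// Records the address of every element after `into_iter` has moved it
/// out of the Vec and into the closure's parameter.
pub fn addresses_via_into_iter(cbs: Vec<Cb>) -> Vec<usize> {
    cbs.into_iter()
        .map(|cb| {
            // `cb` is now a local value, no longer a slot in the Vec's buffer.
            &cb as *const Cb as usize
        })
        .collect()
}
```

The addresses from `iter_mut` match the Vec's buffer on every call. Those from `into_iter` are whatever temporary location the compiler chose for the moved value, which will typically be somewhere else.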


#2

I am not sure you want a move here. By definition, a move may relocate the data (more precisely, memcpy it to a new address). Can you explain a bit more why iter() and iter_mut() would not work for your use case?


#3

I’ve run into this pattern several times. Basically, I’ve got a struct that contains a Vec of struct aiocb (or some other struct that can’t be moved). I want to consume the outer struct, transforming the inner struct aiocbs in the process. Because I don’t know what the caller is going to do with the results, I want to return an Iterator, not a collection. But I can’t return an Iterator because into_iter moves the elements. So I’m stuck with using iter or iter_mut with collect to transform the Vec into a collection of whatever the output type is. It works, but it uses more memory than simply returning an Iterator.


#4

For the purpose of transferring ownership of the aiocbs (which is what I understand you want to do here), returning a collection might be more appropriate than returning an iterator. After all, the user can just iterate on the resulting collection. Are you very keen on this part of the design?

If not, I would just turn the Vec into a boxed slice (to prevent the user from inserting/removing elements or causing any kind of reallocation and associated data reshuffling) and return that to the client.
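A minimal sketch of the boxed-slice idea, assuming a hypothetical `Cb` type in place of `aiocb` and an invented method name `into_cbs`:

```rust
/// Hypothetical stand-in for `aiocb`.
pub struct Cb {
    pub id: u32,
}

/// Hypothetical owner of the control blocks.
pub struct MyStruct {
    pub cbs: Vec<Cb>,
}

impl MyStruct {
    /// Consumes `self` and hands the control blocks back as a boxed slice.
    /// A `Box<[Cb]>` has no `push`, `insert`, or `remove`, so the caller
    /// cannot trigger a reallocation through it.
    pub fn into_cbs(self) -> Box<[Cb]> {
        // Caveat: `into_boxed_slice` discards excess capacity first, which
        // may reallocate the buffer if `capacity() > len()`.
        self.cbs.into_boxed_slice()
    }
}
```

Once the box exists, moving the `Box<[Cb]>` value itself only copies a pointer and a length, so the elements stay where they are on the heap. The conversion step is the part to watch: if the Vec has spare capacity, the elements may be relocated, so it should not happen while IO is in flight.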


#5

Yeah, I’m trying to transfer ownership at the same time as doing an operation on the struct aiocbs. The only part that bothers me about using a collection is the extra memory usage.


#6

What I am proposing here does not use any extra memory. It just reuses the storage of the Vec of aiocbs that you had initially.

Also, note that Rust does not support immovable types yet, and if someone has even just an &mut to an aiocb, it is still possible to swap it out of its current location… I’m not sure if it would be safe to give your clients anything more than an & reference to them.
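To illustrate the point about `&mut`, here is a small sketch (with a hypothetical `Cb` type standing in for `aiocb` and an invented helper `steal`) showing that entirely safe code can move a value out of its slot once it holds a mutable reference:

```rust
use std::mem;

/// Hypothetical stand-in for `aiocb`.
pub struct Cb {
    pub id: u32,
}

/// Safe code holding `&mut Cb` can relocate the value: `mem::replace`
/// moves the old contents out and writes a new value into the slot.
pub fn steal(slot: &mut Cb) -> Cb {
    mem::replace(slot, Cb { id: 0 })
}
```

After `steal(&mut cbs[0])`, the original control block lives wherever the caller put the return value, and the Vec slot holds a different one. `mem::swap` has the same effect.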


EDIT: You may also want to explore this kind of API, which consumes self at the end of the user-specified aiocb processing routine. It’s a common trick to handle “consume a value after letting a user play with references to it” scenarios.

struct MyStruct {
    cbs: Vec<aiocb>,
}

impl MyStruct {
    fn consume<F: FnOnce(&[aiocb])>(self, processing: F) {
        processing(&self.cbs);
    }
}

// ... user code ...

my_struct.consume(|cbs| {
    // Can use the inner aiocbs however I like here: iteration, slicing...
});
// my_struct does not exist anymore here, and aiocbs cannot be leaked out

EDIT 2: Thinking about this some more, I wonder if you should not rethink your use of a vector of aiocbs to begin with:

  • Any operation that modifies the vec (insertions, deletions, capacity changes…) while async IO associated with one of the aiocbs is in progress is unsafe.
  • Accidentally giving your client mutable access to the aiocbs while async IO is in progress is unsafe.
  • Even just dropping the struct hosting the vector of aiocbs while async IO is in progress (which can occur unpredictably as a result of a panic) is unsafe.

Basically, if I understand the way your code currently works, this vector of aiocbs is a major unsafe footgun.


#7

Thanks! That kind of common trick is exactly what I was looking for. As for safety, you’re right that aio(7) is a major footgun :wink: . I try to mitigate it by making the inner Vec non-pub, by adding some runtime checks to prevent you from doing certain operations while the IO is in progress, and by storing references to the aiocbs’ buffers alongside the Vec in a form that prevents them from dropping while the IO is in progress.


#8

Ok, here’s my final solution. It’s still far from ideal, because it can’t be chained with an iterator adaptor. But it does avoid the extra allocation associated with returning a collection. And it avoids giving the client a mutable reference directly to the aiocb.

struct MyStruct {
    cbs: Vec<aiocb>,
}

struct MyResult {
    ...
}

fn transform(a: &mut aiocb) -> MyResult {
    ...
}

impl MyStruct {
    fn consume<F: FnMut(MyResult)>(mut self, mut callback: F) {
        // Transform each aiocb in place and hand only the result to the caller.
        for cb in self.cbs.iter_mut() {
            callback(transform(cb));
        }
    }
}

#9

If you can live with some dynamic dispatch, this may be closer to your goals:

impl MyStruct {
    fn consume<F, R>(mut self, callback: F) -> R
        where F: FnOnce(&mut dyn Iterator<Item = MyResult>) -> R
    {
        let mut iterator = self.cbs.iter_mut().map(transform);
        callback(&mut iterator)
    }
}

I have added a way for the user callback to return a result, as in my experience that comes in handy very often in this kind of API. However, you may need to somehow promise the compiler that R will not leak references to the input Iterator. I used to know what kind of generic bound was needed for this (if any), but have forgotten about it. As always in Rust, the compiler error messages will tell you what to do :stuck_out_tongue:

Note that in this scheme, transform is not guaranteed to be called on each aiocb (think about e.g. the user dropping the iterator without consuming it). I’m not sure if that is important for your use case. If there is some kind of mandatory work in transform(), you may want to move it to an eagerly evaluated separate iteration.
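A sketch of that split, under stated assumptions: `Cb` stands in for `aiocb`, and the `mandatory` and `transform` parameters are hypothetical closures added here so that the calls can be counted. The eager loop runs for every control block, while the lazy `map` only runs for the items the callback actually pulls. As far as I can tell, no extra bound on `R` is needed: the reference passed to the callback has a lifetime chosen inside `consume`, which a type parameter declared on `consume` cannot name.

```rust
/// Hypothetical stand-in for `aiocb`.
pub struct Cb {
    pub id: u32,
}

/// Hypothetical result type.
pub struct MyResult(pub u32);

pub struct MyStruct {
    pub cbs: Vec<Cb>,
}

impl MyStruct {
    pub fn consume<M, T, F, R>(mut self, mut mandatory: M, transform: T, callback: F) -> R
    where
        M: FnMut(&mut Cb),
        T: FnMut(&mut Cb) -> MyResult,
        F: FnOnce(&mut dyn Iterator<Item = MyResult>) -> R,
    {
        // Eager pass: runs for every control block, whatever the callback does.
        for cb in self.cbs.iter_mut() {
            mandatory(cb);
        }
        // Lazy pass: `transform` runs only when the callback asks for an item.
        let mut iterator = self.cbs.iter_mut().map(transform);
        callback(&mut iterator)
    }
}
```

If the callback calls `next()` once on three control blocks, `mandatory` will have run three times and `transform` once.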


#10

Also, note that so far, I have only used the kind of API design that we are discussing in &mut self methods that cleared internal state after letting user code see it, which effectively mandated the use of callbacks. In your usage scenario, the fact that you consume self opens other interesting API design possibilities, such as this ugly little thing:

struct MyResultIterator {
    cbs: Vec<aiocb>,
    index: usize,
}

impl Iterator for MyResultIterator {
    type Item = MyResult;

    fn next(&mut self) -> Option<MyResult> {
        let result = self.cbs.get_mut(self.index).map(transform);
        self.index += 1;
        result
    }
}

impl MyStruct {
    fn consume(self) -> MyResultIterator {
        MyResultIterator { cbs: self.cbs, index: 0 }
    }
}

I’m not sure if I can recommend this approach, as it can keep the aiocb vector around for dangerously long, which may not be what you want, and it also requires a bit of boilerplate to build what is effectively a self-referential iterator type.

It will also become more complicated if you ever implement Drop for MyStruct, because in this case moving stuff out of self becomes frowned upon, and you need to use more dirty tricks like swapping an empty vector in place of the old one.
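A sketch of the swap trick, assuming a hypothetical `Cb` type in place of `aiocb`, a placeholder `transform`, and an empty `Drop` body standing in for whatever cleanup the real type would do. `std::mem::take` swaps an empty Vec into `self.cbs`, which works even though fields cannot be moved out of a type that implements `Drop`:

```rust
use std::mem;

/// Hypothetical stand-in for `aiocb`.
pub struct Cb {
    pub id: u32,
}

pub struct MyResult(pub u32);

/// Placeholder transformation, for illustration only.
pub fn transform(cb: &mut Cb) -> MyResult {
    MyResult(cb.id * 2)
}

pub struct MyStruct {
    pub cbs: Vec<Cb>,
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        // Hypothetical cleanup; by the time `consume` returns, `cbs` is empty.
    }
}

pub struct MyResultIterator {
    cbs: Vec<Cb>,
    index: usize,
}

impl Iterator for MyResultIterator {
    type Item = MyResult;

    fn next(&mut self) -> Option<MyResult> {
        let result = self.cbs.get_mut(self.index).map(transform);
        if result.is_some() {
            self.index += 1;
        }
        result
    }
}

impl MyStruct {
    pub fn consume(mut self) -> MyResultIterator {
        // Moving `self.cbs` out directly is rejected because `MyStruct: Drop`,
        // so leave an empty Vec behind. Only the Vec's pointer, length, and
        // capacity move; the heap buffer holding the elements does not.
        let cbs = mem::take(&mut self.cbs);
        MyResultIterator { cbs, index: 0 }
    }
}
```

The returned iterator chains with ordinary adaptors, for example `my_struct.consume().filter(|r| r.0 > 2)`.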

But I think it’s nice that you can actually return an iterator if you really want to and are ready to pay the price.