Changelog
- Clarify why building [MaybeUninit; N] may not be easy, add link to @scottmcm's lib.
- Scratched ConstSizeIterator section and added extra comments on extension trait approach.
- Integrated a first round of feedback and used into_iter for Vec.
Now that const generics are in sight, we may want to think about how we could use them to resolve some long-standing ergonomic issues around arrays in Rust. One, in particular, recently popped up in this github thread: the difficulty of cleanly and efficiently initializing arrays, emerging from to the lack of a way to collect iterators into arrays.
Motivation
As a motivating example, we will use the semi-realistic task of converting a list of borrowed strings into a list of owned strings. If your list is implemented as a Vec
, this is trivial...
fn to_owned_vec<'a>(strs: Vec<&'a str>) -> Vec<String> {
strs.into_iter().map(String::from).collect()
}
...so let's try to do it with arrays. What could possibly go wrong?
fn to_owned_arr<'a>(strs: [&'a str; SIZE]) -> [String; SIZE] {
/* ??? */
}
Well, first of all, even if we ignore the sad fact that into_iter()
does not have the same effect on arrays and Vec
s, the obvious conversion from the Vec
-based version will not work.
// Won't compile: arrays of T do not implement FromIterator<T>
fn to_owned_arr_fromiter<'a>(strs: [&'a str; SIZE]) -> [String; SIZE] {
strs.into_iter().map(|&s| String::from(s)).collect()
}
Let's try a loop, maybe, crossing fingers that the compiler will be able to optimize out the unnecessary initial array value?
// Won't compile: Strings are not Copy, so they can't be used in array initializers
fn to_owned_arr_loop<'a>(strs: [&'a str; SIZE]) -> [String; SIZE] {
let mut a = [String::new(); SIZE];
for (&inp, out) in strs.iter().zip(a.iter_mut()) {
*out = inp.to_owned();
}
a
}
Future prospects
@Centril pointed out that the future language improvements of rust-lang#49147 would allow this example to work if String::new()
were also turned into a const fn
.
While this approach would indeed resolve this particular problem among many others, it should be noted that it will not completely resolve the array initialization problem on its own, as some types like File
cannot be constructed by a const fn
.
Oh well, I just can't get this to work with safe code. Maybe a pinch of unsafe will resolve everything?
fn to_owned_arr_ub<'a>(strs: [&'a str; SIZE]) -> [String; SIZE] {
let mut a: [String; SIZE] = unsafe { std::mem::uninitialized() };
for (&inp, out) in strs.iter().zip(a.iter_mut()) {
*out = String::from(inp);
}
a
}
And indeed, this code compiles! But it would be a stretch to say that it works:
mem::uninitialized()
is deprecated and will go away soon because merely calling it is UB for some types like!
(which can easily accidentally happen in generic code).&mut
to uninitialized memory, as generated byiter_mut()
, are instant-UB according to the validity invariants which the Unsafe Code Guidelines are currently proposing.- Even if they weren't, overwriting an uninitialized
String
via an&mut
calls its destructor, which is UB. - If anything in the loop panicked, we would also be dropping uninitialized Strings, which is still UB.
The correct thing to do, I think, would be to build a [MaybeUninit<String>; SIZE]
(which actually may not be so easy until rust-lang#49147 lands), initialize its elements one by one, move everything into a [String; SIZE]
at the end, and hope that the compiler manages to optimize everything out. Oh, and maybe add some sort of RAII guard to drop already-initialized Strings when something panics, so as not to leak memory.
I don't know about you, but this is not what I expected a simple array initialization job to look like. I hope this toy example will have convinced you that there is a problem worth fixing here.
So, what should we do?
Implement FromIterator<T>
for [T; N]
?
This would work perfectly in our example, but it raises the obvious question of what should happen if the iterator ended up producing less than N
elements.
As far as I can see, the only thing which this sort of FromIterator
implementation can do in this case is to panic, which isn't great because it may break some users' intuition that something called From should not fail (even if this intuition is, strictly speaking, not guaranteed to be respected by FromIterator
anyway).
Implement FromIterator<T>
for Option<[T; N]>
(or similar)?
This provides an ability to handle "bad iterator length" errors, but at the cost of unnecessarily pessimizing the ergonomics in the common case where the length of the iterator is known at compile time by forcing unnecessary unwraps into the code.
It also raises the question of which "bad iterator length" we should catch: just too few elements, or too many elements as well? On this front, @kornel made the case that only "too few elements" errors should be caught and reported because...
- That is consistent with the existing logic of
zip()
- It makes it easier to deal with infinite iterators.
Personally, I'm convinced by this argument. Which also answers the question of whether one should discriminate between "too few items" and "too little items" by using a Result
instead of an Option
.
Borrow stackvec's TryFromIterator
and try_collect()
?
This is a close cousin of the above solution, which would involve a little bit more std
changes, but has the advantage of clarifying the faillible nature of collecting an iterator into an array.
It improves upon the previous solution in the following ways:
- It capitalizes on the existing
From
/TryFrom
distinction try_collect::<[T; N]>()
is clearer and easier to type thancollect::<Option<[T; N]>>()
- There is less ambiguity with a possible impl of
FromIterator<Option<T>>
forOption<[T; N]>
Overall, I think it is a net improvement.
Extend Iterator
to communicate compile-time length information?
Iterator
to communicate compile-time length information?It seems we could have our cake and eat it too if we could, somehow, only implement FromIterator<T>
for [T; N]
when the length of the iterator is known at compile-time to be N
.
Sadly, it does not seem possible to do this, because although the length of an iterator may be initially known at compile time, and could be kept up to date as adapters are added, the compiler cannot track the length of a partially consumed iterator, and partial iterator consumption is perfectly legal in Rust.
Here's my past attempt at this for reference
Communicating the length of the iterator seems straightforward enough, if tedious. The idea would be to have some sort of ConstSizeIterator: Iterator
trait, which exposes the length of an iterator as an associated const if it is known at compile time.
This trait would be implemented by iterators over arrays, and could be implemented for some other constructs too like Range
s with const
bounds?
Some iterator adapters like map()
would propagate this metadata to their result, while others like filter()
wouldn't, silently degrading their ConstSizeIterator
into a vanilla Iterator
.
But it is not clear to me how this could be integrated into FromIterator
, given that the FromIterator
trait does not give you a way to specialize implementations based on what kind of iterator you are building the collection from. Probably we would need some kind of FromConstSizeIterator
... and it could recurse deeply from there.
Add Iterator
-like methods to arrays via an extension trait?
@scottmcm suggested extending arrays with Iterator
-like functionality as follows:
Unlike the ConstSizeIterator
concept above, this design was successfully implemented. But it does have some unpleasant downsides.
By making "array-map" disjoint from "Iterator
-map" and by restricting itself to the Iterator
adapters that can safely be implemented on arrays (e.g. no filter()
and no flat_map()
), I think that this approach risks keeping an large ergonomic gap between array and Vec
manipulation. This proposal would make arrays easier to use, but using them would still feel very foreign for someone who's used to the Iterator
+ Vec
workflow.
Further, I would be a bit wary of trusting compilers to optimize the temporary arrays returned by these functions, because I've seen rustc
's optimizer struggle in similar areas before. For some reason, lazy iterators seem more readily amenable to its current optimizations than eager collection returns.
Then again, no other workable design for infaillible array construction was proposed so far! So if no one comes up with a better idea, this is the direction that I would propose for this use case.
Your turn!
I've done my best to give a summary of the design space that I've seen so far. Now, can anyone review the description above and tell me if they see any flaw in my reasoning, if there is another way to go about this that I've overlooked, or if they have particular opinions about the proposals above (which are not mutually exclusive)?
cc @est31