`<[T]>::[array_]windows` (and chunks?) function family could allow 0 length

I got this idea from reading the discussion in "Make `<[T]>::array_*` methods fail to compile on 0 len arrays" by AngelicosPhosphoros (rust-lang/rust#99471 on GitHub) about how to forbid 0 as the length for <[T]>::array_windows. I don't have enough knowledge to comment on the original topic, but it got me thinking: why do we forbid 0 in the first place?

There is a mathematically consistent implementation for <[T]>::[array_]windows(0). Let's take a look at some other input:

[0, 1, 2].windows(3) -> [ [0, 1, 2] ];              iterator length = 1, window length = 3
[0, 1, 2].windows(2) -> [ [0, 1], [1, 2] ];         iterator length = 2, window length = 2
[0, 1, 2].windows(1) -> [ [0], [1], [2] ];          iterator length = 3, window length = 1

(edit: fixed some off-by-1 errors above.) If we extrapolate this, we get

[0, 1, 2].windows(0) -> [ [], [], [], [] ];         iterator length = 4, window length = 0

So a valid implementation of [_; N].windows(0) is to return an iterator that yields N+1 items, all of which are empty slices. This might look silly, but it is consistent with all other inputs, and likely handles edge cases well for algorithms built on top of windows. One can also argue this is the correct implementation using the intuition of a sliding window: the windows(W) iterator slides a window of length W from the beginning to the end. The beginning is where the left side of the window is at the left side of the slice, and the end is where the right side of the window is at the right side of the slice. Applying this to W = 0, we get the following sliding motion

[[ ]0, 1, 2, 3] => [0[,]1, 2, 3] => [0, 1[,]2, 3] => [0, 1, 2[,]3] => [0, 1, 2, 3 [ ]]
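To make the extrapolation concrete, here is a minimal sketch as a free function (windows_allowing_zero is just an illustrative name, not an existing or proposed API, and this is not how slice::windows is actually implemented):

// A sketch only: matches slice::windows for size >= 1, and yields len + 1 empty
// windows for size == 0, one at every fencepost.
fn windows_allowing_zero<T>(slice: &[T], size: usize) -> impl Iterator<Item = &[T]> + '_ {
    // Window count: len + 1 - size, clamped to 0 when size > len; for size == 0 this
    // is len + 1. (Note that len + 1 itself would overflow for a usize::MAX-long
    // slice of ZSTs.)
    let count = (slice.len() + 1).saturating_sub(size);
    (0..count).map(move |start| &slice[start..start + size])
}

fn main() {
    let xs = [0, 1, 2];
    let empty: &[i32] = &[];
    let zero: Vec<&[i32]> = windows_allowing_zero(&xs, 0).collect();
    assert_eq!(zero, vec![empty; 4]); // 4 = len + 1 empty windows
    let two: Vec<&[i32]> = windows_allowing_zero(&xs, 2).collect();
    assert_eq!(two, vec![&[0, 1][..], &[1, 2][..]]); // same as xs.windows(2) today
}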

Changing the behavior of the stable function .windows to this should probably be considered a breaking change. However, we still have a chance to "correct" it for the unstable array_windows function.


A similar argument can be applied to the chunks function family:

[0, 1, 2].chunks(3) -> [ [0, 1, 2] ];              iterator length = 1 = 3 / 3, chunk length = 3
[0, 1, 2].chunks(2) -> [ [0, 1] ];                 iterator length = 1 = 3 / 2, chunk length = 2
[0, 1, 2].chunks(1) -> [ [0], [1], [2] ];          iterator length = 3 = 3 / 1, chunk length = 1

So if we extrapolate this, we should have

[0, 1, 2].chunks(0) -> [ [], [], [], .... ];       iterator length = inf = 3 / 0, chunk length = 0

This is even sillier because it produces an infinite iterator. It is still mathematically consistent, though. It can also be argued with the sliding-window intuition, with one modification: instead of moving forward one element at a time, the window moves forward by moving its new beginning to its old end. For a 0-length chunk, the window can never move forward this way, hence it yields empty slices forever.
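For completeness, a sketch of what this extrapolation would literally mean (chunks_allowing_zero is a hypothetical name, not a proposal for std), showing how the 0 case ends up as an infinite iterator:

// Matches slice::chunks for size >= 1; for size == 0 the zero-width window can never
// advance, so it yields the empty prefix forever.
fn chunks_allowing_zero<T>(slice: &[T], size: usize) -> Box<dyn Iterator<Item = &[T]> + '_> {
    if size == 0 {
        Box::new(std::iter::repeat(&slice[..0]))
    } else {
        Box::new(slice.chunks(size))
    }
}

// let empty: &[i32] = &[];
// let first_four: Vec<&[i32]> = chunks_allowing_zero(&[0, 1, 2], 0).take(4).collect();
// assert_eq!(first_four, vec![empty; 4]);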


This makes perfect sense to me!

Quick note: the first piece of example code has had a couple of mistakes in it (maybe you originally wrote it with a list [1,2,3] or [0,1,2,3] instead of [0,1,2]?). I'd suggest editing to fix that for future readers.

Thanks! Edited.

These are cute, and maybe valid, and you could probably find examples where the edge case is covered by this new behavior. In the case of windows:

  • Is it going to require extra code to handle width 0? In particular it seems like you want the iterator to consume all items but not output any of them.
  • Is this just a complicated way of writing 0..=x.len() (see the sketch below)? That is, the current behavior makes handling this edge case explicit (albeit not enforced at compile time).
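Roughly what the second bullet is getting at, as a purely illustrative sketch (zero_windows_equivalent is a made-up name): under the proposal, for a slice x, x.windows(0) would be observationally equivalent to iterating 0..=x.len() and handing back an empty slice each time.

fn zero_windows_equivalent<T>(x: &[T]) -> impl Iterator<Item = &[T]> + '_ {
    (0..=x.len()).map(move |i| &x[i..i]) // len + 1 empty slices, one per fencepost
}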

The second is a little more glaring: infinite loops can be a frustrating thing to debug, and I would much rather have an explicit loop {} somewhere than have this case accidentally decay into one.

So: it would be nice if these could be enforced at compile time; I don't think they should silently become aberrant/frustrating/trivial iterators merely for mathematically pleasant reasons.


(second line still has an extra [2,3] at the end)

I agree with the general assessment that this might be the case, but nonetheless it would be interesting to find some concrete examples of “algorithm(s) built on top of windows” to see whether this would actually handle the edge cases of any reasonable, practical algorithm :slight_smile:


The chunks case on the other hand sounds a bit too much like division by zero, in particular, [].chunks(0) seems hard to define in any particularly nice way, just as dividing zero by zero (whereas for positive numbers divided by zero, the answer “(positive) infinity” is – at least – the unique answer following limit arguments). The obvious problem is that [].chunks(n) is always empty, whereas […non-empty…].chunks(0) is always infinitely long, so you have an argument for [].chunks(0) to be empty, but also an argument for it to be infinitely long (and possibly, by some different “limit”-style argument, you can also argue for anything in-between).

Another interesting effect of defining both chunks and windows as you proposed would be that every case except zero has chunks return a subsequence of windows, but for 0 the chunks suddenly are a lot more than the windows… so that’s maybe a bad edge-case after all? Another problem is ExactSizeIterator, which cannot be implemented correctly for any infinite chunks iterator (as it doesn’t support infinite iterators), and arguably you will probably want this trait for ArrayChunks, too (and restricting it to N != 0 on the type level might require some language features we don’t have yet).
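For n > 0 the "chunks are among the windows" observation can be checked mechanically; a small sketch (using chunks_exact to leave aside the short tail chunk, which is never a full-length window anyway):

fn full_chunks_are_windows(xs: &[i32], n: usize) -> bool {
    let windows: Vec<&[i32]> = xs.windows(n).collect();
    // Every full-length chunk starts at a multiple of n, so it is also one of the windows.
    xs.chunks_exact(n).all(|chunk| windows.contains(&chunk))
}

// assert!(full_chunks_are_windows(&[0, 1, 2, 3, 4], 2));
// The proposed n == 0 case is the one that breaks this: len + 1 windows versus
// infinitely many chunks.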


Yeah, I totally agree that the case for chunks is going to cause a lot of practical problems (so I put a question mark in the title). It is just an extra thought.

Also a good point. I vaguely remember having such a case before, but until I actually recall it, I guess I should consider myself bluffing here :sweat_smile:

For windows I think this is fine. The windows iterator always returns references and never consumes items from the slice, so 0 is unlikely to be a special case in the code in this regard. I haven't read the current code, though, so maybe some weird handling around pointers is required for the 0 case...


Another problem with allowing chunks(0) to yield infinite elements is that this becomes inconsistent with as_chunks::<0>() since it can't yield infinite elements.

I recall this coming up before around as_chunks -- one could just define that x.as_chunks::<0>() gives (&[], x) -- but I can't find that conversation right now :frowning:

The problem is in whether it'll actually help more than it'll hurt. I really like the postcondition that let (chunks, tail) = x.as_chunks::<N>(); means that tail.len() < N always. Having a "well, but zero does something extra" case just invites bugs, IMHO, where it's unexpectedly 0 and now there's a logic bug because that case wasn't handled.
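For reference, a concrete look at that postcondition (as_chunks was unstable behind the slice_as_chunks feature at the time of this thread, so this example may need a nightly toolchain depending on your version):

fn main() {
    let x = [0u8, 1, 2, 3, 4];
    let (chunks, tail) = x.as_chunks::<2>();
    assert_eq!(chunks, &[[0u8, 1], [2, 3]][..]);
    assert_eq!(tail, &[4u8]);
    assert!(tail.len() < 2); // the invariant a "zero does something extra" case would muddy
}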

It reminds me of align_to, which doesn't promise a bounded head and tail (in order to work in const fn) but where that weak postcondition is a constant source of complaints.

Similarly, .step_by(0) used to give an infinite iterator, but I'm quite happy that it now just panics. An off-by-one error panicking is much better than an off-by-one error that turns into a surprise infinite loop.
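(For what it's worth, that behavior is easy to observe; step_by documents the panic for a zero step:)

use std::panic;

fn main() {
    // A loud failure at the call site instead of a silent infinite iterator.
    let result = panic::catch_unwind(|| (0..10).step_by(0).count());
    assert!(result.is_err());
}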

And perhaps more importantly, when it was possibly-infinite, it was illegal for the iterator to implement certain traits. That comes up here too.

Conceptually, I actually really like the idea of zero-length windows working as you describe. It's just like how matching the ε regex against a string will give you length-plus-one matches, one at each fencepost. And it doesn't complicate the postconditions either, since the windows are still always N long each, and the number of windows is always len + 1 - N.
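That len + 1 - N count is exactly what holds for N >= 1 today; a quick check, with the zero case being the same formula extended by one more value of N:

fn main() {
    let xs = [0, 1, 2, 3, 4];
    for n in 1..=xs.len() {
        // Number of windows today: len + 1 - n.
        assert_eq!(xs.windows(n).count(), xs.len() + 1 - n);
    }
    // Under the proposal, xs.windows(0).count() would be xs.len() + 1 == 6.
}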

But I think the thing that's fatal to it is the iterator marker traits. Unfortunately, [(); usize::MAX].array_windows::<N>() is legal. And that means that if N == 0 is supported, the ArrayWindows iterator can't be ExactSizeIterator, as its length would overflow a usize.

And I think I'd rather have the iterator be ExactSizeIterator than have it support N == 0.


Good approach!

But it could be more amazing!

Your logic (pseudo-rust):

let m : isize;
let m = m.abs();

[T; N].windows(m) -> [ [T; m], N - m + 1];
[T; N].chunks(m)  -> [ [T; m], N.div_floor(m)];

And special case for zero:

let zero = 0;

[T; N].windows(zero) -> [ [T; 0], N + 1];
[T; N].chunks(zero)  -> [ [T; 0], isize::MAX];

I suggest a much better approach! First, let's extend the parameter to real numbers (in our minds).

Real-numbers logic (pseudo-rust):

let m : f64;
let m = m.abs();

[T; N].windows(m) -> [ [T; m.floor()], N - m.floor() + 1.0 - m.fract().ceil()];
[T; N].chunks(m)  -> [ [T; m.floor()], (N / m.max(1.0)).floor()];

Second, use special case for zero:

let zero = 2.0 * f64::EPSILON;

[T; N].windows(zero) -> [ [T; 0], N];
[T; N].chunks(zero)  -> [ [T; 0], N];

So, in our case:

[0, 1, 2].windows(0) -> [ [], [], [] ];
[0, 1, 2].chunks(0)  -> [ [], [], [] ];

It's brilliant!

Could you explain why a much more complex expression would be "much better" and "brilliant"?

Oof, I didn't think of this one. Perhaps one could make [_; usize::MAX].windows(0) panic to work around this, but at this point it doesn't look that beautiful any more.

Why is the Gamma function, with G(n + 1) == n!, "much better" than the factorial?

It is more "natural" and it has more useful cases.

I would consider this one unacceptable, because to me one essential property of chunks is that .chunks(n).flatten() recovers the original. (And .chunks_exact(n).flatten() recovers the original other than potentially fewer than n missing from the end.)
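That round-trip property holds for every n >= 1 today, and is easy to state as a check (a quick illustration, not from the thread):

fn main() {
    let xs = [0, 1, 2, 3, 4];
    for n in 1..=xs.len() {
        // chunks(n) partitions the slice, so flattening recovers it exactly.
        let recovered: Vec<i32> = xs.chunks(n).flatten().copied().collect();
        assert_eq!(recovered, xs);
    }
}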


Sorry :face_with_thermometer:, @wwylele accidentally used chunks_exact() instead of chunks() and so did I.

Ok, let's change it and allow the chunks to have a varying length:

[T; N].chunks(m)  -> [ [T; m.floor() | m.ceil()], (N / m.max(0.5)).ceil()];

[0, 1].chunks(1.5)               -> [ [0], [1] ];
[0, 1, 2].chunks(1.5)            -> [ [0], [1, 2] ];
[0, 1, 2, 3].chunks(1.5)         -> [ [0], [1, 2], [3] ];
[0, 1, 2, 3, 4].chunks(1.5)      -> [ [0], [1, 2], [3], [4] ];
[0, 1, 2, 3, 4, 5].chunks(1.5)   -> [ [0], [1, 2], [3], [4, 5] ];
[0, 1, 2, 3, 4, 5, 6].chunks(1.5)-> [ [0], [1, 2], [3], [4, 5], [6] ];

[0,1,2,3,4,5,6,7,8,9].chunks(1.25) -> [ [0], [1], [2], [3, 4], [5], [6], [7], [8, 9] ];

[0,1,2,3,4,5,6].chunks(1.75) -> [ [0], [1, 2], [3, 4], [5, 6] ];

In the case of zero we have:

[T; N].chunks(zero)  -> [ [T; 0 | 1],  2 * N ];

[0, 1, 2].chunks(0)  -> [ [], [0], [], [1], [], [2] ];

In this case arr.chunks(n).flatten() recovers the original.
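One way to make the scheme above concrete (a sketch under the assumption that the intent is "cut at the boundaries floor(k * m)", with m clamped to 0.5 as in the m.max(0.5) above; fractional_chunks is just an illustrative name):

fn fractional_chunks<T>(slice: &[T], m: f64) -> Vec<&[T]> {
    let m = m.max(0.5); // the special-casing of zero (and negative) lengths above
    let mut out = Vec::new();
    let mut start = 0;
    let mut k = 1u64;
    while start < slice.len() {
        // Cut the next chunk at floor(k * m), clamped to the slice length.
        let end = ((k as f64 * m).floor() as usize).min(slice.len());
        out.push(&slice[start..end]);
        start = end;
        k += 1;
    }
    out
}

// fractional_chunks(&[0, 1, 2, 3, 4, 5, 6], 1.5) gives [ [0], [1, 2], [3], [4, 5], [6] ],
// fractional_chunks(&[0, 1, 2], 0.0) gives [ [], [0], [], [1], [], [2] ],
// and in both cases concatenating the chunks recovers the original slice.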


It still looks beautiful to me. "Sometimes, various things panic when they overflow usize" is a more general necessary evil, and this is only a special case of that, rather than being something that's particularly the fault of windows(0).

Isn't it already the case that the largest slice is of length isize::MAX - 1 anyway? Or is [(); usize::MAX] allowed too?

The length of the slice in bytes can't be more than isize::MAX. But of course slices of ZSTs are always zero bytes, regardless of their length.


The largest allowable Rust Allocated Object is isize::MAX bytes large, but [(); usize::MAX] is a perfectly allowable type with a size of zero bytes. [u8; isize::MAX as usize] is a theoretically valid type, but will often be rejected as too big for the current architecture.

This is a much larger change, and one completely impossible for the array versions.


Panicking on zero is fine IMHO; there's logically an N / M division happening, and that panics for M = 0. The same goes for a post-mono error for the const-generic version, at least once inline-const is stable and post-mono errors thus become more accessible. (And I'm vaguely positive that we can, and might, make such instantiation errors surface during check builds rather than relying on dead-code elimination to determine whether they fire, before stabilizing inline-const.)
