We have Iterator::size_hint
, ExactSizeIterator
, and TrustedLen
(still unstable and also unusable without specialisation) all doing roughly the same thing.
To quote the documentation of Iterator::size_hint
:
It is not enforced that an iterator implementation yields the declared number of elements. A buggy iterator may yield less than the lower bound or more than the upper bound of elements.
size_hint()
is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g. omit bounds checks in unsafe code. An incorrect implementation ofsize_hint()
should not lead to memory safety violations.That said, the implementation should provide a correct estimation, because otherwise it would be a violation of the trait's protocol.
ExactSizeIterator
appears to imply that the exact length is known, but not that the reported length can be trusted (for memory safety at least). See also The Trusted Iterator Length Problem.
TrustedLen
appears to be intended to allow optimal collect()
implementations by trusting that the reported length is correct (and violating memory safety if not).
But what can be done with these reported lengths/hints?
- They cannot be relied upon for memory safety (except
TrustedLen
). - Is it allowable to ignore elements beyond the claimed maximum, e.g. in
collect()
? This does not appear to be specified anywhere, but the current implementations will not drop extra elements (in fact, withTrustedLen
, they would instead write beyond the end of the allocated block if the maximum was too low; otherwise the claimed maximum is ignored). - Is it allowable to incorrectly return
None
when randomly choosing an element from an iterator whose length is less than the claimed lower bound? This is analogous to the previous problem, so presumably also no, unlessTrustedLen
is implemented. - Is it allowable to introduce bias in the result when the claimed bounds are incorrect — e.g. in the above problem to retain the last element in case there are less than the claimed minimum, thus potentially selecting the last element with higher probability than expected? This problem is probably specific to random selection, and still returns a valid result, but perhaps not the correct one.
- Is it allowable for
size_hint
to affect the result in other ways so long as the result is still correct? This is very specific to randomness: normally only one result is correct but with random sampling, many results may be correct (valid and without bias). Additionally, the number of random values extracted from the random number generator may not be specified. This goes against the usual assumption that optimisations cannot affect the result, and is something we're trying to cover for the Rand library, but we're not fully there yet.
I'm trying to tease out what we can and can't do to optimise sampling based on size_hint
(see Optimise `IteratorRandom::choose` for `size_hint` or `ExactSizeIterator` · Issue #511 · rust-random/rand · GitHub). Personally I think we should yes to the last question, 5. I'm less sure about question 4, but without allowing this I don't think we can make any optimisations anyway.
Presumably since TrustedLen
allows memory safety violations on incorrect size_hint
, any of the rest is allowable too. For ExactSizeIterator
however, it doesn't appear that the trait implies anything more than size_hint
alone (except that the lower and upper bounds are the same; there have been suggestions that this trait could be removed).