To quote the documentation of
It is not enforced that an iterator implementation yields the declared number of elements. A buggy iterator may yield less than the lower bound or more than the upper bound of elements.
size_hint()is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g. omit bounds checks in unsafe code. An incorrect implementation of
size_hint()should not lead to memory safety violations.
That said, the implementation should provide a correct estimation, because otherwise it would be a violation of the trait’s protocol.
ExactSizeIterator appears to imply that the exact length is known, but not that the reported length can be trusted (for memory safety at least). See also The Trusted Iterator Length Problem.
TrustedLen appears to be intended to allow optimal
collect() implementations by trusting that the reported length is correct (and violating memory safety if not).
But what can be done with these reported lengths/hints?
- They cannot be relied upon for memory safety (except
- Is it allowable to ignore elements beyond the claimed maximum, e.g. in
collect()? This does not appear to be specified anywhere, but the current implementations will not drop extra elements (in fact, with
TrustedLen, they would instead write beyond the end of the allocated block if the maximum was too low; otherwise the claimed maximum is ignored).
- Is it allowable to incorrectly return
Nonewhen randomly choosing an element from an iterator whose length is less than the claimed lower bound? This is analogous to the previous problem, so presumably also no, unless
- Is it allowable to introduce bias in the result when the claimed bounds are incorrect — e.g. in the above problem to retain the last element in case there are less than the claimed minimum, thus potentially selecting the last element with higher probability than expected? This problem is probably specific to random selection, and still returns a valid result, but perhaps not the correct one.
- Is it allowable for
size_hintto affect the result in other ways so long as the result is still correct? This is very specific to randomness: normally only one result is correct but with random sampling, many results may be correct (valid and without bias). Additionally, the number of random values extracted from the random number generator may not be specified. This goes against the usual assumption that optimisations cannot affect the result, and is something we’re trying to cover for the Rand library, but we’re not fully there yet.
I’m trying to tease out what we can and can’t do to optimise sampling based on
size_hint (see https://github.com/rust-lang-nursery/rand/issues/511). Personally I think we should yes to the last question, 5. I’m less sure about question 4, but without allowing this I don’t think we can make any optimisations anyway.
TrustedLen allows memory safety violations on incorrect
size_hint, any of the rest is allowable too. For
ExactSizeIterator however, it doesn’t appear that the trait implies anything more than
size_hint alone (except that the lower and upper bounds are the same; there have been suggestions that this trait could be removed).