`size_hint` for `Display` and other `fmt` traits?

I wonder if there's any good reason against adding size_hint to fmt traits. Here's a motivating use case:

serde has a serialize_str method for serializing strings and a collect_str method that can be used to serialize any Display value. collect_str has a default impl that allocates a String and passes it to serialize_str.

We have a format that serializes as a hex string (when human-readable). We can pre-compute the length, so if the serializer allocates anyway, it'd be preferable to allocate ourselves with the correct capacity. However, if the serializer overrides collect_str so that it does not allocate, we should prefer collect_str.
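To make the "allocate ourselves with the correct capacity" half concrete, here is a minimal std-only sketch. The `Hex` wrapper and its `encoded_len` and `to_hex_string` helpers are hypothetical names for illustration, not our real type and not part of serde.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical wrapper that formats bytes as lowercase hex.
struct Hex<'a>(&'a [u8]);

impl Display for Hex<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

impl Hex<'_> {
    /// Exact output length in bytes: two hex digits per input byte.
    fn encoded_len(&self) -> usize {
        self.0.len() * 2
    }

    /// Formats into a String that is allocated once, with the exact capacity.
    fn to_hex_string(&self) -> String {
        let mut out = String::with_capacity(self.encoded_len());
        // Writing into a String only fails if the Display impl itself errors.
        write!(out, "{}", self).expect("formatting into a String failed");
        out
    }
}
```

The resulting String could then be handed to serialize_str, whereas a non-allocating serializer would be better served by passing the Display value to collect_str directly.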

The problem is that we don't know which kind of serializer we are dealing with. My guess is that there are multiple similar cases across the ecosystem, and they would be nicely resolved by adding a method to the trait:

/// Returns the range of expected byte sizes when formatted as UTF-8.
fn size_hint(&self) -> (usize, Option<usize>) {
    (0, None)
}

The method is very similar to that on Iterator. This should be backwards-compatible since the method has a sane default.
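Since Display itself can't be extended from outside std, the idea can only be approximated with an extension trait. The following is a rough sketch under that assumption: `DisplaySizeHint`, `Hex`, and `collect_with_hint` are all hypothetical names, and the last one stands in for what a default collect_str-style implementation might do with the hint.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical extension trait; `std::fmt::Display` has no such method today.
trait DisplaySizeHint: Display {
    /// Returns the range of expected byte sizes when formatted as UTF-8.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, None)
    }
}

/// Hypothetical hex wrapper whose formatted length is known up front.
struct Hex<'a>(&'a [u8]);

impl Display for Hex<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

impl DisplaySizeHint for Hex<'_> {
    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = self.0.len() * 2;
        (len, Some(len))
    }
}

/// Types that don't override the method keep the default `(0, None)`.
impl DisplaySizeHint for u32 {}

/// Allocates using the lower bound, then formats into the buffer.
fn collect_with_hint<T: DisplaySizeHint + ?Sized>(value: &T) -> String {
    let (lower, _upper) = value.size_hint();
    let mut out = String::with_capacity(lower);
    write!(out, "{}", value).expect("formatting into a String failed");
    out
}
```

With something like this, the caller no longer needs to know whether the consumer allocates: an allocating consumer reads the hint, and a streaming one ignores it.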


I remember seeing a discussion at some point noting that the upper bound of Iterator::size_hint is essentially never used in practice. Another potential concern is that the name “hint” is weaker than the actual requirement: at least for Iterators, it is considered a logic error, i.e. a bug, if an Iterator’s size_hint returns incorrect bounds.

With this in mind, perhaps a different method name, and also a single usize lower-bound-only return value could make sense.

There is an existing estimated_capacity that is used when you call std::fmt::format:

The comment on it mentions that it is neither an upper nor a lower bound, just a number to use based on the static string length.

I wonder why it's never used. I would expect this code to be reasonable:

let (min, max) = iter.size_hint();
let mut vec = Vec::with_capacity(max.unwrap_or(min));

One could probably also use reserve() vs. reserve_exact() depending on whether there's a max.
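Spelled out, the reserve() vs. reserve_exact() variant might look like the sketch below. `collect_using_hint` is a hypothetical helper name, and this is only one way to read the suggestion, not a claim about what std does.

```rust
/// Hypothetical helper: sizes the Vec from the iterator's size hint.
fn collect_using_hint<I: Iterator>(iter: I) -> Vec<I::Item> {
    let (min, max) = iter.size_hint();
    let mut vec = Vec::new();
    match max {
        // A known upper bound: reserve exactly that much.
        Some(max) => vec.reserve_exact(max),
        // No upper bound: reserve at least the lower bound and let the
        // Vec's usual growth strategy handle the rest.
        None => vec.reserve(min),
    }
    vec.extend(iter);
    vec
}
```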

Anyway, specific details are not the point of my post. :)

I tried; it turned out not to help. See https://internals.rust-lang.org/t/is-size-hint-1-ever-used/8187?u=scottmcm.

I didn't look deeply into it, but it might be that all the calculations for the max optimize away when they're unused. So the work for all that code might not be worth occasionally saving one doubling.

That's probably a bad idea, because a filter has a size_hint of (0, Some(n)), but reserving the whole space is often going to be a huge overallocation.
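A small sketch of that overallocation, using a made-up filter that keeps 10 of 1000 elements (the function names are hypothetical):

```rust
/// Size hint of a filtered range: the lower bound drops to 0, while the
/// upper bound stays at the length of the underlying range.
fn filtered_hint() -> (usize, Option<usize>) {
    (0..1000u32).filter(|x| x % 100 == 0).size_hint()
}

/// Reserves the upper bound up front, as in the snippet quoted above.
fn collect_with_upper_bound() -> Vec<u32> {
    let iter = (0..1000u32).filter(|x| x % 100 == 0);
    let (min, max) = iter.size_hint();
    let mut vec = Vec::with_capacity(max.unwrap_or(min));
    vec.extend(iter);
    vec
}
```

Here the hint is (0, Some(1000)), so the Vec reserves room for at least 1000 elements but ends up holding only 10.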

I would definitely like to see something like Iterator::reserve_suggestion that's just a single value where it's not a logic error for it to be wrong, just a perf/memory issue if it's inappropriate somehow -- basically exactly that comment on estimated_capacity.


It'd be awesome if filter's hint (success ratio) could be computed from runtime data, similar to profile-guided optimization.
