Idea: guesstimate of size of datatype

derekdreery · December 26, 2018, 8:50pm

Say you have the following situation:


struct Worker {
    person: Person,
    job: Job
}

struct Person {
    first_name: String,
    last_name: String,
    dob: DateTime,
    address_line_1: String,
    address_line_2: String,
    address_line_3: String,
    // lots more fields here
}

struct Job {
    // similarly to person, lots of fields
}

You might think that the size of Worker is small, but when you drill down it's actually very big. Having this information might change design decisions, like when to Box. Could rustdoc give (or estimate) the size of any data structures it documents?

In this case it would be fairly easy to drill down and see, but sometimes there are many levels of complex types, which may be generic, and it becomes a bit harder to tell.

mcy · December 26, 2018, 9:27pm

The main problem I see is that size information is usually private, in the semver sense. I think the only thing we should expose, if any, is “this struct fits in a cache line on whatever architectures”, since that’s approximately the main consideration for boxing.

As for the generics consideration, I think that “minimum size of T to fit in a cache line” is probably what you want?

djc · December 26, 2018, 9:59pm

As I understand it, semver relates to how the machine sees our code and reasons about compatibility. Exposing a size hint in rustdoc seems like a very different thing with probably different tradeoffs.

I do see how differences in padding and type sizes across targets might make this harder, but overall I think it would be useful for rustdoc to maybe float this information somehow (probably more than a boolean to reflect whether it fits in a cache line, but not quite an exact number of bytes?).

mcy · December 26, 2018, 10:03pm

Exposing the exact size makes the size an API commitment. Exposing "fits in a cache line" is only questionably an API commitment.

derekdreery · December 27, 2018, 11:39am

It would only ever be a best estimate. It would be different on different architectures, and between different versions of the compiler. There would be no promises made about the accuracy - it's just a hint.

What it does do is act as a hint on where to profile - if a struct is big, profiling with and without boxing is probably worth doing. If it's smaller than a pointer, probably not.

RustyYato · December 27, 2018, 12:18pm

You can use std::mem::size_of to find the size of a struct in bytes, but, as others have noted, don't rely on this staying the same across different compilers and architectures.
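As a rough illustration of that suggestion, here is a minimal sketch that prints the sizes of types shaped like the ones in the original post. The field sets are cut down and hypothetical (the original elides most fields and uses a DateTime type that is not defined in the thread), and the printed numbers will vary by target and compiler version.

```rust
use std::mem::size_of;

// Hypothetical, cut-down stand-ins for the types in the original post.
#[allow(dead_code)]
struct Person {
    first_name: String,
    last_name: String,
    address_line_1: String,
}

#[allow(dead_code)]
struct Job {
    title: String,
    salary: u64,
}

#[allow(dead_code)]
struct Worker {
    person: Person,
    job: Job,
}

fn main() {
    // Inline sizes only; the heap buffers owned by each String are not counted.
    println!("Person:      {} bytes", size_of::<Person>());
    println!("Job:         {} bytes", size_of::<Job>());
    println!("Worker:      {} bytes", size_of::<Worker>());
    // A Box of a sized type is a single pointer, whatever it points to.
    println!("Box<Worker>: {} bytes", size_of::<Box<Worker>>());
}
```

The nested struct is at least as large as the sum of its fields, while the boxed version stays pointer-sized, which is the trade-off the original post is asking about.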

derekdreery · December 27, 2018, 12:25pm

This is what you would use to provide the hint.

RustyYato · December 27, 2018, 12:33pm

If the size of a type matters, then you should probably check the size of your types anyway. How will generic types be handled? In general, you can't know the size of a generic type before monomorphization.
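To make the generics point concrete, here is a small sketch with a hypothetical Pair<T> type (not from the thread). The type constructor has no size of its own; only each concrete instantiation does.

```rust
use std::mem::size_of;

// Hypothetical generic type; its size depends entirely on T.
#[allow(dead_code)]
struct Pair<T> {
    id: u32,
    value: T,
}

fn main() {
    // Each instantiation is a distinct concrete type with its own layout.
    println!("Pair<u8>:         {} bytes", size_of::<Pair<u8>>());
    println!("Pair<u64>:        {} bytes", size_of::<Pair<u64>>());
    println!("Pair<[u8; 1024]>: {} bytes", size_of::<Pair<[u8; 1024]>>());
}
```

Any tool reporting sizes would have to either pick concrete instantiations or skip generic definitions.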

derekdreery · December 27, 2018, 2:35pm

I would just not give estimates for type constructors, only concrete types.

H2CO3 · December 27, 2018, 5:45pm

There's already a similar lint in Clippy; I argue that if anything, this should be a Clippy lint which is maybe allow-by-default or warn-by-default. So instead of printing the size of every single data structure, there could exist a warning for extremely large structures which might need to be broken up into parts, boxed, etc.

Exactly – and in addition, if something is exposed, people will rely on it, no matter how many flashing red warnings saying "THIS IS ONLY AN ESTIMATE AND ALWAYS SUBJECT TO CHANGE" there might be.

nnethercote · January 1, 2019, 8:33pm

The (Nightly) compiler can already do this, via the -Zprint-type-sizes option. It’s very effective, I’ve used it myself on multiple occasions. See this blog post for details.

(That doesn’t involve rustdoc, which means the sizes don’t appear in documentation, so I’m not sure if it meets your requirements. If not, at least the machinery is already in place within the compiler, and presumably could be hooked up to rustdoc with some effort…)

derekdreery · January 4, 2019, 2:40pm

I don’t really have any requirements, I was just throwing the idea up.

kornel · January 4, 2019, 2:48pm

Please don’t “throw ideas up”. The language has more ideas than it can deal with, and there is a very real cost to every addition, and even to the evaluation of ideas. If there isn’t a big real need for something, forget it.

comex · January 4, 2019, 10:45pm

I disagree. This is not an RFC or pre-RFC; it's just a thread in the internals forum, which has plenty of capacity for ideas. It's one thing if someone, say, spams the forum with a dozen half-baked ideas over the course of a month, but this OP hasn't done that.

It's not a bad idea, either... especially since it's not proposing a core language feature or anything that would be subject to stability guarantees, just an implementation feature, and one that would be relatively easy to implement.

kornel · January 4, 2019, 11:04pm

Fair enough

ahmedcharles · January 6, 2019, 8:07am

The docs currently allow trivially looking at the implementation of any type, which is the strongest statement that can be made in terms of stability. Stating the size of a type on a given architecture couldn’t possibly suggest a higher guarantee in terms of things not changing compared to viewing the source.

vorner · January 6, 2019, 8:55am

I have a slightly different concern ‒ could it be misleading? Specifically:

struct Indirect(Box<SomethingReallyHuge>);

This would show a small number, so one could go and create a Vec<Indirect> with a lot of elements and be surprised that these 8-byte structures ate all the RAM.

So maybe having a (collapsed by default) size analysis that would say 8 bytes inline, but some arbitrary amount on the heap?
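A minimal sketch of that concern, assuming a hypothetical 1 MiB payload for SomethingReallyHuge (the thread does not define it). The inline size is one pointer (8 bytes on a 64-bit target), but each element also owns a heap allocation.

```rust
use std::mem::size_of;

// Hypothetical definition; the 1 MiB payload is chosen only for illustration.
#[allow(dead_code)]
struct SomethingReallyHuge {
    payload: [u8; 1 << 20],
}

#[allow(dead_code)]
struct Indirect(Box<SomethingReallyHuge>);

/// Rough footprint of one `Indirect`: the inline pointer plus the boxed value.
/// Allocator overhead is ignored.
fn total_bytes_per_element() -> usize {
    size_of::<Indirect>() + size_of::<SomethingReallyHuge>()
}

fn main() {
    println!("inline: {} bytes", size_of::<Indirect>());
    println!("boxed:  {} bytes", size_of::<SomethingReallyHuge>());
    println!("total:  {} bytes per element", total_bytes_per_element());
}
```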

kornel · January 6, 2019, 1:59pm

If you start counting indirect memory usage, it gets hairy. For Vec usage there is an extremely odd distribution: most are empty, some are huge. With a type like Vec<Vec<u8>> you don’t know if it typically costs nothing or takes 90% of your RAM.

vorner · January 6, 2019, 2:15pm

That’s actually what I was trying to say. If I were rustdoc, I wouldn’t dare to claim this type is small. Its stack representation is small, but that is misleading, as that’s only half of the message. The best/most accurate answer I could give would be something like:

stack: 24B
heap: 0-∞

But I don’t know if this is in any way useful.
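The split between inline and heap size can be sketched as follows. The helper names inline_bytes and heap_bytes are hypothetical. The 24B figure above corresponds to a Vec on a typical 64-bit target; that layout is an implementation detail, so the sketch prints the value rather than assuming it.

```rust
use std::mem::{size_of, size_of_val};

/// Bytes stored inline (on the stack or inside a parent struct).
fn inline_bytes<T>(v: &Vec<T>) -> usize {
    size_of_val(v)
}

/// Bytes reserved on the heap for elements, based on capacity.
/// Allocator overhead is ignored.
fn heap_bytes<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}

fn main() {
    let empty: Vec<u8> = Vec::new();
    let big: Vec<u8> = vec![0u8; 1_000_000];

    // The inline size is the same; only the heap side differs.
    println!("empty: inline {} B, heap {} B", inline_bytes(&empty), heap_bytes(&empty));
    println!("big:   inline {} B, heap {} B", inline_bytes(&big), heap_bytes(&big));
}
```

The heap figure is only known at run time, which is why a static tool could at best report it as a range.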

kornel · January 6, 2019, 2:27pm

It’s worth noting that Clippy already has a warning about surprising enum sizes (e.g. an enum with one variant holding a u32 and another holding a [u8; 1000]).
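A sketch of the kind of enum that warning targets, and the usual fix of boxing the large variant. The enum names are hypothetical, and the lint referred to is assumed to be Clippy's large_enum_variant.

```rust
use std::mem::size_of;

// Every value of this enum is as large as its biggest variant.
#[allow(dead_code, clippy::large_enum_variant)]
enum Unboxed {
    Small(u32),
    Large([u8; 1000]),
}

// Boxing the large variant keeps the enum itself small.
#[allow(dead_code)]
enum Boxed {
    Small(u32),
    Large(Box<[u8; 1000]>),
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());
}
```

The cost of the boxed form is a heap allocation and a pointer dereference whenever the large variant is used.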

If the goal is to warn about excessive stack usage, or maybe too much copying for return types, such things can be added to Clippy. That’d work better than checking docs manually type by type.
