Idea: guesstimate of size of datatype


#1

Say you have the following situation:


struct Worker {
    person: Person,
    job: Job
}

struct Person {
    first_name: String,
    last_name: String,
    dob: DateTime,
    address_line_1: String,
    address_line_2: String,
    address_line_3: String,
    // lots more fields here
}

struct Job {
    // similarly to person, lots of fields
}

You might think that the size of Worker is small, but when you drill down it's actually very big. Having this information might change design decisions, like when to Box. Could rustdoc give (or estimate) the size of any data structures it documents?

In this case it would be fairly easy to drill down and see, but sometimes there are many levels of complex types, possibly generic, and it becomes harder to tell.


#2

The main problem I see is that size information is usually private, in the semver sense. I think the only thing we should expose, if any, is “this struct fits in a cache line on whatever architectures”, since that’s approximately the main consideration for boxing.

As for the generics consideration, I think that “minimum size of T to fit in a cache line” is probably what you want?
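For illustration, a "fits in a cache line" check can be sketched with std::mem::size_of. The 64-byte line size is an assumption (common on x86-64, different on other targets), and the names ASSUMED_CACHE_LINE_BYTES, fits_in_cache_line, Small and Large are hypothetical, not anything rustdoc provides:

```rust
use std::mem::size_of;

/// Assumed cache-line size in bytes. 64 is common on x86-64, but this is an
/// assumption: other targets use different sizes.
const ASSUMED_CACHE_LINE_BYTES: usize = 64;

/// Returns true if a value of type `T` is no larger than the assumed cache line.
/// This only compares sizes; whether a given value straddles two lines also
/// depends on where it is placed in memory.
const fn fits_in_cache_line<T>() -> bool {
    size_of::<T>() <= ASSUMED_CACHE_LINE_BYTES
}

/// Hypothetical stand-in for a small type.
struct Small {
    id: u64,
    flags: u32,
}

/// Hypothetical stand-in for a large type.
struct Large {
    payload: [u8; 256],
}
```

With these definitions, fits_in_cache_line::<Small>() is true and fits_in_cache_line::<Large>() is false. The answer is a single boolean per target, which is much less of a commitment than an exact byte count.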


#3

As I understand it, semver relates to how the machine sees our code and reasons about compatibility. Exposing a size hint in rustdoc seems like a very different thing with probably different tradeoffs.

I do see how differences in padding and type sizes across targets might make this harder, but overall I think it would be useful for rustdoc to maybe float this information somehow (probably more than a boolean to reflect whether it fits in a cache line, but not quite an exact number of bytes?).


#4

Exposing the exact size makes the size an API commitment. Exposing “fits in a cache line” is only questionably an API commitment.


#5

It would only ever be a best estimate. It would differ between architectures and between versions of the compiler. There would be no promises made about its accuracy; it's just a hint.

What it does do is act as a hint on where to profile - if a struct is big, profiling with and without boxing is probably worth doing. If it’s smaller than a pointer, probably not.


#6

You can use std::mem::size_of to find the size of a struct in bytes but, as others have noted, don't rely on this staying the same across different compilers and architectures.
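A minimal sketch of what that looks like, using hypothetical stand-ins for the Person and Job structs from the first post (fixed-size byte arrays instead of the real fields, so the numbers are easy to follow):

```rust
use std::mem::size_of;

// Hypothetical stand-ins for `Person` and `Job`; the real structs would hold
// strings, dates and so on, and their sizes would vary by target.
struct Person {
    name: [u8; 128],
    address: [u8; 256],
}

struct Job {
    title: [u8; 64],
    description: [u8; 512],
}

struct Worker {
    person: Person,
    job: Job,
}

/// Size in bytes of a `Worker` stored inline (on the stack or inside a parent).
fn inline_worker_size() -> usize {
    size_of::<Worker>()
}

/// Size in bytes of the handle when the `Worker` is boxed: a single pointer.
fn boxed_worker_size() -> usize {
    size_of::<Box<Worker>>()
}
```

Printing both, for example with println!("{} vs {}", inline_worker_size(), boxed_worker_size()), shows the inline struct is far larger than the boxed handle. That is the kind of number that hints at where profiling with and without Box might be worthwhile.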


#7

This is what you would use to provide the hint.


#8

If the size of a type matters, then you should probably check the size of your types anyway. And how would generic types be handled? In general you can't know the size of a generic type before monomorphization.
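To make the generics point concrete, here is a small sketch with a hypothetical Tagged<T> type. There is no single size for Tagged itself; size_of only has an answer once T is fixed:

```rust
use std::mem::size_of;

/// Hypothetical generic type: its size is unknown until `T` is chosen.
struct Tagged<T> {
    tag: u32,
    value: T,
}

/// Size in bytes of one concrete instantiation of `Tagged`.
fn tagged_size<T>() -> usize {
    size_of::<Tagged<T>>()
}
```

Here tagged_size::<u8>() and tagged_size::<[u8; 64]>() give different answers, so any documentation tool would have to pick concrete instantiations or describe the size as a function of T.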


#9

I would just not give estimates for type constructors, only concrete types.


#10

There’s already a similar lint in Clippy; I argue that if anything, this should be a Clippy lint which is maybe allow-by-default or warn-by-default. So instead of printing the size of every single data structure, there could exist a warning for extremely large structures which might need to be broken up into parts, boxed, etc.

Exactly – and in addition, if something is exposed, people will rely on it, no matter how many flashing red warnings saying “THIS IS ONLY AN ESTIMATE AND ALWAYS SUBJECT TO CHANGE” there might be.


#11

The (Nightly) compiler can already do this, via the -Zprint-type-sizes option. It’s very effective, I’ve used it myself on multiple occasions. See this blog post for details.

(That doesn’t involve rustdoc, which means the sizes don’t appear in documentation, so I’m not sure if it meets your requirements. If not, at least the machinery is already in place within the compiler, and presumably could be hooked up to rustdoc with some effort…)


#12

I don’t really have any requirements, I was just throwing the idea up. :)


#13

Please don’t “throw ideas up”. The language has more ideas than it can deal with, and there is a very real cost to every addition, and even to the evaluation of ideas. If there isn’t a big real need for something, forget it.


#14

I disagree. This is not an RFC or pre-RFC; it’s just a thread in the internals forum, which has plenty of capacity for ideas. It’s one thing if someone, say, spams the forum with a dozen half-baked ideas over the course of a month, but this OP hasn’t done that.

It’s not a bad idea, either… especially since it’s not proposing a core language feature or anything that would be subject to stability guarantees, just an implementation feature, and one that would be relatively easy to implement.


#15

Fair enough


#16

The docs currently allow trivially looking at the implementation of any type, which is the strongest statement that can be made in terms of stability. Stating the size of a type on a given architecture couldn’t possibly suggest a higher guarantee in terms of things not changing compared to viewing the source.


#17

I have a slightly different concern ‒ could it be misleading? Specifically:

struct Indirect(Box<SomethingReallyHuge>);

This would show a small number, so one could go and create a Vec<Indirect> with a lot of elements and be surprised at how these 8-byte structures ate all the RAM.

So maybe having a (collapsed by default) size analysis that would say 8 bytes inline, but some arbitrary amount on the heap?
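A rough sketch of that split for the Indirect example. SomethingReallyHuge is given a hypothetical 1 MiB payload here purely for illustration, and the heap figure ignores allocator overhead:

```rust
use std::mem::size_of;

/// Hypothetical large payload: 1 MiB of bytes.
struct SomethingReallyHuge {
    data: [u8; 1024 * 1024],
}

struct Indirect(Box<SomethingReallyHuge>);

/// Bytes stored inline for each `Indirect`: just the pointer inside the `Box`.
fn inline_bytes() -> usize {
    size_of::<Indirect>()
}

/// Bytes each `Indirect` owns on the heap, ignoring allocator overhead.
fn heap_bytes() -> usize {
    size_of::<SomethingReallyHuge>()
}
```

On a 64-bit target inline_bytes() is 8 while heap_bytes() is 1,048,576, so a Vec<Indirect> with a thousand elements holds about 8 KB inline but roughly 1 GB on the heap. A Box is the easy case because the pointee size is fixed; that does not carry over to growable containers.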


#18

If you start counting indirect memory usage, it gets hairy. For Vec usage there is an extremely odd distribution: most are empty, some are huge. With a type like Vec<Vec<u8>> you don’t know if it typically costs nothing or takes 90% of your RAM.


#19

That’s actually what I was trying to say. If I were rustdoc, I wouldn’t dare to claim this type is small. Its stack representation is small, but that is misleading, as that’s only half of the message. The best/most accurate answer I could give would be something like:

  • stack: 24B
  • heap: 0-∞

But I don’t know if this is in any way useful.
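To show why the heap half can't come from the type alone, here is a sketch for Vec<Vec<u8>>. The inline part is a compile-time constant, while the heap part needs a live value. The function names are hypothetical, and the heap figure counts reserved capacity only, ignoring allocator overhead:

```rust
use std::mem::size_of;

/// Bytes of the `Vec<Vec<u8>>` handle itself, known at compile time.
fn inline_bytes() -> usize {
    size_of::<Vec<Vec<u8>>>()
}

/// Heap bytes currently reserved by `v`, ignoring allocator overhead.
/// This can only be computed at runtime, from an actual value.
fn heap_bytes(v: &Vec<Vec<u8>>) -> usize {
    // The outer buffer holds one `Vec<u8>` handle per slot of capacity.
    let outer = v.capacity() * size_of::<Vec<u8>>();
    // Each inner vector reserves one byte per slot of its own capacity.
    let inner: usize = v.iter().map(|row| row.capacity()).sum();
    outer + inner
}
```

An empty Vec::new() reports 0 heap bytes, while vec![vec![0u8; 1000]; 10] reports at least 10,000, yet both have the same inline size (24 bytes on typical 64-bit targets). That matches the "stack: 24B, heap: 0-∞" summary above.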


#20

It’s worth noting that Clippy already has a warning about surprising enum size (e.g. if you have an enum with one variant holding a u32 and another holding a [u8; 1000]).
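As far as I know the lint in question is large_enum_variant. A sketch of the situation it targets, with hypothetical enum names, and of the usual fix of boxing the big variant:

```rust
use std::mem::size_of;

/// Every value of this enum is as large as its biggest variant,
/// even when it only holds the `u32`.
enum Unboxed {
    Small(u32),
    Big([u8; 1000]),
}

/// Boxing the large variant keeps the enum itself small.
enum Boxed {
    Small(u32),
    Big(Box<[u8; 1000]>),
}

/// Size in bytes of the enum with the array stored inline.
fn unboxed_size() -> usize {
    size_of::<Unboxed>()
}

/// Size in bytes of the enum with the array behind a `Box`.
fn boxed_size() -> usize {
    size_of::<Boxed>()
}
```

unboxed_size() is at least 1000 bytes, while boxed_size() is no more than two pointers wide. The cost of the second form is an allocation and an extra indirection whenever the Big variant is used.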

If the goal is to warn about excessive stack usage, or maybe too much copying for return types, such things can be added to Clippy. That’d work better than checking docs manually type by type.