Optimized representations


#1

(This is a fork of issue #15714 from the github repository that was closed but could still be discussed usefully.)

Original issue text:

A key worry in a web browser is the size of data. We’d like to know that e.g. an enum with a single pointer constructor and 5 nullary constructors can still fit into a single machine word on 64-bit architectures. All of these optimizations need to work with custom smart pointer types, but we also need to be able to disable them for some smart pointers, e.g. pointers that are tracked by an external GC.


original feedback from strcat:

It’s not possible to do this because Rust allows taking references to the values inside an enum. It can’t use tagged pointers without breaking the language semantics.

Since this would be a language change rather than an optimization in the compiler, it needs to go through the RFC process. I really don’t know how it would fit into the enum system though. It would need to forbid using ref / ref mut for a variant.


#2

I’ve had thoughts on how to encode tagged data that would be compatible with our current strategies for generating compact enum representations.

For example, if we added singleton types like Singleton<const T> (where the generic parameter is a constant of type T, for any type T), where such a type has just one inhabitant (the constant itself), then it seems to me like one could directly encode enums with explicit tags like so:

// assume 64-bit ptr with 8 byte alignment
enum Value<T> {
    I32Val(i32, Singleton<1u32>),
    ChrVal(char, Singleton<2u32>),
    NulVal(Singleton<3u64>),
    PtrVal(Box<T>), // (requires 0b00 suffix on ptrs)
}

This is, I think, a different take on some of the recent RFC’s that have suggested having some way to suggest that certain bit patterns are “invalid” for a particular type, as a way to let libraries encode the representation optimization that we apply to Option.

This design would require some other changes to the language to be really useful (e.g. if you wanted, atop a 32-bit architecture, to support 30-bit fixnums with a 2-bit type tag, then we would need to generalize our numerics to support i30/u30 and i2/u2, etc, as well as a way to ask for a packed representation if we don’t have that already). Still, it is something I’ve been musing about.


#3

I sometimes wonder whether in Rust of the distant future there could be some fancy existentials system being useful for exactly laying out data:

type Something<T> = exists (n: u1) {
    0 => (AnyPad<3>, u4, Box<T>)
    1 => (u1, u1, u1, exists (m: u4) [u8, ..m])
}

Normal Enums would desugar to newtypes around such things.


#4

I’m thinking about defining an untagged union to encompass the variants. Each variant would be a proper type (e.g. a single-value struct (aka newtype)). The union type would provide predicate functions, and conversion/coercion functions to each variant, as well as a few constructors.

The union can implement Deref among other traits as needed.


#5

Out of curiosity, how is this problem solved in C/C++? Or at least how does Gecko deal with it?