Memory-layout-compatible enum/struct subsets

Add a way to subdivide an enum or struct into subsets of its variants/fields, give each subset a type that can be used like any other enum/struct, and make these types compatible in memory layout so that they can be converted with zero overhead using a simple memcpy. The syntax used here is mainly for the sake of argument; this could be done in multiple ways.

// See the motivation section for examples that might make more sense
#[subset(GeneralError: A, D, E)]
#[subset(SendingError: B, D, E)]
#[subset(EncodingError: B, E)]
enum Error {
    A,
    B(u32),
    C{x: u32, y: u32, z: u32},
    D,
    E,
}

// Each subset works just like any other type
fn example_1() -> Result<(), Error> { todo!() }
fn example_2() -> Result<(), GeneralError> { todo!() }
fn example_3() -> Result<(), SendingError> { todo!() }
fn example_4() -> Result<(), EncodingError> { todo!() }

// Automatic implementation of `From`, to allow this:
fn example_5() -> Result<(), Error> {
    Ok(example_2()?)
}
// EncodingError (B, E) is itself a subset of SendingError (B, D, E)
fn example_6() -> Result<(), SendingError> {
    Ok(example_4()?)
}
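For comparison, here is a hand-written sketch of the From implementations such a feature might generate, and how ? relies on them. The enum names come from the example above; the function names encode, send and run are hypothetical, and the conversions are ordinary matches with no memory-layout guarantee, which is exactly the gap the proposal wants to close.

```rust
// Hand-written stand-ins for what #[subset(...)] could generate.
// No memory-layout guarantee: each From is an ordinary match.
#[derive(Debug, PartialEq)]
pub enum Error {
    A,
    B(u32),
    C { x: u32, y: u32, z: u32 },
    D,
    E,
}

#[derive(Debug, PartialEq)]
pub enum SendingError { B(u32), D, E }

#[derive(Debug, PartialEq)]
pub enum EncodingError { B(u32), E }

impl From<SendingError> for Error {
    fn from(e: SendingError) -> Self {
        match e {
            SendingError::B(n) => Error::B(n),
            SendingError::D => Error::D,
            SendingError::E => Error::E,
        }
    }
}

// EncodingError (B, E) is also a subset of SendingError (B, D, E).
impl From<EncodingError> for SendingError {
    fn from(e: EncodingError) -> Self {
        match e {
            EncodingError::B(n) => SendingError::B(n),
            EncodingError::E => SendingError::E,
        }
    }
}

// Hypothetical function that always fails with an encoding error.
fn encode() -> Result<(), EncodingError> {
    Err(EncodingError::B(7))
}

// `?` calls From::from on the error, so no explicit conversion is written.
fn send() -> Result<(), SendingError> {
    Ok(encode()?)
}

fn run() -> Result<(), Error> {
    Ok(send()?)
}
```

Note that From is not transitive: there is no From<EncodingError> for Error here, so a generator would have to emit an impl for every subset/superset pair it wants to support.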

The subset types could also be accessed with something like Error::SendingError::D.

Note that we can currently do this via macros, but that wouldn't guarantee memory-layout compatibility and may therefore require some code when converting, for example, from SendingError to Error. This is worse for structs, but I think enums are the more useful application of memory-layout-compatible subsets.
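A minimal sketch of the macro route, assuming unit-only variants to keep it short. The macro name subset_enum and its "Subset of Parent { ... }" input syntax are hypothetical, and the variant payloads from the example are dropped. The generated types share variant names but nothing ties their layouts together.

```rust
// Hypothetical declarative macro: defines a subset enum of unit variants
// and the From conversion into the parent. No layout relationship is
// guaranteed between the two types.
macro_rules! subset_enum {
    ($subset:ident of $parent:ident { $($variant:ident),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        pub enum $subset {
            $($variant),+
        }

        impl From<$subset> for $parent {
            fn from(e: $subset) -> Self {
                match e {
                    $($subset::$variant => $parent::$variant),+
                }
            }
        }
    };
}

// Parent enum with payloads removed for brevity.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Error { A, B, C, D, E }

subset_enum!(GeneralError of Error { A, D, E });
subset_enum!(EncodingError of Error { B, E });
```

Supporting data-carrying variants would need a more elaborate macro (or a procedural macro), which adds to the boilerplate the proposal is trying to remove.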

EDIT: It is possible with macros if you tell the compiler which representation you want the structs to have and add the padding manually where needed (though I don't know how well that works with enums).

One question I cannot answer is whether these sub-types need to have the same size as the outer enum/struct, or whether they could be smaller (as long as everything fits, of course). Under some circumstances they might require padding to achieve the same memory layout, at which point (if the subset is small enough) it could be more efficient to drop or loosen the layout requirement, allowing a smaller type at the cost of a bit more overhead when converting.

Note that for enums (at least if the data is close to the discriminant) that shouldn't matter much, though they might inherit some padding requirements from the outer type. Structs, however, can have subsets that cannot all share the parent's layout at the same time without added padding.
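To make the struct padding issue concrete, here is a sketch using #[repr(C)], which fixes field order and offsets. The struct names Parent, SubsetAB and SubsetAC are hypothetical; std::mem::offset_of! requires Rust 1.77 or later. The subset {a, b} lines up with the parent for free, while {a, c} needs an explicit filler where b would be.

```rust
use std::mem::{offset_of, size_of};

// Parent struct with a fixed, C-compatible layout: a @ 0, b @ 4, c @ 8.
#[repr(C)]
pub struct Parent {
    pub a: u32,
    pub b: u32,
    pub c: u32,
}

// Subset {a, b}: already at the same offsets as in Parent.
#[repr(C)]
pub struct SubsetAB {
    pub a: u32,
    pub b: u32,
}

// Subset {a, c}: without the filler, `c` would land at offset 4
// instead of 8, so the padding costs as much space as `b` itself.
#[repr(C)]
pub struct SubsetAC {
    pub a: u32,
    _pad: [u8; 4],
    pub c: u32,
}
```

Here SubsetAC ends up as large as Parent, which is the kind of case where loosening the layout requirement might be worth it.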

To be more specific: For enums this subset-type would:

  • Always be smaller than or the same size as the parent
  • Have the same discriminant location as the parent (e.g. the first byte)
  • Use the same discriminant values as the parent (though only some of them)
  • For the purpose of pattern matching, be treated as unable to represent the variants that are not part of the subset
  • Otherwise behave like any other type (as far as I know, Rust currently makes no guarantees about type layout unless a representation is specified).
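The first three properties can already be approximated by hand with #[repr(u8)] and explicit discriminants (allowed on enums with fields since Rust 1.66). In this sketch the tag method is hypothetical; reading the first byte through a pointer cast is the approach the std::mem::discriminant documentation describes for primitive representations. The conversion is still a match; the point is only that tags and payload offsets agree, so a compiler could in principle lower it to a copy.

```rust
use std::mem::size_of;

// Parent and hand-written subset sharing discriminant values and location.
// With #[repr(u8)] the u8 tag is stored first, followed by the payload.
#[repr(u8)]
#[derive(Debug, PartialEq)]
pub enum Error {
    A = 0,
    B(u32) = 1,
    C { x: u32, y: u32, z: u32 } = 2,
    D = 3,
    E = 4,
}

#[repr(u8)]
#[derive(Debug, PartialEq)]
pub enum SendingError {
    B(u32) = 1,
    D = 3,
    E = 4,
}

impl Error {
    pub fn tag(&self) -> u8 {
        // SAFETY: #[repr(u8)] guarantees the first byte is the u8 tag.
        unsafe { *(self as *const Self as *const u8) }
    }
}

impl SendingError {
    pub fn tag(&self) -> u8 {
        // SAFETY: same reasoning as for Error::tag.
        unsafe { *(self as *const Self as *const u8) }
    }
}

impl From<SendingError> for Error {
    fn from(e: SendingError) -> Self {
        match e {
            SendingError::B(n) => Error::B(n),
            SendingError::D => Error::D,
            SendingError::E => Error::E,
        }
    }
}
```

On typical targets SendingError should come out smaller than Error, since only the parent has to make room for the three u32 fields of C.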

The subsets might introduce a change in the memory layout of the parent for optimization (e.g. making the subset size smaller).

Generics would be the same as (or the corresponding subset of) those of the "parent" type (sorry for the object-oriented naming).

Other than adding these new types and some From implementations, their existence wouldn't have to impact backwards compatibility of the existing (parent) types, and they would look and behave like any other type (except for further restrictions on the memory representation).
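One possible reading of the generics rule, sketched by hand: a subset keeps the parent's type parameter if it contains a variant that uses it, and drops it otherwise. Both the rule interpretation and the trimmed-down variants are assumptions for illustration.

```rust
// Generic parent (variant C omitted for brevity).
#[derive(Debug, PartialEq)]
pub enum Error<T> {
    A,
    B(T),
    D,
    E,
}

// Subset containing B(T): keeps the type parameter.
#[derive(Debug, PartialEq)]
pub enum SendingError<T> {
    B(T),
    D,
    E,
}

// Subset without any variant using T: no parameter needed.
#[derive(Debug, PartialEq)]
pub enum GeneralError {
    A,
    D,
    E,
}

impl<T> From<SendingError<T>> for Error<T> {
    fn from(e: SendingError<T>) -> Self {
        match e {
            SendingError::B(t) => Error::B(t),
            SendingError::D => Error::D,
            SendingError::E => Error::E,
        }
    }
}

// Converts into the parent for every choice of T.
impl<T> From<GeneralError> for Error<T> {
    fn from(e: GeneralError) -> Self {
        match e {
            GeneralError::A => Error::A,
            GeneralError::D => Error::D,
            GeneralError::E => Error::E,
        }
    }
}
```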

Motivation

Enums

Many crates have a single Error type used everywhere. However, I have now several times been in the situation where I wanted a function to use only a subset of its variants, for example because the function can only ever produce some of them:

// Shortened for brevity
pub enum Error { Encryption, Decryption, Serialization, IO }

// Subset that can be used internally (and potentially externally if the respective functions are public)
pub enum EncryptionError { Encryption, Serialization }

impl From<EncryptionError> for Error {
    fn from(e: EncryptionError) -> Self {
        match e {
            EncryptionError::Encryption => Error::Encryption,
            EncryptionError::Serialization => Error::Serialization,
        }
    }
}

pub fn normal_function() -> Result<(), Error> { todo!() }
pub fn encryption_function() -> Result<(), EncryptionError> { todo!() }

At the moment this requires either defining a second error type (as shown above) or using pattern matching with unreachable!() when calling that function somewhere that doesn't have, need, or want to use Error:

// Here encryption_function is assumed to return Result<(), Error> instead
fn perfectly_normal_function_that_only_uses_encryption() -> Result<(), SomeOtherType> {
    match encryption_function() {
        Err(Error::Encryption) => return Err(SomeOtherType::Encryption),
        Err(Error::Serialization) => return Err(SomeOtherType::Serialization),
        // The compiler cannot prove the other variants never occur
        Err(_) => unreachable!(),
        Ok(()) => {}
    }
    Ok(())
}

This leads to one or more of:

  • extra boilerplate (a separate type plus conversions between them, because they don't necessarily have the same memory representation) AND/OR
  • the compiler not being able to ensure all cases have been handled, leading to panics (unless using no_panic) AND/OR
  • no indication of which errors can actually occur.
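With a dedicated subset type, the match above needs no catch-all arm: the compiler knows every variant that can occur, and adding a new variant to the subset turns into a compile error instead of a runtime panic. In this sketch SomeOtherType's extra Network variant and the body of encryption_function are hypothetical stand-ins.

```rust
#[derive(Debug, PartialEq)]
pub enum EncryptionError { Encryption, Serialization }

// Hypothetical caller-side error type with more variants.
#[derive(Debug, PartialEq)]
pub enum SomeOtherType { Encryption, Serialization, Network }

// Stand-in implementation that always fails with a serialization error.
pub fn encryption_function() -> Result<(), EncryptionError> {
    Err(EncryptionError::Serialization)
}

pub fn only_uses_encryption() -> Result<(), SomeOtherType> {
    // Exhaustive: no `_` arm and no unreachable!() required.
    match encryption_function() {
        Ok(()) => Ok(()),
        Err(EncryptionError::Encryption) => Err(SomeOtherType::Encryption),
        Err(EncryptionError::Serialization) => Err(SomeOtherType::Serialization),
    }
}
```

The price is the separate type and conversion code, which is what a built-in subset would make unnecessary.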

Structs

Sometimes you want a function that produces part of a struct, with the result ending up in the larger type. You can do this using a sub-field, but that becomes cumbersome when the object is not constructed via that function, and with more overlap, e.g. two overlapping subsets of fields, it becomes even more chaotic:

struct A {
    b: B,
    // More fields
}

struct B { /* Many fields */ }

// Could of course be an associated function of B
fn something_creating_b() -> B { todo!() }

// A function that doesn't want to bother with this
fn normal_function() -> A {
    A {
        b: B { /* ... */ },
        // ...
    }
}

As with enums: while this can be done with separate types that both contain the fields, plus .into(), it is still cumbersome, requires boilerplate and potentially has runtime overhead because the types don't have compatible memory representations.

Note, though, that for structs the From implementations would go in the opposite direction to those for an enum: the parent enum can represent every variant of its subsets, whereas a parent struct can only get some of its fields from a subset type.
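A small sketch of that asymmetry, with the hypothetical types Config (parent) and Endpoint (subset): projecting parent to subset is a natural From, while going back needs the missing fields supplied explicitly, here through a hypothetical into_config method.

```rust
// Hypothetical parent struct and a subset of its fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub host: String,
    pub port: u16,
    pub retries: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub host: String,
    pub port: u16,
}

// Parent -> subset: always possible, the parent has every field.
impl From<Config> for Endpoint {
    fn from(c: Config) -> Self {
        Endpoint { host: c.host, port: c.port }
    }
}

impl Endpoint {
    // Subset -> parent: the remaining fields must come from somewhere,
    // so this cannot be a plain From.
    pub fn into_config(self, retries: u32) -> Config {
        Config { host: self.host, port: self.port, retries }
    }
}
```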

Related:

  • Create syntax and support for pattern matching based on type wrapped inside enum: the goal presented there, and the issues mentioned in the comment, could be solved by allowing pattern matching on these sub-types, though that might require some changes to how pattern matching is implemented in the compiler.
  • There is the superstruct crate, which adds something similar (plus an enum containing all variants/subsets), but as far as I know it only does so for structs. It is more flexible, as it allows more than just subsets, but as a consequence it cannot provide optimizations based on an identical memory layout, as far as I can tell.

Final words

Sorry for the long post. I'd like to know what you think about this (in terms of usefulness, added complexity for the compiler, and how important you consider the identical memory representation/layout). I think it would add something useful to the language, especially for error types; it might even be useful internally for optimization, and it should not add much complexity (neither to the compiler nor to the language). Nor does it impose additional restrictions on other language features (as far as I can tell).

For the enum case, this could be covered by refined types / pattern types:

With your example, EncodingError would be pattern_type!(Error is Error::B(_) | Error::E).


Yes :+1: (somehow I hadn't seen that one yet, or had forgotten about it).

I wonder how useful it is/would be to allow truncating these types to a smaller memory representation. The main issue would probably be that calling a function on the "normal"/base type would require moving the value in memory to add the missing (padding) bytes, for example for enums where the data of all included variants is stored at the beginning and padding is needed at the end to be as large as a variant that is not included. I'm not sure whether that was mentioned in the topic you linked; I'm still reading.
