Infinite types as unsized?

The following code currently fails to compile:

enum Nested {
    Zero,
    One(Nested)
}

The error is:

error[E0072]: recursive type `Nested` has infinite size
 --> src/lib.rs:3:1
  |
3 | enum Nested {
  | ^^^^^^^^^^^ recursive type has infinite size
4 |     Zero,
5 |     One(Nested)
  |         ------ recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `Nested` representable
  |
5 |     One(Box<Nested>)
  |         ^^^^      ^

But this seems to be an odd way of putting it, because in order to get an 'infinite' size, you'd have to construct it in some sort of infinite loop. So I'm wondering why such a type could not instead be unsized, and require representation using a fat pointer that carries the nesting depth.

This would be a lot more convenient than boxing everything and/or setting up an arena and passing it around to ensure locality...

6 Likes

The description is correct. The type has infinite size.

Something of infinite size does not need to be constructed in steps. You can easily define a mathematical construct that is infinite just by definition alone (like here).

The fact that somewhere down the wrapping layers you may eventually come to a Zero doesn't mean the type does not need to allocate space for the One case. So it has to allocate enough space for all possible instances of this type.

I remember the definition of infinite size basically means something like this: given n where n is any number, size > n. You may Google it to get the exact definition.

You can try to proof that type is infinite by first assuming that it is not, then you'll easily construct a proof by induction that comes to a contradiction.

Of course, this assumes the enum structure has an overhead which is non-zero (otherwise the type is zero and it is not infinite).

On the other hand, unsized is the wrong description because it says the type has "no size", or "unknown size" or a size that has no meaning. None of these is true.

The type has a well-known exact size that is valid: and that size is infinite.

5 Likes

@schungx The type has an infinite amount of terms but we could find a way to store each term using a finite amount of memory. This is similar to [T], such types are called unsized in rust.

I think we can find a way to represent nested enums as unsized types. This representation has to be different than the normal representation of enums (as tagged unions) however I think this is acceptable because we already have different representations for enums where we can exploit a niche (Option<&T> is the same size as &T). One way to do this as @skysch points out would be to store the nesting depth in the fat pointer. I think it would also be possible with thin pointers in a way similar to C strings.

Such nested enums would correspond to W-types in type theory. These are a kind of inductive types. Examples can be found in dependently typed languages like agda. The nested enum @skysch discusses corresponds to the natural numbers which can be defined in agda as follows:

data ℕ : Set where
    zero : ℕ
    suc : ℕ → ℕ
4 Likes

Here's an example on how to store Nested as a DST! We can even pretty-print it and use match

enum Nested {
    None,
    Cons {
        item: char,
        cons: Self,    
    }
}

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=8795b4d8acefdae9be59443a29114e8c

the same works for structs;

struct Foo {
    ..data,
    this: Option<Self>,
}
// into
struct FooNonRecursive {
   ..data,
    this: bool,

let ptr: *const FooNonRecursive = /*  -- */;

for i in 0.. {
    unsafe {
        let foo = *ptr.add(i);
        if !foo.this { break }
        // use foo
    }
}
2 Likes

Rust's "unsized" types actually still have a fixed finite size known when the object is constructed. The difference is only that this size is stored in the fat pointer or the vtable of the object.

So they're more like run-time-sized, rather than of unknown size.


Do you have an example of a more complex structure that would benefit from it?

Because in this case the data boils down to a number of Ones followed by a single Zero like this:

One, One, One, One, One, …, One, Zero

and you could express it as:

struct NotNested<const N: usize>(ArrayVec<One, N>, Zero)
5 Likes

This is a very weak argument, but it's the strongest one I could come up with at the moment...

It might be useful when constructing balanced trees of some fixed size, similar to how a slice's length is always known. That may be of help if you try to place the tree on the stack for some reason.

Note that even if that enum was considered unsized you still wouldn't be able to define it because enum fields can't be unsized.

No, you are mistaken, and the error message is correct. The necessary allocation size of an enum is the maximal possible size of its largest variant. This size doesn't shrink just because at runtime, a smaller variant happens to inhabit the enum. The following prints 16, even though the actual value would only require 2 bytes of storage:

use std::mem::size_of_val;

enum SmallAndBig {
    Small(u8),
    Big(u64),
}

fn main() {
    let v = SmallAndBig::Small(0);
    println!("{}", size_of_val(&v));
}

So, a self-containing enum always has an infinite necessary allocation size, regardless of what value it happens to dynamically hold.

7 Likes

I think you're right, and it was easiest for me to see when I think about appending to a mutable slice (which you can't do for good reason,) but for which what I proposed here would be equally impossible but also necessary for it to be of any real use.

Note that the reason for which this problem doesn't show up in Haskell is that the Haskell type is implicitly equivalent to

enum Nested {
  Zero,
  One(Arc<Nested>)
}

That type compiles flawlessly in Rust.

1 Like

This is technically correct for now but this is because it layout must be such that it can hold any variant and there are no !Sized enums. It seems very interesting to toy with the idea of allowing dynamically sized enums. In that case, the static size constraint is not an issue and the representation might be just as large as required for the actual variant. Of course, this is a bit of a design challenge since it would be not safe to change the active variant but we can't write/copy an unsized value directly safely so it might be feasible.

Sure, but I don't count those as self-containing types, because they contain pointers, and they don't contain themselves directly.

As others pointed out, currently, an enum type always has a fixed size, and if a fixed size is assumed the recursive enum type cannot exist.

To allow such a recursive enum type, two features are necessary.

Allow dynamically sized enum type

Like dynamically sized structs, an enum type can automatically make dynamically sized if one of the variants has a dynamically sized field at the end.

I found an old RFC issue https://github.com/rust-lang/rfcs/issues/1151. There are some subtleties regarding where to put enum tag, as the tag is necessary as the metadata.

Automatically make otherwise infinitely sized enum types to dynamically sized

If DST enum is implemented, it is possible to make recursive enum "just work" by making it dynamically sized automatically.

However, I think automatic dynamically sizing is problematic when multiple types are mutually recursive, as there are multiple ways to "solve" the infinite-ness. Consider these mutually recursive enum types:

enum E1 {
    X,
    Y(E2),
}

enum E2 {
    X,
    Y(E1),
}

Currently, there are two minimal ways to make it work by boxing: either modifying E1 to have X(Box<E2>), or modifying E2 to have Y(Box<E1>). Respectively, either E1 or E2, or both can be made dynamically sized, but the compiler has to determine which way is preferred. I guess there are more issues when generics are involved.

Thus, I think, requiring an explicit keyword or something is a sensible way to support the recursive dynamically sized types.

I would not expect the compiler or the program to pick one or the other to make dynamically sized. Rather, I would expect the compiler to lay everything out in-line, and there's only one way to do that for a given value- or in other words the compiler would simply make both dynamically sized.

For example E1::Y(E2::Y(E1::X)) would (like a slice) just be a sequence of tags, and finding its size would involve reading successive tags until you find the last one, corresponding to a variant with no dynamically sized field.

This is similar to how inheritance is laid out- any extra fields in X or Y go first at a fixed offset, much like a base class, then the dynamically sized field follows (and potentially includes a dynamically sized field of its own), like a derived class.

1 Like

I beg to differ. [T] is unsized because it has a definite size (the actual number of items). Only that size is unknown.

A recursive enum has infinite size simply because an enum requires it to have a definite size for all possible instances. If Rust starts implementing enum potentially with a count, then it will be unsized. But right now, sizeof(enum) >= largest instance, and the size of the largest instance is unbounded.

EDIT: Just realized @pcpthm 's answer is much better than mine!

If unsized enums were to become a thing, care should be taken, that beginners don't just stumble upon them by accident. Having someone ask the community why they get the recursive error and getting a proper explanation is worth more than silently turning the enum into an unsized type, which will most likely cause worse error messages down the line.

IMO, explicitly annotating an enum with dyn seems like a fitting solution. That way it's obvious at glance, that an enum is going to have different layout rules applied to it, which I'd consider acceptable and the added complexity is opt-in. Therefore, people who don't need it won't have to deal with it.

Personally, I never thought about unsized enums, but I imagine it'd have the same design issue as structs, that'd like to contain 2+ unsized fields or not have the unsized field at the end.

There would most certainly be somewhat hidden costs around dereferencing fields beyond an unsized fields, which may turn out to be either unfeasible or non-trivial to design, because multiple valid designs likely exist. We don't even have a stable trait object layout, yet, which suggests, that unsized type layouts are very tough or impossible to find an optimal solution for.

2 Likes

why not just, enum Foo: ?Sized {} or enum Foo where Self: ?Sized {}