Blog post: Extended Enums and Thin Traits

I agree. I was thinking as I was reading the blog post that local or closed might make sense as a name for these traits.

I feel like it is legitimate for representation choices to cause some operations to become unavailable. What would be bad is if the same operation has two very distinct meanings, depending on the representation. It seems like having a thin representation prohibit other crates from implementing the trait falls into the former category; this seems analogous to how using a packed representation will make taking references to fields unsafe. Another example might be allowing enums (at some future point) to use low-order bits to distinguish variants, thus achieving tagged pointers: this would have to forbid taking references into those enums.
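
To make the tagged-pointer idea concrete, here is a minimal hand-rolled sketch in today's Rust. It is not the hypothetical enum feature; `Node`, `pack`, `unpack` and `TAG_MASK` are made-up names, and the 8-byte alignment is forced with `#[repr(align(8))]` so that the three low bits of the address are known to be zero.

```rust
// Hypothetical illustration only: these names are not from the blog post.

#[repr(align(8))] // guarantees the 3 low address bits are zero
struct Node(u64);

const TAG_MASK: usize = 0b111;

/// Packs a small tag into the unused low bits of an aligned address.
fn pack(ptr: *const Node, tag: usize) -> usize {
    let addr = ptr as usize;
    assert_eq!(addr & TAG_MASK, 0, "pointer must be 8-byte aligned");
    assert!(tag <= TAG_MASK, "tag must fit in the low bits");
    addr | tag
}

/// Splits a packed word back into the original pointer and the tag.
fn unpack(word: usize) -> (*const Node, usize) {
    ((word & !TAG_MASK) as *const Node, word & TAG_MASK)
}

fn main() {
    let boxed = Box::new(Node(42));
    let raw: *const Node = &*boxed;
    let (ptr, tag) = unpack(pack(raw, 5));
    assert_eq!(tag, 5);
    assert_eq!(ptr, raw);
    // The payload is only reachable after the tag has been masked off,
    // which is why plain references into such a value could not be handed out.
    assert_eq!(unsafe { (*ptr).0 }, 42);
}
```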

EDIT: As @aturon later pointed out, thin traits can be implemented by other crates -- that's what makes the "open extensibility" I talked about in my post -- but they cannot be implemented for types of other crates. Anyway, the broader point stands.

If #[repr(thin)] prevents implementation of a trait from other crates, isn’t it the same thing as sealed traits (https://github.com/rust-lang/rfcs/search?q=sealed)? That is, it would allow lifting some coherence restrictions as well.
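
For comparison, this is roughly what "sealed" means with the idiom available in today's Rust: a public trait with a supertrait that downstream code cannot name. The sketch uses a module to stand in for a crate boundary, and `shapes`, `Shape` and `Square` are invented for illustration. Note that this blocks all outside implementations, which is stricter than the thin-trait restriction described in the EDIT above.

```rust
mod shapes {
    // Private module: code outside `shapes` cannot name `Sealed`,
    // so it cannot implement `Shape` for any type.
    mod private {
        pub trait Sealed {}
    }

    pub trait Shape: private::Sealed {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);

    impl private::Sealed for Square {}

    impl Shape for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }
}

use shapes::{Shape, Square};

fn main() {
    // Using the trait from outside is fine; implementing it is not.
    assert_eq!(Square(3.0).area(), 9.0);
}
```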

repr(packed) already has some strong semantic impact (internal references are unsafe). Arguably repr(C) and the rest also have a semantic impact if alignment and size are important aspects to a program’s correctness (e.g. binding to hardware interfaces).
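
A small sketch of both effects, assuming a mainstream target where `u32` is 4-byte aligned (the struct names are invented). The representation changes size and alignment, and for the packed struct the field has to be read through a raw pointer because recent compilers reject `&r.value` outright.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[repr(C)]
struct RegC {
    flag: u8,
    value: u32,
}

#[repr(C, packed)]
struct RegPacked {
    flag: u8,
    value: u32,
}

/// Reads the possibly misaligned `value` field without creating a reference.
fn read_value(r: &RegPacked) -> u32 {
    // `addr_of!` yields a raw pointer directly; `read_unaligned` copies the bytes out.
    unsafe { addr_of!(r.value).read_unaligned() }
}

fn main() {
    // repr(C): 3 bytes of padding after `flag` (assumes 4-byte aligned u32).
    assert_eq!(size_of::<RegC>(), 8);
    assert_eq!(align_of::<RegC>(), 4);
    // packed: no padding, alignment 1.
    assert_eq!(size_of::<RegPacked>(), 5);
    assert_eq!(align_of::<RegPacked>(), 1);

    let r = RegPacked { flag: 1, value: 0xDEAD_BEEF };
    assert_eq!(read_value(&r), 0xDEAD_BEEF);
}
```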

I don’t like the situation with packed structs either, but at least that case is very rare. IMO, no one should be exporting packed structs across crates except for, maybe, C bindings. I don’t think packed enums should use repr either, but I’ll argue that case when it comes up.

My main problem with #[repr(thin)] is that it will significantly affect public APIs. Maybe we need an RFC outlining what attributes in general should and should not be used for (I see them as compiler directives but this is going well beyond compiler directives).

That's not the limitation being imposed here. Rather, you can only impl a thin trait for types that you define (where usually, if you define a new trait, you can apply it to existing types). That's because a thin trait influences the struct layout by inserting a vtable.

To be concrete, you can define a thin trait in crate A, and in crate B, you can impl that thin trait for a struct defined in crate B.

I used to feel exactly the same way, and had long pushed for this to be a keyword when we were working through the design. But I've since come around to @nikomatsakis's position: the behavior of traits does not vary at all, and the choice here is entirely about optimization of representation, that happens to also limit (but not change) the cases where the trait can be applied.

It's worth keeping in mind that, due to the orphan rules, the primary place where a trait is applied to types defined elsewhere is in the crate defining the trait -- since that's the only one that can do so arbitrarily. So I suspect this representation choice will mostly affect the trait definer rather than downstream crates in practice.

The difference is that repr(C) (versus repr(packed)) only modifies compiler-level semantics. It doesn't affect Rust as a high-level language, just how the compiled binary interacts with other programs and the hardware. In general, I'm fine having representation affect unsafe operations.

Good point. I withdraw my objection (to repr(thin), I still object to repr(packed) but that ship has sailed).

Unless I'm mistaken, the compiler could theoretically apply repr(thin) automatically to all traits not implemented on foreign types. If that's the case, this really is just a compiler hint saying "compiler, don't allow me to do something that will disallow this optimization" and doesn't affect API.

#[repr(thin)] only affects the local crate’s ability to implement the trait because orphan rules already preclude other crates from implementing that trait for alien types, correct? Is there another reason it wouldn’t be backward compatible to remove a thin repr tag from a trait? Is the performance characteristic of thin pointers always a win whenever you don’t need the flexibility of fat traits?

If the answers to these questions are yes, no, yes, wouldn’t it be optimal for the compiler to just use thin pointers for trait objects unless it can’t?

Implementing a thin trait adds an implicit vtable at the head of the struct. This will affect its size, naturally, and could break unsafe code, as well as other assumptions. It might also affect performance of plain safe code, depending on the ratio of struct-to-object instances (i.e., if you don’t use objects, as is common for traits, you’re just wasting memory). I think it should definitely be something you opt into.
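
Since #[repr(thin)] does not exist in today's Rust, the trade-off can only be modelled by hand. The sketch below is an assumption-laden emulation, not what the compiler would generate: `ShapeVtable`, `ShapeHeader`, `ThinSquare` and `ThinRef` are invented names. It shows the two costs being discussed: the reference shrinks to one word, while every instance of the struct grows by a vtable pointer.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Ordinary (fat) trait object: the vtable pointer travels with the reference.
trait Shape {
    fn area(&self) -> f64;
}

struct PlainSquare {
    side: f64,
}

impl Shape for PlainSquare {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Hypothetical hand-rolled "thin" layout: the vtable pointer lives in the struct.
struct ShapeVtable {
    area: unsafe fn(*const ShapeHeader) -> f64,
}

#[repr(C)]
struct ShapeHeader {
    vtable: &'static ShapeVtable,
}

#[repr(C)]
struct ThinSquare {
    header: ShapeHeader, // first field, so a ThinSquare pointer is also a header pointer
    side: f64,
}

unsafe fn thin_square_area(this: *const ShapeHeader) -> f64 {
    // Sound only because this vtable is installed exclusively in ThinSquare values.
    let square = unsafe { &*(this as *const ThinSquare) };
    square.side * square.side
}

static THIN_SQUARE_VTABLE: ShapeVtable = ShapeVtable { area: thin_square_area };

impl ThinSquare {
    fn new(side: f64) -> Self {
        ThinSquare { header: ShapeHeader { vtable: &THIN_SQUARE_VTABLE }, side }
    }
}

/// One-word handle playing the role of a thin `&Shape`.
#[derive(Clone, Copy)]
struct ThinRef<'a> {
    ptr: *const ShapeHeader,
    _borrow: PhantomData<&'a ShapeHeader>,
}

impl<'a> ThinRef<'a> {
    fn from_square(square: &'a ThinSquare) -> Self {
        ThinRef { ptr: square as *const ThinSquare as *const ShapeHeader, _borrow: PhantomData }
    }

    fn area(self) -> f64 {
        // Dynamic dispatch: the vtable is loaded from the object itself.
        unsafe { ((*self.ptr).vtable.area)(self.ptr) }
    }
}

fn main() {
    let word = size_of::<usize>();
    assert_eq!(size_of::<&dyn Shape>(), 2 * word); // fat: data pointer + vtable pointer
    assert_eq!(size_of::<ThinRef<'static>>(), word); // thin: data pointer only
    assert!(size_of::<ThinSquare>() > size_of::<PlainSquare>()); // the struct pays instead

    let plain = PlainSquare { side: 2.0 };
    let fat: &dyn Shape = &plain;
    assert_eq!(fat.area(), 4.0);

    let thin_square = ThinSquare::new(3.0);
    assert_eq!(ThinRef::from_square(&thin_square).area(), 9.0);
}
```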

This doesn't make sense to me. IIRC, there are two possible use cases here:

  • Either:

struct S{...}
#[repr(thin)]
trait T : S {...}

  • Or:

#[repr(thin)]
trait T {...}

In the former case the same crate provides both the struct and the trait, so coherence doesn't matter. In the latter, the struct could be defined in a client crate, as in your example, but then the upstream crate forces an unnecessary layout decision on its clients without knowing the client struct's size, and it could actually make performance worse (e.g. when the struct fits a cache line exactly and adding a vtable inline makes it bigger than a single cache line). This decision is better made at the same location (by the same person) that defines the data layout, as part of the struct definition in the client crate, and not on the trait.
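
The cache-line concern can be put in numbers with a small sketch. It assumes 64-byte cache lines (common, but not universal) and models the implicit vtable pointer as an explicit `vptr` field; `Particle`, `ParticleWithVptr` and `lines_spanned` are invented for illustration.

```rust
use std::mem::size_of;

const CACHE_LINE: usize = 64; // assumed line size

#[repr(C)]
struct Particle {
    data: [f32; 16], // exactly 64 bytes
}

// Hypothetical layout after a vtable pointer is added at the head of the struct.
#[repr(C)]
struct ParticleWithVptr {
    vptr: *const (),
    data: [f32; 16],
}

/// Number of cache lines touched by a value of `size` bytes that starts on a line boundary.
fn lines_spanned(size: usize) -> usize {
    (size + CACHE_LINE - 1) / CACHE_LINE
}

fn main() {
    assert_eq!(size_of::<Particle>(), 64);
    assert_eq!(size_of::<ParticleWithVptr>(), 64 + size_of::<usize>());
    assert_eq!(lines_spanned(size_of::<Particle>()), 1);
    assert_eq!(lines_spanned(size_of::<ParticleWithVptr>()), 2);
}
```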

As I said on the reddit thread, I really like all the separate pieces of these suggestions, but the way those pieces are put together is wrong IMO. It is backwards, breaks Rust's current very clear and orthogonal design, and violates the separation of concerns principle.
Traits in Rust define interfaces, and concrete types (enums, structs, ...) define data layout. Clearly we want to affect the data layout, and therefore this does not belong in the trait definition.

It is odd that #[repr(thin)], which is a property of the struct (the fact that it has a vtable in it), is marked on the trait. In my mind the trait is where you declare the interface that the data implements, without actually imposing any data (except with the hypothetical struct inheritance stuff, but that is more like adding an explicit contract that some data is present than the implicit vtable pointer here). I assume that marking the trait thin rather than the struct is because of the &-syntax (and transparency of fat pointers in general), where &Foo could be a fat or thin pointer depending on whether the vtable is in the pointer or the structure. I get that we don’t want to add a specific syntax when talking about the pointer itself, but on the other hand it is odd, because the pointer is actually where it matters, so it would make sense to be explicit there instead of on the definition of the interface. I can’t think of a convenient way to express it. A thin pointer to something that implements Foo would look like

&thin Foo

and a fat pointer would remain

&Foo

The declaration of the interface and the way the virtual dispatch is made could be orthogonal, and I think in an ideal world they should be. Perhaps it is too late for something backward-compatible.

What I just wrote would also mean some sort of syntax on the struct, of course. But yeah the more I think about it the more I would be sad that I have to look for the existence of a trait somewhere to know the memory layout of my data.

Let me get this straight, because my first reaction was, I think, based on a misunderstanding of what you wrote. Is this your proposal?

  • Let each struct opt into or out of the vtable for a specific trait.
  • The trait doesn’t care either way and in fact can be implemented both by types with vtable and types without vtable.
  • When producing a &Trait, whether this pointer is thin or not depends on the concrete type from which it is produced, and thus this is tracked in the reference type.

Assuming the above is correct, I have several questions/issues:

  • This doubles the amount of pointer types: You’d need &thin Trait and &mut thin Trait, exponentially more as more dimensions pop up. As satisfying as orthogonality is, pointer type proliferation is a real problem that Rust has been fighting in the past. A really good motivation is needed to add new pointer types.
  • How does this interact with trait objects other than those behind &? What about Box<Trait> and Rc<Trait>?
  • Won’t this split the trait’s ecosystem in two, resulting in either lots of duplication or some functionality only being available for fat/thin pointers respectively? This issue already exists for &mut to some degree. Presumably a &thin Trait could be converted to a fat &Trait, but a &mut can likewise often be reborrowed as &, and that doesn’t completely solve the problem either.

Your understanding of what I wrote is correct.

Whether this doubles the number of pointer types is really a matter of whether you think mut and const double the number of pointer types, and whether the implicit possibility of fat pointers itself doubles the number of pointer types today (when reading &Thing in the code, it can already be either a thin or a fat pointer, even if it is syntactically transparent). I assume you mean that the proliferation of pointer syntax is a problem. But thin would give you a guarantee about data layout and the strategy for dispatch; it doesn’t change the ownership or any other kind of “logical” semantics (the issue back when sigils were removed was that we had syntax for things like ref-counting and owned values).

Trait objects would naturally follow the same rule: Box<thin Foo>, Rc<thin Foo>, etc.

As you said, we would have rules like: a thin Foo can be coerced into a fat Foo, since thin is a constraint on the data type but does not preclude also having a fat pointer to the struct, made by copying its vptr into the fat pointer. I think (but perhaps I am misunderstanding you) that my proposal actually prevents us from splitting the trait system in two! With my proposal you can have types that implement a trait with the thin pointer approach and types that don’t. Without my proposal, however, when you create a trait, you have to decide whether people will use thin or fat pointers, but how do you make this decision? Thin pointers are an optimization for data layout; they are not related to exposing interfaces. Today we have libraries like the standard library that define common traits that can be used by everyone. This means that we can never use these standard traits with a thin pointer approach. If you find yourself in a situation where you need the thin pointer optimization, you will have to duplicate the trait and roll your own ThinWriter, etc. I am certain that everywhere &thin would have divided the trait system (that is, you need both thin and fat pointers to Foo in your code and have incompatibilities between thin and fat references), you would otherwise have had to create a ThinFoo trait, which would cause an even deeper division, since you declare the same interface twice and can’t coerce ThinFoo into Foo.
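
The thin-to-fat coercion "by copying its vptr into the fat pointer" can be sketched with a hand-rolled model. Everything here (`Vtable`, `Header`, `Widget`, `Thin`, `Fat`, `widen`) is hypothetical and only emulates what a compiler might do; it is not an existing language feature.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

struct Vtable {
    id: unsafe fn(*const Header) -> u32,
}

#[repr(C)]
struct Header {
    vtable: &'static Vtable,
}

#[repr(C)]
struct Widget {
    header: Header, // vptr stored inline, at the head of the struct
    id: u32,
}

unsafe fn widget_id(this: *const Header) -> u32 {
    // Sound only because this vtable is installed exclusively in Widget values.
    unsafe { (*(this as *const Widget)).id }
}

static WIDGET_VTABLE: Vtable = Vtable { id: widget_id };

/// One word: just the object address.
#[derive(Clone, Copy)]
struct Thin<'a> {
    obj: *const Header,
    _borrow: PhantomData<&'a Header>,
}

/// Two words: object address plus a copy of the vptr.
#[derive(Clone, Copy)]
struct Fat<'a> {
    obj: *const Header,
    vtable: &'static Vtable,
    _borrow: PhantomData<&'a Header>,
}

impl<'a> Thin<'a> {
    fn new(widget: &'a Widget) -> Self {
        Thin { obj: widget as *const Widget as *const Header, _borrow: PhantomData }
    }

    /// Thin-to-fat: copy the vptr out of the object and into the pointer.
    fn widen(self) -> Fat<'a> {
        let vtable = unsafe { (*self.obj).vtable };
        Fat { obj: self.obj, vtable, _borrow: PhantomData }
    }
}

impl<'a> Fat<'a> {
    fn id(self) -> u32 {
        // Dispatch through the vtable carried by the pointer, not the one in the object.
        unsafe { (self.vtable.id)(self.obj) }
    }
}

fn main() {
    let widget = Widget { header: Header { vtable: &WIDGET_VTABLE }, id: 7 };
    let thin = Thin::new(&widget);
    assert_eq!(size_of::<Thin<'static>>(), size_of::<usize>());
    assert_eq!(size_of::<Fat<'static>>(), 2 * size_of::<usize>());
    assert_eq!(thin.widen().id(), 7);
}
```

The reverse direction has no equivalent in this model: a struct without an inline vptr has nowhere to recover the vtable from once the fat pointer is narrowed.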

In fact, being "thin" is a property of both the trait and the struct. That is, we have to know whether a trait is thin so that we know what size &Trait has -- and of course we don't know the type of the underlying struct there, that's the whole point of objects. On the other hand, as you point out, you have to know whether a struct implements a thin trait because it affects the layout of the struct itself.

I would say that being thin is a property of the pointer and the struct rather than the trait (which I think of as a collection of methods that constitute an interface, so really just a vtable in the case of dynamic dispatch). As a result, from my point of view the size of &Trait should be 2 words and the size of &thin Trait should be 1 word (fat being the default for back-compat).

@nikomatsakis: Sorry if this is blindingly obvious, but could you please comment on why this idea was abandoned in favor of thin traits?

I think that proposal has a lot going for it from a technical perspective, but the ergonomics are quite lacking, and it seems likely to lead to interoperability problems. Details to follow.

From an ergonomics point of view, pushing information onto the reference is also pushing complexity to the consumers of a library. I find it generally makes things feel more complex, since you are confronted with fat-ness and thin-ness at every use of a trait. The names in that proposal were also somewhat confusing: for example, &Fat<Trait> actually gives you a thin pointer. Finally, there is no precedent in Rust for a distinction like Fat<T,U> and Fat<U>, where omitting the type argument corresponds to something quite different (i.e., not a default), nor for using a “pseudo-type” like Fat that seems to really be a kind of keyword.

It also seems likely to me that having some references to traits be thin and some references be fat will lead to composability concerns, where one library is requesting a &Trait and another has a &Fat<Trait> (or vice versa). It is certainly not possible to convert from fat to thin, though the other direction does work.

It’s not clear to me that mixing fat and thin is a big use case to begin with. For the use cases I am aware of, one usually wants all objects from a trait to be thin or not thin, not some ad-hoc mixture. In general, I consider thin pointers to be somewhat niche, particularly across crates like this – typically in the downstream crate, you know the actual type concretely, and you only employ the trait to kind of thread information upstream into a more generic context. Alternatively, as might be the case for Servo, you are spreading information across crates as much for convenience as anything else, but you still expect all of the trait objects (in this case, DOM nodes) to be interoperable within the same data structures.

Finally, the thin trait proposal still allows for one to take an existing fat trait and make thin references from it. You can do it by making a thin trait that extends the fat trait, but adds no methods:

trait Foo { ... }

#[repr(thin)]
trait ThinFoo: Foo { }

Now one can have &ThinFoo for thin pointers and &Foo for fat pointers. I believe there is no technical obstacle to this working. (Of course, it assumes we get upcasting working for trait objects, but there are many reasons that we would like that to work.)
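
The trait-level half of this pattern can be written in today's Rust, with caveats: #[repr(thin)] is only a proposal, so it appears as a comment and `&dyn ThinFoo` below is still an ordinary fat pointer; modern syntax also requires `dyn`. The sketch sidesteps upcasting by calling the supertrait method directly through the subtrait object. `Node`, `name` and `describe` are invented for illustration.

```rust
trait Foo {
    fn name(&self) -> String;
}

// Proposed attribute, not available in today's Rust, hence commented out:
// #[repr(thin)]
trait ThinFoo: Foo {}

struct Node;

impl Foo for Node {
    fn name(&self) -> String {
        "node".to_string()
    }
}

// Opting the type into the (would-be thin) subtrait adds no methods.
impl ThinFoo for Node {}

/// Methods of the supertrait stay callable through the subtrait object.
fn describe(obj: &dyn ThinFoo) -> String {
    obj.name()
}

fn main() {
    assert_eq!(describe(&Node), "node");
}
```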

Thin pointers, in whichever form they are implemented, push the same amount of complexity on the users of a library, because if you want to write good and/or efficient code, you need to know the layout of your data. So the complexity should not be judged by whether users have to add annotations to the reference types, but by whether they have to figure out for themselves whether the compiler is messing with their data layout because of something that is not explicitly stated in the struct definition.

If thin pointers are a niche, then adding a thin annotation only adds complexity for the people who opted in to it. And in fact it does not add complexity, because they made the conscious decision to opt into a thin pointer, so they know what they are doing and why. The thin annotation removes the burden of reading the definition of the pointee to verify something that is important to the pointer. Implicitness, however, does add complexity, in the sense that you need to read a lot more code spread across different places to understand what the layout of your data is.

In C++ I can understand the layout of a class/struct by looking at what is in its definition (and the definition of its parents, which are explicitly enumerated in the struct’s definition). I believe this to be a very important thing. If I write a struct that needs to be exactly the size of one cache line and implements or references a trait from some 3rd party library, the impact of adding or removing #[repr(thin)] on the 3rd party traits can be disastrous. Memory layout is something people rely on, especially in Rust, because it hugely impacts performance in some cases. Therefore I believe it is very important to know the size of the pointers in my structure, and whether it bundles a vptr, by looking at the structure. More importantly, I want to make sure it can’t change underneath me.
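
One way to guard against a layout changing underneath you in today's Rust is a compile-time size check next to the struct. This is a sketch under the assumption of a 64-byte cache line, and `Sample` is an invented type. If an upstream change ever inserted an extra pointer into the struct, the build would fail instead of silently growing it.

```rust
use std::mem::{align_of, size_of};

// Assumed target: 64-byte cache lines.
#[repr(C, align(64))]
struct Sample {
    position: [f32; 8], // 32 bytes
    velocity: [f32; 8], // 32 bytes
}

// Compile-time guard: evaluated during compilation, so a layout change breaks the build.
const _: () = assert!(size_of::<Sample>() == 64);

fn main() {
    assert_eq!(size_of::<Sample>(), 64);
    assert_eq!(align_of::<Sample>(), 64);
}
```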

One might answer that I am talking about very specific performance-sensitive use cases, and that most people don’t need to deal with that. And that is spot on: thin pointers are only useful in these performance-sensitive use cases, so as long as it is an opt-in thing, those who don’t need it won’t be affected. Thin pointers should be considered with the mindset of the people who think in terms of cache lines, because it is their problem that it is fixing, and not in terms of the people who won’t be affected anyway.

Apologies, I tend to write long comments.