Blog post: View types for Rust

The problem is that (AFAIK) the current implementation of mem::swap only considers the layout of the type. It’s basically doing the same as turning each of the two &mut T arguments into &mut [MaybeUninit<u8>; size_of::<T>()] first and then performing the swap on those. Since a reference to a {baz} Foo would still point to the whole Foo, and {baz} Foo would have the same size as the whole Foo, mem::swap would necessarily also touch the contained bar.

Or put differently, when people say that moving in Rust is (effectively) performing memcpy they’re referring to fn memcpy(dest: *mut c_void, src: *const c_void, n: size_t) -> *mut c_void which would just copy the entire size of the struct, including padding and whatnot; no way of not touching the disallowed parts of a type like {baz} Foo.
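A minimal sketch of that point in today's Rust. The struct Foo and its fields bar and baz follow the names used above, but the field types and the helper functions are assumptions made for illustration. Swapping through two &mut Foo exchanges every field, whereas a view restricted to baz would only be allowed to do what the second helper does.

```rust
use std::mem;

// Stand-in struct; the field types are illustrative assumptions.
#[derive(Debug, PartialEq)]
struct Foo {
    bar: u32,
    baz: u32,
}

// Swaps two whole values: `bar` is exchanged along with `baz`.
fn swap_whole() -> (Foo, Foo) {
    let mut left = Foo { bar: 1, baz: 2 };
    let mut right = Foo { bar: 10, baz: 20 };
    mem::swap(&mut left, &mut right);
    (left, right)
}

// Swaps only the `baz` fields, which is all a `{baz} Foo` view should permit.
fn swap_baz_only() -> (Foo, Foo) {
    let mut left = Foo { bar: 1, baz: 2 };
    let mut right = Foo { bar: 10, baz: 20 };
    mem::swap(&mut left.baz, &mut right.baz);
    (left, right)
}
```

The second helper works today only because both fields are named directly at the call site. Nothing in the type of a plain &mut Foo records that bar must stay untouched.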

This depends on whether a view such as {baz} Foo should be a supertype of the struct Foo. If yes, a Foo can be implicitly cast to a {baz} Foo:

impl Foo {
    fn bar(&{baz} self) {}
}

let foo: Foo = todo!();
foo.bar();          // (1) implicit cast
(&{baz} foo).bar(); // (2) explicit cast

If a view is a supertype of the struct, then it must be a new kind of type. Or it might be possible to add a new kind of "unsizing coercion" to allow line (1) without introducing a type hierarchy. This brings me to my next point:

I think that {baz} Foo should not be Sized. That means that functions such as mem::swap can't be called with views, which is a good thing. It also makes sense intuitively, because a view can have fields that aren't consecutive:

struct Foo {
    a: (i32, i8),
    b: (i32, i8),
    c: (i32, i8),
}

What should be the size of {a, c} Foo? I think it's best to just say that it doesn't have a size.
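To make the gap concrete, here is a small sketch that measures where a ends and where c starts. The #[repr(C)] attribute and the helper names are my own additions, used only so that the field order is fixed for the demonstration. The default Rust layout is free to reorder fields.

```rust
use std::mem;

// `repr(C)` fixes the field order; `b` is never read, hence the allow.
#[allow(dead_code)]
#[repr(C)]
struct Foo {
    a: (i32, i8),
    b: (i32, i8),
    c: (i32, i8),
}

// Byte offset of a field from the start of the value that contains it.
fn offset_of_field<T, F>(base: &T, field: &F) -> usize {
    (field as *const F as usize) - (base as *const T as usize)
}

// Returns (end of `a`, start of `c`, size of the whole struct) in bytes.
fn layout_gap() -> (usize, usize, usize) {
    let foo = Foo { a: (0, 0), b: (0, 0), c: (0, 0) };
    let a_end = offset_of_field(&foo, &foo.a) + mem::size_of::<(i32, i8)>();
    let c_start = offset_of_field(&foo, &foo.c);
    (a_end, c_start, mem::size_of::<Foo>())
}
```

With this layout, c starts a full field after a ends, so the bytes belonging to {a, c} do not form one contiguous range that a single size could describe.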

Wouldn't it be possible, at least for private methods, by extending the tracking of which fields are borrowed from the method level to the impl level?

That would also avoid the syntax issues and not raise concerns about exposing implementation details.

Interesting idea, the problem I see with this (and probably any other solution) is that it will require either documenting private fields or documenting which groups (type aliases) don't overlap. In the absence of that, it would be a very opaque and unpredictable interface.

I would like this disjoint-capturing self to be automatic for private methods. There's no semver hazard in doing this, and it could at least partially remove a gotcha and just do what users expect, with no extra syntax. I think it's fine if users stumble when it doesn't work for public methods: the explanation "it's hiding implementation details for a public API" makes a lot of sense.
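For readers who haven't hit the gotcha, a sketch of it and of the usual workaround in current Rust. The Counter type and its method names are hypothetical. The commented-out call is the one that automatic disjoint capture would make legal.

```rust
// Hypothetical type used only to illustrate the borrow conflict.
struct Counter {
    items: Vec<u32>,
    total: u32,
}

impl Counter {
    // Private helper that only touches `total`, but borrows all of `self`.
    #[allow(dead_code)]
    fn add(&mut self, n: u32) {
        self.total += n;
    }

    // Workaround: take the one field the helper needs instead of `self`.
    fn add_to(total: &mut u32, n: u32) {
        *total += n;
    }

    fn sum_all(&mut self) {
        for &n in &self.items {
            // self.add(n); // rejected today: `self.items` is still borrowed by the loop
            Self::add_to(&mut self.total, n);
        }
    }
}

fn counter_total() -> u32 {
    let mut counter = Counter { items: vec![1, 2, 3], total: 0 };
    counter.sum_all();
    counter.total
}
```

The workaround compiles because the borrow checker sees two different fields inside one function body. The helper's signature is where that information is currently lost.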

I think it would be useful to allow expressing abstract groups of parts of a struct that could be borrowed/moved/etc. where those groups aren't necessarily tied to being specific fields...that could be quite useful for expressing borrowing some field that is somehow pointed to by a struct, for use with unsafe abstractions:

#[repr(transparent)]
pub struct MyStruct {
    'self: {
        'a, // group a
        'b, // group b
    }
    value: c_void,
}

extern "C" {
    fn get_a_ptr(this: *mut c_void) -> *mut u32;
    fn get_b_ptr(this: *mut c_void) -> *mut u8;
}

impl MyStruct {
    pub fn get_a(&'self.'a self) -> &'a u32 {
        unsafe {&*get_a_ptr(self as *const _ as *mut c_void)}
    }
    pub fn get_a_mut(&'self.'a mut self) -> &'a mut u32 {
        unsafe {&mut *get_a_ptr(self as *mut _ as *mut c_void)}
    }
    pub fn get_b(&'self.'b self) -> &'b u8 {
        unsafe {&*get_b_ptr(self as *const _ as *mut c_void)}
    }
    pub fn get_b_mut(&'self.'b mut self) -> &'b mut u8 {
        unsafe {&mut *get_b_ptr(self as *mut _ as *mut c_void)}
    }
}

For clarity, by private you mean non-pub, as in including pub(crate) and similar?

+1.

I don't see how adding explicit syntax in the function signature is actually better than what @kornel suggested. If you decide to borrow additional fields inside the function, you'd have to change the signature and then fix any broken callers -- which is exactly what you would do if the disjoint-capture was done automatically anyway.

Although maybe view types could be useful in other contexts...

Yes.

Absolutely. It would be nice to have these types at least to be able to explain what really happens with borrowing inside functions, the non-pub method behavior that I'm rooting for, and improved disjoint closure capture. In the same way lifetime syntax explains what the lifetime elision does.

I honestly get the appeal of doing this automatically, but there's still the intent issue, which is also why we don't allow global type inference for private functions.

Or more specifically, it's currently possible to type check a function by using the body of the given function, the signatures of all called functions, the definitions of all used structures, and no more. As a key point, type checking a function does not require knowing the bodies of any other function.

Or, as an even more direct comparison, it's why we don't allow lifetime inference for non-public functions, even though we necessarily have the minimum bound information already.
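A small sketch of what "checked against the signature, not the body" means for lifetimes today. The function names are made up for the example. The body of pick only ever returns its first argument, yet callers are held to what the signature promises.

```rust
// The body returns only `a`, but the signature ties the result to both inputs.
fn pick<'a>(a: &'a str, _b: &'a str) -> &'a str {
    a
}

fn pick_caller() -> String {
    let first = String::from("first");
    let second = String::from("second");
    let chosen = pick(&first, &second);
    // drop(second); // rejected: per the signature, `chosen` may borrow from `second`
    chosen.to_string()
}
```

If the compiler inferred the tighter bound from the body, the commented-out line would compile, and later changing pick to return _b would break this caller without any signature change.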

Global type inference (of which subview inference is a subset) runs into the problem that you have to have the entire crate in (human) memory to understand it. You can't page out the details of the implementation of some helper module, because that implementation has direct impact on how you're allowed to use it.

Then there are of course still semver-like hazards in any large codebase; if I have a subroutine that introduces a log of the object state, I've broken any callers that were relying on me not using some other fields. If I do a temporary change that stops touching a field, then revert it later (as it turns out, bad idea), then anyone in the meantime could've relied on me not touching that field. (Sure, maybe that should've been a crate boundary. Oftentimes it's not, and sometimes it can't be, due to bidirectional dependencies.)

The important part of function signatures being fully specified is developer intent. Signatures give the developer a place to tell the compiler what they want the code's shape to be, which it then uses to help the developer to make sure that 1) the implementation actually fits inside of said shape (fulfils the obligations), and that 2) calling code only relies on the promised shape. Having this layer of developer intent allows for better errors (in the common case when most signatures are mostly accurate to intent) since the compiler knows what the developer wanted as well as what the developer wrote, and can offer hints both in the direction of stronger guarantees or fixing what was written. (Personally, I like avoiding errors that go "because it uses ... that uses ... that uses ... that uses ... that uses ... that uses this field". We have some (autotraits, blanket impl chains), but each for good reason.)

It's interesting that global type inference is kind of halfway between static typing and dynamic typing. Personally, I think Rust's local type inference is the sweet spot.

I'm against global type inference for anything but diagnostics for the reasons above. Diagnostic use is fine (e.g. _ as a return type, rustc will tell you what type you're returning to put there, though that's not actually global inference) and great, because it's the "did you mean" that makes a great peer programmer compiler. The compiler could track this inference and say "hey if you made these borrows take disjoint views it'd compile fine," and I'd rejoice at the compiler getting more helpful. It's the implicit reliance on function bodies for signature clarification that I'm against.


The reason type/borrow/disjoint captures inference works for closures is that they're entirely local. Their inference is still bound by the function definition. The implementation of the closure is clearly part of the implementation of the function it's a part of.

I'd love to see it become easier to lift said closures into full functions. I just don't think that expanding closure inference to currently fully specified function signatures (for successful compiles; I'm all for "allowing" inference in unsuccessful compiles to suggest adding the inferred signature) is a good idea.

Interesting note: at least in the case of unnamed/implicit/public field sets, we already have a syntax for only borrowing some fields:

// the bikeshed syntax
impl WonkaChocolateFactory {
    fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}
// becomes the stable syntax
impl WonkaChocolateFactory {
    fn should_insert_ticket(
        Self { golden_tickets, .. }: &Self,
        index: usize,
    ) -> bool {
        golden_tickets.contains(&index)
    }
} 

Obviously if we wanted to give this disjoint captures semantics, it'd have to be in a new edition, to avoid dropping a semver change on previously published crates.
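For reference, a self-contained version of the stable form that compiles today. The field types, the second field, and the helper function are assumptions added for the example. Without a self receiver this is an associated function rather than a method, so it has to be called with path syntax, and the call site still borrows the whole struct.

```rust
// Field types and the second field are illustrative assumptions.
#[allow(dead_code)]
struct WonkaChocolateFactory {
    golden_tickets: Vec<usize>,
    oompa_loompas: u32,
}

impl WonkaChocolateFactory {
    // The parameter pattern binds only `golden_tickets`, as a `&Vec<usize>`.
    fn should_insert_ticket(Self { golden_tickets, .. }: &Self, index: usize) -> bool {
        golden_tickets.contains(&index)
    }
}

fn ticket_checks() -> (bool, bool) {
    let factory = WonkaChocolateFactory {
        golden_tickets: vec![3, 7],
        oompa_loompas: 12,
    };
    // Path syntax is required: `factory.should_insert_ticket(3)` does not resolve.
    (
        WonkaChocolateFactory::should_insert_ticket(&factory, 3),
        WonkaChocolateFactory::should_insert_ticket(&factory, 4),
    )
}
```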

And if you wanted to keep the receiver sugar,

    fn should_insert_ticket(
        &self { golden_tickets, .. },

I don't think I'd want to piggy-back this on privacy, since just because something's pub doesn't mean it's externally accessible, and private_in_public is still just a warning.

I'd rather see a different opt-in for such functions/methods (and maybe even types) to mark them as borrow/type/etc-checked at use, kinda like they were macros.

My standard example:

placeholder-contextual-keyword fn mul_add(x, y, z) { x * y + z }

This is an interesting idea. A couple of things that occur to me:

Interaction with lifetimes

These views become attached to 'a lifetimes, don't they? If I have a function:

pub fn iter_bars<'a>(&'a {bars} self) -> impl Iterator<Item = &'a Bar> {...}

then that 'a is effectively bound to bars right? So a later &mut self or &mut {bars} self would conflict with it, but &mut {beans} self would not?

How does this generalize to all lifetime interactions? How much does it complicate reasoning about them, or presenting diagnostics about them?

"Exposes internal details" is good, actually

I've seen a few comments on this, but it's not obvious to me it matters. If you see & { internal_detail } self then I guess there's the concern that if you also need to express that type, you're baking it into your signatures as well. So really those names become part of the API, in the same way an associated type would. So perhaps it makes sense to have a notion like & { internal_name as api_name } T. (This reminds me of how named parameters in OCaml end up becoming part of the signature of a closure.)

Aside from the naming issue, it does also expose a more fundamental property of your implementation, but one that is probably useful to be part of the API. In effect you're saying "this method relates to this subset of my state" which means that it is coupled with other methods which touch overlapping parts of that state - and more interestingly - is independent from the methods which touch disjoint state. We effectively model that today by having sub-structures to encapsulate those states (as the blog mentions), but that can only model states which can be cleanly grouped like that.
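A sketch of the sub-structure approach as it works today, with hypothetical Bars and Beans types. Each group of state gets its own struct and its own methods, so the borrows stay disjoint at the field level.

```rust
// Hypothetical sub-structures, one per independent group of state.
struct Bars {
    list: Vec<u32>,
}

struct Beans {
    count: u32,
}

struct Factory {
    bars: Bars,
    beans: Beans,
}

impl Bars {
    // The returned iterator borrows only this sub-structure.
    fn iter_bars<'a>(&'a self) -> impl Iterator<Item = &'a u32> {
        self.list.iter()
    }
}

impl Beans {
    fn add(&mut self, n: u32) {
        self.count += n;
    }
}

fn count_while_iterating() -> u32 {
    let mut factory = Factory {
        bars: Bars { list: vec![1, 2, 3] },
        beans: Beans { count: 0 },
    };
    // `factory.bars` is borrowed shared while `factory.beans` is borrowed mutably.
    for &bar in factory.bars.iter_bars() {
        factory.beans.add(bar);
    }
    factory.beans.count
}
```

This only works when the state splits into non-overlapping groups. A method that needs one field from each group is back to borrowing the whole Factory.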

So generalizing the internal_name as api_name to be able to give a public name to a group of state, similar to what @mjbshaw proposed, seems like a reasonable thing to consider (though I'm not sure about that syntax).

It will not be the first: we already have (unstable) extern types.

Considering that the problem, in general, is with assigning the whole struct, which also changes hidden fields, unsizing it seems like a good idea. This way, when you have a view you can't do something meaningful with it other than access its allowed fields (which are sized). This will force views to be different types even if they include all of the fields, but I tend towards it anyway for the reason specified in your post.
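How unsized types already keep mem::swap away can be seen with slices. A sketch, with helper names of my own choosing: a function generic over T: ?Sized cannot call mem::swap, and for slices the standard library provides an element-wise swap instead.

```rust
use std::mem;

// `?Sized` opts out of the implicit `Sized` bound, so `T` may be `[u8]` or `str`.
fn byte_len<T: ?Sized>(value: &T) -> usize {
    mem::size_of_val(value)
}

// This would not compile, because `mem::swap` requires `T: Sized`:
// fn swap_any<T: ?Sized>(a: &mut T, b: &mut T) { mem::swap(a, b) }

// Slices offer an element-wise swap; it panics if the lengths differ.
fn swap_slices() -> ([u8; 3], [u8; 3]) {
    let mut left = [1u8, 2, 3];
    let mut right = [4u8, 5, 6];
    {
        let l: &mut [u8] = &mut left;
        let r: &mut [u8] = &mut right;
        l.swap_with_slice(r);
    }
    (left, right)
}
```

An unsized view would presumably need similar purpose-built operations that know which fields they are allowed to touch.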

If we really want to lean into the "kinda like they were macros", then macro fn makes a lot of sense.

Semantically, I would describe macro fn as expanding similar to the following:

macro fn mul_add(x, y, z) { x * y + z }

mul_add(f(), g(), h())

// roughly semantically equivalent to 

match (f(), g(), h()) {
    (x, y, z) => { x * y + z }
}

though with mul_add (maybe? maybe not?) having its own call stack entry, its own monomorphized function site rather than strictly being always 100% inlined.
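Since macro fn doesn't exist, the closest runnable approximation is a bang-macro that performs the same match-based expansion. This is only a sketch: the tracked helper and the demo function are hypothetical, and a real macro fn might not be inlined like this.

```rust
use std::cell::Cell;

// Bang-macro stand-in for the `match`-based expansion described above.
macro_rules! mul_add {
    ($x:expr, $y:expr, $z:expr) => {
        match ($x, $y, $z) {
            (x, y, z) => x * y + z,
        }
    };
}

// Returns `value` and records that it was evaluated.
fn tracked(calls: &Cell<u32>, value: i32) -> i32 {
    calls.set(calls.get() + 1);
    value
}

// Returns the result together with how many argument evaluations happened.
fn mul_add_demo() -> (i32, u32) {
    let calls = Cell::new(0);
    let result = mul_add!(tracked(&calls, 2), tracked(&calls, 3), tracked(&calls, 4));
    (result, calls.get())
}
```

Binding the arguments through the tuple is what guarantees that each argument expression is evaluated exactly once, in order, before the body runs.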

If the argument patterns are partial patterns (e.g. WonkaChocolateFactory { golden_tickets, .. }), then this would semantically define macro fn to disjointly capture the named fields only.

However, with that definition with the early-evaluation rebinding, the whole binding is always taken, rather than field wise usage being possible (without a partial pattern in the signature). This is IIUC a fundamental limitation of the surface language as it exists today; there's no way to both only evaluate a (place) expression once and do disjoint captures with its usages.

If we do add macro fn/inline fn/whatever, I expect the function-interface semantics where the arguments are evaluated once to be met (unlike with bang-macros, which can freely do expression rewriting), so I'm not exactly sure how macro fn would exactly help the disjoint captures problem without supporting disjoint captures across a normal function interface first.

We have to think how you can specify nested fields of other structs, when they're not visible. Probably the best way is to nest views. If we choose this road, we have to decide whether we want to allow only nested views or both nested views and nested fields.

Another point I realize now is that we have to decide how this will interact with traits.

There are two options I can think of:

  1. Allow users to impl a trait only for the whole struct.

    We probably want methods to be able to take a view of self. For example, with @Aloso's syntax (ignoring allocators for simplicity):

    pub view Len {
        len,
    }
    pub view Slice {
        mut buf.ptr,
        len,
    }
    impl<T> Vec<T> {
        pub fn len(self: &Self::Len) -> usize { self.len }
    }
    impl<T> Deref for Vec<T> {
        type Target = [T];
        fn deref(self: &Self::Slice) -> &[T] { /* ... */ }
    }
    impl<T> DerefMut for Vec<T> {
        fn deref_mut(self: &mut Self::Slice) -> &mut [T] { /* ... */ }
    }
    

    When implementing a trait that has a method that takes a parameter of type T (any parameter or only self?), you'll be able to implement it with a method that takes any view of T. When calling it, the compiler will coerce T to the view. This coercion will have to also happen with the receiver, along with auto(de)ref and array/slice unsizing.

    I think this is the best solution.

  2. Allow users to impl the trait for any view. Then:

    Do we allow them to collide?

    Do we coerce?

    If both answers are "yes", what do we do when there is a contradiction?

We also can, of course, not allow views to be used on traits, but this greatly reduces their potential.

I remember raising this before:

I enjoyed the proposal a lot. My tiny bit of feedback is that I think the "how does that affect learning" part could be tightened up and strengthened. I actually think this would be a positive change.

It is my experience that there are 2 camps of programming languages. One camp is highly dynamic, and concepts start out existing only as patterns. Rust, in my feeling, is in the other camp: it names important cases and makes them accessible through the type system, allowing humans (and machines like the compiler!) to pick them up.

Also, I feel like the blog post gives enough of an argument on how the issue arises naturally. That means it will probably not impact users early, but it will give them a goal to learn towards. Before such a change, users need to be aware of an issue that exists "out there"; after such a change, they will find a chapter "view types" and then figure out what they are useful for.

On the topic of concepts like views, I also experience in my training practice that there's not a lot of awareness, making it even harder for users to find good and common solutions around the issues you describe.

There's an argument to be made that it will make the language more complex, but I think Rust is already naturally down the path of "identifying useful concepts and making them part of the language". Whether it will make it more complex for the user is a question I would challenge though.

What about using struct-like syntax inside of traits to declare views?

Like so:

trait WithView{
   ref View{
      label: (mut) Type, //and so on
   }
}

In an implementor we have something like:

impl WithView for Foo {
   ref View{
      label = ref(?) (mut) self.foo //...
   }
}

And the consumer sees:

fn method(self: Self::Foo) {...}

Short version:

fn method(self: Self{ref View})

Or even

fn method(self{ref View})

Possible extension: allow calling more complex code in such views: match or even (limited?) function calls.

Unresolved: Interactions with arbitrary self types.

There are some crates that let you do something like this: partial_borrow, partial_ref, and borrow_as. For people wanting something like this, it might be a good idea to experiment with them and see whether these approaches are worthwhile.

Full disclosure: I'm the author of partial_borrow.

(I will post a copy of this to the proto-RFC thread mentioned earlier: "Partial borrowing (for fun and profit)", issue #1215 in rust-lang/rfcs on GitHub.)