Blog post: View types for Rust

nikomatsakis · November 5, 2021, 3:42pm

Comment thread for my latest blog post, View types for Rust:

I wanted to write about an idea that’s been kicking around in the back of my mind for some time. I call it view types. The basic idea is to give a way for an &mut or & reference to identify which fields it is actually going to access. The main use case for this is having “disjoint” methods that don’t interfere with one another.

steffahn · November 5, 2021, 4:03pm

This is a very natural extension of the language IMO, in line with rough ideas I’ve had myself, too. I like the part discussing that it gives explicit syntax to language features that the borrow checker (within a function) already offers.

The example code in the blog post seems to suggest that (when not using method-call syntax) you’d use an explicit expression-syntax to select which fields to view when creating a reference. I’d assume that the necessary view type could actually usually be inferred, extending the meaning of &EXPR and &mut EXPR.

Something like

let mut x = Foo { bar, baz };
let ref_1 = &mut x;
let ref_2 = &mut x;
ref_1.bar += 1;
ref_2.baz += 1;

would then probably also start to compile (because the compiler could infer ref_1 to be &mut {bar} Foo, etc.), but is that a bad thing?
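For comparison, here is a minimal sketch of what the borrow checker already accepts today when the field paths are written out explicitly inside one function. The function name bump_both is made up for illustration; the Foo struct mirrors the example above with concrete u8 fields.

```rust
struct Foo {
    bar: u8,
    baz: u8,
}

/// Increments both fields through two mutable borrows that are live at once.
fn bump_both(x: &mut Foo) {
    // Accepted today: the borrow checker tracks field paths within a single
    // function body, so these two borrows are known to be disjoint.
    let ref_1 = &mut x.bar;
    let ref_2 = &mut x.baz;
    *ref_1 += 1;
    *ref_2 += 1;
}
```

The inferred-view version would let the same thing work when both references are created from the whole of x rather than from x.bar and x.baz.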

mjbshaw · November 5, 2021, 4:09pm

The approach is interesting, but the syntax is a bit verbose and it unfortunately leaks private implementation details into the public API. I know the private fields aren't accessible, but the public documentation would have to document which view each method took, and the view uses private field names that are then exposed to the public API. That's not a huge problem but it's far from optimal IMO.

I kinda wish Rust leaned into lifetime syntax for this more. Something like:

struct FruitBasket {
    'self: {
        pub 'apple,  // Exposed in the public API documentation.
        pub 'banana,  // Exposed in the public API documentation.
        'count,  // Forbidden from being used in public method signatures
    }

    'apple apple_type: String,  // Field is accessible within 'self.'apple lifetime.
    'apple+'count apple_count: u32,  // Field is accessible within both 'self.'apple and 'self.'count lifetimes.

    'banana banana_type: String,  // Field is accessible within 'self.'banana lifetime.
    'banana+'count banana_count: u32,  // Field is accessible within both 'self.'banana and 'self.'count lifetimes.
}

impl FruitBasket {
    fn double_fruit(&'self.'count mut self) {
        self.apple_count *= 2;
        self.banana_count *= 2;
    }

    pub fn set_apples(&'self.'apple mut self, apple_name: String, apple_count: u32) {
        self.apple_type = apple_name;
        self.apple_count = apple_count;
    }
}

nikomatsakis · November 5, 2021, 4:21pm

That's an interesting idea. I was mostly focused on the "core calculus" in the post; I hadn't thought about inferring the set of fields that are accessed through a reference. I agree it would probably be relatively straightforward to do, no different really than what we're doing for disjoint closure capture; techniques for inferring structural records like row polymorphism also seem quite relevant (probably folks have investigated row polymorphism extended to cover a tree structure, actually, but I'm not sure...).

crlf0710 · November 5, 2021, 4:43pm

I've been using this pattern a lot recently in my tex-rs crate with the helper proc-macro from global_struct. I definitely think this is a useful pattern, and I'm interested in whether it could be built into the language, too!

eholk · November 5, 2021, 5:03pm

I like this idea a lot. A few times this week I've had to work around issues that could have been neatly solved with this idea.

I think the biggest risk is the semver hazards it introduces, but one way to avoid that would be to only allow view types in the self parameter for private methods. That's a restriction that could be relaxed later but allows us to get some experience with the feature before committing to as much in public APIs.

I suspect in practice we wouldn't need to explicitly create a view very often (the let view = &{foo, bar} x syntax from your post). I expect the most common use case would be in the self parameter, and there we could create the view through auto borrows when you call the method.

I also like the idea of being able to specify some fields as mutable and others as immutable in the view type. Although, it reminds me of back when Rust had mut modifiers on struct fields, and I remember that being kind of tricky to work out. We probably wouldn't have the same issues here, but maybe?

tfgast · November 5, 2021, 5:36pm

I like this idea a lot. One possible way to avoid the semver hazards of naming private fields would be using type aliases.

pub type WonkaTicketView = &{golden_tickets} WonkaShipmentManifest;

But I’m not sure about all the implications of that.

pachi · November 5, 2021, 6:06pm

This reminds me of Project Verona, dealing with concurrent ownership using regions.

github.com

microsoft/verona/blob/master/docs/explore.md

---
layout: default
title: Explore project Verona
---
# Systems programming

The term system programming language is used to cover a wide range of problems from high-level performance-critical systems going down the stack to low-level memory managers and kernel modules.
There are two distinct aspects to system programming:

* Predictability
  - Latency
  - Resource usage

* Raw access 
  - Can treat memory directly as bits and bytes
  - Little or no abstraction on the hardware

To implement a low-level system of various kinds (for example, a memory manager), you need raw access.
In some sense, the memory manager is producing an abstraction on the machine that high-level services can consume.
Guaranteeing safety through a type system for programmers with raw access has not been achieved.


Maybe there are some neat ideas in there regarding syntax.

steffahn · November 5, 2021, 6:15pm

Speaking of disjoint closure capture... that could potentially also benefit from view types. Two disjoint captures of two fields of the same struct could be represented as a single view-type reference to the whole struct only having access to the two fields in question, saving a whole usize of data for the closure. (Similarly for n captured fields, n-1 times size_of::<usize>() could be saved.)
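To make the size argument concrete, here is a rough sketch that models the two closure environments by hand. Actual closure layout is unspecified, so the structs TwoFieldCaptures and WholeStructCapture and the function env_sizes are hypothetical stand-ins rather than what the compiler generates.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Foo {
    bar: u8,
    baz: u8,
}

// Stand-in for an environment that captures two fields separately:
// one reference per captured field.
#[allow(dead_code)]
struct TwoFieldCaptures<'a> {
    bar: &'a mut u8,
    baz: &'a mut u8,
}

// Stand-in for what a view-type capture could store instead:
// a single reference to the whole struct.
#[allow(dead_code)]
struct WholeStructCapture<'a> {
    foo: &'a mut Foo,
}

/// Returns (size with per-field captures, size with one struct reference).
fn env_sizes() -> (usize, usize) {
    (
        size_of::<TwoFieldCaptures<'static>>(),
        size_of::<WholeStructCapture<'static>>(),
    )
}
```

With two captured fields the per-field environment is two pointers wide and the single-reference one is one pointer wide, which is the saving of one usize described above.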

nikomatsakis · November 5, 2021, 6:22pm

Yes! I think I talked about this in the blog post, didn't I?

Update: Here. I think this is what you meant, or was it something different?

Update 2: I remember now that my initial post didn't include this paragraph, actually, due to a copy-and-paste error, so maybe you read it before I fixed that.

steffahn · November 5, 2021, 6:26pm

That must be it, I don't remember reading that paragraph before. Seems to be basically exactly the same thing I said, so - no - nothing different.

camelid · November 5, 2021, 7:03pm

I literally just ran into a situation in rustdoc yesterday where I think some form of view types would have saved the day, so I'm intrigued! Although, in this particular case, I would probably need trait fields as well.

nugend · November 5, 2021, 9:01pm

The main disadvantage of doing this through fields in Traits is that for single field disjoint access, it's a lot more verbose, right?

Ryan1729 · November 5, 2021, 9:25pm

A perhaps awkward solution to part of the problem, that is available today, is to use macro_rules! instead of private methods.

The initial example would look like this:

macro_rules! should_insert_ticket {
    ($manifest: expr, $index: expr) => {
        $manifest.golden_tickets.contains(&$index)
    }
}

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        for (bar, i) in self.bars.into_iter().zip(0..) {
            let opt_ticket = if should_insert_ticket!(self, i) {
                Some(GoldenTicket::new())
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}

And it would compile.

That addresses the case in the initial example, but it does not address the later case where we want to expose should_insert_ticket and allow users to call the method while iterating over bars. That wouldn't work unless you made the golden_tickets field pub and made the macro part of the public API.
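Another workaround that compiles today is to have the helper take only the field it needs instead of self. The sketch below is self-contained, so the type definitions (ChocolateBar, GoldenTicket, WrappedChocolateBar and the two fields of WonkaShipmentManifest) are simplified assumptions rather than the definitions from the blog post.

```rust
struct ChocolateBar;
struct GoldenTicket;

struct WrappedChocolateBar {
    ticket: Option<GoldenTicket>,
}

impl ChocolateBar {
    fn into_wrapped(self, ticket: Option<GoldenTicket>) -> WrappedChocolateBar {
        WrappedChocolateBar { ticket }
    }
}

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<u32>,
}

impl WonkaShipmentManifest {
    // Takes only the field it reads, so calling it never borrows all of `self`.
    fn should_insert_ticket(golden_tickets: &[u32], index: u32) -> bool {
        golden_tickets.contains(&index)
    }

    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        // `self.bars` is moved out here; `self.golden_tickets` stays usable.
        for (bar, i) in self.bars.into_iter().zip(0..) {
            let opt_ticket = if Self::should_insert_ticket(&self.golden_tickets, i) {
                Some(GoldenTicket)
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}
```

This has the same limitation as the macro: a public version of the helper would expose the type of the private field in its signature.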

djc · November 5, 2021, 9:31pm

I do feel like the idea from the blog post contains quite a bit of somewhat alien-looking syntax for what is, in the end, a relatively niche feature. This doesn't seem easy to solve since the notion of paths itself is not something we have syntax for today, so you'd have to invent a bunch of syntax. I wonder how restricting it would be in practice to only allow this one level deep (so fields rather than paths).

I find @mjbshaw's direction of thinking in terms of lifetimes interesting partly because I wonder if it could then also help with the self-referential lifetime problems.

My other thought while reading was on more type system-based or procedural macro-like approaches, for example, having some shorthand for a view type derivation, along the lines of:

#[view(BarView { foo })]
struct Bar {
    foo: usize,
    bar: String,
}

impl Bar {
    fn baz(self as BarView, bloop: u8) -> String {

    }
}

SkiFire13 · November 6, 2021, 10:41am

Related discussion: Partial borrowing (for fun and profit) · Issue #1215 · rust-lang/rfcs · GitHub

Aloso · November 6, 2021, 11:57am

The syntax I considered is something like

impl ChocolateFactory {
    pub view GoldenTickets {
        mut golden_tickets,
    }

    fn blub(self: &mut Self::GoldenTickets) {}
}

This has the advantage that view types have a name (preventing semver hazards) and a visibility.

josh · November 6, 2021, 12:12pm

I like the idea of "named sets of fields" as well. That allows defining compatible sets of fields without actually exposing what internal fields those sets contain, so you can evolve an API compatibly without having as many details exposed. I also think those names may naturally fall out from logical groupings of methods, and make sense to document.
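Named sets of fields can be approximated in today's Rust with small borrowing structs and an explicit split method. This is only a sketch of that emulation, not the proposed feature; the field bars_produced, the Production view and every method name here are invented for illustration.

```rust
pub struct ChocolateFactory {
    golden_tickets: Vec<u32>,
    bars_produced: u32,
}

/// Named view over the ticket state; callers never see the field itself.
pub struct GoldenTickets<'a> {
    golden_tickets: &'a mut Vec<u32>,
}

/// Named view over the production counter.
pub struct Production<'a> {
    bars_produced: &'a mut u32,
}

impl ChocolateFactory {
    pub fn new() -> Self {
        ChocolateFactory { golden_tickets: Vec::new(), bars_produced: 0 }
    }

    /// Splits one exclusive borrow into two disjoint, named views.
    pub fn split(&mut self) -> (GoldenTickets<'_>, Production<'_>) {
        (
            GoldenTickets { golden_tickets: &mut self.golden_tickets },
            Production { bars_produced: &mut self.bars_produced },
        )
    }

    pub fn ticket_count(&self) -> usize {
        self.golden_tickets.len()
    }

    pub fn bars_produced(&self) -> u32 {
        self.bars_produced
    }
}

impl GoldenTickets<'_> {
    pub fn add(&mut self, index: u32) {
        self.golden_tickets.push(index);
    }
}

impl Production<'_> {
    pub fn record_bar(&mut self) {
        *self.bars_produced += 1;
    }
}
```

The view names are public while the fields behind them stay private, so the set of fields in a view can change without changing any signature. The cost is that each view is a separate struct that has to be constructed by hand.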

steffahn · November 6, 2021, 2:59pm

A few more thoughts on the topic that I’m having.

In the context of a struct

struct Foo {
    bar: u8,
    baz: u8,
}

First, one could discuss whether a view type such as &{baz} Foo is

a special / new kind of type by itself
a “regular” reference type, so it’s a special case of &T where T is a new type “{baz} Foo”

The code

pub type GoldenTicket = {serial_number, mut owner} GoldenTicketData;

in the post hints towards the latter approach.

Notably however, this approach would not interact nicely with the way references work in Rust:

Assuming that {baz} Foo would be a Sized type, you could do things like mem::swap on two &mut {baz} Foo instances; the way mem::swap operates is (AFAIK) that it memcopies the whole value including padding, and in the case of {baz} Foo this would be including the contents of the bar field.
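A small sketch of the behaviour in question, using ordinary &mut Foo references since view types do not exist: mem::swap exchanges the complete values, so every field moves. The function name swap_whole is made up for this example.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
struct Foo {
    bar: u8,
    baz: u8,
}

/// Swaps two whole `Foo` values; both `bar` and `baz` change places.
fn swap_whole(a: &mut Foo, b: &mut Foo) {
    mem::swap(a, b);
}
```

If &mut {baz} Foo were an ordinary reference to a Sized type with the same layout as Foo, a generic swap like this would have no way of knowing that bar must be left alone.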

I think we might actually have no such problem for shared references, provided that a view-type like {baz} Foo never implements Copy. So &{baz} Foo is probably fine.

This reminds me a bit of pinning. It would probably be sound to work with Pin<&mut {baz} Foo> only instead of &mut {baz} Foo; for this, {baz} Foo would be an !Unpin type without structural pinning, i.e. offering a Pin<&mut {baz} Foo> -> &mut u8 conversion for accessing the baz field. Using the existing Pin for this would be a bit weird; let's give a new name to this, I’ll temporarily choose “NoMove”. You could create a NoMove<&mut {baz} Foo> reference directly for a local variable foo: Foo on the stack (but you could not create a &mut {baz} Foo reference!), and you could project NoMove<&mut {baz} Foo> -> &mut u8. You could also split-reborrow a &mut Foo into NoMove<&mut {bar} Foo> and NoMove<&mut {baz} Foo>. The new wrapper could also be combined with Pin, allowing Pin<&mut Foo> to be split into Pin<NoMove<&mut {bar} Foo>> and Pin<NoMove<&mut {baz} Foo>>.

As an alternative to a new wrapper, &mut {...path} S could be considered something different from &mut T with T == {...path} S; or a new trait like Sized could be introduced that view-types like {bar} Foo don't implement, and that mem::replace and similar functions require.

Or perhaps just making {bar} Foo be considered an unsized (i.e. !Sized) type could make sense? It would be the first “unsized” type where &{bar} Foo has no meta-info, i.e. size_of::<&{bar} Foo>() == size_of::<usize>().
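For context, this sketch shows the reference sizes that exist today: a reference to a Sized type is one pointer wide, while references to slices and trait objects carry extra metadata. The function name reference_sizes is hypothetical.

```rust
use std::fmt::Debug;
use std::mem::size_of;

#[allow(dead_code)]
struct Foo {
    bar: u8,
    baz: u8,
}

/// Returns the sizes of a thin reference, a slice reference and a
/// trait-object reference, in that order.
fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<&Foo>(),       // pointer only
        size_of::<&[u8]>(),      // pointer plus length
        size_of::<&dyn Debug>(), // pointer plus vtable pointer
    )
}
```

An unsized {bar} Foo as described above would behave like the first entry in size while being treated like the other two by functions that require Sized.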

I’m just realizing that in a lot of places above, I should probably have written &mut {mut bar} Foo instead of &mut {bar} Foo. I’ll stick with the latter below, too, though for simplicity.

Relating &{...path(s)} T to &T, I think the question of whether e.g. &Foo is the same as &{bar, baz} Foo comes up. And similarly for &mut.

It would probably simplify the view-type system if &Foo was just a “syntax sugar” equivalent to &{bar, baz} Foo (i.e. listing all the fields). There’s however the question of empty lists, in particular with mutable references:

First of all, &mut {} Foo doesn’t make much practical sense, so it’s unclear if it should be allowed or disallowed in the first place. If it’s allowed, it could probably be duplicated: you could split &mut {} Foo into &mut {} Foo and &mut {} Foo similar to how you could split &mut {bar, baz} Foo into &mut {bar} Foo and &mut {baz} Foo.
For structs Bar with no fields, currently &mut Bar is still in some sense exclusive. It might be possible that some existing API somehow depends on that being the case, although I’m not actually sure if that’s really possible. I’m not talking about #[non_exhaustive] fieldless structs here (yet). For exhaustive fieldless structs, all fields are public, so anyone can just create new instances of them (provided they can name the type, I guess...) if they want to get hold of lots of &mut ... references at the same time. Still, it somehow feels weird/questionable to just start allowing duplication of mutable references to field-less zero-sized types.
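The point about exhaustive fieldless structs can be sketched as follows. Anyone who can name Bar can construct instances and hold an exclusive reference to each at the same time; the function name two_exclusive_refs is invented for the example.

```rust
use std::mem::size_of;

/// An exhaustive struct with no fields: any code that can name it can build it.
pub struct Bar {}

/// Holds exclusive references to two separate instances at once and
/// returns the size of `Bar`.
pub fn two_exclusive_refs() -> usize {
    let mut first = Bar {};
    let mut second = Bar {};
    let r1: &mut Bar = &mut first;
    let r2: &mut Bar = &mut second;
    // Both exclusive references are live at this point.
    let _both: (&mut Bar, &mut Bar) = (r1, r2);
    size_of::<Bar>()
}
```

These are references to two distinct instances, though. Duplicating a reference to the same instance is the part that is not possible in safe code today.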

About #[non_exhaustive] structs: It might make sense for those to have some way of indicating complements. For those types, &{list, of, all, fields} Type should not be the same as &Type, because the list might not stay exhaustive in the future. Still, it can make sense to want to split up &mut Type into &mut {field} Type and &mut {..everything-but field} Type. Let me use temporary syntax ~{field} to refer to everything but the field "field". So now you can split &mut Type into &mut {field} Type and &mut ~{field} Type. For ordinary exhaustive structs like Foo above then, &{baz} Foo would be the same as &~{bar} Foo; for a non-exhaustive struct there’s always all-remaining-and-future fields that are not part of &{list, of, fields} Type, but are part of &~{list, of, excluded, fields}; hence for those &{…} Type and &~{…} Type are always different. Finally, &Type would still be syntactic sugar; now for &~{} Type.

Actually, this syntax does not give a way to specify whether all-remaining-and-future fields are borrowed mutably or immutably; I don’t have a great idea how to incorporate this.

How this interacts with "longer places": in a struct like

pub struct Foo {
    pub bar: Bar,
}
pub struct Bar {
    pub x: u8,
    pub y: u8,
}

it would make sense that &Foo is the same as &{bar} Foo and the same as &{bar.x, bar.y} Foo.
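Within a single function, the borrow checker already treats such longer places as disjoint. A minimal sketch, with a made-up function name bump_nested:

```rust
pub struct Foo {
    pub bar: Bar,
}

pub struct Bar {
    pub x: u8,
    pub y: u8,
}

/// Mutably borrows two nested places of the same struct at the same time.
pub fn bump_nested(foo: &mut Foo) {
    let x = &mut foo.bar.x;
    let y = &mut foo.bar.y;
    *x += 1;
    *y += 2;
}
```

A view type like &mut {bar.x} Foo would give this path-level tracking a spelling that can cross function boundaries.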

However by the same token, for

pub struct Foo {
    pub bar: Bar,
}
pub struct Bar {}

now &Foo is the same as &{bar} Foo and the same as &{} Foo? But – at least when bar would be private – unlike for truly field-less structs, I’d argue it is not sound anymore to be able to duplicate &mut Foo references. I’d say that

pub struct Foo {
    bar: Bar,
}
pub struct Bar {}

should behave the same way as

#[non_exhaustive]
pub struct Foo {
    bar: Bar,
}
pub struct Bar {}

!!

But #[non_exhaustive] structs need the extra all-remaining-and-future-fields place to be considered; you can only have either &mut Foo be the same as &mut {} Foo or have them not be the same, and it’s also somewhat questionable to have this depend on how public Foo’s fields are. Maybe then it’s better when

&Foo is the same as &{bar} Foo and the same as &{} Foo

is not true after all, even for the case where all fields are public. What exactly is true and not true about this statement though? And, in the example before that, is the statement

&Foo is the same as &{bar} Foo and the same as &{bar.x, bar.y} Foo

still true? I don’t know the best answer here.

Synonyms / named sets of fields: Those are a must in order to support private fields. It’s probably also necessary to be able to declare sets of pairwise disjoint sets of places. E.g. if I have a type

// all fields private
pub struct Matrix3Times3 {
    x_1_1: f32, x_1_2: f32, x_1_3: f32, 
    x_2_1: f32, x_2_2: f32, x_2_3: f32, 
    x_3_1: f32, x_3_2: f32, x_3_3: f32, 
}

and I want to provide view-types

pub type Matrix3Times3Row1 = {x_1_1, x_1_2, x_1_3} Matrix3Times3;
pub type Matrix3Times3Row2 = {x_2_1, x_2_2, x_2_3} Matrix3Times3;
pub type Matrix3Times3Row3 = {x_3_1, x_3_2, x_3_3} Matrix3Times3;

as well as

pub type Matrix3Times3Colum1 = {x_1_1, x_2_1, x_3_1} Matrix3Times3;
pub type Matrix3Times3Colum2 = {x_1_2, x_2_2, x_3_2} Matrix3Times3;
pub type Matrix3Times3Colum3 = {x_1_3, x_2_3, x_3_3} Matrix3Times3;

Then you could split &mut Matrix3Times3 into &mut Matrix3Times3Row1, &mut Matrix3Times3Row2 and &mut Matrix3Times3Row3. Or you could split &mut Matrix3Times3 into &mut Matrix3Times3Colum1, &mut Matrix3Times3Colum2 and &mut Matrix3Times3Colum3. But having the compiler determine this automatically would leak implementation details: It’s probably better to have the possibility (and requirement) to declare e.g.

pairwise_disjoint_view_types!{ of Matrix3Times3 {
    Matrix3Times3Row1,
    Matrix3Times3Row2,
    Matrix3Times3Row3,
}}

pairwise_disjoint_view_types!{ of Matrix3Times3 {
    Matrix3Times3Colum1,
    Matrix3Times3Colum2,
    Matrix3Times3Colum3,
}}

and only allow splitting a borrow of all the private fields of a struct into multiple subsets if those subsets are explicitly declared to be disjoint. (At least in code where the private fields really are not visible.)
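In today's Rust, the closest equivalent to such a declaration is a method written by the type's author that hands out all the views of one partition together. The sketch below assumes a hypothetical Line wrapper in place of the row and column view types, plus invented helpers zero, fill and to_array.

```rust
pub struct Matrix3Times3 {
    x_1_1: f32, x_1_2: f32, x_1_3: f32,
    x_2_1: f32, x_2_2: f32, x_2_3: f32,
    x_3_1: f32, x_3_2: f32, x_3_3: f32,
}

/// Exclusive access to three cells; stands in for a row or column view.
pub struct Line<'a>([&'a mut f32; 3]);

impl Line<'_> {
    pub fn fill(&mut self, value: f32) {
        for cell in self.0.iter_mut() {
            **cell = value;
        }
    }
}

impl Matrix3Times3 {
    pub fn zero() -> Self {
        Matrix3Times3 {
            x_1_1: 0.0, x_1_2: 0.0, x_1_3: 0.0,
            x_2_1: 0.0, x_2_2: 0.0, x_2_3: 0.0,
            x_3_1: 0.0, x_3_2: 0.0, x_3_3: 0.0,
        }
    }

    /// Plays the role of one disjointness declaration: the three rows.
    pub fn rows_mut(&mut self) -> [Line<'_>; 3] {
        [
            Line([&mut self.x_1_1, &mut self.x_1_2, &mut self.x_1_3]),
            Line([&mut self.x_2_1, &mut self.x_2_2, &mut self.x_2_3]),
            Line([&mut self.x_3_1, &mut self.x_3_2, &mut self.x_3_3]),
        ]
    }

    /// Plays the role of the other declaration: the three columns.
    pub fn columns_mut(&mut self) -> [Line<'_>; 3] {
        [
            Line([&mut self.x_1_1, &mut self.x_2_1, &mut self.x_3_1]),
            Line([&mut self.x_1_2, &mut self.x_2_2, &mut self.x_3_2]),
            Line([&mut self.x_1_3, &mut self.x_2_3, &mut self.x_3_3]),
        ]
    }

    pub fn to_array(&self) -> [[f32; 3]; 3] {
        [
            [self.x_1_1, self.x_1_2, self.x_1_3],
            [self.x_2_1, self.x_2_2, self.x_2_3],
            [self.x_3_1, self.x_3_2, self.x_3_3],
        ]
    }
}
```

Callers can hold all three rows or all three columns at once, but never a row together with a column, because each method borrows the whole matrix. Which groupings are disjoint is decided by the author inside the module, without exposing the field names.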

This also makes sense for traits. If you have some way of providing associated-view-types Bar and Baz in a trait; users of this trait might want to split up &mut Self into &mut Bar and &mut Baz; but for this the trait would need to (be able to) specify/require the two view-types to be disjoint!
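A rough approximation of the trait case in current Rust is a trait whose contract includes a splitting method. In this sketch the associated types are plain field types rather than view types, and the trait SplitBarBaz, its method split_mut and the function bump_and_tag are all hypothetical names.

```rust
/// A trait that promises its two parts can be borrowed mutably together.
trait SplitBarBaz {
    type Bar;
    type Baz;

    /// Returning both references at once is what encodes disjointness.
    fn split_mut(&mut self) -> (&mut Self::Bar, &mut Self::Baz);
}

struct Foo {
    bar: u8,
    baz: String,
}

impl SplitBarBaz for Foo {
    type Bar = u8;
    type Baz = String;

    fn split_mut(&mut self) -> (&mut u8, &mut String) {
        (&mut self.bar, &mut self.baz)
    }
}

/// Generic code can now mutate both parts without knowing the concrete type.
fn bump_and_tag<T: SplitBarBaz<Bar = u8, Baz = String>>(value: &mut T) {
    let (bar, baz) = value.split_mut();
    *bar += 1;
    baz.push('!');
}
```

The disjointness requirement lives in the method signature here. With associated view types it would have to be stated as a separate requirement on the trait, as suggested above.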

Nokel81 · November 6, 2021, 4:28pm

Copying bar would be very surprising to me, given that {baz} Foo doesn't have access to it.

I would assume that a swap would only copy the value and padding "around" baz.

Now this probably would be an issue if Foo is #[repr(packed)] so there probably should be some restrictions then.
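The expected behaviour can be written out by hand today by swapping only the viewed field. A minimal sketch with a made-up function name swap_baz_only:

```rust
use std::mem;

#[derive(Debug, PartialEq)]
struct Foo {
    bar: u8,
    baz: u8,
}

/// Swaps only the `baz` fields; `bar` stays where it was in both values.
fn swap_baz_only(a: &mut Foo, b: &mut Foo) {
    mem::swap(&mut a.baz, &mut b.baz);
}
```

The open question is whether a generic mem::swap::<T> could be made to do this when T is a view type, since it currently copies size_of::<T>() bytes without any knowledge of fields.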
