Blog post: View types for Rust

I honestly get the appeal of doing this automatically, but there's still the intent issue, which is also why we don't allow global type inference for private functions.

Or more specifically, it's currently possible to type check a function by using the body of the given function, the signatures of all called functions, the definitions of all used structures, and no more. As a key point, type checking a function does not require knowing the bodies of any other function.

Or, for an even more direct comparison, it's why we don't allow lifetime inference for non-public functions, even though we necessarily have the minimum bound information already.
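To make the lifetime comparison concrete, here is a minimal sketch using a hypothetical `pick` function (the name and body are made up for illustration). The body only ever returns `x`, so an inferring compiler could get away with a looser bound, yet callers are checked against the declared signature alone.

```rust
// Hypothetical helper: the signature ties the result to BOTH inputs,
// even though the body only ever returns `x`.
fn pick<'a>(x: &'a str, _y: &'a str) -> &'a str {
    x
}

fn main() {
    let a = String::from("kept");
    let result;
    {
        let b = String::from("dropped early");
        // Callers see only the signature, so `inner` is treated as if it
        // might borrow from `b`, whatever the body actually does.
        let inner = pick(&a, &b);
        assert_eq!(inner, "kept");
        // result = inner; // rejected: `b` does not live long enough
        result = pick(&a, &a);
    }
    println!("{}", result);
}
```

If the lifetimes were inferred from the body instead, the commented-out line would compile today and break the moment `pick` started returning `_y`.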

Global type inference (of which subview inference is a subset) runs into the problem that you have to have the entire crate in (human) memory to understand it. You can't page out the details of the implementation of some helper module, because that implementation has direct impact on how you're allowed to use it.

Then there are of course still semver-like hazards in any large codebase; if I have a subroutine that introduces a log of the object state, I've broken any callers that were relying on me not using some other fields. If I make a temporary change that stops touching a field, then revert it later (as it turns out, bad idea), then anyone in the meantime could've relied on me not touching that field. (Sure, maybe that should've been a crate boundary. Oftentimes it's not, and sometimes it can't be, due to bidirectional dependencies.)

The important part of function signatures being fully specified is developer intent. Signatures give the developer a place to tell the compiler what they want the code's shape to be, which it then uses to help the developer to make sure that 1) the implementation actually fits inside of said shape (fulfils the obligations), and that 2) calling code only relies on the promised shape. Having this layer of developer intent allows for better errors (in the common case when most signatures are mostly accurate to intent) since the compiler knows what the developer wanted as well as what the developer wrote, and can offer hints both in the direction of stronger guarantees or fixing what was written. (Personally, I like avoiding errors that go "because it uses ... that uses ... that uses ... that uses ... that uses ... that uses this field". We have some (autotraits, blanket impl chains), but each for good reason.)

It's interesting that global type inference is kind of halfway between static typing and dynamic typing. Personally, I think Rust's local type inference is the sweet spot.

I'm against global type inference for anything but diagnostics for the reasons above. Diagnostic use is fine (e.g. _ as a return type, rustc will tell you what type you're returning to put there, though that's not actually global inference) and great, because it's the "did you mean" that makes a great peer programmer compiler. The compiler could track this inference and say "hey if you made these borrows take disjoint views it'd compile fine," and I'd rejoice at the compiler getting more helpful. It's the implicit reliance on function bodies for signature clarification that I'm against.


The reason type/borrow/disjoint captures inference works for closures is that they're entirely local. Their inference is still bound by the function definition. The implementation of the closure is clearly part of the implementation of the function it's a part of.

I'd love to see it become easier to lift said closures into full functions. I just don't think that expanding closure inference to currently fully specified function signatures (for successful compiles; I'm all for "allowing" inference in unsuccessful compiles to suggest adding the inferred signature) is a good idea.
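As a rough illustration of why the closure case stays local, here is a sketch with made-up names (`Factory`, `wrap_all`). Everything inferred for the closure (parameter type, return type, how it captures) is decided inside one function body, and the enclosing signature is still written out in full.

```rust
struct Factory {
    golden_tickets: Vec<usize>,
    bars: Vec<String>,
}

// The enclosing function's signature is fully specified.
fn wrap_all(factory: &mut Factory) -> Vec<String> {
    // Borrow one field. The closure's parameter type, return type and
    // capture mode are all inferred from this function body alone.
    let tickets = &factory.golden_tickets;
    let has_ticket = |i| tickets.contains(&i);

    // `bars` is a different field, so it can be drained while the closure
    // still holds its shared borrow of `golden_tickets`.
    factory
        .bars
        .drain(..)
        .enumerate()
        .map(|(i, bar)| {
            if has_ticket(i) {
                format!("{} + ticket", bar)
            } else {
                bar
            }
        })
        .collect()
}

fn main() {
    let mut factory = Factory {
        golden_tickets: vec![1],
        bars: vec!["milk".to_owned(), "dark".to_owned()],
    };
    let wrapped = wrap_all(&mut factory);
    assert_eq!(wrapped, vec!["milk".to_owned(), "dark + ticket".to_owned()]);
}
```

Lifting `has_ticket` into a method taking `&self` is exactly the step that stops compiling, since the method signature would then borrow all of `Factory`.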


Interesting note: at least in the case of unnamed/implicit/public field sets, we already have a syntax for only borrowing some fields:

// the bikeshed syntax
impl WonkaChocolateFactory {
    fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}
// becomes the stable syntax
impl WonkaChocolateFactory {
    fn should_insert_ticket(
        Self { golden_tickets, .. }: &Self,
        index: usize,
    ) -> bool {
        golden_tickets.contains(&index)
    }
} 

Obviously if we wanted to give this disjoint captures semantics, it'd have to be in a new edition, to avoid dropping a semver change on previously published crates.
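For anyone who wants to try the stable form, here is a small runnable sketch. The `bars` field and the values in `main` are assumptions added for illustration. Note that with a pattern in place of a `self` receiver this is an associated function, and that the argument type is still `&Self`, so the caller borrows the whole struct today.

```rust
struct WonkaChocolateFactory {
    golden_tickets: Vec<usize>,
    bars: Vec<String>, // assumed extra field, only here to have a second one
}

impl WonkaChocolateFactory {
    // Stable today: a destructuring pattern in parameter position.
    // There is no `self` receiver, so this is called as
    // `WonkaChocolateFactory::should_insert_ticket(&factory, index)`.
    fn should_insert_ticket(Self { golden_tickets, .. }: &Self, index: usize) -> bool {
        golden_tickets.contains(&index)
    }
}

fn main() {
    let mut factory = WonkaChocolateFactory {
        golden_tickets: vec![2],
        bars: vec!["plain".to_owned()],
    };
    // The whole of `factory` is borrowed for the duration of this call,
    // even though the pattern only names `golden_tickets`.
    assert!(WonkaChocolateFactory::should_insert_ticket(&factory, 2));
    assert!(!WonkaChocolateFactory::should_insert_ticket(&factory, 0));

    factory.bars.push("dark".to_owned());
    assert_eq!(factory.bars.len(), 2);
}
```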

And if you wanted to keep the receiver sugar,

    fn should_insert_ticket(
        &self { golden_tickets, .. },

I don't think I'd want to piggy-back this on privacy, since just because something's pub doesn't mean it's externally accessible, and private_in_public is still just a warning.

I'd rather see a different opt-in for such functions/methods (and maybe even types) to mark them as borrow/type/etc-checked at use, kinda like they were macros.

My standard example:

placeholder-contextual-keyword fn mul_add(x, y, z) { x * y + z }

This is an interesting idea. A couple of things that occur to me:

Interaction with lifetimes

These views become attached to 'a lifetimes, don't they? If I have a function:

pub fn iter_bars<'a>(&'a {bars} self) -> impl Iterator<Item = &'a Bar> {...}

then that 'a is effectively bound to bars, right? So a later &mut self or &mut {bars} self would conflict with it, but &mut {beans} self would not?

How does this generalize to all lifetime interactions? How much does it complicate reasoning about them, or presenting diagnostics about them?
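For reference, the borrow relationship being asked about can be emulated today by passing the fields themselves instead of `self`. This is only a sketch: `Pantry`, `Bar` and `add_bean` are made-up names, and free functions stand in for the hypothetical view-taking methods.

```rust
struct Bar(u32);

struct Pantry {
    bars: Vec<Bar>,
    beans: Vec<u32>,
}

// Stand-in for `fn iter_bars<'a>(&'a {bars} self)`: the lifetime `'a` is
// tied to the `bars` field only, because only that field is passed in.
fn iter_bars<'a>(bars: &'a [Bar]) -> impl Iterator<Item = &'a Bar> {
    bars.iter()
}

// Stand-in for a method taking `&mut {beans} self`.
fn add_bean(beans: &mut Vec<u32>, bean: u32) {
    beans.push(bean);
}

fn main() {
    let mut pantry = Pantry {
        bars: vec![Bar(1), Bar(2)],
        beans: Vec::new(),
    };

    let iter = iter_bars(&pantry.bars);
    add_bean(&mut pantry.beans, 7); // fine: disjoint from `bars`
    // pantry.bars.clear();         // rejected while `iter` is still alive

    let total: u32 = iter.map(|bar| bar.0).sum();
    assert_eq!(total, 3);
    assert_eq!(pantry.beans, vec![7]);
}
```

Under that reading, `'a` behaves exactly like a borrow of the `bars` field, which suggests the existing per-field borrow rules would carry over.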

"Exposes internal details" is good, actually

I've seen a few comments on this, but it's not obvious to me it matters. If you see & { internal_detail } self then I guess there's the concern that if you also need to express that type, you're baking it into your signatures as well. So really those names become part of the API, in the same way an associated type would. So perhaps it makes sense to have a notion like & { internal_name as api_name } T. (This reminds me of how named parameters in OCaml end up becoming part of the signature of a closure.)

Aside from the naming issue, it does also expose a more fundamental property of your implementation, but one that is probably useful to be part of the API. In effect you're saying "this method relates to this subset of my state" which means that it is coupled with other methods which touch overlapping parts of that state - and more interestingly - is independent from the methods which touch disjoint state. We effectively model that today by having sub-structures to encapsulate those states (as the blog mentions), but that can only model states which can be cleanly grouped like that.
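A minimal sketch of that sub-structure approach, with hypothetical type names: each group of state gets its own struct and its own methods, so a method call only borrows the group it lives on.

```rust
// Each group of state is its own type with its own methods.
struct Tickets {
    golden: Vec<usize>,
}

impl Tickets {
    fn should_insert(&self, index: usize) -> bool {
        self.golden.contains(&index)
    }
}

struct Stock {
    bars: Vec<String>,
}

struct Factory {
    tickets: Tickets,
    stock: Stock,
}

impl Factory {
    fn wrap_all(&mut self) -> Vec<(String, bool)> {
        let mut wrapped = Vec::new();
        // `drain` mutably borrows `self.stock.bars` for the whole loop...
        for (i, bar) in self.stock.bars.drain(..).enumerate() {
            // ...while this call borrows only `self.tickets`, so the
            // borrow checker accepts it.
            let gets_ticket = self.tickets.should_insert(i);
            wrapped.push((bar, gets_ticket));
        }
        wrapped
    }
}

fn main() {
    let mut factory = Factory {
        tickets: Tickets { golden: vec![0] },
        stock: Stock { bars: vec!["milk".to_owned(), "dark".to_owned()] },
    };
    let wrapped = factory.wrap_all();
    assert_eq!(wrapped, vec![("milk".to_owned(), true), ("dark".to_owned(), false)]);
}
```

This works only because the state splits cleanly in two; a method needing `golden` plus half of `Stock` would have no single sub-struct to live on.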

So generalizing the internal_name as api_name to be able to give a public name to a group of state, similar to what @mjbshaw proposed, seems like a reasonable thing to consider (though I'm not sure about that syntax).


It will not be the first: we already have (unstable) extern types.

Considering that the problem, in general, is with assigning the whole struct, which also changes hidden fields, unsizing it seems like a good idea. This way, when you have a view you can't do something meaningful with it other than access its allowed fields (which are sized). This will force views to be different types even if they include all of the fields, but I tend towards it anyway for the reason specified in your post.


If we really want to lean into the "kinda like they were macros", then macro fn makes a lot of sense.

Semantically, I would describe macro fn as expanding similar to the following:

macro fn mul_add(x, y, z) { x * y + z }

mul_add(f(), g(), h())

// roughly semantically equivalent to 

match (f(), g(), h()) {
    (x, y, z) => { x * y + z }
}

though with mul_add (maybe? maybe not?) having its own call stack entry, its own monomorphized function site rather than strictly being always 100% inlined.
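Since `macro fn` doesn't exist, the closest thing that runs today is a `macro_rules!` approximation of that expansion. This is a sketch of the evaluate-once semantics only, not of the call-stack or monomorphization behavior.

```rust
// `macro fn` is hypothetical; this macro_rules! version mimics the
// "bind each argument once, then run the body" expansion.
macro_rules! mul_add {
    ($x:expr, $y:expr, $z:expr) => {
        match ($x, $y, $z) {
            (x, y, z) => x * y + z,
        }
    };
}

fn main() {
    let mut calls = 0;
    let mut next = || {
        calls += 1;
        calls
    };

    // Each argument expression is evaluated exactly once, left to right,
    // so this computes 1 * 2 + 3.
    let result = mul_add!(next(), next(), next());
    assert_eq!(result, 5);

    // No signature is written, so the body is type checked at each use;
    // the same definition also works for floats.
    assert_eq!(mul_add!(1.5, 2.0, 0.5), 3.5);
}
```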

If the argument patterns are partial patterns (e.g. WonkaChocolateFactory { golden_tickets, .. }), then this would semantically define macro fn to disjointly capture the named fields only.

However, with that definition and its early-evaluation rebinding, the whole binding is always taken, rather than field-wise usage being possible (without a partial pattern in the signature). This is IIUC a fundamental limitation of the surface language as it exists today; there's no way to both evaluate a (place) expression only once and do disjoint captures with its usages.

If we do add macro fn/inline fn/whatever, I expect the function-interface semantics where the arguments are evaluated once to be met (unlike with bang-macros, which can freely do expression rewriting), so I'm not sure how exactly macro fn would help the disjoint captures problem without supporting disjoint captures across a normal function interface first.


We have to think about how you can specify nested fields of other structs when they're not visible. Probably the best way is to nest views. If we choose this road, we have to decide whether we want to allow only nested views or both nested views and nested fields.

Another point I realize now is that we have to decide how this will interact with traits.

There are two options I can think of:

  1. Allow users to impl a trait only for the whole struct.

    We probably want methods to be able to take a view of self. For example, with @Aloso's syntax (ignoring allocators for simplicity):

    pub view Len {
        len,
    }
    pub view Slice {
        mut buf.ptr,
        len,
    }
    impl<T> Vec<T> {
        pub fn len(self: &Self::Len) -> usize { self.len }
    }
    impl<T> Deref for Vec<T> {
        type Target = [T];
        fn deref(self: &Self::Slice) -> &[T] { /* ... */ }
    }
    impl<T> DerefMut for Vec<T> {
        fn deref_mut(self: &mut Self::Slice) -> &mut [T] { /* ... */ }
    }
    

    When implementing a trait that has a method that takes a parameter of type T (any parameter or only self?), you'll be able to implement it with a method that takes any view of T. When calling it, the compiler will coerce T to the view. This coercion will have to also happen with the receiver, along with auto(de)ref and array/slice unsizing.

    I think this is the best solution.

  2. Allow implementing the trait for any view. Then:

    Do we allow them to collide?

    Do we coerce?

    If both answers are "yes", what do we do when there is a contradiction?

We also can, of course, not allow views to be used on traits, but this greatly reduces their potential.

I remember raising this before:

I enjoyed the proposal a lot. My tiny bit of feedback is that I think the "how does that affect learning" part could be tightened up and strengthened. I actually think this would be a positive change.

It is my experience that there are two camps of programming languages. In one, languages are highly dynamic and concepts start out existing only as patterns. Rust, to my mind, is in the other camp: it names important cases and makes them accessible through the type system, allowing humans (and machines like the compiler!) to pick them up.

Also, I feel like the blog post gives enough of an argument on how the issue arises naturally. That means it will probably not impact users early, but it will give them a goal to learn towards. Before such a change, users need to be aware of an issue that exists "out there", after such change, they will find a chapter "view types" and then figure out what they are useful for.

On the topic of concepts like views, I also find in my training practice that there's not a lot of awareness, which makes it even harder for users to find good and common solutions to the issues you describe.

There's an argument to be made that it will make the language more complex, but I think Rust is already naturally down the path of "identifying useful concepts and making them part of the language". Whether it will make it more complex for the user is a question I would challenge though.


What about using struct-like syntax inside of traits to declare views?

Like so:

trait WithView{
   ref View{
      label: (mut) Type, //and so on
   }
}

In an implementor we have something like:

impl WithView for Foo {
   ref View{
      label = ref(?) (mut) self.foo //...
   }
}

And the consumer sees:

fn method(self: Self::View) {...}

Short version:

fn method(self: Self{ref View})

Or even

fn method(self{ref View})

Possible extension: allow calling more complex code in such views: match or even (limited?) function calls.

Unresolved: Interactions with arbitrary self types.

There are some crates that let you do something like this: partial_borrow, partial_ref, and borrow_as. For people wanting something like this, it might be a good idea to experiment with them and see whether these approaches are worthwhile.

Full disclosure: I'm the author of partial_borrow.

(I will post a copy of this to the proto-RFC thread mentioned earlier: "Partial borrowing (for fun and profit)", Issue #1215 in rust-lang/rfcs on GitHub.)

FYI, using partial_borrow invokes UB, at least according to miri with the track-raw-pointers feature.

MIRIFLAGS="-Zmiri-track-raw-pointers" cargo +nightly miri run

Yes. I think this is a false positive, although it's not entirely clear. According to the current SB 2.1 spec, raw pointers are not tagged. Cf. these rust-lang/unsafe-code-guidelines issues on GitHub:

  • Issue #134: Stacked Borrows: raw pointer usable only for `T` too strict?
  • Issue #256: Storing an object as &Header, but reading the data past the end of the header
  • Issue #276: Stacked Borrows cannot properly handle `extern type`

Okay, fair point, I guess it might not be clear yet whether this really is UB or not. Note however that your crate does not interact nicely with the API provided by crates such as replace_with.

use partial_borrow::prelude::*;
use replace_with::replace_with_or_abort;

#[derive(PartialBorrow)]
struct Foo {
    field: String,
}

fn main() {
    let mut foo = Foo { field: String::new() };
    let r: &mut partial!(Foo mut field) = foo.as_mut();
    replace_with_or_abort(r, |p| {
        #[repr(C)]
        struct S {
            p: partial!(Foo mut field),
            field: String,
        }

        let mut s = S {
            p, field: "Hello World!".to_owned(),
        };

        let ref_1: &mut String = &mut *s.p.field;
        let ref_2: &mut String = &mut s.field;
        dbg!(&ref_1, &ref_2);
        // outputs:
        // [src/main.rs:26] &ref_1 = "Hello World!"
        // [src/main.rs:26] &ref_2 = "Hello World!"

        let r: &str = ref_2;
        std::mem::take(ref_1);

        println!("{}", r);
        // outputs garbage such as
        // *~c��G
        // (use after free)

        s.p
    });
}

How alarming. Thanks for pointing this out. I would welcome an issue in the project's tracker on GitLab (Issues · Ian Jackson / rust-partial-borrow). (I just discovered that the issue tracker was disabled, sorry; I have enabled it.) That might be better than having this conversation here in the Internals thread...

Trying to log into that GitLab instance, I first tried to sign in with Google, only to get a strange error from Google:

Authorization Error

Error 403: org_internal

This client is restricted to users within its organization.

Then I tried to register a regular account, which worked, but apparently my account needs to be approved by administrators.

Anyway… this sort of thing is why we should stabilize extern types, while also making size_of_val panic when used on them (as opposed to the current nightly behavior of returning 0). The partial! type has size 0 and so replace_with thinks it can move it around in memory by copying 0 bytes. But in reality, pointers to that type are more like opaque handles. Anything that tries to check such a type's size is probably about to do something dangerous, so it should really be a panic.
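To show what generic code actually sees, here is a sketch with a made-up zero-sized `View` type placed at the start of a larger struct. It is only a stand-in and is not how `partial!` or extern types are implemented.

```rust
use std::mem::size_of_val;

// Hypothetical stand-in for a "view" handle: zero-sized, and only meant
// to be used behind a reference into a larger object.
#[repr(C)]
struct View {
    _opaque: [u8; 0],
}

#[repr(C)]
struct Whole {
    view: View, // offset 0, so `&Whole` and `&View` share an address
    field: u64,
}

fn main() {
    let whole = Whole {
        view: View { _opaque: [] },
        field: 42,
    };
    let handle: &View = &whole.view;

    // Generic code asking for the size gets 0 and will conclude that
    // moving the value means copying nothing at all.
    assert_eq!(size_of_val(handle), 0);
    // The object the handle is meant to stand for is larger.
    assert_eq!(size_of_val(&whole), 8);
    assert_eq!(whole.field, 42);
}
```

A size query that panicked instead of answering 0 would stop code like `replace_with` before it "moves" such a handle.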


I really like this proposal. Looking at this thread, I see two large points left to be resolved:

  • What would the syntax look like?
  • How would this interact with traits and lifetimes?

I think both of those could be developed in a proper RFC, and I strongly urge bringing this concept there.

Side note: this is unfortunately not true when it comes to impl Trait (including the async fn desugaring) and auto traits. Auto traits 'leak' from the underlying type through the opaque type, so you need to type-check the body of the callee to know whether its opaque return type is Send/Sync.
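A small sketch of that leak, with made-up function names: both functions below have the same signature, but only one of them returns a `Send` type, and the caller finds out by way of the body.

```rust
use std::rc::Rc;

// The signature says nothing about `Send`.
fn evens() -> impl Iterator<Item = u32> {
    (0u32..5).filter(|n| n % 2 == 0)
}

// Same signature, but the closure captures an `Rc`, so the hidden
// iterator type is not `Send`.
fn evens_shared() -> impl Iterator<Item = u32> {
    let step = Rc::new(2u32);
    (0u32..5).filter(move |n| n % *step == 0)
}

fn assert_send<T: Send>(_value: &T) {}

fn main() {
    let it = evens();
    // Accepted only because of what the body of `evens` happens to return.
    assert_send(&it);
    // assert_send(&evens_shared()); // rejected: `Rc<u32>` is not `Send`

    assert_eq!(it.collect::<Vec<_>>(), vec![0, 2, 4]);
    assert_eq!(evens_shared().collect::<Vec<_>>(), vec![0, 2, 4]);
}
```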


Is there a mistake in the example from the blog post? The code compiles when I change into_iter() to iter(), which seems easier than using view patterns.

        // for (bar, i) in self.bars.into_iter().zip(0..) {
        // replaced with:
        for (bar, i) in self.bars.iter().zip(0..) {

I think you’ll have to assume that a ChocolateBar cannot be cloned, and has a by-value fn into_wrapped(self, opt_ticket: Option<GoldenTicket>) -> WrapperChocolateBar method.
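Here is a sketch under exactly those assumptions. The type definitions, the `prepare_shipment` name and the `mem::take` workaround are mine and are not taken from the blog post. With a non-Clone bar and a by-value `into_wrapped`, `iter()` no longer helps, because it only yields `&ChocolateBar`.

```rust
struct GoldenTicket {
    number: usize,
}

// Deliberately not `Clone`, per the assumption above.
struct ChocolateBar;

struct WrapperChocolateBar {
    ticket: Option<GoldenTicket>,
}

impl ChocolateBar {
    // By-value method: needs an owned bar, which `iter()` cannot provide.
    fn into_wrapped(self, opt_ticket: Option<GoldenTicket>) -> WrapperChocolateBar {
        WrapperChocolateBar { ticket: opt_ticket }
    }
}

struct WonkaChocolateFactory {
    golden_tickets: Vec<usize>,
    bars: Vec<ChocolateBar>,
}

impl WonkaChocolateFactory {
    fn should_insert_ticket(&self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }

    // Hypothetical method name. Using `self.bars.into_iter()` directly
    // would leave `self` partially moved, and the `&self` call below
    // would then be rejected.
    fn prepare_shipment(mut self) -> Vec<WrapperChocolateBar> {
        // One workaround available today: swap the bars out first, so
        // `self` stays fully initialized.
        let bars = std::mem::take(&mut self.bars);
        bars.into_iter()
            .zip(0..)
            .map(|(bar, i)| {
                let opt_ticket = if self.should_insert_ticket(i) {
                    Some(GoldenTicket { number: i })
                } else {
                    None
                };
                bar.into_wrapped(opt_ticket)
            })
            .collect()
    }
}

fn main() {
    let factory = WonkaChocolateFactory {
        golden_tickets: vec![1],
        bars: vec![ChocolateBar, ChocolateBar],
    };
    let wrapped = factory.prepare_shipment();
    assert!(wrapped[0].ticket.is_none());
    assert_eq!(wrapped[1].ticket.as_ref().map(|t| t.number), Some(1));
}
```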