Static path-dependent types and deferred borrows

It seems that this paper addresses the same limitation I once tried to describe in the thread Variable dependent types? In particular, this quote

The key idea in this work is to encapsulate a reference to an object that is not (yet) a borrow

reminds me of some ideas in this post.

I definitely agree that this is an issue Rust should tackle and that the ideas presented in this paper are certainly worth considering. (The normal procedure would have been to write an RFC, but I'm afraid that is probably a dead end now.)

The specific case of self-references comes up in some form or another relatively often. Presumably a more general approach would need to consider that use case specifically.

If the current analysis boundaries are maintained (e.g. the analysis is not global, and a function's API is determined by its signature and not its contents), the "binding handle by path" case seems limited to at best the same kinds of accesses that Rust can track as distinct today (field accesses). For example, you couldn't track a path to a Vec index without special-casing Vec in the compiler, since that crosses a trait-method boundary. This limits its applicability to nested data structures. Perhaps it's adequate for stack-based borrows, though (e.g. deferred references stored as offsets into a struct).

I had a lot of nits with the paper, but most were around "logical memory safety" and not the general idea of "tracking resource handles". So probably not relevant to this thread.

Nits

The restrictions on "logical memory safety" in the presented model are so large as to make it rather unmotivating relative to the weight of the language changes, IMO. You're either severely restricted in operations or you need some pretty heavy indirection:

A deferred borrow idiom properly implemented by a container grants logical memory safety, as long as the container upholds the contract: a deferred borrow object returned by a lookup operation must convert into the same borrow at any future point if dereferenced with the same container object.

So, no rearranging of Vec elements, for example (unless you keep track of how every element moved). No replacing of an existing element with a new one in any case, which implies no handing out of &mut Element. This means section 7 is pretty misleading: those mutable Vecs, for example, are either not today's Vecs, are Vecs of some sort of restrictive box, or are not providing logical memory safety.
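To make the contract concrete, here's a minimal sketch (Token, lookup, and deref are made-up names, not the paper's API) of an index "pseudo-pointer" whose dereference is deferred until a container is supplied, and of how rearranging elements breaks the contract:

```rust
// Made-up index token: remembers *where* we looked, not a reference.
struct Token(usize);

fn lookup(_v: &[String], i: usize) -> Token {
    Token(i)
}

// The actual borrow is only produced here, at the use site.
fn deref<'a>(v: &'a [String], t: &Token) -> &'a String {
    &v[t.0]
}

fn main() {
    let mut v = vec!["a".to_string(), "b".to_string()];
    let t = lookup(&v, 0); // token for the element "a"
    v.swap(0, 1); // rearranging elements violates the stated contract:
    // the same token now resolves to a *different* element
    assert_eq!(deref(&v, &t), "b");
}
```

No memory unsafety occurs, which is exactly the point: the failure is purely logical, and only the container's discipline prevents it.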

The "dynamic" variants are reinventing Rc or similar and the "frozen" variants aren't workable unless Rust drops the noalias attribute.

The deferred mutable API presented prevents mutably borrowing a subset of struct fields independently.

The suggested Deref/DerefMut sugar couldn't actually be based on those traits; it would be some new sort of compiler magic to follow a resource handle back to its source object. (Granted, most self-referential discussions I've read want this too.)

First, the borrowing system requires the borrowed object to outlive the borrow itself, to preserve the language’s memory-safety property. When one object points to another, the lifetime of the former must be strictly shorter than the latter. This immediately rules out cycles of borrows.

"Outlives" includes "same lifetime", i.e. it is not strictly shorter, and you can safely create cycles with references.


Do you mean now this moment, or now in general?

I'm not sure this is what's covered by the paper. It's really about the relationship between a path and a given instance. If I understand it properly, it's path-based polymorphism enforced statically. Have I misunderstood what you're saying?

I'm actually not well placed to discuss this in terms of deferred borrows (I've asked Chris, the paper's author, to weigh in). The specific use case that attracted me is something like:

impl Device {
    pub fn new(device_path: &PathBuf) -> Device {
        ...
    }

    pub fn create_operation(&self) -> OperationHandle {
        ...
    }

    pub fn use_operation(&self, operation: OperationHandle) {
        ...
    }
}

Expressing that use_operation requires a particular OperationHandle for a particular self is very tricky at the moment. It's possible with an API in which you use a generic parameter to statically determine each instance, but then you have the problem of pushing the correctness onto the user of the API. AFAICT, there's no good way to set that generic parameter in a way that doesn't require the safety to be enforced by the user. My understanding of the paper is that static path-dependent types can be used for exactly this pattern. (For background, I raised this pattern on /r/rust a short while ago.) The fact that one can just about wrangle it out with generics implies it is statically determined; we currently just lack the mechanism to express the right semantics.
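To illustrate the generic-parameter approach I mean, here's a sketch using an invariant-lifetime "brand" (all names made up). It does tie handles to devices statically, but only by forcing all use of the device into a closure, which is part of the ergonomic burden pushed onto the user:

```rust
use std::marker::PhantomData;

// fn(&'id ()) -> &'id () makes 'id invariant, so two brands never unify.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

struct Device<'id>(Brand<'id>);
struct OperationHandle<'id>(Brand<'id>);

// The higher-ranked closure means each call mints a fresh, unnameable 'id.
fn with_device<R>(f: impl for<'id> FnOnce(Device<'id>) -> R) -> R {
    f(Device(PhantomData))
}

impl<'id> Device<'id> {
    fn create_operation(&self) -> OperationHandle<'id> {
        OperationHandle(PhantomData)
    }
    fn use_operation(&self, _op: OperationHandle<'id>) {}
}

fn main() {
    with_device(|d| {
        let op = d.create_operation();
        d.use_operation(op); // ok: same brand
        // Mixing handles across two `with_device` scopes fails to compile,
        // because the two 'id brands are distinct and invariant.
    });
}
```

Everything that touches the device has to live inside the closure, and the handles can't escape it, which is exactly what makes this awkward for the Send-handle use case described below.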

What do you mean by "particular" here? If you need to enforce the invariant that two values are only ever used together, put them in the private fields of a struct.

Indeed, but sometimes that's not so easy, for example because you need a handle that is Send+Sync acting on a device that is not.

You're right, they're not completely the same thing as path-based polymorphism, but they do overlap with the paper. I'm really thinking about two kinds of self-references: heap-based or stack/field-based. The former is usually pretty similar to, or the same as, path-based types, depending on the pitch. The latter are not type constraints, but they are a form of deferred borrows. And both overlap in what they require the compiler to track (reference provenance that persists across API boundaries).

For example, the desire for self-referential structs is often presented as something like:

struct MyType {
    // Heap based
    data: Vec<Widget>,
    current: &'data Widget, // or sometimes `&'self` as well
    // And/or field/stack based
    other_field: DooDad,
    and_another: DooDad,
    some_doodad_field: &'self DooDad,
}

Where &'self or some other special lifetime or annotation indicates some new kind of reference/type/"lifetime" that is associated in some way with the rest of the containing data structure.

One common requirement with both kinds of self-references and the paper is that you need to track the provenance of the reference to ensure that it isn't invalidated. In particular this tracking and validity proving needs to cover the existence of the reference, and not just a scope of local analysis. I.e. it needs to cross boundaries such as function calls.

In the case of heap based self-references, the tracking is often based on the path (field name) that owns the referenced data. In this case, the special self-referential type is a path-based type. (Although another possibility is somehow tracking the heap allocation itself, which could conceivably move between fields.)

The stack/field self-references are different in that the path is dynamic (or what's the point), and not a static type constraint like in the paper. However, they are still related to the paper. In particular, one likely implementation is to store the self-reference as an offset relative to the struct, and utilizing the self-reference would be a form of deferred borrowing: the actual reference would be computed from the offset at the use site (which can differ per use site, e.g. when you call a by-value function and move the struct).
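As a safe stand-in for the offset idea, the deferred borrow can be sketched by storing which field is selected as plain data and computing the actual reference at each use site (the names here are mine, not from any proposal):

```rust
struct DooDad(u32);

// The "offset" stored as a field selector rather than a raw byte offset.
enum DoodadField {
    OtherField,
    AndAnother,
}

struct MyType {
    other_field: DooDad,
    and_another: DooDad,
    some_doodad_field: DoodadField, // which of our own fields is selected
}

impl MyType {
    // The actual borrow is computed at the use site, from wherever `self`
    // currently lives; moving `self` does not invalidate the selection.
    fn selected(&self) -> &DooDad {
        match self.some_doodad_field {
            DoodadField::OtherField => &self.other_field,
            DoodadField::AndAnother => &self.and_another,
        }
    }
}

fn main() {
    let x = MyType {
        other_field: DooDad(1),
        and_another: DooDad(2),
        some_doodad_field: DoodadField::AndAnother,
    };
    let y = x; // moving is fine: nothing self-referential is stored
    assert_eq!(y.selected().0, 2);
}
```

A real offset-based implementation would generalize this beyond an enum of known fields, which is where the compiler support discussed above would come in.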

Here's a recent thread where both heap and stack based self-references are mentioned.

I mean in general. I may be far too pessimistic, but if you look at the oldest of the 100 currently open RFCs, it is almost 5 years old. Besides, although GATs and specialization are long-awaited and officially accepted features, they are still not available in their complete and stabilized form. This is not a criticism: I completely understand why it has become very complicated to add such sophisticated features and why there might be strong resistance against piling up new ones.

What is currently proposed here is not a new type (like str or usize); it's not even a new type category (like union or tuples); it's an entirely new dimension in the type system (comparable to genericity w.r.t. types / lifetimes, or to dependent types). So although I subscribe to the legitimacy of the issue, I sadly feel this is far too late now.


The situation is not symmetric. An OperationHandle should only be used with the Device used to create it, but there might be an unlimited number of OperationHandles related to the same Device. The end user may need to manipulate, modify, or store them through individually distinct processes, but she should not be able to apply the method device.use_operation(...) to some unrelated OperationHandle. Trying to elaborate on your proposal: storing a given Device and the vector of related OperationHandles in the same struct does not seem to solve the problem, because any external code that needs to refer to a specific OperationHandle would then have no choice but to store the index from the vector (what the author calls pseudo-pointers).

Hi, I'm wrapping my head around your idea. There is something that mesmerizes me about it :slight_smile:
What happens with OperationHandles if device is moved?
You probably envisioned it'd be used only/mostly with types that are not Copy?

I'm not so sure about that, but would be interested in hearing more thoughts. In particular...

Let's imagine the API today is something like

// More likely a method but for illustration...
fn new_device(_: Handle) -> Device<Handle> { ... }

// Intended usage:
let my_uniquely_handled_device = new_device(||{});

And to remove the burden from the user without expanding the type system, you want some mechanism to ensure that new_device can only be called in this manner -- with a closure or something analogous, a type unique to this call site.

But if you had this, you're still not there because the type is unique per call site and not per dynamic invocation, so you can still create Devices with the same Handle type and end up intermixing. Example on the playground.
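Roughly, the failure looks like this (a sketch with made-up Device/Handle types, in the spirit of the playground example):

```rust
use std::marker::PhantomData;

struct Device<H>(PhantomData<H>);
struct Handle<H>(PhantomData<H>);

fn new_device<H>(_: H) -> Device<H> {
    Device(PhantomData)
}

impl<H> Device<H> {
    fn create(&self) -> Handle<H> { Handle(PhantomData) }
    fn use_handle(&self, _h: Handle<H>) {}
}

fn main() {
    let mut devices = Vec::new();
    for _ in 0..2 {
        // The closure type is unique to this *source location*,
        // so every iteration produces the exact same type.
        devices.push(new_device(|| {}));
    }
    let h = devices[0].create();
    devices[1].use_handle(h); // compiles: both devices share one Handle type
}
```

The last line is the problem: nothing in the static types distinguishes the two devices, so their handles intermix freely.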

So I would say the desired functionality is not statically determined.

The additional functionality is to have a different type per dynamic invocation and not per call site. The implications of that seem rather far-reaching to me: variables would have dynamic types (e.g. different each time through the loop) and functions could return dynamic types as well (each call to new_device). Or, alternatively, an entirely new mechanism that keeps the type system static.

Again, if I'm missing your point or you have further thoughts on this, I'm interested in hearing them.

Then the solution needs to be asymmetric as well: namely, the OperationHandles could store an Rc<Device> pointing to the device they should be used together with. (Add a pinch of interior mutability to taste if required.)
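A minimal sketch of that asymmetric shape (names are made up; you'd reach for Arc in the Send case, with the corresponding bounds on Device):

```rust
use std::rc::Rc;

struct Device {}

// The handle keeps the device it was created from alive and reachable,
// so it can only ever act on that device.
struct OperationHandle {
    device: Rc<Device>,
}

fn create_operation(device: &Rc<Device>) -> OperationHandle {
    OperationHandle { device: Rc::clone(device) }
}

impl OperationHandle {
    fn run(&self) {
        // Always operates on the creating device: there is no way to
        // pair this handle with any other Device.
        let _device: &Device = &self.device;
    }
}

fn main() {
    let device = Rc::new(Device {});
    let op = create_operation(&device);
    op.run();
    assert_eq!(Rc::strong_count(&device), 2); // device + one handle
}
```

The cost is a pointer per handle and shared ownership, rather than a zero-cost static guarantee; whether that's acceptable depends on the device's threading constraints mentioned above.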

Whoa! What's going on there? Is this an effect of using anonymous types?

Based on your playground example, here is the closest thing I can come up with to what might be desired using self-consuming factories, which I don't think you can break in the same way your example did. There is a single runtime check, which is to enforce that the base factory is only used once, but beyond that it's all statically determined (how that impacts the theoretical assertion about static determinability I can't answer).
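For readers who don't follow the playground link, one shape the self-consuming-factory idea can take is roughly this (all names hypothetical; a real version still needs the runtime check that the base factory is created only once):

```rust
use std::marker::PhantomData;

struct Factory<T>(PhantomData<T>);
struct Device<T>(PhantomData<T>);
struct Handle<T>(PhantomData<T>);

impl<T> Factory<T> {
    // Consuming `self` means each factory value is usable at most once;
    // the returned factory has a *different* type parameter, so every
    // device minted along the chain gets a distinct brand `T`.
    fn new_device(self) -> (Device<T>, Factory<(T,)>) {
        (Device(PhantomData), Factory(PhantomData))
    }
}

impl<T> Device<T> {
    fn create(&self) -> Handle<T> { Handle(PhantomData) }
    fn use_handle(&self, _h: Handle<T>) {}
}

fn main() {
    let f0: Factory<()> = Factory(PhantomData);
    let (d1, f1) = f0.new_device(); // d1: Device<()>
    let (d2, _f2) = f1.new_device(); // d2: Device<((),)>
    let h1 = d1.create();
    d1.use_handle(h1); // ok: same brand
    // d2.use_handle(d1.create()); // would not compile: different brands
    let _ = d2;
}
```

Because the brand grows with each call, the loop construction from the earlier example is impossible: the factory's type changes on every use, so it can't be fed back through the same binding.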

Is something remotely related to (OCa)ML generative (as opposed to applicative) functors desired?

Each time such a functor is applied to generate a new module, it can generate new types. This is a module-level feature, of course. I don't know OCaml well, but I think there is some support for first-class modules, which could potentially bring this into the core language.

I'm still wondering however

E.g., what semantics is desired in these cases?

The use case for me is a non-cloneable device. More concretely, it's a structure representing a DMA device that wraps a piece of physical hardware.

What about moves? Can you move after you've created a handle? What happens to the handle?

Assuming the handle can be brought into scope of the device, it should work. It can only be used with the creating device, but it can be moved around as much as wanted. (Actually, it can be a bit more subtle than this: the handle actually references internal state of the device, and dropping the handle allows the device to later clear that state through shared atomics.) Again, in my use case, the device consumes the handle when it's used and returns it again on success, though I don't know that this is a requirement.

I would suggest the whole point is that the device and the handle can be moved independently of one another. The original motivation for the architecture was that I wanted the handle to be Send, without imposing that on the device.

Using closures is just a shortcut for the local type-generating macro used in the paper (modulo nameability). Every closure has a unique type, but it's unique to its source-code location, not to each time the code runs. This has to compile down to a single concrete static type, for example:

fn foo() -> impl Fn() {
    || {}
}

It's not because the types are anonymous, it's because they're static.

Ok, it all makes more sense now. The problem being that without the compiler actually running the code, the best it can do is assign a per-call-site type. What you're highlighting is the broader conclusion I'd reached on this issue, which is that you can't express a mutating type in any generally useful way.

The solution I proposed works around this by consuming self, which immediately prevents the loop case you highlight. The broader question, then, is whether there's a way to make this more ergonomic without breaking the compiler model.

Compiler for today's Rust - yes.

It should be possible to design a different language, or extend Rust with more features, so that the compiler would be able to do better. OCaml can apply functors generatively at the module level.

I understand that at the PL-theory level, generative functors are actually simpler to reason about than applicative ones.

Is there any serious proposal to do this?