Pre-pre-RFC/Working Prototype: Borrow-Aware Automated Context Passing

Right. I was just being casual with my terminology. A better way to phrase the specific thing I was trying to get at was, if I see the function call:

compute_something(param1, param2);

It's not immediately obvious whether it accesses my voxel subsystem. Likewise, I don't know whether:

do_something_with_voxels();

accesses just the voxel data, or also does something with the renderer.

The explicit use Voxels marker in the realms proposal fixes that.

Not exactly, because this says what a function could potentially access, not what it actually accesses. That means that, within a self-contained subsystem, you can make all functions share the same realm and essentially write code as you would in the original proposal without realms. Additionally, if you find that you actually need access to a peer subsystem in a given realm, you can just update the realm's definition to include that other realm, like so:

// Original code:
realm Voxel;

cap Foo in Voxel = SomeType;

fn do_something() use Voxel {
    Foo.do_something_else();
}

// Code after refactor
realm Voxel: SomeOtherRealm;

cap Foo in Voxel = SomeType;

fn do_something() use Voxel {
    Foo.do_something_else();
    do_something_in_the_sub_realm(use SomeOtherRealm);
}
1 Like

As an aside, the thing I deeply care about is that I can see which realm is used when looking at the definition of foo or bar - I would accept realms as OK if this was:

fn foo() use VoxelRenderer {
    foo();
}

fn bar() use VoxelRenderer {
    baz();
    maz();
    faz();
}

In large part, this is because if I need to debug what baz, maz and faz do, I'll be looking at their documentation, which includes the signature.

1 Like

This feels like a matter of personal preference. I personally kinda like the indication, but I can imagine why some people would not care for it. Maybe it could be an optional assertion for users who want it, and users like me could enable a clippy restriction lint to force themselves to use it? I'm not sure whether to include it in any initial RFC, though.

Isn't that then functionally equivalent to the following (which already exists):

trait SomeOtherRealm {
    fn something_unrelated(&mut self) -> &mut SomethingUnrelated;
}
trait Voxel: SomeOtherRealm {
    fn foo(&mut self) -> &mut Foo;
}

fn do_something_in_the_sub_realm(v: impl SomeOtherRealm) {}

fn do_something(mut v: impl Voxel) {
    v.foo().do_something_else();
    do_something_in_the_sub_realm(v)
}

With the main difference being the different syntax and that you use v.foo() instead of Foo.

Or am I missing something here?

The individual components of the context are not borrow-checked separately. You can't borrow foo in one function and something_unrelated in another simultaneously.
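
To make that concrete: with the trait encoding above, both accessors borrow the whole context object, so code like the following is rejected today, even though the two components are disjoint (a sketch; use_both is a hypothetical caller):

fn use_both(v: &mut impl Voxel) {
    let foo = v.foo();                    // mutably borrows all of `v`
    let other = v.something_unrelated();  // ERROR[E0499]: `*v` is already
                                          // mutably borrowed via `foo`
    let _ = (foo, other);                 // both borrows must live here
}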

1 Like

I'm happy either way - I'm just trying to make sure that you're not putting things in to satisfy me when they're not things I want.

My sole desire is that when I look at a function's signature, I can see everything that it takes as parameters, bar the pre-existing (and unfortunately sometimes needed) wart of static globals. As long as a function's signature tells me how to find exactly what it takes in for its context, I'm OK with calling functions not needing to tell me what the context in use will be.

2 Likes

I see the discussion has moved on quite a bit in my absence.

I am mostly indifferent to syntax for this feature, but on semantics I want to explicitly disagree with @farnz. We do not want the same thing.

I think baz() should not appear in the using clause. I think there should be a distinction in the function signature between four cases:

  1. a function that doesn't make any use of context
  2. a function that uses context elements itself
  3. a function that forwards context (as an undifferentiated blob) from its callers to its callees
  4. a function that does both of these things

but I think functions of types 2 and 3 should not specify in their signatures which elements of context their callees use. If we make people do that, they will hate the feature and avoid using it, for the same reason they hated and avoided exception specifications in C++ (which is also one of the reasons people dislike long lists of function parameters now): people don't want to have to change a function just because baz changed. It's tedious and error-prone and might break compatibility at crate boundaries.

Type signatures that refer to the context only by naming specific items or as a whole don't have this problem.

// case 1: no use of context
fn type_0_fn(params: ParamsType) -> ReturnType

// case 2: uses context elements itself
fn type_1_fn(params: ParamsType) -> ReturnType
    using cx1: Cx1Type

// case 3: forwards context as an undifferentiated blob
fn type_2_fn(params: ParamsType) -> ReturnType
    using ...

// case 4: both
fn type_3_fn(params: ParamsType) -> ReturnType
    using
        cx1: Cx1Type,
        ...

Writing it this way also makes it crystal clear that type 2 and 3 functions have no access to whatever's wrapped up in the ... (except if surfaced to them via callbacks, as discussed previously). With the "named subsets of context" alternative that's not clear at all.


There is a snag, of course.

If Rust presents the feature the way I want, then that places strong constraints on how it is implemented: specifically, if foo calls bar calls baz calls quux and one day quux changes so that now it requires two context elements instead of one, that's unavoidably a breaking change for foo (someone's gotta provide that new context element!) but it needs to somehow not be a breaking change for bar and baz. This may be very hard, particularly if the call chain crosses crate boundaries, but if we can't do it I think we're better off with no "ambient context" feature at all.
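
To illustrate the constraint in the notation from above (DbHandle and Metrics are made-up context elements):

// Before: quux needs one context element; bar and baz only forward.
fn quux() using db: DbHandle
fn baz() using ...   // calls quux
fn bar() using ...   // calls baz
fn foo()             // constructs db, calls bar

// After quux grows a second requirement, only foo should break:
fn quux() using db: DbHandle, metrics: Metrics
fn baz() using ...   // must somehow remain unchanged
fn bar() using ...   // must somehow remain unchanged
fn foo()             // breaking: must now also provide metrics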

As far as I can tell, consensus seems to be leaning more towards my "realm" design. These comments should get you up to speed on its current design.

It's a breaking change for all functions in the chain, but internal functions can absorb the breaking change implicitly. To ensure that you don't accidentally break things for the public API, we could adopt my suggestion in Pre-pre-RFC/Working Prototype: Borrow-Aware Automated Context Passing - #24 by Radbuglet.


Unless there are any more critical issues to fix with the proposal, I'm going to try and draft an RFC and share it here to make it a bit easier for people to see the current vision for the feature without scrolling through the entire discussion.

1 Like

Just to be clear, I don't want this feature at all. I'm trying to come up with semantics for all the variations on it that I could live with, because if someone else is going to make it happen, I want them to do a good job - I'd rather a high-quality feature I dislike than a low-quality version that makes my life harder.

The using baz() suggestion is because I see people wanting a way for bar to not care about what the ambient context is, but to insist that the ambient context is suitable for calling baz. I want this to be explicit in bar's signature somehow - I don't care whether that's done by saying that bar needs to include baz's ambient context requirements in its own ambient context requirements, or by providing a way for bar to say "I need whatever baz needs".
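
Concretely, either of these hypothetical spellings would satisfy me (the syntax and BazCx are illustrative, not a proposal):

// Option 1: bar repeats baz's requirements in its own signature.
fn baz() using cx: BazCx { /* ... */ }
fn bar() using cx: BazCx { baz() }

// Option 2: bar names baz itself: "I need whatever baz needs".
fn bar() using baz { baz() }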

The driving force behind all of this is that I find myself dealing a lot with people who message me on Slack at work saying "hey, I'm calling foo, but it doesn't do what I expect"; the fastest way to help people unstick themselves is to ask for the full call to foo and look up foo in the codebase in parallel. If I can see from foo's signature that the context also matters, I can ask for that; but if I don't realise that it matters until I find that foo calls bar, which calls baz, which calls quux, which calls frobnicate, and frobnicate's behaviour depends on ambient context, I'm going to spend a lot of time looking before I even know to ask what the ambient context is when you call foo.

On the other hand, if foo's signature tells me what the ambient context looks like at the point you call foo, I know to ask for it, making it the asker's problem to go up through the call stack and find out what the context looks like here. And that's really the crux of what I need; I need to be able to see, just looking at foo's signature, what I need to know to answer the question, so that I can send the asker looking for it; much of the time, the moment I ask them to tell me "what's the value of arg1, arg2 and arg3 in this call", that's enough to get them to reply a few minutes later telling me that they now understand the problem, and I'd expect similar if I was saying "what's the value of arg1, arg2 and arg3, and what's in the context named in the using clause?".

1 Like

I think you got this backwards. If baz changes and requires more implicit arguments, then the function's actual signature will change anyway. Having to also change the function's written signature will make this more obvious to the library writer and will help them avoid unconsciously breaking compatibility.

How can it not be a breaking change for bar and baz? By the same logic that makes it a breaking change for foo, it will also be a breaking change for any other function that calls quux!

2 Likes

As far as I can understand, your AuToken is using thread_local. If so, am I right about the following?

  1. It can't be used in no_std environments.
  2. You can't pass &T where T: Sync between threads using your approach
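
A minimal sketch of why a thread_local-based context can't follow values across threads (CONTEXT is an illustrative stand-in for however the prototype stores context):

use std::cell::RefCell;

thread_local! {
    // Illustrative stand-in for a thread_local-based context slot.
    static CONTEXT: RefCell<Option<u32>> = RefCell::new(None);
}

fn main() {
    CONTEXT.with(|c| *c.borrow_mut() = Some(42));

    std::thread::spawn(|| {
        // The spawned thread gets a fresh thread_local; the context
        // set on the main thread is not visible here.
        CONTEXT.with(|c| assert_eq!(*c.borrow(), None));
    })
    .join()
    .unwrap();
}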
1 Like

Correct. This is a limitation of my implementation, caused by the fact that rustc_driver-based tools can't easily modify codegen. There is no reason this limitation couldn't be lifted in a proper in-compiler implementation.

3 Likes

Then we would benefit from that!

By the way, web developers already have something like this: Contexts, so we can also learn something from them.

1 Like

I remember I made a similar proposal based on an early version of the contexts-and-capabilities post by @tmandry, and we basically arrived here as well.

The proposed solution was: if an item is pub, then it must declare which types of context it requires to be called, whether for direct use or for propagation; otherwise, propagation is implicit.
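
A sketch of that rule in purely illustrative syntax (the using clause, Logger, and log_it are all made up):

// pub items must spell out their context requirements, even if they
// only propagate them to callees:
pub fn handle_request() using log: Logger {
    helper();
}

// non-pub items may propagate implicitly; the compiler infers that
// `helper` needs `Logger` because it calls `log_it`:
fn helper() {
    log_it();
}

fn log_it() using log: Logger { /* ... */ }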

This makes a library's API foolproof with respect to context usage, lets pub(crate) items error out if a developer changes the internal details too much and some implicit bounds leak, and just does the job for the rest of the items.

Motivating examples: the log crate, which could then basically be turned into functions requiring context; configuration loading in fn main(); and so on.

The problems, as far as I remember, were threading and mutability of contexts, both of which posed significant implementation challenges; have those been answered?

2 Likes

I'd be interested in hearing about these challenges! AFAICT, my prototype extended with the restrictions posed in Pre-pre-RFC/Working Prototype: Borrow-Aware Automated Context Passing - #2 by Radbuglet is sound.

1 Like

That works for open-source libraries. I don't particularly care how e.g. serde or tracing are implemented internally, as long as they work. And I expect their authors do know; it is usually just one person or a handful of people. Besides, the documentation is often pretty good.

Commercial code is quite different, unfortunately. Some previous employee you never met, who left two years before you joined, wrote some code that you now have to try to find a bug in. Anything implicit is bad, and reasoning should be local.

Then there is game dev, academic code and possibly some other niches that are less concerned with long term maintainability. Get the product out the door and do the DLCs. Or publish that conference paper. Then nobody cares any more, and it is on to the next project.

There are thus conflicting goals here. This suggests to me that this should be configurable, as a lint you can deny to say that, no, you can't use that feature in this code base.

Can we approach that problem from a different angle?

What if it could somehow be expressed via generics? With something like frunk + macros you can already have something like this:

#[requires(ctx(Bar))]
fn foo(ctx) {
    let bar: Bar = ctx.resolve();
}

Which will resolve to an ugly bound including Bar, approximately like this:

fn foo<S, C, I>(ctx: S)
where
    for<'a> S: Resolver<'a, C, Bar, I>,
{
    let bar: Bar = ctx.resolve();
}

Then, maybe make it easier to use or write? For example, add a mechanism that doesn't require trait bounds on parents if their children have them. That way only the child would declare that it wants Bar, and parents would not mention it in their signatures:

// No mention of `Bar`, while `foo` requires it
#[use_some_new_feature(ctx)]
fn baz(ctx) {
    foo(ctx)
}

It seems to solve our main problem of having to change all functions from the bottom to the top, as Bar will only be mentioned at the bottom (the Resolver where-bound) and at the top (where Bar is initially injected).

I'm not sure about its relationship with borrow checking, which the author proposes. In my experience, stuff passed like this is moved, copied, or immutably borrowed. Mutable borrowing and management is quite a specific thing and, in my opinion, might be better written explicitly.

In some cases a simple struct Ctx might be defined and passed down, so that only it needs to change, but that's not at all flexible, as you might want to add/remove stuff to/from the context as it travels down.
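
For reference, that conventional pattern looks something like this (field names are illustrative):

// One context struct threaded through every layer. Adding or removing
// a field means touching this definition and every construction site,
// and callees cannot narrow it to just the parts they need.
struct Ctx<'a> {
    values: &'a mut Vec<u32>,
    flag: &'a mut bool,
}

fn outer(ctx: &mut Ctx) {
    inner(ctx);
}

fn inner(ctx: &mut Ctx) {
    if ctx.values.contains(&42) {
        *ctx.flag = true;
    }
}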

There are two types of borrow information a user could care about: whether a capability could be borrowed and whether a capability is actually borrowed. Here's an example which should illustrate the difference:

cap! {
    SubSystem1Values = Vec<u32>;
    SubSystem1Flag = bool;

    SubSystem2Values = Vec<i32>;
    SubSystem2Flag = bool;
}

mod subsystem_1 {
    use super::*;

    pub fn foo() {
        for &v in cap!(ref SubSystem1Values) {
            bar(v);
        }
    }

    pub fn bar(v: u32) {
        if v == 42 {
            *cap!(mut SubSystem1Flag) = true;
        }
    }
}


mod subsystem_2 {
    use super::*;

    pub fn foo() {
        for &v in cap!(ref SubSystem2Values) {
            bar(v);
        }
    }

    pub fn bar(v: i32) {
        if v == -42 {
            *cap!(mut SubSystem2Flag) = true;
        }
    }
}

fn user() {
    let old_value = *cap!(ref SubSystem2Flag);
    subsystem_1::bar(42);
    assert_eq!(old_value, *cap!(ref SubSystem2Flag));

    let old_value = *cap!(ref SubSystem1Flag);
    subsystem_2::bar(-42);
    assert_eq!(old_value, *cap!(ref SubSystem1Flag));
}

It is important that subsystem_x::foo() know that subsystem_x::bar() does not actually borrow SubSystemXValues mutably, since that is what allows foo to iterate over the values while calling subsystem_x::bar(), even though bar technically could borrow them. However, the user function does not really care which specific subsystem-internal state subsystem_x::bar() actually touches, so long as it couldn't touch other subsystems' internal state.

The "actually borrows" information is only useful for getting a program to borrow-check, so I think it's fine (and, indeed, quite valuable) for that information to be left implicit. The "could borrow" information, meanwhile, is very valuable to code reviewers, since it helps make semantic sense of the program, but it means little to the borrow checker, which is why it has no effect on borrow checking.

This is why I think my realms proposal can effectively tame the implicitness of this proposal. I go into a lot more detail in my still-WIP RFC, but the gist is that every capability (which I call a "contextual parameter" in my RFC, to avoid the whole "capability doesn't mean that" bikeshed) belongs to a realm representing a subsystem (or a sub-component of a subsystem), and functions that wish to access that context must declare that they operate in a realm which has access to it:

realm Subsystem1;

realm Subsystem2;

realm App: Subsystem1, Subsystem2;

ctx SYS_1_VALUES in Subsystem1 = Vec<u32>;
ctx SYS_1_FLAG in Subsystem1 = bool;

ctx SYS_2_VALUES in Subsystem2 = Vec<i32>;
ctx SYS_2_FLAG in Subsystem2 = bool;

fn foo() use Subsystem1 {
    for &v in SYS_1_VALUES.iter() {
        bar(v);
    }
}

fn bar(v: u32) use Subsystem1 {
    if v == 42 {
        *SYS_1_FLAG = true;
    }
}

fn baz() use Subsystem2 {
    for &v in SYS_2_VALUES.iter() {
        quux(v);
    }
}

fn quux(v: i32) use Subsystem2 {
    if v == -42 {
        *SYS_2_FLAG = true;
    }
}

fn user() use App {
    let old_value = *SYS_1_FLAG;
    quux(-42);
    assert_eq!(old_value, *SYS_1_FLAG);

    let old_value = *SYS_2_FLAG;
    bar(42);
    assert_eq!(old_value, *SYS_2_FLAG);
}

Realms are designed to be watertight: if a function requests a limited realm, it cannot suddenly gain access to a larger realm. This gives the code-reviewer of this program a very useful guarantee: baz and quux cannot modify state affecting foo and bar and vice versa.
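
For example (hypothetical, reusing the declarations above; sneaky is a made-up function):

fn sneaky() use Subsystem1 {
    *SYS_2_FLAG = true; // ERROR: SYS_2_FLAG lives in Subsystem2, which
                        // Subsystem1 does not include
}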


Clarification: this is true of private APIs only! Public APIs still need to be aware of both potential and actual borrow sets, which is why I'm including the #[borrows_only] attribute proposal in the RFC.
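
As a rough sketch (the exact attribute syntax is still TBD in the WIP RFC):

// Hypothetical: bar's realm says what it *could* access; the attribute
// narrows its *actual* borrow set so that callers across the API
// boundary can rely on it for borrow-checking.
#[borrows_only(mut SYS_1_FLAG)]
pub fn bar(v: u32) use Subsystem1 {
    if v == 42 {
        *SYS_1_FLAG = true;
    }
}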

2 Likes

On the contrary, I doubt stuff like contexts would be allowed to be introduced lightly in commercial development. Also, such a lint is, I believe, better suited to third-party tools.

That's an overstatement: we have implicit global allocation via Box, statics inside modern executors, etc. Reasoning about these is generally omitted, and they are assumed to just work.

Context passing is in fact more explicit than those, because you can (not sure whether this has been written down anywhere, though) modify the context: insert a tracing implementation for the DB context, add logging, etc.; or clear it, to forbid allocations in a hot loop, for example.

From the very beginning, this was about reducing the need for global state to convey context, replacing it with a granular mechanism.

1 Like

Depends on the domain. That is probably true in web or games dev and other non-critical applications. It is definitely not the case in my domain (human-safety-critical hard-real-time machine control). Embedded (which is adjacent to what I work with) also cares deeply about allocations. (They do tend to like global state over there, though.)

That said, there is still some global implicit state. Logging in particular tends to get a pass (but only after carefully reviewing that the design of the logging framework complies with the needs of the application of course).

And that I'm not against. I do think the realm proposal is reasonable, as long as you can see that something is going on at every level. I don't need to know the exact details of what is being passed through a function when looking at it.

But I do want to be made aware of "oh, this takes and passes through some sort of context related to $SUBSYSTEM". Then, if that is relevant to what I'm debugging or reviewing, I can go look at those details.

I want to have a list of everything that could be relevant. It is the totally silent bits that are a problem.

For my day job I could see such a subsystem for event sending being useful: we use an actor + pub-sub bus architecture, which fits our domain very well. As such we don't have a lot of global state; most things are done via messages. But having some deeply nested function send an event can be annoying (threading the event sender through all the layers).
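
Sketching what that might look like under the realm proposal (EventSender, Event, and the function are all illustrative):

realm Messaging;

ctx EVENT_BUS in Messaging = EventSender;

// A deeply nested step can publish without an event-sender parameter
// being threaded through every intermediate layer:
fn deeply_nested_step(reading: u32) use Messaging {
    if reading > 42 {
        EVENT_BUS.send(Event::ThresholdExceeded);
    }
}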

3 Likes