Explicit Captures for Closures and Code Blocks

In his second video on a games programming language, Jonathan Blow discusses the concept of C++ captures in its lambda syntax, and the potential generalization of the capture syntax to any block. I think that this feature could be useful to implement in Rust for the following reasons:

  1. Additional safety with potentially low compile-time overhead: if the programmer explicitly states the namespace of the closure, the compiler has an easier job of labeling everything in that namespace as moved/borrowed/etc. depending on the information the user provided.
  2. Additional information for the compiler: the programmer can use the syntax to explicitly state that the code block doesn’t modify global state, i.e. it's a pure function, without requiring the compiler to do any analysis besides namespace checking.
  3. Faster testing and code factoring: Jon states in his video that using this syntax could make it easier to prototype the movement of a code block out of a larger function and into a smaller utility function. This is even more true of Rust; we already have block expressions, so adding a syntax to limit the namespace of a block would just make it easier to move blocks into their own function.
  4. Readability without comments: the syntax would give future readers additional information about the function and its side effects, in a way that can be validated by the compiler, i.e. we can have a compile-time guarantee that certain functions do not and cannot change global state.

The feature could use a syntax similar to the one discussed in the video:

pub fn long_monolithic_function(state: GlobalState) -> GlobalState {
    // Here we're using a capture in square brackets
    // to say that we only want to operate on 2 members of the input struct,
    // with mutation on one, and rename them for easier processing
    let mut result: u64 = [&state.data as data, &mut state.other as other] {
        other += 1;

        // We use the empty brackets to state that global/non-local state
        // isn't used here, and it's a pure function
        data.iter().map(|item| [] {
           // Complicated stuff here
           // Lots of logic
        })
        .sum()
    }
    // do work with result here
    // ...
    // ...
    state
}

pub fn my_function(foo: String) -> bool [&bar] {
    // do work on foo, while reading the state of bar
    // ...
    bar.validate(foo)
}

There are a few potential downsides to this idea:

  1. Additional complexity of the language: the language by definition becomes more complex.
  2. Worse compilation times in the general case: it's hard to say, but at the very least, if nobody uses the feature, the compiler will just be objectively slower.
  3. Questionable necessity: the existing syntax and borrow-checking system might be sufficient to give the guarantees that this proposal aims to provide. For the person writing the code this feature is absolutely unnecessary; the borrow-checker already does the checks to see if your closure/block does things that aren’t memory safe. Something like this would be objectively useful in a language with less static analysis, but here its use case may already be covered.
  4. Unclear syntax: it’s unclear how the syntax should actually look. The above version of the syntax is not binding by any means.
  5. Unclear implementation: this idea might require lots of changes to the way that namespaces are currently handled internally.

As an interesting note: you can actually sort of fake explicit captures with the current implicit captures:

pub fn long_monolithic_function(state: GlobalState) -> GlobalState {
    let mut result: u64 = { let data = &state.data; let other = &mut state.other; move || {
        *other += 1; // `other` is a `&mut`, so it has to be dereferenced
        data.iter().map(move |item| {
           // Complicated stuff here
           // Lots of logic
        })
        .sum()
    }}();
    // do work with result here
    // ...
    // ...
    state
}

The move closure captures every outer variable it uses by move, so if you capture something accidentally you’ll probably notice; for the by-ref captures you do want, you can bind the reference explicitly beforehand.
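A compilable version of that trick might look like the sketch below. The fields of `GlobalState` (`data`, `other`, `unrelated`) and the doubling inside `map` are assumptions made up for illustration, since the thread never defines the struct or the "complicated stuff"; the function also returns the computed value so it can be inspected.

```rust
// Hypothetical stand-in for the `GlobalState` struct used in the thread.
struct GlobalState {
    data: Vec<u64>,
    other: u64,
    unrelated: String,
}

fn long_monolithic_function(mut state: GlobalState) -> (GlobalState, u64) {
    let result: u64 = {
        // The "capture list": everything the closure may touch is bound here.
        let data = &state.data;
        let other = &mut state.other;
        // `move` takes the two references by value. Mentioning
        // `state.unrelated` inside the closure would move it out of `state`,
        // and returning `state` below would then fail to compile.
        (move || {
            *other += 1;
            // Placeholder for the "complicated stuff": double and sum.
            data.iter().map(|item| item * 2).sum::<u64>()
        })()
    };
    (state, result)
}

fn main() {
    let state = GlobalState {
        data: vec![1, 2, 3],
        other: 0,
        unrelated: String::from("untouched"),
    };
    let (state, result) = long_monolithic_function(state);
    println!(
        "result = {}, other = {}, unrelated = {}",
        result, state.other, state.unrelated
    );
}
```

The inner block plays the role of the capture list: its `let` bindings are the only names the closure is meant to use, and the `move` keyword turns most accidental extra captures into a compile error later in the function.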

Explicit closure captures matter much more in a language with permissive mutability and/or pervasive reference counting than they do in Rust, because accidental captures are much more problematic there (e.g. if you don’t [weak self] in iOS callbacks you’ll definitely have memory leaks eventually). Because Rust guarantees that if you have &_/&mut _ it’s safe to use the reference, and all non-static garbage collection is explicit, it’s much harder to accidentally shoot yourself in the foot with implicit captures.

Of course, if you want full explicit separation from parent state, you can always just use a full named fn and parameters.
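For comparison, here is a minimal sketch of the named-fn route. The helper name `bump_and_sum` and the `GlobalState` fields are hypothetical; the point is only that the parameter list already acts as an explicit, compiler-checked capture list.

```rust
// Hypothetical stand-in for the `GlobalState` struct used in the thread.
struct GlobalState {
    data: Vec<u64>,
    other: u64,
}

// The signature states exactly what the code may read and what it may mutate.
// Nothing else from the caller's scope is reachable in here.
fn bump_and_sum(data: &[u64], other: &mut u64) -> u64 {
    *other += 1;
    data.iter().sum()
}

fn long_monolithic_function(mut state: GlobalState) -> (GlobalState, u64) {
    // Borrowing two different fields at once is fine at this level.
    let result = bump_and_sum(&state.data, &mut state.other);
    (state, result)
}

fn main() {
    let state = GlobalState { data: vec![1, 2, 3], other: 0 };
    let (state, result) = long_monolithic_function(state);
    println!("result = {}, other = {}", result, state.other);
}
```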


Captures in Rust are a bit unclear and sometimes it takes some fudging with move and Arc, but I’m not sure whether just an explicit list of captures would be enough to solve this.

There were proposals to make borrowing of fields in closures more clever (so that foo.bar borrows bar, not foo). So maybe just going deeper into providing magic will be enough?


You're right, this syntax has questionable necessity because of the existing functionality that the borrow checker provides.

Do you think that the syntactic annotation is useful for readability? Or do most rustaceans with enough experience with the borrow-checker end up being fine without explicitly denoted captures around code blocks/closures?

Do you think the loss in compile-time speed is worth it? I’m not entirely sure about the compile-time costs of the borrow checker, but it seems like the added complexity of a cleverer borrow checker would reduce compile speed far more than this proposal. Also, do you think that adding some of those “fudging” tools as part of the potential explicit syntax would work just as well then? i.e.

let result = [
    Arc::new(&state.data) as data, 
    Arc::new(&mut state.other) as other] {
    // ... work here
}

One could imagine introducing a lint where creating a closure and casting it to a fn pointer would require an explicit no-capture list.
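For context on that lint idea: today a closure coerces to a fn pointer only when it captures nothing, so the coercion already works as an implicit "no captures" check. A small sketch (the names `apply`, `double`, and `add_offset` are made up for illustration):

```rust
// Takes a plain function pointer, not a generic `Fn` parameter.
fn apply(f: fn(u64) -> u64, x: u64) -> u64 {
    f(x)
}

fn main() {
    // A closure that captures nothing coerces to a fn pointer.
    let double: fn(u64) -> u64 = |x| x * 2;
    println!("{}", apply(double, 21));

    let offset = 10;
    // The next line is rejected by the compiler, because the closure
    // captures `offset` and therefore cannot become a fn pointer:
    // let add_offset: fn(u64) -> u64 = |x| x + offset;

    // Kept as a closure, it is fine.
    let add_offset = move |x: u64| x + offset;
    println!("{}", add_offset(1));
}
```

The proposed lint would make the same fact visible at the closure itself instead of only at the coercion site.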


AFAIK there won’t be a difference in compile time, because even if there is an explicit list, the compiler still has to check if it’s valid and whether all other undeclared references are valid. Besides, most time is spent in code generation and optimization, which won’t change.


It's being worked on. See the tracking issue for RFC 2229, "Closures Capture Disjoint Fields" (rust-lang/rust issue #53488), and https://rust-lang.zulipchat.com/#narrow/stream/189812-t-compiler.2Fwg-rfc-2229.


Rust and Jon Blow's Jai couldn't be any farther apart, philosophically speaking. I'd be wary of trying to fit features from this language into Rust. Thinking of that, here are a couple of points that I think just don't apply to Rust:

How does this result in "more safety"? In Rust, safety has a specific meaning and it doesn't look like closures need any more human interaction in order to be sound; type checking (including borrow checking etc.) already takes care of closures. Last time I watched a video on Jai, it was a memory unsafe language with unrestricted raw pointers and without lifetimes, a la C. If this is still the case, I can imagine how this helps with safety there, but then it doesn't really benefit Rust.

That seems orthogonal to captures: AFAIK globals are not captured by closures because they don't need to be. And so closures which do not capture at all can still change global state or perform side effects by calling other functions. I think what you are looking for here is const instead of nocapture annotations.
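To make the point about globals concrete, here is a small sketch. The static `CALLS` and the functions `record_call`, `run_twice`, and `square` are hypothetical names. The closure below captures nothing (it even coerces to a fn pointer), yet it still changes global state; a `const fn`, by contrast, is not allowed to call a non-const function such as `record_call`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A global: reachable from any function without being captured.
static CALLS: AtomicU64 = AtomicU64::new(0);

// An ordinary function with a side effect on the global.
fn record_call() -> u64 {
    CALLS.fetch_add(1, Ordering::SeqCst) + 1
}

// Accepts only fn pointers, so `f` cannot carry any captured environment.
fn run_twice(f: fn() -> u64) -> u64 {
    f();
    f()
}

// A `const fn` cannot call `record_call`, which makes it the closest
// existing tool to the "no side effects" annotation discussed above.
const fn square(x: u64) -> u64 {
    x * x
}

fn main() {
    // An empty capture list would not stop this closure from mutating CALLS.
    let counted: fn() -> u64 = || record_call();
    run_twice(counted);
    println!("calls so far: {}", CALLS.load(Ordering::SeqCst));

    // Evaluated at compile time.
    const NINE: u64 = square(3);
    println!("square(3) = {}", NINE);
}
```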

I reckon it couldn't (but at the very least, it shouldn't); that […values, …] syntax in expression position looks like an array literal of a bunch of values resulting from as casts. It would thus need unbounded lookahead to parse (which is in itself highly undesirable), but what's worse, this ambiguity of the prefix would probably lead to misleading parser error messages or even code that doesn't do what you want it to do, e.g. when you genuinely meant an array literal followed by a block but you forgot a semicolon.
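The overlap is easy to see in code that is already valid today. In the sketch below (the function `widen` is a made-up example), the first statement starts with exactly the token shape the proposal uses for a capture list, and a block follows it; only the semicolon separates "array literal, then block" from what the proposal would read as a captured block.

```rust
fn widen(a: u32, b: u32) -> [u64; 2] {
    // Valid Rust today: an array literal built from two `as` casts.
    let pair = [a as u64, b as u64];
    {
        // An ordinary block statement that happens to follow the array.
        println!("widened {} and {}", a, b);
    }
    pair
}

fn main() {
    let pair = widen(1, 2);
    println!("{:?}", pair);
}
```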


You're right. Jon Blow doesn't seem to value the things that Rust explicitly has stated as its mission statement, and often takes his criticisms of Rust too far.

I used the wrong word there, because you're right; the borrow-checker already handles all of the use-after-free stuff (and general memory safety problems) for you. I guess what I was really thinking about was something more along the lines of "ease of refactoring", and since in other languages the largest concern in that area is memory safety, I conflated the two. I think that reliance on the implicit behavior of the borrow-checker makes it harder to reason about the way that memory is being used in future maintenance; I should've been more clear about that. In terms of safety, explicit captures do nothing semantically that the borrow-checker doesn't already.

In terms of compile-time speed, do you think that there'd be a difference in the compile time if closure namespaces were limited? I'm not familiar with the compiler internals, but my original reasoning was that with a smaller search space, the compiler would have to do less work to reason about the safety of a closure.

I'm not experienced enough with the Rust programming language to really say anything here. Where could I go to learn more?

Do you think then that closure capture annotations should be more restrictive than the original idea? Maybe we semantically define a nocapture to be a pure function, and the compiler is instructed to disallow function calls that don't also explicitly state that they are also nocapture (i.e. functions that are also pure functions)? That seems potentially useful, but I'm unsure how difficult it would be to implement that, or whether or not it would be legitimately useful in practice. What do you think?

That makes sense, I hadn't thought about that. What kind of syntax would be useful there then? Curly brackets and parentheses don't work because of the reason you described, and any other syntax seems equally unappealing, as it visually doesn't make sense. For closures and functions, this might not be a problem (? i.e. I can't think of a case right now), but for any of the benefits of explicit captures to be realized, the syntax should be consistent across all use cases, and that doesn't seem possible (or at least I'm not clever enough to think of it).


My original question was more about the performance difference between expanding the borrow-checker logic vs. adding explicit captures, not simply the marginal performance change of adding explicit captures. Sorry about the mixup! I.e. would extending the borrow checker to be more clever cause a performance change, and if so would that be worse or better than the potential performance change for explicit captures?

Also, I'm not sure about the difference in compile time, mostly due to ignorance, and I'd love to learn more about it (although I'm not sure where to start). My initial assumption was that Rust uses something like a HashMap to check for semantic meaning of namespaces, and would potentially need to check each name in the closure against both the local and non-local namespace each time, and potentially do work on the nonlocal namespace at any time during the closure (i.e. when something is implicitly moved into it). My thought was that by explicitly stating all variables in the block/closure/etc., the compiler would be able to limit the search space that it uses in the closure itself. Maybe this doesn't make sense though, because once a variable comes into scope I'm assuming that it's put in the closure's namespace HashMap and the remaining operations on it are just as easy as if the name had been captured explicitly.

OK, that makes sense. Do you think explicit captures would help the compiler reduce the number of instructions sent to LLVM, or are those orthogonal concepts?

I wouldn't see why. The compiler figures out what is used in the closure one way or the other.

Fair enough.

IIRC there's a nascent series of articles on rust compiler internals; it was posted a few days/weeks ago. However, not being intricately familiar with rustc myself, I might be wrong, but in general, in any (compiled) language, globals are accessible in the binary from anywhere because they are in a statically known (as a first approximation) memory location and exist throughout the life of the program, so typically they are not considered as part of the environment of closures because that is not necessary. (You could think of it this way: even non-closure free functions can access globals, therefore they are not captured.)

I didn't form an opinion on that particular aspect because I don't really see said annotations valuable enough to be added to the language. In my experience, refactoring closures is a non-issue, because Rust encourages and makes it possible to follow a programming style that heavily emphasizes locality, and most idiomatic, "nice" Rust code only captures short-lived and nearby variables in almost trivial or at least short closures. Even in languages with less powerful type systems (e.g. C++), I have found that having a bug resulting from the act of refactoring a closure into a parametrized free function is not at all typical.

I'm leaning towards a short keyword like cap, but again, I have no strong opinion on it for now either, other than the general desire that it should ideally look significantly different from a juxtaposition of existing language elements, and preferably it shouldn't be heavy on special symbols. Of course inconsistency or generally too confusing / non-evocative syntax is an issue, but I think we can discuss that after the question of semantics.


I tried (and failed) to add this idea to clippy:

The issue I ran into during implementation is described here: (basically, both the following issue and my lint ran into the same challenges, and if we can fix one then we can fix the other)

