Pre-RFC: defer statement

hyeonu · May 17, 2022, 1:34pm

Summary

Introduce a new defer {...} statement, using the keyword defer, that registers a code block to be executed when the current scope is exited.

Motivation

It is a common pattern to use drop guards to run some cleanup code reliably even on panic unwinding or async task cancelation. Many stdlib functions define their own internal type that implements the Drop trait for this purpose. More generally, the scopeguard crate is used by many crates for this. But these approaches have a fundamental limitation. The scopeguard documentation mentions:

If the scope guard closure needs to access an outer value that is also mutated outside of the scope guard, then you may want to use the scope guard with a value. The guard works like a smart pointer, so the inner value can be accessed by reference or by mutable reference.

This limitation exists because the guard is a value that borrows the other values its drop logic needs to touch. That makes it nontrivial to use those values while the guard is alive. The usual workarounds look like:

let (foo, bar) = &mut *scopeguard::guard((foo, bar), |(foo, bar)| { ... });

This is not very pleasant code. Not only is it too verbose, it also undermines one of the purposes of closures, which is that they don't force you to spell out all their captures. To solve this, the language, not a library, needs to provide the functionality.
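To make the workaround concrete without pulling in the crate, here is a minimal sketch of a value-owning guard. `Guard`, `guard` and `push_with_cleanup` are hypothetical names for illustration only; this is not scopeguard's actual implementation, just the same idea written with std.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a value-owning scope guard: it owns `value`
/// and runs `on_drop` on it when the guard goes out of scope.
struct Guard<T, F: FnOnce(&mut T)> {
    value: T,
    on_drop: Option<F>,
}

fn guard<T, F: FnOnce(&mut T)>(value: T, on_drop: F) -> Guard<T, F> {
    Guard { value, on_drop: Some(on_drop) }
}

impl<T, F: FnOnce(&mut T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T, F: FnOnce(&mut T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

impl<T, F: FnOnce(&mut T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // Run the cleanup exactly once, handing it the guarded value.
        if let Some(f) = self.on_drop.take() {
            f(&mut self.value);
        }
    }
}

/// Pushes 3 while a cleanup that also reads the vector is pending.
fn push_with_cleanup(log: &mut Vec<String>) -> usize {
    let numbers = vec![1, 2];
    // A guard whose closure merely borrowed `numbers` would hold that borrow
    // until scope exit, so the `push` below would be rejected. Moving the
    // value into the guard and going through the guard works instead.
    let mut numbers = guard(numbers, |n| log.push(format!("numbers: {n:?}")));
    numbers.push(3);
    numbers.len()
}
```

The cost is visible in the last function: every value the cleanup touches has to be moved into the guard and accessed through it from then on.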

Guide-level explanation

The keyword defer can be used to declare a defer statement (defer block). Its syntax is the defer keyword followed by a block expression.

defer {
    println!("Hello, defer!");
}

A defer block is a statement (not an expression) that executes its inner block when the scope it is declared in is exited. Since it's a statement that doesn't produce a value, it doesn't borrow the values it captures until it is actually executed.

let mut numbers = vec![1, 2];

defer {
    println!("numbers: {numbers:?}");
}

numbers.push(3);

// prints `numbers: [1, 2, 3]`

It is still executed on panic unwinding and async task cancelation.

std::panic::catch_unwind(|| {
    defer { println!("declared before panic"); }
    println!("now go panicking");
    panic!();
    defer { println!("declared after panic"); }
});

// prints:
// now go panicking
// declared before panic
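For comparison, the same unwinding behaviour can be observed today with a hand-written drop guard. The sketch below is only an illustration in current Rust; `LogOnDrop` and `run_and_panic` are made-up names, and the shared log stands in for the println! calls so the order can be checked.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Minimal drop guard: records `label` in a shared log when dropped.
struct LogOnDrop {
    label: &'static str,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for LogOnDrop {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(self.label);
    }
}

/// Panics inside `catch_unwind` and returns what was logged, in order.
fn run_and_panic() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let inner = Arc::clone(&log);
    let result = panic::catch_unwind(move || {
        // Plays the role of `defer { ... }` declared before the panic.
        let _before = LogOnDrop {
            label: "declared before panic",
            log: Arc::clone(&inner),
        };
        inner.lock().unwrap().push("now go panicking");
        panic!("boom");
    });
    assert!(result.is_err());
    let entries = log.lock().unwrap().clone();
    entries
}
```

Calling `run_and_panic()` yields "now go panicking" followed by "declared before panic", the same order as the defer example above (the default panic message also goes to stderr).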

let handle = task::spawn(async {
    defer { println!("declared before sleep"); }
    async_sleep(an_hour).await;
    defer { println!("declared after sleep"); }
});
sleep(a_second);
handle.abort();

// prints: `declared before sleep`

Like the drop glue, defer blocks are executed in the reverse order of their declaration. This makes it safe to use any variable that is visible from within the defer statement.

let a = "a".to_owned();
defer { println!("first, {a}"); }
let b = "b".to_owned();
defer { println!("second, {a} {b}"); }
let c = "c".to_owned();
defer { println!("third, {a} {b} {c}"); }

// prints:
// third, a b c
// second, a b
// first, a
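The ordering mirrors how locals are dropped today. As a rough sketch in current Rust, a closure-based guard (the hypothetical `Defer` type below, not part of this proposal) produces the same sequence, because each guard is dropped before the variables declared ahead of it:

```rust
use std::cell::RefCell;

/// Hypothetical closure guard: calls the closure once when dropped.
struct Defer<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for Defer<F> {
    fn drop(&mut self) {
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

/// Returns the messages in the order the guards fired.
fn reverse_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = "a".to_owned();
        let _first = Defer(Some(|| log.borrow_mut().push(format!("first, {a}"))));
        let b = "b".to_owned();
        let _second = Defer(Some(|| log.borrow_mut().push(format!("second, {a} {b}"))));
        let c = "c".to_owned();
        let _third = Defer(Some(|| log.borrow_mut().push(format!("third, {a} {b} {c}"))));
        // Locals are dropped in reverse declaration order at this brace:
        // _third, c, _second, b, _first, a.
    }
    log.into_inner()
}
```

Unlike the proposed statement, these guards hold shared borrows of a, b and c for the whole scope, which is exactly the limitation described in the motivation.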

Like closures and async blocks, you can return from defer blocks. The return escapes from the current defer statement, but it doesn't affect the execution of other defer statements.

defer {
    println!("1");
}

defer {
    println!("2");
    return;
    println!("3");
}

println!("4");

defer {
    println!("5");
}

// prints:
// 4
// 5
// 2
// 1

Like an if expression without an else part, the inner block expression of the defer statement can only have type ().

defer {
    return 42; // compile error
};

defer { 7 } // compile error
defer { 7; } // ok

You can't move a value out if it's used/captured by a defer statement.

let s = "some text".to_owned();
defer { println!("{s}"); }
drop(s); // compile error

Since it must be executed on drop, you can't .await directly within a defer statement, even if the scope it is declared in allows it.

async fn foo() {
    defer {
        async_sleep(an_hour).await; // compile error
    }
}

Reference-level explanation

Since the keyword defer is not reserved, it is used as a raw keyword (k#defer).

From the language implementation's perspective, a defer statement is like a let statement that binds a ZST variable, except that a custom code block runs in place of the drop glue. Due to the semantics of return within the defer statement, it's also possible to implement it as a closure that is constructed at the position of the drop glue and immediately called.
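As a rough illustration of the closure-based lowering, here is the return example from the guide section desugared by hand. This is only a sketch of the idea and covers the normal exit path; a real implementation would have to emit the same calls on the unwinding path too. `desugared` is a made-up name, and the numbers are pushed into a vector instead of printed.

```rust
/// Hand-written equivalent of the guide's `return` example.
#[allow(unreachable_code)]
fn desugared() -> Vec<u32> {
    let mut out = Vec::new();

    // Body of the scope with the `defer` statements taken out.
    out.push(4);

    // Scope exit: the deferred blocks run in reverse declaration order.
    // Each one is a closure built here and called immediately, so a
    // `return` inside only leaves that block.
    (|| {
        out.push(5);
    })();
    (|| {
        out.push(2);
        return;
        out.push(3); // never reached
    })();
    (|| {
        out.push(1);
    })();

    out
}
```

Because each closure is created only at the exit point, it borrows `out` for the duration of its own call and never overlaps with the body's use of it.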

From the type checker's perspective, it can be handled as a normal block expression statement. For the type system it doesn't matter whether the block is executed at its declaration or when its scope exits. Since the desugaring should happen as part of MIR construction, any process that happens later, including the borrow checker, doesn't need to know about this feature.

Drawbacks

Compilers need to run more than a single function call at the drop glue position, which may not be trivial to implement.

It allows code which doesn't run at the position it is written. It may confuse readers who're not used to the concept of the drop guard.

Its semantics don't exactly match Go's defer statement, which may confuse people who are used to that language.

Rationale and alternatives

The mandatory block ({}) visually separates the code that is executed immediately from the code that is executed later. It has its own scope, so it's natural that variables declared within the defer statement can't be used from the outside. Also, it makes a trailing semicolon after the block expression unnecessary.

Technically, the scopeguard crate and/or the drop guard newtype pattern can cover every use case specified above.

Prior art

The scopeguard crate is a popular crate that provides a library-level solution to the problem this RFC tries to solve. The defer block statement is designed to be a superset of the defer! macro in this crate.

Unresolved questions

Bikeshedding. Is defer the best name/keyword for this feature?

Future possibilities

Conditional execution based on what triggered the scope exit - normal execution/panic unwinding/async task cancelation.
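Part of this can already be approximated in library code: a drop guard can call std::thread::panicking() to tell a normal exit from panic unwinding. The sketch below shows only that much, with made-up names (`ExitKind`, `observe_exits`); as far as I know there is no comparable check for async task cancelation, which looks like an ordinary drop from inside the guard.

```rust
use std::panic;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical guard that records how its scope was exited.
struct ExitKind(Arc<Mutex<Vec<&'static str>>>);

impl Drop for ExitKind {
    fn drop(&mut self) {
        // `thread::panicking()` is true while this thread is unwinding.
        let kind = if thread::panicking() { "unwinding" } else { "normal" };
        self.0.lock().unwrap().push(kind);
    }
}

/// Exits one scope normally and one by panic, and returns what was seen.
fn observe_exits() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    {
        let _guard = ExitKind(Arc::clone(&log));
        // Falls off the end of the block: normal exit.
    }
    let for_panic = Arc::clone(&log);
    let _ = panic::catch_unwind(move || {
        let _guard = ExitKind(for_panic);
        panic!("boom");
    });
    let entries = log.lock().unwrap().clone();
    entries
}
```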

chrefr · May 17, 2022, 3:29pm

I think it will be good to mention some cases, possibly real world, where guards are used. It will be especially useful to bring cases where the borrowing problem shows up.

Specifically, I want to know how common the problem is: most languages that have a defer statement don't have RAII, and vice versa: do we really need a language feature for that, that we will need to implement, document, and teach? I feel like the motivation isn't strong enough.

Also, I know this was discussed in the past, so references will be good.

y86-dev · May 17, 2022, 4:53pm

~~Why is it not possible to use the defer macro from scopeguard? What are you improving? I feel there is virtually no difference between these two:~~

defer {
    println!("abc");
}
// and
defer! {
   println!("abc");
}

~~The keyword does not seem to add any value in my eyes.~~

Edit: Did not read the post carefully enough

chrefr · May 17, 2022, 5:28pm

This has problems with borrowck, as explained in the OP.

y86-dev · May 17, 2022, 6:08pm

Oh I am sorry, I only glanced over it (and assumed, it did not add anything to defer), should have read it completely.

hyeonu:

let (foo, bar) = &mut *scopeguard::guard((foo, bar), |(foo, bar)| { ... });
This is not very pleasant code. Not only is it too verbose, it also undermines one of the purposes of closures, which is that they don't force you to spell out all their captures. To solve this, the language, not a library, needs to provide the functionality.

I do not think that this pattern happens often enough to justify a keyword for it. I haven't used this pattern in my code and feel, like @chrefr said, that examples would help to understand the general use case of this.

scottmcm · May 17, 2022, 7:19pm

Note that in expressions we essentially can't do contextual keywords. (We can on items, but in expressions like this it doesn't really work.) Just make it a full keyword -- it can be k#defer in 2021 and defer in 2024.

I'd like to see a rationale conversation about scope-level vs function-level defer.

(I agree that doing this like Drop is probably best, but I also think it's worth writing out why it was chosen and what, if anything, is lost doing it that way.)

It would be nice to see more about the limitations of doing this as a macro.

Part of me thinks that the macro could completely handle this via shadowing so long as it was passed an explicit capture list? The guide bit mentions not wanting to do that, but then has almost entirely examples that don't need captures. Something like defer!(n => *n = 0); doesn't seem that obviously terrible to me, especially since you'd only need to mention the things usable after the defer -- defer!(x => *n = x); would be fine if nothing should touch n later, just update x. (Indeed, it might even be good to prevent things from touching n after that.)
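A rough sketch of what such a capture-list macro could look like, assuming the named variable is reborrowed into a guard and then shadowed by a mutable reference. The names `Guard`, `guard`, `defer_with!` and `reset_after_scope` are invented for illustration; this is not the scopeguard macro and handles only a single capture.

```rust
/// Hypothetical guard: holds a mutable borrow and a cleanup to run on it.
struct Guard<'a, T, F: FnOnce(&mut T)> {
    target: &'a mut T,
    on_drop: Option<F>,
}

fn guard<'a, T, F: FnOnce(&mut T)>(target: &'a mut T, on_drop: F) -> Guard<'a, T, F> {
    Guard { target, on_drop: Some(on_drop) }
}

impl<'a, T, F: FnOnce(&mut T)> Drop for Guard<'a, T, F> {
    fn drop(&mut self) {
        if let Some(f) = self.on_drop.take() {
            f(&mut *self.target);
        }
    }
}

/// `defer_with!(n => body)`: borrows `n` into a hidden guard, then shadows
/// `n` with a `&mut` that goes through the guard. `body` sees the same `&mut`.
macro_rules! defer_with {
    ($name:ident => $body:expr) => {
        let slot = guard(&mut $name, |$name| {
            $body;
        });
        let $name = &mut *slot.target;
    };
}

/// Returns the value seen inside the scope and the value after it.
fn reset_after_scope() -> (i32, i32) {
    let mut n = 5;
    let seen_inside;
    {
        defer_with!(n => *n = 0);
        *n += 1; // `n` is a `&mut i32` from here to the end of the block
        seen_inside = *n;
    }
    // The hidden guard was dropped at the closing brace and reset `n`.
    (seen_inside, n)
}
```

The shadowing changes the type of `n` from a value to a reference for the rest of the scope, which is the main ergonomic cost of this approach compared to a built-in statement.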

hyeonu · May 18, 2022, 2:19am

Thanks for the feedback! I'll update the post soon.

But it's a statement not an expression. Is it still not possible with statements?

scottmcm · May 18, 2022, 4:22am

I don't think so. Given that expression statements exist, you can't really know without arbitrary lookahead, which we try to avoid.

It's super-contrived, but this compiles today:

struct defer { x: () }
fn blah() -> defer {
    let x = ();
    defer { 
        x
    }
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a2aea0184dfe6524c7030181254b6c2b

And assuming that the defer block is a normal block expected to have type () (which I think it must, since there's nowhere for a value to go), I think that would be a valid defer block under this proposal, but it would then not compile any more.

So it might have been worth trying to make it a contextual keyword before we had editions, but now that we have editions, it might as well just become a full keyword.

The general goal, as I understand it, is still to have something like https://github.com/rust-lang/rfcs/pull/3098 so this isn't a big deal -- we have the lexical space reserved for it already as of edition 2021 (https://github.com/rust-lang/rfcs/pull/3101).

hyeonu · May 18, 2022, 5:09am

Nice catch, that's something I missed. And right, it's not a hard blocker anyway thanks to the raw keywords feature. I'll fix the RFC to use a strict keyword instead.

afetisov · May 18, 2022, 4:47pm

I don't see a point in having two almost-but-not-quite identical resource handling mechanisms in the language. Basically everything is covered by the usual Drop, scopeguard provides a small syntactic sugar, but doesn't change anything fundamental, and this RFC exists purely as a workaround for a small issue with the borrow checker. Adding defer would just increase the confusion for new users (should I impl Drop or just use defer?), encourage error-prone practices (defer is easy to forget or misplace, potentially with drastic consequences, and it's too easy to just dump defer all over the code instead of properly implementing resource semantics), and add to inconsistencies within the language.

On the latter point and more to the meat of the RFC, I strongly dislike that it doesn't compose with any of the expected language features. It can't be used as an expression even though Rust is heavily expression-based (even let is now almost an expression, at least syntactically). It doesn't compose with async, even though the block syntax implies that it does. Similarly, return has weird block-exiting semantics which are usually reserved for break. The last part makes the defer block a syntax sugar for a weirdly borrow-checked closure rather than a proper block. For the same reasons it doesn't compose with the try operator ?.

I'll note that scopeguard isn't the only place where such issues with closure borrowing arise. Perhaps it would be possible to allow more complex borrowing in closures in the general case?

The issue of when exactly defer fires should also be discussed in the RFC (there are at least two alternatives: at scope end or at function end; Zig uses the former while Go uses the latter). It should also have a section discussing the prior art (i.e. at least Go and Zig), with its successes and pitfalls.

toc · May 18, 2022, 5:43pm

Is Go's version even an option? It seems like supporting

for x in xs {
    defer { drop(x) };
}
...
// defers happen here

would be a complete non-starter in Rust.

afetisov · May 18, 2022, 6:11pm

I also think it's a non-starter. Still, it is a good idea to have an understanding of their design rationale, benefits and issues.

For example, scope-based defer means that it's impossible to return a defer-protected object from an inner block. In particular, you can't conditionally initialize an object in different branches of an if or match expression. I doubt that it's a desirable outcome. It also means that defer-protected data essentially acts as if locally borrowed, but also usable by value, which is... weird.

hyeonu · May 18, 2022, 8:09pm

Can you please elaborate on that?

Actually the || {} and async {} syntaxes share these properties.

I agree this RFC would become meaningless if we had that feature, though I'm personally against moving away from the "closure is just a struct which borrows its captures" semantics. But let's talk about that in its own topic.

Well, the reason is simple: Go has a (function-local) goto statement, which effectively makes running code on scope exit a no-go. Also, Go's defer only takes a function call expression, and its arguments are evaluated eagerly while the function itself is called right before the return. I heard it's because the early designers of Go were more VM people than PL people, but I found no official source. TBH I feel that describing this could easily be seen as blaming the language, which I want to avoid in the RFC document.

chrefr · May 18, 2022, 8:31pm

This is because of let chains, e.g.:

macro_rules! m {
    ($e:expr) => {};
}

m!((let v = 1));

But my feeling is that this is mostly a side effect, perhaps even unintended.

return for closures has the expected meaning. return from async may indeed be considered the same. But the fact that they don't compose well together is something we're willing to change (💤 Async closures - async fn fundamentals initiative).

afetisov · May 18, 2022, 8:53pm

As @chrefr notes, it is essentially because of let-chains. They allow let bindings to be chained with arbitrary expressions and other let bindings, in arbitrary order. It is also not inconceivable that let-chains will support || operators in the future, in addition to && (but that requires more implementation work and ironing out the semantics). It was decided that the easiest way to implement something with those properties is to allow let $pat = $expr syntax at the expression level, with a separate error check which blocks all unintended usages. Try it, you can do let _x = let () = y; and it will parse, producing an error "let expressions in this position are unstable".

It's also a relatively common source of confusion and language proposals to add something like let $pat = $expr with semantics similar to matches!($expr, $pat). The if-let and while-let constructs sure look a lot as if they just checked a value of this expression. The proposals are blocked by the fact that let-expressions would be creating new variable bindings, which is unprecedented for expressions and pretty hard to integrate into the language.

There are good reasons, but in a nutshell I'm also not happy with that behaviour on their part. But those constructs have independent damn good reasons to exist, and so it was more of a question "which semantics would control flow within them have" rather than "should we add those constructs in the first place". I don't see such a strong rationale for defer, it's hard to produce an example where it would allow something impossible in current Rust. If such examples exist (where there is really no safe alternative and not just a less than pretty API), they should definitely be included in the RFC.

Prior art is valuable, we would definitely want to avoid their mistakes and replicate their successes, but that requires knowing what they are. If Rust is treading new ground, it requires much closer scrutiny and much stronger arguments than if it's about adding some well-known, loved and expected feature.

simonbuchan · May 19, 2022, 8:49am

Zig has defer with a seemingly identical definition, and also errdefer, which only runs on error cases (Zig has Result baked in with error!success that works with try, roughly).

I would really like more specification than this though: Documentation - The Zig Programming Language

system · August 18, 2022, 7:09pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
