Pre-pre-pre-RFC: implicit code control and defer statements

zetanumbers · December 22, 2023, 5:19pm

Implicit code control and defer statements

DISCLAIMER: This text grew out of a brainstorm and aims to be a possible answer to many questions about async drop. Many of the new features mentioned exist for feature symmetry, or aim at making it possible to write all implicit code explicitly in some way or fashion. Some code examples may relate to questions asked in the Motivation section.

Motivation

While striving to design async drop, there's one fundamental problem that remains unresolved. Since Rust uses poll-based futures, dropping a future (i.e. future cancellation) should be one of the valid ways to deal with async tasks. To use futures flexibly (via select!, for example), cancellation safety becomes a necessity in some circumstances, which in turn requires developers to look carefully at .await expressions and their placement. However, some designs of async drop may automatically generate invisible await points, which could appear suddenly (for example, in a new version of a dependency) and would make cancellation safety very difficult.

Also remember that Drop is the only way to make sure your code runs at all exits from a scope, so it is common to see people writing structs with a Drop implementation for just one scope in the entire codebase. Multiple ways of running async destructors are being discussed: concurrently, sequentially, or in a newly spawned task; picking one favorite could make the others too verbose and tedious.

Consider another problem. Destructors cannot return an error. But are they really infallible? As you may know, File practically flushes inside its drop, and flushing is very much fallible. So is ignoring the error the right approach there? Sometimes it's fine, but maybe one time you would have liked to print a warning about it?
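A closely related case in std is BufWriter, whose Drop implementation attempts a final flush and ignores any error from it. The sketch below contrasts relying on that drop with finalizing explicitly. FailingSink, write_and_drop and write_and_finalize are hypothetical names made up for illustration; FailingSink simulates a device whose writes fail.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical sink whose writes always fail, standing in for a full disk
// or a broken socket.
struct FailingSink;

impl Write for FailingSink {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "disk full"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn write_and_drop() {
    let mut w = BufWriter::new(FailingSink);
    // The bytes only land in BufWriter's buffer; the sink is not touched yet.
    w.write_all(b"hello").unwrap();
    // `w` is dropped here: BufWriter's Drop tries to flush and discards the error.
}

fn write_and_finalize() -> io::Result<()> {
    let mut w = BufWriter::new(FailingSink);
    w.write_all(b"hello")?;
    // Explicit finalization: the failed write now surfaces as an `Err`.
    w.flush()
}
```

write_and_drop() completes silently, while write_and_finalize() returns the "disk full" error to its caller.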

Proposal

The core problem may lie in implicit code. If so, adding control over implicit drops may solve many problems. Please remember that the new syntax is not final and async_drop is just there to make a point.

Defer blocks

A defer {...} block is defined to copy its contained code to the same places where calls to drop glue are placed. Here is a trivial example:

defer {
  println!("hello");
}
println!("world");

Output:

world
hello
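In today's Rust, the behavior of the defer block above can be approximated with a scope guard whose Drop runs a closure, which fires at every exit from the scope. This is a minimal sketch assuming a closure-based emulation; Defer, defer and hello_world are hypothetical names, and the output is collected in a Vec instead of printed so it can be checked.

```rust
use std::cell::RefCell;

// Hypothetical closure-based scope guard: runs `F` when dropped, i.e. at every
// exit from the enclosing scope (fall-through, `return`, `?`, `break`, unwinding).
struct Defer<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for Defer<F> {
    fn drop(&mut self) {
        // `drop` only gets `&mut self`, so the `FnOnce` is kept in an `Option`
        // and taken out to be called by value.
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

fn defer<F: FnOnce()>(f: F) -> Defer<F> {
    Defer(Some(f))
}

fn hello_world() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // Must be bound to a name: `let _ = defer(..)` would drop it at once.
        let _guard = defer(|| log.borrow_mut().push("hello"));
        log.borrow_mut().push("world");
    } // `_guard` is dropped here and pushes "hello".
    log.into_inner()
}
```

hello_world() returns ["world", "hello"], matching the output above. The RefCell is needed because the guard's closure already borrows log for the whole scope, a limitation a built-in defer would not have.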

It is based upon a similar idea from Defer blocks and async drop. However, since the entire block is duplicated and inserted next to every call to drop glue, it can introduce enormous bloat (so maybe indirection should be the default defer behavior?). To avoid that, you would usually want to introduce an indirection like:

#[indirect]
defer {
  println!("hello");
}
println!("world");

which desugars into

defer {
  (|| {
    println!("hello");
  })()
}
println!("world");
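Written out by hand, the #[indirect] desugaring amounts to keeping the deferred body in a single closure and placing only a call to it at each exit. The sketch below assumes a function with three exits; classify and the exits counter are hypothetical, and the counter stands in for println!("hello").

```rust
// Hand-written approximation of the `#[indirect]` desugaring: the deferred
// body lives in one closure, and each exit path only contains a call to it.
fn classify(n: i32, exits: &mut u32) -> &'static str {
    let mut on_exit = || *exits += 1; // stand-in for `println!("hello")`
    if n < 0 {
        on_exit();
        return "negative";
    }
    if n == 0 {
        on_exit();
        return "zero";
    }
    on_exit();
    "positive"
}
```

Calling classify once per branch increments exits three times in total, while the body itself is written only once.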

It may be necessary for a defer block not to be copied into unwind or coroutine_drop code:

defer #[cfg_where(not(any(unwind, coroutine_drop)))] {
  println!("We are on the happy path!");
}
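Only the unwind half of this can be approximated at runtime today: std::thread::panicking() reports whether the current thread is unwinding, so a guard can skip its body on that path. This is a sketch under that assumption (it does not cover coroutine_drop); HappyPathOnly and run are hypothetical names.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical guard approximating `#[cfg_where(not(unwind))]` at runtime:
// its body is skipped while the thread is unwinding from a panic.
struct HappyPathOnly<'a>(&'a Cell<bool>);

impl Drop for HappyPathOnly<'_> {
    fn drop(&mut self) {
        if !std::thread::panicking() {
            // "We are on the happy path!"
            self.0.set(true);
        }
    }
}

// Returns whether the happy-path body ran.
fn run(should_panic: bool) -> bool {
    let ran = Cell::new(false);
    // AssertUnwindSafe: `&Cell` is not `RefUnwindSafe`; acceptable for this demo.
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = HappyPathOnly(&ran);
        if should_panic {
            panic!("boom");
        }
    }));
    ran.get()
}
```

run(false) returns true and run(true) returns false. Unlike the proposed attribute, the check happens at runtime and the body is still compiled into the unwind path.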

If a conditional move occurs, regular defer blocks are forbidden; instead we can use weak defer blocks, which run only if the variable was not moved elsewhere:

let a = "ඞ".to_owned();
#[weak(a)] // or simply #[weak]?
defer {
  println!("Who is that? {a}");
}
if check() {
  foo(a);
}

so the message would be printed only if check() returned false. Alternatively,

let a = "ඞ".to_owned();
#[on_scope_exit(a)]
defer {
  println!("Who is that? {a}");
}
if check() {
  foo(a);
}

would print the message at the point of the move as well.
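The weak variant can be emulated today with an Option, the approach suggested later in the thread: the guard owns the value, moving it out leaves None behind, and the deferred body runs only if the value is still present. WeakDefer, foo and example are hypothetical names for this sketch.

```rust
use std::cell::RefCell;

// Hypothetical "weak defer": the guard owns the value in an `Option` and runs
// `on_exit` only if the value is still there when the scope ends.
struct WeakDefer<T, F: FnMut(&T)> {
    value: Option<T>,
    on_exit: F,
}

impl<T, F: FnMut(&T)> Drop for WeakDefer<T, F> {
    fn drop(&mut self) {
        if let Some(v) = &self.value {
            (self.on_exit)(v);
        }
    }
}

fn foo(_a: String) {}

fn example(check: bool) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let mut a = WeakDefer {
            value: Some("ඞ".to_owned()),
            on_exit: |a: &String| log.borrow_mut().push(format!("Who is that? {a}")),
        };
        if check {
            // Moving out leaves `None` behind, so `on_exit` will not run.
            foo(a.value.take().unwrap());
        }
    }
    log.into_inner()
}
```

example(false) records "Who is that? ඞ" and example(true) records nothing. The cost is a runtime Option check plus the wrapper type, which a built-in weak defer could avoid by reusing the existing drop flag.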

Implicit and explicit attributes

Let's start with async_drop. #[implicit(async_drop)] would allow inserting implicit code like async_drop(fut).await, introducing invisible await points. Right now this is probably not a desirable default, so #[explicit(async_drop)] is the default for now; this could be changed in a future edition.

#[implicit(async_drop)]
async fn foo(x: Bar) {
  let y = x.recv().await;
  println!("This is y: {y:?}");
}

may be equivalent to:

async fn foo(x: Bar) {
  defer { async_drop(x).await };
  let y = x.recv().await;
  println!("This is y: {y:?}");
}

Could there be others? #[implicit(await)]? #[explicit(drop, into_iter, into_future)]? #[explicit(code)] to make "everything" explicit?

#[implicit(await, async_drop)]
async fn foo(x: Bar) {
  let y = x.recv();
  println!("This is y: {y:?}");
}

NOTE: one other important aspect would be to preserve macro hygiene.
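For comparison, the await point that #[implicit(async_drop)] would insert can already be written by hand if the type offers an async finalizer. In this sketch Bar, recv, close and block_on are hypothetical; block_on is a tiny busy-polling executor that only suits futures that never actually wait, where a real program would use an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for `Bar` with an explicit async finalizer.
struct Bar {
    log: Vec<&'static str>,
}

impl Bar {
    async fn recv(&mut self) -> u32 {
        self.log.push("recv");
        42
    }

    // Explicit async cleanup; returns the log so the caller can inspect it.
    async fn close(mut self) -> Vec<&'static str> {
        self.log.push("close");
        self.log
    }
}

async fn foo(mut x: Bar) -> Vec<&'static str> {
    let y = x.recv().await;
    println!("This is y: {y:?}");
    // The await point `#[implicit(async_drop)]` would insert, written by hand.
    x.close().await
}

// Minimal busy-polling executor, only for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

block_on(foo(Bar { log: Vec::new() })) yields ["recv", "close"]. The await point is visible in the source, but if foo's future is dropped before reaching it, close never runs, which is exactly the cancellation concern from the Motivation section.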

Essential finalizer methods

We can mark methods that consume self as #[essential_finalizer] to indicate their importance, and lint wherever none of them is called. For example:

// src/foo.rs

#[warn(missing_essential_finalizer)]
// #[cfg_attr_where(warn(missing_essential_finalizer), not(unwind))] // to not warn if finalizer is not called during unwind
struct Foo();

impl Foo {
  #[essential_finalizer]
  pub fn finalize(self) -> Result<()> {
    // flush data and everything
    Ok(())
  }
  
  #[essential_finalizer]
  pub fn finalize_with_param(self, param: usize) -> Result<()> {
    // ...
  }
}

// src/lib.rs
fn process(x: Foo) {
  // /-----^
  // Warning: Hey buddy, you forgot calling one of essential finalizer methods:
  //   Foo::finalize, Foo::finalize_with_param.

  println!("length of x: {}", x.len())
}

so you edit process:

fn process(x: Foo) {
  defer {
    // No longer ignoring errors in destructors!
    if let Err(e) = x.finalize() {
      log::error!("this is bad: {e:?}");
    }
  };
  println!("length of x: {}", x.len())
}

or to suppress it for one argument:

fn process(#[allow(missing_essential_finalizer)] x: Foo) {
  // or `defer { let _ = x; }`?
  println!("length of x: {}", x.len())
}
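Without language support, the closest runtime approximation is a "drop bomb": Drop checks a flag that only the finalizer sets and panics if it was skipped. This is a sketch under that assumption, not the proposed lint; it only fires at runtime, whereas #[essential_finalizer] (or linear types) would catch the mistake at compile time.

```rust
// Runtime approximation of `#[essential_finalizer]` (a "drop bomb"):
// dropping a `Foo` without calling `finalize` panics.
struct Foo {
    finalized: bool,
}

impl Foo {
    fn new() -> Self {
        Foo { finalized: false }
    }

    fn finalize(mut self) -> Result<(), String> {
        // flush data and everything
        self.finalized = true;
        Ok(())
    } // `self` is dropped here with `finalized == true`, so no panic.
}

impl Drop for Foo {
    fn drop(&mut self) {
        // Skip the check while already unwinding: a second panic would abort.
        if !self.finalized && !std::thread::panicking() {
            panic!("Foo dropped without calling Foo::finalize");
        }
    }
}
```

Foo::new().finalize() succeeds, while simply letting a Foo go out of scope panics.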

Implicit and explicit lints

To ensure gradual adoption of async_drop and other possible features I see some new lints being really helpful:

missing_* lints report missing explicit or implicit code; for example, missing_async_drop reports missing async drops of values that implement AsyncDrop, or of their parents

#[warn(missing_async_drop)] // maybe this should be default?
#[explicit(async_drop)] // this is default but anyway
async fn foo(x: Bar) {
  // Warn: you should probably defer async_drop(x).await
  //       because it's `Bar: AsyncDrop` or whatever
  let y = x.recv().await;
  println!("This is y: {y:?}");
}

implicit_* lints report unwanted implicit code; for example, implicit_async_drop reports implicit async_drop(value).await calls on values that implement AsyncDrop, or on their parents

#[warn(implicit_async_drop)]
#[implicit(async_drop)]
async fn foo(x: Bar) {
  // Warn: x generates a deferred async_drop call and awaits it
  //       because it's `Bar: AsyncDrop` or whatever
  let y = x.recv().await;
  println!("This is y: {y:?}");
}

Let-defer declarations and defer assignments

Let-defer declarations allow declaring variables in the deferred context; defer assignments are then used to assign new values to these variables:

let defer mut a = 0;
defer {
  println!("{a}!");
}
defer a = { 42 };

prints:

.eivom taht ni ekil si emit fo wolf ,ees nac uoy sA
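The joke above holds up: because deferred code runs in reverse order of declaration, the later defer assignment executes before the earlier deferred print, so the print sees 42. A sketch of the same ordering with two guards, assuming a Cell stands in for the let defer variable; PrintOnExit, SetOnExit and let_defer_demo are hypothetical names.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical guard standing in for `defer { println!("{a}!"); }`.
struct PrintOnExit<'a> {
    a: &'a Cell<i32>,
    out: &'a RefCell<Vec<String>>,
}

impl Drop for PrintOnExit<'_> {
    fn drop(&mut self) {
        let line = format!("{}!", self.a.get());
        println!("{line}");
        self.out.borrow_mut().push(line);
    }
}

// Hypothetical guard standing in for `defer a = { 42 };`.
struct SetOnExit<'a>(&'a Cell<i32>, i32);

impl Drop for SetOnExit<'_> {
    fn drop(&mut self) {
        self.0.set(self.1);
    }
}

fn let_defer_demo(out: &RefCell<Vec<String>>) {
    // `let defer mut a = 0;`: declared first, so it outlives both guards.
    let a = Cell::new(0);
    let _print = PrintOnExit { a: &a, out };
    // Guards drop in reverse order, so this assignment runs before the print.
    let _assign = SetOnExit(&a, 42);
}
```

After calling let_defer_demo, out holds the single line "42!".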

We could also add the ability to declare the return value as a variable:

fn foo() -> Result<String> {
  #[return_place]  // Automatic #[cfg_where(return)]
  let defer output;

  defer #[cfg_where(return)] {
    if let Err(e) = &output {
      let err_len = e.to_string().len();
      warn!("Get ready this is bad, error message is {err_len} bytes!");
    }
  }

  // ...

  Ok(String::new())
}
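For comparison, one way to inspect the return value today (also suggested in the replies) is to run the body in an inner closure and check its result before returning it. In this sketch foo's fail parameter and the messages are hypothetical, and eprintln! stands in for the warn! logging macro.

```rust
// Hypothetical example: the body runs in an inner closure and the outer
// function inspects its result on the way out, approximating
// `#[return_place] let defer output;` plus `defer #[cfg_where(return)]`.
fn foo(fail: bool) -> Result<String, String> {
    let output = (|| -> Result<String, String> {
        // `return` (and `?`) here leave only the closure, not `foo`.
        if fail {
            return Err("something went wrong".to_string());
        }
        Ok(String::from("done"))
    })();

    if let Err(e) = &output {
        let err_len = e.len();
        // eprintln! stands in for the `warn!` logging macro in the proposal.
        eprintln!("Get ready this is bad, error message is {err_len} bytes!");
    }
    output
}
```

Like cfg_where(return), the check runs only on normal returns: if the closure panics, the inspection code is skipped.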

Defer expressions

To explicitly enable async_drop inside expressions we could use:

#[explicit(async_drop)]
async fn foo(x: Bar) {
  baz(x.recv().await(defer |fut| { async_drop(fut).await }));
  // baz(x.recv().defer |fut| { async_drop(fut).await }.await); // Or this?
  // baz(x.recv().defer(async_drop).await); // Or this?
}

or some other way to include defers in expressions.

Custom implicit code

With the functionality described above, even proc macros could define attributes that add custom drops or other implicit code. However, there could be advantages to making this built-in, such as guaranteeing the expansion order.

#[use_custom_drop(MyDrop)]
fn foo() {
  // Dark magic territory
}

Benefits to the internal structure of rustc?

With defer blocks, drops could be generated during the HIR stage instead of while building MIR, possibly allowing more flexibility. It is definitely a useful concept for implicit async_drop alone.

josh · December 22, 2023, 7:58pm

While acknowledging that this is a brainstorm rather than a proposal for things that should all be done:

As written, this seems to be proposing multiple new language features with a large surface area: defer blocks, indirect, cfg_where, let defer, implicit_*. Adding anything to the language adds complexity for every user/reader of the language, and so when it's necessary to add a language feature, we need to buy as much capability as possible with that feature while adding as little complexity for the user as we can.

Among other things:

The compiler can easily have a single block of code and multiple jumps to it, so indirect doesn't seem necessary; the compiler already need not duplicate the entire block of code in multiple places.
cfg_where is getting dangerously close to a macro-like system. I don't think we should have a "defer but only on specific paths" mechanism; you can already do that by making use of Option or similar, or by not deferring the code (depending on how large and complex the containing function is and how many exit points it has). You could also have a proc macro transform the function to do this. This seems best handled via ecosystem experimentation before deciding it's common enough to need a language feature.
weak seems like it is working around the fact that this is a generalized defer mechanism rather than a drop-like mechanism for a specific variable. Again, that seems like more complexity than we should add.
implicit/explicit isn't something we should bake into the language in that form. If anything, that's the right territory for a lint, and the existing deny/warn/allow mechanism. Such a lint could start out in clippy, as an "enforce this project's code style" mechanism. (It seems like you note the possibility of using lints for this later in the proposal.)
#[essential_finalizer] seems like, effectively, linear types. Rather than tagging every such method, we could add linear types. Then, you have to call something consuming the object before the end of your scope, or return the object, or otherwise not let it implicitly get dropped.
let defer: it's not obvious from your explanation why this needs the extra defer keyword, rather than just being a let mut x;.
#[return_place]: There are other ways to transform the output of a function (for instance, calling an inner function/closure and then transforming its return value), and this is really starting to feel like a recipe for writing spaghetti code.

Consider what language features would gain the most benefit. Think of it like a constraint-solving problem: what's the minimal set of language features that give the maximal benefit and make it reasonably easy to do anything you want, rather than adding a new language feature every time something doesn't perfectly fit every use case.

HeroicKatora · December 22, 2023, 8:18pm

Having an expression that looks like a block expression yet does not allow code for the local control flow of a block expression just seems confusing. The closure-semantics are, also, already possible. So I would in general prefer if an RFC focussed on the distinguishing characteristics. For instance:

- Could defer be made to allow moving out of the variable it protects? The closure approach will not easily give back the variable by value; scopeguard needs to rely on unsafe here.
- Could defer be better motivated by protecting variables in various disjoint places, without having them moved out of those places? This is again not possible for closures, as changing the drop flag into a partially moved-from state cannot be achieved.
- Could defer be integrated with control flow, fixing the confusion in the introduction? What would be the semantics of return from a deferred destructor, in particular if the scope was only left due to a break? This seems to mesh with your idea of having return values for destructors somehow; it's potentially sort of having multiple return points. That would need at least a few semantic examples to be convincing.
- Most interestingly imho, the finalizer of one value could initialize the variable which another finalizer relies on. As far as I know, there is currently no other way to statically prove this initialization path to the compiler.
  
  Contrived example:
```
let flushed_state;
defer {
    // flushed_state is always initialized, but only after file 
    // is dropped. So we can't initialize it directly.
    if let Err(err) = flushed_state {
         eprintln!("{err:?}");
    }
}
// Contrived syntax to say that defer initializes this variable.
// The inversion of initialization flow is definitely hard to teach.
defer { flushed_state = file.try_flush() };
```
- Is there some unknown value in protecting a variable without having to introduce a wrapper type (i.e. macros or syntax that rely on the variable having its specific underlying type)? And keep in mind the concept of type-states, which was explicitly left out of Rust at release.

dlight · December 23, 2023, 7:30am

A somewhat idiomatic way to write an ad-hoc "defer" in present-day Rust (without supporting crates) is by locally defining types and implementing Drop for them, like this:

fn f() {
    struct MyType;
    impl Drop for MyType {
        fn drop(&mut self) {
            println!("hello");
        }
    }

    let _s = MyType;

    println!("world");
}

And anyway do we want defer? Look at the code above; I've seen code like this in the wild (but oddly enough I can't find it - and I don't know how to search for it). It looks like a clever hack (and it's nice to know that I can define a type locally like that) and there's something satisfying about it, but it's also very verbose.

For ad hoc cleanups, I think the verbosity of defining drop is a problem. Because of that, Rust discourages ad hoc cleanups and instead insists that if you want to clean up some data, you should wrap it into a struct and impl drop for that; if you want to clean up something not tied to a data structure (for example, a db connection), it should have a struct anyway (a "handle") and you should impl drop for that. By systematically avoiding cleaning up in an ad-hoc way, you also avoid the need for defer { } as a language feature.

But despite that, I think that ad-hoc cleanups are sometimes very handy. So I wish Rust had this feature, even knowing that people might misuse it (the ecosystem is firmly grounded in wrapping things in structs and implementing Drop on them anyway; nobody is going to defer the release of a lock, for example).

But this verbosity can also be solved with a macro. A quick search finds things like defer (which is just a function that receives a closure; maybe the stdlib could have this function?), defer-lite (which contains the exact same function underneath, but hidden, used only by a defer! { } macro) and scopeguard (which, besides defer! { }, has some other very handy things).

So the question here is between the language offering defer { } as a language feature vs a third party crate defining defer! { } as a macro.

The problem with little helper macros littered across crates.io like that is that if you need ad-hoc cleanup, it's probably something very trivial (if it were serious, it would be refactored into its own type with a drop impl), which probably doesn't warrant pulling in a dependency (and that's why someone might opt for a locally defined struct like the snippet above).

So maybe a middle ground is to have the stdlib define defer! { }. And I think it definitely should. (I wouldn't mind if the entirety of scopeguard was lifted to the stdlib)
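To get a feel for how small such a std addition could be, here is a sketch of a defer! { } macro built on a closure guard. DeferGuard, defer! and defer_order are hypothetical names defined here, not an existing std API; the sketch also shows that several defers in one scope run in reverse order.

```rust
use std::cell::RefCell;

// Hypothetical std-only guard backing the `defer!` macro.
struct DeferGuard<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for DeferGuard<F> {
    fn drop(&mut self) {
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

// Sketch of `defer! { ... }`: the body runs when the enclosing scope exits.
// The guard is bound to a named local (not `_`) so it is not dropped at once;
// macro hygiene keeps each expansion's `_guard` distinct.
macro_rules! defer {
    ($($body:tt)*) => {
        let _guard = DeferGuard(Some(|| { $($body)* }));
    };
}

fn defer_order() -> Vec<u32> {
    let log = RefCell::new(Vec::new());
    {
        defer! { log.borrow_mut().push(1); }
        defer! { log.borrow_mut().push(2); }
        log.borrow_mut().push(0);
    } // guards drop in reverse order: 2, then 1
    log.into_inner()
}
```

defer_order() returns [0, 2, 1].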

About manually calling drop (rather than having types drop implicitly at the end of scope): I'm all for it, but I think it requires a way to raise a compile-time error if one forgets dropping. Manual dropping is useful for async drop but also for fallible drop (for example, if we save and close a file by dropping it, we want to handle the case where there was a failure), and in general, to handle effects in drop.

This means linear types, or something stronger than #[must_use], or something else.

CAD97 · December 23, 2023, 8:36am

There's one key benefit of having defer as a language item instead of library functionality: lifetimes. Theoretically, a defer only needs to access the bindings it uses at exit edges, but a closure stored in a drop guard borrows immediately. This is why scopeguard has the shape it does, allowing capturing a context which can then be borrowed back out of the guard into the ambient scope. Avoiding that dance is also easier on stack space optimization.
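The "dance" can be sketched as follows: instead of the cleanup closure borrowing data from the scope, the guard owns the value, lends it back through Deref/DerefMut, and hands it to the cleanup by value at exit. This is modeled loosely on the shape of scopeguard's guard(value, closure), not on its actual implementation; ValueGuard, guard and demo are hypothetical names, and Option fields replace unsafe code at the cost of runtime flags.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical guard that owns `value`, lends it back through Deref/DerefMut,
// and passes it by value to `on_exit` at scope exit.
struct ValueGuard<T, F: FnOnce(T)> {
    value: Option<T>,
    on_exit: Option<F>,
}

fn guard<T, F: FnOnce(T)>(value: T, on_exit: F) -> ValueGuard<T, F> {
    ValueGuard { value: Some(value), on_exit: Some(on_exit) }
}

impl<T, F: FnOnce(T)> Deref for ValueGuard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> DerefMut for ValueGuard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        self.value.as_mut().expect("value is present until drop")
    }
}

impl<T, F: FnOnce(T)> Drop for ValueGuard<T, F> {
    fn drop(&mut self) {
        if let (Some(value), Some(on_exit)) = (self.value.take(), self.on_exit.take()) {
            on_exit(value);
        }
    }
}

fn demo() -> Vec<i32> {
    let mut seen = Vec::new();
    {
        // A closure guard capturing `&mut data` would lock `data` for the whole
        // scope; here the scope keeps using the value through the guard instead.
        let mut data = guard(Vec::new(), |v: Vec<i32>| seen = v);
        data.push(1);
        data.push(2);
    }
    seen
}
```

demo() returns [1, 2]: the pushes go through the guard, and the cleanup receives the finished vector when the scope ends.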

Separately, defer is better than type directed drop glue in exactly one case I know of: sharing of resources between multiple (esp. w.r.t. linear) drops. The easy example is allocator handles. Collections need to each have their own copy of the allocator handle in order to dealloc upon drop, whereas a defer based cleanup makes it trivial to ~~dependency inject~~ provide a shared handle to multiple collections at cleanup time.

Yoric · December 23, 2023, 11:51am

Still in the spirit of brainstorming, a few questions come to mind. For this conversation, I'm going to use the word "guardian" for any kind of drop/defer/finally/...

1. (When) is it useful for a guardian to be able to inject a new return value for the enclosing function?
2. (When) is it useful for a guardian to be able to inspect the original return value?
3. (When) is it useful for a guardian to be able to inspect values other than the one it is guarding?
4. What is the original return value in case of panic?
5. If there are several guardians and each wants to inspect/replace this value, how do they interact? How do we help users figure out which is the really final value?

1. Injecting new return values

I believe that the example of File given by @zetanumbers is a fairly compelling use case for 1.

2. Inspecting previous return values and 3. other variables

Having played a bit with Go's implementation of defer, I have seen a few use cases for 2. and 3., e.g. using defer as some kind of post-condition/validation to examine a result (which in Go is often an outvalue, so that's a case of 3.) but only if we're not returning an error (which makes it a case of 2.).

I don't know if this is the kind of pattern we'd be interested in in Rust. At the moment, we can handle such cases by wrapping the function/method in code that performs the validation. We'd need to look at specific examples.

As a side-remark, since Rust typically uses results instead of outvalues, this suggests to me that 3. would be less important.

4. Panics

I guess if a function returns T, a guardian should see an Option<T>, with None representing a panic? Can it be that simple?
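One way to prototype this Option<T> view without new syntax is to wrap the body in std::panic::catch_unwind, show the guardian Some(&value) or None, and then return the value or resume the panic. with_guardian is a hypothetical helper for this sketch; it only works with panic=unwind, and here the guardian can inspect the value but not replace it.

```rust
use std::panic::{self, UnwindSafe};

// Hypothetical helper: runs `body`, shows `guardian` the outcome as
// `Some(&value)` on a normal return or `None` on a panic, then returns the
// value or resumes the panic.
fn with_guardian<T>(
    body: impl FnOnce() -> T + UnwindSafe,
    guardian: impl FnOnce(Option<&T>),
) -> T {
    match panic::catch_unwind(body) {
        Ok(value) => {
            guardian(Some(&value));
            value
        }
        Err(payload) => {
            guardian(None);
            panic::resume_unwind(payload)
        }
    }
}
```

For example, with_guardian(|| 2 + 2, |r| assert_eq!(r, Some(&4))) returns 4, while a panicking body gives the guardian None and the panic continues to propagate.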

5. Interaction between guardians

I feel that this is going to be tricky. If there are several guardians, developers will need to model the order in which guardians are executed, in particular if values are moved at some point.

tmandry · January 18, 2024, 1:09am

I have thought we could get pretty far with a lint-based approach like this for async drop, but we need to understand the limitations.

It's important that the information about whether a type is implicitly droppable is attached to the type, not individual uses of it in a function. The question then is what to do with generic code. Where should the lint occur in the following example, if anywhere, and how should the compiler detect it?

fn debug(x: impl Debug) { println!("{x:?}") }

let x = MyAsyncDropType::new();
debug(x);

This is a problem even with std::mem::drop::<T>. One approach is to detect pass-by-value to a function whose generic argument is not marked with #[warn(missing_essential_finalizer)], which would put the lint on the call site of debug.

But is pass-by-value really enough? Assignment can lead to drops, so we also need to check pass-by-reference. Then we have containers, and so on.

It doesn't take long before you start getting into the territory of ?Drop bounds. Again, I'm not convinced those are necessary for solving this problem, but leveraging the trait system in this way is definitely the "fully correct" approach.

Aside: Perhaps starting from that, and then defining where it is okay to relax requirements, is a useful framing for the problem. I imagine for instance that if we had ?Drop bounds but supplied a lightweight way to work around types that are missing them (AssertDropSafe?) it could work out okay in practice. But since you are already looking at a lint-based approach, I would see how far that can go first.

(Bikeshed: I think essential_finalizer should be called destructor, so I'll use that term here.)

Another approach I don't remember thinking of before is to do the checking at monomorphization time. There it is possible to know the types of every value and whether they are being dropped inside of a #[destructor] or not.

The downside of this is that you would end up with post-monomorphization errors, and the error would occur when monomorphizing debug itself instead of when checking the call to debug. The error message could be tweaked to point to the call sites that led to the monomorphization, but in less simple cases this can end up with a user staring at a stack of function calls, trying to assign blame to the right one.

It's also possible we could combine these approaches and attempt to report errors eagerly (pre-monomorphization) when possible and post-monomorphization when it's not. I think that would be a reasonable place to end up if it meant we could defer adding ?Drop bounds to Rust.

jmjoy · January 19, 2024, 3:02am

Defer does not need to become control-flow syntax; there is already a defer crate.

async_drop has also been discussed a lot; poll_drop may be a solution.
