Pre-RFC: Lifecycle Trait


Over the last few months, Eliza Weisman and I have realized that idioms that involve RAII guards to mark scopes do not always translate neatly from synchronous to asynchronous code. The proposal outlined here came from Niko’s “Async Interview” with Eliza Weisman, while I—David Barsky—took the idea and ran with it. While I’ve felt the issues most acutely when using tracing, the issue outlined in this post applies to a wide range of “this value can(not) be held across await points” problems. For example, the lifecycle notifications we propose could be used to implement:

  • Synchronization primitives like asynchronous locks that shouldn’t be held across awaits, as they introduce deadlock risks.
  • Tokio’s automatic cooperative yielding.
  • Task locals that could be set/unset on yield or resumption.

This proposal will reference tracing as the primary example and motivation because I can speak about tracing’s use case with far greater confidence than I can about asynchronous locks or Tokio’s internals, not because I think language changes should be informed by tracing’s needs. With this post, I hope to solicit feedback on use cases that I missed and gauge interest from additional people.

Background

In the tracing library, spans are used to track a unit of work in code. Spans are defined axiomatically—spans are whatever the user considers to be a unit of work in their application. tracing is similar to distributed tracing systems like Dapper, Zipkin, or OpenTelemetry in that it relies on user-driven annotations to generate useful information, but tracing is highly optimized around in-process instrumentation, where all spans live in the same address space. For the purposes of this post, spans have a few relevant lifecycle stages:

  • create
  • enter
  • exit
  • close

While create and close can only be called once for a given span—whenever a span is created or completed—enter and exit can be called multiple times in the lifecycle of a span. In the context of a span decorating a future, a span would be:

  • entered whenever the executor polls the future.
  • exited whenever the future returns Poll::Pending.

The general guidance that the tracing library provides to end-users is to create a span for each logical unit of work. In a network service, there might be several spans per request:

  • One top-level span for the request.
  • A span to parse and validate the request contents.
  • A span for a downstream service call and response.
  • A span to stream and close the response.

In this example, each span corresponds to a future that is .awaited.

Because a single task might be composed of multiple futures which perform distinct units of work, the task-local storage APIs offered by many executors today are insufficiently fine-grained for this use-case. In the above example, an executor like tokio or async-std would only be able to distinguish these spans if they were independently spawned. An approach which relies on the executor itself to provide instrumentation has similar issues: the executor is only aware of each task as a single, opaque Future trait object. Finally, both of these solutions are executor-specific: there is no way to abstract over these executor-provided task-local storage APIs using only the interfaces available in core.

What is Tracing?

Today, there are a few ways to instrument code with spans in tracing. The first is through a RAII guard object, which will exit the span when dropped. For example:

use tracing::{span, Level};
    
let span = span!(Level::INFO, "my_span");
let guard = span.enter();
// code here is within the span
drop(guard);
// code here is no longer within the span

The second is through Span::in_scope:

use tracing::{span, trace, Level};

let my_span = span!(Level::TRACE, "my_span");
    
my_span.in_scope(|| {
    // this event occurs within the span.
    trace!("i'm in the span!");
});

// this event occurs outside the span.
trace!("i'm not in the span!");

The third is through the #[instrument] attribute macro. The example below creates a new span with the name write.

use tracing::{info, instrument};
use tokio::{io::AsyncWriteExt, net::TcpStream};
use std::io;
    
#[instrument]
async fn write(stream: &mut TcpStream) -> io::Result<usize> {
    let result = stream.write(b"hello world\n").await;
    info!("wrote to stream; success={:?}", result.is_ok());
    result
}

The fourth option is through an explicit instrument combinator:

use tracing_futures::Instrument;
    
let my_future = async {
    // ...
};

my_future
    .instrument(tracing::info_span!("my_future"))
    .await;
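For intuition, here is a simplified sketch of how such a combinator can be implemented (this is not tracing_futures’ actual code, just an illustration of the mechanics): the span is entered for the duration of each poll, and the guard’s Drop exits it whenever the inner future yields or completes.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use tracing::Span;

struct Instrumented<F> {
    inner: F,
    span: Span,
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Safety: `inner` is never moved out of the pinned struct.
        let this = unsafe { self.get_unchecked_mut() };
        // Enter the span for this poll; the guard's `Drop` exits it when
        // `poll` returns, whether `Ready` or `Pending`.
        let _enter = this.span.enter();
        unsafe { Pin::new_unchecked(&mut this.inner) }.poll(cx)
    }
}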

The Problem with RAII Guards and async/await

The easiest option, the RAII guard, unfortunately has the largest blast radius when misused, due to the interaction of two properties:

  • RAII guards assume that as long as they are not dropped, they are active.
  • With async/await and stack pinning, many RAII guards can be alive at the same time.

Normally, this isn’t an issue. But with tracing, or any debugging tool that makes use of scope guards, the debugging tool will be confused as to which span is currently active. Consider the following code:

async {
    let _s = span.enter();
    // ...
}

The span guard _s will not exit until the future generated by the async block is complete. Since futures and spans can be entered and exited multiple times without them completing, the span remains entered for as long as the future exists, rather than being entered only when it is polled, leading to very confusing and incorrect output. Worse still, the incorrect output might not be noticed unless multiple futures are executing concurrently, possibly in production. Issues pertaining to this misuse are some of the top sources of inquiry and support on tracing’s Discord channel.
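To make the failure concrete, here is a hypothetical handler (do_io stands in for any async operation) that exhibits the bug whenever two or more instances run concurrently on the same executor thread:

use tracing::{info, info_span};

async fn do_io() {} // hypothetical stand-in for any async operation

async fn handle(id: u64) {
    let span = info_span!("request", id);
    let _guard = span.enter(); // BUG: guard is held across the await below
    do_io().await; // while this task is suspended, the span is still
                   // "entered", so events from other tasks polled on this
                   // thread get attributed to the wrong span
    info!("done");
}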

For some historical background, tracing ended up prioritizing RAII guards for a few reasons:

  1. People don’t like to unnecessarily indent code. This is particularly important when adding tracing to existing code; a change that adds a span to a function shouldn’t result in a git diff that “adds” all of the previously-existing code in the function.
  2. People like the ergonomics of RAII guards, as it allowed them to avoid moving/borrowing values into closures.
  3. Any closure-based API would probably be internally implemented using a private RAII guard regardless, to ensure that the span is unset during a panic.
  4. Requiring all instrumented code to be in closures passed to tracing functions adds a potentially large number of tracing-related function calls in stack-frame-based diagnostics, such as backtraces or perf traces. Using RAII guards avoids cluttering up the diagnostics provided by other tools.

A Naive, and Probably Incorrect, Solution

tracing, and libraries similar to it, would greatly benefit from some sort of optional callback for values whose stack frame is being suspended. Below is the flawed and naive approach:

use std::pin::Pin;
use std::task::Context;

trait Lifecycle {
    // for notifications on a type's stack frame being suspended.
    fn on_yield(
        self: Pin<&mut Self>,
        ctx: &mut Context<'_>
    ) { }

    // for notifications on a type's stack frame being resumed.
    fn on_resume(
        self: Pin<&mut Self>,
        ctx: &mut Context<'_>
    ) { }
}

A few notes:

  • The proposed methods will function like Drop::drop, such that an explicit implementation will be called regardless of where the type is nested within a larger structure—explicit plumbing, like what poll_drop_ready requires, is not needed. (A sketch of the intended call pattern follows these notes.)
  • Once poll_drop_ready() in a hypothetical AsyncDrop is called, neither on_yield nor on_resume should be called.
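
To make the intended semantics concrete, here is a hand-written emulation of a frame with a single Lifecycle value live across its await points. This is purely illustrative: WithLifecycle is hypothetical, and in practice the compiler would insert these calls into generated state machines automatically.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct WithLifecycle<F, G> {
    inner: F,
    guard: G,
    polled_before: bool,
}

impl<F: Future, G: Lifecycle + Unpin> Future for WithLifecycle<F, G> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Safety: `inner` is never moved out of the pinned struct.
        let this = unsafe { self.get_unchecked_mut() };
        if this.polled_before {
            // The frame is being resumed: notify before user code runs.
            Pin::new(&mut this.guard).on_resume(cx);
        }
        this.polled_before = true;
        match unsafe { Pin::new_unchecked(&mut this.inner) }.poll(cx) {
            Poll::Pending => {
                // The frame is being suspended: notify after user code yields.
                Pin::new(&mut this.guard).on_yield(cx);
                Poll::Pending
            }
            ready => ready,
        }
    }
}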

tracing would implement these lifecycle hooks as:

impl Lifecycle for tracing::span::Entered {
    fn on_yield(self: Pin<&mut Self>, _: &mut Context<'_>) {
        if let Some(inner) = self.span.inner.as_ref() {
            inner.subscriber.exit(&inner.id);
        }
    }

    fn on_resume(self: Pin<&mut Self>, _: &mut Context<'_>) {
        if let Some(inner) = self.span.inner.as_ref() {
            inner.subscriber.enter(&inner.id);
        }
    }
}

This would allow tracing’s guards to automatically enter and exit spans as their corresponding futures are suspended and resumed.

Known Issues

The proposed Lifecycle, as is, has several issues, which were helpfully pointed out by Nika. For instance:

  • If we take Pin<&mut Self> or Pin<&Self>, common patterns in Rust will stop working: you either can't have a mutable reference to the guard after a yield point, or the guard can't be mutable at all, which dramatically limits the usefulness of this extension.
  • If we take *const Self as a receiver, we can write trivially unsound code like:
let mut guard = span.enter();
let r = &mut guard;
whatever().await;
*r = other_span.enter();

@tmandry and @mystor have proposed alternatives, but those alternatives should be discussed in a different thread.

Alternatives

I can think of a few alternatives:

  • The proposed YieldSafe auto trait could be used to mark the span::Entered RAII guard !YieldSafe. This would solve the problem of misusing this API, but it would still give us a situation where we have two different APIs that look totally different, one for sync and one for async. This is not ideal for ergonomics and teachability.
  • A generalization of the await_holding_lock lint in Clippy (https://github.com/rust-lang/rust-clippy/pull/5439) that applies to tracing’s spans as well.
  • In tracing, de-emphasize the Span::enter-based mechanism and encourage users to make use of Span::scope, which works with asynchronous blocks and, in the future, closures. A Span::scope that works with asynchronous blocks is already planned for tracing 0.2; a hypothetical usage sketch follows this list.
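
For the third alternative, usage might look something like the following. The scope method here is hypothetical: this is only the rough shape planned for tracing 0.2, not a released API.

let span = tracing::info_span!("my_span");

// Hypothetical API: the span would be entered on every poll of the
// async block and exited whenever it yields, like the `instrument`
// combinator, but without a separate extension trait.
span.scope(async {
    tracing::info!("recorded inside the span, even across awaits");
}).await;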

Requested Feedback

  • Does this language extension seem useful? Are there applications beyond what I listed?
  • What are some alternative places to place this trait? How does this interact with generators, if at all?

Could you expand on this? A lock that unlocks around every yield would be unsound if the user e.g. has a reference to the inner data.

What would interaction with types implementing this trait look like with manual futures implementations?

Oh, true. Disregard that, then!

What would interaction with types implementing this trait look like with manual futures implementations?

I'm not entirely sure! I think there's a bit of behavioral overlap for manual futures implementations, in that a manual future implementation can do everything that this lifecycle trait can do.

I have 3 comments regarding this one:

This is not about RAII, but about thread-local storage

The description here outlines that RAII guards are somehow problematic with async/await code, since they are held across various tasks. This is not really true. RAII guards work just fine in async/await code. They are kept for the current scope inside the task. If the RAII guard would store only state that belongs to that particular task - like common RAII guards do (e.g. for sync/async Mutexes) - everything would be fine. There would be absolutely no side-effects when other tasks which also make use of RAII guards are active.

What is not mentioned in the description: In tracing's case, the RAII guard manipulates thread-local variables. This means there is "invisible" shared state between all of the tasks which run on a thread, and that state does not get informed about task state changes (task yields/resumptions).

If tracing did not make use of thread-local storage and just explicitly forwarded all required state, the issue would not exist for it - independent of whether RAII guards are a thing. Choosing not to do this was a design decision of tracing - and as always, these decisions lead to tradeoffs. In this case, thread-locals don't really work well with async/await.

The issue is kind of the same as trying to store all application state in a multi-threaded program in a single global variable - that would also be problematic 🙂 - even if the access were properly synchronized.

The proposal breaks the async/await illusion even more

A goal of async/await is to allow people to write code that just looks and behaves like regular threaded code. Ideally we would get rid of all the annotations and just have a less resource-consuming thread. Application developers shouldn't be concerned with the task yielding and getting resumed, and all of the details of getting it right (e.g. pinning).

Rust's current state of async/await isn't really great in that regard; there are a couple of important differences between how sync and async code behave, e.g. around lifetimes, the ability to use functions in traits, cancellation behavior, etc. Hopefully we will find solutions for them, so that "async is less special".

However, this proposal actually goes in exactly the opposite direction: it makes async more special, by adding a hook that makes the yield/resume distinction more visible. I think this might be OK for real state-machine or generator support, where application programmers explicitly want to "resume" something and get a single next value. But I don't think it's a great fit for async/await, where people just want to call a method and run it mostly like a synchronous method.

There are no guarantees about Lifecycle trait methods being called

I think the Lifecycle trait will have exactly the same problem as the proposed poll_drop mechanism. Nothing guarantees that those methods will ever get called. Maybe the compiler could automatically insert the calls for state machines generated from async/await. However, for all manual state-machine/Future implementations, the authors of that implementation would need to be aware of the new methods and support them. If someone implements select! and forgets about on_yield, on_resume, or poll_drop, they simply wouldn't be called even though the Future had yielded/resumed/been destroyed.

So a library and application could never rely on these methods being called for correctness.

It's also not as easy as updating a small set of core types/combinators. The amount of hand-written Futures in Rust's async/await world is still huge, since anything async doesn't really compose very well, and we don't even have solutions for AsyncWrite/Read/Stream/Sink etc. types. Those would all need to gain awareness of the new trait(s).

Alternatives

I think this section misses the "introduce and use task-local storage" approach, which might also work. This will obviously also have its downsides (task-local storage is not standardized, will not work with every executor, and what do you do in real synchronous code?). But it seems as good an option as the others.
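
For reference, a sketch of what that might look like with tokio's task_local! macro (executor-specific, as noted; the key name is made up):

use tracing::Span;

tokio::task_local! {
    // One "current span" per task instead of per thread.
    static CURRENT_SPAN: Span;
}

async fn with_current_span() {
    CURRENT_SPAN
        .scope(tracing::info_span!("task"), async {
            // Code here can read the span via `CURRENT_SPAN.with(|s| ...)`,
            // but only on executors that provide task-local storage.
        })
        .await;
}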

I don't think YieldSafe and the clippy lints are solutions to the tracing problem. They could highlight that tracing gets broken when users use the tracing RAII guards with .await, but they don't present a better way to make it work.

The most reliable way to get things to work seems to be manual state/span forwarding, which doesn't require any new feature.


Thanks for your detailed and thoughtful response, @Matthias247. I'll address your comments in turn.

The description here outlines that RAII guards are somehow problematic with async/await code, since they are held across various tasks. This is not really true. RAII guards work just fine in async/await code. They are kept for the current scope inside the task.

My primary motivation for this proposal is this: RAII guards, when used to model scopes with some effect that occurs on Drop, don't work in ways that users expect them to in an asynchronous context. Those users' expectations come from writing similar code in synchronous contexts. I think we can improve the state of the world here in asynchronous contexts.

What is not mentioned in the description: In tracing's case, the RAII guard manipulates thread-local variables. This means there is "invisible" shared state between all of the tasks which run on a thread, and that state does not get informed about task state changes (task yields/resumptions).

tracing could easily remove thread-local state today; it is not integral to the design of tracing. The use of thread-local state is just a performance optimization to reduce unnecessary cross-thread traffic. I also don't know if your reference to "thread-local state" is meant to be a stand-in for a more general notion of "implicit state"—can you clarify?

If tracing did not make use of thread-local storage and just explicitly forwarded all required state, the issue would not exist for it - independent of whether RAII guards are a thing. Choosing not to do this was a design decision of tracing - and as always, these decisions lead to tradeoffs. In this case, thread-locals don't really work well with async/await.

Perhaps this is unfair, but I feel that "just explicitly forward all required state" trivializes why tracing opted for the design that it did. I'll explain why some of those decisions were made:

  • Adding instrumentation and diagnostics should not be a breaking change to library APIs.
  • Users who do not want to use tracing (and instead use something like log) don't have to be aware of tracing. They can continue using log-based Loggers without knowing or caring about what their dependencies use for logging/debugging.

Explicit context propagation fails on those points.
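
To illustrate the first point, compare these hypothetical signatures (User and fetch_user are made up for illustration):

struct User;

// Before instrumentation:
async fn fetch_user(id: u64) -> User {
    let _ = id;
    User
}

// After, with explicit forwarding: the span becomes part of the public
// signature, and every caller (tracing user or not) must thread it through.
async fn fetch_user_instrumented(span: &tracing::Span, id: u64) -> User {
    let _guard = span.enter(); // fine here only because nothing is awaited
    let _ = id;
    User
}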

A goal of async/await is to allow people to write code that just looks and behaves like regular threaded code. Ideally we would get rid of all the annotations and just have a less resource-consuming thread. Application developers shouldn't be concerned with the task yielding and getting resumed, and all of the details of getting it right (e.g. pinning).

I completely agree with you! This proposal is targeted at library developers so that application developers don't need to be concerned about the task or future yielding or resuming. Unfortunately, application developers do need to concern themselves with these details today; otherwise, instrumentation code that just works in synchronous code is broken in asynchronous code.

But I don't think it's a great fit for async/await, where people just want to call a method and run it mostly like a synchronous method.

I disagree. I believe that there is currently an unaddressed gap between the capabilities of asynchronous and synchronous code. That gap is somewhat related to RAII guards, but it's more about making effects bound to a lexical scope work the way that users expect them to work in a synchronous context. Far from trying to introduce differences between synchronous and asynchronous code, this proposal is trying to reduce them.

I think the Lifecycle trait will have exactly the same problem as the proposed poll_drop mechanism. Nothing guarantees that those methods will ever get called. Maybe the compiler could automatically insert the calls for state machines generated from async/await. However, for all manual state-machine/Future implementations, the authors of that implementation would need to be aware of the new methods and support them. If someone implements select! and forgets about on_yield, on_resume, or poll_drop, they simply wouldn't be called even though the Future had yielded/resumed/been destroyed.

This doesn't run into the same issues that poll_drop ran into because I don't expect this Lifecycle trait to be implemented on manual futures implementations in the first place. I call this point out here:

The proposed methods will function like Drop::drop, such that an explicit implementation will be called regardless of where the type is nested within a larger structure—explicit plumbing, like what poll_drop_ready requires, is not needed.

So a library and application could never rely on these methods being called for correctness.

I'll make a small addendum—correctness with regard to memory safety. However, as for modeling effects (such as closing a file descriptor on drop; releasing a mutex; etc.), Rust code already relies on Drop. This proposal isn't an aberration from the norm; it is a continuation.

I think this section misses the "introduce and use task-local storage" approach, which might also work. This will obviously also have its downsides (task-local storage is not standardized, will not work with every executor, and what do you do in real synchronous code?). But it seems as good an option as the others.

It's something we've considered, but we've found that per-task spans are insufficiently fine-grained, as tracing users often instrument on a per-future basis. This problem is most apparent in a select!: how do you know which future is executing?
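
For example (fetch_a and fetch_b are hypothetical helpers; the combinator is from tracing_futures):

use tracing::info_span;
use tracing_futures::Instrument;

async fn fetch_a() -> u32 { 1 }
async fn fetch_b() -> u32 { 2 }

async fn run() {
    // Both branches are polled within a *single* task, so task-local
    // storage sees one opaque unit of work; per-future spans are needed
    // to tell the branches apart.
    tokio::select! {
        a = fetch_a().instrument(info_span!("fetch_a")) => { let _ = a; }
        b = fetch_b().instrument(info_span!("fetch_b")) => { let _ = b; }
    }
}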

The most reliable way to get things to work seems to be manual state/span forwarding, which doesn't require any new feature.

We do some manual forwarding now with an .instrument combinator, but I believe I've illustrated in prior sections why this is an insufficient solution.


To add to this, I see this proposal as a generalization of RAII in the context of coroutines. Whereas a function (technically, a scope) can be entered or exited, we now have entered, exited, and suspended. (These are the states of our state machine; the transitions are what @endsofthreads listed above.)

Like RAII, this mechanism is fully "internal" to the function. Any user of the async fn (or gen fn for that matter) doesn't have to care about the fact that it contains Lifecycle objects. Instead, the calls to these Lifecycle methods get inserted into the code of the function, before it suspends itself and after it resumes.

(Correct me if I'm wrong, @endsofthreads?)

In general I think that making async "less special" is a good goal, but I'd push back on the idea of not being concerned with yielding and resuming – that's a fundamental concept of async/await, and it's encoded in the language through the use of .await. This is an explicit acknowledgement of the fact that an async fn is not an ordinary function, but one that suspends its execution. We wouldn't hide that distinction if we could, and I think that point has already been litigated in the RFC process for async/await.

I also think this distinction is enough to justify a design like the one being proposed. There is some complexity cost from any new language feature, but I would argue that this doesn't add new conceptual complexity, just acknowledges that which is already there.

It's possible we could use the same lifecycle methods to record a yield as an await – after all, they both have the same effect of suspending/resuming the current execution context. Perhaps there's a case to be made for separate lifecycle methods (so you can distinguish between yield and await – e.g. in an async gen fn), but I don't know what it would be.


Nothing to correct! You nailed this explanation better than I did!

That's a good point. I suspect that having a lifecycle method/callback for yields is unnecessary for the same reason that implementing the Lifecycle method on manual future/stream implementations is unnecessary: the effect of a Lifecycle trait can be modeled by a closure that wraps the yield. I haven't thought too much about this, however.
