Lifecycle Notifications Internals Post
Over the last few months, Eliza Weisman and I have realized that idioms that involve RAII guards to mark scopes do not always translate neatly from synchronous to asynchronous code. The proposal outlined here came from Niko’s “Async Interview” with Eliza Weisman; I (David Barsky) took the idea and ran with it. While I’ve felt the issues most acutely when using tracing, the issue outlined in this post applies to a wide range of “this value can(not) be held across await points” problems. For example, the lifecycle notifications we propose could be used to implement:
- Synchronization primitives like asynchronous locks that shouldn’t be held across awaits, as they introduce deadlock risks.
- Tokio’s automatic cooperative yielding.
- Task locals that could be set/unset on yield or resumption.
This proposal will reference tracing as the primary example and motivation because I can speak about tracing’s use case with far greater confidence than I can about asynchronous locks or Tokio’s internals, not because I think language changes should be informed by tracing’s needs. With this post, I hope to solicit feedback on use cases that I missed and gauge interest from additional people.
Background
In the tracing library, spans are used to track a unit of work in code. Spans are defined axiomatically: a span is whatever the user considers to be a unit of work in their application. tracing is similar to distributed tracing systems like Dapper, Zipkin, or OpenTelemetry in that it relies on user-driven annotations to generate useful information, but tracing is highly optimized around in-process instrumentation, where all spans live in the same address space. For the purposes of this post, spans have a few relevant lifecycle stages:
- create
- enter
- exit
- close
While create and close can only be called once for a given span (when the span is created and when it is completed, respectively), enter and exit can be called multiple times in the lifecycle of a span. In the context of a span decorating a future, a span would be:
- entered whenever the executor polls the future.
- exited whenever the future returns Poll::Pending.
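To make these stages concrete, here is a minimal, std-only sketch of a future wrapper that records each stage in a log. It does not use tracing at all: Traced, YieldOnce, NoopWaker, and lifecycle_events are names invented for this illustration, and the log is a stand-in for a subscriber. One simplification to note: the wrapper records an exit after every poll, including the final one that returns Poll::Ready, so that enters and exits stay balanced.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Shared event log; a stand-in for a tracing subscriber (hypothetical).
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical wrapper that reports span-like lifecycle stages.
struct Traced<F> {
    inner: Pin<Box<F>>,
    log: Log,
}

impl<F> Traced<F> {
    fn new(inner: F, log: Log) -> Self {
        log.borrow_mut().push("create"); // happens exactly once
        Traced { inner: Box::pin(inner), log }
    }
}

impl<F: Future> Future for Traced<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        self.log.borrow_mut().push("enter"); // once per poll
        let result = self.inner.as_mut().poll(cx);
        self.log.borrow_mut().push("exit"); // once per poll
        result
    }
}

impl<F> Drop for Traced<F> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("close"); // happens exactly once
    }
}

/// Future that returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough for a hand-rolled polling loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a traced future to completion and returns the recorded stages.
fn lifecycle_events() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Traced::new(async { YieldOnce(false).await }, log.clone());
    while Pin::new(&mut fut).poll(&mut cx).is_pending() {}
    drop(fut);

    let events = log.borrow().clone();
    events
}
```

Calling lifecycle_events() on a future that suspends once yields one create, two enter/exit pairs, and one close, which is the shape described in the list above.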
The general guidance that the tracing library provides to end-users is to create a span for each logical unit of work. In a network service, there might be several spans per request:
- One top-level span for the request.
- A span to parse and validate the request contents.
- A span for a downstream service call and response.
- A span to stream and close the response.
In this example, each span corresponds to an .awaitable future.
Because a single task might be composed of multiple futures which perform distinct units of work, the task-local storage APIs offered by many executors today are insufficiently fine-grained for this use case. In the above example, an executor like tokio or async-std would only be able to distinguish these spans if they were independently spawned. An approach which relies on the executor itself to provide instrumentation has similar issues: the executor is only aware of each task as a single, opaque Future trait object. Finally, both of these solutions are executor-specific: there is no way to abstract over these executor-provided task-local storage APIs using only the interfaces available in core.
What is Tracing?
Today, there are a few ways to instrument code with spans in tracing. The first is through a RAII guard object, which will exit the span when dropped. For example:
use tracing::{span, Level};
let span = span!(Level::INFO, "my_span");
let guard = span.enter();
// code here is within the span
drop(guard);
// code here is no longer within the span
The second is through Span::in_scope:
use tracing::{span, trace, Level};

let my_span = span!(Level::TRACE, "my_span");
my_span.in_scope(|| {
    // this event occurs within the span.
    trace!("i'm in the span!");
});
// this event occurs outside the span.
trace!("i'm not in the span!");
The third is through the #[instrument] attribute macro. The example below creates a new span with the name write.
use tracing::{info, instrument};
use tokio::{io::AsyncWriteExt, net::TcpStream};
use std::io;
#[instrument]
async fn write(stream: &mut TcpStream) -> io::Result<usize> {
    let result = stream.write(b"hello world\n").await;
    info!("wrote to stream; success={:?}", result.is_ok());
    result
}
The fourth option is through an explicit instrument combinator:
use tracing_futures::Instrument;

let my_future = async {
    // ...
};
my_future
    .instrument(tracing::info_span!("my_future"))
    .await
The Problem with RAII Guards and async/await
The easiest and simplest option, the RAII guard, unfortunately has the largest misuse-created blast radius due to the interaction of two properties:
- RAII guards assume that as long as they are not dropped, they are active.
- With async/await and stack pinning, many RAII guards can be alive at the same time.
Normally, this isn’t an issue. But with tracing, or any debugging tool that makes use of scope guards, the debugging tool will be confused as to which span is currently active. Consider the following code:
async {
    let _s = span.enter();
    // ...
}
The span guard _s will not exit until the future generated by the async block is complete. Since futures can be suspended and resumed many times before they complete, the span remains entered for as long as the future exists, rather than being entered only while the future is being polled. This leads to very confusing and incorrect output. Worse still, the incorrect output might not be noticed unless multiple futures are executing concurrently, possibly in production. Issues pertaining to this misuse are some of the top sources of inquiry and support on tracing’s Discord channel.
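The failure mode can be reproduced without tracing. The sketch below is a std-only model under stated assumptions: the “current span” is just a shared stack of names, and Guard, enter, YieldOnce, NoopWaker, and observed_by_unrelated_future are hypothetical names made up for this example. A first future enters a span and suspends while holding its guard; a second, unrelated future polled afterwards still sees that span as active.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stack of currently entered span names (a stand-in for subscriber state).
type Stack = Rc<RefCell<Vec<&'static str>>>;

/// RAII guard: the span counts as entered until the guard is dropped.
struct Guard {
    stack: Stack,
}

fn enter(stack: &Stack, name: &'static str) -> Guard {
    stack.borrow_mut().push(name);
    Guard { stack: stack.clone() }
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.stack.borrow_mut().pop();
    }
}

/// Future that returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns the spans that an unrelated future observes as "entered".
fn observed_by_unrelated_future() -> Vec<&'static str> {
    let stack: Stack = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let s1 = stack.clone();
    let mut first = Box::pin(async move {
        let _guard = enter(&s1, "first");
        // The guard stays alive while this future is suspended here.
        YieldOnce(false).await;
    });

    let s2 = stack.clone();
    let mut second = Box::pin(async move {
        let seen = s2.borrow().clone();
        seen
    });

    // `first` suspends while still holding its guard.
    assert!(first.as_mut().poll(&mut cx).is_pending());

    // `second` never entered any span, yet it sees "first" as active.
    match second.as_mut().poll(&mut cx) {
        Poll::Ready(seen) => seen,
        Poll::Pending => unreachable!("`second` never awaits"),
    }
}
```

In this model the second future reports "first" as its active span even though it never entered one, which is the kind of misattribution described above.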
For some historical background, tracing ended up prioritizing RAII guards for a few reasons:
- People don’t like to unnecessarily indent code. This is particularly important when adding tracing to existing code; a change that adds a span to a function shouldn’t result in a git diff that “adds” all of the previously-existing code in the function.
- People like the ergonomics of RAII guards, as they avoid having to move or borrow values into closures.
- Any closure-based API would probably be internally implemented using a private RAII guard regardless, to ensure that the span is unset during a panic.
- Requiring all instrumented code to be in closures passed to tracing functions adds a potentially large number of tracing-related function calls in stack-frame-based diagnostics, such as backtraces or perf traces. Using RAII guards avoids cluttering up the diagnostics provided by other tools.
A Naive, and Probably Incorrect, Solution
tracing, and libraries similar to it, would greatly benefit from some sort of optional callback for values whose stack frame is being suspended. Below is the flawed and naive approach:
use std::pin::Pin;
use std::task::Context;

trait Lifecycle {
    // Called when the stack frame holding this value is suspended.
    fn on_yield(self: Pin<&mut Self>, ctx: &mut Context<'_>) {}

    // Called when the stack frame holding this value is resumed.
    fn on_resume(self: Pin<&mut Self>, ctx: &mut Context<'_>) {}
}
A few notes:
- The proposed methods will function like Drop::drop, such that an explicit implementation will be called regardless of the type’s membership; explicit plumbing, like for poll_drop_ready, is not required.
- Once poll_drop_ready() in a hypothetical AsyncDrop is called, neither on_yield nor on_resume should be called.
tracing would implement these lifecycle hooks as:
impl Lifecycle for tracing::span::Entered {
    fn on_yield(self: Pin<&mut Self>, _: &mut task::Context<'_>) {
        if let Some(inner) = self.span.inner.as_ref() {
            inner.subscriber.exit(&inner.id);
        }
    }

    fn on_resume(self: Pin<&mut Self>, _: &mut task::Context<'_>) {
        if let Some(inner) = self.span.inner.as_ref() {
            inner.subscriber.enter(&inner.id);
        }
    }
}
This would allow tracing’s guards to automatically enter and exit spans as their corresponding futures are suspended and resumed.
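Since the compiler does not call these hooks today, the intended sequence of calls can only be emulated in library code. The sketch below is one such emulation, not the proposed language feature: a wrapper future owns a guard and fires the hooks by hand around each poll. WithGuard, LoggingGuard, YieldOnce, NoopWaker, and hook_events are hypothetical names, the guard logs to a vector instead of calling a subscriber, and the guard is assumed to start out un-entered so that on_resume also runs on the first poll.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = Rc<RefCell<Vec<&'static str>>>;

/// The trait from the proposal, reproduced with the same signatures.
trait Lifecycle {
    fn on_yield(self: Pin<&mut Self>, _cx: &mut Context<'_>) {}
    fn on_resume(self: Pin<&mut Self>, _cx: &mut Context<'_>) {}
}

/// Hypothetical guard that logs instead of notifying a subscriber.
struct LoggingGuard {
    log: Log,
}

impl Lifecycle for LoggingGuard {
    fn on_yield(self: Pin<&mut Self>, _cx: &mut Context<'_>) {
        self.log.borrow_mut().push("exit");
    }

    fn on_resume(self: Pin<&mut Self>, _cx: &mut Context<'_>) {
        self.log.borrow_mut().push("enter");
    }
}

impl Drop for LoggingGuard {
    fn drop(&mut self) {
        // Like an RAII span guard, dropping exits the span one last time.
        self.log.borrow_mut().push("exit");
    }
}

/// Library-level emulation: calls the hooks manually around each poll.
struct WithGuard<G, F> {
    guard: G,
    fut: Pin<Box<F>>,
}

impl<G: Lifecycle + Unpin, F: Future> Future for WithGuard<G, F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = &mut *self; // allowed because `WithGuard` is `Unpin` here
        Pin::new(&mut this.guard).on_resume(cx);
        let result = this.fut.as_mut().poll(cx);
        if result.is_pending() {
            Pin::new(&mut this.guard).on_yield(cx);
        }
        result
    }
}

/// Future that returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drives a guarded future to completion and returns the hook calls seen.
fn hook_events() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = WithGuard {
        guard: LoggingGuard { log: log.clone() },
        fut: Box::pin(async { YieldOnce(false).await }),
    };
    while Pin::new(&mut fut).poll(&mut cx).is_pending() {}
    drop(fut);

    let events = log.borrow().clone();
    events
}
```

The difference from the proposal is that here the guard has to live outside the future and be paired with it explicitly, which is essentially what an instrument-style combinator already does. The appeal of the language-level hooks is that a guard created inside an async block would get the same calls with no wrapper.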
Known Issues
The proposed Lifecycle, as is, has several issues, which were helpfully pointed out by Nika. For instance:
- If we take Pin<&mut Self> or Pin<&Self>, common patterns in Rust will stop working. You either can't have a mutable guard after a yield point, or the guard can't be mutable, which dramatically limits the usefulness of this extension.
- If we take *const Self as a receiver, we can write trivially unsound code like:
let mut guard = span.enter();
let r = &mut guard;
whatever().await;
*r = other_span.enter();
@tmandry and @mystor have proposed alternatives, but those alternatives should be discussed in a different thread.
Alternatives
I can think of a few alternatives:
- The proposed YieldSafe autotrait could be used to mark the span::Entered RAII guard !YieldSafe. This would solve the problem of misusing this API, but it would still give us a situation where we have two different APIs that look totally different, one for sync and one for async. This is not ideal for ergonomics and teachability.
- A generalization of the await_holding_lock lint in Clippy (https://github.com/rust-lang/rust-clippy/pull/5439) that applies to tracing’s spans as well.
- In tracing, de-emphasize the Span::enter-based mechanism and encourage users to make use of Span::scope, which works with asynchronous blocks and, in the future, closures. A Span::scope that works with asynchronous blocks is already planned for tracing 0.2.
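For readers unfamiliar with the Clippy lint mentioned above, the sketch below shows the pattern it targets, using a std::sync::Mutex rather than a span. It is only an illustration of the existing lint, not of the proposed generalization: bump_then_wait, YieldOnce, NoopWaker, and run_bump_example are hypothetical names, and the flagged variant is left in a comment.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn bump_then_wait(counter: &Mutex<u32>) -> u32 {
    // The shape that clippy::await_holding_lock is meant to flag:
    //
    //     let mut n = counter.lock().unwrap();
    //     *n += 1;
    //     YieldOnce(false).await; // guard is still held across the await
    //
    // Scoping the guard so that it is dropped before the await avoids it.
    let value = {
        let mut n = counter.lock().unwrap();
        *n += 1;
        *n
    };
    YieldOnce(false).await;
    value
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future by hand until it completes.
fn run_bump_example() -> u32 {
    let counter = Mutex::new(0);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(bump_then_wait(&counter));
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

A generalized lint would treat a span guard the way the existing one treats a lock guard: the scoped form passes, and the form in the comment gets a warning.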
Requested Feedback
- Does this language extension seem useful? Are there applications beyond what I listed?
- What are some alternative places to place this trait? How does this interact with generators, if at all?