One of the goals of the async crash dump debugging initiative is to make it possible to obtain "logical stack traces". The "Grace debugs a crash dump again" shiny future story states that we would like to have such a logical stack trace for each task owned by the runtime -- but it does not define in detail what exactly that means. I'd like us to remedy that here :)
In the following I'll describe my assumptions and how I imagine this working. This is meant as a starting point for a broader discussion of the topic.
Assumptions
A task corresponds to a tree of futures with a single root future.
The future trees of tasks can overlap, i.e. a future somewhere in the tree can be a task too.
Some of the futures we are dealing with are handwritten, some are generated by the compiler from an async function or block.
We don't care about "free-floating" futures, i.e. futures that have been created but that are not awaited anywhere.
Logical stack traces don't track spawning of tasks: If task A spawns task B then
the logical stack trace of A will not contain B's tree of futures.
But if A then awaits B, B's tree would show up as a subtree of A.
Given a pointer to a future object f, we can assume that we can always get a pointer to each child future of f with a GEP-like operation. That is, child futures are reachable via fields of f, where such a field may be compiler-generated in the case of async fns/blocks.
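To illustrate this assumption, here is a sketch of a hand-written combinator whose children are ordinary fields (all names here are made up; `Leaf` stands in for any leaf future such as a timer):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hypothetical leaf future (stands in for a timer, a read, etc.).
struct Leaf(u32);

impl Future for Leaf {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(self.0)
    }
}

// Hypothetical hand-written combinator: its children are ordinary fields,
// so a debugger that knows the layout of `Join2` can compute pointers to
// `a` and `b` from a pointer to the parent future (a GEP-like operation).
struct Join2<A, B> {
    a: A,
    b: B,
}

impl<A: Future + Unpin, B: Future + Unpin> Future for Join2<A, B> {
    type Output = (A::Output, B::Output);
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Real polling logic elided; only the field layout matters here.
        Poll::Pending
    }
}
```

For compiler-generated futures the picture is the same, except that the fields live in the variants of the generated state enum rather than in a user-visible struct.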
We would like this to work out of the box for async functions and blocks, with no additional runtime overhead, as long as debuginfo is present.
Example
The following gives an example of a simple program and the expected traces we'd get if the task were suspended at various points.
async fn bar() { /* ... */ }
async fn baz() { /* ... */ }
async fn sleep() {
// sleep a while
}
async fn foo() {
let bar = bar();
let baz = baz();
sleep().await;
// suspension point #1
// The futures `bar` and `baz` have been created
// but are not being polled.
let joined = futures::future::join(bar, baz);
sleep().await;
// suspension point #2
// `joined` is created but still not being polled
joined.await;
// suspension point #3
println!("done")
}
Trace at suspension point #1:
task 0x1234 ~ blocked on timer 0x789
my_program::sleep() suspended at src/foo.rs:24 [blocked on timer 0x789]
|
my_program::foo() suspended at src/foo.rs:36
Trace at suspension point #2:
task 0x1234 ~ blocked on timer 0xabc
my_program::sleep() suspended at src/foo.rs:24 [blocked on timer 0xabc]
|
my_program::foo() suspended at src/foo.rs:40
Trace at suspension point #3:
task 0x1234
my_program::bar() suspended at src/foo.rs:19 [blocked on something]
\
\ my_program::baz() completed at src/foo.rs:21
\ /
futures::Join
|
my_program::foo() suspended at src/foo.rs:43
Some thoughts on how to implement this
The debugger has to perform a number of distinct tasks if it wants to generate a logical stack trace:
Given a future object, extract (file, line, col) of its suspension point. For async functions and blocks, this information is determined by the generator state value and could be encoded in debuginfo. For manual implementations of Future we don't have a generator state and there is no good place in debuginfo that could hold the information.
Given a future object, extract references to its child futures.
Given a future object, correlate it to a resource it is waiting for (e.g. a timer) or extract some other useful description.
Given a task object, extract the root future object, which involves
getting the address of the future object, and
getting the debuginfo of the object's concrete type.
Task objects are implemented by executor runtimes, so this is something that needs to be implemented per runtime.
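Put together, the debugger side is essentially a tree walk over the information extracted in the steps above. The sketch below mocks that information as plain data (everything here is hypothetical; a real implementation would read it from debuginfo and process memory):

```rust
// Mock of the per-future information the debugger would extract.
struct FutureNode {
    name: &'static str,               // e.g. "my_program::sleep"
    suspended_at: &'static str,       // (file, line) of the suspension point
    blocked_on: Option<&'static str>, // resource description for leaf futures
    children: Vec<FutureNode>,        // child futures reachable from fields
}

// Walk the tree rooted at the task's root future and render one line
// per future, innermost (leaf) first, like the example traces above.
fn render_trace(node: &FutureNode, depth: usize, out: &mut Vec<String>) {
    for child in &node.children {
        render_trace(child, depth + 1, out);
    }
    let blocked = node
        .blocked_on
        .map(|r| format!(" [blocked on {r}]"))
        .unwrap_or_default();
    out.push(format!(
        "{}{}() suspended at {}{}",
        "  ".repeat(depth),
        node.name,
        node.suspended_at,
        blocked
    ));
}
```

For the trace at suspension point #1, the tree would consist of a `foo` node with a single `sleep` child that is blocked on the timer.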
It would be nice if we could have a single framework for encoding the needed information into debuginfo and runtime state, that can be used by both the compiler (for async fns and blocks) and end-users (for annotating manual Future implementations). However, the two cases look quite distinct. If it turns out a single framework would mean that there is unnecessary runtime overhead or that it is cumbersome to do manual implementations then it would also be OK if the two cases are supported by distinct mechanisms.
select!() and join!() will need extra support. Overall, it is preferable to use types implementing Future instead of macros, because the macros generate local variables, which are much harder to deal with in a debugger, especially for optimized builds.
Next steps
It would be great to hear what you all think about this, in particular:
Are the assumptions listed above correct?
Does the example trace look like what you imagined for the feature?
Does anyone else have thoughts for how to implement this?
Does the description above miss something important?
Also, please provide further examples that you'd like to be supported. I'll soon start prototyping things in the rust-lang/rust-dbg-ext repository on GitHub and intend to have regression tests for all relevant scenarios. I'd also like to discuss implementation strategies - but maybe in a separate thread.
At one of the Rust all hands a couple of years ago we sketched out a potential approach to this that was based on adding an extra field to Context that could hold a sort of manually created stack trace in a linked list. "Terminal" futures that hold onto the Waker, etc., would call some function to "commit" the trace to the runtime or whatever was tracking this.
Very pseudo-codey:
impl Future for MyFuture {
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let mut new_cx = cx.with_trace("MyFuture"); // with_trace would be #[track_caller] and record file/line/etc.
self.project().inner.poll(&mut new_cx)
}
}
impl Future for Sleep {
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
cx.waiting("Sleep");
// ...
}
}
The nice thing here is that you should be able to build up the "stack trace" without needing to allocate by storing the nodes in the stack frames of the futures. Tracking this would add some overhead, but much less than actually unwinding the stack at every suspend point.
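The no-allocation property can be illustrated with ordinary borrowed data: each poll frame owns one node and links to its caller's node by reference, so the whole chain lives on the native stack. This is only a sketch of the idea; `TraceNode` and `commit` are made-up names, not an existing API:

```rust
// Hypothetical trace node; one would live in each `poll` stack frame.
struct TraceNode<'a> {
    label: &'static str,                // e.g. the future's type name
    parent: Option<&'a TraceNode<'a>>,  // link to the enclosing future's node
}

// What a leaf future would do on something like `cx.waiting(..)`:
// walk the chain and "commit" it, here by collecting labels
// innermost-first. No heap allocation is needed for the chain itself.
fn commit(node: &TraceNode<'_>) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut cur = Some(node);
    while let Some(n) = cur {
        out.push(n.label);
        cur = n.parent;
    }
    out
}
```

In the real design, `with_trace` would build such a node from `#[track_caller]` information and thread it through the `Context` passed to the child.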
@sfackler, given your approach, how would you reconstruct the trace if you just have the future object, but it's not currently being polled? E.g. if you had a crash dump of a process where no task is currently being executed because they are all waiting indefinitely for some reason.
This looks like the thing I've really been wanting for awhile. Exciting!
You could handle this somewhat reasonably without special casing by changing the assumption that we don't care about "free-floating" futures, and displaying any future types embedded in a generator. The generator debuginfo will tell you which fields are valid at each suspend point so the debugger doesn't try to inspect old futures that have been overwritten with some other state.
Again, I think we can get pretty far by traversing the type debuginfo for any fields whose type implements Future. I'm assuming we have some way of flagging this in the debuginfo.
For leaf futures which are blocked on some resource the general solution is not as clear to me.
I think "task provenance" would be a great feature to have but it's also relatively easy to implement with #[track_caller] or actually capturing a backtrace at each spawn point. Both would require some kind of runtime integration. Ideally the tooling would be able to show this info alongside the logical stack trace.
We might be able to get away with no special annotations at all for manually implemented Futures. That would be a big plus.
Cons:
We'd get false positives, that is, the stack trace would show a dependence where there actually is none. But maybe there is some information somewhere that would allow us to filter those false positives out or display them differently. E.g. something in the state of the child future. If the child future is a task, it would probably store a waker for the parent task if the parent is actually awaiting it.
I'm not sure if debuginfo encodes the fact that a type implements Future in any way at the moment. But I think we have to implement that one way or other regardless. We'll need to experiment with what we can do without breaking existing debugger support.
There's also futures::RemoteHandle which simply has a oneshot channel to get the task's result back. It would be good if whatever system of linking tasks together could support non-integrated handles like this somehow.
We already attach some debuginfo about the suspend point to the generator variant types (just file/line for now). For cases where the parent future is a generator, maybe we can store some extra debuginfo keys about the object actually being awaited at that state. It would have to be a field in the generator, I think.
Looking at the child futures, for generators it's trivial to tell if they've been polled yet or not.
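Concretely, this falls out of the state enum: a child future's field only exists in the variants for suspension points where it is live. A hypothetical, hand-rolled analogue of what the compiler conceptually generates for `foo()` above (with `u32` standing in for the real future types to keep the sketch self-contained):

```rust
// Hypothetical hand-rolled analogue of the compiler-generated state enum
// for `foo()`. The discriminant *is* the suspension point, and each
// variant only carries the futures that are live there - so the
// discriminant alone tells you which children have been polled.
enum FooState {
    Unresumed,                       // never polled
    Suspend1 { bar: u32, baz: u32 }, // point #1: bar/baz created, not polled
    Suspend3 { joined: u32 },        // point #3: `joined` is being awaited
    Returned,
}

fn describe(s: &FooState) -> &'static str {
    match s {
        FooState::Unresumed => "not started",
        FooState::Suspend1 { .. } => "suspended at #1 (children not yet polled)",
        FooState::Suspend3 { .. } => "awaiting `joined`",
        FooState::Returned => "completed",
    }
}
```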
For hand-written generators we probably need some kind of custom annotation scheme.
I don't think it does, but since type debuginfo is a set of key/value pairs, maybe we have leeway to define our own custom keys for things like this.
I noticed that generators seem to always capture the pinned variable that is generated by .await. If we give that a more special name (in debuginfo we can even give it a name that cannot clash with user-defined names, like awaitee$) then it would be easy for debuggers to find it. The question is whether we can rely on that variable to always be present.
By looking at the __state field, you mean?
I'm hoping that we can use DWARF's DW_TAG_inheritance for attaching a list of all implemented traits to a type. That tag should be well supported by LLVM and should also map reliably to CodeView.