I agree that a global executor/spawn API probably needs more time to bake in the ecosystem before it is added to std. In particular, I feel that adding just task parallelism doesn't greatly increase the set of libraries that can be written without depending on a specific async runtime. There is a significant set of other global resources in common use (TCP connecting/binding, timeouts, file IO). We need more experiments like the `runtime` crate into how a global runtime can be abstracted over, either as a whole, as `runtime` did it, or as a series of pick-and-choose components.
But even without the global executor, I think it would be worth exploring `async fn main` and `#[test] async fn foo` as soon as `std::thread::block_on_task` is available. Many libraries don't use (or need) any of the globals mentioned above: they take in `impl {Future, Stream, AsyncRead, ...}`, return `impl {Future, Stream, AsyncRead, ...}`, and only use task-internal concurrency.
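To make the "only task-internal concurrency" shape concrete, here is a minimal sketch of a library-style combinator that takes two `impl Future`s and returns an `impl Future` that polls both inside a single task. `join` and `Join` are hypothetical names chosen for this illustration (futures 0.3 ships a production version as `futures::future::join`). Nothing in it refers to an executor, timer, or I/O driver, so any runtime, or a bare `block_on`, can drive it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Hypothetical runtime-agnostic combinator: takes two futures and returns a
/// future that polls both within the same task, resolving to both outputs.
/// No executor, timer, or I/O driver is referenced anywhere.
pub fn join<A: Future, B: Future>(a: A, b: B) -> impl Future<Output = (A::Output, B::Output)> {
    Join { a: Box::pin(a), b: Box::pin(b), a_out: None, b_out: None }
}

struct Join<A: Future, B: Future> {
    a: Pin<Box<A>>,
    b: Pin<Box<B>>,
    a_out: Option<A::Output>,
    b_out: Option<B::Output>,
}

// The inner futures are boxed and the stored outputs are never pinned, so
// moving `Join` is fine; this lets `poll` use plain `&mut` field access.
impl<A: Future, B: Future> Unpin for Join<A, B> {}

impl<A: Future, B: Future> Future for Join<A, B> {
    type Output = (A::Output, B::Output);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // Poll each side that has not finished yet; both share the caller's waker.
        if this.a_out.is_none() {
            if let Poll::Ready(v) = this.a.as_mut().poll(cx) {
                this.a_out = Some(v);
            }
        }
        if this.b_out.is_none() {
            if let Poll::Ready(v) = this.b.as_mut().poll(cx) {
                this.b_out = Some(v);
            }
        }
        // Only resolve once both sides have produced a value.
        if this.a_out.is_some() && this.b_out.is_some() {
            Poll::Ready((this.a_out.take().unwrap(), this.b_out.take().unwrap()))
        } else {
            Poll::Pending
        }
    }
}
```

Because the combinator only forwards the caller's `Context`, whichever executor polls the outer future is the one that wakes it; the library never needs to know which one that is.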
The primary benefit I see of `async fn main` is in doc examples; futures 0.3 examples are full of:
```rust
# futures::executor::block_on(async {
// ...actual example
# })
```
If rustdoc were to detect a top-level `.await` and implicitly switch to using `async fn main`, this wrapper could be dropped. On the other hand, in real code many frameworks wouldn't want to use `async fn main`, e.g. GUI frameworks that need to own the main thread.
Runtime-agnostic libraries shouldn't need to pull in Tokio just to run their tests.
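As a rough illustration of that point, a runtime-agnostic crate can drive its own futures in tests with a very small executor built only from std (`std::task::Wake`, `thread::park`, `Thread::unpark`). The `block_on` below is a hypothetical helper written for this sketch; it is not the proposed `std::thread::block_on_task` and makes no attempt at fairness or efficiency. `double` is a hypothetical stand-in for a library function.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical helper, not a std API: runs `fut` to completion on the
/// current thread, parking between polls. No runtime crate is involved.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until some waker unparks this thread. A spurious wakeup
            // only causes an extra poll, which is harmless.
            Poll::Pending => thread::park(),
        }
    }
}

#[cfg(test)]
mod block_on_tests {
    use super::block_on;

    // Stand-in for a runtime-agnostic library function (hypothetical).
    async fn double(x: u32) -> u32 {
        x * 2
    }

    #[test]
    fn runs_without_tokio() {
        assert_eq!(block_on(double(21)), 42);
    }
}
```

In practice `futures::executor::block_on` fills the same role as a small dependency; the point of the sketch is only that nothing runtime-specific is required to poll such futures in a test.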