Nice write-up!
One question: by setup/teardown, do you also mean process setup/teardown? It is not 100% clear to me whether that is what you mean.
One thought: a lot of people seem to only want to customize the output format. So maybe we can have one API for plugging in different test or bench runners, all of which produce JSON, and a separate API for plugging in a formatter. The JSON formatter would do nothing, and a human-readable text formatter would transform the JSON into the output we have today. That way, those who want, let's say, XML output for some CI system, or maybe even a different type of JSON for an IDE, only have to plug in a relatively small component, and that component would work with every test runner that users might want to use.
This might be overengineering, but it at least makes sense to me that "how to run tests" and "how to display test results" are two orthogonal problems.
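To make the split a bit more concrete, here is a rough sketch of what I have in mind. Every name in it (`TestEvent`, `TestRunner`, `Formatter`, `FnRunner`, `JsonLines`, `Pretty`, `run_and_format`, `demo_runner`) is made up for illustration and is not an existing libtest or Cargo API. To keep it dependency-free, I model the shared interchange as a typed `TestEvent` rather than real JSON, so unlike in the idea above, the JSON formatter here does a little work instead of being a no-op. The output strings are also just placeholders, not the exact format we print today.

```rust
// Hypothetical sketch: none of these types exist in libtest or Cargo.

/// Outcome of a single test, as reported by any runner.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Passed,
    Failed(String),
}

/// One record in the shared interchange format.
/// (Assumption: this stands in for the JSON that runners would emit.)
#[derive(Debug, Clone, PartialEq)]
pub struct TestEvent {
    pub name: String,
    pub outcome: Outcome,
}

/// "How to run tests": a runner only produces events.
pub trait TestRunner {
    fn run(&self) -> Vec<TestEvent>;
}

/// "How to display test results": a formatter only consumes events.
pub trait Formatter {
    fn format(&self, events: &[TestEvent]) -> String;
}

pub type TestFn = fn() -> Result<(), String>;

/// A trivial runner that calls plain functions in order.
pub struct FnRunner {
    pub tests: Vec<(&'static str, TestFn)>,
}

impl TestRunner for FnRunner {
    fn run(&self) -> Vec<TestEvent> {
        self.tests
            .iter()
            .map(|(name, test)| TestEvent {
                name: name.to_string(),
                outcome: match test() {
                    Ok(()) => Outcome::Passed,
                    Err(msg) => Outcome::Failed(msg),
                },
            })
            .collect()
    }
}

/// Machine-readable output, one JSON object per line.
/// Note: no string escaping is done; a real formatter would need it.
pub struct JsonLines;

impl Formatter for JsonLines {
    fn format(&self, events: &[TestEvent]) -> String {
        events
            .iter()
            .map(|e| match &e.outcome {
                Outcome::Passed => format!("{{\"name\":\"{}\",\"event\":\"ok\"}}\n", e.name),
                Outcome::Failed(msg) => format!(
                    "{{\"name\":\"{}\",\"event\":\"failed\",\"message\":\"{}\"}}\n",
                    e.name, msg
                ),
            })
            .collect()
    }
}

/// Human-readable output.
pub struct Pretty;

impl Formatter for Pretty {
    fn format(&self, events: &[TestEvent]) -> String {
        let mut out = String::new();
        for e in events {
            match &e.outcome {
                Outcome::Passed => out.push_str(&format!("test {} ... ok\n", e.name)),
                Outcome::Failed(msg) => {
                    out.push_str(&format!("test {} ... FAILED: {}\n", e.name, msg))
                }
            }
        }
        out
    }
}

/// Any runner can be combined with any formatter.
pub fn run_and_format(runner: &dyn TestRunner, formatter: &dyn Formatter) -> String {
    formatter.format(&runner.run())
}

fn adds() -> Result<(), String> {
    if 2 + 2 == 4 { Ok(()) } else { Err("math is broken".to_string()) }
}

fn overflows() -> Result<(), String> {
    Err("expected 4, got 5".to_string())
}

/// A sample runner with one passing and one failing test.
pub fn demo_runner() -> FnRunner {
    FnRunner {
        tests: vec![("adds", adds as TestFn), ("overflows", overflows as TestFn)],
    }
}
```

With this shape, `run_and_format(&demo_runner(), &Pretty)` and `run_and_format(&demo_runner(), &JsonLines)` share the same runner and differ only in the small formatter component. Someone who needs XML for a CI system would only implement `Formatter`, and it would work with whatever runner the user picked.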