and if you don’t write with_scheduler, then you get the default (rayon-core).
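Roughly, as a sketch (`with_scheduler` here is hypothetical; only the plain call below is rayon’s existing API):

```rust
use rayon::prelude::*;

fn sum_of_squares(input: &[i32]) -> i32 {
    // No with_scheduler wrapper anywhere, so this runs on the default
    // rayon-core thread pool.
    input.par_iter().map(|&x| x * x).sum()
}

// Hypothetical opt-in to a different scheduler; name and signature are
// placeholders for the proposed API, not something rayon provides today:
//
// with_scheduler(&my_scheduler, || sum_of_squares(&data));
```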
Sounds good to me 
Cool, I hadn’t seen that! I’ll take a look and let you know what I think.
Just keep in mind that this paper is more a specification than a rationale discussion, and also that the problem it solves isn’t necessarily what rayon aims to solve.
Bit of history: the design evolved from the Networking WG, which started standardizing Christopher Kohlhoff’s Boost.ASIO (~2012-2013). At the same time, Google wanted to standardize thread-pool-like things and provide a dynamic API to abstract over them:
It turned out that network people hated virtual calls (big surprise). Anyhow, at the same time the Parallel STL technical specification was being approved (the Parallel STL is C++’s rayon). It turned out that NVIDIA wanted the Parallel STL to work on their 10,000-core GPUs, and Intel wanted to be able to implement it using Cilk+ and also to use SIMD, so here things start getting out of hand: now executors (schedulers) need to solve all of these problems too, so they become a feature of their own:
And everybody joins the party, complaining that executors don’t solve their problems:
- N4406: Parallel Algorithms Need Executors (NVIDIA, 2015, very rayon-relevant: why parallel algorithms should be parametrized over a scheduler; superseded by P0058R1, read that instead).
- P0076R2: Vector and Wavefront Policies (Intel, 2016, kind of relevant).
- P0058R1: An Interface for Abstracting Execution (NVIDIA, 2016, rayon-relevant, in particular the task_block part).
- P0072R1: Light-Weight Execution Agents (NVIDIA, 2016, rayon-relevant).
- N4411: Task Block (formerly Task Region) R4 (Intel/Microsoft, 2015, rayon-relevant, fork-join parallelism; see the sketch after this list).
- … and many others: Intel with more Cilk+ stuff, AMD with the Heterogeneous Systems Architecture stuff, IBM with OpenMP, and the SIMD proposals…
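For context, the task_block/fork-join style those last two papers describe is essentially what rayon already exposes; a minimal sketch using rayon’s existing `join` (the recursive sum is just an illustration):

```rust
fn parallel_sum(slice: &[i64]) -> i64 {
    if slice.len() < 1024 {
        return slice.iter().sum();
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    // Fork two child tasks and join them before returning, the same
    // structured fork-join shape that task_block spells out for C++.
    let (a, b) = rayon::join(|| parallel_sum(left), || parallel_sum(right));
    a + b
}
```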
At the same time, ~2015-2016, Microsoft wrote a full specification for C++ coroutines, implemented them in MSVC and Clang (the LLVM coroutines), and wanted to get this into C++17, so now executors needed to solve coroutine problems as well, particularly in the context of networking:
And the specification we currently have is the result of executors going full circle: from networking and thread pools, to data parallelism in its 1000 flavors, and finally, through coroutines, back to networking. I don’t think this design makes everybody happy (in particular, Google), but it doesn’t make anybody especially unhappy, which is what ISO standardization is all about. Rust can definitely do better.
EDIT: I’ve added the Intel paper on fork-join parallelism, since that is only tangentially addressed by the executor proposals for algorithms.