Maybe this could live in a separate, more advanced optimal_threads crate, while this crate just provides its functions as they currently are?
This other crate could take a use case in its methods (like threads_for_parallel or threads_for_communicating) and match that with the platform to give an answer.
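To make the idea concrete, here is a rough sketch of what such use-case methods might look like. Everything in it is hypothetical: no optimal_threads crate is assumed to exist, the Platform struct is invented for illustration, and the heuristics (logical CPUs for independent work, physical cores for threads that talk to each other a lot) are placeholder guesses rather than measured recommendations.

```rust
/// Hypothetical description of the machine. A real crate would fill this in
/// from platform-specific queries.
struct Platform {
    /// Logical CPUs (hardware threads) visible to the process.
    logical: usize,
    /// Physical cores, or None when the platform cannot report them.
    physical: Option<usize>,
}

/// Hypothetical: thread count for independent, CPU-bound work.
/// Assumed heuristic: one thread per logical CPU, never fewer than one.
fn threads_for_parallel(platform: &Platform) -> usize {
    platform.logical.max(1)
}

/// Hypothetical: thread count for threads that exchange data frequently.
/// Assumed heuristic: one thread per physical core when that is known,
/// otherwise fall back to the logical count.
fn threads_for_communicating(platform: &Platform) -> usize {
    platform.physical.unwrap_or(platform.logical).max(1)
}
```

Taking the platform description as a parameter keeps the heuristics separate from detection, so they can be tuned and tested without access to a particular machine.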
The alternative is to document num_cpus as a very rough guide, and add other methods over time to provide actual information for those who want to work out optimal parallelism themselves.
I think that in either case, num_physical_cpus should return an Option/Result. Consumers of the lib who just want a number to plug into a threadpool should use num_cpus, or use num_physical_cpus().unwrap_or(num_cpus()) to get back the current behavior (this could be in the docs for num_physical_cpus).
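A minimal sketch of the fallback described above, assuming the Option-returning signature. Both functions here are stand-ins, not the crate's actual API: num_cpus is approximated with std::thread::available_parallelism, and num_physical_cpus is a stub that always reports "unknown" so the fallback path is visible.

```rust
use std::thread;

/// Stand-in for the crate's logical CPU count, approximated here with the
/// standard library. Falls back to 1 if the count cannot be determined.
fn num_cpus() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

/// Stand-in for the proposed signature: Some(n) when the platform can
/// report physical cores, None otherwise. This stub always returns None.
fn num_physical_cpus() -> Option<usize> {
    None
}

/// What a consumer who "just wants a number" would write to keep the
/// current behavior: use physical cores if known, else the logical count.
fn threadpool_size() -> usize {
    num_physical_cpus().unwrap_or(num_cpus())
}
```

With this shape, callers who care about the difference can match on the Option themselves, while everyone else gets a usable number from a one-liner.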