Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:
Guidelines checklist
Legend
[y] = guideline is adhered to, no work needed.
[n] = guideline may need work, see comments nearby
This document is collaboratively editable. Pick a few of the guidelines, compare the num_cpus crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate.
For more details, see
- [n] Crate name is a palindrome (C-PALINDROME)
- my_crate backwards is etarc_ym which is not the same as my_crate
Come up with ideas for nice introductory examples of using num_cpus, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don't necessarily have to write the example code yourself but PRs are always welcome.
API guideline updates
What lessons can we learn from num_cpus that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!
Discussion topics
Anything that's not a concrete crate issue yet. We want to eventually promote any topics here into actionable crate issues below.
@davidfutcher To kickoff the discussion. I do not like that the get_physical automatically falls back to getting logical cores if physical cant be queried. This makes such API not dependable.
I’would suggest that we have explicit fn get_logical -> usize and fn get_physical() -> Result<usize> and a convenience function that would make a best effort query with fallback.
@budziq, agreed, there is an issue open for this here. Sean mentions it would be a major version bump; I’m planning to reach out in the coming days to see if a v2 release makes sense at the end of this evaluation.
One small question: do the functions really need to return usize? u32 seems totally fine since the number of cpus isn’t exactly tied to the bitness of the machine… And even u32::max_value is a lot of cpus
In other news we need some eyes on the checklist here. This is a really small crate so it shouldn’t take too long to work through. I think docs will present the most opportunities for improvements.
I was planning to get some time to start on the checklist last weekend but life intervened. Hopefully will get started tonight.
I’ve chatted with seanmonstar and there’re no major outstanding issues he identified, other than improved multi-platform testing for which there is an active GitHub issue.
I’ve worked through the checklist for this; the crate is so simple that a lot of the guidelines aren’t applicable. However there are issues with the documentation. If anyone could get a chance to take check over the checklist, that would be great.
I’ll start raising GitHub issues for checklist related items over the next few days.
I disagree wrt the u32 choice. I favor usize when the numbers being returned are limited by the possibilities of your hardware. u32 seems like an arbitrarily chosen sized definition on the number of cores - a size that is too large on small embedded systems, and a size that is too small (not really, but theoretically) on supercomputers.
That's not what the types mean though. They're specifically the size of addressable memory; that's one kind of "possibility of your hardware" but not all of them. Specifically, it has no relationship to the CPU count.
Pros: "Pointer-sized integer", exactly what they are.
When originally proposed, mem/m are interpreted as "memory numbers"
There is a problem with isize: it most likely will remind people of C/C++ ssize_t. But ssize_t is in the POSIX standard, not the C/C++ ones, and is not for index offsets according to POSIX. The correct type for index offsets in C99 is ptrdiff_t, so for a type representing offsets, idiff may be a better name.
Rename int/uint to iptr/uptr, with idiff/usize being aliases and used in container method signatures.
All of these things and all of the discussion is about memory size + pointer/word size, not about anything else
Hm good point. There really does seem like no good hard upper bound, but u32 seems relatively future proof. I'm still mildly pro-usize because I have a hard time believing that there will ever be more cores than bytes of memory, meaning usize would be future proof in perpetuity while simultaneously being cheaper on 16 bit systems.
Modern x86 computers use 32-bit X2APIC IDs to identify processors.
On ARM, the Multiprocessor Affinity Register is 32-bit wide, so the hypothetical limit there is also 2^32, although for most ARM boards the processor ID is only a few bits wide.
All in all, 32-bit should be fine, at least with current hardware.
There seems to be no definition of what “number of CPUs” actually means, and it’s not a simple thing at all.
In particular, the issues are, at least:
Hyperthreading
CPU hotplug
CPU removal/powerdown, possibly depending on AC/battery
Asymmetric systems like ARM big.LITTLE, possibly switching on AC/battery.
Virtual machine environments, with VM migration, adding/removing VCPUs with no physical limit, VCPU mapping to host CPUs
Complex topologies like NUMA, extra cache levels, shared execution units, etc.
I think the library needs to, at least:
Define “number of CPUs” as “number of threads that delivers optimal throughput on embarassingly parallel I/O-less workloads” (i.e. such that adding one more thread doesn’t provide a benefit on any such workload)
Provide three functions: one to get the current value, and one to get the maximum value after “reasonable” CPU hotplug (for VMs, this would be the host maximum value if available even though more VCPUs could be added, or some canonical value depending on the hypervisor), and one to get the overall maximum value even if it’s impractical to create that many threads.
Provide a way to react to changes to the current value
Maybe provide a richer API that returns the exact topology and capabilities of the CPUs, both current, the union of all possible topologies, and with the ability to listen for changes
Maybe this could be in a separate more advanced optimal_threads crate, while this crate just provides functions as they currently are?
This other crate could take a use case in its methods (like threads_for_parallel or threads_for_communicating) and match that with the platform to give an answer.
The alternative is to document num_cpus as a very rough guide, and add other methods over time to provide actual information for those who want to work out optimal parallelism themselves.
I think in either case, num_physical_cpus should return an Option/Result - if consumers of the lib just want a number to plug into a threadpool, they should use num_cpus, or use num_physical_cpus().unwrap_or(num_cpus()) to get back the current behavior (this could be in the docs for num_physical_cpus).