Out-of-band crate evaluation for 2017-10-13: num_cpus

davidfutcher · September 29, 2017, 6:47pm

Out-of-band crate evaluation for 2017-10-13: num_cpus

For additional contribution opportunities, see the main libz blitz thread.

This post is a wiki. Feel free to edit it.

Links

Needs your help!

Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:

Guidelines checklist

Legend

[y] = guideline is adhered to, no work needed.
[n] = guideline may need work, see comments nearby
[/] = guideline not applicable to this crate

Checklist

Guidelines checklist

This document is collaboratively editable. Pick a few of the guidelines, compare the num_cpus crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate.

For more details, see

  - [n] Crate name is a palindrome (C-PALINDROME)
   - my_crate backwards is etarc_ym which is not the same as my_crate

Cookbook recipes

Cookbook example ideas

Come up with ideas for nice introductory examples of using num_cpus, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don't necessarily have to write the example code yourself but PRs are always welcome.

API guideline updates

What lessons can we learn from num_cpus that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!

Discussion topics

Anything that's not a concrete crate issue yet. We want to eventually promote any topics here into actionable crate issues below.

Crate issues

Issues to file against the num_cpus crate.

How are we tracking?

Pre-review checklist

Create evaluation thread based on this template
Work with author and issue tracker to identify known blockers
Compare crate to guidelines by filling in checklist
Record other questions and notes about crate
Draft several use case statements to serve as cookbook examples
Record recommendations for updated guidelines

Post-review checklist

Create new issues and tracking issue on crate's issue tracker
Solicit use cases for cookbook examples related to the crate
File issues to implement cookbook examples
File issues to update guidelines
Post all approachable issues to TWiR call for participation thread
Update links in coordination thread

budziq · September 29, 2017, 11:08pm

@davidfutcher To kickoff the discussion. I do not like that the get_physical automatically falls back to getting logical cores if physical cant be queried. This makes such API not dependable.

I’would suggest that we have explicit fn get_logical -> usize and fn get_physical() -> Result<usize> and a convenience function that would make a best effort query with fallback.

davidfutcher · September 30, 2017, 12:23am

@budziq, agreed, there is an issue open for this here. Sean mentions it would be a major version bump; I’m planning to reach out in the coming days to see if a v2 release makes sense at the end of this evaluation.

budziq · September 30, 2017, 12:49am

Darn. My bad for only cursory browse of the bugtracker.

KodrAus · October 1, 2017, 7:04am

One small question: do the functions really need to return usize? u32 seems totally fine since the number of cpus isn’t exactly tied to the bitness of the machine… And even u32::max_value is a lot of cpus

SDX2000 · October 1, 2017, 11:43am

What benefit does u32 have over usize here? Rust uses usize to count (almost) everything.

KodrAus · October 1, 2017, 12:17pm

There isn’t really one I think. The decision either way is kind of arbitrary.

steveklabnik · October 1, 2017, 12:25pm

usize is used whenever you need to deal with memory. The fixed sized types should be used for everything else.

I would expect most of these to return u64.

SDX2000 · October 1, 2017, 12:55pm

Yeah I think this is a better guideline.

Lokathor · October 5, 2017, 1:39am

Given that I own a 4-core, 32-bit machine that runs rust code, having to emulate a u64 in software just to count to four strikes me as deeply silly.

If we’re not using usize I’d suggest u32.

KodrAus · October 5, 2017, 7:13am

u32 seems like a fine choice to me.

In other news we need some eyes on the checklist here. This is a really small crate so it shouldn’t take too long to work through. I think docs will present the most opportunities for improvements.

davidfutcher · October 5, 2017, 8:23am

I was planning to get some time to start on the checklist last weekend but life intervened. Hopefully will get started tonight.

I’ve chatted with seanmonstar and there’re no major outstanding issues he identified, other than improved multi-platform testing for which there is an active GitHub issue.

steveklabnik · October 5, 2017, 12:07pm

Yeah sorry, I think u32 makes sense as well; when unsure, I tend to go too big at first, but that does seem far too big.

(Let’s see if this comment holds up in 100 years )

davidfutcher · October 6, 2017, 6:37pm

I’ve worked through the checklist for this; the crate is so simple that a lot of the guidelines aren’t applicable. However there are issues with the documentation. If anyone could get a chance to take check over the checklist, that would be great.

I’ll start raising GitHub issues for checklist related items over the next few days.

twmb · October 6, 2017, 7:54pm

I disagree wrt the u32 choice. I favor usize when the numbers being returned are limited by the possibilities of your hardware. u32 seems like an arbitrarily chosen sized definition on the number of cores - a size that is too large on small embedded systems, and a size that is too small (not really, but theoretically) on supercomputers.

steveklabnik · October 6, 2017, 9:24pm

That's not what the types mean though. They're specifically the size of addressable memory; that's one kind of "possibility of your hardware" but not all of them. Specifically, it has no relationship to the CPU count.

For some history, rfcs/text/0544-rename-int-uint.md at master · rust-lang/rfcs · GitHub is the RFC.

Pros: "Pointer-sized integer", exactly what they are.

When originally proposed, mem/m are interpreted as "memory numbers"

There is a problem with isize: it most likely will remind people of C/C++ ssize_t. But ssize_t is in the POSIX standard, not the C/C++ ones, and is not for index offsets according to POSIX. The correct type for index offsets in C99 is ptrdiff_t, so for a type representing offsets, idiff may be a better name.

Rename int/uint to iptr/uptr, with idiff/usize being aliases and used in container method signatures.

All of these things and all of the discussion is about memory size + pointer/word size, not about anything else

twmb · October 6, 2017, 9:48pm

Hm good point. There really does seem like no good hard upper bound, but u32 seems relatively future proof. I'm still mildly pro-usize because I have a hard time believing that there will ever be more cores than bytes of memory, meaning usize would be future proof in perpetuity while simultaneously being cheaper on 16 bit systems.

But really, u32 seems generally fine.

gabrielm · October 18, 2017, 10:12am

From a hardware perspective:

Modern x86 computers use 32-bit X2APIC IDs to identify processors.
On ARM, the Multiprocessor Affinity Register is 32-bit wide, so the hypothetical limit there is also 2^32, although for most ARM boards the processor ID is only a few bits wide.

All in all, 32-bit should be fine, at least with current hardware.

jmst · October 18, 2017, 4:20pm

There seems to be no definition of what “number of CPUs” actually means, and it’s not a simple thing at all.

In particular, the issues are, at least:

Hyperthreading
CPU hotplug
CPU removal/powerdown, possibly depending on AC/battery
Asymmetric systems like ARM big.LITTLE, possibly switching on AC/battery.
Virtual machine environments, with VM migration, adding/removing VCPUs with no physical limit, VCPU mapping to host CPUs
Complex topologies like NUMA, extra cache levels, shared execution units, etc.

I think the library needs to, at least:

Define “number of CPUs” as “number of threads that delivers optimal throughput on embarassingly parallel I/O-less workloads” (i.e. such that adding one more thread doesn’t provide a benefit on any such workload)
Provide three functions: one to get the current value, and one to get the maximum value after “reasonable” CPU hotplug (for VMs, this would be the host maximum value if available even though more VCPUs could be added, or some canonical value depending on the hypervisor), and one to get the overall maximum value even if it’s impractical to create that many threads.
Provide a way to react to changes to the current value
Maybe provide a richer API that returns the exact topology and capabilities of the CPUs, both current, the union of all possible topologies, and with the ability to listen for changes

derekdreery · October 23, 2017, 1:48pm

Maybe this could be in a separate more advanced optimal_threads crate, while this crate just provides functions as they currently are?

This other crate could take a use case in its methods (like threads_for_parallel or threads_for_communicating) and match that with the platform to give an answer.

The alternative is to document num_cpus as a very rough guide, and add other methods over time to provide actual information for those who want to work out optimal parallelism themselves.

I think in either case, num_physical_cpus should return an Option/Result - if consumers of the lib just want a number to plug into a threadpool, they should use num_cpus, or use num_physical_cpus().unwrap_or(num_cpus()) to get back the current behavior (this could be in the docs for num_physical_cpus).

Topic		Replies	Views
[Pre-RFC] usize is not size_t language design	142	14175	November 9, 2021
Minimal acceptable integer size - i32min, u8min, etc language design	29	3165	August 16, 2020
Should u64 implement From<usize>? libs	20	3199	December 27, 2019
Please can we add a basic num trait? libs	18	8278	April 13, 2022
Policies around default CPU architecture compiler	20	5880	January 14, 2019

Out-of-band crate evaluation for 2017-10-13: num_cpus

Out-of-band crate evaluation for 2017-10-13: num_cpus

Links

Needs your help!

Guidelines checklist

Cookbook recipes

API guideline updates

Discussion topics

Crate issues

How are we tracking?

Pre-review checklist

Post-review checklist

Related topics