Out-of-band crate evaluation for 2017-10-13: num_cpus

Out-of-band crate evaluation for 2017-10-13: num_cpus

For additional contribution opportunities, see the main libz blitz thread.

This post is a wiki. Feel free to edit it.

Links

Needs your help!

Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:

Guidelines checklist

Legend

  • [y] = guideline is adhered to, no work needed.
  • [n] = guideline may need work, see comments nearby
  • [/] = guideline not applicable to this crate

Checklist

Guidelines checklist

This document is collaboratively editable. Pick a few of the guidelines, compare the num_cpus crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate.

For more details, see

  - [n] Crate name is a palindrome (C-PALINDROME)
   - my_crate backwards is etarc_ym which is not the same as my_crate

Cookbook recipes

Cookbook example ideas

Come up with ideas for nice introductory examples of using num_cpus, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don't necessarily have to write the example code yourself but PRs are always welcome.

API guideline updates

What lessons can we learn from num_cpus that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!

Discussion topics

Anything that's not a concrete crate issue yet. We want to eventually promote any topics here into actionable crate issues below.

Crate issues

Issues to file against the num_cpus crate.

How are we tracking?

Pre-review checklist

  • Create evaluation thread based on this template
  • Work with author and issue tracker to identify known blockers
  • Compare crate to guidelines by filling in checklist
  • Record other questions and notes about crate
  • Draft several use case statements to serve as cookbook examples
  • Record recommendations for updated guidelines

Post-review checklist

  • Create new issues and tracking issue on crate's issue tracker
  • Solicit use cases for cookbook examples related to the crate
  • File issues to implement cookbook examples
  • File issues to update guidelines
  • Post all approachable issues to TWiR call for participation thread
  • Update links in coordination thread
4 Likes

@davidfutcher To kickoff the discussion. I do not like that the get_physical automatically falls back to getting logical cores if physical cant be queried. This makes such API not dependable.

I’would suggest that we have explicit fn get_logical -> usize and fn get_physical() -> Result<usize> and a convenience function that would make a best effort query with fallback.

1 Like

@budziq, agreed, there is an issue open for this here. Sean mentions it would be a major version bump; I’m planning to reach out in the coming days to see if a v2 release makes sense at the end of this evaluation.

Darn. My bad for only cursory browse of the bugtracker.

One small question: do the functions really need to return usize? u32 seems totally fine since the number of cpus isn’t exactly tied to the bitness of the machine… And even u32::max_value is a lot of cpus :slight_smile:

What benefit does u32 have over usize here? Rust uses usize to count (almost) everything.

There isn’t really one I think. The decision either way is kind of arbitrary.

1 Like

usize is used whenever you need to deal with memory. The fixed sized types should be used for everything else.

I would expect most of these to return u64.

3 Likes

Yeah I think this is a better guideline.

Given that I own a 4-core, 32-bit machine that runs rust code, having to emulate a u64 in software just to count to four strikes me as deeply silly.

If we’re not using usize I’d suggest u32.

u32 seems like a fine choice to me.

In other news we need some eyes on the checklist here. This is a really small crate so it shouldn’t take too long to work through. I think docs will present the most opportunities for improvements.

I was planning to get some time to start on the checklist last weekend but life intervened. Hopefully will get started tonight.

I’ve chatted with seanmonstar and there’re no major outstanding issues he identified, other than improved multi-platform testing for which there is an active GitHub issue.

1 Like

Yeah sorry, I think u32 makes sense as well; when unsure, I tend to go too big at first, but that does seem far too big.

(Let’s see if this comment holds up in 100 years :wink: )

4 Likes

I’ve worked through the checklist for this; the crate is so simple that a lot of the guidelines aren’t applicable. However there are issues with the documentation. If anyone could get a chance to take check over the checklist, that would be great.

I’ll start raising GitHub issues for checklist related items over the next few days.

1 Like

I disagree wrt the u32 choice. I favor usize when the numbers being returned are limited by the possibilities of your hardware. u32 seems like an arbitrarily chosen sized definition on the number of cores - a size that is too large on small embedded systems, and a size that is too small (not really, but theoretically) on supercomputers.

That's not what the types mean though. They're specifically the size of addressable memory; that's one kind of "possibility of your hardware" but not all of them. Specifically, it has no relationship to the CPU count.

For some history, rfcs/text/0544-rename-int-uint.md at master · rust-lang/rfcs · GitHub is the RFC.

Pros: "Pointer-sized integer", exactly what they are.

When originally proposed, mem/m are interpreted as "memory numbers"

There is a problem with isize: it most likely will remind people of C/C++ ssize_t. But ssize_t is in the POSIX standard, not the C/C++ ones, and is not for index offsets according to POSIX. The correct type for index offsets in C99 is ptrdiff_t, so for a type representing offsets, idiff may be a better name.

Rename int/uint to iptr/uptr, with idiff/usize being aliases and used in container method signatures.

All of these things and all of the discussion is about memory size + pointer/word size, not about anything else

Hm good point. There really does seem like no good hard upper bound, but u32 seems relatively future proof. I'm still mildly pro-usize because I have a hard time believing that there will ever be more cores than bytes of memory, meaning usize would be future proof in perpetuity while simultaneously being cheaper on 16 bit systems.

But really, u32 seems generally fine.

From a hardware perspective:

  • Modern x86 computers use 32-bit X2APIC IDs to identify processors.

  • On ARM, the Multiprocessor Affinity Register is 32-bit wide, so the hypothetical limit there is also 2^32, although for most ARM boards the processor ID is only a few bits wide.

All in all, 32-bit should be fine, at least with current hardware.

1 Like

There seems to be no definition of what “number of CPUs” actually means, and it’s not a simple thing at all.

In particular, the issues are, at least:

  1. Hyperthreading

  2. CPU hotplug

  3. CPU removal/powerdown, possibly depending on AC/battery

  4. Asymmetric systems like ARM big.LITTLE, possibly switching on AC/battery.

  5. Virtual machine environments, with VM migration, adding/removing VCPUs with no physical limit, VCPU mapping to host CPUs

  6. Complex topologies like NUMA, extra cache levels, shared execution units, etc.

I think the library needs to, at least:

  1. Define “number of CPUs” as “number of threads that delivers optimal throughput on embarassingly parallel I/O-less workloads” (i.e. such that adding one more thread doesn’t provide a benefit on any such workload)

  2. Provide three functions: one to get the current value, and one to get the maximum value after “reasonable” CPU hotplug (for VMs, this would be the host maximum value if available even though more VCPUs could be added, or some canonical value depending on the hypervisor), and one to get the overall maximum value even if it’s impractical to create that many threads.

  3. Provide a way to react to changes to the current value

  4. Maybe provide a richer API that returns the exact topology and capabilities of the CPUs, both current, the union of all possible topologies, and with the ability to listen for changes

Maybe this could be in a separate more advanced optimal_threads crate, while this crate just provides functions as they currently are?

This other crate could take a use case in its methods (like threads_for_parallel or threads_for_communicating) and match that with the platform to give an answer.


The alternative is to document num_cpus as a very rough guide, and add other methods over time to provide actual information for those who want to work out optimal parallelism themselves.


I think in either case, num_physical_cpus should return an Option/Result - if consumers of the lib just want a number to plug into a threadpool, they should use num_cpus, or use num_physical_cpus().unwrap_or(num_cpus()) to get back the current behavior (this could be in the docs for num_physical_cpus).

2 Likes