Pre-RFC: Thread Affinity


#1

I am building an asynchronous networking library in Rust for an upcoming component of our product, and I need to be able to lock N threads to N CPU cores so I can schedule fibers onto them. Ideally this would be part of std. I haven't made up my mind whether it should be a separate struct, as proposed below, or a trait implemented for std::thread::Thread.

  • Feature Name: thread_affinity
  • Start Date: 2016-01-26
  • RFC PR:
  • Rust Issue:

Summary

Thread affinity provides functionality to lock/unlock a specific thread to a specified CPU core. Once locked, a thread is never migrated to another core by the OS scheduler. I believe it is useful for a systems programmer to be able to set and unset thread affinity for Rust threads.

Motivation

My motivation derives from a personal use case: creating a fiber-based server/client application framework for Rust in my spare time. I need the ability to set the thread affinity of Rust threads to CPU cores in an OS-independent manner, to minimize cache misses and thread context switches.

The proposed feature provides a performance optimization for servers. It would also be useful for MIO and similar frameworks, where the main event loop is locked to one core and passes connections to workers that are locked to the remaining cores, taking advantage of cache locality.

Thread context switching is quite cheap, but it causes a butterfly effect where one thread switch causes the other N-1 threads to switch, so it ends up being expensive when there are 8 or more cores.

Detailed design

I strongly encourage others to improve this RFC: I am not a Rust expert, and it only serves to start a dialog on a proposed API.

I split it into two parts: User API and Implementation.

Locking to an arbitrary free CPU core

let t = thread::spawn(move || {
    let cpu_lock = CpuLock::lock().unwrap(); // the current thread is now locked to an arbitrary free CPU core
    let cpu_id = cpu_lock.cpu_id();
    // thread logic goes here
    // the cpu_lock will unlock from the core once out of scope
});

Here's the equivalent with some syntactic sugar that hides the lock if the user doesn't need to access it:

let t = thread::lock(move || {
    // thread logic goes here
    // the hidden lock releases the CPU core when the closure returns
});
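The sugared form does not strictly need new std machinery. A minimal sketch, assuming a hypothetical helper `spawn_locked` (not an existing std function) that takes one closure acquiring some guard, for example the proposed CpuLock, and a second closure with the thread logic that runs while that guard is alive:

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the proposed `thread::lock`: spawns a thread,
/// acquires a guard *inside* it (affinity is a per-thread property), runs `f`,
/// and drops the guard after `f` has returned.
pub fn spawn_locked<G, A, F, T>(acquire: A, f: F) -> JoinHandle<T>
where
    A: FnOnce() -> G + Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        // The guard lives for the whole closure body and is dropped
        // only after `f()` has produced its result.
        let _guard = acquire();
        f()
    })
}
```

Because the guard is created and dropped on the spawned thread, its type `G` does not need to be `Send`, which matters if the lock type is deliberately made `!Send`.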

The user may want finer-grained control over which core the thread is locked to.

Locking to a specific CPU core

// locks the thread to CPU core 2
let t = thread::spawn(move || {
    let cpu_no = 2;
    let cpu_lock = CpuLock::lock_on(cpu_no).unwrap(); // the current thread is now locked to CPU core 2
    // thread logic goes here
    // the cpu_lock will unlock from the core once out of scope
});

// the same, using the sugared form that hides the lock
let cpu_no = 2;
let t = thread::lock_on(cpu_no, move || {
    // thread logic goes here
    // the hidden lock releases CPU core 2 when the closure returns
});

Unlocking prematurely

// locks the thread to an arbitrary free CPU core, then unlocks it early
let t = thread::spawn(move || {
    let cpu_lock = CpuLock::lock().unwrap(); // the current thread is now locked to an arbitrary free CPU core
    // thread logic goes here
    cpu_lock.unlock(); // consumes the lock; the thread may now migrate between cores
    // other thread logic, like cleanup, etc.
});

Implementation

The implementation is quite simple as it just needs to store the underlying thread’s id and invoke system calls to lock/unlock.

  • Windows -> SetThreadAffinityMask

  • *nix -> pthread_setaffinity_np(3) (a nonstandard GNU extension, hence the _np "non-portable" suffix)
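As an illustration of the *nix call above, here is a minimal sketch that pins the *calling* thread to one CPU with pthread_setaffinity_np(3) and reads the mask back with pthread_getaffinity_np(3). It assumes Linux with glibc (or another libc providing these GNU extensions), declares the functions by hand instead of using the libc crate, and mirrors glibc's fixed 1024-bit cpu_set_t. The helper names `current_affinity` and `pin_current_thread` are illustrative only.

```rust
use std::io;
use std::mem;
use std::os::raw::{c_int, c_ulong};

// Mirrors glibc's cpu_set_t: a fixed 1024-bit mask stored as unsigned longs.
const CPU_SETSIZE: usize = 1024;
const BITS_PER_WORD: usize = 8 * mem::size_of::<c_ulong>();

#[repr(C)]
struct CpuSet {
    bits: [c_ulong; CPU_SETSIZE / BITS_PER_WORD],
}

impl CpuSet {
    fn empty() -> CpuSet {
        CpuSet { bits: [0; CPU_SETSIZE / BITS_PER_WORD] }
    }
    fn set(&mut self, cpu: usize) {
        self.bits[cpu / BITS_PER_WORD] |= (1 as c_ulong) << (cpu % BITS_PER_WORD);
    }
    fn is_set(&self, cpu: usize) -> bool {
        (self.bits[cpu / BITS_PER_WORD] & ((1 as c_ulong) << (cpu % BITS_PER_WORD))) != 0
    }
}

// Assumption: Linux, where pthread_t is an unsigned long.
unsafe extern "C" {
    fn pthread_self() -> c_ulong;
    fn pthread_setaffinity_np(thread: c_ulong, size: usize, set: *const CpuSet) -> c_int;
    fn pthread_getaffinity_np(thread: c_ulong, size: usize, set: *mut CpuSet) -> c_int;
}

/// Returns the CPU numbers the calling thread is currently allowed to run on.
fn current_affinity() -> io::Result<Vec<usize>> {
    let mut set = CpuSet::empty();
    // These functions return an error number directly instead of setting errno.
    let rc = unsafe { pthread_getaffinity_np(pthread_self(), mem::size_of::<CpuSet>(), &mut set) };
    if rc != 0 {
        return Err(io::Error::from_raw_os_error(rc));
    }
    Ok((0..CPU_SETSIZE).filter(|&cpu| set.is_set(cpu)).collect())
}

/// Restricts the calling thread to run only on `cpu`.
fn pin_current_thread(cpu: usize) -> io::Result<()> {
    if cpu >= CPU_SETSIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "CPU number out of range"));
    }
    let mut set = CpuSet::empty();
    set.set(cpu);
    let rc = unsafe { pthread_setaffinity_np(pthread_self(), mem::size_of::<CpuSet>(), &set) };
    if rc != 0 {
        return Err(io::Error::from_raw_os_error(rc));
    }
    Ok(())
}
```

Pinning to the first CPU returned by `current_affinity` (rather than a hard-coded 0) avoids failures in containers whose cpuset excludes CPU 0.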

The implementation also needs to know the number of available CPUs and to manage state tracking which CPUs are free for arbitrary locking.
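One way to track free CPUs is a small shared table with one flag per CPU. A minimal sketch, assuming a hypothetical `CpuPool` type (not part of any existing API) that the proposed `CpuLock::lock()` and `CpuLock::lock_on()` would consult and that the lock's Drop would release into:

```rust
use std::sync::Mutex;

/// Hypothetical bookkeeping for "arbitrary free CPU" locking:
/// `taken[n]` is true while some thread holds CPU n.
pub struct CpuPool {
    taken: Mutex<Vec<bool>>,
}

impl CpuPool {
    pub fn new(num_cpus: usize) -> CpuPool {
        CpuPool { taken: Mutex::new(vec![false; num_cpus]) }
    }

    /// Claims the lowest-numbered free CPU, or returns None if all are taken.
    pub fn acquire_any(&self) -> Option<usize> {
        let mut taken = self.taken.lock().unwrap();
        let cpu = taken.iter().position(|&t| !t)?;
        taken[cpu] = true;
        Some(cpu)
    }

    /// Claims a specific CPU; fails if it is out of range or already taken.
    pub fn acquire(&self, cpu: usize) -> Result<(), &'static str> {
        let mut taken = self.taken.lock().unwrap();
        let slot = taken.get_mut(cpu).ok_or("CPU number out of range")?;
        if *slot {
            return Err("CPU already taken");
        }
        *slot = true;
        Ok(())
    }

    /// Marks a CPU as free again; out-of-range numbers are ignored.
    pub fn release(&self, cpu: usize) {
        let mut taken = self.taken.lock().unwrap();
        if let Some(slot) = taken.get_mut(cpu) {
            *slot = false;
        }
    }
}
```

The pool size could come from `std::thread::available_parallelism()`, but that reports how many threads can run in parallel, not the OS's CPU numbering, which need not be contiguous.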

Computing the affinity mask from the CPU number is trivial:

let cpu_no: u32 = 2;
let affinity_mask: usize = 1 << cpu_no; // bit N set means "may run on CPU N"; only valid while cpu_no < usize::BITS
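The plain shift overflows (a panic in debug builds) once cpu_no reaches the bit width of the mask, which is exactly the out-of-bounds corner case. A minimal sketch of a checked variant, assuming a word-sized mask like the DWORD_PTR that SetThreadAffinityMask takes; the function names are illustrative:

```rust
/// Builds a single-CPU affinity mask, or None if `cpu_no` does not fit in a usize.
fn affinity_mask(cpu_no: u32) -> Option<usize> {
    // checked_shl returns None when the shift amount is >= the bit width.
    1usize.checked_shl(cpu_no)
}

/// ORs several CPU numbers into one mask; None if any of them is out of range.
fn affinity_mask_for(cpus: &[u32]) -> Option<usize> {
    cpus.iter()
        .try_fold(0usize, |mask, &cpu| Some(mask | affinity_mask(cpu)?))
}
```

For example, `affinity_mask(2)` gives `Some(4)`, while `affinity_mask(usize::BITS)` gives `None` instead of overflowing.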

Corner-cases

  1. Passing a CPU number that is out of bounds.

  2. Moving the lock out of the thread.

  3. Moving the lock from one thread to another thread.
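Rust's type system can rule out corner cases 2 and 3 at compile time, and a constructor check covers case 1. A minimal sketch, assuming a hypothetical `CpuLock` whose OS call is replaced by a thread-local stand-in (`LOCKED_CPU`) and whose CPU count is passed in explicitly for simplicity; a `PhantomData<*const ()>` field makes the type neither `Send` nor `Sync`:

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // Stand-in for the OS affinity call: which CPU this thread is "locked" to.
    static LOCKED_CPU: Cell<Option<usize>> = Cell::new(None);
}

#[derive(Debug, PartialEq)]
pub enum LockError {
    OutOfBounds { cpu: usize, num_cpus: usize },
}

/// Hypothetical lock guard. The raw-pointer marker makes it !Send, so the
/// compiler rejects moving it out of, or into, another thread.
pub struct CpuLock {
    cpu: usize,
    _not_send: PhantomData<*const ()>,
}

impl CpuLock {
    pub fn lock_on(cpu: usize, num_cpus: usize) -> Result<CpuLock, LockError> {
        if cpu >= num_cpus {
            return Err(LockError::OutOfBounds { cpu, num_cpus });
        }
        // A real implementation would call pthread_setaffinity_np /
        // SetThreadAffinityMask here instead of updating a thread-local.
        LOCKED_CPU.with(|c| c.set(Some(cpu)));
        Ok(CpuLock { cpu, _not_send: PhantomData })
    }

    pub fn cpu_id(&self) -> usize {
        self.cpu
    }

    /// Early unlock: consuming `self` runs Drop now and prevents further use.
    pub fn unlock(self) {
        drop(self);
    }
}

impl Drop for CpuLock {
    fn drop(&mut self) {
        // A real implementation would restore the previous affinity mask here.
        LOCKED_CPU.with(|c| c.set(None));
    }
}

/// Reports the CPU the current thread is "locked" to, if any.
pub fn current_locked_cpu() -> Option<usize> {
    LOCKED_CPU.with(|c| c.get())
}
```

With this design, `thread::spawn(move || drop(lock))` fails to compile because `*const ()` cannot be sent between threads safely, and returning the lock from a spawned thread fails for the same reason.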

Drawbacks

I believe the drawbacks of the current RFC are:

  1. This RFC does not provide any empirical evidence of the performance benefits. I can spend time providing benchmarks from a C/C++ implementation.
  2. Thread affinity can have the opposite effect, degrading performance if it is used in the wrong way and interferes with the OS scheduler.
  3. My lack of experience with Rust could be preventing an optimal design.
  4. It could be regarded as adding complexity to the standard library.

Alternatives

The alternative is calling unsafe, OS-specific system calls to create threads and change their affinity masks, because std::thread::Thread does not expose the underlying OS-specific thread id (as far as I am aware).

I'm writing a Rust asynchronous networking library in my spare time so that I can use Rust for an upcoming project. I want to set thread affinity to lock N threads to N CPU cores and schedule fibers onto them, minimizing cache misses and thread rescheduling. I think it may be a good idea to include it in the std library.

Unresolved questions

No research has been conducted into the following hypotheses:

  • Is it possible to provide consistent and symmetric semantics across all supported operating systems?

  • Is it better to be an external library rather than in std?

  • What empirical measurements are there to justify the computational benefits of thread affinity?

  • Is there a better API design?


#2

This is the sort of thing that should be easily implementable in a third-party crate, especially since I recently added platform-specific extension traits to get the handle/pthread_t of threads that you spawn with Rust. Once you have a solid crates.io implementation, it should be easy for the lib subteam to decide whether such a thing should be added to std, and you won't have to wait for it to land in std and stabilize: you can use the crates.io version immediately.


#3

As a side note, OS X does not support strictly binding threads to specific CPUs, but only assigning arbitrary "affinity tags", which the kernel uses as a hint by trying (but not promising) to run processes with the same tag on the same CPU. Thus, a cross-platform API must have a relatively expansive definition of "CPU number", and in particular must not guarantee that a CPU "lock" actually serves as a lock, in the sense that e.g. memory ordering will be strongly consistent between processes on the same CPU.


#4

Thanks for the feedback so far.

@retep998 Do you have a link to the traits that you added to get access to the underlying native thread? They would be really useful.

@comex Thanks for the heads-up. It looks like OS X doesn't even have pthread_setaffinity_np in the pthread header, and instead has its own affinity API. Good point w.r.t. memory consistency, so I will drop calling it a lock so it doesn't mislead.


#5

@lambdaburrito On Windows it is std::os::windows::io::AsRawHandle, but I can't link to it because the online docs are for Linux only. For everything else it is std::os::unix::thread::JoinHandleExt.
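For the Unix side, a minimal sketch of what JoinHandleExt gives you: the pthread_t of a spawned thread, obtained from outside via `as_pthread_t()`, which could then be passed to pthread_setaffinity_np. It assumes a Unix target, declares `pthread_self` by hand instead of using the libc crate, and compares the two handles with `==`, which works where pthread_t is an integer (as on Linux); portable code should use pthread_equal. The function name `spawn_and_compare` is illustrative.

```rust
use std::os::unix::thread::{JoinHandleExt, RawPthread};
use std::sync::mpsc;
use std::thread;

unsafe extern "C" {
    // Declared by hand; RawPthread is std's alias for the platform pthread_t.
    fn pthread_self() -> RawPthread;
}

/// Spawns a thread and returns (handle seen from outside, handle seen from inside).
fn spawn_and_compare() -> (RawPthread, RawPthread) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The thread reports its own pthread_t back to the spawner.
        let me = unsafe { pthread_self() };
        tx.send(me).unwrap();
    });
    // JoinHandleExt::as_pthread_t exposes the same pthread_t to the spawner.
    let outside = handle.as_pthread_t();
    let inside = rx.recv().unwrap();
    handle.join().unwrap();
    (outside, inside)
}
```

On Windows, JoinHandle implements AsRawHandle, and the resulting HANDLE is what SetThreadAffinityMask expects.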


#6

I'd like to have this as well for at least one of my applications. Is anyone taking responsibility for implementing this, or is it still up in the air? I can get it started on Windows and Linux.

The only relevant project I could find is scheduler, which binds the thread affinity API, but only on Linux. I'm not sure if the authors would be interested in going cross-platform, but it couldn't hurt to ask.


#7

Hey @logician, you're very welcome to co-create a crate for thread affinity, especially since you can help with Windows (I have Windows, but just for playing Dota 2). Ping me at jp [at] signalanalytics [dot] co


#8

If you are interested, the nix crate does this for Linux:

http://rustdoc.s3-website-us-east-1.amazonaws.com/nix/master/linux/nix/sched/fn.sched_setaffinity.html


#9

Blah blah Windows blah