I am building an asynchronous networking library in Rust for an upcoming component of our product, and I need to be able to lock N threads to N CPU cores so I can schedule fibers onto them. It would be ideal if this were part of the std. I haven't made up my mind whether it should be a separate struct, as proposed below, or a trait implemented for std::thread::Thread.
- Feature Name: thread_affinity
- Start Date: 2016-01-26
- RFC PR:
- Rust Issue:
Summary
Thread affinity provides functionality to lock/unlock a specific thread to a specified CPU core. Locked threads are never migrated to other cores by the OS scheduler. I believe it is useful for a systems programmer to have the ability to set/unset thread affinity for Rust threads.
Motivation
My motivation derives from a personal use-case: creating a fiber-based server/client application framework for Rust in my spare time. I need the ability to set the thread affinity of Rust threads to CPU cores in an OS-independent manner, to minimize cache misses and thread context switching.
The proposed feature provides a performance optimization for servers. It would also be useful for MIO and similar frameworks, where the main event loop is locked to one core and passes connections to workers that are locked to the remaining cores, taking advantage of cache locality.
A single thread context switch is quite cheap, but it causes a butterfly effect where one thread switch causes the other N-1 threads to switch, so it ends up being expensive when there are 8 or more cores.
Detailed design
I strongly encourage others to improve this RFC since I am not a Rust expert; it only serves to start a dialogue on a proposed API.
I have split it into two parts: the user API and the implementation.
Locking to an arbitrary free CPU core
let t = thread::spawn(move || {
    let cpu_lock = CpuLock::lock().unwrap(); // the current thread is now locked to an arbitrary free CPU core
    let cpu_id = cpu_lock.cpu_id();
    // thread logic goes here
    // the cpu_lock will unlock from the core once out of scope
});
Here is the equivalent with some syntactic sugar that hides the lock if the user does not care to access it (a possible desugaring is sketched after the example):
let t = thread::lock(move || {
    // thread logic goes here
    // the cpu_lock will unlock from the CPU core once out of scope
});
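One possible shape for this sugar, layered on top of the CpuLock type used above; this is only an illustration of how thread::lock could behave under the proposed API, not an existing function:

use std::thread::{self, JoinHandle};

pub fn lock<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        // Pin the new thread before running the user's closure; the lock is
        // released when the thread exits and the guard is dropped.
        let _cpu_lock = CpuLock::lock().unwrap();
        f()
    })
}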
The user may want more fine-grained control over which core the thread is locked to.
Locking to a specific CPU core
// locks the thread to CPU core 2
let t = thread::spawn(move || {
    let cpu_no = 2;
    let cpu_lock = CpuLock::lock_on(cpu_no).unwrap(); // the current thread is now locked to CPU core 2
    // thread logic goes here
    // the cpu_lock will unlock from the core once out of scope
});
// locks the thread to CPU core 2, using the sugared form
let cpu_no = 2;
let t = thread::lock_on(cpu_no, move || {
    // thread logic goes here
    // the lock releases the CPU core once the thread exits
});
Unlocking prematurely
// locks the thread to an arbitrary free CPU core
let t = thread::spawn(move || {
    let cpu_lock = CpuLock::lock().unwrap(); // the current thread is now locked to an arbitrary free CPU core
    // thread logic goes here
    cpu_lock.unlock();
    // other thread logic, like cleanup, etc.
});
Implementation
The implementation is quite simple: it just needs to store the underlying thread's id and invoke system calls to lock/unlock (see the sketch after the list below).
- Windows: SetThreadAffinityMask
- *nix: pthread_setaffinity_np(3)
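As a rough illustration, here is a minimal sketch of what the Unix path might look like. It assumes the libc crate on Linux, pins the calling thread instead of storing a thread id, and only shows the lock_on path; the CpuLock name and methods are the hypothetical API proposed above, not anything that exists today:

extern crate libc;

use std::io;
use std::mem;

pub struct CpuLock {
    cpu_id: usize,
}

impl CpuLock {
    /// Pin the calling thread to `cpu_no` via pthread_setaffinity_np(3).
    pub fn lock_on(cpu_no: usize) -> io::Result<CpuLock> {
        unsafe {
            let mut set: libc::cpu_set_t = mem::zeroed();
            libc::CPU_ZERO(&mut set);
            libc::CPU_SET(cpu_no, &mut set);
            let rc = libc::pthread_setaffinity_np(
                libc::pthread_self(),
                mem::size_of::<libc::cpu_set_t>(),
                &set,
            );
            if rc != 0 {
                // pthread functions return the error code directly.
                return Err(io::Error::from_raw_os_error(rc));
            }
        }
        Ok(CpuLock { cpu_id: cpu_no })
    }

    /// The core this lock is pinned to.
    pub fn cpu_id(&self) -> usize {
        self.cpu_id
    }

    /// Release the lock early; dropping it has the same effect.
    pub fn unlock(self) {}
}

impl Drop for CpuLock {
    fn drop(&mut self) {
        // Restore an "all cores" mask; the kernel intersects it with the
        // CPUs actually online, so extra bits are harmless.
        unsafe {
            let mut set: libc::cpu_set_t = mem::zeroed();
            for cpu in 0..libc::CPU_SETSIZE as usize {
                libc::CPU_SET(cpu, &mut set);
            }
            let _ = libc::pthread_setaffinity_np(
                libc::pthread_self(),
                mem::size_of::<libc::cpu_set_t>(),
                &set,
            );
        }
    }
}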
The implementation also needs to know the number of available CPUs and manage state to track which CPUs are free for arbitrary locking.
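For example, on POSIX systems the CPU count could be queried via sysconf(3), again assuming the libc crate; the helper name below is made up for illustration:

extern crate libc;

/// Number of CPUs currently online, queried via sysconf(3).
fn num_online_cpus() -> usize {
    unsafe { libc::sysconf(libc::_SC_NPROCESSORS_ONLN) as usize }
}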
Computing the affinity mask from the CPU number is trivial:
let cpu_no = 2;
let affinity_mask = 1 << cpu_no;
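On Windows that mask would be handed to SetThreadAffinityMask for the current thread. A hedged sketch, assuming the winapi crate (0.3-style module paths); the function name below is made up for illustration:

extern crate winapi;

use winapi::um::processthreadsapi::GetCurrentThread;
use winapi::um::winbase::SetThreadAffinityMask;

/// Pin the calling thread to `cpu_no`; returns false if the call fails.
fn lock_current_thread_on(cpu_no: usize) -> bool {
    let affinity_mask = 1usize << cpu_no;
    // SetThreadAffinityMask returns the previous mask on success, 0 on failure.
    unsafe { SetThreadAffinityMask(GetCurrentThread(), affinity_mask) != 0 }
}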
Corner-cases
- Passing a CPU number that is out of bounds.
- Moving the lock out of the thread.
- Moving the lock from one thread to another thread (see the sketch after this list).
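One way the last two corner cases could be ruled out at compile time, rather than handled at runtime, is to make the lock type !Send so it cannot be moved to another thread. A purely illustrative sketch, building on the hypothetical CpuLock above:

use std::marker::PhantomData;

pub struct CpuLock {
    cpu_id: usize,
    // Raw pointers are neither Send nor Sync, so this zero-sized marker
    // keeps a CpuLock tied to the thread that created it.
    _not_send: PhantomData<*const ()>,
}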
Drawbacks
I believe the drawbacks of the current RFC are:
- This RFC does not provide any empirical evidence of the computational benefits. I can spend time providing benchmarks based on a C/C++ benchmark.
- Thread affinity can have the adverse effect of degrading performance if it is used in a way that interferes with the OS scheduler.
- My lack of experience with Rust could be preventing an optimal design.
- It could be regarded as adding complexity to the standard library.
Alternatives
Calling unsafe OS-specific system calls to create threads and change the affinity mask, because std::thread::Thread does not expose the underlying OS-specific thread id (AFAIK).
Unresolved questions
No research has been conducted into the following questions:
- Is it possible to provide consistent and symmetric semantics across all supported operating systems?
- Would it be better as an external library rather than in std?
- What empirical measurements are there to justify the computational benefits of thread affinity?
- Is there a better API design?