I came up with this idea while working on PR https://github.com/rust-lang/rust/pull/98824
Basically, implementing std::mem::swap using read_unaligned and write_unaligned greatly reduces the number of memory reads and writes while also reducing generated code size: Compiler Explorer
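To make the idea concrete, here is a minimal sketch of what I mean, not the code from the PR and not what the standard library does. The name swap_chunked and the choice of u64 as the chunk type are assumptions for illustration. It swaps two values in 8-byte chunks regardless of the alignment of T, then finishes the tail byte by byte. MaybeUninit is used so that padding bytes inside T are never read as initialized integers.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical helper for illustration only.
/// Swaps `a` and `b` by copying raw bytes in 8-byte chunks using
/// unaligned reads/writes, then handles the remaining tail bytes.
pub fn swap_chunked<T>(a: &mut T, b: &mut T) {
    // MaybeUninit keeps this sound even if T contains padding bytes.
    type Chunk = MaybeUninit<u64>;

    let len = size_of::<T>();
    let pa = a as *mut T as *mut u8;
    let pb = b as *mut T as *mut u8;
    let mut offset = 0;

    // SAFETY: `a` and `b` are distinct exclusive references, so both
    // regions are valid for `len` bytes of reads and writes and do not
    // overlap. Every access below stays within `len` bytes, and the
    // chunk accesses use the unaligned variants, so no alignment
    // requirement beyond 1 is assumed.
    unsafe {
        // Whole 8-byte chunks.
        while offset + size_of::<Chunk>() <= len {
            let ca = pa.add(offset) as *mut Chunk;
            let cb = pb.add(offset) as *mut Chunk;
            let tmp = ptr::read_unaligned(ca);
            ptr::write_unaligned(ca, ptr::read_unaligned(cb));
            ptr::write_unaligned(cb, tmp);
            offset += size_of::<Chunk>();
        }
        // Remaining tail, one byte at a time.
        while offset < len {
            let ca = pa.add(offset) as *mut MaybeUninit<u8>;
            let cb = pb.add(offset) as *mut MaybeUninit<u8>;
            let tmp = ptr::read(ca);
            ptr::write(ca, ptr::read(cb));
            ptr::write(cb, tmp);
            offset += 1;
        }
    }
}
```

For an 11-byte array, for example, this does one 8-byte chunk swap and three single-byte swaps. A real implementation would probably pick the tail chunk sizes more carefully (4, 2, 1 bytes) instead of looping per byte.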
It also solves this problem: rust/swap-small-types.rs at b04bfb4aea99436a62f6a98056e805eb9b0629cc · rust-lang/rust · GitHub
Also, there is a 10-year-old article which claims that misaligned reads and writes are free on newer x86-64 processors: Data alignment for speed: myth or reality? – Daniel Lemire's blog
There is also an article which says that unaligned access is slow on ARMv7 and is not supported on older ARM cores: The curious case of unaligned access on ARM | by Levente Kurusa | Medium
As I understand it, the main problem with misaligned accesses is that a single access may touch 2 cache lines instead of 1. But this concern is not very important for the std::mem::swap implementation, because we would load the whole values into exclusive cache lines anyway.
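To illustrate the cache line concern, here is a small sketch that checks whether a single access straddles a cache line boundary. The function name crosses_cache_line is hypothetical, and the 64-byte line size used in the example is an assumption (it is typical for current x86-64 parts, but not guaranteed).

```rust
/// Hypothetical helper: returns true if an access of `size` bytes
/// starting at address `addr` touches more than one cache line of
/// `line` bytes. `line` must be non-zero.
pub fn crosses_cache_line(addr: usize, size: usize, line: usize) -> bool {
    if size == 0 {
        return false;
    }
    let first_line = addr / line;
    let last_line = (addr + size - 1) / line;
    first_line != last_line
}
```

With 64-byte lines, an 8-byte read at address 56 covers bytes 56..=63 and stays within one line, while the same read at address 60 covers bytes 60..=67 and touches two lines.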
So: should I implement our std::mem::swap for x86_64 using unaligned reads/writes?
Maybe we even have some compile-time check to put in a cfg guard, like "is_misaligned_memory_access_fast", and I just don't know about it?
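Lacking such a check, one fallback would be to gate on the target architecture. The sketch below is only that: ASSUME_FAST_UNALIGNED and swap_guarded are hypothetical names, and treating every x86_64 target as "fast" is my assumption, not something the compiler reports. It uses [u8; 8] because that type has alignment 1, so accessing it as a u64 really is a potentially unaligned access.

```rust
use std::mem;
use std::ptr;

/// Hypothetical flag: assumes unaligned access is cheap on x86_64.
/// This is a stand-in for a real "is unaligned access fast" check.
const ASSUME_FAST_UNALIGNED: bool = cfg!(target_arch = "x86_64");

/// Hypothetical helper: swaps two 8-byte arrays, using a single
/// unaligned 8-byte load/store pair per side where assumed cheap.
pub fn swap_guarded(a: &mut [u8; 8], b: &mut [u8; 8]) {
    if ASSUME_FAST_UNALIGNED {
        let pa = a.as_mut_ptr() as *mut u64;
        let pb = b.as_mut_ptr() as *mut u64;
        // SAFETY: both pointers cover exactly 8 valid, initialized
        // bytes, the exclusive references guarantee no overlap, and
        // the unaligned variants impose no alignment requirement.
        unsafe {
            let tmp = ptr::read_unaligned(pa);
            ptr::write_unaligned(pa, ptr::read_unaligned(pb));
            ptr::write_unaligned(pb, tmp);
        }
    } else {
        // Fall back to the regular implementation elsewhere.
        mem::swap(a, b);
    }
}
```

Because the flag is a constant, the untaken branch should be removed at compile time, so this behaves like a cfg guard while keeping both paths type-checked on every target.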