While looking at some generated code on godbolt, I noticed that function calls changed between stable (1.30) and beta (1.31). I was wondering whether that was a deliberate change and, if so, where I could read about it.
The difference is that with stable, function calls look like:
call example::foo@PLT
while in beta, they look like:
call qword ptr [rip + example::foo@GOTPCREL]
A notable difference between the two is that in the former case, the symbol resolution will happen at first-call time (except when linked with bindnow), while in the second, it will happen at startup.
Interestingly, while this changed on x86-64, this didn’t change on x86.
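For anyone wanting to reproduce this, a minimal sketch of the kind of input that should show the two call forms is below. The function names and bodies are my own illustration (only the `example::foo` symbol comes from the output above), and the exact assembly depends on the compiler version, target, and flags, so treat it as an assumption rather than the original test case.

```rust
// Hypothetical reproduction input; `foo` is kept out of line so that the
// call from `bar` is actually emitted instead of being inlined away.
#[inline(never)]
pub fn foo(x: u32) -> u32 {
    x.wrapping_mul(3)
}

// The call to `foo` in here is the instruction to look at in the output:
// either a direct call through the PLT or an indirect call through the GOT.
pub fn bar(x: u32) -> u32 {
    foo(x).wrapping_add(1)
}
```

Comparing the output for `bar` between the two compiler versions should show whether the call goes through `@PLT` or `@GOTPCREL`.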
cuviper
November 27, 2018, 10:05pm
rust-lang:master
← GabrielMajeri:no-plt
opened 04:55PM - 26 Sep 18 UTC
This PR gives `rustc` the ability to skip the PLT when generating function calls… into shared libraries. This can improve performance by reducing branch indirection.
AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](https://github.com/rust-lang/rust/pull/43170), lazy binding was disabled anyway.
This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).
Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.
## Performance
I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):
```
name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```
A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.
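As a reading aid for the table, the derived columns appear to be consistent with the arithmetic sketched below: the difference is the no-plt time minus the control time, the percentage is that difference relative to control, and the speedup is control divided by no-plt. The `compare` helper is hypothetical and only illustrates how the columns relate; it is not taken from the benchmarking tool.

```rust
/// Returns (diff in ns/iter, diff in percent, speedup factor) for one row.
/// Hypothetical helper; the formulas are inferred from the table values.
pub fn compare(control_ns: f64, no_plt_ns: f64) -> (f64, f64, f64) {
    // Negative diff means the no-plt build was faster.
    let diff = no_plt_ns - control_ns;
    // Difference expressed relative to the control measurement.
    let diff_pct = diff / control_ns * 100.0;
    // Values above 1.0 mean the no-plt build was faster.
    let speedup = control_ns / no_plt_ns;
    (diff, diff_pct, speedup)
}
```

Calling `compare(11097.0, 10733.0)` with the `build_app_long` row gives a diff of -364 ns/iter, which rounds to the -3.28% and x 1.03 shown above.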
## Security benefits
**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).
## Remaining PLT calls
The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
I don't know why x86 would be missing though.
glandium:
A notable difference between the two is that in the former case, the symbol resolution will happen at first-call time (except when linked with bindnow), while in the second, it will happen at startup.
FWIW, bindnow is also on by default for most targets.
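To make the first-call versus startup distinction concrete, here is a purely conceptual model in Rust. It is not how the dynamic linker is implemented; `resolve`, `LazySlot`, and `EagerSlot` are hypothetical names standing in for symbol lookup, a lazily bound PLT/GOT slot, and a slot bound at load time (as with bindnow or the GOT-indirect call).

```rust
use std::sync::OnceLock;

type Target = fn(u32) -> u32;

fn real_foo(x: u32) -> u32 {
    x + 1
}

/// Stand-in for the dynamic linker's symbol lookup (hypothetical).
fn resolve() -> Target {
    real_foo
}

/// Models lazy binding: the slot is only filled in on the first call.
pub struct LazySlot {
    slot: OnceLock<Target>,
}

impl LazySlot {
    pub const fn new() -> Self {
        LazySlot { slot: OnceLock::new() }
    }

    pub fn is_resolved(&self) -> bool {
        self.slot.get().is_some()
    }

    pub fn call(&self, x: u32) -> u32 {
        // First call pays for the lookup; later calls reuse the stored target.
        let f = self.slot.get_or_init(resolve);
        (*f)(x)
    }
}

/// Models eager binding: the slot is filled in when it is created ("startup").
pub struct EagerSlot {
    slot: Target,
}

impl EagerSlot {
    pub fn new() -> Self {
        EagerSlot { slot: resolve() }
    }

    pub fn call(&self, x: u32) -> u32 {
        // Every call is just an indirect call through the resolved slot.
        (self.slot)(x)
    }
}
```

In this model, a `LazySlot` reports `is_resolved() == false` until its first `call`, while an `EagerSlot` has already done the lookup in `new`. With bindnow on, both call forms behave like the eager case, so the remaining difference is only the extra jump through the PLT stub.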
x86 (32-bit) doesn’t have PC-relative data addressing, so there is no direct equivalent of the `[rip + symbol@GOTPCREL]` operand there.