Idea: `#[opt_level(level)]` for performance-sensitive functions

For the core functions of some algorithm implementations (e.g. the permutation functions of various crypto algorithms), I sometimes want them to be optimized in debug mode as well. Whether they are optimized or not can make a huge difference in performance and size*, and in most cases debug mode is not used for debugging these functions, so they should stay optimized while the rest of the program is compiled in debug mode. First of all, I guess this may require creating some kind of "boundaries"; is this possible with our current compiler? Then, the way to mark such functions could be an attribute like #[optimize(condition)], indicating that optimization is required in the case specified by condition. Besides debug mode, other conditions (such as the target arch) may also be useful.

*The keccak::f1600 function under x86_64-pc-windows-msvc measured a performance difference of 89.4x between debug and release (measured by subtracting SystemTime before and after executing a million times on an empty state), and a size difference of 24.2x (measured by cargo-bloat).
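For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the timing loop. It is not the code used for the numbers above: `toy_permutation` is a made-up stand-in for keccak::f1600 so the example has no dependencies, and it uses `Instant` (monotonic) rather than `SystemTime`. Build it once with and once without `--release` and compare the printed times.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a permutation such as keccak::f1600;
/// it only mixes the state so that the loop has real work to do.
fn toy_permutation(state: &mut [u64; 25]) {
    for round in 0..24u64 {
        for i in 0..25 {
            let next = state[(i + 1) % 25];
            state[i] = (state[i] ^ next.rotate_left(1)).wrapping_add(round);
        }
    }
}

/// Runs `f` on an initially all-zero state `iterations` times
/// and returns the elapsed wall-clock time.
fn time_iterations(f: fn(&mut [u64; 25]), iterations: u32) -> Duration {
    let mut state = [0u64; 25];
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the work in release builds.
        f(black_box(&mut state));
    }
    black_box(&state);
    start.elapsed()
}

fn main() {
    let elapsed = time_iterations(toy_permutation, 1_000_000);
    println!("1,000,000 iterations took {:?}", elapsed);
}
```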

I think having this would be useful, but I just want to mention that you can optimize your dependencies while building your own code in debug mode.
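As a minimal sketch of what that looks like, assuming the dependency is the keccak crate mentioned above, a profile override goes in the Cargo.toml at the root of the workspace being built:

```toml
# Root Cargo.toml of the workspace being built.
# `keccak` is assumed to be a direct or transitive dependency.
[profile.dev.package.keccak]
opt-level = 3

# Alternatively, optimize every dependency while workspace members stay unoptimized.
[profile.dev.package."*"]
opt-level = 3
```

This works per package, not per function, so it does not cover the function-level case asked about.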

I think this is only possible on Linux. I tried it on Windows and it just silently broke everything (I tried to find out why but couldn't come up with anything).

https://doc.rust-lang.org/cargo/reference/profiles.html#overrides

The build type tends to get conflated with the selection of which runtime to use. On Windows, there are two incompatible runtimes: a release runtime and a debug runtime. You can compile your code however you want and independently select which runtime to use, but profiles will tend to choose based on how your crate is compiled. The runtime selection must be uniform across a program [1]. I doubt you're debugging the runtime, so using the release runtime and compiling your code with debug symbols seems the most reasonable thing to do (FWIW, CMake calls this "RelWithDebInfo").


  1. there are ways around this, but probably not relevant here as the stdlib is really the deciding factor for Rust programs AFAIK.

I don't really have opinions on whether #[optimize] would be wanted/possible (or what semantics it would have; does it force transitive optimization too somehow?). But for cfg-based conditionality we already have cfg_attr, so the attribute doesn't need to implement that internally. It can just be used like #[cfg_attr(some_condition, optimize)].
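To illustrate the cfg_attr mechanism on stable Rust, here is a small sketch. The stable `inline` attribute stands in for an optimization attribute, and `mix` is a made-up function. The condition used is `debug_assertions`, the built-in cfg that is on by default in unoptimized builds.

```rust
/// Hypothetical hot function. `inline(never)` / `inline` stand in for an
/// optimization attribute, since `cfg_attr` can wrap any attribute.
#[cfg_attr(debug_assertions, inline(never))]
#[cfg_attr(not(debug_assertions), inline)]
fn mix(x: u64) -> u64 {
    x.rotate_left(13) ^ 0x9E37_79B9_7F4A_7C15
}

fn main() {
    // `cfg!` evaluates the same kind of predicate at compile time as a bool.
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };
    println!("built in {} mode: mix(0) = {:#x}", profile, mix(0));
}
```

Only one of the two cfg_attr lines takes effect in any given build; an optimization attribute would be applied conditionally in the same way.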

An optimize attribute already exists; it adds optimize(size) and optimize(speed), but it is blocked on some attribute propagation stuff mentioned in the RFC rendered here. It's tracked at #54882.

It seems to only work for the current workspace. tiny-keccak has this in its own Cargo.toml, but when I use it as a dependency in the same test, just replacing keccak::f1600 with tiny_keccak::keccakf, the performance and size stay the same. Perhaps it would be better if this was up to the crate author and affected all dependents. (Should there be a way for dependents to turn it off?) It would also be better to refine the granularity to functions rather than crates: crypto algorithms have peripheral abstractions in addition to the performance-sensitive core functions, and optimizing only the core functions would help with debugging the peripheral abstractions.

It was the name of the opt-level option that reminded me. The #[optimize(condition)] was just a random thought; I didn't even consider cfg_attr. I would prefer a form like #[cfg_attr(debug, opt_level(3))] now. I will change the title.

The discussion of the optimize attribute left some things out of the RFC, like specifying the optimisation level. You can add these missing features; see this comment about that.