Pre-RFC: optimise(size) and optimise(no) attributes


#1
  • Feature Name: optimise_attr
  • Start Date: 2018-03-26
  • RFC PR: (leave this empty)
  • Rust Issue: (leave this empty)

Summary

This RFC introduces the #[optimise] attribute, specifically #[optimise(size)] and #[optimise(no)], which allow controlling optimisation level on a per-item basis.

Motivation

Currently, rustc has only a small number of optimisation options, and they apply globally to the crate. With LTO and RLIB-only crates, it is likely that a whole-program optimisation level is all that would remain. This is a troublesome trend, as it removes one of the more important knobs for size-aware applications such as embedded ones.

For those applications it is critical that they are able to specify certain pieces of code to optimise in a different manner (e.g. without unrolling the loops) so that the code is able to satisfy the size constraints.

In the C world this is very easy to achieve by simply compiling an object without optimisations. In the Rust world, where Cargo decides your build commands for you and the compiler flags are per-crate, such an approach is less feasible.

Guide-level explanation

Sometimes, optimisations are a tradeoff between execution time and code size. Some optimisations, such as loop unrolling, can increase code size several-fold (compared to the original function size).

#[optimise(size)]
fn banana() {
    // code
}

This will instruct rustc to consider this tradeoff more carefully and to avoid optimising in a way that would result in larger code rather than smaller code. It may also affect which instructions are selected to appear in the final binary.

Note that #[optimise(size)] is a hint rather than a hard requirement, and the compiler may still, while optimising, make decisions that increase function size.

In addition, a way to entirely disable optimisations for a function is provided:

#[optimise(no)] // implies #[inline(never)]
fn banana() -> i32 {
    2 + 2
}

This would prevent optimisations from running on the function banana only, resulting in an explicit instruction adding two and two together, rather than a constant-folded 4:

; x86 assembly
banana:
movl	$2, %eax
addl	$2, %eax
retq

Note that some cross-function optimisations may still be able to make decisions depending on the implementation of banana.

Reference-level explanation

optimise(no)

The #[optimise(no)] attribute applied to a function definition will inhibit most of the optimisations that would otherwise be applied to the function, with the exception of cross-function ones.

This attribute implies #[inline(never)], and specifying any other inline attribute in conjunction with #[optimise(no)] will cause a compilation error.

In the context of MIR optimisations, the optimisation passes would skip over functions annotated with this attribute.

optimise(size)

The #[optimise(size)] attribute applied to a function definition will instruct the optimisation engine to avoid applying optimisations that could result in a size increase, and the machine code generator to generate code that’s smaller rather than larger.

Note that the optimise(size) attribute is just a hint and is not guaranteed to result in any different or smaller code.

Drawbacks

  • Not all of the alternative codegen backends may be able to express such a request, hence the “this is an optimisation hint” note on the #[optimise(size)] attribute. The RFC mandates support for #[optimise(no)], so a backend that does not support disabling optimisations on a per-function basis would end up not being fully featured.

Rationale and alternatives

The proposed solution is a very semantic one (it describes the desired result rather than the behaviour) to the problem of sometimes needing to inhibit some of the trade-off optimisations, such as loop unrolling.

An alternative, of course, would be to add attributes controlling such optimisations, such as #[unroll(no)] on top of a loop statement. There’s already precedent for this in the #[inline] annotations.
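For reference, the existing #[inline] annotations look like this in today's Rust. This is only a minimal sketch: the function names are invented for illustration, and all three forms are hints that change code generation, not the computed result.

```rust
// Hypothetical helper names; only the attributes themselves are existing Rust.

/// Suggests that this function is a good inlining candidate.
#[inline]
fn double(x: u32) -> u32 {
    x * 2
}

/// Strongly requests inlining at every call site (still only a hint).
#[inline(always)]
fn add_one(x: u32) -> u32 {
    x + 1
}

/// Asks the compiler to keep a single out-of-line copy of this function.
#[inline(never)]
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// Combines the three helpers; the result is the same with or without the hints.
fn demo() -> u32 {
    add_one(double(checksum(&[1, 2, 3])))
}
```

A per-optimisation attribute such as the #[unroll(no)] mentioned above would presumably follow the same pattern, but no such attribute exists today.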


optimize instead of optimise… or both?

Prior art

  • LLVM: optsize, optnone, minsize function attributes (exposed in Clang in some way);
  • GCC: __attribute__((optimize)) function attribute which allows setting the optimisation level and using certain(?) -f flags for each function;
  • IAR: Optimisations have a checkbox for “No size constraints”, which allows the compiler to go out of its way to optimise without considering the size tradeoff.

Unresolved questions

  • N/A?

cc @nikomatsakis @japaric


#2

Interesting proposal!

Would #[optimize(size)] have any effect on monomorphization (i.e. inhibiting it)?


#3

It is obvious to me that we need such knobs, and attributes are the natural way to expose them. It will also be trivial to implement since LLVM has matching attributes already.

The only open question for me is bikeshedding: I think we’ll also (in addition to, not instead of this) want some control over specific optimizations like loop unrolling, automatic vectorization, and annotating individual calls as cold. It would be good if the attribute name/format chosen was forward compatible with this.


#4

I understand the need for #[optimise(size)] on specific items in size-constrained embedded environments, but what’s the use case for #[optimise(no)]?


#5

Working around optimizer bugs, mostly? Or de jure correct optimizer behavior breaking your code because it relies on UB (mostly relevant for C/C++).
Imagine you are porting a large codebase from one compiler version to another and the new version contains… surprises.


#6

Even when everything is working correctly, there are good reasons for disabling optimizations in some parts of your code:

  • Improved debug info when you debug in release mode (all but the lowest optimization level tends to make a total mess of it, and even mild optimizations can impact debug info quality somewhat).
  • You found that the optimizer takes a lot of time for no/little gain on some part of your code (arguably a bug, but still needs working around)
  • The optimizer does improve performance, but the code is rarely used so you don’t want to spend time on optimizing it (while still optimizing everything else)

#7

I would prefer #[optimize(none)] or #[optimize(never)]. “no” looks like “number” or something.


#8

+1 for optimize(never) due to consistency with inline(never)


#9

When implementing security-sensitive algorithms compiler optimisations are often disabled so that asymmetries are avoided. These might otherwise be used in side-channel attacks.


#10

Also debugging. Sometimes I want an otherwise optimized binary, but for one function, I want to step through the assembly.


#11

For this one, a module-level attribute might actually be better.


#12

Can we also have an #[optimize(always)] to increase rather than decrease optimization level? Useful for hot loops involving, say, long chains of iterator methods, which the optimizer can condense down into near-optimal machine code, but are incredibly inefficient when it’s off.
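To make the concern concrete, here is a minimal sketch of the kind of code in question. The function names are made up. Both versions compute the same value, but the iterator chain depends on the optimiser inlining several small closures and adapter calls to match the hand-written loop.

```rust
/// Iterator-chain version: readable, but each adapter is a separate
/// function call until the optimiser inlines and fuses the chain.
fn sum_even_squares_iter(values: &[u64]) -> u64 {
    values
        .iter()
        .copied()
        .filter(|v| v % 2 == 0)
        .map(|v| v * v)
        .sum()
}

/// Hand-written loop computing the same result without adapters.
fn sum_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}
```

How large the gap is in an unoptimised build depends on the compiler version and target, so it is worth measuring rather than assuming.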


#13

Considering that “optimize” can mean many different things (unrolling, inlining, branch predictor hinting…), I would spontaneously prefer more focused optimization hints (like we already have), in addition to perhaps some support for profile-guided optimization in rustc.


#14

Sadly, no. There are a few angles to this answer, one of which is that the property of optimisation is a whole-crate one: optimisation does not only happen within a function, but also across functions (inlining is one example of a cross-function optimisation). For modules, for example, such an attribute would make a lot of sense, but LLVM doesn’t currently expose an attribute to achieve that, so workarounds like translating the module into a separate object/codegen-unit would be necessary.

That is a feasible extension to the current proposal, yes.

I don’t think so, at least not currently. Control of monomorphisation is already pretty explicit in Rust (i.e. it is up to you whether you use banana<T>(t) or banana(&t)), and I don’t think it would be beneficial at all for this attribute to implicitly alter the current behaviour.
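As a rough illustration of that explicit choice, assuming banana(&t) refers to a trait-object parameter (the names and the Display bound below are made up for the example):

```rust
use std::fmt::Display;

/// Generic version: the compiler emits one monomorphised copy
/// for every concrete `T` this is called with.
fn banana_generic<T: Display>(t: T) -> String {
    format!("banana {}", t)
}

/// Trait-object version: a single copy of the function exists,
/// and the call to `Display::fmt` goes through a vtable.
fn banana_dyn(t: &dyn Display) -> String {
    format!("banana {}", t)
}
```

Calling banana_generic(1) and banana_generic(2.5) produces two copies of the function body, while banana_dyn(&1) and banana_dyn(&2.5) share one. The caller picks the trade-off by choosing the signature, independently of any optimisation attribute.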


#15

I don’t think it makes sense to associate codegen options with a construct that in Rust is supposed to be just for scoping.

Mm, this is too bad, and indeed it seems that Clang doesn’t support the attribute-optimize GCC extension you mentioned under prior art. Nevertheless, that workaround doesn’t seem infeasible at all, though it would be more complicated than just exposing optsize and optnone.


#16

Oh, right. Another point in favour of more precise optimisation attributes would be the very likely future addition of branch weights, which would be specified on the enum variants and/or match arms.


#17

Are fast math flags completely orthogonal to these, or do you expect to expose those via these attributes?


#18

optimize(no) is a very implementation-oriented directive. It’d be interesting to see how practical it would be to capture intent instead. Addressing the use cases raised here:

When implementing security-sensitive algorithms compiler optimisations are often disabled so that asymmetries are avoided. These might otherwise be used in side-channel attacks.

LLVM doesn’t guarantee that optnone will disable all optimizations. As one example, IPSCCP and other interprocedural optimizations can move code out of an optnone function into a non-optnone function, and noinline doesn’t disable them. As another, fast-isel sometimes falls back to selectiondag-isel, and selectiondag-isel does some optimizations automatically. If you’re doing crypto or similar and really need to be sure about timings, perhaps we should talk about something like #[timing_sensitive] so that we can have a conversation with implementors about what’s needed to actually implement that reliably.

Imagine you are porting a large codebase from one compiler version to another and the new version contains… surprises.

and

You found that the optimizer takes a lot of time for no/little gain on some part of your code (arguably a bug, but still needs working around)

Thought experiment: what if you required the attribute to have a compiler identifier/version? Something like #[optimize(no, LLVM, 4.3)], which would only disable optimizations for a specific compiler/version (or perhaps range of versions)? That might discourage these attributes from applying in contexts where the original intent doesn’t apply.

Improved debug info when you debug in release mode (all but the lowest optimization level tends to make a total mess of it, and even mild optimizations can impact debug info quality somewhat).

There are ways to do optimization that don’t disrupt debugging, e.g. work on the -Og flag, that optimize(no) would preclude. Would something like #[single_step_debugging] capture the intent here?

The optimizer does improve performance, but the code is rarely used so you don’t want to spend time on optimizing it (while still optimizing everything else)

Rust already covers this with #[cold].
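A minimal sketch of #[cold] in use, with made-up function names. The attribute marks a function as unlikely to be called, so the compiler can keep it out of the hot path. How much optimisation effort the backend still spends on the cold function itself is left to the compiler.

```rust
/// Hypothetical error path that is expected to run rarely.
/// `#[cold]` tells the compiler that calls to this function are unlikely.
#[cold]
#[inline(never)]
fn report_overflow(a: u32, b: u32) -> String {
    format!("overflow while adding {} and {}", a, b)
}

/// Common path: the addition stays compact, the rare case is out of line.
fn checked_sum(a: u32, b: u32) -> Result<u32, String> {
    match a.checked_add(b) {
        Some(sum) => Ok(sum),
        None => Err(report_overflow(a, b)),
    }
}
```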


#19

Maybe this is a silly question, but it seems like for #[optimize(never)] to be a meaningful attribute we have to agree on what “unoptimized code” means, which seems a lot more subtle than “small memory footprint” or “don’t run the optimization passes”. If a disagreement arose as to whether some sequence of assembly instructions was a correct compilation of some code with #[optimize(never)] (I assume crypto people will file bug reports like this), it’s not obvious to me how we’d determine who’s right. For instance, if the hardware has 50 different instructions that could be used for addition, and the compiler happens to choose lea or whatever when compiling 2+2, is that “optimized” compared to the boring, bog-standard add instruction?

Judging by the responses my last post got, it feels like #[optimize(never)] is sort of like a combination of #[preserve_debuginfo], #[inline(never)], #[timing_sensitive] and #[opt_level(0)]. That last one would be an actual compiler knob where we just accept there are no firm semantic guarantees, but the other three are specific enough that I can imagine getting a reasonable consensus on what does or doesn’t qualify as a miscompilation.


#20

@sunfishcode mentioned -Og, but just for the record, here’s how GCC documents it:

Optimize debugging experience. -Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience.

In Clang, -Og is currently a synonym for -O1, but according to the documentation, “In future versions, this option might disable different optimizations in order to improve debuggability.” I’m not sure how well the current behavior upholds the goal of not interfering with debugging.

This seems like something rustc should support as an argument to -C opt-level (which already supports 0-3, s, and z).

As for an attribute, perhaps something like #[optimize(debugging)]? For now it could just be translated to optnone – even though that brings the optimization level to the equivalent of -O0 rather than -O1 – but with the potential to do better in the future. Alternately, the idea of using separate codegen units for functions with optimization attributes, which was previously suggested as a hack to support #[optimize(always)], could also be used here…

In any case, I do think there should also be an #[optimize(never)], for when you just know better than the compiler :)