Optimization barriers suitable for cryptographic use

Neutron3529 · June 20, 2024, 2:04am

I have another idea, maybe a conditional_move function is all we need.

#[inline(always)]
fn conditional_move<T>(flag:bool, target:&mut T, truthy:T, falsy:T) {
    std::hint::black_box(&falsy); // ensure both variable are calculated
    std::hint::black_box(&truthy); // It is reliable since it's used in benchmark
    *target = if flag {
        truthy
    } else {
        falsy
    }
}

the std::hint::black_box will ensure both the truthy and falsy are calculated (since it is the main case for benchmarks), and thus the final if-else could be directly optimized to a cmove instruction. Even if the final optimize step is not applied, since we ensure both truthy and falsy are calculated, the difference of execution time should be negliable.

farnz · June 20, 2024, 8:22am

Note that this also involves a CPU mode flag in Intel land; Intel does not guarantee constant time relative to data items for any instructions unless the logical CPU is in "Data Operand Independent Timing Mode".

DragonDev1906 · June 20, 2024, 9:50am

I think black_box should not be used/considered here (except for workarounds if needed): It's main purpose (as far as I can tell) is to prevent the compiler from optimizing something away, similar to #[inline(never)] prevents the compiler from inlining the function. The name (nor description) doesn't (and imo shouldn't) say anything about constant time because that is not its main purpose.

That should be solved by a different function/macro (e.g. const_time), with the only purpose of trying or guaranteeing to make something evaluate in constant time:

No jumps unless both branches have the same evaluation time (hard)
No optimizing of a function if that changes its runtime
Use of Intels CPU mode flag if applicable
...?

Throwing those two things into the same black_box function doesn't make sense imo, especially since they are usually used in completely different places, in one of which you want such optimizations

farnz · June 20, 2024, 1:59pm

IIUC, the goal for cryptography is not genuinely constant time; it's data-independent timings, where the data is the key plus the plain text (in or out). This would allow you to change the constraints in your list to more like the following:

No conditional jumps unless the execution time is the same whether the branch is taken or not, regardless of microarchitecture state (branch predictors etc).
No optimizations that make the runtime of a function dependent on the input data.
Use of CPU mode flags where present (Intel) to disable data-dependent instruction timings.
Disable use of instructions with unconditional data-dependent timings such as Cortex-M3 SMULL, UMULL, SMLAL, UMLAL and division operations.
…?

You probably end up annotating a function as #[data_independent_timing], and the compiler then knows that it cannot optimize the function or anything it calls such that the time taken for this function depends on the parameters to the function; there's going to be fun decisions lurking about things it calls (must they also be marked #[data_independent_timing]? Will the compiler happily emit two versions, one with data-independent timing, one without?), but this, IIUC, is the guarantee that cryptography needs.

bascule · June 20, 2024, 6:08pm

Though that will theoretically evaluate both inputs, it still contains a branch.

Where black_box would help for that sort of thing, which is actually quite similar to what's happening in the original code example, is to use bitwise masking to select from those two values, with the mask value accessed through black_box to ensure LLVM doesn't try to rewrite it as a branch.

To get better guarantees, where available you can use dedicated CPU instructions for predication via asm! like the x86 CMOV family or the ARM CSEL family. This is implemented in the cmov crate (which uses the masking approach as a portable fallback, and could probably benefit from black_boxing the mask):

https://crates.io/crates/cmov

Though LLVM will otherwise rewrite CMOV instructions as branches in the x86-cmov-conversion pass, it won't touch asm! blocks.

bascule · June 20, 2024, 6:13pm

I covered an implementation of this sort of idea in the talk I linked above: GitHub - klutzy/nadeko: [INACTIVE] const-time Rust experiment

It's implemented as a proc macro which outputs asm! (though it's very old and outputs the old unstable LLVM asm!)

It's pretty tricky to implement and really needs to be a "crypto-compiler" carefully designed to never branch on a secret value or use such a value in a pointer computation, like the prospective "secret types" backend for LLVM would have to have been written.

I don't imagine such a codegen backend making its way into rustc proper, but perhaps there would be merit to reviving the proc macro based approach emitting stable asm! with x86(_64) and ARM backends, falling back to some portable pure Rust on other platforms.

farnz · June 20, 2024, 7:21pm

I suspect you could even set it up so that you have to use a feature flag like "permit-insecure-cryptography-targets" to have it compile on unsupported platforms (with the feature flag completely ignored on targets with a secure backend). That would at least make it opt-in, albeit that (because of feature unification), people could opt-in from anywhere in the dependency tree.

bascule · June 21, 2024, 2:30am

Oh hey, here's an actual implementation of a proc macro-based DSL for x86_64 assembly generation:

binarycat · June 22, 2024, 2:20am

there's also the option of non-inline assembly, although that would require a custom build script and a standalone assembler as an additional build dependancy.

RalfJung · June 25, 2024, 12:27pm

With the current state of backends, I don't think there is anything Rust can do here. Someone has to carry the burden of "manually inspect what LLVM does here since there's nothing else that works". I don't think it is realistic to ask the compiler team to carry that burden -- there are simply too many variables. Crypto code can decide to engage in off-label use of black_box as that is realistically their best option, but it should be clear who is carrying the responsibility for such off-label use: the people writing the crypto code.

To be clear, I would love there to be a better option, I just don't see any short-term way to get there. Having an optimization barrier in Rust that is explicitly intended to provide guarantees for cryptographic use would be a significant step ahead of the current state of the art, I don't know any barrier in any language that can do that. Implementing this is I think just as hard as secret types. It's not something that can be done by changing a comment somewhere.

The black_box comment is deliberately scary because it is a common misconception that black_box can be used to hide UB from the compiler, and we have to make absolutely sure that people understand that this does not work.

I'm afraid I don't think there is anything short of secret types that can provide this.

I don't understand how such a comment would satisfy your request of "approving" black_box for cryptographic use.

newpavlov · June 25, 2024, 1:20pm

What about making black_box wrapper around an empty asm! as proposed here? It's still not a bulletproof solution (e.g. a "smart" backend may in theory see that the asm! block is empty and optimize accordingly), but much better than the current status quo where backends may simply ignore black_box.

bjorn3 · June 25, 2024, 1:56pm

black_box is implemented as an empty asm block in the LLVM backend. Other backends may need to implement it as a no-op if for example they don't support inline asm at all. black_box breaking entirely with those backends is not better than the status quo IMHO.

newpavlov · June 25, 2024, 2:01pm

My point is that such behavior should be considered a deficiency in those backends. If they can not support asm at all, they should return a compilation error on black_box and asm!. You would not expect an asm! block being replaced by a no-op because a backend does not support inline asembly, would you?

bjorn3 · June 25, 2024, 2:05pm

A backend may have a dedicated instruction for black_box support without needing inline asm. It would not be valid to replace an empty asm block with said instruction (inspecting the contents of an asm block is not allowed), yet it can easily support black_box with intended semantics.
We do not guarantee that black_box does anything, so a backend may reasonably want to implement it as no-op to support benchmarking crates like for example criterion.

newpavlov · June 25, 2024, 2:10pm

And I would be happy with it. The empty asm! block suggestion was an answer to how we can model black_box, not how it must be implemented by backends. The potential implementation of black_box with an explicit asm! block is one of the ways to reduce burden mentioned in the Ralf's post above, but alternative backends could replace black_box calls with their own special instruction.

And we go full cirlce yet again... The proposal is to make black_box to guarantee that it's equivalent to an empty "observing" asm! block.

Also, I do not consider replacing black_box with a no-op a proper support of benchmarking crates like criterion. I would prefer to get a compilation error, than to get misleading benchmarking results.

bjorn3 · June 25, 2024, 2:20pm

I thought you meant literally changing the black_box implementation with asm!() to force all backends to never implement it as no-op.

That doesn't provide any meaningful guarantee for crypto purposes. It still allows the compiler to insert a branch on the resulting value. And for an assembly level optimizer (like wasm-opt for WebAssembly (fully compliant with the as-if rule) or LLVM BOLT for x86_64 (doesn't work with self-modifying/self-observing code)) it doesn't provide any optimization barrier at all.

newpavlov · June 25, 2024, 2:29pm

How many times should I mention the all-or-nothing fallacy? I explicitly wrote that I agree that it's not a bulletproof solution, but it's still better than the current status quo. At the very least, it closes the concern of alternative backends ignoring black_box (well, they still could, but it would be the backend's fault).

Assembler-level optimizers are far out of scope for this discussion. Programming languages like Rust simply can not do anything about those and their users should understand potential implications of using them.

farnz · June 25, 2024, 3:27pm

I'm not convinced that it's better than the status quo; if black_box is potentially a no-op, and an empty asm! block is potentially a no-op, with no guarantees in either case (because an implementation that uses LLVM BOLT, or wasm-opt, can ignore the empty block for optimization purposes), then what's the gain of saying that black_box is the same as an empty asm!?

For this to be the "all-or-nothing" fallacy, you need an empty asm! block to provide language-level (not implementation-determined) guarantees that black_box currently doesn't, even if those guarantees are not sufficient to be completely usable for cryptography; but that's not what I'm seeing here.

Rather, I'm seeing that at the language level, the guarantees of an empty asm! block are the same as black_box provides today, and it's just that the main implementation happens to not optimize empty asm! blocks by default - but there's nothing that stops that from changing in the future.

the8472 · June 25, 2024, 3:47pm

I don't see the fallacy. The standard library aims to provide portable abstractions with specified behavior. "It does something for which we don't have formal semantics, that you can't rely on to do what you actually need, on some platforms" is not something we want to write in the documentation because those amount to "check if the compiler output does what you need". In which case you don't need a specification, you just check that the output does what you want. Specifications are needed if you want to sleep soundly at night without having to double-check that each build preserves the desired semantics.

To restate it in a different way: You can even ignore every Safety precondition and write UB as much as you like if you verify the compiler output. All our API contracts are null and void if you're already doing output verification work. They only exist to document which semantics will be preserved in the output without you having to check it.

It follows that all language on black_box is also irrelevant if you do output verification.

Occasionally we do write something about best- or limited-effort behavior. But that's usually about documenting platform bugs (and our workarounds) or drawing lines around things that are out of scope for the standard library but people are trying to do anyway.

It's akin to writing a general note like the following on every single method in the standard library:

This method may have additional behaviors not part of its API contract that you might find useful. If you want to rely on it then QA is on you. Good luck.

Which is a non-statement. So we don't do it.

ryanavella · June 25, 2024, 4:23pm

I can't think of a systems language that provides these guarantees, but I've seen languages with backends/optimizer-suites based on term rewriting systems which are able to provide similar guarantees.

Unfortunately LLVM can't benefit from this prior art, because its internal representations and its "as-if" paradigm are too fundamentally different from a term-rewriting system. I could however imagine LLVM providing a hypothetical partial-guarantee where the user is warned when a specific codegen property is not met.

Topic		Replies	Views
Annotations for zeroing the stack of "sensitive" functions which deal in transient secrets language design	29	4361	April 22, 2020
Rustc copying cryptographic keys onto the stack instead of using them via pointer compiler	29	1679	January 4, 2024
Why is a trusted, feature-complete crypto library not a top priority for the Rust community? libs	19	21185	March 25, 2019
Operational semantics and high-level vs low-level language design	24	2934	July 4, 2020
[Pre-RFC #2]: Inline assembly language design	161	10801	March 15, 2020

Optimization barriers suitable for cryptographic use

Related topics