Optimization barriers suitable for cryptographic use

bascule · June 18, 2024, 9:59pm

In 2014 I gave a talk on cryptography in Rust, highlighting how LLVM can look at cryptographic code which uses bitwise masking in an attempt to be constant-time and decide to insert a branch:

Today, we just published a security advisory for curve25519-dalek where such a problem was occurring:

This is of course not a Rust-specific problem, but more of an LLVM problem (and really, any sufficiently smart optimizing compiler). Some cryptographers recently encountered a similar problem with Clang when compiling the reference implementation of Kyber, a new post quantum key exchange algorithm:

https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/hqbtIGFKIpU/m/cnE3pbueBgAJ

What would be really handy here, short of full-blown secret types which hint to LLVM that they shouldn't be branched upon, is literally any kind of optimization barrier considered "approved" for cryptographic use.

On the Kyber thread, they mitigated the problem using a volatile read, and we used a similar approach for the curve25519-dalek fix, with the idea that compilers won't elide volatile reads.

We defined our own black_box function because core::hint::black_box has a big scary warning on it

Programs cannot rely on black_box for correctness , beyond it behaving as the identity function. As such, it must not be relied upon to control critical program behavior. This immediately precludes any direct use of this function for cryptographic or security purposes.

So that really leaves us with nothing short of volatile reads, and even building our own black_box with it feels like an off-label use. Now the thing is, core::hint::black_box fixes the specific problem of LLVM inserting jns instructions on x86/x86_64 targets, but if we were to use it, someone would probably open an issue pointing to the documentation that it isn't approved for cryptographic use.

Looking at the fix though, it still feels quite brittle and low-level. Someone could easily introduce new code which reads the mask without using black_box, and that could potentially introduce another branch.

I think what might be handy here is a VolatileCell-like type, something like BlackBox<T> (where T: Copy), which has a read(&self) -> T method as the only way to access the inner value, which would prevent the potential bug I just described where the value is accessed without going through black_box.

tl;dr: it sure would be nice to have a first-class API, whatever it looks like, that can prevent this class of bug where branches are inserted on masks for bitwise operations

8573 · June 18, 2024, 10:32pm

Would a reasonable first step be for RustCrypto or dalek-cryptography (maybe in subtle?) to define VolatileCell using your current solution and to apply it, to address this risk immediately and (as a further benefit) to begin the usual prototyping in unofficial crates of the proposed API?

bascule · June 18, 2024, 10:55pm

While we can experiment in subtle, there still is nothing better to actually use to implement it with than off-label usage of volatile reads

scottmcm · June 18, 2024, 11:13pm

I don't think this is possible for rust to offer. Performance isn't considered observable, and while the volatile writes are observable, that still doesn't say anything about whether there's a branch involved -- the sequence of volatile writes could be preserved by just duplicating it inside a branch.

Unless there's a select.constant.time in LLVM for us to lower to (kinda like craneleft's select_spectre_guard, though not exactly) on all the backends, I don't think there's anything we can give you better than "use asm!".

bascule · June 18, 2024, 11:15pm

“Use asm!” doesn’t work for CPU architectures besides x86/x86_64/ARM, at least if you care about supporting stable

newpavlov · June 18, 2024, 11:48pm

I think the most straightforward and quickest solution will be to officially make stronger guarantees for hint::black_box, similar to what we get for an empty "observing" asm! block. It's an already stable API which works more or less as expected in the mainline compiler. IIRC some alternative compilers ignore black_box, but it will be a good motivation for them to fix it and I think we can consider it a minor enough issue.

bascule · June 19, 2024, 12:03am

@newpavlov I would agree, I'm not sure what's wrong with core::hint::black_box or why it has so much scare language in its documentation (I was having a bit of trouble figuring out how it's actually implemented), but it sure would be nice if there were at least one API for optimization barriers which was labeled something other than being totally unsuitable for cryptographic use

newpavlov · June 19, 2024, 12:10am

Here is a relevant Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/black_box.20and.20crypto

I think it's just a case of not wanting to commit to any specific guarantees (which may be also hard to properly specify to boot).

doomrobo · June 19, 2024, 12:12am

Related unmerged RFC from 2020

github.com/rust-lang/rfcs

added secret types rfc

rust-lang:master ← avadacatavra:secret-types

opened 05:42PM - 23 Jan 20 UTC

avadacatavra

+206 -0

### [Rendered](https://github.com/avadacatavra/rfcs/blob/secret-types/text/0000-…secret-types.md) The goal is to provide primitive data types that can be used to traffic in transient secrets in code that are important to not accidentally leak. These new primitives would be used in conjunction with an in-progress LLVM RFC to ensure important security invariants like secret-independent runtime are maintained throughout the compilation process. Note that we explicitly do not want secret_isize and secret_usize, because we do not want to index based on secrets.

bascule · June 19, 2024, 12:29am

@doomrobo yes, I mentioned that earlier, however that RFC was closed and the plans for the backend LLVM work scrapped.

scottmcm · June 19, 2024, 2:00am

That's fixable, though? People interested in other architectures could work on getting them stabilized.

Certainly a clearer path forward than different LLVM semantics, at least.

bascule · June 19, 2024, 2:13am

There are people writing cryptographic code targeting stable who need solutions that work everywhere today.

It seems like the best portable solution currently available is {core, std}::hint::black_box but the docs in their current form are scaring people away from using it.

the8472 · June 19, 2024, 6:59am

The everywhere and everywhen (future compiler versions) breaks this. You can't have your cake and eat it too. Either you validate that the code is generated as expected on every target, compiler version and optimization flag combination or you risk non-constant code on some of those combinations.

The best we could offer in std is "on some platforms and compiler versions this does something. but this might change tomorrow". Which is a non-guarantee. And this non-guarantee is today phrased as

[...] precludes any direct use of this function for cryptographic or security purposes.

I.e. those purposes cannot be achieved with this function in a way that we consider reliable. You're free to use a tool that's not rated fit-for-purpose anyway (as with any tool). But if future changes blow things up that's on you, not on us.

Additionally since the function is currently not rated security critical it means we can just completely disable it or implement it differently (replace asm with atomics, volatile or whatever the platform offers) when the current implementation causes problems. For benchmarks that's not ideal but not breaking. If we have to consider non-benchmark users but on the other hand don't make any guarantees it's difficult to figure out which alternative implementations would still fulfill its purpose.

tczajka · June 19, 2024, 7:38am

I don't see how even fully reliable black_box can guarantee that whatever comes after the black_box is going to be compiled to constant-time operations.

Neutron3529 · June 19, 2024, 8:53am

I have a new idea: What if using AtomicU* with read/write with SeqCst?

Atomic types might be regarded as Volatile.

the8472 · June 19, 2024, 9:20am

Compilers can and do optimize atomics. A stack-local atomic that does not escape can be optimized out.

Atomics do not have volatile semantics.

They might still be usable as part of a weak optimization barrier on some platforms if no other options are available, but they'd be even less reliable than the current approach.

bjorn3 · June 19, 2024, 9:26am

And on single threaded targets atomics are lowered to regular loads and stores.

Neutron3529 · June 19, 2024, 9:29am

I tried

fn main(){
    let a=std::sync::atomic::AtomicU32::new(0);
    println!("started");
    for _ in 0..10_0000_0000{
        a.fetch_add(1,std::sync::atomic::Ordering::SeqCst);
    }
    println!("{a:?}")
}

it takes 8s to finish the printing.

Atomic* uses an UnsafeCell, thus we could read its doc:

If you have a reference &T, then normally in Rust the compiler performs optimizations based on the knowledge that &T points to immutable data. Mutating that data, for example through an alias or by transmuting an &T into an &mut T, is considered undefined behavior. UnsafeCell<T> opts-out of the immutability guarantee for &T

It shows directly that,

UnsafeCell<T> opts-out of the immutability guarantee for &T

This might ensure the compiler will not optimize any read or write ops.

the8472 · June 19, 2024, 9:29am

It does not. It disables noalias.

I tried

This does not prove absence of optimizations in all scenarios, just in the case you tested, on today's compiler, with that particular optimization pipeline.

Valaphee · June 19, 2024, 9:32am

Yea you can only "guarantee" constant time by checking the compilers output (generating and embedding the assembly, and for targets which don't have assembly emitting a warning that the function might not be constant time)

Or having a proper solution backed into LLVM (which doesn't allow branching and non-constant time instructions like div - even though this is somewhat arch/µcode dependent).

The fix for example still uses branching for bit masking on RISC-V (Tier 2 target) Compiler Explorer.

Topic		Replies	Views
Using a custom optimisation pass pipeline compiler	7	3845	March 25, 2019
`freeze(MaybeUninit<T>) -> MaybeUninit<T>` for masked reads language design	32	2415	November 16, 2022
Operational semantics and high-level vs low-level language design	24	2900	July 4, 2020
Rustc copying cryptographic keys onto the stack instead of using them via pointer compiler	29	1644	January 4, 2024
Resolving Rust's forward progress guarantees compiler	53	6830	December 22, 2024

Optimization barriers suitable for cryptographic use

Related topics