Annotations for zeroing the stack of "sensitive" functions which deal in transient secrets

At the latest High Assurance Cryptography Workshop we discussed a number of ways to improve assurances for cryptographic software written in Rust. One of the ideas that came out of those discussions was a way to annotate functions which deal in transient secrets, e.g. a #[sensitive] attribute:

#[sensitive]
pub fn secret_key_op(key: &Key, plaintext: &[u8]) -> Vec<u8> {
    [...]
}

There are several ways an attribute like this could affect the execution of that particular function, possibly with some (forthcoming) LLVM features, but the general idea is that after a call from a non-sensitive function into a sensitive function completes, some sort of zeroization of the stack frames (and/or registers) involved occurs.

I have heard several suggestions for how this could happen, such as:

  • Separate stack for functions which deal in transient secrets (possibly living on pages that get zeroized afterward). This could potentially even ensure that the stack from further calls into e.g. FFI libraries is also zeroized upon completion.
  • Method of calculating the maximum stack depth for the resulting call graph, and zeroing everything that may have potentially been used as stack.
  • On fancy "every word tagged" memory architectures (e.g. lowRISC), annotating the memory where the stack frames are stored with their equivalent of a "sensitive" bit.
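To make the second bullet a bit more concrete, here is a minimal sketch of bounding the stack depth of a call graph. Everything in it is an assumption for illustration: the `FnInfo` type, the function names, and the frame sizes are made up, and a real compiler would take this data from its own codegen rather than a hand-built map. Note that recursion or an unknown callee (FFI, function pointers) means no static bound exists, so the sketch returns `None` in those cases.

```rust
use std::collections::HashMap;

/// Hypothetical per-function data: frame size in bytes and direct callees.
struct FnInfo {
    frame_bytes: usize,
    callees: Vec<&'static str>,
}

/// Worst-case stack bytes used by `name` plus everything it can call.
/// Returns `None` if the graph is recursive or a callee is unknown,
/// because no static bound can be computed in that case.
fn max_stack_depth(
    graph: &HashMap<&'static str, FnInfo>,
    name: &'static str,
    on_path: &mut Vec<&'static str>,
) -> Option<usize> {
    if on_path.contains(&name) {
        return None; // recursion: depth is unbounded
    }
    let info = graph.get(name)?; // unknown callee: give up
    on_path.push(name);
    let mut deepest = 0;
    for &callee in &info.callees {
        match max_stack_depth(graph, callee, on_path) {
            Some(depth) => deepest = deepest.max(depth),
            None => {
                on_path.pop();
                return None;
            }
        }
    }
    on_path.pop();
    Some(info.frame_bytes + deepest)
}

/// Made-up call graph rooted at a sensitive function.
fn example_graph() -> HashMap<&'static str, FnInfo> {
    let mut g = HashMap::new();
    g.insert("secret_key_op", FnInfo { frame_bytes: 128, callees: vec!["expand_key", "encrypt_block"] });
    g.insert("expand_key", FnInfo { frame_bytes: 256, callees: vec![] });
    g.insert("encrypt_block", FnInfo { frame_bytes: 64, callees: vec!["sbox"] });
    g.insert("sbox", FnInfo { frame_bytes: 32, callees: vec![] });
    g
}
```

With this made-up graph, the deepest path is `secret_key_op` into `expand_key`, so the region to wipe after the call returns would be 128 + 256 = 384 bytes.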

Even in the absence of a specific method for zeroing the stack, I think something like a #[sensitive] attribute would be helpful in signaling to things like taint analysis tooling or other static analysis tools that a sensitive operation is occurring.

9 Likes

Have you thought about e.g. Cranelift?

Are you seeking language level guarantees here, or is it quality of implementation best-effort? If the former, this would need to be specified / specifiable via operational semantics (cc @RalfJung).

This could be a crate?

I think now is definitely the time to think about these things for Cranelift. This proposal and/or a forthcoming secret integers proposal will be "straightforward" to add to LLVM, but also a huge amount of work (a multi-year effort), as at least the latter touches practically everything. I think the earlier we get people thinking about these sorts of features, the better.

Ideally yes. The goal would be that functions annotated this way are assured to zeroize their stacks.

FWIW I'm the author of the zeroize crate, which has done its best to solve this problem at the crate level.

I think zeroize is great for things like zero-on-Drop handlers for key types, which wipe persistent secrets from memory when they're no longer in use.

I have sincere doubts about using it for transient secrets though, and its documentation explicitly calls out problems like clearing the stack/registers which can't be solved by a crate with any sort of assurances, which is exactly why I think language-level (and LLVM/Cranelift/etc-level) support is needed.
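For anyone unfamiliar with the crate-level approach, here is a rough sketch of a zero-on-Drop handler using only the standard library. It is written in the general spirit of that approach and is not the zeroize crate's actual code; the `Key` type and the `wipe` helper are hypothetical names. It also shows the limitation mentioned above: it only wipes the one buffer it is handed, and says nothing about copies the compiler may have left in other stack slots or registers.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite `buf` with zeros using volatile writes, so the stores are not
/// removed as dead code even though the buffer is about to go away.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Keep the compiler from reordering later accesses before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical key type that wipes its bytes when dropped.
struct Key {
    bytes: [u8; 32],
}

impl Drop for Key {
    fn drop(&mut self) {
        // Only this copy is wiped; moved-from copies, spilled registers,
        // and temporaries elsewhere on the stack are untouched.
        wipe(&mut self.bytes);
    }
}
```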

6 Likes

The problem with specifying a feature like this is that none of it is about observable effects (from within the memory model of the language). It is only about implementation details/side effects: what is left behind on the machine stack after the language is done with it.

So I think the language would say "this is an instruction to the compiler to treat the stack data as sensitive and not persist it longer than necessary," and it would be on the implementation to decide how to avoid the persistence of the data. On a system where it is impossible to read old data (I don't know, NaT flags?), it would be valid to not zeroize anything, so long as the data cannot be read, even by exploiting the worst of UB and platform-defined behavior.

3 Likes

I'm genuinely unclear what's even being proposed here. AFAIK there's never been any question that these are valuable use cases, but there's also never been any plausible suggestion for how to specify and/or implement properties like "sensitive function", "must zero-init", "must zero on drop", "constant-time execution" and so on in a portable way. Thus, my impression has always been that these desires can only be met with handwritten assembly for each target platform (and that even this is only possible for some operations on some platforms).

Has any of that changed? Did I miss a novel proposal in one of the posts here?

1 Like

To summarize an out-of-band discussion with @Centril:

  • If this does not have implications for the Rust Virtual Machine (in other words: has no observable impact (other than timing) from Rust code that does not cause UB), it is not something that should or even can be specified/guaranteed by the language.
  • Instead, it'd be something like #[inline] or #[optimize(..)], where it's just a hint to the compiler implementation that doesn't change observable behavior (other than timing).
  • Even so, this implicitly feels like it should be behavior provided by a crate utilizing ASM to get specifically the desired behavior from the target machine. By caring about dead memory you're explicitly leaving the high-level abstraction of Rust and entering the machine-specific world of ASM and implementation details.
  • And on top of that, these security-in-depth features focusing on "attacker on the same machine" style attacks (e.g. timing, stolen memory, etc.) are very high-investment low-reward and there are many more high-yield tasks that the language and compiler teams should focus on first, including (but definitely not limited to) paying down technical debt.

(Usual disclaimer: this summary is my interpretation, not Centril's, though his thoughts did go into the discussion this is summarizing.)

4 Likes

It's a fair summary. :slight_smile:

For normal stack slots this would just be some extra code in cg_clif. For register spill stack slots it would require a Cranelift pass to run after regalloc has created register spill slots. For registers there may need to be a new instruction to zero a specific register, or the Cranelift pass can add iconst/fconst/vconst instructions with specific registers for their SSA values after DCE has run. I also assume that the flags register will need to be cleared (new instruction). All together it should not be too hard to add this to Cranelift.

Edit: opened https://github.com/bytecodealliance/cranelift/issues/1327

1 Like

It's also unclear how such a feature would interact with signals, SEH, unwinding, etc.

2 Likes

@Diggsey brought up what I was thinking about from the start; if you're trying to protect against other processes probing while your process is running, then you're going to have headaches, e.g., debuggers. So, before really delving into whether or not this is needed, @bascule, could you please give us a model of what you're protecting against? That is, what sorts of attacks is the attacker capable of doing, and what are outside of the security model? If someone has low-level hardware access and is able to log every single memory access, then #[sensitive] won't really protect you, so clearly this is aimed at something somewhat higher level than that.

2 Likes

The threat is accidental disclosure of transient secrets used in cryptographic applications. There are several ways that could manifest, e.g. "Heartbleed".

The goal is defense in depth. Yes, there are a million other ways to achieve defense in depth, like isolating this code in a separate process, a hardware device, etc. I consider those mechanisms also nice to have and am also working on Rust on a hardware device for key storage as it were.

However, I consider those mechanisms orthogonal and complementary to something like #[sensitive]. If it's possible to zero data left over from a stack containing transient secrets, I think it should be done.

Also note that this sort of zeroization is a requirement for certain types of high assurance cryptography certifications.

5 Likes

I'm wondering if bleaching the stack is the correct primitive you want here... I hear a lot of noise in the "secret integers" direction. Do you have a stronger preference towards stack bleaching versus "this integer's contents must be zeroed once that data's lifetime is over" (whatever that means)? I get the impression that #[sensitive] does not really cover scrubbing callers, or architectural registers.

I would also think that whatever feature this winds up becoming should be target-guarded, like atomics... exotic targets, like the JVM, can't really satisfy this assurance. Such an assurance needs to be guaranteed to be of any cryptographic value, and it needs to be a compile-time error to try to use this on a target where the assurances do not hold.
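As a rough illustration of what target-guarding could look like with today's tools, here is a sketch using `cfg` and `compile_error!`. The allow-list of architectures is purely an assumption (nobody in this thread has said which targets could uphold the assurance), and the `sensitive` module and `xor_mask` function are hypothetical placeholders for whatever the real API would be.

```rust
// Hypothetical allow-list: architectures where the implementation is
// assumed to be able to wipe the stack. The API only exists on these.
#[cfg(any(
    target_arch = "x86_64",
    target_arch = "aarch64",
    target_arch = "riscv64"
))]
pub mod sensitive {
    /// Placeholder for an operation that would carry the assurance.
    pub fn xor_mask(data: &mut [u8], mask: u8) {
        for byte in data.iter_mut() {
            *byte ^= mask;
        }
    }
}

// On any other target, fail the build with a clear message instead of
// silently compiling code that cannot uphold the assurance. Dropping this
// item would instead make only *uses* of `sensitive` fail to compile.
#[cfg(not(any(
    target_arch = "x86_64",
    target_arch = "aarch64",
    target_arch = "riscv64"
)))]
compile_error!("stack zeroization is not available on this target");
```

A built-in feature could presumably expose its own `cfg` key for this, similar to how atomics are gated, but that is speculation on top of the suggestion above.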

4 Likes

It's an alternative worth considering with debatable tradeoffs.

One advantage of wiping the stack is it could potentially also apply to FFI calls.

Zeroize on drop for secret integers (if that's what you're suggesting) could also add unnecessary performance overhead if it happens at the level of every stack frame, versus a "wipe upon completion" approach.

The goal with callers is for them to only deal in "non-sensitive" values.

I'd hope scrubbing registers could also be "on the table" but it matters less to me than persistently leaving secrets in memory.

I'm not sure I love the idea of making guarantees about FFI... mostly because off-stack allocations mean you lose immediately, especially if you're FFI'ing with something like Go.

That's fair, I guess, though I would hope that the compiler is smart enough to figure out that the ABI of "take a secret integer as argument" is equivalent to "I expect the caller to zero my stack for me". What I mean to say is that secret integers can be implemented by stack bleaching, but perhaps provide a more precise knob for the user to turn.

Perhaps related: secret integers can also be used as an optimization barrier (e.g., any expression involving secret integers would somehow be exempt from constant propagation and GVN). I think this requires backend support, but I have a suspicion that LLVM may grow something like this soon. I think that such an optimization barrier wants to go hand-in-hand with secure destruction, but that's mostly because 99% of the use of secret integers is to hold key material.

What about using a library like libfringe to allocate a temporary stack, run your sensitive function on that stack, and then manually zero the stack once it returns?

That's the same implementation strategy covered in the first bullet point in the OP, but I think it'd be nice if it were a first-class feature that worked portably on stable Rust, as opposed to relying on a nightly-only library with architecture-specific assembly.

I realized I didn't address this part, either... if the primitive is "mark this stack as sensitive material," I'm not sure what the appropriate story is for handling key material throughout the lifetime of the application, in which we can avoid tacking on "secretness" to the key type... perhaps that's an orthogonal hardening measure, but I'm not sure. The impression I get from chatting with my cryptographer counterparts is that we're still fuzzy on the semantics we want...

1 Like

That would be a persistent/long-lived secret as opposed to a transient secret, so yes, it's explicitly out of scope for a proposal like this, which per the title is intended to cover transient secrets only.

That said, there are several options for that:

  • Make such secrets transient by outsourcing long-term storage to something which keywraps them, e.g. an OS "keychain" service, a (e.g. cloud) KMS, a TPM/HSM/SEP, or failing any of those options a bespoke "agent" process
  • Use memory protection. There are a number of crates which handle this in-and-of-themselves.
  • Use wrapper types which try to prevent accidental exposure. The secrecy crate (which clears the secret from memory using the zeroize crate) provides such wrapper types and also traits and trait impls intended to prevent accidental exposure.
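To illustrate the wrapper-type option in the last bullet, here is a minimal sketch of the idea using only the standard library. The `Secret` type and its `new`/`expose` methods are hypothetical names chosen for this example and are not the secrecy crate's API. The sketch only covers accidental exposure through `Debug` output; it does no wiping.

```rust
use std::fmt;

/// Hypothetical wrapper that hides its contents from accidental logging.
struct Secret<T>(T);

impl<T> Secret<T> {
    fn new(value: T) -> Self {
        Secret(value)
    }

    /// The only way to reach the inner value, which makes every access
    /// explicit and easy to search for in code review.
    fn expose(&self) -> &T {
        &self.0
    }
}

// Deliberately not derived: a derived Debug would print the secret.
impl<T> fmt::Debug for Secret<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}
```

Because the wrapper implements neither `Display` nor `Clone`, printing or duplicating the secret requires an explicit call to `expose`.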

3 Likes

9 posts were split to a new topic: Operational semantics and high-level vs low-level

You almost certainly can't, for the same reasons that you can't meaningfully do so for volatile access. All "optimization barrier" features necessarily have this problem.

At some point, you have to give up and acknowledge the silicon you execute on.

5 Likes