Annotations for zeroing the stack of "sensitive" functions which deal in transient secrets

The threat is accidental disclosure of transient secrets used in cryptographic applications. There are several ways that could manifest, e.g. "Heartbleed".

The goal is defense in depth. Yes, there are a million other ways to achieve defense in depth, like isolating this code in a separate process, a hardware device, etc. I consider those mechanisms nice to have as well, and I am in fact also working on Rust on a hardware device for key storage.

However, I consider those mechanisms orthogonal and complementary to something like #[sensitive]. If it's possible to zero data left over from a stack containing transient secrets, I think it should be done.
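
To make the shape of the proposal concrete, here is a sketch of what usage might look like. This is purely illustrative proposed syntax: nothing below compiles today, and `derive_session_key` is a made-up function.

```rust
// Hypothetical surface syntax only -- the #[sensitive] attribute does not
// exist today. The intent: once this function returns, the stack memory it
// (and its callees) used for transient key material is wiped before the
// caller continues.
#[sensitive]
fn derive_session_key(master: &[u8; 32], nonce: &[u8; 12]) -> [u8; 32] {
    // ...transient KDF state lives on this stack and in callee frames...
    let mut out = [0u8; 32];
    out[..12].copy_from_slice(nonce); // placeholder for a real KDF
    let _ = master;
    out
}
```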

Also note that this sort of zeroization is a requirement for certain types of high assurance cryptography certifications.

5 Likes

I'm wondering if bleaching the stack is the correct primitive you want here... I hear a lot of noise in the "secret integers" direction. Do you have a stronger preference towards stack bleaching versus "this integer's contents must be zeroed once that data's lifetime is over" (whatever that means)? I get the impression that #[sensitive] does not really cover scrubbing callers, or architectural registers.

I would also think that whatever feature this winds up becoming should be target-guarded, like atomics... exotic targets, like the JVM, can't really satisfy this assurance. Such an assurance needs to be a guarantee to have any cryptographic value, and it needs to be a compile-time error to try to use this on a target where the assurances do not hold.

5 Likes

It's an alternative worth considering with debatable tradeoffs.

One advantage of wiping the stack is it could potentially also apply to FFI calls.

Zeroize on drop for secret integers (if that's what you're suggesting) could also add unnecessary performance overhead if it happens at the level of every stack frame, versus a "wipe upon completion" approach.
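
For contrast, here is a minimal sketch of the per-value zeroize-on-drop approach (a hypothetical `SecretU64`, roughly what the zeroize crate's derive generates): every drop of every such value pays the wipe cost, instead of a single wipe when the sensitive frame is torn down.

```rust
use core::ptr;

/// Hypothetical per-value "secret integer": wiped when dropped.
struct SecretU64(u64);

impl Drop for SecretU64 {
    fn drop(&mut self) {
        // Volatile write so the compiler can't discard this as a dead store.
        unsafe { ptr::write_volatile(&mut self.0, 0) };
    }
}
```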

The goal with callers is for them to only deal in "non-sensitive" values.

I'd hope scrubbing registers could also be "on the table" but it matters less to me than persistently leaving secrets in memory.

I'm not sure I love the idea of making guarantees about FFI... mostly because off-stack allocations mean you lose immediately, especially if you're FFI'ing with something like Go.

That's fair, I guess, though I would hope that the compiler is smart enough to figure out that the ABI of "take a secret integer as argument" is equivalent to "I expect the caller to zero my stack for me". What I mean to say is that secret integers can be implemented by stack bleaching, but perhaps provide a more precise knob for the user to turn.

Perhaps related: secret integers can also be used as an optimization barrier (e.g., any expression involving secret integers would somehow be exempt from constant propagation and GVN). I think this requires backend support, but I have a suspicion that LLVM may grow something like this soon. I think that such an optimization barrier wants to go hand-in-hand with secure destruction, but that's mostly because 99% of the use of secret integers is to hold key material.
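
As a rough illustration of the kind of knob this is, here is a sketch using `std::hint::black_box`, which is only a hint (see the bench_black_box discussion further down) rather than a guarantee; a built-in secret integer type would want the real thing.

```rust
use std::hint::black_box;

// Constant-time select: without some barrier the optimizer is free to
// constant-propagate or branch on `bit`. black_box merely *hints* that it
// should not; it is not a guarantee.
fn ct_select(bit: u64, a: u64, b: u64) -> u64 {
    // all-ones mask when bit == 1, all-zeros when bit == 0
    let mask = black_box(bit.wrapping_neg());
    (a & mask) | (b & !mask)
}
```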

What about using a library like libfringe to allocate a temporary stack, run your sensitive function on that stack, and then manually zero the stack once it returns?

That's the same implementation strategy covered in the first bullet point in the OP, but I think it'd be nice if it were a first-class feature that worked portably on stable Rust, as opposed to relying on a nightly-only library with architecture-specific assembly.

I realized I didn't address this part, either... if the primitive is "mark this stack as sensitive material," I'm not sure what the appropriate story is for handling key material throughout the lifetime of the application in a way that avoids tacking "secretness" onto the key type... perhaps that's an orthogonal hardening measure, but I'm not sure. The impression I get from chatting with my cryptographer counterparts is that we're still fuzzy on the semantics we want...

1 Like

That would be a persistent/long-lived secret as opposed to a transient secret, so yes, it's explicitly out of scope for a proposal like this, which per the title is intended to cover transient secrets only.

That said, there are several options for that:

  • Make such secrets transient by outsourcing long-term storage to something which keywraps them, e.g. an OS "keychain" service, a (e.g. cloud) KMS, a TPM/HSM/SEP, or failing any of those options a bespoke "agent" process
  • Use memory protection. There are a number of crates which handle this in-and-of-themselves.
  • Use wrapper types which try to prevent accidental exposure. The secrecy crate (which clears the secret from memory using the zeroize crate) provides such wrapper types, along with traits and trait impls intended to prevent accidental exposure (see the sketch after this list).
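
A minimal sketch of that wrapper-type approach, assuming the secrecy crate's `Secret`/`ExposeSecret` API (0.8-style) and the zeroize crate; the helper functions are placeholders so the example is self-contained:

```rust
use secrecy::{ExposeSecret, Secret};
use zeroize::Zeroize;

fn main() {
    // Wrapper type: no accidental Debug/Display leakage, wiped on drop.
    let api_key = Secret::new(String::from("hunter2"));
    // Access is explicit, so accidental exposure is easy to grep for.
    do_request(api_key.expose_secret());

    // Plain zeroize: wipe a transient buffer in place once it's no longer needed.
    let mut session_key = [0u8; 32];
    derive(&mut session_key);
    session_key.zeroize();
}

// Placeholder stand-ins for real request/KDF functions.
fn do_request(_key: &str) {}
fn derive(_out: &mut [u8; 32]) {}
```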
3 Likes

9 posts were split to a new topic: Operational semantics and high-level vs low-level

You almost certainly can't, for the same reasons that you can't meaningfully do so for volatile access. All "optimization barrier" features necessarily have this problem.

At some point, you have to give up and acknowledge the silicon you execute on.

5 Likes

Which is why, e.g., bench_black_box was accepted as a hint and not a language-level guarantee. As long as things cannot be specified operationally, I'm not in favor of providing such guarantees.

Which silicon is that specifically? For architecture-specific intrinsics, to the extent we provide guarantees, we only do so on that architecture. Beyond that, when guarantees rather than something best-effort are sought, I think the right level at which to make that acknowledgement is not a high-level general-purpose language like Rust.

Do you think volatile access should not be a language level guarantee?

My understanding from @RalfJung's notes re. "externally observable events" is that this is specifiable operationally:

2 Likes

This may be the real disconnect: @Centril does seem to disagree, and suggests we exclude some tools and scenarios because they are hard or impossible to specify this way. That suggestion is what gets this kind of pushback, and arguments about "systems programming."

2 Likes

If there are concrete questions about how a feature like this relates to e.g. the Rust Abstract Machine or otherwise, I am probably the wrong person to ask, but if people who are curious or skeptical about a feature like this can put together a concrete list of such questions, I know there are people who understand them better than I do and who are interested in making this proposal more concrete.

(Specifically some of them are LLVM developers who want to work on a similar feature for C++. I don't want this feature to be simply "LLVM errata", but if we can specify things correctly I think we can potentially achieve this feature in a way which works seamlessly across Rust and C++/"C compiled as C++")

3 Likes

This is a fun argument to have, but this is not the right place to be having it.

For other features, ranging from volatile and asm to boring old FFI, I think it's possible to formally specify their behavior, but I don't know if you can call the manner of specification "operational". You have to define a mapping between Rust Abstract Machine and the lower-level process state, and say that at certain points they're required to "sync up" to a certain extent.

For example, suppose you write some data to a buffer, then pass that buffer as an argument to a system call. The kernel obviously has to be able to read the data you wrote to the buffer. But it doesn't know about the Rust Abstract Machine; it has its own, lower-level model of process memory*. For the Rust program to behave correctly, the compiler needs to guarantee that when you perform the FFI call, the lower-level memory state contains data at the buffer's address corresponding to the data you wrote there within the Abstract Machine. The abstract and lower-level states aren't always the same; they can diverge due to compiler optimizations such as reordering writes or eliminating redundant ones. But they have to sync up when you perform the call.
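
A small example of that sync-up point, assuming a Unix-like target where write(2) is available:

```rust
// The callee (the kernel, reached via libc's `write`) is outside the Rust
// Abstract Machine, so by the time this call happens the compiler must have
// made the bytes written in the Abstract Machine actually present in the
// process's memory at `buf`'s address.
extern "C" {
    fn write(fd: i32, buf: *const u8, count: usize) -> isize;
}

fn main() {
    let mut buf = [0u8; 5];
    buf.copy_from_slice(b"hello");
    unsafe {
        write(1, buf.as_ptr(), buf.len()); // fd 1 = stdout
    }
}
```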

But none of that matters in this case, because buffer clearing – at least in the form of C's memset_s, the proposed secure_clear for C++ that was mentioned, or any of the nonportable ways that C/C++ programs often do it in practice – is only best-effort anyway. That means we simply don't need to worry too much about precise guarantees.

Why is it best-effort? In C, if you call memset_s or secure_clear, the compiler may be forced to clear out some buffer in the lower-level memory state. But:

  • Any previous operations that operated on the buffer may have left traces of the data in registers, on the stack, etc. This isn't just a theoretical concern; in fact it almost always happens. Usually only parts of the buffer are leaked, e.g. the most recently accessed word or byte, rather than the entire thing; and usually the relevant registers or stack locations will be clobbered by other data eventually. But there are no guarantees.

  • On the more theoretical side of things, the compiler is generally within its rights to make multiple copies of the entire buffer, especially (but not only) if it's stored in a local variable whose address never escapes. In other words, a single buffer in the Rust Abstract Machine state can correspond to multiple buffers in lower-level state, and the memset will only clear out one of them. It's just that this is not usually a profitable optimization.

Despite those objections, best-effort buffer clearing is still a useful operation, because in practice it usually does reduce the amount of sensitive information left in memory. I think Rust should support it, but it should be clearly documented as best-effort.
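
In Rust, that best-effort operation is typically spelled with volatile writes (this is essentially what the zeroize crate does under the hood); a minimal sketch:

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Best-effort clearing of one buffer. The volatile writes keep the stores
/// from being eliminated as dead, and the fence discourages reordering --
/// but per the caveats above, this says nothing about copies in registers,
/// spill slots, or other stack frames.
fn clear(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        unsafe { ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```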

Alternatively, clearing the entire stack range rather than just one buffer – as has been discussed – would, I think, solve most of the practical leaks, but it's still playing with fire. If we implemented that natively in Rust (which it doesn't really need to be; it works fine as a library feature), I'd still be reluctant to call it more than best-effort, even if I were talking purely about 'how the implementation works today' as opposed to 'what we can guarantee forever'.
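
The library-feature version is usually some variant of the following sketch: after the sensitive call returns, deliberately grow a new frame over the now-dead stack region and overwrite it. Frame layout, inlining, and red zones are exactly why this stays best-effort; `STACK_SCRUB_BYTES` is an arbitrary guess at how much stack the sensitive code used.

```rust
use core::ptr;

const STACK_SCRUB_BYTES: usize = 16 * 1024;

/// Best-effort "stack bleaching" after a sensitive call has returned:
/// allocate a large scratch array where the callee's dead frames likely
/// were, and overwrite it with volatile stores.
#[inline(never)]
fn scrub_stack() {
    let mut scratch = [0u8; STACK_SCRUB_BYTES];
    for b in scratch.iter_mut() {
        unsafe { ptr::write_volatile(b, 0) };
    }
}
```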

Now, if someone comes up with a design, based on Cranelift or something, that precisely tracks where sensitive data is stored and can truly guarantee when it's gone... that would be awesome, and much more elegant. Though even that wouldn't be perfect: even if the data is gone from userland's view of memory, it may be around in state only the kernel can see, e.g. swap files. Regardless, the delta from here to there is a lack of implementation, not philosophical questions about how low-level Rust is.

* Though note that the kernel's view of process memory is still several abstraction layers away from "the hardware", such as virtual memory, CPU caches, etc.

5 Likes

To be perfectly clear:

@Centril, as per our out-of-band discussion, is fine with a best-effort hint with the proposed semantics. It is the specified guarantee that they take issue with.

Agreed, I was going to post the same. :slight_smile: Adding a best-effort hint without guarantees is totally fine.

Of course, getting actual guarantees is a really interesting problem, but I think that's just further out there and doesn't have to block a best-effort attempt.

4 Likes

FYI, a related RFC for adding a set of secret integer types is now up:

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.