Operational semantics and high-level vs low-level

I think the problem is that there is a fundamental tradeoff between properly supporting low-level features like "secure clear" and performance.

For performance, you want your abstract model to be as high level as possible so it is possible to transform the code to make it faster. Meanwhile, caring about stuff like external views of memory or whether instructions execute in constant time essentially makes optimization impossible. At that point, your model is so low level that you're basically writing assembly.

If you're designing the language from scratch, you could introduce some notion of "volatile memory region" which the compiler isn't allowed to copy or optimize, along with appropriate operational semantics, and then any code which isn't using volatile memory can mostly still be optimized. However, it still adds complexity to the compiler, since every single optimization pass has to consider whether the code in question might be using (even indirectly) volatile memory, and some popular optimizations are still ruled out entirely. And it's still possible to accidentally leak the contents of volatile memory when you read it into a variable and that variable gets spilled to the stack. And this is the best possible case, when you design it into the language from the ground up.
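
For comparison, here's roughly what library code has to do today to approximate a secure clear without any language support (a minimal sketch; `SecretBuf` is a made-up name, and the approach is similar in spirit to what the zeroize crate does). The volatile writes keep the compiler from deleting the clear as dead stores, but they can't reach copies the optimizer already made, like a spilled register:

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// A fixed-size buffer that is overwritten with zeros when dropped.
/// Volatile writes stop the compiler from treating the final stores as
/// dead and removing them, but they do nothing about copies the optimizer
/// may already have made (registers, moves, stack spills).
struct SecretBuf([u8; 32]);

impl Drop for SecretBuf {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, exclusive reference for the write.
            unsafe { ptr::write_volatile(byte, 0) };
        }
        // Best-effort barrier against reordering later code before the clears.
        compiler_fence(Ordering::SeqCst);
    }
}

fn main() {
    let key = SecretBuf(*b"0123456789abcdef0123456789abcdef");
    // ... use `key` ...
    drop(key); // volatile-zeroed here, but earlier copies/spills may survive
}
```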


That's not inherently correct. The requirement is to suppress data-dependent timing variance for sensitive data, not to inhibit all optimization. For example, loop-independent computations can still be hoisted out of loops, and expressions that do not involve sensitive data can be optimized as before. The primary need is to eliminate conditional branches based on sensitive data, followed by eliminating indexing/addressing based on sensitive data. (Note that the latter implies crypto algorithms that do not use table lookups, unless the entire lookup table can be pre-loaded and then locked in the level-1 cache.)
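
To make the branch-elimination part concrete, here's the usual library-level workaround today (a sketch only; `ct_select_u32` is a made-up helper, and `black_box` is just a best-effort barrier rather than a guarantee, which is precisely the gap being discussed):

```rust
use core::hint::black_box;

/// Branch-free selection: returns `a` when `choice == 1`, `b` when `choice == 0`.
/// The mask arithmetic avoids a conditional branch on the sensitive `choice`
/// value, but nothing in today's semantics obliges the compiler to keep it
/// branch-free after optimization.
fn ct_select_u32(choice: u8, a: u32, b: u32) -> u32 {
    debug_assert!(choice <= 1);
    // 0xFFFF_FFFF when choice == 1, 0x0000_0000 when choice == 0.
    let mask = (black_box(choice) as u32).wrapping_neg();
    (a & mask) | (b & !mask)
}

fn main() {
    assert_eq!(ct_select_u32(1, 7, 9), 7);
    assert_eq!(ct_select_u32(0, 7, 9), 9);
}
```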

A reasonable approach would be to declare which variables and struct fields are sensitive, to track which objects, including hardware registers and temporaries, are "infected" with sensitive data via computation and assignment, and to zero those objects when they are dropped or go out of scope unless the compiler can prove that they (e.g., registers) are reused either A) in the same basic block or B) in all successor chains of basic blocks. I suspect that this latter optimization would be implemented by inserting such clearing code everywhere it is potentially needed, then removing it in a late compilation pass only when A) or B) is observed.
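
As a rough illustration of the user-facing half of that idea, here's a library-level sketch (the `Sensitive` wrapper is hypothetical; the register/temporary tracking and the zeroing described above would have to happen inside the compiler, where a wrapper type can't reach):

```rust
use core::ops::Add;

/// Values derived from sensitive data stay wrapped, so sensitivity
/// ("infection") propagates through computation. Tracking of hardware
/// registers, temporaries, and stack spills cannot be done at this level.
#[derive(Clone, Copy)]
struct Sensitive<T>(T);

impl<T: Add<Output = T>> Add for Sensitive<T> {
    type Output = Sensitive<T>;
    fn add(self, rhs: Sensitive<T>) -> Sensitive<T> {
        Sensitive(self.0 + rhs.0)
    }
}

impl<T: Add<Output = T>> Add<T> for Sensitive<T> {
    type Output = Sensitive<T>;
    fn add(self, rhs: T) -> Sensitive<T> {
        // Mixing sensitive and non-sensitive operands infects the result.
        Sensitive(self.0 + rhs)
    }
}

fn main() {
    let key_byte = Sensitive(0x2au8);
    let derived = key_byte + 1u8; // still Sensitive<u8>, still tracked
    let _ = derived;
}
```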

That doesn't quite work, since some amount of "infection" is desirable. It would be kind of sad if, once you were done using a secret key to sign a message, the compiler helpfully zeros out not only the key, but the signature as well!

True, which implies that there needs to be a pair of intrinsics, each the inverse of the other: infect and disinfect, or protect and unprotect. Note that I do not use classify and declassify, as those two terms have very specific meanings for at least those of us who have handled such information in a government context.
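
In sketch form, assuming a hypothetical `Sensitive` wrapper like the one above, the pair might look like this (the names and signatures are placeholders, not anything that exists in the language or standard library today):

```rust
struct Sensitive<T>(T);

/// Marks a value as sensitive, so everything computed from it is tracked
/// and eventually cleared.
fn protect<T>(value: T) -> Sensitive<T> {
    Sensitive(value)
}

/// Explicitly releases a value from tracking, for results that are meant
/// to be public even though they were computed from sensitive inputs.
fn unprotect<T>(value: Sensitive<T>) -> T {
    value.0
}

// Stand-in for a real signing routine: it consumes sensitive input, so its
// result comes back still wrapped.
fn sign(_key: &Sensitive<[u8; 32]>) -> Sensitive<[u8; 64]> {
    Sensitive([0u8; 64])
}

fn main() {
    let key = protect([0u8; 32]);
    let signature = sign(&key);
    let public_signature = unprotect(signature); // deliberately de-infected
    let _ = public_signature;
}
```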
