Safe transmute for slices; e.g. &[u64] -> &[u32], particularly SIMD types

Very frequently in cryptography, we have to do calculations on data as though the data is &[u64] or &[u32] and then return the results as &[u8]. For example, SHA-256 is defined to operate on &[u32] data and SHA-512 is defined to operate on &[u64] data, but applications are always going to treat the result as &[u8].

I think that many uses of SIMD have similar needs, which is why the SIMD RFC defines simd_insert and simd_extract.

In my code, I use std::slice::from_raw_parts for this:

pub struct Digest {
    value: [u64; MAX_OUTPUT_LEN / 8],
    algorithm: &'static Algorithm,
}

impl AsRef<[u8]> for Digest {
    fn as_ref(&self) -> &[u8] {
        unsafe {
            std::slice::from_raw_parts(self.value.as_ptr() as *const u8,
                                       self.algorithm.output_len)
        }
    }
}

One of my goals in my Rust crypto code is to eliminate all uses of unsafe in it without compromising performance. Consequently, I am interested in extending Rust’s core library (and thus std) to expose non-unsafe alternatives to code that currently requires the use of unsafe.

There was a similar discussion back in July: Pre-RFC: Explicit Opt-in OIBIT for truly POD data and safe transmutes. Unfortunately, it seems like momentum on that has stopped.

One thing to note in particular is that in C/C++ such conversions are very tricky to get right because of the “strict aliasing rule”, which says that the compiler can assume that pointers to unrelated types don’t alias each other, with an exception for char/unsigned char. Rust doesn’t have the same rule as C/C++; see https://doc.rust-lang.org/reference.html#behavior-considered-undefined, in particular the bit about “breaking the pointer aliasing rules.”

Thus, AFAICT, it would be 100% safe to cast &[u64] to &[u32] to &[u16] to &[u8], and also it would be safe to cast slices of SIMD types similarly, if Rust exposed a non-unsafe API for doing so.
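
To make the proposal concrete, here is a minimal sketch of what such non-unsafe entry points could look like. The function names are hypothetical (nothing like this is being quoted from core or std), and the unsafe is simply moved inside the helper so that callers never write it. The sketch assumes the integer types have no padding and that u64 is at least as strictly aligned as u32, which the assertion checks.

```rust
use std::mem;
use std::slice;

/// Hypothetical helper (not an existing `core`/`std` API): views a `&[u64]`
/// as the bytes that back it, in native byte order.
pub fn u64_slice_as_bytes(words: &[u64]) -> &[u8] {
    // `u8` has alignment 1 and accepts every bit pattern, and the returned
    // borrow has the same lifetime as the input, so no aliasing rule is broken.
    unsafe { slice::from_raw_parts(words.as_ptr() as *const u8, mem::size_of_val(words)) }
}

/// Hypothetical helper: views a `&[u64]` as twice as many `u32`s.
pub fn u64_slice_as_u32s(words: &[u64]) -> &[u32] {
    // Assumption made explicit: going from the larger type to the smaller one
    // never needs stricter alignment than the input already has.
    assert!(mem::align_of::<u64>() >= mem::align_of::<u32>());
    unsafe { slice::from_raw_parts(words.as_ptr() as *const u32, words.len() * 2) }
}
```

Note that the order of the two u32 halves of each u64, and of the bytes within them, depends on the platform's endianness; only the lengths and the memory safety are platform-independent.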

Questions:

  1. Does anybody disagree that Rust’s aliasing rules allow such conversions?

  2. Does anybody think it would be a bad idea to add an API for doing such casts without using unsafe to core (and thus std)?

  3. Are there existing crates that could serve as the basis for an RFC to add these conversions to core?

  4. The Pre-RFC I linked to above seemed to be hoping to add a quite generic interface for such conversions. However, I think it may be better to define something simpler just for slices of u8, u16, u32, u64, and SIMD types. Does anybody strongly disagree?

This would go a long way towards enabling safe (as in no use of unsafe) cryptography in Rust. (Ignore the lack of side-channel-resistant operations for now. That’s a topic for another day.)


Rust doesn’t apply TBAA, and there aren’t any concrete plans to land it (with @nikomatsakis notably being quite unhappy with TBAA in general). That said, it wouldn’t be the worst to leave TBAA on the table.

Would there be any problem with using the standard conversion traits for this sort of thing? Seems like a classic case for from/into/as_ref/whatev.

let data: &[u8] = u32_slice.into();

Not super readable, I guess.

Also, I should mention that the conversion can only safely go from larger types to smaller (or same-sized&aligned) types, not smaller types to larger types, to ensure proper alignment.
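
As a rough illustration of why the small-to-large direction cannot be unconditional, here is a sketch of a fallible version. The name try_bytes_as_u32s is hypothetical; the point is only that the cast has to be refused when the byte slice is not suitably aligned or not a whole number of words long.

```rust
use std::mem;
use std::slice;

/// Hypothetical helper: reinterprets `bytes` as `&[u32]` only when that is
/// valid, i.e. the start is aligned for `u32` and the length is a whole
/// number of `u32`s. Otherwise it returns `None`.
pub fn try_bytes_as_u32s(bytes: &[u8]) -> Option<&[u32]> {
    let aligned = (bytes.as_ptr() as usize) % mem::align_of::<u32>() == 0;
    let whole = bytes.len() % mem::size_of::<u32>() == 0;
    if aligned && whole {
        let len = bytes.len() / mem::size_of::<u32>();
        // Every bit pattern is a valid `u32`, and alignment was checked above.
        Some(unsafe { slice::from_raw_parts(bytes.as_ptr() as *const u32, len) })
    } else {
        None
    }
}
```

A caller that gets None back has to fall back to copying or decoding the bytes, which is exactly the extra work the larger-to-smaller direction avoids.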

not smaller types to larger types

You'd need that though to handle user input in crypto.

It is true that you need to handle misaligned input. However, you can't (on all platforms) implement that safely by just casting a uint8_t * pointer to a larger type; you have to do actual work. We are lucky that the casting from larger types to smaller types allows us to avoid some of that actual work. (I recommend looking at the file sha512.c in OpenSSL to see the extra work they do to get this right, in the functions that call sha512_block_data_order.)
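
For comparison, the "actual work" on the input side can be written without any unsafe by decoding the bytes instead of casting them. This is only a sketch with a hypothetical function name, not a transcription of what OpenSSL does; it assumes a 64-byte block of big-endian 32-bit words, as in SHA-256.

```rust
use std::convert::TryInto;

/// Hypothetical helper: decodes one 64-byte block into sixteen big-endian
/// `u32` words. It works for any alignment of `block` because it copies
/// bytes instead of reinterpreting the pointer.
pub fn load_be_u32_block(block: &[u8; 64]) -> [u32; 16] {
    let mut words = [0u32; 16];
    for (word, chunk) in words.iter_mut().zip(block.chunks_exact(4)) {
        // `chunks_exact(4)` only yields 4-byte chunks, so this cannot fail.
        *word = u32::from_be_bytes(chunk.try_into().unwrap());
    }
    words
}
```

The cost is a copy of the block, which is the overhead that the larger-to-smaller cast on the output side does not have to pay.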

I don't think that would be useful here anyway because slices aren't POD.

I don't think doing this as an implementation of the standard conversion traits is a good idea. Into isn't appropriate because there's no need for self to be consumed unless the goal is to explicitly support TBAA.

More importantly, AsRef and From are inappropriate because they are designed to enable implicit conversions, and we definitely don't want these kinds of conversions to be implicit. For example, if you refactor a function from fn f(x: From<&[u64]>) to fn f(x: From<&[u32]>), it would be dangerous for all the calls to f that pass a &[u64] to keep working via this conversion.

I see one issue here: endianness. In ciphers we don’t care that much (except in key schedules), but when dealing with digests it is really important to use the proper endianness independently of the platform. This is a big issue here. I would be happier if something like the Write trait landed in libcore, which would allow me to use the byteorder crate without libstd, or if there were some kind of tools to deal with endianness.

I'd say that ship has sailed. The "underspecified language semantics" section of that RFC does list "the legal aliasing rules between unsafe pointers", but the sort of unsafe casting found in the OP is useful often enough that I would be surprised if there wasn't a significant amount of real code depending on it by this point. As an example, bindgen currently generates it for unions.

I started this thread after writing code to handle the endianness conversion for digest functions. I don't think that anything new for endian conversion is needed as far as this "safe transmute for slices" is concerned. In particular, if you have a &[u32] or &[u64] then you can do the endian conversion (e.g. using to_be() on each element) before doing the type conversion to the smaller size.
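
Roughly, the order of operations I mean is sketched below. The function name is hypothetical and the eight-word state is just the SHA-256 case; the cast at the end is the part that a safe slice conversion in core would replace.

```rust
use std::slice;

/// Hypothetical helper: converts each state word to big-endian in place,
/// then exposes the state as bytes. After `to_be()`, the in-memory byte
/// order is the same on little-endian and big-endian platforms.
pub fn finish_as_bytes(state: &mut [u32; 8]) -> &[u8] {
    for word in state.iter_mut() {
        *word = word.to_be();
    }
    // Larger-to-smaller cast: 8 words of 4 bytes each, no alignment concern.
    unsafe { slice::from_raw_parts(state.as_ptr() as *const u8, 32) }
}
```

Because the endian conversion happens on the words first, the byte view needs no further processing on any platform.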

I do think that there are things that can be improved with respect to endian conversion, but IMO they are more about the input side--in particular, doing endian conversion while dealing with unaligned input--than the output side being discussed here.

I agree. TBAA would be a backward incompatible change and thus seems unlikely.

I believe that &mut [T1] cannot alias &mut [T2], which provides all of the optimization potential that strict aliasing (and the restrict keyword in C) provides. The difference is that in Rust the lack of aliasing is a static invariant guaranteed by the language via the borrow checker. So TBAA is less important than in C/C++, since unsafe pointers are not used as much in Rust.
