CHERI pointers and Rust / LLVM SIMD

I am continuing this discussion from "[Pre-RFC] usize is not size_t" in its own thread because I have a weird question, one that has less to do with what usize should be.

Let me cite the two excerpts from jrtc27 that I am interested in:

Please don't call it uaddr, rather something like uptr. CHERI capabilities (for 64-bit architectures) have a 64-bit integer part that is called the address (technically it's not always an address, so Arm's Morello calls it the value which avoids that slight abuse of terminology, but that's overall worse because the value should be the whole 128-bit (plus tag) quantity), and CHERI C/C++ defines a ptraddr_t type that is an integer big enough for any virtual address, i.e. a 64-bit integer on 64-bit targets (you could argue that we should just use size_t, but technically size_t only needs to be as big as the largest contiguous allocation you support, and we wanted to ensure the language extension was as general as possible rather than introducing a new conflation of types).

The rules in C are really only what they are today because uintptr_t is defined as long or equivalent on traditional architectures and they have to keep that working; you could introduce a new opaque integral type for uintptr_t that has the same representation as long but different semantics so you can track its provenance, but you can't retroactively change existing architectures to use a different type for uintptr_t. That corner of the C standard does not make any sense in the context of CHERI and is not necessary for real-world C code, it's primarily just a side-effect of trying to retroactively invent semantics for uintptr_t that make the normal cases we do support work.

So, hello @jrtc27! I have been working on the design of the portable SIMD library[0] for Rust, and this is raising questions in my head because of the way LLVM IR represents certain vectorized operations, namely scatter/gather[1]. When we do a SIMD gather, we pass a "vector of pointers" to the LLVM IR intrinsic, along with a mask.
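To make the "vector of pointers" model concrete, here is a minimal scalar sketch of what a masked gather computes, lane by lane. It is only an illustration: `gather_ptrs` and `gather_demo` are hypothetical names, plain arrays stand in for vector types, and none of this is the portable SIMD API or the LLVM intrinsic itself.

```rust
/// Scalar model of a masked gather over a "vector of pointers".
/// Hypothetical helper for illustration only.
///
/// # Safety
/// Every pointer whose mask lane is `true` must be valid for reads.
unsafe fn gather_ptrs<const N: usize>(
    ptrs: [*const i32; N],
    mask: [bool; N],
    passthru: [i32; N],
) -> [i32; N] {
    // Disabled lanes keep the pass-through value.
    let mut out = passthru;
    for lane in 0..N {
        if mask[lane] {
            // Each enabled lane dereferences its own, independent pointer.
            out[lane] = unsafe { ptrs[lane].read() };
        }
    }
    out
}

/// Small usage example: lane 2 is masked off, so its null pointer is never read.
fn gather_demo() -> [i32; 4] {
    let data = [10, 20, 30, 40, 50];
    let ptrs: [*const i32; 4] = [&data[4], &data[0], std::ptr::null(), &data[2]];
    let mask = [true, true, false, true];
    unsafe { gather_ptrs(ptrs, mask, [-1; 4]) }
}
```

The point of the sketch is that every lane carries a full pointer of its own, which is exactly the part that looks awkward if a pointer is a 128-bit capability plus a tag.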

It seems CHERI has implementations for both RISC-V and Arm chips, and I know that both of those have scalable predicated vector ISAs. My understanding is that the Arm SVE2 way of expressing a gather is generally as a base pointer plus a series of offsets[2]. Likewise for the RISC-V Vector spec's indexed loads[3].
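For contrast, the base-plus-offsets shape can be sketched the same way. This is again a scalar model under my own assumptions: `gather_base_offsets` is a hypothetical name, offsets are element indices rather than byte offsets, and returning `None` merely stands in for whatever a real ISA does on an out-of-bounds access.

```rust
/// Scalar model of a masked gather expressed as one base plus per-lane offsets.
/// Hypothetical helper; offsets are element indices into `base`.
/// Returns `None` if any enabled lane falls outside `base`.
fn gather_base_offsets<const N: usize>(
    base: &[i32],
    offsets: [usize; N],
    mask: [bool; N],
    passthru: [i32; N],
) -> Option<[i32; N]> {
    // Disabled lanes keep the pass-through value and are never bounds-checked.
    let mut out = passthru;
    for lane in 0..N {
        if mask[lane] {
            // All enabled lanes are checked against the bounds of the single base.
            out[lane] = *base.get(offsets[lane])?;
        }
    }
    Some(out)
}
```

Here only the base carries pointer-like information and the lanes hold plain integers, so a single set of bounds covers every element access.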

While my main design work has focused on a useful abstraction for "fixed-width" vectors, I have been keeping the predicated ones on my radar, and this bit is interesting to me. So, I guess my question is simple in truth: How would a CHERI vector gather / indexed load work? Are there any established semantics for it, or even just what complications would we have to become aware of? Does LLVM's "vector of pointers" work as a model at all, here?

The short answer is that Morello is based on Armv8.2 so is pre-SVE (let alone SVE2) and CHERI-RISC-V does not try to deal with the V extension that is only just now out for public review.

How to deal with vectors, especially scalable, remains an interesting area of research, but you're right that (at least for RISC-V, I haven't looked much at SVE) the model is base + vector of offsets, which maps nicely to CHERI... except that people use a base of zero (and element width equal to the address size) to do generalised scatter/gather. We have not thought about ways to avoid having to scalarise such constructs, so whilst we can absolutely support vectors of pointers, certain idioms may get more penalised. But vectors aren't something we've even touched in LLVM as neither CHERI-MIPS nor CHERI-RISC-V have baselines that have any kind of vectors, and Arm have only lightly modified their fork of CHERI-LLVM for Morello to work with NEON.
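As a rough illustration of why the base-of-zero idiom is the awkward case, here is a toy model that treats a capability as nothing more than a base and a length. That is a loud simplification: real CHERI capabilities also carry a tag, permissions and compressed bounds, and `ToyCap`, `permits` and `indexed_load_permitted` are hypothetical names, not anything from CHERI-LLVM.

```rust
/// Toy stand-in for a capability: just a base address and a length in bytes.
/// The address used for indexing is assumed to sit at `base`.
#[derive(Clone, Copy)]
struct ToyCap {
    base: u64,
    len: u64,
}

impl ToyCap {
    /// True if the access [addr, addr + size) lies within the toy bounds.
    fn permits(&self, addr: u64, size: u64) -> bool {
        match (addr.checked_add(size), self.base.checked_add(self.len)) {
            (Some(end), Some(limit)) => addr >= self.base && end <= limit,
            _ => false,
        }
    }
}

/// Checks every lane of a "base + vector of byte offsets" access against the
/// bounds of the one base capability.
fn indexed_load_permitted(cap: ToyCap, offsets: &[u64], elem_size: u64) -> bool {
    offsets.iter().all(|&off| match cap.base.checked_add(off) {
        Some(addr) => cap.permits(addr, elem_size),
        None => false,
    })
}

/// Usage example: a tight base with small offsets versus the base-of-zero idiom.
fn base_of_zero_demo() -> (bool, bool) {
    let buffer = ToyCap { base: 0x1000, len: 16 };
    let null = ToyCap { base: 0, len: 0 };
    (
        indexed_load_permitted(buffer, &[0, 4, 12], 4),
        // Same addresses, but smuggled in as offsets from a zero base.
        indexed_load_permitted(null, &[0x1000, 0x1004, 0x100c], 4),
    )
}
```

In this toy model the second form only passes if the base covers essentially the whole address space, which throws away the per-object bounds; the alternative is to treat each lane as its own capability and handle it separately.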

Of course, if architects are willing to make vectors of capabilities a thing that can exist in hardware then the problem goes away (but maybe introduces new ones?).
