I am continuing the discussion from "[Pre-RFC] usize is not size_t" in its own thread because I have a weird question. It has less to do with what `usize` should be, so it seems to deserve a separate thread.
Let me cite the two excerpts from jrtc27 that I am interested in:
Please don't call it `uaddr`, rather something like `uptr`. CHERI capabilities (for 64-bit architectures) have a 64-bit integer part that is called the address (technically it's not always an address, so Arm's Morello calls it the value which avoids that slight abuse of terminology, but that's overall worse because the value should be the whole 128-bit (plus tag) quantity), and CHERI C/C++ defines a `ptraddr_t` type that is an integer big enough for any virtual address, i.e. a 64-bit integer on 64-bit targets (you could argue that we should just use `size_t`, but technically `size_t` only needs to be as big as the largest contiguous allocation you support, and we wanted to ensure the language extension was as general as possible rather than introducing a new conflation of types).
The rules in C are really only what they are today because `uintptr_t` is defined as `long` or equivalent on traditional architectures and they have to keep that working; you could introduce a new opaque integral type for `uintptr_t` that has the same representation as `long` but different semantics so you can track its provenance, but you can't retroactively change existing architectures to use a different type for `uintptr_t`. That corner of the C standard does not make any sense in the context of CHERI and is not necessary for real-world C code, it's primarily just a side-effect of trying to retroactively invent semantics for `uintptr_t` that make the normal cases we do support work.
So, hello @jrtc27! I have been working on the design of the portable SIMD library[0] for Rust, and this is raising questions in my head because of the way LLVM IR represents certain vectorized operations, namely scatter/gather[1]. When we do a SIMD gather, we pass a "vector of pointers" to the LLVM IR intrinsic, along with a mask.
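To make the "vector of pointers" model concrete, here is a rough scalar sketch of what I understand a masked gather to mean, lane by lane. The function name `gather_ptrs` and the array-based signature are made up for illustration; this is not the portable SIMD API, and the pass-through argument only mirrors the shape of LLVM's masked gather intrinsic.

```rust
/// Scalar model of a masked gather over a "vector of pointers".
/// Hypothetical helper for illustration only.
///
/// # Safety
/// Every pointer in an enabled lane must be valid for reads and aligned.
/// Pointers in disabled lanes are never dereferenced.
unsafe fn gather_ptrs<const N: usize>(
    ptrs: [*const i32; N],
    mask: [bool; N],
    passthru: [i32; N],
) -> [i32; N] {
    // Disabled lanes keep their pass-through value.
    let mut out = passthru;
    for lane in 0..N {
        if mask[lane] {
            // Each lane carries a full pointer, so each lane is its own access.
            out[lane] = unsafe { ptrs[lane].read() };
        }
    }
    out
}
```

The point of interest is that every lane holds a complete pointer, so whatever a pointer is on the target, the vector has to hold `N` of them.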
It seems CHERI has implementations for both RISC-V and Arm chips, and I know that both of those have scalable predicated vector ISAs. My understanding is that the Arm SVE2 way of expressing a gather is generally a base pointer plus a series of offsets[2]. Likewise for the RISC-V Vector spec's indexed loads[3].
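For contrast, the "base plus offsets" shape can be sketched the same way. Again, `gather_indexed` is a hypothetical name, and treating an out-of-bounds index as a disabled lane is an assumption I picked to keep the sketch safe, not something taken from either ISA spec.

```rust
/// Scalar model of an indexed gather: one base, many integer offsets.
/// Hypothetical helper for illustration only.
fn gather_indexed<const N: usize>(
    base: &[i32],
    idx: [usize; N],
    mask: [bool; N],
    passthru: [i32; N],
) -> [i32; N] {
    // Disabled lanes keep their pass-through value.
    let mut out = passthru;
    for lane in 0..N {
        if mask[lane] {
            // Assumption: an out-of-bounds index behaves like a disabled lane.
            if let Some(&value) = base.get(idx[lane]) {
                out[lane] = value;
            }
        }
    }
    out
}
```

Here only the base is pointer-like; the per-lane data is plain integers, and in this safe-Rust model the single slice is the only thing that carries bounds.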
While my main design work has focused on a useful abstraction for "fixed-width" vectors, I have been keeping the predicated ones on my radar, and this bit is interesting to me. So, I guess my question is simple in truth: How would a CHERI vector gather / indexed load work? Are there any established semantics for it, or, failing that, what complications would we have to be aware of? Does LLVM's "vector of pointers" work as a model at all here?