I had removed most generic bounds on stringlet, as I found they infect every generic caller. But now the compiler is forcing me back to square one:
I’m trying to clean up VarStringlet’s messy extra len byte by moving it into the byte array. Having stumbled over how fixedstr::strN subtracts one from the requested size, I want to add an extra byte, to avoid such surprises.
I understand const generic operations and picking up an associated const from a generic type require nightly and #![feature(generic_const_exprs)]. But then I get this, not only on the type itself, but also on every user:
Besides suggesting the wrong inner type () vs. u8, that burden makes this feature unergonomic. So I wonder and hope: is this temporary, until some prerequisite gets implemented?
This bound is essentially used to declare that your const computation (SIZE + Kind::EXTRA) can be evaluated. In this particular case, it means that the addition must not overflow usize. It can also be used to declare that types created from the result of the const computation have a proper size (i.e. less than isize::MAX bytes), by using u8 instead of () in the bound.
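A minimal nightly-only sketch of such a bound (the names Kind, Var, and Stringlet are illustrative stand-ins for the original code; this will not build on stable):

```rust
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]

use core::marker::PhantomData;

trait Kind {
    const EXTRA: usize;
}

struct Var;
impl Kind for Var {
    const EXTRA: usize = 1; // room for the len byte
}

// The where bound asserts that SIZE + K::EXTRA can be evaluated
// (no overflow); with u8 as the element type it additionally keeps
// the array's size below isize::MAX bytes.
struct Stringlet<K: Kind, const SIZE: usize>
where
    [u8; SIZE + K::EXTRA]:,
{
    bytes: [u8; SIZE + K::EXTRA],
    _kind: PhantomData<K>,
}
```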
In other words, the bound prevents potential monomorphization errors (which, unfortunately, already exist in Rust for some corner cases...).
IIRC, yes, though I can't link the exact proposals off the top of my head.
And to clarify, the suggestion to use () is not wrong: it doesn't matter what the element type actually is, as long as [_; SIZE + Kind::EXTRA] is a valid type.
where [(); X]: is temporary syntax for "X can be evaluated without panic/UB", presumably chosen because it sort of has the right meaning already. I wouldn't expect generic_const_exprs to be stabilised until it has changed to something more sensible.
For the time being, the best way to avoid this is to use a user-defined type ArrConcat<T, A, B> instead of a field of type [T; A + B]:
use core::mem::{transmute_copy, ManuallyDrop};

// repr(C) guarantees the two arrays are laid out back to back,
// so the struct has the same layout as [T; A + B].
#[repr(C)]
struct ArrConcat<T, const A: usize, const B: usize>([T; A], [T; B]);

impl<T, const A: usize, const B: usize> ArrConcat<T, A, B> {
    #[inline]
    pub fn as_array(&self) -> &[T; A + B] {
        // SAFETY: repr(C) makes Self layout-identical to [T; A + B].
        unsafe { &*(self as *const Self as *const _) }
    }
    #[inline]
    pub fn as_array_mut(&mut self) -> &mut [T; A + B] {
        // SAFETY: as above, and we hold the unique &mut borrow.
        unsafe { &mut *(self as *mut Self as *mut _) }
    }
    #[inline]
    pub fn into_array(self) -> [T; A + B] {
        // Prevent dropping the source; its bytes are moved out below.
        let this = ManuallyDrop::new(self);
        // SAFETY: Self and [T; A + B] have identical layout.
        unsafe { transmute_copy(&*this as &Self) }
    }
}
// you can also impl From in both directions
// you can also impl From in both directions
then for some reason this is enough to convince Rust that A + B is a known well-formed const expression, whereas the field type of [T; A + B] causes the compiler to ask for the where bound.
(If you only use slices, this works even on stable, and is a nice encapsulation of the technique.)
If you only care about u8, you can likely even make doing this safe by utilizing bytemuck or zerocopy.
The second syntax is not allowed because type holes are only allowed in expression context, while where clauses are a type context. Supporting that would mean introducing a new feature that makes where clauses diverge from all other type contexts in what can be written in them.
Oh noes! Back from holiday, and with Rust 1.93 I can’t validate your optimism. Each of the three A + B expressions gives me both “cannot perform const operation using A” and “cannot perform const operation using B”.
So you said something about this only working on stable with slices? What exactly do you mean by that? And under what circumstances would those slices be optimised away, i.e. be zero-cost?
Slices, as in &[T] instead of &[T; N]: [playground]
use core::slice;

#[repr(C)]
struct ArrConcat<T, const A: usize, const B: usize>([T; A], [T; B]);

impl<T, const A: usize, const B: usize> ArrConcat<T, A, B> {
    #[inline]
    pub fn as_slice(&self) -> &[T] {
        let ptr = self as *const Self as *const T;
        // SAFETY: repr(C) lays the two arrays out contiguously,
        // so A + B elements of T start at `ptr`.
        unsafe { slice::from_raw_parts(ptr, A + B) }
    }
    #[inline]
    pub fn as_slice_mut(&mut self) -> &mut [T] {
        let ptr = self as *mut Self as *mut T;
        // SAFETY: as above, and we hold the unique &mut borrow.
        unsafe { slice::from_raw_parts_mut(ptr, A + B) }
    }
    #[inline]
    pub fn into_parts(self) -> ([T; A], [T; B]) {
        (self.0, self.1)
    }
}
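For example, used on a 2-byte payload plus 1 extra byte (a standalone sketch; the definitions are repeated from the snippet above so it compiles on its own):

```rust
use core::slice;

// Repeated from the snippet above for completeness.
#[repr(C)]
struct ArrConcat<T, const A: usize, const B: usize>([T; A], [T; B]);

impl<T, const A: usize, const B: usize> ArrConcat<T, A, B> {
    pub fn as_slice(&self) -> &[T] {
        let ptr = self as *const Self as *const T;
        // SAFETY: repr(C) lays the two arrays out contiguously.
        unsafe { slice::from_raw_parts(ptr, A + B) }
    }
    pub fn into_parts(self) -> ([T; A], [T; B]) {
        (self.0, self.1)
    }
}

fn main() {
    let v = ArrConcat(*b"hi", *b"!");
    // Both halves viewed as one contiguous 3-byte slice.
    assert_eq!(v.as_slice(), b"hi!");
    let (payload, extra) = v.into_parts();
    assert_eq!(&payload, b"hi");
    assert_eq!(&extra, b"!");
}
```

Note that A + B only appears here as a runtime value expression, never as part of a type, which is why this variant compiles on stable.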
The cost is that by using [T] instead of [T; N], the optimizer has to notice that the slice length is known from further away. This relies on inlining in the exact same manner that optimizing out indexing bounds checks does.
The as_slice methods are functionally trivial after MIR inlining[1], so optimization of any direct callers will be effectively identical. If you always index in such a way that the optimizer can know that the index is < slice.len(), then any performance difference should be within noise, or even slightly favorable due to better code locality [2].
As such, I'd recommend utilizing fn<T>(&[T]) and fn<T, A, B>(&ArrConcat<T, A, B>) for helpers, but avoiding fn<T, A, B>(&[T]) where you provide the known const length separately: while you know that length, the optimizer sees only an unrelated constant and can't use your external length information to eliminate indexing bounds checks.
// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn ArrConcat<T, A, B>::as_slice(_1: &ArrConcat<T, A, B>) -> &[T] {
    let mut _0: &[T];
    let _2: *const T;
    let mut _3: *const ArrConcat<T, A, B>;
    let mut _4: usize;
    scope 1 (inlined slice::from_raw_parts) {
        let _5: *const [T];
    }

    bb0: {
        _3 = &raw const (*_1);
        _2 = copy _3 as *const T (PtrToPtr);
        StorageLive(_4);
        _4 = Add(const A, const B);
        StorageLive(_5);
        _5 = *const [T] from (copy _3, copy _4);
        _0 = &(*_5);
        StorageDead(_5);
        StorageDead(_4);
        return;
    }
}
A function generic over length N needs to be separately generated for each N. If a function on &[T] doesn't get further inlined for any reason, then it is reasonable to predict that the &[T; N] version would likely also not be inlined, even if it optimizes slightly differently. The greater code locality from reusing a shared monomorphization can often be surprisingly impactful. (And inlining happens when the function is "small enough" to not meaningfully contribute to that impact.) ↩︎
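The helper-signature recommendation can be sketched like this (the function names are mine, purely for illustration):

```rust
// Preferred shape: the length is a property of the slice itself, so
// after inlining the optimizer can connect buf.len() to the caller's
// known length and elide bounds checks.
fn all_ascii(buf: &[u8]) -> bool {
    buf.iter().all(|b| b.is_ascii())
}

// Shape the post advises against: the const length LEN arrives
// separately from the slice, so the optimizer cannot use LEN to
// elide the bounds checks on buf[i].
fn all_ascii_n<const LEN: usize>(buf: &[u8]) -> bool {
    (0..LEN).all(|i| buf[i].is_ascii())
}

fn main() {
    let data = *b"hello";
    assert!(all_ascii(&data));
    assert!(all_ascii_n::<5>(&data));
}
```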
Ok, makes sence. Thank you so much for your effort!
Alas, both pairs generate differing asm, so I guess it’s not zero cost. I’d hoped that after inlining the compiler would see through it, but apparently not.
I'm seeing the first pair of indexing operations generate identical asm output for both, just loading a constant. Are you sure you checked with release optimizations? (Alternatively, if you just chucked both options into the same function, was it just trivial register-allocation differences?)
For the second pair it makes sense that they differ: thanks to the black_box optimization barriers you used, you're effectively comparing <[u8] as PartialEq<[u8]>>::eq and <[u8; 10] as PartialEq<[u8; 10]>>::eq. If you tell the compiler to ignore that it knows the slice length, of course it's going to be unable to unroll a loop it doesn't know is fixed length.
A more fair comparison to non-black_box usage would be black_box(ArrConcat([65u8; 9], [65])).as_slice() == black_box(ArrConcat([65u8; 9], [65])).as_slice(), which moves the as_slice() outside of the black box boundary. This does still generate slightly different asm, but it looks to me to be just a trivial difference in register allocation and spill order.
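Concretely, that fairer comparison looks something like this (standalone sketch; the struct is repeated for completeness):

```rust
use core::slice;
use std::hint::black_box;

#[repr(C)]
struct ArrConcat<T, const A: usize, const B: usize>([T; A], [T; B]);

impl<T, const A: usize, const B: usize> ArrConcat<T, A, B> {
    fn as_slice(&self) -> &[T] {
        let ptr = self as *const Self as *const T;
        // SAFETY: repr(C) lays the two arrays out contiguously.
        unsafe { slice::from_raw_parts(ptr, A + B) }
    }
}

fn main() {
    // The *values* are hidden behind black_box, but as_slice() runs
    // outside the barrier, so the length 9 + 1 = 10 remains visible
    // to the optimizer on both sides.
    let eq = black_box(ArrConcat([65u8; 9], [65])).as_slice()
        == black_box(ArrConcat([65u8; 9], [65])).as_slice();
    assert!(eq);
}
```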
[playground] (choose "show assembly" in the run drop-down menu)