Rust has str (where the type system loses the size) and [u8; N] (which loses the UTF-8 guarantee). I want both size and UTF-8, since for any "literal str" the compiler does know both.
This is to fix error handling in the next version of Stringlet. Currently my From impls panic, which they must not, if either the size or UTF-8 validity is violated. For dynamic data this should be TryFrom -> Result. But for &'literal str I resist so much ceremony. Here the compiler could already check that the size fits into [u8; N], while maintaining UTF-8 well-formedness. (Besides: Result can’t be used in const.)
Is there already a way of achieving this? Or is there a plan for something like my fantasy optional str<const N: usize>?
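For the dynamic-data half of the problem, a fallible conversion could look roughly like the sketch below. `Fixed` and `LenError` are hypothetical names invented for illustration, not Stringlet's actual types. Since the input is already a `&str`, only the length can be wrong here.

```rust
use std::convert::TryFrom;

/// Hypothetical fixed-size UTF-8 string (not Stringlet's real type).
#[derive(Debug, PartialEq, Eq)]
pub struct Fixed<const N: usize>([u8; N]);

/// Hypothetical error reported when the input length is not exactly N.
#[derive(Debug, PartialEq, Eq)]
pub struct LenError {
    pub expected: usize,
    pub found: usize,
}

impl<const N: usize> TryFrom<&str> for Fixed<N> {
    type Error = LenError;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        // `&str` is valid UTF-8 by construction, so only the size is checked.
        match <[u8; N]>::try_from(s.as_bytes()) {
            Ok(bytes) => Ok(Fixed(bytes)),
            Err(_) => Err(LenError { expected: N, found: s.len() }),
        }
    }
}

impl<const N: usize> Fixed<N> {
    pub fn as_str(&self) -> &str {
        // Safe, checked conversion; it cannot fail because the only
        // constructor copies the bytes from a `&str`.
        core::str::from_utf8(&self.0).expect("constructed from valid UTF-8")
    }
}

fn main() {
    let ok = Fixed::<5>::try_from("hello");
    assert_eq!(ok.unwrap().as_str(), "hello");

    // A wrong length becomes an Err value instead of a panic.
    let err = Fixed::<5>::try_from("hi");
    assert_eq!(err, Err(LenError { expected: 5, found: 2 }));
}
```

This is exactly the ceremony that feels excessive for a literal, where both facts are known before the program runs.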
For the harder part of the question: I have several kinds of Stringlet where shorter strings can be stored with padding. So actually I’d want something like str<0..256>. For this case I do already have a config trait implemented for each possible number. As I’m thinking of making things more flexible, I’d need to add even more such traits and impls. That’s rather cumbersome and possibly quite a drag on the trait resolver.
If we’re getting configurable ints, and they can generically parametrize T<const N: UInt<5..9>>([u8; N]), that might solve this.
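To illustrate the workaround described above (a config trait implemented for each permitted number), here is a rough sketch of how a range like 5..9 can be emulated on stable Rust today. `Size`, `Permitted`, `permit!` and `Ranged` are hypothetical names chosen for this example and are not taken from Stringlet.

```rust
/// Hypothetical marker type that lifts a size into the type system.
pub struct Size<const N: usize>;

/// Hypothetical config trait, implemented only for the permitted sizes.
pub trait Permitted {}

/// One impl per number: this is the cumbersome part.
macro_rules! permit {
    ($($n:literal)*) => {
        $( impl Permitted for Size<$n> {} )*
    };
}

// Emulates the range 5..9 (upper bound exclusive).
permit!(5 6 7 8);

/// Storage that only exists for the permitted sizes.
pub struct Ranged<const N: usize>([u8; N])
where
    Size<N>: Permitted;

impl<const N: usize> Ranged<N>
where
    Size<N>: Permitted,
{
    pub const fn zeroed() -> Self {
        Ranged([0; N])
    }

    pub const fn capacity(&self) -> usize {
        N
    }

    pub const fn as_bytes(&self) -> &[u8; N] {
        &self.0
    }
}

fn main() {
    let r = Ranged::<5>::zeroed();
    assert_eq!(r.capacity(), 5);
    assert_eq!(r.as_bytes(), &[0u8; 5]);

    // Outside the range this is rejected by the trait resolver:
    // let _ = Ranged::<9>::zeroed();
}
```

Every additional range or variant needs another trait and another batch of impls, which is the scaling problem a ranged integer type for const generics would remove.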
I think once const generic expressions and arbitrary const generic types arrive (or even just &'static str getting whitelisted), it should become possible to do everything you need in user code. But progress on const generics is slow...
Given that you’re already using macros-pretending-to-be-literals, you can obtain the string length at compile time to derive the type from the literal value:
#[repr(transparent)]
pub struct SizStr<const LEN: usize>([u8; LEN]);

impl<const LEN: usize> SizStr<LEN> {
    pub const fn strict_from_ref(s: &str) -> &Self {
        // as long as you use the macro, this assertion cannot fail
        assert!(s.len() == LEN);
        // SAFETY: Self is a transparent wrapper, the input bytes are UTF-8,
        // and we checked the length.
        unsafe { &*(&raw const *s).cast::<Self>() }
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: Length and utf-8 was checked at construction time
        unsafe { core::str::from_utf8_unchecked(self.as_array()) }
    }

    pub fn as_array(&self) -> &[u8; LEN] {
        &self.0
    }
}

macro_rules! sizstr {
    ($text:literal) => {
        SizStr::<{ $text.len() }>::strict_from_ref($text)
    };
}

fn main() {
    let s: &'static SizStr<5> = sizstr!("hello");
    println!("{:?} {:?}", s.as_str(), s.as_array());
    // this will be a type error, not a const eval error!
    // let _: &'static SizStr<6> = sizstr!("hello");
}
This method is fully const as of Rust 1.88 (for <[_]>::as_chunks). There are other methods for supporting earlier Rust versions (e.g. unsafely assume length in new_unchecked). [playground]
I license any code I post here under SPDX: MIT-0 OR Apache-2.0.
Note that the Option::unwrap occurs at const time due to the const block, so there is guaranteed no runtime overhead.
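As a rough illustration of that point (and not the playground code itself), the sketch below wraps a fallible const constructor in a `const { … }` block so that the `unwrap` is evaluated during compilation. `FixedStr`, `new` and `fixed!` are hypothetical names; the sketch assumes a toolchain where inline const blocks and const `Option::unwrap` are stable (Rust 1.83 or later).

```rust
/// Hypothetical owned fixed-size string, for illustration only.
pub struct FixedStr<const N: usize>([u8; N]);

impl<const N: usize> FixedStr<N> {
    /// Returns None when the length does not match N.
    pub const fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() != N {
            return None;
        }
        // Manual copy loop, because iterators are not usable in const fn.
        let mut out = [0u8; N];
        let mut i = 0;
        while i < N {
            out[i] = bytes[i];
            i += 1;
        }
        Some(FixedStr(out))
    }

    pub const fn as_bytes(&self) -> &[u8; N] {
        &self.0
    }

    pub fn as_str(&self) -> &str {
        // Checked conversion; the bytes were copied from a `&str`.
        core::str::from_utf8(&self.0).expect("constructed from valid UTF-8")
    }
}

macro_rules! fixed {
    ($text:literal) => {
        // The const block forces evaluation at compile time, so a failing
        // unwrap would be a build error rather than a runtime panic.
        const { FixedStr::<{ $text.len() }>::new($text).unwrap() }
    };
}

fn main() {
    let s = fixed!("hello");
    assert_eq!(s.as_bytes(), b"hello");
    assert_eq!(s.as_str(), "hello");

    // Mismatched on purpose: this fails to compile instead of panicking.
    // let _ = const { FixedStr::<6>::new("hello").unwrap() };
}
```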
I see I was ninja'd while putting together my example. But! Note that your example could become unsound (EDIT: panicky) in the future if $:literal matchers can ever match custom type literals, as such a custom type could provide a .len() which does not match its deref-coercion to str. (This was more important before $:literal, when such macros had to take $:expr instead.)
Technically not unsound because of the assert! in the constructor. But I take your point that relying on any behavior of :literal is sketchy. The minimal fix is
In a nutshell, that’s the base case of my stringlet!() macro, yes. But I get the feeling that many people don’t like macros. Some crates even hide them behind an optional feature. I wanted to make the functional folks happy as well.
And this indeed seems a valid improvement – thank you:
This is fascinating! Like the last time you helped me (which inspired my new StringletBase layout and workaround module) I’ll let this settle and see where I can take it. Thank you!
Same thing though about macros as I replied to kpreid. And it would need two variants as the const {} one can’t handle dynamic data. I guess there must be reasons why const fns are not simply always called at compile time. But it just seems so annoying – and counter-intuitive that the same keyword means different things.
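A small sketch of the two meanings of const being discussed: a const fn is merely allowed to run at compile time, and only does so for certain when it is called from a const context. `byte_len` is a made-up example function.

```rust
/// Made-up example: usable both at compile time and at runtime.
const fn byte_len(s: &str) -> usize {
    assert!(!s.is_empty(), "empty input");
    s.len()
}

// Const context: evaluated by the compiler. If the assertion failed here,
// it would be a build error.
const AT_COMPILE_TIME: usize = byte_len("hello");

fn main() {
    // Ordinary call with dynamic data: evaluated at runtime. If the
    // assertion failed here, it would be a runtime panic.
    let input = String::from("dynamic");
    let at_runtime = byte_len(&input);

    assert_eq!(AT_COMPILE_TIME, 5);
    assert_eq!(at_runtime, 7);
}
```

Because the same function has to accept values that only exist at runtime, the compiler cannot evaluate every call ahead of time; a const item or const block is what opts a particular call into compile-time evaluation.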
The common reason why procedural macros are often an optional feature is that they can have significant extra cost to compile the macro crate and its dependencies. macro_rules macros have no comparable cost, so you will generally not find macro_rules macros gated by features.
I’m against macro overuse myself, but “this can’t be done without a macro except by counting bytes” is a good reason to use a macro; and in my opinion, all you need to do is offer some non-macro way to do whatever the task is, and the fn strict_from_ref() in my example code or @CAD97’s new() does that.
Thank you Jordan! I guess this isn’t to everybody’s liking, but I will highlight it as a solution. That one I actually already offer as from_utf8_bytes and from_utf8_bytes_unchecked.
Edit: just noticed: not quite, as you use a reference. I’ll add those variants.