Convenient null-terminated string literals

While Rust's native strings are better in general, there are a lot of existing C APIs that need null-terminated strings.

It's good that CStr exists and it's what I'd turn to for dynamic strings. But it's pretty inconvenient if I need a static string literal, a case which was pretty common for my application (OpenGL) and I can see being useful for many C APIs.

I ended up writing a macro like this:

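macro_rules! cstr {
    // Expands to a call to an unsafe fn, so it must be used inside an `unsafe`
    // block; the caller must also ensure `$str` contains no interior nul.
    ($str:expr) => (::std::ffi::CStr::from_bytes_with_nul_unchecked(concat!($str, "\0").as_bytes()))
}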

I could then use it like this:

let pos_attrib = unsafe { gl::GetAttribLocation(program, cstr!("position").as_ptr()) };

Does anyone have any thoughts about whether this could be in the standard library, or why it isn't already? Also, if there's already a better way of doing this, I'd love to know. I know I could write "foo\0" manually but it feels unclean.

Nitpick: this should probably check that no embedded nul characters exist, and could then place an unsafe block around the from_bytes_with_nul_unchecked call. This could be done with a proc macro, or with a built-in macro in the standard library (though I do not believe a declarative macro could do it).
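On later toolchains the check can in fact live in a declarative macro. The sketch below assumes Rust 1.72 or newer, where CStr::from_bytes_with_nul is a const fn; the checked cstr! variant is illustrative, not a standard library API. Evaluating the conversion inside a const item turns an interior nul into a compile-time error and removes the need for unsafe at the call site.

```rust
use std::ffi::CStr;

/// Hypothetical checked variant of `cstr!`.
/// Assumes Rust 1.72+, where `CStr::from_bytes_with_nul` is a `const fn`.
macro_rules! cstr {
    ($s:literal) => {{
        // Evaluated at compile time: an interior nul becomes a build error.
        const C: &::std::ffi::CStr =
            match ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes()) {
                Ok(c) => c,
                Err(_) => panic!("string literal contains an interior nul byte"),
            };
        C
    }};
}

fn main() {
    let s: &'static CStr = cstr!("position");
    assert_eq!(s.to_bytes(), b"position");
    // `cstr!("pos\0ition")` would be rejected at compile time.
}
```

Because the result is a plain `&'static CStr` produced by a const item, the same expansion also works as a static initializer.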

One advantage to having it be in the standard library would be const-ness. from_bytes_with_nul_unchecked, IIRC, is not const. However a macro version could be, through stdlib macro magic (possibly involving a const union field transmute), and be used as a static initializer.


Or from_bytes_with_nul_unchecked could be made const, no?

Anyway, for the record, there is a popular crate that provides a macro similar to OP’s (as well as several alternative crates that do the same thing, judging from search results). But it’s definitely plausible that this belongs in the standard library, given that CStr itself is.


We would rather argue for allowing UTF-8 to be mixed into byte strings, tbh. (It would also come in handy when writing tests.)

(Replying to @Soni only, even though Discourse won't show the icon)

How is this related to making C strings (i.e. null terminated with no internal nulls) easier to write, at all? (Plus, you can already just write "メカジキ".as_bytes() if you want UTF-8 bytes.)


I usually write these things as b"hello world\0", and that seems OK to me. It very directly expresses "some bytes with zero at the end".
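A short sketch of the two ways such a byte-string literal can be handed to C with only the standard library: reinterpreting the pointer directly, or validating it through CStr first. The variable names are illustrative.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

fn main() {
    // A byte-string literal with an explicit trailing nul.
    let hello: &'static [u8] = b"hello world\0";

    // Unchecked route: reinterpret the pointer. Nothing verifies the
    // terminator, so a forgotten "\0" becomes a bug on the C side.
    let raw: *const c_char = hello.as_ptr() as *const c_char;

    // Checked route: verifies there is exactly one nul, at the very end.
    let checked: &CStr = CStr::from_bytes_with_nul(hello).expect("not a valid C string");
    assert_eq!(checked.to_bytes(), b"hello world");

    // Both routes hand C the same address.
    assert_eq!(raw, checked.as_ptr());
}
```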

Somewhat related, note as well that CStr is not exactly char* at the moment, as it carries the length:


IIRC, from the last time I looked, the blocker for that is the lack of a way to have different const and runtime implementations: it currently uses a highly optimized implementation that requires non-const operations.

Yeah, perhaps I should just do that.

Maybe where my suggestion comes into its own is if you want to wrap a C API with a friendlier Rust interface. In that situation, where the C API takes a null-terminated string, I would want to use CStr in the signature of the wrapper.
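A minimal sketch of that wrapper pattern. Since a real OpenGL binding cannot be assumed here, fake_c_strlen is a hypothetical stand-in for the C function, written in Rust with a C ABI; c_len is the friendlier wrapper that takes &CStr.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for a C API that takes a nul-terminated string.
///
/// # Safety
/// `s` must point to a readable, nul-terminated buffer.
unsafe extern "C" fn fake_c_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    // Walk the buffer until the terminating nul byte.
    while unsafe { *s.add(n) != 0 } {
        n += 1;
    }
    n
}

/// Friendlier Rust wrapper: `&CStr` in the signature encodes the
/// "nul-terminated, no interior nul" requirement, so callers need no `unsafe`.
fn c_len(s: &CStr) -> usize {
    // SAFETY: a `&CStr` always points to a valid nul-terminated buffer.
    unsafe { fake_c_strlen(s.as_ptr()) }
}

fn main() {
    let name = CStr::from_bytes_with_nul(b"position\0").unwrap();
    assert_eq!(c_len(name), 8);
}
```

The unsafety is confined to one audited spot; everything above the wrapper just needs a way to produce a `&CStr` conveniently, which is what the macro is for.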


To us, "a C string" for FFI purposes is just a bytestring with a \0 at the end.

Granted we try to avoid needing those, sometimes going as far as using implementation details of the thing we're binding to get it done.

A UTF-8 string (str) can be null-terminated and therefore a C string. This property isn't exclusive to byte strings (by which I assume [u8] is meant), no?
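A quick illustration of that point with the standard library: a UTF-8 str literal ending in a nul converts to a CStr and back to &str without loss.

```rust
use std::ffi::CStr;

fn main() {
    // A UTF-8 `str` literal with a trailing nul is a valid C string too.
    let c = CStr::from_bytes_with_nul("メカジキ\0".as_bytes()).unwrap();
    assert_eq!(c.to_str().unwrap(), "メカジキ");
    // 4 characters, 3 bytes each in UTF-8, terminator excluded.
    assert_eq!(c.to_bytes().len(), 12);
}
```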

There is also:

  • byte_strings::c_str

  • ::safer_ffi::c!

    This one does not yield the flawed &'static CStr, but rather a char_p::Ref<'static>. This is a type that is guaranteed to have the same layout as a *const c_char for FFI-exported functions, and which also requires that the given string be valid UTF-8 (so as to offer an infallible conversion to &str).

Finally, now that const fns support while loops and if conditionals, and with the imminent stabilization of min_const_generics, the whole thing can be done without proc macros, while also yielding an awesome error message when an inner null is encountered:

#[macro_export]
macro_rules! c_str {( $s:expr ) => (
    {
        const IT: &'static [$crate::__::u8] = $crate::__::core::concat!($s, "\0").as_bytes();
        #[allow(deprecated)] {
            use $crate::__::*;
            let _: no_inner_null_bytes_until<{ $s.len() }> =
                no_inner_null_bytes_until::<{ c_strlen(IT) }>
            ;
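            // NOTE: relies on `&CStr` having the same layout as `&[u8]`,
            // which the standard library does not guarantee.
            unsafe {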
                core::mem::transmute::<
                    &'static [u8],
                    &'static std::ffi::CStr,
                >(IT)
            }
        }
    }
)}


#[doc(hidden)] #[deprecated(note = "Not part of the public API")] pub
mod __ {
    #![allow(nonstandard_style)]

    pub use ::core;
    pub use ::std;
    
    pub use ::core::primitive::u8;

    pub const fn c_strlen (bytes: &'static [u8]) -> usize
    {
        let mut i = 0;
        while i < bytes.len() {
            if bytes[i] == b'\0' {
                return i;
            }
            i += 1;
        }
        i + 1
    }
    
    pub struct no_inner_null_bytes_until<const idx: usize>;
}

For instance, when fed "Hell\0, W\0rld!", the macro generates the following error:


(FWIW, the layout of &CStr is not guaranteed, so transmuting to it is library UB. We want to eventually make it a thin pointer, but the viability of doing so is still in question, and not going to be resolved until at least it's possible to make it a thin pointer.)


I agree that performing an unguaranteed transmute is far from ideal, but for some reason CStr provides no const constructor (not even from_bytes_with_nul? It would be compatible with a thin-pointer implementation). So the only non-UB solution right now is to use our own wrapper type that defers the construction to runtime:

struct ImplIntoCStr<'lt>(&'lt [u8]);

const unsafe fn cstr_from_bytes_with_null (bytes: &'_ [u8])
  -> ImplIntoCStr<'_>
{
    /* const */ impl<'lt> Into<&'lt CStr> for ImplIntoCStr<'lt> { … }

    ImplIntoCStr(bytes)
}
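A compilable sketch of that workaround, assuming the elided impl body simply calls CStr::from_bytes_with_nul_unchecked at conversion time. It implements From rather than Into (the usual idiom; Into then comes from the blanket impl), and the constructor name and the POSITION static are hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical wrapper: stores the bytes in a const-constructible form and
/// defers building the `&CStr` to runtime, so no layout assumption is made.
#[derive(Clone, Copy)]
struct ImplIntoCStr<'a>(&'a [u8]);

impl<'a> ImplIntoCStr<'a> {
    /// # Safety
    /// `bytes` must end with a nul and contain no other nul.
    const unsafe fn from_bytes_with_nul_unchecked(bytes: &'a [u8]) -> Self {
        ImplIntoCStr(bytes)
    }
}

impl<'a> From<ImplIntoCStr<'a>> for &'a CStr {
    fn from(w: ImplIntoCStr<'a>) -> Self {
        // SAFETY: upheld by the constructor's contract.
        unsafe { CStr::from_bytes_with_nul_unchecked(w.0) }
    }
}

// Usable as a static initializer because the constructor is `const`.
static POSITION: ImplIntoCStr<'static> =
    unsafe { ImplIntoCStr::from_bytes_with_nul_unchecked(b"position\0") };

fn main() {
    let c: &CStr = POSITION.into();
    assert_eq!(c.to_bytes(), b"position");
}
```

The cost is one extra conversion at each use site, which is the price of not relying on the unguaranteed layout of `&CStr`.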

If there are user crates doing dangerous things right now, having it in core would be harm reduction, I suppose.

If it were to be part of the standard library, I think it would be better expressed as a custom literal than a macro. Macros are ugly because they are untyped.
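For reference, later Rust releases did take the literal route: C-string literals of the form c"..." were stabilized in Rust 1.77 (edition 2021 and later). They have type &'static CStr, the compiler appends the nul, and an interior nul is a compile error.

```rust
use std::ffi::CStr;

fn main() {
    // Requires Rust 1.77+ and edition 2021 or later.
    let pos: &'static CStr = c"position";
    assert_eq!(pos.to_bytes_with_nul(), b"position\0");
    // `c"pos\0ition"` is rejected at compile time.
}
```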