Idea: Safe blocks for inside unsafe blocks

First off, I have no clue what impact this may have or if there would be any ramifications(especially since I don't know a lot of about the internals of rustc), but I had an idea while writing some code for an OS I'm making. My idea is to allow blocks that look something like safe { /* code */ } inside unsafe blocks(so something like unsafe { /* unsafe code */ safe { /* safe code */ } /* more unsafe code */ }) as, for example, I was writing a function like this(ignore the contents, it's not super important):

/// Parses a PC Screen Font into a [PCScreenFont].
pub fn parse_pc_screen_font(data: RawPCScreenFont) -> Result<PCScreenFont, crate::Error<'static>> {
    unsafe {
        let unitable: &[&str] = &[];
        let unistr = data.glyphs.byte_add(data.bytes_per_glyph as usize*data.num_glyphs as usize);
        /* Safe block start */
        for i in 0..(data.num_glyphs as usize) {
            let char = (*unistr)[i];
            /* Didn't finish writing code here lol */
        }
        /* Safe block end */
        
        let out = PCScreenFont {
            version: data.version,
            flags: data.flags,
            height: data.height,
            width: data.width,
            glyphs: &*(core::ptr::from_raw_parts(data.glyphs as *const Glyph, data.num_glyphs as usize) as *const [Glyph]),
            unitable
        };
        Ok(out)
    }
}

And I'd like to have the for loop be safe so that I can re-enable some of the compilers checks, however having something like:

/// Parses a PC Screen Font into a [PCScreenFont].
pub fn parse_pc_screen_font(data: RawPCScreenFont) -> Result<PCScreenFont, crate::Error<'static>> {
    unsafe {
        let unitable: &[&str] = &[];
        let unistr = data.glyphs.byte_add(data.bytes_per_glyph as usize*data.num_glyphs as usize);
    }
    for i in 0..(data.num_glyphs as usize) {
        let char = (*unistr)[i];
         /* Didn't finish writing code here lol */
    }
    unsafe {
        let out = PCScreenFont {
            version: data.version,
            flags: data.flags,
            height: data.height,
            width: data.width,
            glyphs: &*(core::ptr::from_raw_parts(data.glyphs as *const Glyph, data.num_glyphs as usize) as *const [Glyph]),
            unitable
        };
        Ok(out)
    }
}

seems too obtrusive. Please let me know if I'm missing something, but here's an example code sample with my same example:

/// Parses a PC Screen Font into a [PCScreenFont].
pub fn parse_pc_screen_font(data: RawPCScreenFont) -> Result<PCScreenFont, crate::Error<'static>> {
    unsafe {
        let unitable: &[&str] = &[];
        let unistr = data.glyphs.byte_add(data.bytes_per_glyph as usize*data.num_glyphs as usize);
        safe {
            for i in 0..(data.num_glyphs as usize) {
                let char = (*unistr)[i];
                /* Didn't finish writing code here lol */
            }
        }
        
        let out = PCScreenFont {
            version: data.version,
            flags: data.flags,
            height: data.height,
            width: data.width,
            glyphs: &*(core::ptr::from_raw_parts(data.glyphs as *const Glyph, data.num_glyphs as usize) as *const [Glyph]),
            unitable
        };
        Ok(out)
    }
}

Also, having a safe block like this makes it more obvious that this code is safe code when the surrounding isn't, although I suppose that the version that'd be used right now would make it more explicit that the code at the end of the "safe block" is unsafe again. I figure this would go through the RFC process, but I wanted to gauge interest first.

1 Like

Also, another thing I forgot, with what you have to use right now, you run into scope issues:

/// Parses a PC Screen Font into a [PCScreenFont].
pub fn parse_pc_screen_font(data: RawPCScreenFont) -> Result<PCScreenFont, crate::Error<'static>> {
    unsafe {
        let unitable: &[&str] = &[];
        let unistr = data.glyphs.byte_add(data.bytes_per_glyph as usize*data.num_glyphs as usize);
    }
    for i in 0..(data.num_glyphs as usize) {
        let char = (*unistr)[i]; // error[E0425]: cannot find value `unistr` in this scope
         /* Didn't finish writing code here lol */
    }
    unsafe {
        let out = PCScreenFont {
            version: data.version,
            flags: data.flags,
            height: data.height,
            width: data.width,
            glyphs: &*(core::ptr::from_raw_parts(data.glyphs as *const Glyph, data.num_glyphs as usize) as *const [Glyph]),
            unitable // error[E0425]: cannot find value `unitable` in this scope
        };
        Ok(out)
    }
}

Just realized, probably not the best example, the let char line requires an unsafe block lol

note that unsafe doesn't disable any of the existing checks you'd usually rely on in safe code (e.g. lifetimes, indexing out-of-bounds, etc.), all it does is allow you to also do a few extra operations you wouldn't normally be allowed to: derefencing raw pointers, calling unsafe functions, and a few others.

That said, I think having some form of safe {} blocks is a great idea that I've been wanting for some time, mostly because it'll be useful in macros, e.g.:

macro_rules! inc {
    ($a:expr) => {
        unsafe {
            // not the best example...other macros much more
            // naturally have inputs in the middle of an `unsafe` block
            let mut a: u64 = safe { $a };
            asm!("inc {}", inlateout(reg) a);
            a
        }
    };
}
6 Likes

But then someone who uses your macro inside their own unsafe block suddenly loses that ability in the macro arg.

4 Likes

I know you said not to focus on the specific example, but generally I find that code can be rewritten with more precise unsafe blocks in a clean manner, e.g.

/// Parses a PC Screen Font into a [PCScreenFont].
pub fn parse_pc_screen_font(data: RawPCScreenFont) -> Result<PCScreenFont, crate::Error<'static>> {
    let num_glyphs = data.num_glyphs as usize;
    let glyphs = unsafe { ptr::slice_from_raw_parts::<Glyph>(data.glyphs.cast(), num_glyphs) };
    let unistr = unsafe {
        let size_of_glyph = data.bytes_per_glyph as usize;
        let ptr = data.glyphs.byte_add(size_of_glyph * num_glyphs);
        slice::from_raw_parts(ptr, num_glyphs)
    };

    for &char in unistr {
        /* Didn't finish writing code here lol */
    }

    let out = PCScreenFont {
        version: data.version,
        flags: data.flags,
        height: data.height,
        width: data.width,
        glyphs,
        unitable: &[],
    };
    Ok(out)
}

But a way to cancel out one "layer" of unsafe would indeed be a nice to have.

3 Likes

true, though allowing extraneous unsafe operations in arguments of what's supposed to be a safe macro is waay worse than the slight ergonomic hit of having to use unsafe twice. this is basically what asm! did

I also commented on needing safe blocks there: Tracking Issue for asm_goto · Issue #119364 · rust-lang/rust · GitHub

1 Like

Instead of introducing a safe block inside unsafe, wouldn’t it be better to have a prefix unsafe. operator (notice the required . to remove ambiguity) that applies only to the operation immediately on the right?

unsafe fn unsafe_function(x: i32) { ... }
fn get_ptr() -> *const i32;

fn foo() {
    let ptr = get_ptr();

    // this would not compile
    unsafe.unsafe_function(*ptr);

    // while this would
    unsafe.unsafe_function(unsafe.*ptr);

    // and obviously, this continues to work
    unsafe { unsafe_function(*ptr); }
}
2 Likes

What ambiguity?

I have suggested the same before but without the ..

1 Like

I generally agree about rewriting to have the unsafe blocks be more precise (I usually enable lints to force each unsafe block to only have one unsafe operation and and to have a safety comment), but I haven't found a good way to do it with a chain of method calls. I'd love something like:

// SAFETY: why `.unsafe_baz()` is safe
unsafe {
    safe {
        // SAFETY: why `unsafe_bar` is safe
        unsafe { foo.unsafe_bar() }
        .bat()
    }
    .unsafe_baz()
}

It's still not perfect to have a rightward drift from each unsafe level, but imo this is still an improvement over the status quo.

That does work, although imo that feels cluttered(of course, that's more a matter of opinion, but that's just my feelings lol). Especially if your code is mostly unsafe but you still have a decent amount of safe code, then it doesn't feel worth it to add unsafe blocks all over.

Anyway, I'm not 100% sure the process for this kind of stuff but since the general consensus seems to be for adding safe blocks then in a couple of days when there's been more feedback I'll probably create an RFC for this.

Honestly, I'm impatient and submitting my RFC now. I'll definitely make changes along the way though.

With single-operation unsafe this would be:

foo
    // SAFETY: why `unsafe_bar` is safe
    .unsafe unsafe_bar()
    .bat()
    // SAFETY: why `unsafe_baz` is safe
    .unsafe unsafe_baz()
3 Likes

I feel like single operation unsafe would be a separate thing from safe blocks.

surely no one dev would be forced to use (nested) unsafe twice, though? the macro author would use unsafe and safe, then some users of the macro would use unsafe?

i guess if someone is using the macro they wrote they might need to write unsafe twice, but in that case they could just choose not to include the safe block, and just define the macro as introducing an unsafe block.

what I was referring to could look like:

// crate a
// this macro is safe to use
macro_rules! m {
    ($v:expr) => {
        unsafe { some_unsafe_op(safe { $v }) }
    };
}

// crate b, depends on crate a
unsafe fn f(v: *mut i32) {
    unsafe { // unsafe needed to deref v
        *v += 1;
        m!(unsafe { *v }); // unsafe needed again because m! has safe {}
    }
}
1 Like

The point is, every example so far in favor of safe is better solved by smaller unsafe scopes (including possibly a single-operation unsafe). So what's the use case for safe blocks?

5 Likes

Shouldn't this be dealt with by macro hygiene? That is, shouldn't the expansion of $v be treated as within or without an unsafe block based on the original lexical context of those tokens, in the same way that we handle identifier resolution for a macro body differently than for substitutions? Like this

/// this macro is safe to call
macro_rules! m {
    ($v:expr) => {
        unsafe { some_unsafe_op($v) }
    };
}

unsafe fn f(v: *mut i32) {
    unsafe { // unsafe needed to deref v
        *v += 1;
        m!(*v);  // macro argument can use unsafe ops
    }
}

fn g(v: &mut i32) {
    *v += 1;
    m!(*v); // macro argument *cannot* use unsafe ops
}