Pre-RFC: Unsafe reasons

Was the forgotten precondition documented? If yes, that's a rare case; it should be easy enough to verify the reasons list against the /// # Safety documentation. If not, then arguable they should break downstream users, because they might not have verified the precondition.

I believe this is rare enough, especially across crates, to not worth downgrading from error for.

Overall I believe an error is worthy in this case, but this is clearly a viable alternative.

I fail to see how this could work without either flow typing or putting reasons in the type system (which forces them to be static and deviates a lot from this proposal).

I like the general idea of making the reasons take parameters, but it brings up several questions, which I don't have straightforward answers to.

Adding to this, would the readable(p) reason apply to the name, the binding, the place, the value of p? Something else?

When is the reason asserted semantically? At the start of the unsafe block or when the unsafe call happens? What if the value of the variable changes inside the unsafe block?

How to express a reason that is only valid once/in a particular part of the unsafe block? For example, ManuallyDrop::drop would require a not_yet_dropped(self).

How to express something that is not the property of a particular value, but rather a global property of the state of the program? For example, how does one assert not_aliased(p) required to create a unique reference?

Not answering for the original post, but in my version of the world some of these properties would appear as a consequence of how the pointer was made. Create a pointer from a mutable reference, that pointer is not_aliased. You could also:

// SAFETY: Probably fine.
unsafe { allege_unsafe_property!(not_aliased(p)); }

And then subsequent calls to a function which only required not_aliased would not be unsafe, the type of p is now p: *const T where not_aliased(p). The "proof" is just that we have declared that it is the case. Some of these things can't (in general) be proved for real, but that it why unsafe blocks exist. This "starts looking like formal provers (like Kani)", but yes a lite version of that is exactly what is being proposed.

How does this proposal handle situations where the unsafe code is relying on the well-formedness of a complex data structure? For example, here is some FFI code I just wrote...

    unsafe {
        use shim::{UAPIStruct, STRUCTS};
        let mut map = HashMap::new();
        for UAPIStruct { name, align, size, fields, } in &STRUCTS
        {
            let name = CStr::from_ptr(*name);
            let name = name.to_str().expect("struct tag name not valid UTF-8");
            let mut rfields = Vec::new();
            for i in 0.. {
                let field = fields.offset(i).read();
                if field.name.is_null() {
                    break;
                }
                let fname = CStr::from_ptr(field.name);
                let fname = fname.to_str().expect("field name not valid UTF-8");
                rfields.push(Field {
                    name: fname,
                    size: field.size,
                    offset: field.offset,
                });
            }
            map.insert(name, Struct { size, align, fields: rfields })
        }
        map
    }

where STRUCTS is static data generated by compiling a C module. The safety requirements for this code are

  • the definitions of UAPIStruct and its daughter struct UAPIField (the type of fields) must agree with the matching definitions on the C side of the fence
  • On the Rust side STRUCTS is declared as static STRUCTS: [UAPIStruct; n]; n must be accurate
  • all the pointer dereferences need to be valid
  • all the pointers passed to CStr::from_ptr must be valid C strings
  • there must actually be a null pointer terminator for each fields array
  • probably some other stuff I forgot

How much of this can be captured in your proposal and how would it be written? I can provide more fragments if it will help.

I agree that all the stuff you raised are interesting questions that would need to be considered.

That said, to me this is still under the category of "aid to the unsafe programmer", not under "formal proof system", so I'm ok allowing slightly loose answers to things.

For example, maybe it's fine to have one big unsafe [aligned(p)] { ... } block even when p is mutated, because you read it as "in this block it's aligned", and it's just a thing that you need to maintain in any mutable operations on p in that block, but it's also a well-documented thing because of that unsafe block.

It's up to tools like Kani to formally define and check exactly what everything means, but those would be with full-SMT expressions, and if you have those proofs then you also don't need granular unsafe nor granular unsafe reasons.

I don't know if you meant mine, but I could imagine things like

unsafe [aligned(ptr)]
struct MyThing<T> { ptr: NonNull<T> }

where you need to prove that to mutate or initialize, but you also get it free from taking a parameter of the type when reading the field.

2 Likes

One possible answer:

The item declarations would have a moral unsafe requirement to match the C declarations. This is what the unsafe in the edition2024 unsafe extern { block stands for, which handles the static, but struct would need a new syntax (struct items aren't allowed in unsafe extern).

This would be a reasonable application of unsafe fields, as the fields could be marked with these requirements. Alternatively, a well_formed reason could be constructed for the type which implies these properties.

Written with minimal aliasing and fairly tight scoping, it could look like, if we allow discharging unsafe reasons on any "pure" place projection:

for UAPIStruct { name, align, size, fields, } in &STRUCTS {
    let name = unsafe where
        [valid_for_read(name), "by construction"]
        [valid_for_read_until_nul(*name), "by construction"]
    {
        CStr::from_ptr(*name)
    };
    let name = name.to_str().expect("struct tag name should be UTF-8");
    let fields = (0..).filter_map(|i| unsafe where
        [valid_for_ref(fields, i), "by construction"]
    {
        let field = slice::from_raw_parts(fields, i)[i];
        (!field.name.is_null()).then(field)
    });
    let fields = fields.map(|UAPIField { name, size, offset }| {
        let name = unsafe where
            [valid_for_read_until_null(name)]
        {
            CStr::from_ptr(name)
        };
        let name = name.to_str().expect("struct field name should be UTF-8");
        Field { name, size, offset }
    });
    map.insert(name, Struct { size, align, fields: fields.collect() });
}

This isn't pretty, to be sure, but it is fairly clear about the requirements. Doing less in this function and writing helpers that maintain validity in the FFI representation would help, but that carries additional author overhead without much benefit if this is the one site ingesting the FFI structure already. There will always be a good use case for "hold my beer" style unsafe blocks.

(Aside: oooh, I really want Rust bindings for working with the Unreal Engine reflection system, but lost steam on my proof of concept after a manually translated minimal example. Probably was being far too ambitious to immediately try to integrate with the build system instead of starting as a separate tool.)

1 Like

Is there a reason to not use the macro syntax?

#[unsafe_reason(reason1)]
#[unsafe_reason(reason2)]
// #[unsafe_reason(reason1, reason2)] // Maybe allow combining them
unsafe fn my_unsafe_fn() { /* ... */}

fn main() {
    #[unsafe_reason(reason1)]
    unsafe {
        #[unsafe_reason(reason1)]
        my_unsafe_fn()

        // ...
    }
}
  • It keeps the reasons still visible and near the unsafe keyword,
  • clearly indicates that they are optional (though a lint could still require them),
  • allows a lot of flexibility,
  • makes it possible/easier to specify reasons for a single line within an unsafe block,
  • doesn't make the function definition itself even longer and
  • it is easy to ignore when skimming through the code
5 Likes

Unsafe on attributes are in a problematic state, but other than that I like this idea.

Sure, attributes would work too. That's the version I had in my sketch in Detect and Fix Overscope unsafe Block - #19 by scottmcm

I think the important part is how they should work and what they mean; how exactly the wrapping paper is spelled is much less important to me.

1 Like

I'm thinking about if this should be language feature at all. Clippy already has lints "undocumented_unsafe_blocks" and "missing_safety_doc", and they could be further modified to what libraries already do with Safety sections in the docs (example Bevy Ecs − blob_vec.rs, listing all safety requirements)

/// # Safety
/// - [len_is_in_bounds] `len` is in bound of ptr
/// - [len_is_not_zero] `len` is greater than zero
/// - [ptr_not_null] `ptr` is not null
unsafe fn do_smth_with_raw_slice(ptr: *const u8, len: usize) {
    /* implementation */
}

/// Uses [`do_smth_with_raw_slice`]
fn do_smth_with_slice(slice: &[u8]) {
    let len = slice.len();

    // If slice is empty return early
    if len == 0 {
        return;
    }

    unsafe {
        // SAFETY:
        // - [len_is_in_bounds] using `len()` of slice
        // - [len_is_not_zero] early-returned if `len` was zero
        // - [ptr_not_null] got slice from a reference
        do_smth_with_raw_slice(slice.as_ptr(), len);
    }


/* === OR === */


    // SAFETY:
    // - [len_is_in_bounds] using `len()` of slice
    // - [len_is_not_zero] early-returned if `len` was zero
    // - [ptr_not_null] got slice from a reference
    unsafe {
        do_smth_with_raw_slice(slice.as_ptr(), len);
    }


/* === THE FOLLOWING WILL FAIL (OR WARN) === */


    unsafe { do_smth_with_raw_slice(slice.as_ptr(), len); }
    //       ^ Reasons not explained

    // SAFETY: `slice` is a valid reference to not-empty slice
    unsafe { do_smth_with_raw_slice(slice.as_ptr(), len); }
    //       ^ This is not valid syntax for reasons
}
  • This already compiles and won't break existing unsafe code
  • Unsafe code is rather niche, so a feature like this shouldn't affect compilation of safe code (thinking about compilation speed here)
  • I feel like using clippy lints is niche as well, so it kind-of fits well with unsafe code being niche too
2 Likes

Yeah, the code I quoted is the one and only site that reads from STRUCTS, and I have to say that sprinkling [valid_for_xxx] annotations all over the code makes it a lot harder to see the actual logic. Unless these annotations could somehow make it possible for the compiler to look past the FFI boundary and verify that the module compiled from C does in fact satisfy the Rust code's requirements, I'm currently thinking it's more maintainable without.

The item declarations would have a moral unsafe requirement to match the C declarations. This is what the unsafe in the edition2024 unsafe extern { block stands for

I noticed that in edition2024 unsafe extern is now a thing you can write, but I could not find any explanation of how it differs semantically from regular extern, as extern items already require an unsafe block to reference (unless marked as safe, which STRUCTS isn't) and it didn't sound like that had changed in e2024...?

I don't know where it's documented for the unstable period either, annoyingly (I expect it at least will be documented as part of the edition 2024 docs), but in short, declaring extern function signatures is, unfortunately, unsafe in practice, and an incorrect declaration can cause UB even if it isn't ever called[1].

So the unsafe extern block takes on the unsafety of "this item exists and the signature is correct," and the safety of the item itself is the safety of calling it. The safety still defaults to unsafe, but you can write safe as a contextual keyword to declare a safe-to-call external function, only in unsafe extern blocks. (And in the future, external functions not marked as safe or unsafe might be linted against.)

The same "signature matches" requirement applies to statics as well. structs don't really exist across the FFI boundary (which only cares about matching ABI), just the safety of whatever data is accessible.

If the data is fully immutable (and IIRC the API it is), I think the "ideal" bindings would look something like

#[repr(C)]
pub struct UAPIStruct {
    pub name: &'static UAPIName,
    pub align: i32,
    pub size: i32,
    pub fields: NullTerminatedArrayPtr<'static, UAPIField>,
}

#[repr(C)]
pub struct UAPIField {
    name: UAPIName,
    size: i32,
    offset: i32,
}

impl IsNull for UAPIField {
    fn is_null(&self) -> bool {
        self.name.is_null()
    }
}

unsafe extern "C" {
    // SAFETY: the types match up, i swear uwu
    safe static STRUCTS: [UAPIStruct; N];
}

which would then be actually safe to consume, since the safety of the data access pattern is now captured in the types instead of the unsafe at the use point.

Is writing it this way actually better? It's debatable; the effort and the burden of proof is higher to write this than to just write the unsafe code to translate the data structure to a Rust native one using the same unsafe access pattern that you would in C++.


  1. The known case is that if you declare a function more than once in a codegen unit, LLVM will only use one of the decls, and which one it uses is unspecified at best. ↩︎

1 Like