Was the forgotten precondition documented? If yes, that's a rare case; it should be easy enough to verify the reasons list against the /// # Safety documentation. If not, then arguably they should break downstream users, because those users might not have verified the precondition.
I believe this is rare enough, especially across crates, that it is not worth downgrading from an error.
Overall I believe an error is warranted in this case, but this is clearly a viable alternative.
I fail to see how this could work without either flow typing or putting reasons in the type system (which forces them to be static and deviates a lot from this proposal).
I like the general idea of making the reasons take parameters, but it brings up several questions, which I don't have straightforward answers to.
Adding to this, would the readable(p) reason apply to the name, the binding, the place, or the value of p? Something else?
When is the reason asserted semantically? At the start of the unsafe block or when the unsafe call happens?
What if the value of the variable changes inside the unsafe block?
How to express a reason that is only valid once/in a particular part of the unsafe block?
For example, ManuallyDrop::drop would require a not_yet_dropped(self).
How to express something that is not the property of a particular value, but rather a global property of the state of the program?
For example, how does one assert not_aliased(p) required to create a unique reference?
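To make the mutation question concrete, here is a sketch in the thread's hypothetical reasons syntax (none of this is real Rust; `aligned` is an invented reason):

```rust
let mut p: *const u32 = data.as_ptr();
unsafe [aligned(p)] {
    let a = p.read();                  // presumably covered by aligned(p)
    p = p.cast::<u8>().add(1).cast();  // p is no longer 4-byte aligned
    let b = p.read();                  // is aligned(p) still asserted here?
}
```

If the reason is asserted once at the start of the block, the second read silently escapes the documented precondition; if it is asserted at every use, the programmer has taken on a flow-sensitive obligation the syntax doesn't make visible.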
Not answering for the original post, but in my version of the world some of these properties would appear as a consequence of how the pointer was made. Create a pointer from a mutable reference, and that pointer is not_aliased. You could also:
And then subsequent calls to a function which only required not_aliased would not be unsafe, the type of p is now p: *const T where not_aliased(p). The "proof" is just that we have declared that it is the case. Some of these things can't (in general) be proved for real, but that it why unsafe blocks exist. This "starts looking like formal provers (like Kani)", but yes a lite version of that is exactly what is being proposed.
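A sketch of what that flow-typed shape might look like (the `where not_aliased(..)` refinement syntax is entirely invented):

```rust
// Invented syntax: a refinement travels with the pointer's type.
fn init(p: *mut u32) where not_aliased(p) { /* callable without unsafe */ }

let mut x = 0u32;
let p: *mut u32 = &mut x; // derived from &mut, so the compiler could
                          // type it as `*mut u32 where not_aliased(p)`
init(p); // not unsafe: the proof obligation was discharged at creation
```

The point is that the obligation is discharged where the pointer is created, not re-asserted at every call site.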
How does this proposal handle situations where the unsafe code is relying on the well-formedness of a complex data structure? For example, here is some FFI code I just wrote...
unsafe {
use shim::{UAPIStruct, STRUCTS};
let mut map = HashMap::new();
for UAPIStruct { name, align, size, fields, } in &STRUCTS
{
let name = CStr::from_ptr(*name);
let name = name.to_str().expect("struct tag name not valid UTF-8");
let mut rfields = Vec::new();
for i in 0.. {
let field = fields.offset(i).read();
if field.name.is_null() {
break;
}
let fname = CStr::from_ptr(field.name);
let fname = fname.to_str().expect("field name not valid UTF-8");
rfields.push(Field {
name: fname,
size: field.size,
offset: field.offset,
});
}
map.insert(name, Struct { size, align, fields: rfields });
}
map
}
where STRUCTS is static data generated by compiling a C module. The safety requirements for this code are:

- the definitions of UAPIStruct and its daughter struct UAPIField (the type of fields) must agree with the matching definitions on the C side of the fence
- on the Rust side STRUCTS is declared as static STRUCTS: [UAPIStruct; n]; n must be accurate
- all the pointer dereferences need to be valid
- all the pointers passed to CStr::from_ptr must be valid C strings
- there must actually be a null pointer terminator for each fields array
- probably some other stuff I forgot
How much of this can be captured in your proposal and how would it be written? I can provide more fragments if it will help.
I agree that all the points you raised are interesting questions that would need to be considered.
That said, to me this is still under the category of "aid to the unsafe programmer", not under "formal proof system", so I'm ok allowing slightly loose answers to things.
For example, maybe it's fine to have one big unsafe [aligned(p)] { ... } block even when p is mutated, because you read it as "in this block it's aligned", and it's just a thing that you need to maintain in any mutable operations on p in that block, but it's also a well-documented thing because of that unsafe block.
It's up to tools like Kani to formally define and check exactly what everything means, but those would be with full-SMT expressions, and if you have those proofs then you also don't need granular unsafe nor granular unsafe reasons.
I don't know if you meant mine, but I could imagine things like
The item declarations would have a moral unsafe requirement to match the C declarations. This is what the unsafe in the edition2024 unsafe extern { block stands for, which handles the static, but struct would need a new syntax (struct items aren't allowed in unsafe extern).
This would be a reasonable application of unsafe fields, as the fields could be marked with these requirements. Alternatively, a well_formed reason could be constructed for the type which implies these properties.
If we allow discharging unsafe reasons on any "pure" place projection, then written with minimal aliasing and fairly tight scoping, it could look like:
for UAPIStruct { name, align, size, fields, } in &STRUCTS {
let name = unsafe where
[valid_for_read(name), "by construction"]
[valid_for_read_until_nul(*name), "by construction"]
{
CStr::from_ptr(*name)
};
let name = name.to_str().expect("struct tag name should be UTF-8");
let fields = (0..).map_while(|i| unsafe where
    [valid_for_ref(fields, i), "by construction"]
{
    let field = slice::from_raw_parts(fields, i + 1)[i];
    (!field.name.is_null()).then_some(field)
});
let fields = fields.map(|UAPIField { name, size, offset }| {
let name = unsafe where
[valid_for_read_until_nul(name)]
{
CStr::from_ptr(name)
};
let name = name.to_str().expect("struct field name should be UTF-8");
Field { name, size, offset }
});
map.insert(name, Struct { size, align, fields: fields.collect() });
}
This isn't pretty, to be sure, but it is fairly clear about the requirements. Doing less in this function and writing helpers that maintain validity in the FFI representation would help, but that carries additional author overhead without much benefit if this is the one site ingesting the FFI structure already. There will always be a good use case for "hold my beer" style unsafe blocks.
(Aside: oooh, I really want Rust bindings for working with the Unreal Engine reflection system, but lost steam on my proof of concept after a manually translated minimal example. Probably was being far too ambitious to immediately try to integrate with the build system instead of starting as a separate tool.)
I'm wondering whether this should be a language feature at all. Clippy already has the lints "undocumented_unsafe_blocks" and "missing_safety_doc", and they could be extended to formalize what libraries already do with Safety sections in the docs (for example Bevy ECS − blob_vec.rs, which lists all safety requirements):
/// # Safety
/// - [len_is_in_bounds] `len` is in bounds of `ptr`
/// - [len_is_not_zero] `len` is greater than zero
/// - [ptr_not_null] `ptr` is not null
unsafe fn do_smth_with_raw_slice(ptr: *const u8, len: usize) {
/* implementation */
}
/// Uses [`do_smth_with_raw_slice`]
fn do_smth_with_slice(slice: &[u8]) {
let len = slice.len();
// If slice is empty return early
if len == 0 {
return;
}
unsafe {
// SAFETY:
// - [len_is_in_bounds] using `len()` of slice
// - [len_is_not_zero] early-returned if `len` was zero
// - [ptr_not_null] got slice from a reference
do_smth_with_raw_slice(slice.as_ptr(), len);
}
/* === OR === */
// SAFETY:
// - [len_is_in_bounds] using `len()` of slice
// - [len_is_not_zero] early-returned if `len` was zero
// - [ptr_not_null] got slice from a reference
unsafe {
do_smth_with_raw_slice(slice.as_ptr(), len);
}
/* === THE FOLLOWING WILL FAIL (OR WARN) === */
unsafe { do_smth_with_raw_slice(slice.as_ptr(), len); }
// ^ Reasons not explained
// SAFETY: `slice` is a valid reference to not-empty slice
unsafe { do_smth_with_raw_slice(slice.as_ptr(), len); }
// ^ This is not valid syntax for reasons
}
- This already compiles and won't break existing unsafe code
- Unsafe code is rather niche, so a feature like this shouldn't affect compilation of safe code (thinking about compilation speed here)
- I feel like using Clippy lints is niche as well, so it kind of fits well with unsafe code being niche too
Yeah, the code I quoted is the one and only site that reads from STRUCTS, and I have to say that sprinkling [valid_for_xxx] annotations all over the code makes it a lot harder to see the actual logic. Unless these annotations could somehow make it possible for the compiler to look past the FFI boundary and verify that the module compiled from C does in fact satisfy the Rust code's requirements, I'm currently thinking it's more maintainable without them.
The item declarations would have a moral unsafe requirement to match the C declarations. This is what the unsafe in the edition2024 unsafe extern { block stands for
I noticed that in edition2024 unsafe extern is now a thing you can write, but I could not find any explanation of how it differs semantically from regular extern, as extern items already require an unsafe block to reference (unless marked as safe, which STRUCTS isn't) and it didn't sound like that had changed in e2024...?
I don't know where it's documented for the unstable period either, annoyingly (I expect it at least will be documented as part of the edition 2024 docs), but in short, declaring extern function signatures is, unfortunately, unsafe in practice, and an incorrect declaration can cause UB even if it isn't ever called[1].
So the unsafe extern block takes on the unsafety of "this item exists and the signature is correct," and the safety of the item itself is the safety of calling it. The safety still defaults to unsafe, but you can write safe as a contextual keyword to declare a safe-to-call external function, only in unsafe extern blocks. (And in the future, external functions not marked as safe or unsafe might be linted against.)
The same "signature matches" requirement applies to statics as well. structs don't really exist across the FFI boundary (which only cares about matching ABI), just the safety of whatever data is accessible.
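As a concrete illustration (using `abs` from libc as the example; requires rustc 1.82+, where `unsafe extern` blocks and the `safe` qualifier are stable):

```rust
// The `unsafe` on the block owns the claim that these signatures match
// the real C declarations; a wrong signature can be UB even if unused.
unsafe extern "C" {
    // `safe` (a contextual keyword, only allowed inside `unsafe extern`)
    // marks this item as callable without an unsafe block.
    safe fn abs(x: i32) -> i32;
}

fn main() {
    // No unsafe block needed at the call site.
    println!("{}", abs(-5)); // prints "5"
}
```

The division of responsibility is exactly as described above: the block asserts "this item exists with this signature", while `safe`/`unsafe` on the item describes the safety of using it.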
If the data is fully immutable (and IIRC for this API it is), I think the "ideal" bindings would look something like
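One hedged sketch of bindings in that spirit. All names and layouts here are assumptions mirroring the thread's UAPIStruct/UAPIField; the real C-generated data would come from an `unsafe extern` block rather than the stand-in statics used below to keep the example self-contained:

```rust
use std::ffi::CStr;

#[repr(C)]
struct UAPIField {
    name: *const u8, // nul-terminated; a null `name` terminates the array
    size: usize,
    offset: usize,
}

#[repr(C)]
struct UAPIStruct {
    name: *const u8,
    align: usize,
    size: usize,
    fields: *const UAPIField,
}

// The data is immutable and 'static, so sharing the raw pointers is fine.
unsafe impl Sync for UAPIField {}
unsafe impl Sync for UAPIStruct {}

// Stand-ins for the C-generated `static STRUCTS` and its field arrays.
static FIELDS: [UAPIField; 2] = [
    UAPIField { name: b"x\0".as_ptr(), size: 4, offset: 0 },
    UAPIField { name: std::ptr::null(), size: 0, offset: 0 }, // terminator
];
static STRUCTS: [UAPIStruct; 1] = [UAPIStruct {
    name: b"point\0".as_ptr(),
    align: 4,
    size: 4,
    fields: FIELDS.as_ptr(),
}];

// The unsafe access pattern lives in one place; callers only see &str.
// The same pattern extends to walking `fields` up to its terminator.
fn struct_names() -> Vec<&'static str> {
    STRUCTS
        .iter()
        .map(|s| {
            // SAFETY: `name` points at a 'static nul-terminated string
            // by construction of the generated data.
            unsafe { CStr::from_ptr(s.name.cast()) }
                .to_str()
                .expect("struct tag name should be UTF-8")
        })
        .collect()
}

fn main() {
    println!("{:?}", struct_names()); // prints ["point"]
}
```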
which would then be actually safe to consume, since the safety of the data access pattern is now captured in the types instead of the unsafe at the use point.
Is writing it this way actually better? It's debatable; the effort and the burden of proof is higher to write this than to just write the unsafe code to translate the data structure to a Rust native one using the same unsafe access pattern that you would in C++.
[1] The known case is that if you declare a function more than once in a codegen unit, LLVM will only use one of the decls, and which one it uses is unspecified at best.