FFI mutating raw pointer where immutable reference in scope, is this UB?

Yesterday I asked a question in stackoverflow and getting two answers that are contradicting at some points.

It boils down to the soundness of the following unsafe code:

Playground

use std::os::raw::c_void;

//Let's say all those functions are in an FFI module,
//with the exact same behaviour
fn ffi_create() -> *mut c_void {
    Box::into_raw(Box::new(0u8)) as *mut u8 as *mut c_void
}
unsafe fn ffi_write_u8(p: *mut c_void, v:u8) {
    *(p as *mut u8) = v;
}
unsafe fn ffi_read_u8(p: *mut c_void) -> u8 {
    *(p as *mut u8)
}

fn main() {
    unsafe {
        //let's ignore ffi_destroy() for now
        let pointer: *mut c_void = ffi_create();
        let ref_pointer: &*mut c_void = &pointer;        
        ffi_write_u8(pointer, 7);
        let integer = ffi_read_u8(*ref_pointer);
        assert_eq!(integer, 7);
    }
}

For (it is legal!)

  • There are practical uses like this.
  • It is hard to imagine that the compiler can do anything unexpected.

Against (No it is UB!)

  • It violate the mut rules: ref_pointer is in scope, but ffi_write_u8 can mutate the thing it refers to
  • If considered “interior mutablity”, it is not using UnsafeCell directly or indirectally

I think it is better to raise it here as we don’t have an agreement with. What do you think?

Disclaimer: there’s no formal memory model yet.


I’m going to use a simplified version of the Stacked Borrows model, which is roughly the direction that we seem to be going to what is intrinsically allowed in Rust’s memory model.

So let’s walk through what happens here:

let pointer: *mut extern = ffi_create();

We create a Raw pointer to some resource.

let ref_pointer: &(*mut extern) = &pointer

We create a Freeze reference to the pointer, pointer.

ffi_write_u8(pointer, 7);

We use the Raw pointer pointer to mutate the resource.

let integer = ffi_read_u8(*ref_pointer);

We load the Raw pointer from the Freeze reference and then load the resource behind the pointer.

assert_eq!(integer, 7);

We use our loaded value.

As the resource is behind a Raw pointer at all times, the Rust model assumes nothing about it, and it’s allowed to change at any point. The value which is unable to change because of the & borrow lock is pointer itself. (This only holds for this thread’s access; concurrent threaded access to the resource would be data race UB.)

Now, what if you had instead written

let pointer: *mut extern = ffi_create();
let mut_ref: &mut extern = &*pointer;
ffi_write_u8(pointer, 7);
let integer = ffi_read_u8(mut_ref);
assert_eq!(integer, 7);

Here, because the &mut exists, the value managed by ffi_create is assumed by Rust to be unique from the creation of mut_ref to the last use of it. This is obviously not the case, as the original pointer is used in-between the creation and use of the ref. This example is UB.

cc @RalfJung, did I get this right?

6 Likes

I think this is not a problem right now as it is not possible to send pointer to another thread unless you explicitly making a newtype that is Send.

This example have problem because it attempts to use the *mut extern pointer IN Rust (the expression &mut *pointer deferences the pointer).

What the original intention is treat *mut extern a completely opaque type: the only way you can use it is to sent it to FFI.

My thoughts

In theory, there is no way Rust can know whether the use of FFI functions will UB: there could be a bug in the FFI module, who knows?

So what the compiler can do for safe code and programmers should do for unsafe code, is to ensure the following condition (legal = not UB):

If the current program status is legal, before calling any FFI functions (including exit of the program), the program status should be legal at any point

This principle means we can only conclude the following in my example:

assuming the status before main is legal,

  • Before ffi_write_u8 it is still legal
  • If after ffi_write_u8 it is still legal (means all Rust visible variables including pointer remains the same, and pointer is still point to legal memory), it is still legal before ffi_read_u8 because the ref_pointer is still refer to pointer, a legal pointer.
  • If after ffi_read_u8 it is still legal (means integer is valid), the remaining program is legal

So in general, our safety conclusion can only base on the assumption that the FFI code is not UB.

In terms of mutability though, I think now we have three category of mutability:

  1. Exterior mutability, specified by the use of mut keyword
  2. Interior mutability, specified by using UnsafeCell directly or indirectly
  3. Foreign mutablity, specified by a never dereferenced raw pointer, and may sent to FFI functions

Problems left behind

  1. What language facility shall we have to specify a raw pointer should never be dereferenced?
    • This is harder than it looks
    let ref_pointer: &mut ! = &mut *(pointer as *mut !);
  • The above code is legal, but the whole point is to avoid such things for specific raw pointer types.
  1. Should this type also have mut and const variants? C pointers have const void * and void * variants so the difference do exists in FFI. But I am not sure what can the Rust compiler take advantage of them.

Finally I think the *mut extern grammar is great, but I just don't want to enable &*pointer or coerce to another type - we do not want Rust to touch the thing it points to. If the user do want things like that, they can always use *mut [u8;10] instead.

I’d like to clarify that *mut extern was a shorthand and is not valid syntax.

Instead, you would extern type, and I think that it’s theoretically possible to forbid dereferencing a ptr-to-extern-type, even if that’s not the current nightly implementation. I definitely think that’s the intent of extern type; Rust isn’t allowed to do anything other than hand around pointers to the type.

Yes you lead me to the RFC 1861: extern types.

So the conclusion is to add the following to the RFC:

  • Include the necessity for Foreign Mutability in the motivation part;
  • Allow to specify that a type is not deferencable. My proposal is something like extern { type Opaque: !; }.

@CAD97 I was about to disagree strongly, and then I re-read the code. :wink: So, what happens here is that you have a shared reference to a raw pointer to some location, and then that location gets mutated? Yes, that is definitely okay. None of Rust's guarantees look "through" raw pointers.

The goal is to define a model like Stacked Borrows so that mutability is not something "extra", but instead when you look at the program state you already have there the necessary pieces to track mutability violations (aliasing violations, really, because it's also about uniqueness).

More concretely to your point, I do not see how anything you are doing here is FFI specific. I do not understand what you mean to say by putting "Foreign mutability" next to &mut and &UnsafeCell, but you seem to mean raw pointers, so maybe if anything let's call this "Raw mutability". FFI is not special. Raw pointers are really in a different class from references, and Rust just makes no assumptions about them, period. So there's not much to say in terms of what you can and cannot do as long as all involved pointers are raw.

I had a hard time figuring out where you are going with your post, so if you have some concrete questions it would help if you could formulate them explicitly. :slight_smile:

You start to conflate several issues here. You started by asking about whether some code is legal (it is). Now you are going into API design. That's also interesting, but it is an entirely independent discussion.

The current best practices for "opaque pointers" are documented in the nomicon. Until extern type lands, that's what we recommend you to use.

3 Likes

Reading the stackoverflow, it would help if you could tell us about the actual, high-level problem you are trying to solve. What happened here is that you want part-way down the wrong rabbit-hole (interior mutability, which is likely unrelated to your problem), got stuck, and are asking a very specific question that is already making some wrong assumptions. So I suggest you take a couple steps back and focus on what you want to achieve: Which kind of API do you want to wrap, which kind of operations does it provide, what do the user-visible types look like in C, what would you like the user-visible types to look like in Rust?

2 Likes

This is a long story. But I will leave the interaction detail as appendix of this post. Now let me focus on the current concerns.

The capstone_rust project is a wrapper of C project capstone. The heart of the C API is the following:

CAPSTONE_EXPORT
size_t CAPSTONE_API cs_disasm(csh ud, const uint8_t *buffer, size_t size...)
{
...
}

where csh is a typedef of void *. The code that calls this in Rust, was defined as

struct Capstone {
    handle: *mut c_void;
    other_fields:...
}
impl Capstone {
    fn disasm(&mut self, data: &[u8]...) -> DisasmResult {
        unsafe { 
            ...
            let r = cs_disasm(handle, data...);
            ...
        }
    }
}

But there are some issues:

  1. The disasm method return value does not have lifetime constraint. In this case, this makes the return value live longer than the Capstone, and thus the csh handle. So it causes UB in the C module and Rust have no way to ensure this. This is what I was able to contributed on fixing by adding a lifetime constraint to self.

  2. The method requires &mut self, which makes the use of this method less ergonomic. However, the C code behind DO mutate the internal data, so if it is possible to run cs_disasm simultaneously or reentranted, we still have UB in the C code. It is in fact safe to require &self in this case thanks to *mut T is not Sync/Send, AND there is no callbacks can be setup during the cs_disasm call.

The contract expressiveness of Rust

Both of the issues above demonstrate some invariants in C being expressed in Rust and get checked by the Compiler.

  1. The C module require an object to live longer than another, which is not able to expressed in code. But in Rust, it is expressed in lifetime constraints.

  2. The C module requires specific use pattern when calling a function (you should only call cs_disasm on the same csh when any previous calls completed). This is expressed in Rust by trait bounds.

This demonstrates the impressive expressiveness of Rust!

So what about…

Now suppose the cs_disasm allows a “progress callback”: you can pass a function pointer to it, and when dissembling in progress, the function will be called so you know how much instructions being processed and you can access to the results before everything has been processed.

This scenario will require the user take care of more invariants: you should not call cs_disasm within the callback, and the partial result should not live longer than the final result.

The lifetime constraint should not be an issue and I will leave it alone. For the reentrancy constraint though, we have to apply the Rust mutation rule to the csh value, even its change are not visible in any Rust code.

There are two ways to express this in Rust: exterior mutablity or interior mutablity. Using the former is simple: we just step back and use &mut self. However what is the best way for the interior mutablity?

Thoughts along the line lead me to the stackoverflow question, and this post. And that’s why after some discussions and have some misunderstands corrected (namely, UnsafeCell is not a pointer, it is a container), I came up with the idea of *const UnsafeCell<c_void> as a isomorphism of *mut c_void: except the former is interior mutable and the later is exterior mutable.

The whole point of this discussion though, is that we need some guideline on mapping the invariant exists informally in the outside world (namely, FFI) into something can be expressed in Rust.

Appendix/References

The things begins when I read a blog post.

I then responsed in reddit for my expression of issue #1. The blogger then shown me issue #2 in his reply.

After looking it to both issues, I made a PR.

As in the discussion on the PR the author shown me the actual mutating code in the C module, I start thinking about the mutation safety issue, and then asked the question in stackoverflow, then this post.

Which data is being being modified? Taking &self is also a promise that none of the fields of self will get changed. However, if all the data being mutated lives behind raw pointers, then this reasoning sounds correct (and you need not be concerned with interior mutability, because there exist no &T pointing to the data being modified).

Your actual problem is reentrancy, not mutation of Rust data structures. Interior mutability does not come up here as you are not mutating anything that a &T points to. What you want is to prevent reentrancy, meaning you need to make sure that the caller exclusively owns the Capstone when calling disasm. The way to express this in Rust is ownership. If you have a function that requries exclusive ownership (and in the face of reentrancy, disasm is such a function!), you need to make it take &mut self at the least to ensure there is no sharing.

My impression is that you come at interior mutability at the wrong way: Interior mutability is not in any way about "preventing reentrancy". Interior mutability is about being safe to mutate in the presence of sharing. Reentrancy is a way of accidentally sharing data, so one consequence of this is that interior mutability also means being safe to mutate in the presence of reentrancy. The relationship between your code and interior mutability is that reentrancy is a problem in both cases. Now you need a way to avoid reentrancy. You have two ways to do that in Rust:

  • Exclusive access through &mut. This works statically, so if your user does it wrong their code does not compile, which is nice. It is also less ergonomoc.
  • Dynamically checking for reentrancy. This is what RefCell, Mutex, RwLock do, because the first option would defeat their entire purpose! This is more ergonomic, but if your user does it wrong they will only notice at run-time.

I would recommend using the first, static approach. It is more powerful: If you user wants to share and fall back to the second approach, they can always use a RefCell. However, if you pick the second approach, there is no way for your user to opt-in to static checking.

Basically, RefCell is a generic mechanism that turns static exclusivity checks (like &mut self) into dynamic checks. Make your users use it as a building block! RefCell<T> is a generic type because it is usable with any type T. It solves the problem of "how can I make these too restrictive static checks dynamic", once and for all, so that you don't have to solve this problem again and again in your code. You can just rely on &mut self and not worry about reentrancy, and rest ensured that RefCell takes care of all the complications that could arise.

4 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.