This is a long story, but I will leave the details of how it unfolded to the appendix of this post. Let me focus on the current concerns first.
The capstone_rust project is a Rust wrapper around the C library capstone. The heart of the C API is the following:
```c
CAPSTONE_EXPORT
size_t CAPSTONE_API cs_disasm(csh ud, const uint8_t *buffer, size_t size...)
{
    ...
}
```
where `csh` is a typedef of `void *`. The Rust code that calls it was defined as
```rust
struct Capstone {
    handle: *mut c_void,
    other_fields: ...
}

impl Capstone {
    fn disasm(&mut self, data: &[u8]...) -> DisasmResult {
        unsafe {
            ...
            let r = cs_disasm(self.handle, data...);
            ...
        }
    }
}
```
But there are two issues:
- The `disasm` method's return value has no lifetime constraint, so it can live longer than the `Capstone` value, and thus longer than the `csh` handle. Using it after that point causes UB in the C module, and Rust has no way to prevent this. This is the part I was able to contribute a fix for, by tying the return value's lifetime to `self`.
- The method requires `&mut self`, which makes it less ergonomic to use. The C code behind it DOES mutate internal data, so if `cs_disasm` could run concurrently or reentrantly, we would still have UB in the C code. It is, in fact, safe to require only `&self` here, because `*mut T` is neither `Sync` nor `Send` (so no other thread can call `cs_disasm` on the same handle at the same time), AND no callbacks can be set up during the `cs_disasm` call (so it cannot be re-entered).
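Both fixes can be sketched together in a self-contained way. Since the real capstone library cannot be linked here, the C side is replaced by a hypothetical mock (`CEngine`, `mock_cs_disasm`) that merely counts calls, and `DisasmResult` carries only a count. The points being illustrated are the `PhantomData<&'a Capstone>` that ties the result to the handle, and the `&self` receiver that is sound because the raw pointer makes `Capstone` `!Send` and `!Sync`.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

// Hypothetical stand-in for the C library's internal state; the real
// capstone handle points to something Rust never sees.
struct CEngine {
    calls: usize,
}

// Hypothetical stand-in for cs_disasm: it mutates the state behind the
// handle and pretends every byte is one instruction.
unsafe fn mock_cs_disasm(handle: *mut c_void, data: &[u8]) -> usize {
    // SAFETY: the caller passes a live handle created by Capstone::new.
    unsafe { (*(handle as *mut CEngine)).calls += 1 };
    data.len()
}

pub struct Capstone {
    // A raw pointer is neither Send nor Sync, so Capstone is neither.
    handle: *mut c_void,
}

// The result borrows the Capstone it came from, so it cannot outlive the handle.
pub struct DisasmResult<'a> {
    pub count: usize,
    _owner: PhantomData<&'a Capstone>,
}

impl Capstone {
    pub fn new() -> Self {
        // Stand-in for cs_open.
        let engine = Box::new(CEngine { calls: 0 });
        Capstone { handle: Box::into_raw(engine) as *mut c_void }
    }

    // &self suffices: !Sync rules out calls from several threads at once,
    // and no callback can re-enter while mock_cs_disasm runs.
    pub fn disasm<'a>(&'a self, data: &[u8]) -> DisasmResult<'a> {
        let count = unsafe { mock_cs_disasm(self.handle, data) };
        DisasmResult { count, _owner: PhantomData }
    }

    pub fn calls(&self) -> usize {
        unsafe { (*(self.handle as *const CEngine)).calls }
    }
}

impl Drop for Capstone {
    fn drop(&mut self) {
        // Stand-in for cs_close: free the C-side state exactly once.
        unsafe { drop(Box::from_raw(self.handle as *mut CEngine)) };
    }
}
```

With this signature, two results can coexist because both only need `&cs`, while `drop(cs)` followed by any further use of a result is rejected at compile time, which is exactly the invariant behind the first issue.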
The contract expressiveness of Rust
Both issues above show invariants that exist only informally in C being expressed in Rust and checked by the compiler.
- The C module requires one object to outlive another, which cannot be expressed in the C code itself. In Rust, it is expressed as a lifetime constraint.
- The C module requires a specific usage pattern when calling a function (you should only call `cs_disasm` on a given `csh` once any previous call has completed). In Rust, this is expressed through trait bounds (`Send`/`Sync`).
This demonstrates the impressive expressiveness of Rust!
So what about…
Now suppose `cs_disasm` accepted a “progress callback”: you pass it a function pointer, and while disassembling is in progress the function is called, so you know how many instructions have been processed and can access the results before everything has been processed.
This scenario requires the user to take care of more invariants: you must not call `cs_disasm` from within the callback, and the partial result must not live longer than the final result.
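To see why `&self` stops being enough once callbacks exist, here is a minimal sketch. Capstone has no such callback API, so `mock_cs_disasm_cb` and its `progress` parameter are hypothetical, and the mock records a `reentered` flag where the real C code would simply misbehave. The point is that the re-entrant call below type-checks.

```rust
use std::ffi::c_void;

// Hypothetical C-side state. `busy` lets the mock detect a re-entrant call,
// which the real C code would not survive.
struct CEngine {
    busy: bool,
    reentered: bool,
}

// Hypothetical progress-callback variant of cs_disasm (not a real capstone
// API): it reports how many "instructions" have been processed so far.
unsafe fn mock_cs_disasm_cb(handle: *mut c_void, data: &[u8], progress: &mut dyn FnMut(usize)) -> usize {
    let engine = handle as *mut CEngine;
    // Only raw-pointer accesses: no Rust reference to the state is held
    // across the callback.
    unsafe {
        if (*engine).busy {
            (*engine).reentered = true;
            return 0;
        }
        (*engine).busy = true;
    }
    for done in 1..=data.len() {
        progress(done);
    }
    unsafe { (*engine).busy = false };
    data.len()
}

pub struct Capstone {
    handle: *mut c_void,
}

impl Capstone {
    pub fn new() -> Self {
        let engine = Box::new(CEngine { busy: false, reentered: false });
        Capstone { handle: Box::into_raw(engine) as *mut c_void }
    }

    // Taking &self type-checks, but the callback may capture &self as well.
    pub fn disasm_with_progress(&self, data: &[u8], mut progress: impl FnMut(usize)) -> usize {
        unsafe { mock_cs_disasm_cb(self.handle, data, &mut progress) }
    }

    pub fn was_reentered(&self) -> bool {
        unsafe { (*(self.handle as *const CEngine)).reentered }
    }
}

impl Drop for Capstone {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.handle as *mut CEngine)) };
    }
}
```

A closure such as `|done| if done == 2 { cs.disasm_with_progress(&[4], |_| {}); }` only needs a shared borrow of `cs`, so the compiler accepts it even though it violates the "no `cs_disasm` inside the callback" rule.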
The lifetime constraint should not be an issue, so I will leave it alone. For the reentrancy constraint, though, we have to apply Rust's mutation rules to the `csh` value, even though its changes are not visible to any Rust code.
There are two ways to express this in Rust: exterior mutability or interior mutability. The former is simple: we just step back and use `&mut self`. But what is the best way to do it with interior mutability?
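The exterior-mutability option can be sketched with the same kind of hypothetical mock (`CEngine`, `mock_cs_disasm_cb` are illustrative, not capstone APIs). The only real change is the `&mut self` receiver: while that exclusive borrow is alive, the callback cannot hold any other borrow of the same `Capstone`.

```rust
use std::ffi::c_void;

// Hypothetical C-side state and a hypothetical progress-callback variant of
// cs_disasm; neither is part of the real capstone API.
struct CEngine {
    processed: usize,
}

unsafe fn mock_cs_disasm_cb(handle: *mut c_void, data: &[u8], progress: &mut dyn FnMut(usize)) -> usize {
    let engine = handle as *mut CEngine;
    for done in 1..=data.len() {
        unsafe { (*engine).processed += 1 };
        progress(done);
    }
    data.len()
}

pub struct Capstone {
    handle: *mut c_void,
}

impl Capstone {
    pub fn new() -> Self {
        let engine = Box::new(CEngine { processed: 0 });
        Capstone { handle: Box::into_raw(engine) as *mut c_void }
    }

    // Exterior mutability: the &mut borrow of self is exclusive for the whole
    // call, so the callback cannot capture `self` in any form.
    pub fn disasm_with_progress(&mut self, data: &[u8], mut progress: impl FnMut(usize)) -> usize {
        unsafe { mock_cs_disasm_cb(self.handle, data, &mut progress) }
    }

    pub fn processed(&self) -> usize {
        unsafe { (*(self.handle as *const CEngine)).processed }
    }
}

impl Drop for Capstone {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.handle as *mut CEngine)) };
    }
}
```

A callback that only touches its own state, such as `|done| seen.push(done)`, still works. Calling `cs.disasm_with_progress` (or even `cs.processed()`) inside the closure is rejected by the borrow checker. The cost is the one from the second issue: every call needs exclusive access, even calls without a callback.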
My thoughts along this line led me to the StackOverflow question, and then to this post. After some discussion, and after having some misunderstandings corrected (namely, `UnsafeCell` is not a pointer, it is a container), I came up with the idea of `*const UnsafeCell<c_void>` as an isomorphism of `*mut c_void`: the former is interior mutable, while the latter is exterior mutable.
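A minimal sketch of the `*const UnsafeCell<c_void>` idea, again over a hypothetical mock rather than the real C library. `UnsafeCell::raw_get` (stable since Rust 1.56) converts the pointer back to the `*mut c_void` the C API expects without creating a reference. The `Cell<bool>` reentrancy guard is an assumption of this sketch, not part of the idea above: the `UnsafeCell` type only declares that mutation through `&self` happens, it does not by itself prevent re-entrancy.

```rust
use std::cell::{Cell, UnsafeCell};
use std::ffi::c_void;

// Hypothetical C-side state and progress-callback variant of cs_disasm.
struct CEngine {
    calls: usize,
}

unsafe fn mock_cs_disasm_cb(handle: *mut c_void, data: &[u8], progress: &mut dyn FnMut(usize)) -> usize {
    unsafe { (*(handle as *mut CEngine)).calls += 1 };
    for done in 1..=data.len() {
        progress(done);
    }
    data.len()
}

pub struct Capstone {
    // Same address as the *mut c_void handle, but the type now says the
    // pointee is interior mutable, so mutating it behind &self is declared.
    handle: *const UnsafeCell<c_void>,
    // Hypothetical runtime reentrancy guard, similar in spirit to RefCell.
    busy: Cell<bool>,
}

impl Capstone {
    pub fn new() -> Self {
        let raw = Box::into_raw(Box::new(CEngine { calls: 0 })) as *mut c_void;
        Capstone { handle: raw as *const UnsafeCell<c_void>, busy: Cell::new(false) }
    }

    // Recover the *mut c_void the C API expects without creating a reference.
    fn raw(&self) -> *mut c_void {
        UnsafeCell::raw_get(self.handle)
    }

    // Returns None instead of calling into C when re-entered from a callback.
    // Note: the guard is not reset if the callback panics; a sketch only.
    pub fn disasm_with_progress(&self, data: &[u8], mut progress: impl FnMut(usize)) -> Option<usize> {
        if self.busy.replace(true) {
            return None;
        }
        let n = unsafe { mock_cs_disasm_cb(self.raw(), data, &mut progress) };
        self.busy.set(false);
        Some(n)
    }

    pub fn calls(&self) -> usize {
        unsafe { (*(self.raw() as *const CEngine)).calls }
    }
}

impl Drop for Capstone {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.raw() as *mut CEngine)) };
    }
}
```

Here a callback may capture `&cs` and try to re-enter, but the guard turns that into a `None` at run time instead of a second call into C. Whether such a runtime check is preferable to the compile-time `&mut self` approach is the trade-off in question.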
The whole point of this discussion, though, is that we need some guidelines for mapping invariants that exist informally in the outside world (namely, FFI) into something that can be expressed in Rust.
Appendix/References
It all began when I read a blog post.
I then responded on Reddit describing issue #1. In his reply, the blogger showed me issue #2.
After looking into both issues, I made a PR.
In the discussion on the PR, the author showed me the code in the C module that actually mutates state. That got me thinking about mutation safety, which led to the StackOverflow question and then to this post.