How extern functions are modeled in language semantics?

Meta: while reading this urlo thread, I realized that I am very confused about the exact semantics of extern functions. In fact, I am so confused that I don't even know what specific question to ask :slight_smile:

Consider the following Rust program:

extern "C" {
    fn magic1(_: *mut i32);
    fn magic2() -> *mut i32;
}

fn main() {
    let mut x = std::mem::MaybeUninit::uninit();
    let x = unsafe {
        magic1(x.as_mut_ptr());
        x.assume_init()
    };

    let y: &mut i32 = unsafe { &mut *magic2() };

    println!("{}", x);
    println!("{}", *y);
}

It's pretty clear that, depending on actual definitions of magic1 and magic2, the program might or might not be valid.

  • one UB case is when magic1 squirrels the pointer away for later, and magic2 returns this pointer, making x and y alias.
  • one non-UB case is when magic1 unconditionally aborts.

However, I am having a hard time coming up with a criteria for the behaviors of magic1 and matic2, which would allow me to distinguish the two cases. The core problem for me is that the semantics of Rust code is determined in term of Rust abstract machine, but extern functions exist outside of it.

We can say that an extern functions is just an abstract function that maps one abstract machine state to the other, and then we'll be able to precisely reason about the semantics. But it doesn't seem that this is a practical approach, as Rust abstract machine is specific to the Rust compiler. When we deal with pure Rust programs, it doesn't matter how this abstract machine is realized in terms of actual hardware/environment. But when we link with non-Rust code, these details matter. In particular, I think it's possible to imagine an alternative rust compiler, rustc', such that linking exactly the same magic1 and magic2 object code with rustc would work, but would be UB with rustc' (for example, magic2 may return 42, and rustc' can have this specific pointer value as instant UB).

Can anyone suggest relevant background reading?

5 Likes

I'd guess that a rigorous model of extern functions is something fairly nontrivial. I mean, we don't even have formal semantics at all for most programming language, including Rust, even without considering foreign function interfaces. (AFAIR, for example multi-threading is pretty hard to model, too).

I'm not really having much knowledge on the topic, but I guess adequately modeling the interaction between multiple programming languages is a hard (potentially unsolved?) problem in general; a small Google search pointed out this question in StackExchange for example, which seems related.

On the other hand, there's the practical question of how to make sure your extern fns don't run into practical UB-related problems. Giving an informal answer to this practical question might not involve too much discussion about Rust's abstract machine model anymore.

Would it help if you treated

  • each definition of an extern function same as you treat unsafe Rust fn-s
  • each invocation of an extern function as implicitly wrapped into an unsafe block?

In fact it might make sense to distinguish

  • safe exern
  • unsafe extern

One would be safe to call. Such a function would be tautonymous to a Rust fn with a large unsafe covering all of its body but a safe "interface". Such a function would promise no UB so long as you supply correct values of Rust types as inputs. It would also promise to deliver correct values of Rust types as its return.

The other would be the usual unsafe fn in Rust sense.

P.S. so my naive take is: you don't go inside an extern fn to define its semantics. You ascribe the semantics to such an fn as if it was a black box; you assume some semantics for it.

Of course it would be splendid if you could prove it actually has such a semantics. But such a proof would need to be done by tools associated to that extern language and according to its semantics.

...we don't have such proofs even for native Rust fn-s do we? Jonathan S. Shapiro recently said (hope he doesn't mind me mentioning this) it was a good start to prove safety of parts of Rust, now how about proving correctness?

I think I was a bit confusing -- I am asking about dynamic semantics (if running a program with particular input can UB) and not about static semantics (what the compiler can assume about arbitrary unknown function with a given signature).

From the point of view of dynamic semantics, unsafe functions are not a black box. Miri can run safe and unsafe functions (I would guess that it doesn't even distinguish between then) and can flag some UBs which happen at runtime.

This is actually an interesting way to think about the problem -- can we extend miri to handle extern functions?

1 Like

Me too - that's what I was talking about.

If ascribing semantics is not good enough and you need to establish or prove such semantics - it seems it would need to happen in terms of whichever language that extern fn is written in. Or else in terms of machine code. All the while keeping various guarantees upheld/required by Rust language and libraries in mind..

P.S. tangentially to this it would be rather interesting to know how far things have reached with semantics of safe Rust code, e.g. how much is possible to prove nowadays..

With extern functions defined outside of rust, you'd need to consider the Abstract Machine (or similar semantic definitions) of the programming language that defines it. You would also need to consider parts of the Rust Abstract Machine that impose restrictions on that programming language (for example, Stacked Borrows or whatever aliasing model we end up with - you can't just pass a *mut i32 aliasing a &mut i32 to C to pull an end-run arround the "do not alias &mut" rule).

Presumably, the use of x in rust invalidates y. It may also invalidate the stored pointer that magic2() returned, or it could reduce it to SharedReadOnly (either permanently or for the duration of the borrow used by println!().

1 Like