Function pointers are inconsistent with other language features

The topic is the function types of the form fn(T) -> U. These are quite special, and in a sense inconsistent with other parts of the type system design, which I claim is a root cause of at least one popular class of soundness bugs in FFI.

Many documents, including the reference itself, refer to them as function pointers, but it might be much more appropriate to speak of them as function references in Rust's terms. There are a number of reasons for this.

  1. They are assumed to be non-null. In particular this implies that Option<fn()> uses (and is guaranteed to use) the layout optimization that assigns the all-zero representation to the None variant.
  2. They are always 'dereferenceable' in the sense that calling them is allowed, which on most RAM machine target architectures executes a call instruction that jumps to the pointee.
  3. Although they are references in the sense of 2., they are always assigned a 'static lifetime, can unconditionally be part of constants, and are Copy.
  4. Function pointers are Send and Sync whereas usually raw pointers are not.
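
The four properties above can be checked mechanically. The following is a minimal sketch; the names answer, ANSWER, option_is_free and the assert_* helpers are made up for illustration and do not come from any library.

```rust
use std::mem::size_of;

fn answer() -> u32 {
    42
}

// Property 3: a function pointer can be placed in a constant unconditionally.
const ANSWER: fn() -> u32 = answer;

// Compile-time checks: each call only type-checks if `T` has the stated bound.
fn assert_copy<T: Copy>() {}
fn assert_send_sync<T: Send + Sync>() {}
fn assert_static<T: 'static>() {}

/// Property 1: `None` reuses the null value, so the `Option` costs no extra space.
fn option_is_free() -> bool {
    size_of::<Option<fn() -> u32>>() == size_of::<fn() -> u32>()
}

fn main() {
    assert_copy::<fn() -> u32>(); // property 3
    assert_static::<fn() -> u32>(); // property 3
    assert_send_sync::<fn() -> u32>(); // property 4

    // Raw pointers are neither Send nor Sync, so this line would not compile:
    // assert_send_sync::<*const u8>();

    assert!(option_is_free());
    assert_eq!(ANSWER(), 42); // property 2: always callable
}
```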

Note that there is one particular inconsistency bothering me that is at odds with pointer kinds as a whole. There is no type that could be ascribed to their pointee. It would be completely fine if that were an opaque DST in the likes of OsStr¹. Maybe this feature is dependent on extern types, as it might be wise to not allow any query of its size at all? But the lack of a pointee type means one can't construct a raw function pointer, nor, for example, parameterize a type such that it can store pointers to functions as well as pointers to other types. This appears to me to be a major hindrance for the ergonomics of dynamic loading implementations.

Finally, I said there were bugs caused by this. The first class is those where a translation from C, or even from a C-compatible API, uses the direct translation of function pointers to fn and assumes these to be zeroable. This then leads to UB (Examples: 1, 2). This has recently become a lint when it occurs in constant context, which finds at least a couple of instances of these bugs.
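
To make the first class concrete, here is a small sketch of the translation that avoids the problem. The C struct in the comment and the names Callbacks, on_event, double_it, fire and zeroed_callbacks are hypothetical, chosen only for illustration. The nullable C function pointer is translated as an Option, which is what makes zero-initialization sound.

```rust
use std::mem;
use std::os::raw::c_int;

// Hypothetical C declaration being translated:
//     struct callbacks { int (*on_event)(int); };
// In C, a zero-initialized `struct callbacks` is perfectly valid.

#[repr(C)]
struct Callbacks {
    // Declaring this field as a bare `extern "C" fn(c_int) -> c_int` would
    // make the all-zero bit pattern invalid, and zeroing the struct would be UB.
    on_event: Option<extern "C" fn(c_int) -> c_int>,
}

extern "C" fn double_it(x: c_int) -> c_int {
    x * 2
}

/// Calls the callback if one is set, mirroring the null check a C caller does.
fn fire(cb: &Callbacks, x: c_int) -> Option<c_int> {
    cb.on_event.map(|f| f(x))
}

fn zeroed_callbacks() -> Callbacks {
    // Sound only because the field is an `Option`: all-zero bytes mean `None`.
    unsafe { mem::zeroed() }
}

fn main() {
    let empty = zeroed_callbacks();
    assert!(empty.on_event.is_none());
    assert_eq!(fire(&empty, 21), None);

    let f: extern "C" fn(c_int) -> c_int = double_it;
    let set = Callbacks { on_event: Some(f) };
    assert_eq!(fire(&set, 21), Some(42));
}
```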

Secondly, it becomes impossible to define functions that do not have a static lifetime, such as those loaded from dynamic libraries that might be unloaded at a later point in time. This means libraries such as libloading are simply generally unsound, or at the very least clearly hard to use soundly, even if they could perfectly validate the typing of loaded functions. There might be a weird workaround of hiding the function behind a dyn Fn() trait object instead, which would be neither Copy nor implicitly assumed 'static, but this feels very clunky. And here once again the non-existence of the pointee type becomes relevant, albeit not as a large issue. It is not possible to turn the original function itself into a dynamic trait object, only a &fn(), which requires the function pointer itself to be stored somewhere for that lifetime. Fortunately it already exists in the symbol table of the usual dll formats, and libloading stores the pointer itself, so it might not be an issue in practice. I honestly don't have enough experience with the implementation of libloading to tell for sure.
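
As a rough sketch of what tying a function to the library's lifetime could look like in today's Rust: the wrapper below keeps the fn pointer private, is not Copy, and only exposes a call method. Library, ScopedFn and get are hypothetical stand-ins written for this post; this is not libloading's API, and the "library" here just hands out a local function.

```rust
use std::marker::PhantomData;

/// Stand-in for a loaded dynamic library (hypothetical). A real loader would
/// own an OS handle and unload the code when dropped.
struct Library;

/// A function handle that cannot outlive the `Library` it came from.
/// It is deliberately not `Copy`, and the inner pointer is never handed out,
/// because a bare `fn(i32) -> i32` would again be implicitly 'static.
struct ScopedFn<'lib> {
    ptr: fn(i32) -> i32,
    _lib: PhantomData<&'lib Library>,
}

fn add_one(x: i32) -> i32 {
    x + 1
}

impl Library {
    /// Pretends to look up a symbol; the result borrows `self`.
    fn get(&self) -> ScopedFn<'_> {
        ScopedFn { ptr: add_one, _lib: PhantomData }
    }
}

impl ScopedFn<'_> {
    fn call(&self, x: i32) -> i32 {
        (self.ptr)(x)
    }
}

fn main() {
    let lib = Library;
    let f = lib.get();
    assert_eq!(f.call(41), 42);

    // Unloading while `f` is still in use is rejected by the borrow checker:
    // drop(lib);
    // f.call(1);
}
```

The cost is exactly the clunkiness described above: the wrapper is a specialized type, and it cannot soundly offer a Deref to the plain fn type without leaking the 'static pointer.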

Some very, very rough ideas for remedies:

  • Introduce an optional lifetime parameter for fn types: fn 'lifetime (T) -> U. I think this is somewhat of a band-aid fix, but it allows naming the function type with the correct lifetime, which might fix the immediate soundness concern of libloading, although it would require a specialized type just for functions in order to correctly name the type while being dependent on the lifetime of the loaded library itself. In particular this would not yet permit a sound, consistent Deref impl for symbols loaded in such a manner.
  • I wonder if it would be possible to introduce a pointee type for functions if it were made essentially an extern type? It's already possible to cast an fn() to *const u8, so it would only be proper to have it be an actual pointer type. It would be one more feature where Rust could use its type system to deliver a more consistent and useful approach than C ever did.
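
For reference, the cast mentioned in the second idea looks like this today. It is a sketch with made-up names (triple, round_trip), and it assumes a target where code and data pointers have the same size. Note the asymmetry: going to a raw pointer is a safe as cast, while coming back requires transmute because the raw pointer has no function pointee type to speak of.

```rust
use std::mem;

fn triple(x: u32) -> u32 {
    x * 3
}

/// Erases a function pointer to an untyped raw pointer and restores it.
fn round_trip(f: fn(u32) -> u32) -> fn(u32) -> u32 {
    // fn pointer -> raw pointer: a plain, safe `as` cast.
    let raw = f as *const ();
    assert!(!raw.is_null());

    // raw pointer -> fn pointer: no `as` cast exists, so transmute is needed.
    // Sound here only because `raw` was derived from a valid fn pointer of
    // exactly this signature.
    unsafe { mem::transmute::<*const (), fn(u32) -> u32>(raw) }
}

fn main() {
    let g = round_trip(triple);
    assert_eq!(g(14), 42);
}
```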

¹Sidenote: I'm thinking that in a systems programming language the number of hoops one has to jump through to inspect the machine code of functions should be as small as possible. I don't see a problem with doing so immutably. Obviously introducing such a feature would require its own RFC. Maybe it could dereference to a target-specific DST representing the function's/machine's instruction set, which might also permit dynamically validating that functions do not contain unsupported instructions and casting target-specific unsafe functions to safe functions.


Previously:

Personally, I find it a shame this issue was apparently neglected in the rush towards 1.0.
