The topic will be the function types of the form `fn(T) -> U`. These are quite special, and in a sense inconsistent with other parts of the type system design, which I claim is a root cause of at least one popular soundness bug in FFI.
Many documents, including the Reference itself, refer to them as function pointers, but it might be much more appropriate to speak of them as function references in Rust's terms. There are a number of reasons for this.
- They are assumed to be non-null. In particular, this implies that `Option<fn()>` uses (and is guaranteed) the layout optimization that assigns the representation `0` to the `None` variant.
- They are always 'dereferenceable' in the sense that calling them is allowed, which on most RAM machine target architectures executes a call instruction that jumps to the pointee.
- Although they are references in the sense of the previous point, they are still always assigned a `'static` lifetime, can unconditionally be part of constants, and are `Copy`.
- Function pointers are `Send` and `Sync`, whereas raw pointers usually are not.
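To make the properties listed above concrete, here is a minimal sketch that checks them with the standard library only. The names `answer`, `ANSWER`, `assert_copy`, `assert_send_sync` and `option_is_pointer_sized` are made up for this illustration.

```rust
use std::mem::size_of;

fn answer() -> u32 {
    42
}

// A function pointer can be stored in a constant without any ceremony.
const ANSWER: fn() -> u32 = answer;

// Compile-time checks: these helpers only accept types with the named bounds.
fn assert_copy<T: Copy>() {}
fn assert_send_sync<T: Send + Sync + 'static>() {}

/// Returns true if wrapping the function pointer in `Option` costs no space,
/// i.e. the null representation is used for `None`.
fn option_is_pointer_sized() -> bool {
    size_of::<Option<fn() -> u32>>() == size_of::<fn() -> u32>()
}

fn main() {
    assert_copy::<fn() -> u32>();
    assert_send_sync::<fn() -> u32>();
    // Raw pointers, by contrast, are neither `Send` nor `Sync`:
    // assert_send_sync::<*const u8>(); // does not compile

    assert!(option_is_pointer_sized());
    assert_eq!(ANSWER(), 42);
}
```

The commented-out line is the interesting one: the same check that passes for `fn() -> u32` is rejected for a raw pointer.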
Note that there is a particular inconsistency bothering me that is at odds with the pointer kind as a whole: there is no type that could be ascribed to their pointee. It would be completely fine if that were an opaque DST like `OsStr`¹. Maybe this feature is dependent on extern types, as it might be wise not to allow any query of its size at all? But this means one can't construct a raw function pointer and, for example, parameterize a type such that it can store pointers to functions as well as to other types. This appears to me to be a major hindrance for the ergonomics of dynamic loading implementations.
Finally, I said there were bugs caused by this. The first class is those where a translation from C, or even a C-compatible API, translates function pointers directly to `fn` and assumes these to be zeroable. This then leads to UB (Examples: 1, 2). Such code has recently started to trigger a lint when it occurs in a constant context, which has found at least a couple of instances of these bugs.
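As a rough sketch of the zeroing problem, assume a C struct with a nullable callback field. The struct `Handler`, its field `on_event`, and the helpers `double`, `dispatch` and `zeroed_handler` are hypothetical and not taken from any real binding. Declaring the field as `Option<extern "C" fn(..)>` makes the all-zero bit pattern a valid `None`, whereas a bare `extern "C" fn(..)` field would make the same zeroing undefined behaviour.

```rust
// Hypothetical C side: struct handler { int (*on_event)(int); };
#[repr(C)]
struct Handler {
    // Nullable in C, so it is wrapped in `Option` on the Rust side.
    on_event: Option<extern "C" fn(i32) -> i32>,
}

extern "C" fn double(x: i32) -> i32 {
    x * 2
}

/// Calls the callback if one is set.
fn dispatch(h: &Handler, value: i32) -> Option<i32> {
    h.on_event.map(|f| f(value))
}

/// Mimics C-style zero initialisation of the struct.
fn zeroed_handler() -> Handler {
    // Fine here: all-zero bits are `None` for `Option<extern "C" fn(..)>`.
    // With a bare `extern "C" fn(..)` field this would be UB.
    unsafe { std::mem::zeroed() }
}

fn main() {
    let empty = zeroed_handler();
    assert!(empty.on_event.is_none());
    assert_eq!(dispatch(&empty, 21), None);

    let set = Handler {
        on_event: Some(double as extern "C" fn(i32) -> i32),
    };
    assert_eq!(dispatch(&set, 21), Some(42));
}
```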
Secondly, it becomes impossible to define functions that do not have a static lifetime, such as those loaded from dynamic libraries that might be unloaded at a later point in time. This means libraries such as `libloading` are simply generally unsound, or at the very least clearly hard to use soundly, even if they could perfectly validate the typing of loaded functions. There might be a possible, weird workaround by hiding the function behind a `dyn Fn()` trait object instead, which would be neither `Copy` nor implicitly assumed `'static`, but this feels generally very clunky. And here once again the non-existence of the pointee type becomes relevant, albeit not as a large issue. It is not possible to turn the original function type into a dynamic trait object, only a `&fn()`, which requires the function pointer itself to be stored for that lifetime. Fortunately this exists in the symbol table of the usual DLL format already, and `libloading` stores the pointer itself, so it might not be an issue in practice. I honestly don't have enough experience with the implementation of `libloading` itself to tell for sure.
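The lifetime problem can be sketched without any real dynamic loading. The `Library` and `Symbol` types below are mock stand-ins written for this post, not the actual `libloading` API, and `plugin_entry`, `escape` and `borrow_dyn` are hypothetical names. The mock hands out a statically linked function, so running it is harmless; with a real loader, `escape` would return a pointer into unmapped code.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Mock of a loaded library. A real loader would unmap the code on drop.
struct Library;

// Mock of a loaded symbol that borrows from the library.
struct Symbol<'lib, T> {
    value: T,
    _lib: PhantomData<&'lib Library>,
}

impl<'lib, T> Deref for Symbol<'lib, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

fn plugin_entry() -> i32 {
    7
}

impl Library {
    // Pretend lookup: always returns the same statically linked function.
    fn get(&self) -> Symbol<'_, fn() -> i32> {
        Symbol {
            value: plugin_entry as fn() -> i32,
            _lib: PhantomData,
        }
    }
}

// The hole: `fn() -> i32` is `Copy` and `'static`, so it escapes the borrow.
fn escape(lib: Library) -> fn() -> i32 {
    let f: fn() -> i32 = *lib.get();
    drop(lib); // a real library would be unloaded here
    f // still callable as far as the type system is concerned
}

// The clunky workaround: a borrowed trait object tied to the symbol.
fn borrow_dyn<'a>(sym: &'a Symbol<'_, fn() -> i32>) -> &'a dyn Fn() -> i32 {
    &sym.value
}

fn main() {
    let lib = Library;
    let sym = lib.get();
    let call = borrow_dyn(&sym);
    assert_eq!(call(), 7);

    // `call` cannot outlive `sym`, but the copied pointer outlives `lib`.
    let escaped = escape(lib);
    assert_eq!(escaped(), 7);
}
```

Note that `borrow_dyn` borrows the stored `fn() -> i32` value, which is exactly the `&fn()` indirection described above: the pointer itself has to live somewhere for the duration of the borrow.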
Some very, very rough ideas for remedies:
- Introduce an optional lifetime parameter for `fn` types: `fn 'lifetime (T) -> U`. I think this is somewhat of a band-aid fix, but it allows naming the function type with the correct lifetime, which might fix the immediate soundness concern of `libloading`, although it would require a specialized type just for functions to correctly name the type while being dependent on the lifetime of the loaded library itself. In particular this would not yet permit a sound, consistent `Deref` impl for symbols loaded in such a manner.
- I wonder if it would be possible to introduce a pointee type for functions if it were made essentially an `extern type`? It's already possible to cast an `fn()` to `*const u8`, so it would only be proper to have it be an actual pointer type. It would be one more feature where Rust could use its type system to deliver a more consistent and useful approach than C ever did.
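For reference, this is roughly what the existing cast looks like today, as a small sketch with the made-up helpers `seven`, `erase` and `restore`. Erasing the function type is an ordinary `as` cast, but since the pointee has no type, going back requires `transmute` and an unchecked promise about the signature.

```rust
fn seven() -> i32 {
    7
}

// Erase the function type: allowed with a plain `as` cast.
fn erase(f: fn() -> i32) -> *const u8 {
    f as *const u8
}

// Recover the function pointer: there is no cast for this direction.
// Safety: `p` must come from `erase`, i.e. point to a function with
// exactly the signature `fn() -> i32`.
unsafe fn restore(p: *const u8) -> fn() -> i32 {
    unsafe { std::mem::transmute::<*const u8, fn() -> i32>(p) }
}

fn main() {
    let p = erase(seven);
    assert!(!p.is_null());

    let f = unsafe { restore(p) };
    assert_eq!(f(), 7);
}
```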
¹Sidenote: I'm thinking that in a systems programming language the number of hoops one has to jump through to inspect the machine code of functions should be as small as possible. I don't see a problem with doing so immutably. Obviously introducing such a feature would require its own RFC. Maybe it could dereference to a target-specific DST representing the function's/machine's instruction set, which might also permit dynamically validating that functions do not contain unsupported instructions and casting target-specific `unsafe` functions to safe functions.