Maybe Rust should have function references, too

what??

as far as i can tell, an Oxford cast just... isn't a thing??

Without breaking the current syntax, fn() + 'a isn't the simplest solution. The simplest solution would be a way to manually implement the Fn family of traits (which AFAIK already exists, albeit unstable), and solve the problem via library.
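
To make the "via library" idea concrete, here is a minimal sketch on stable Rust. `ScopedFn` and its `new`/`call` methods are hypothetical names made up for illustration, not an existing crate API. Since implementing the Fn traits by hand is still unstable, the wrapper exposes a plain `call` method instead.

```rust
use std::marker::PhantomData;

/// Hypothetical wrapper: a function pointer that may only be used for `'a`.
pub struct ScopedFn<'a, Args, Ret> {
    ptr: fn(Args) -> Ret,
    // Ties the pointer to a borrow of whatever "owns" the code.
    _owner: PhantomData<&'a ()>,
}

impl<'a, Args, Ret> ScopedFn<'a, Args, Ret> {
    /// Bind `ptr` to the lifetime of a borrow of `owner`.
    pub fn new<O>(_owner: &'a O, ptr: fn(Args) -> Ret) -> Self {
        ScopedFn { ptr, _owner: PhantomData }
    }

    /// Call the wrapped function pointer.
    pub fn call(&self, args: Args) -> Ret {
        (self.ptr)(args)
    }
}

fn double(x: u32) -> u32 {
    x * 2
}

fn main() {
    // Stand-in for a loaded library or JIT buffer.
    let owner = String::from("owner of the code");
    let f = ScopedFn::new(&owner, double);
    println!("{}", f.call(21));
    // drop(owner); // uncommenting this is a compile error: `f` still borrows `owner`
    let _ = f.call(1);
}
```

The wrapper only helps if the raw `fn` pointer never escapes it, which is the part a library has to enforce by keeping the field private.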

An Oxford cast is a cast from function pointer to data pointer. The name may or may not be a bit of a joke, as in Oxford casts making Harvard architectures upset.
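
For reference, this is what such a cast looks like in today's Rust. The helper name `oxford_cast` is made up for this sketch; the `as` cast itself is ordinary stable Rust.

```rust
fn hello() {}

/// Function pointer -> data pointer: the "Oxford cast".
fn oxford_cast(f: fn()) -> *const () {
    f as *const ()
}

fn main() {
    let p = oxford_cast(hello);
    println!("{:p}", p);

    // Going back requires `transmute`, and only makes sense on targets
    // where code and data pointers have the same size and representation.
    let g: fn() = unsafe { std::mem::transmute::<*const (), fn()>(p) };
    g();
}
```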

I'm unconvinced that there really is a need for function pointers with lifetimes. The primary motivation is usually shared library unloading, but unloading is terrible and has all sorts of problems. It is therefore best avoided wherever possible (I think it's a huge footgun that the libloading crate does it on Drop). If the library registers any sort of global, like an atexit handler or even just a TLS destructor, bad things will happen (though glibc has some means to guard against this; I think it just refuses to unload libraries with TLS destructors?). This is why musl libc implements dlclose as a no-op, showing that it should not be done.

While there may be use cases that genuinely can't leak opened shared libraries (I hope the libraries you open don't have TLS destructors then), in those cases you just have to be careful (and a Library::close should very much be an unsafe function with the precondition that you've stopped using all references). I think this use case is sufficiently rare that it does not need language support.
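
A rough sketch of that API shape follows. The `Library` type here is a hypothetical stand-in that loads nothing; it is not libloading's API. Dropping it does not unload, `close` is unsafe with a documented precondition, and `leak` is the safe default.

```rust
/// Hypothetical stand-in for a dynamic-library handle; nothing is really loaded.
pub struct Library {
    name: String,
}

impl Library {
    pub fn open(name: &str) -> Library {
        Library { name: name.to_owned() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// Unload the library.
    ///
    /// # Safety
    /// The caller must have stopped using every pointer, reference, and
    /// function obtained from this library, and the library must not have
    /// registered globals such as TLS destructors or atexit handlers.
    pub unsafe fn close(self) {
        // A real implementation would call dlclose/FreeLibrary here.
        drop(self);
    }

    /// Keep the library loaded for the rest of the process (the safe option).
    pub fn leak(self) -> &'static Library {
        Box::leak(Box::new(self))
    }
}

fn main() {
    let plugin = Library::open("plugin_a").leak();
    println!("{} stays loaded", plugin.name());

    let scratch = Library::open("plugin_b");
    // SAFETY: nothing was obtained from `scratch`, so nothing can dangle.
    unsafe { scratch.close() };
}
```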

"Just be careful" is what Rust is trying to avoid.

Are there problems with unloading that a well-behaved Rust program couldn't avoid? I know that an arbitrary library could be infinitely messy, but if you have controlled plugin implementations, then it's not too bad? It could also be useful for JIT functions?

I assume that &fn() today isn't used much (at least not deliberately), so code could be easy to migrate automatically across editions.

The main thing that makes unloading "fun" is the fact that it can be quite inconsistent across platforms.

In order to unload a library fully soundly and across platforms, you must ensure at least that:

  • No references or pointers to data owned by the dylib are accessible outside of the dylib.
  • No references to functions exported by the dylib are accessible outside of the dylib.
  • No global symbols outside the dylib link to a symbol exported by the dylib. (Static version of the first two dynamic constraints.)
  • No running threads are owned by (were started by) the dylib.
  • No running threads have stack frames sourced from the dylib.
  • No global destructors are owned by the dylib.
    • No TLS object owned by the dylib has a nontrivial destructor.
    • No C++ global owned by the dylib has a nontrivial destructor.
    • No c::atexit or c::at_quick_exit handlers are installed by the dylib.
    • No platform-specific destructor or unload hook is used by the dylib.

So, what would the minimum viable way to create an unloadable plugin safely with Rust be?

  • A new target so std can be different in plugin mode.
  • The new target needs to be explicitly supported by library crates, to avoid existing assumptions that FFI is sound without complying with unload safety rules.
  • No unscoped thread spawning in std.
  • thread_local! requires needs_drop::<T>() == false, and/or the platform-specific behavior is changed to leak any TLS in the plugin.
  • No implicit global ZST handles (e.g. alloc::Global) are allowed to cross the dylib boundary.
    • Do you enjoy trait UnwindSafe? Now introducing: unsafe trait DylibSafe; all the same issues, but this time it's unsafe in a terrifyingly subtle manner!
  • The dylib loading API ensures:
    • All loaded functions borrow from the dylib handle and won't unload it until all function refs are dropped.
    • All dylib provided data is poisoned with the 'dylib lifetime and cannot be made 'static unless it's safe to copy and contains no "covered" (hidden) lifetimes.
      • Another new unsafe autotrait! Just what I wanted!
      • I think it's required to introduce a covariance trait so that the bridge can weaken &'static to &'dylib.
  • And we still just have to assume that no linker weak symbol tricks are used that could permit symbols to link to exports in the dylib from outside of the dylib.
    • I don't know if these exist, but I have been burned before.
  • OR: give up and prevent unloading by cloning and leaking the dylib handle on dylib initialization.
    • Breaks reloading by way of unload+load (that's a terrible API in the first place, why would you do that).
    • Perhaps Live++'s techniques can improve things?

In short: it's a mess and involves removing access to anything and everything that hasn't been audited for unload safety, since that's a much stricter constraint than the safety constraint that existing APIs uphold.

IO safety caused some fallout[1] already, despite being a much smaller scope in what Rust+std claims "unsafe provenance" over. Unload safety would be so much worse to attempt to guarantee.

Is it worth discussing what it would take? I think so. Is it worth enabling incremental improvements, like lifetime scoped fn()? Rust's thesis is that unsafe can and should be encapsulated to a smaller proof surface, so I think so. Is sound unloading possible using the current platform interfaces? Not plausibly for Rust IMHO. We're low level and allow too much.


  1. E.g. the nix crate exposes close(2) as a safe function, despite this violating the "IO safety" ideals and being unsound (library UB) to close std's added concept of an "owned file descriptor" unexpectedly.

So lifetime on functions would help a lot to prevent unloading when threads/functions are still running, because the functions would have to be borrowed from the library. If unloading required exclusive ownership of dylib's handle, it would naturally require use of scoped threads.
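
Here is a sketch of how that could look with today's tools, using a wrapper type in place of a real lifetime on fn(). `Library`, `Symbol`, and `run_in_threads` are hypothetical names, and `baz` stands in for a function that would really come from the dylib. Symbols borrow the handle, `unload` takes it by value, and `std::thread::scope` is the only way to get a borrowed symbol onto another thread.

```rust
use std::marker::PhantomData;
use std::thread;

/// Hypothetical library handle (nothing is actually loaded).
pub struct Library {
    _private: (),
}

/// A loaded function that borrows from the handle it came from.
pub struct Symbol<'lib> {
    ptr: fn(u8) -> i8,
    _lib: PhantomData<&'lib Library>,
}

// Stand-in for a function exported by the dylib.
fn baz(v: u8) -> i8 {
    v as i8
}

impl Library {
    pub fn open() -> Library {
        Library { _private: () }
    }

    pub fn get(&self) -> Symbol<'_> {
        Symbol { ptr: baz, _lib: PhantomData }
    }

    /// Takes the handle by value, so no `Symbol` can still be alive.
    pub fn unload(self) {}
}

impl Symbol<'_> {
    pub fn call(&self, v: u8) -> i8 {
        (self.ptr)(v)
    }
}

/// `thread::spawn` would reject `sym` (not 'static); scoped threads accept it.
pub fn run_in_threads(lib: &Library, inputs: &[u8]) -> Vec<i8> {
    let sym = lib.get();
    thread::scope(|s| {
        let handles: Vec<_> = inputs
            .iter()
            .map(|&v| {
                let sym = &sym;
                s.spawn(move || sym.call(v))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<i8>>()
    })
}

fn main() {
    let lib = Library::open();
    println!("{:?}", run_in_threads(&lib, &[1, 2, 255]));
    lib.unload(); // fine: all scoped threads have been joined
}
```

This covers only running threads and live function references; it does nothing about the globals and TLS problem.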

static globals and TLS are tough, and something else would be needed for them (maybe just a #[deny(globals)] lint).

There's also threads spawned from the "loaded" side. I don't think a #[deny(globals)] lint would solve the issue because it would need to be applied to dependencies and the stdlib too.

What could be done on the loaded side would be to make every function and static have some existential 'dylib lifetime that is not 'static (kinda like what is done for #[thread_local]). Then thread::spawn would not work because the closure passed to it would never be 'static, and the library would not be able to store 'static references to statics or function pointers because those will not be 'static. For this reason storing drop functions for thread locals would also be invalid, because those would have to require 'static. This would however be a massive change, although restricted to this hypothetical new target.

I don't think anyone has mentioned a stable workaround: The validity of function pointers with arguments or returns that contain non-higher-ranked lifetimes are limited by said lifetimes.[1] So you can use a fn(Arg, [Invariant<'lifetime>; 0]) or such.


  1. Except when not, which is unsound.
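
A small sketch of that workaround, with made-up names (`Invariant`, `ScopedFn`, `get`, `requires_static`). It assumes the function really is defined with the extra zero-length array parameter.

```rust
use std::marker::PhantomData;

/// Zero-sized marker that is invariant in `'a`.
pub type Invariant<'a> = PhantomData<fn(&'a ()) -> &'a ()>;

/// A function-pointer type that is only valid for `'a`.
pub type ScopedFn<'a> = fn(u8, [Invariant<'a>; 0]) -> i8;

// The function itself must be declared with the marker parameter.
fn baz<'a>(v: u8, _scope: [Invariant<'a>; 0]) -> i8 {
    v as i8
}

/// Hand out the pointer tied to a borrow of some owner (e.g. a library handle).
pub fn get<'a, O>(_owner: &'a O) -> ScopedFn<'a> {
    baz
}

#[allow(dead_code)]
fn requires_static<T: 'static>(_: T) {}

fn main() {
    let owner = 0u8;
    let f = get(&owner);
    println!("{}", f(200, []));
    // requires_static(f); // rejected: it would force the borrow of `owner` to be 'static
}
```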

that only works if the function has that signature in the dylib or JIT code, since iirc Rust hasn't declared you can cast e.g. extern "C" fn(u8) -> i8 to extern "C" fn(u8, PhantomData<&'a mut &'a ()>) -> i8 and calling the result works

This makes me think of the new unsafe binder types experiment (tracking issue #130516 in rust-lang/rust) for having a general way to erase a specific lifetime and then an unsafe way to put one back. Being able to do that for function pointers does seem quite reasonable, for similar "doing it with 'static is really error-prone" reasons as it's useful for other things. (Albeit not as necessary.)

I wonder if whatever we do here could also help with the fn-vs-Fn confusion. Like if we have extern types, we could have &'static FnData(…) -> … be a thin pointer. And then conceptually fn would just be a type alias.

Do you have an alternative to hot-reload native code? WASM or some other JIT system is much nicer to work with, sure, but if the idea is "I load native plugins" combined with "I let you work on a plugin without losing state (given some app-specific way of externalizing state from the plugin)" I can't see any other realistic option... perhaps playing loader yourself lets you do something clever?

This is hardly an unusual expectation either; real world applications like Unreal Engine and (I believe) audio workbenches support this.

Run each native plugin in a separate process from the main application, communicating via socket pairs and/or shared memory. Then any plugin can be reloaded by restarting that plugin's process.

This architecture has so many other advantages -- plugins cannot crash the main application, you can sandbox them independently from the main application, you're forced to design and commit to a stable plugin API, etc -- that I would recommend any new application use it as its initial design for plug-in modules, and consider moving them into the main process only if the communication overhead of out-of-process plugins proves to be an intractable bottleneck.

And in that situation, as a last resort, you could make the entire application itself reloadable. Export the state and handles to any native resources (e.g. open network connections) that are feasible to move over, exec() into a new binary, rehydrate the state and reintegrate native resources into libs that are handling them.

You can:

  • only load new code without unloading the old one, accepting the memory leak;
  • accept the fact that hot-reloading is unsound but do it anyway, knowing that the application may not behave in an expected way.

I don't believe you can reasonably make hot-reloading safe, so you will have to pick some poison.

Huh, I already do exactly that for JavaScript, weird that I'd have such a block on figuring out you can do it for native code!

In the end, you don't even need plugins at all, just standalone applications that you run with std::process::Command and communicate with via stdin/stdout.
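
A minimal sketch of that with std only. `call_plugin` is a hypothetical helper, and the example uses `cat` as a stand-in "plugin" that echoes its input, which assumes a Unix-like system. Writing the whole request before reading is fine for small messages; a large request could fill the pipe buffer and deadlock.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

/// Run `program` as an out-of-process plugin: send `request` on its stdin,
/// return everything it writes to stdout.
pub fn call_plugin(program: &str, request: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // The temporary `ChildStdin` is dropped at the end of this statement,
    // which closes the pipe so the plugin sees EOF.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(request)?;

    let mut reply = Vec::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_end(&mut reply)?;

    child.wait()?;
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    // Assumption: `cat` exists on PATH and acts as a trivial echo plugin.
    let reply = call_plugin("cat", b"ping\n")?;
    print!("{}", String::from_utf8_lossy(&reply));
    Ok(())
}
```

Reloading a plugin is then just spawning the new binary the next time a request is made.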

My main thought wasn't even about hot reloading, but just reloading the same plug-in. Unload+reload is a theoretically decent way to reset state, but an actual reset in the API is genuinely nicer and doesn't stop working because something prevented the unload from occurring.

For semi-hot reloading, disable the old plugin (but leave it loaded), load and enable the new plugin. Ideally the plugin has some way to persist/load state as part of the plugin API.
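
Roughly, and with everything in-process for the sake of a runnable sketch: the `Plugin` trait, `Host`, and `Counter` below are hypothetical, and the "retired" list stands in for old plugin versions that stay loaded but are never called again.

```rust
use std::convert::TryFrom;

/// Hypothetical plugin API with explicit state export/import.
pub trait Plugin {
    fn handle(&mut self, input: i64) -> i64;
    fn save_state(&self) -> Vec<u8>;
    fn load_state(&mut self, state: &[u8]);
}

pub struct Host {
    active: Box<dyn Plugin>,
    /// Disabled plugins are kept alive ("loaded") instead of being unloaded.
    retired: Vec<Box<dyn Plugin>>,
}

impl Host {
    pub fn new(plugin: Box<dyn Plugin>) -> Host {
        Host { active: plugin, retired: Vec::new() }
    }

    /// Move state into `next`, enable it, and retire the old plugin.
    pub fn reload(&mut self, mut next: Box<dyn Plugin>) {
        next.load_state(&self.active.save_state());
        let old = std::mem::replace(&mut self.active, next);
        self.retired.push(old);
    }

    pub fn handle(&mut self, input: i64) -> i64 {
        self.active.handle(input)
    }

    pub fn retired_count(&self) -> usize {
        self.retired.len()
    }
}

/// Example plugin: a running total scaled by a per-version step.
pub struct Counter {
    pub total: i64,
    pub step: i64,
}

impl Plugin for Counter {
    fn handle(&mut self, input: i64) -> i64 {
        self.total += input * self.step;
        self.total
    }

    fn save_state(&self) -> Vec<u8> {
        self.total.to_le_bytes().to_vec()
    }

    fn load_state(&mut self, state: &[u8]) {
        // Ignore state that does not have the expected 8-byte layout.
        if let Ok(bytes) = <[u8; 8]>::try_from(state) {
            self.total = i64::from_le_bytes(bytes);
        }
    }
}

fn main() {
    let mut host = Host::new(Box::new(Counter { total: 0, step: 1 }));
    println!("{}", host.handle(5));
    host.reload(Box::new(Counter { total: 0, step: 10 }));
    println!("{}", host.handle(1));
}
```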

UE5 uses Live++. It works great most of the time… as long as you don't edit any headers. As soon as you modify any headers it's a complete potshot gamble as to whether reloading works.

UE also basically replaces the entire standard library with their own tech that integrates with their module loading system (including explicit startup/shutdown and fast single-system de/serialization). It's sort of a best-case control-the-world scenario, with everyone agreeing to use the same dynamically linked stable "core" libraries.

The coolest bit is that this doesn't even have to be directly apparent to the plugin. The plugin can be loaded into a "plugin host" binary that provides all of the IPC needed to communicate back to the main process.

Audio plugins being the canonical example of "soft real-time" processing where marshalling overhead eats problematically into the time-slice deadline.

When I was working in UE, we had a whiteboard tracking how many times UE crashed. I was DQd because being the only one messing with native modules meant I sometimes had an order of magnitude more crashes in a day than anyone else.

So yes, this seems to be the reality even in "robust" industry hot-reloading systems.

So what about a real world use case of this: loading and unloading Linux kernel modules?

I have done this in C when I worked on a driver and didn't want to reboot the computer every time. This has much more overhead than restarting a program as well. What is the story about making that safe?

maybe rust needs something where you can have a global lifetime other than 'static where you can limit the lifetime of all code and data inside a crate. kinda like:

// lifetime implicitly passed to all dependencies, all dependencies
// are required to also have a crate lifetime
crate 'crate;

static FOO: u8 = 0; // FOO now lives for 'crate instead of 'static

// same thing for consts, string literals, and anything else that would
// normally implicitly promote to or be 'static
const BAR: &'crate str = "";

// lives for 'crate, not 'static, can cast to `fn 'crate(u8) -> i8`
pub fn baz(v: u8) -> i8 {
    v as i8
}

// you can still explicitly mention 'static though, since you can get references to stuff from the host program and still need to know it lives forever (outliving plugins), or contains no lifetimes

pub struct S<T: 'static>(&'static T);