Lt<'a> lang item (because we don't have DerefMove) - lifetimes for fn!

We have found this neat hack to apply lifetimes to fn:

use core::marker::PhantomData;

#[repr(transparent)]
struct Lt<'a, T=()>(pub T, pub PhantomData<&'a ()>);

// impl DerefMove for Lt<'a, T> = T

struct Library {
    func: extern "C" fn() -> u32,
}

impl Library {
    fn get<'a>(&'a self) -> extern "C" fn() -> Lt<'a, u32> {
        unsafe { std::mem::transmute(self.func) }
    }
}

extern "C" fn foo() -> u32 {
    10
}

fn main() {
    let lib = Library { func: foo };
    let lib_item = lib.get();
    drop(lib); // error[E0505]: cannot move out of `lib` because it is borrowed
    lib_item(); // ...and the borrow is still in use here
}

This has a wide range of uses, from libloading to libffi to whatever! However, Lt is inconvenient because you'll be putting .0 everywhere. It'd be nice if this was a lang item so it could be Box-like and have DerefMove semantics (or better).

This requires introducing no new syntax to make fn have lifetimes, which is awesome! It doesn't even strictly require being a lang item, as that's only really necessary for the DerefMove semantics. But anyway, thoughts? Should this just be an external crate and have no DerefMove semantics? We guess nothing prevents it from just being an external crate. Ah well. .-.

(not a proposal, just excited to share this.)

Why not just put the entire function in Lt and then impl Deref?

use std::marker::PhantomData;

#[repr(transparent)]
struct Lt<'a, T=()>(pub T, pub PhantomData<&'a ()>);

impl<'a, T> std::ops::Deref for Lt<'a, T> {
    type Target = T;
    
    fn deref(&self) -> &T { &self.0 }
}

struct Library {
    func: extern "C" fn() -> u32,
}

impl Library {
    fn get<'a>(&'a self) -> Lt<'a, extern "C" fn() -> u32> {
        Lt(self.func, PhantomData)
    }
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e522edff1f3806c729c50acde5a9ef83

Edit: Hang on, this has an obvious flaw with making T pub. One second while I fix this...

Edit 2: Okay, this is actually a fundamental problem. fn implements Copy so anything that gives access to a &fn() -> T can be used to bypass the lifetime.
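
To make the flaw concrete, here is a minimal sketch of the bypass, assuming a Deref-based wrapper like the one above but with private fields. The names `ten` and `escape` are made up for illustration, and a plain `fn() -> u32` stands in for the `extern "C"` pointer.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Same shape as the wrapper above, but with private fields.
pub struct Lt<'a, T>(T, PhantomData<&'a ()>);

impl<'a, T> Deref for Lt<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

pub struct Library {
    func: fn() -> u32,
}

impl Library {
    pub fn get<'a>(&'a self) -> Lt<'a, fn() -> u32> {
        Lt(self.func, PhantomData)
    }
}

// Hypothetical stand-in for a function that lives inside the library.
fn ten() -> u32 {
    10
}

// Returns a bare fn pointer that has outlived the `Library` it came from.
pub fn escape() -> fn() -> u32 {
    let lib = Library { func: ten };
    // Deref hands out a `&fn() -> u32`, and fn pointers are Copy,
    // so the pointer can be copied out of the wrapper.
    let escaped: fn() -> u32 = *lib.get();
    drop(lib); // accepted: `escaped` carries no lifetime at all
    escaped
}
```

Calling `escape()()` is harmless here only because `ten` is an ordinary function in the same program; with a really unloaded library it would be a call into unmapped code.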

That's why this hack is so important for unsafe code (like libloading or libffi): it solves that problem. (or at least, it appears to.)

I don't know that this needs to be a lang item. Rather, it could simply be a type that impls Fn{,Mut,Once} if the inner type is a safe function pointer type. It would still have to be implemented in the standard library, if at all, since fn_traits is unstable. For completeness, though, we'd also need UnsafeFn{,Mut,Once} traits, so that you can call an Lt<'_, unsafe extern "C" fn()>.

DerefMove won't help, since you can (by design) move out of DerefMove, and the point of Lt, as I understand it, is to forcibly bound a value by a lifetime even if you can copy it. Likewise, get would have to be an unsafe get_unchecked even if it returns a reference (and thus you can't implement Deref at all).

One possibility is that we could have an Unbound auto trait (idk how viable this is, though) that is unimplemented for raw pointers (for the same reason raw pointers implement neither Send nor Sync) and for fn-ptrs (then reimplemented for Box etc.), so you could have a safe Deref{,Mut,Move} for Lt<'a, T> whenever T implements Unbound. Formally:

/// SAFETY:
/// A type T that explicitly implements this trait warrants that it either does not contain non-lifetime-bound pointers (including non-'static pointers) to non-'static data, or that it otherwise enforces the non-'static lifetime.
#[lang="unbound_trait"] // Add fn-ptr impls
pub unsafe auto trait Unbound{}

impl<T: ?Sized> !Unbound for *mut T{}
impl<T: ?Sized> !Unbound for *const T{}
// Added by lang-item above
impl<Args..., R> !Unbound for unsafe? extern<abi> fn(Args...)->R{}

// SAFETY: Lifetime of the reference is enforced by its own type
unsafe impl<'a,T: ?Sized> Unbound for &'a T{}
unsafe impl<'a,T: ?Sized> Unbound for &'a mut T{}
// SAFETY: The Lt<'a> type enforces the lifetime and does not allow safe access to an inner value that isn't Unbound
unsafe impl<'a,T: ?Sized> Unbound for Lt<'a,T>{}

unsafe impl<T: ?Sized+Unbound,A: Unbound> Unbound for Box<T,A>{} // Vec<T>/String/etc.
unsafe impl<T: ?Sized+Unbound, A: Unbound> Unbound for Rc<T,A>{} // Arc<T,A>/etc.

#[repr(transparent)] // Transparent over T
pub struct Lt<'a,T: ?Sized>(PhantomData<&'a ()>,T); // T needs to trail the type to be `?Sized`
impl<'a,T> Lt<'a,T>{
    pub fn new(x: T) -> Self;
    pub fn into_inner(self) -> T where T: Unbound;
    /// Safety:
    /// Unless T: Unbound, it is only guaranteed that it and any pointers it contains will be valid for up to 'a - using it outside that lifetime may cause undefined behaviour
    pub unsafe fn into_inner_unchecked(self) -> T;

    pub fn upgrade(self) -> Lt<'static,T> where T: Unbound;
}
impl<T> Lt<'static,T>{
     pub fn into_inner_unbounded(self) -> T;
}

impl<'a, T: ?Sized> Lt<'a,T>{
    pub fn get(&self) -> &T where T: Unbound;
    pub fn get_mut(&mut self) -> &mut T where T: Unbound;
    pub unsafe fn get_unchecked(&self) -> &T;
    pub unsafe fn get_mut_unchecked(&mut self) -> &mut T;
    // Other methods
}

impl<'a,Args...,R> FnOnce<(Args...)> for Lt<'a,extern<abi> fn(Args...)->R>{}
impl<'a,Args...,R> FnMut<(Args...)> for Lt<'a,extern<abi> fn(Args...)->R>{}
impl<'a,Args...,R> Fn<(Args...)> for Lt<'a,extern<abi> fn(Args...)->R>{}
impl<'a,Args...,R> UnsafeFnOnce<(Args...)> for Lt<'a,unsafe extern<abi> fn(Args...)->R>{}
impl<'a,Args...,R> UnsafeFnMut<(Args...)> for Lt<'a,unsafe extern<abi> fn(Args...)->R>{}
impl<'a,Args...,R> UnsafeFn<(Args...)> for Lt<'a,unsafe extern<abi> fn(Args...)->R>{}

Basically, Lt (which should possibly be Bounded, or some other name to be bikeshedded, rather than the short, terse, and ambiguous Lt) would act like Pin, in that it only lets you access the inner value if it implements an auto trait. Unsafe code that manages lifetimes in special ways could then return an Lt<'a, fn-ptr> and know that it is safe to move around for 'a, without exposing the otherwise 'static-bound fn-ptr to safe code that could use/call it outside of 'a.
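
A rough approximation of this design on stable Rust might look like the following. It is only a sketch under several assumptions: auto traits and fn_traits are unstable, so `Unbound` is an ordinary unsafe marker trait implemented by hand, and an inherent `call` method stands in for the Fn impls. `Library`, `ten`, and `demo` are hypothetical names.

```rust
use std::marker::PhantomData;

/// Hand-written stand-in for the proposed auto trait. Implementors promise
/// that they hold no pointers whose validity depends on an unnamed lifetime.
pub unsafe trait Unbound {}
unsafe impl Unbound for u32 {}
unsafe impl<'a, T: ?Sized> Unbound for &'a T {}
// Deliberately no impl for fn pointers or raw pointers.

pub struct Lt<'a, T>(PhantomData<&'a ()>, T);

impl<'a, T> Lt<'a, T> {
    pub fn new(x: T) -> Self {
        Lt(PhantomData, x)
    }

    /// Safe only when the inner type is known to be Unbound.
    pub fn into_inner(self) -> T
    where
        T: Unbound,
    {
        self.1
    }

    /// Safety: the caller must not use the value outside of 'a.
    pub unsafe fn into_inner_unchecked(self) -> T {
        self.1
    }
}

impl<'a, R> Lt<'a, fn() -> R> {
    /// Stand-in for the Fn{,Mut,Once} impls, which need unstable fn_traits.
    pub fn call(&self) -> R {
        (self.1)()
    }
}

pub struct Library {
    func: fn() -> u32,
}

impl Library {
    pub fn get(&self) -> Lt<'_, fn() -> u32> {
        Lt::new(self.func)
    }
}

fn ten() -> u32 {
    10
}

pub fn demo() -> u32 {
    let lib = Library { func: ten };
    let f = lib.get();
    // f.into_inner(); // rejected: `fn() -> u32` does not implement Unbound
    let n = f.call();
    Lt::new(n).into_inner() // fine: u32 is Unbound
}
```

The fn pointer can be called through the wrapper but never extracted by safe code, which is the Pin-like property described above.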

Using impl Fn() -> u32 + 'a works, but it won't be able to deal with unsafe functions:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cda5fc3f59d368b2b46e3ed3699a9d91
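
For readers who cannot open the playground, a minimal sketch of the idea could look like this. It is not a copy of the linked gist; `ten` and `demo` are made-up names.

```rust
pub struct Library {
    func: extern "C" fn() -> u32,
}

impl Library {
    // The opaque return type names 'a, so callers must keep `self`
    // borrowed for as long as they hold on to the closure.
    pub fn get<'a>(&'a self) -> impl Fn() -> u32 + 'a {
        let f = self.func;
        move || f()
    }
}

// Hypothetical stand-in for a function loaded from the library.
extern "C" fn ten() -> u32 {
    10
}

pub fn demo() -> u32 {
    let lib = Library { func: ten };
    let f = lib.get();
    // drop(lib); // rejected if uncommented: `lib` is still borrowed by `f`
    f()
}
```

The limitation mentioned above follows from `unsafe fn` pointers not implementing the Fn traits: a closure could only wrap such a call in an `unsafe` block, which would hide the unsafety from the caller.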

None of this works in the general case where the library provides &'static access to static data. In that case, you can just copy out the &'static reference, unload the library, then use the &'static reference and use-after-unload.

The transformation to make a library safe to unload is subtly tricky, and at a minimum requires all types to have an added 'lib lifetime (and potentially more that I've overlooked). This includes purely Copy data with no mention of lifetimes -- consider a custom handle that stores ptr::NonNull to static data.
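
As an illustration of the handle case, a sketch of what adding a 'lib lifetime to a plain Copy handle might look like is shown below. Everything here is hypothetical: `Library` owns a boxed `u32` that merely simulates the library's static data, and `Handle`, `handle`, and `demo` are invented names.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Stand-in for a loaded library; the Box simulates its static data.
pub struct Library {
    data: Box<u32>,
}

/// A Copy handle with no references in it. Without the PhantomData,
/// nothing would tie it to the library it points into.
#[derive(Clone, Copy)]
pub struct Handle<'lib> {
    ptr: NonNull<u32>,
    _lib: PhantomData<&'lib Library>,
}

impl Library {
    pub fn handle(&self) -> Handle<'_> {
        Handle {
            ptr: NonNull::from(&*self.data),
            _lib: PhantomData,
        }
    }
}

impl<'lib> Handle<'lib> {
    pub fn read(&self) -> u32 {
        // Sound only because 'lib keeps the library (and its data) alive.
        unsafe { *self.ptr.as_ref() }
    }
}

pub fn demo() -> u32 {
    let lib = Library { data: Box::new(42) };
    let h = lib.handle();
    let copy = h; // copying the handle keeps the 'lib bound
    // drop(lib); // rejected if uncommented: `lib` is still borrowed
    copy.read()
}
```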

While I was previously a fan of "it should've been &'static fn", the pervasiveness of issues with static data makes me feel that dynamic unloading of native code is not a problem worth trying to solve for Rust. Use process isolation and/or other sandboxing techniques for code you need to unload.

The point of Lt is to bind the return value of any function type by an external lifetime.

What's an external lifetime? An external lifetime is a lifetime that is not part of/internal to the type, i.e. it's not a for<'a> lifetime. If you look at the code above:

    fn get<'a>(&'a self) -> extern "C" fn() -> Lt<'a, u32> {

Note that the <'a> is in the get, not the extern "C" fn(). The hack is that Lt is ABI-compatible with any return value, yet it carries a lifetime, so for any arbitrary fn you can always make its return value an Lt.

This binds the bare fn, which is safe to Copy, to the lifetime of the &'a self, by introducing &'a self's lifetime to the Copy type itself, instead of wrapping the lifetime around the Copy type (which is unsound).

Remember: &'a Foo is safe to Copy, but nobody has ever complained about it being a problem. That's because the lifetime is part of the reference, altho it isn't part of Foo, so you can freely copy the reference and still get borrowck complaints. If Foo is Copy, you can copy out the whole Foo and ditch the reference, but that's something else entirely. Importantly, fn pointers are akin to &'static Foo, but only vaguely so - there are ways to make a fn akin to almost &'static Foo<'a>, which requires downgrading(?) the 'static to 'a!

The only drawback is that Lt is a newtype. E.g.

fn foo(x: u32) {
}
fn main() {
  let library = Library::new("something");
  let x: fn() -> Lt<'_, u32> = library.get();
  //foo(x()); // error: Lt<'_, u32> isn't u32
  foo(x().0); // this works, and x is bound by library's lifetime.
}

So even tho you can trivially copy/move out of Lt - by using .0 - it's still safe, because Lt isn't concerned with wrapping the function.
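
A self-contained variant of the snippet above, reusing the `Library` definition from the first post instead of the undefined `Library::new`, might look like this (`ten` and `demo` are made-up names):

```rust
use std::marker::PhantomData;
use std::mem::transmute;

#[repr(transparent)]
pub struct Lt<'a, T = ()>(pub T, pub PhantomData<&'a ()>);

pub struct Library {
    func: extern "C" fn() -> u32,
}

impl Library {
    pub fn get<'a>(&'a self) -> extern "C" fn() -> Lt<'a, u32> {
        // Relies on Lt<'a, u32> having the same ABI as u32:
        // repr(transparent) over a u32 plus a zero-sized PhantomData.
        unsafe { transmute(self.func) }
    }
}

extern "C" fn ten() -> u32 {
    10
}

pub fn demo() -> u32 {
    let library = Library { func: ten };
    let f = library.get();
    let g = f; // fn pointers are Copy; the copy keeps 'a in its type
    // drop(library); // rejected if uncommented: `library` is still borrowed
    f().0 + g().0 // `.0` unwraps the plain u32
}
```

The lifetime lives in the function pointer's type, so every copy of the pointer is still tied to the borrow of `library`, while the returned `u32` can be taken out freely.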


@CAD97 NonNull is fine - it's unsafe!

There is also the fact that some platforms don't support unloading libraries in all cases. AFAIK, macOS does not allow unloading libraries which have created/accessed/something with thread-local storage. Once TLS is used, the library is only unloaded when exit() is called.

It also depends on the code in the library itself, whether or not it expects to be unloaded. I've experienced crashes from libloading unloading data that had registered destructors at exit, or otherwise passed data around.

One notable point that argues against Rust trying to combat this is that good support for it, from the perspective of the code intended to be loaded as a library, is... difficult at best, possibly impossible. For example, most of the properties normally assumed about the 'static lifetime do not apply.

Anyway, my personal opinion is that libloading should not have implemented a Drop for the library types which unloads them, and should instead have provided an unsafe fn to unload the library (and perhaps an unsafe fn unload_on_drop to get the current auto-unload behavior). As it is, it seems like a massive footgun to me, given that the resource leak here is so unlikely to cause a problem.

My general recommendation to most users is that if you didn't write the library in question, don't unload it unless you really need that memory back. It's much safer.
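
The leak-by-default shape suggested above can be sketched without touching libloading itself. In the sketch below, `RawLibrary` is a hypothetical stand-in for any handle that unloads on drop (a counter replaces the real unload), and `Leaky`, `unload`, and `demo` are invented names, not an existing API.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts simulated unloads so the behavior can be observed.
static UNLOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a loader handle that unloads on drop.
pub struct RawLibrary;

impl Drop for RawLibrary {
    fn drop(&mut self) {
        UNLOADS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Wrapper that leaks the library on drop instead of unloading it.
pub struct Leaky(ManuallyDrop<RawLibrary>);

impl Leaky {
    pub fn new(raw: RawLibrary) -> Self {
        Leaky(ManuallyDrop::new(raw))
    }

    /// Safety: no code or data from the library may be used afterwards.
    pub unsafe fn unload(self) {
        drop(ManuallyDrop::into_inner(self.0));
    }
}

/// Returns (unloads caused by a plain drop, unloads caused by `unload`).
pub fn demo() -> (usize, usize) {
    let start = UNLOADS.load(Ordering::SeqCst);
    drop(Leaky::new(RawLibrary)); // leaks: the inner Drop never runs
    let after_drop = UNLOADS.load(Ordering::SeqCst) - start;
    unsafe { Leaky::new(RawLibrary).unload() }; // explicit opt-in
    let after_unload = UNLOADS.load(Ordering::SeqCst) - start - after_drop;
    (after_drop, after_unload)
}
```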

A hairy situation is if two libraries register callbacks into each other, unless Rust is going to deny that, so that libraries can't have "backwards edges" into other libraries loaded after main. The parameters to functions must then also be limited by 'load: if a 'static value can be passed through, it could be stashed in some transitive library somewhere, because the code of the crate isn't going to know that, at runtime, it can only deal in values that live as long as 'load. Note that if you load two libraries a and b which both use c, c is actually limited to the longer of the two, so b's view into c is not at all consistent.

There's also the fact that lifetimes don't exist at runtime.

There's an interesting problem here, but:

  • I don't think Rust needs to solve it (no other language has attempted it at the language level AFAIK)
  • the easy (and obvious) solution is to punt and say that unloading is unsafe because it fundamentally is on any platform without Rust tracking lifetimes at runtime as well

Agreed. It's analogous to Thread performing join on Drop.

Our take is that libloading should take a library definition file somewhere, and cleanly wrap it so there can't be misuse. This means introducing phantom lifetimes wherever necessary, such as all function return values (to make the function carry the lifetime even across Copy), and potentially many structs. It could also actively prevent some constructs from being safe to dynamically load and use.

(As for the atexit issue... That's a Rust problem. It uses that for thread local storage. And while we're at it, yes, libloading should prevent binding libraries that use threads, unless they use a strictly scoped threading system. Ideally you'd be able to tell Rust to disable spawning threads... Anyway, luckily atexit just means your program crashes (or potentially ACEs) when it's already quitting, which isn't a big deal. Unfortunately there's no real way to unatexit on current OSes - altho maybe Rust could support using library unloading instead of atexit, and libloading would hook atexit to "pass through" the atexit to libraries by unloading them.)

Couldn't rust use __cxa_atexit (on platforms that support it, using the correct token value) instead of atexit? Could also use equivalents on other platforms - AFAIK, all platforms that support library unloading solved the issue with C++ static/thread destructors, which is the same fundamental problem here.

It seems __cxa_atexit just prevents the library from unloading! See rust-lang/rust issue #59629, "Thread locals keep Rust shared library from unloading `dlclose`".

Arguably this is a libc bug rather than a rust bug. ah well.

In principle this could be handled like any cyclic reference between two data structures: with reference counting and weak references. The sharedlib crate has FuncArc, an object that represents some specific function from a library, but also keeps a reference count to the library as a whole. You could imagine an analogous FuncWeak type, although the sharedlib crate doesn't seem to actually implement one. Then library A would use FuncArc to store its callback to library B, while library B would use FuncWeak to store its callback to library A.

Of course, without language support, there's no way to verify that libraries are actually using these types.
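
The reference-counting idea can be sketched with std's Arc and Weak. This does not reproduce the sharedlib crate's actual API; `Library` is a hypothetical stand-in, and `FuncWeak`, `downgrade`, `upgrade`, `call`, and `demo` are names assumed for illustration.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a loaded library that unloads when dropped.
pub struct Library;

/// A function plus a strong count on its library.
pub struct FuncArc {
    lib: Arc<Library>,
    func: fn() -> u32,
}

/// A function plus a weak reference; it does not keep the library loaded.
pub struct FuncWeak {
    lib: Weak<Library>,
    func: fn() -> u32,
}

impl FuncArc {
    pub fn call(&self) -> u32 {
        (self.func)()
    }

    pub fn downgrade(&self) -> FuncWeak {
        FuncWeak {
            lib: Arc::downgrade(&self.lib),
            func: self.func,
        }
    }
}

impl FuncWeak {
    /// Yields a callable handle only while the library is still loaded.
    pub fn upgrade(&self) -> Option<FuncArc> {
        self.lib.upgrade().map(|lib| FuncArc { lib, func: self.func })
    }
}

fn ten() -> u32 {
    10
}

pub fn demo() -> (Option<u32>, Option<u32>) {
    let strong = FuncArc { lib: Arc::new(Library), func: ten };
    let weak = strong.downgrade();
    let while_loaded = weak.upgrade().map(|f| f.call());
    drop(strong); // last strong reference: the library would unload here
    let after_unload = weak.upgrade().map(|f| f.call());
    (while_loaded, after_unload)
}
```

Library A would hold the `FuncArc` side and library B the `FuncWeak` side, so the cycle never keeps either library loaded forever.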

Our second take is that unloadable libraries should bring their own dependencies, and the application (or parent library) should be responsible for interfacing the various unloadable libraries safely. Basically dlmopen.

You do realize that such requirements effectively mean that no library will be unloadable in practice, right? It certainly should never be assumed as the default given an arbitrary library, at least.

I'll also note that it is completely unworkable for things like interpreter modules because there, the "dependency" is injected by the loading interpreter. For example, Python modules do not link to libpython because the CPython API is expected to be provided by the interpreter itself (either by it loading libpython with symbols available globally or having libpython statically linked in as is done in Anaconda and such).

And as I said above, this is a laudable goal, but the platforms need to support this kind of stuff first. Rust can't fix it because the only answer there will be "doesn't work as intended on your platform for $platform_specific_reasons". It is far easier (and realistic) to just assume that libraries are not unloadable. There are just too many edge cases to deal with in practice to make it viable today.

Also, dlmopen has ridiculously low resource limits. There's something like a whopping 16 available namespaces with glibc last I checked (but I don't see mention of that anywhere in the docs, so maybe my memory is faulty or outdated).

Resource limits aside, dlmopen is awesome and more platforms should support it. ^-^

Besides, this thread isn't about rustc so much. It's mostly about libloading and other unsafe code.

Also, on the python example, pretty sure unloading python native modules is unsound, at least for some modules. In general you're not even supposed to unload pure python modules. So yes, you're right, it is unworkable in the context of unsound code. Hexchat handles this pretty nicely in our view, by not using actual linking at all, only (un)loading. (Unfortunately still has issues with threads which we're not sure how you'd deal with, even assuming platform support. Altho granted, miri does fault on return-from-main-while-threads-are-still-running, at least.)

You're asking for a lang item though for a use case that Rust cannot guarantee anything about because platforms don't provide the mechanisms to do so. I don't see how adding a special language-wide lifetime modification system (at runtime, to a language that doesn't have the concept of lifetimes at runtime), without a platform which it can actually be modeled after or made safe on makes any sense at all right now.

It wasn't about Python in particular, but about systems which provide an API through the linker to loaded modules. If you want the kind of isolation it seems that you're asking for, I would think that loading WASM modules, where you provide your API through an object injected into the loaded module's environment (basically a context object of some kind), would be better than native code. Basically, safe unloading and native code are just oil-and-water AFAICS. Some choose native code and take the "poison pill" of not being able to remove bad modules at runtime. Others choose safe unloading and restricting what the modules can do (e.g., DRM modules in browsers).

We're asking for a lang item that can be automatically coerced from/into, and carries a lifetime. Rust is not meant to guarantee anything about it other than being automatically coercible and carrying a phantom lifetime. Even then, we're merely suggesting it should be a lang item, because you can still use such a type (if you write one yourself) quite well without it having automatic coercions.

Specifically, it would enable things like:

fn foo() -> Lt<'a, u32>;
let x: u32 = foo();

where you'd otherwise (i.e. without it being a lang item) write:

let x: u32 = foo().0;