Idea to disallow multiple imports of a `-sys` crate


#1

TL;DR is it possible, if not already proposed, to add a Cargo option so that a crate can only be imported once?

I have seen some discussions around about the problem of being safe when using unsafe external FFI. One major concern is that you write a beautiful -sys crate to wrap a C lib and a higher-level safe crate which uses an internal thread locking system (like a std::Mutex) to make everything work in a safe way. Then someone very silly uses directly some -sys functions inside his crate, and a big project that use both crates suddenly triggers UB without using a single line of unsafe code.

We love ownership and borrowing, why not applying the same concepts to the handling of the crates? I mean, we can say that, with the dependency declarations, a crate is normally available to be referenced through extern crate. What if could have an option inside cargo.toml to say that a crate must be owned using extern crate? Setting this parameter inside a -sys crate, it won’t be possible to use two different crates that need to own it.

Said that, I am quite new with Rust, and maybe something similar has been already discussed somewhere (in this case it is my fault i did not find anything). I am also aware that it is not straightforward to apply this concept on different versions of the same crate (maybe a crate can import and own a previous version of itself?), and obviously there are issues with this idea I am not able to see in this moment. Nevertheless, maybe someone more experienced than me could assess this idea and, if if could be valuable, a pre-RFC could be written.

What do you think?


#2

-sys crates are supposed to use the links field, which ensures that it can only be included once https://doc.rust-lang.org/stable/cargo/reference/manifest.html#the-links-field-optional


#3

Maybe I am wrong, but from what is written in the docs, it is not possible to link the same library from two different crates. However, you are still allowed to use the same -sys crate from two different crates, then nothing stops you from using both these two crates from another crate. Am I wrong?


#4

I think you are correct. links prevents you from having two major versions of a -sys crate in the crate graph, but it does not prevent you from using -sys in several places.

Could you sketch an example of the API that might suffer from this problem? At first sight, it seems like the -sys functions should be unsafe, if they can be used to trigger UB?


#5

It stops you from having more than one instance of that crate. Even if two different dependencies require that crate, even with different versions, there will still be a one version, one copy of that crate (or error if versions can’t be unified).

Rayon-core uses links to ensure there’s only one global threadpool.


#6

You are right, let’s reason on some real code :grin:

useless_and_unsafe.c

static int value = 0;

void
inc_value() {
  ++value;
}

int
get_value() {
  return value;
}

This is a very simple and useless C code for a useless library. As you can see there is a value that is not accessible from outside that can be manipulated using inc_value and get_value. Unfortunately, our friendly C developer forgot to make them thread safe to be used…

Now we can write the -sys crate.

#[link(name = "useless_and_unsafe")]
extern "C" {
    pub fn get_value() -> i32;
    pub fn inc_value();
}

Pretty straightforward.

We also want to write down the safe crate, and people should use it instead of the -sys crate…

#[macro_use]
extern crate lazy_static;
extern crate useless_and_unsafe_sys;

use std::sync::Mutex;

lazy_static! {
    static ref LIB_MUTEX: Mutex<i32> = Mutex::new(0);
}

pub fn get_value() -> i32 {
    let _guard = LIB_MUTEX.lock().unwrap();
    unsafe { useless_and_unsafe_sys::get_value() }
}

pub fn inc_value() {
    let _guard = LIB_MUTEX.lock().unwrap();
    unsafe { useless_and_unsafe_sys::inc_value() }
}

As you can see, we want a Mutex (silly and slow solution) to control that we can only modify the internal value atomically in multithread code.

Fantastic. Now someone else decide in his crate to use the -sys crate directly as a dependency. That’s very bad, because, even if there is a sort of locking mechanism, it will be different from the Mutex we are using. Maybe this developer has his reasons (using parking_lot::Mutex could be a very valid one).

The fact is, if someone is using both my crate and the one from the other developer, it is possible that the inner value is accessed concurrently from multiple threads, just because there are multiple locking mechanisms that are trying to avoid this problem. Obviously failing, because they can control what other code does.

My idea is that, even if the safe crate cannot avoid another crate that tries to do the same thing, it is conceptually possible that the -sys crate is only imported once. In this case both safe crate can compile flawlessly, but a project that includes both will not compile, just because the -sys crate can only be imported once.

I hope I have been able to clarify my idea. If not, I’ll try again, in a (hopefully) better way.


#7

But even with one instance of the same crate, it is possible to use its methods from multiple crates. I just finished writing the explanation of my idea, maybe it is a bit clearer. And if what you are saying still applies, maybe I understood a couple of things in the wrong way.


#8

Ah, I misunderstood what you were asking. Sorry about that!


#9

No problem, if you misunderstood it is probably my fault, I had to explain the idea and the issue in a better way :smile:


#10

They will have to use unsafe for it, because -sys functions are extern "C" { }. I agree that this is problematic, I just think that

suddenly triggers UB without using a single line of unsafe code

is incorrect :smiley:


#11

Oh, I see, you are talking about linking two dependencies into your crate. It is indeed the case that unsafe is hidden in the dependencies, and each one of them is safe in isolation, but together they trigger UB.


#12

Let me express the concept in a bit clearer way – too many crates terms used around :blush:

  • Crate A is the -sys crate
  • Crate B is the high level safe crate for crate A
  • Crate C is another high level safe crate based on A
  • Crate D uses both B and C, just because the developer is not aware of the way B and C are handling the internal data

D can easily trigger UB without unsafe code, just because the safe implementations of B and C are safe on their own but not… urhm… inter-safe?

Obviously, this is a very simple situation that can be easily avoided. However, I am worried when there are two branches of dependencies that are build upon crates B and C independently. The developer that tries to use both branches is completely unaware that two sets of crates cannot be used at the same time. And he cries :joy:


#13

Exactly that! :blush:


#14

Fascinating! This reminds me of http://smallcultfollowing.com/babysteps/blog/2016/10/02/observational-equivalence-and-unsafe-code/#composing-unsafe-abstractions post by @nikomatsakis , but in this case the abstraction does not compose with itself.

It’s interesting that you don’t actually need FFI to expose this phenomenon:

/// This function is unsafe. The caller must ensure that
/// it isn't called simultaneously from several threads
pub unsafe fn sneaky() {
    static X: AtomicUsize = AtomicUsize::new(0);
    if X.fetch_add(1, SeqCst) != 0 {
        ::std::hint::unreachable_unchecked();
    }
    X.fetch_sub(1, SeqCst);
}

I think that this function is fundamentally unsafe: you just can’t write a safe abstraction for it.

Interesting things happen if you try though! You can wrap it in a global mutex, reasoning: “if all other code does not use unsafe to access sneaky, this is safe. If other code uses unsafe, than the bug is in that usage”. Then your doppelgänger does the same, and you both end up in a curious situation when two unsafe blocks mutually blame each other for safety violation! I wonder how the formal definition of what unsafety is allowed will deal with this issue :slight_smile:

We can make this more apparent by reifying global state:

pub struct GlobalState {
    __hidden: ()
}

pub static mut STATE: GlobalState = GlobalState { __hidden: () };

pub fn sneaky(_: &mut GlobalState) { // Note, this is a safe function now
    static X: AtomicUsize = AtomicUsize::new(0);
    if X.fetch_add(1, SeqCst) != 0 {
        // safe b/c we have a unique reference to GlobalState.
        unsafe { ::std::hint::unreachable_unchecked(); }
    }
    X.fetch_sub(1, SeqCst);
}

This API is now more obviously fundamentally unsafe, because one just can not get &mut from globally visible static mut in a safe way.

And, finally, we can make API safe at the cost of one atomic flag:

pub struct GlobalState {
    __hidden: ()
}

impl GlobalState {
    pub fn init() -> &'static mut GlobalState {
        static INIT: AtomicUsize = AtomicUsize::new(0);
        if INIT.fetch_add(1, SeqCst) != 0 {
            panic!("Double initialization")
        }
        unsafe { &mut STATE }
    }
    
}

pub static mut STATE: GlobalState = GlobalState { __hidden: () };

pub fn sneaky(_: &mut GlobalState) {
    static X: AtomicUsize = AtomicUsize::new(0);
    if X.fetch_add(1, SeqCst) != 0 {
        // safe b/c we have a unique reference to GlobalState.
        unsafe { ::std::hint::unreachable_unchecked(); }
    }
    X.fetch_sub(1, SeqCst);
}

I argue that the final version does not suffer from non-composable unsafe problems (that is, the unsafe guidelines/memory model should accept this code). This is because crates form a DAG, so any other unsafe {&mut STATE } will be in a crate that depends on sneaky, and would be trivially invalid. I haven’t relized before that DAG dependency property may play a role in determining if a piece of code is safe…


#15

Very good point indeed!

I really hope that this sort of issue is, in practice, mainly related to FFI -sys crates, because at least it is possible to have some tool to avoid multiple imports of them.

On the other hand, a case similar to the one you showed (not the last one, which is safe… or at least I cannot find any caveat to misuse it) it could be very difficult to detect in case of multiple safe abstractions.

All of this makes me think that the mental model of crate import ownership is not enough. Your example is making me think about the possibility of forcing only one reference to unsafe functions, in order to need only one safe abstraction. And honestly this makes me think to more complex cases that should be handled, as following:

  • As before, crate A is a -sys crate. Let’s go deeper: it has two unsafe function f and g which can access the same internal variable.
  • Crate B only exposes function f in a safe way, because of reasons
  • Crate C does the same, but for function g
  • Crate D uses both B and C

In this case the idea that “a function can have an attribute so it cannot be referred from more than a place” obviously does not work. I will leave this part here instead of deleting it, just to share my thoughts.

What can be said is that a Rust code that is exposing an unsafe API that should be handled correctly from a safe abstraction layer, should have a way to say “hey, I need to be accessed from only one abstraction”. Moreover, with this idea in mind, if there is a set of APIs that are unsafe but can be used from multiple abstraction layers, they should be put in a different crate from the APIs that need only one.


#16

There we are again, at the contextual definition. :wink: Notice that “only the combination of multiple libraries makes unsafe code break” has happened before, e.g. the combination of Rc and thread::scoped (famously known as Leakpocalypse), or the combination of Josephine and nodrop before unions were introduced. That latter thread also contains some discussion of the pitfalls of a contextual definition of safety.

From my POV, both of your example crates are wrong. You may only impose additional invariants or things like a locking discipline if you can actually enforce that nobody else can access the relevant data, at all. That’s why the module boundary and privacy are so important for safely encapsulated unsafe code. A pub static mut is inherently public and nobody can impose a protocol on it. sneaky is wrong for assuming that nobody else accesses STATE.

Still, this is certainly a very interesting example! I had not realized that “only one crate may import this” can be a useful restriction. I think the pure-Rust case is not as interesting as we can blame it on the pub static mut (just don’t make it pub), but once FFI is involved we cannot rely on privacy any more, and such a restriction seems like a neat trick to work around this limitation.


#17

Hm, I would like to argue (in non-strict terms) that my last example is correct, because I can, in fact, enforce the locking discipline :smiley: Let me simplify it a bit, get rid of the static mut, and spell the unsafe contract explicitly:

pub struct GlobalState {
    __hidden: ()
}

pub fn state() -> GlobalState {
        static INIT: AtomicUsize = AtomicUsize::new(0);
        if INIT.fetch_add(1, SeqCst) != 0 {
            panic!("Double initialization")
        }
        unsafe { state_unchecked() }
}

/// Contract: 
///  - calling `state_unchecked` more than once is UB. 
///  - calling both `state_unchecked` and `state` is UB.
pub unsafe fn state_unchecked() -> GlobalState {
    GlobalState { _hidden: () }
}

pub fn sneaky(_: &mut GlobalState) {
    static X: AtomicUsize = AtomicUsize::new(0);
    if X.fetch_add(1, SeqCst) != 0 {
        // safe b/c we have a unique reference to unique GlobalState.
        unsafe { ::std::hint::unreachable_unchecked(); }
    }
    X.fetch_sub(1, SeqCst);
}

Without state_unchecked, everything is kosher. I argue that adding it together with the contract in the comment (which I shouldn’t have omitted the initial version) is correct as well, because state and state_unchecked are in the same crate. The unsafe { state_unchecked() } is correct, because it is the single call to state_unchecked() in crate (protected by atomic), and all downstream crate must ensure that they won’t call state_unchecked as well because that is explicitly stated in the contract. That is, although two unsafe blocks in upstream and downstream crate might be mutually unsound, the upstream crate can unilaterally assign the blame to the downstream crate, because it breaks the “calling both state_unchecked and state is UB” contract.

Moving state to downstream crate changes the picture, because we no longer can have the second clause in the state_unchecked's contract.

I also argue that state_unchecked's contract is weaker than the best contract of all, promoted by unreachable_unchecked, “calling this function is UB”. If you are building the binary application (i.e, you are the leaf crate), and you don’t have any other dependencies, you can call state_unchecked in main and be sure that no one else does this.

The original example with static mut works in the same way, the magic second clause is attached to the contract for accessing the global.

And surely I don’t argue that language must not forbid this code, it might be very reasonable to forbid such shenanigans. I do think, however, that allowing them could be correct (at least, I don’t see any immediate holes in my reasoning).


#18

So essentially you are saying that adding a public unsafe function (with a properly documented contract) cannot break safety of a crate because we can put the entire burden of maintaining safety on the user. In your example, that unsafe function and the underlying contract for elements of type GlobalState live in the same crate, so we can meaningfully locally verify safety.

I cannot find anything fundamentally wrong with it. As you pointed out, the key difference to the situation discussed previously in this post is that everything is under control of one crate.