Rust on Rust FFI


#1

Hi folks,

This post has a few aspects to it. It is partly a “bug report”, partly a question about whether it is even a bug or just not-yet-specified behavior (or “it’s specified as your_fault_fool”), and partly a solicitation for similarly minded folks.

I’m currently doing a bit of work where I believe that what I want is Rust on Rust FFI, to support some REPL-like behavior for differential dataflow. The current mock-up is a static library and server binary defining some common types, with the intent that you write shared libraries using these types to interact with the underlying timely dataflow runtime and then the server loads and unloads them for you interactively.

This type of project doesn’t seem wildly unexpected. In managed environments this is often pretty easy to do, dynamically loading code and interoperating safely, and it seems like Rust is pretty close to being able to do it with some confidence.

There are a bunch of potentially naughty aspects to this, some of which I know about and probably at least as many I don’t know about. For example:

  1. Passing around owned data with any sort of backing memory is perilous, as by default the allocators for dynamic libraries and binaries are different (resp: the system allocator and jemalloc). I’ve swapped in the system allocator for the binary, and am working under the (possibly flawed) assumption that if the same allocator is used then either piece of code can allocate/deallocate without risking explosions.

  2. Passing around data of any sort relies on some assumptions about data layouts, and my understanding from #rust-internals is that there are sane assumptions to make if you only use types drawn in from common binary sources (i.e. a binary locks down the layout of types it exposes). Roughly, for the same reason that coherence is required (different crates independently using the same types) there is not much flexibility in re-laying things out. I’m still not 100% on what the guarantee is (or if it is a guarantee, vs a “for the foreseeable future”).

  3. If you ever unload libraries you’d best be sure you don’t need that code ever again. Like, if you return a Box<Any> back to the server, say, and try to use it. >.<
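
For point 1, here is a minimal sketch of pinning a binary to the system allocator on a recent stable toolchain. It uses `#[global_allocator]` with `std::alloc::System`, which is an assumption about how one would write this today, not necessarily what the nightly in this thread required. `make_buffer` is a hypothetical helper standing in for any owned data that might cross the library boundary.

```rust
use std::alloc::System;

// Route every heap allocation made by this artifact through the
// platform allocator (malloc/free on Unix-like systems).
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: builds an owned buffer whose backing memory
/// comes from the allocator selected above.
pub fn make_buffer(len: usize) -> Vec<u64> {
    (0..len as u64).collect()
}
```

This only settles the allocator for the artifact it is linked into. Whether a loaded library ends up freeing memory with the same allocator is exactly the assumption called out in point 1.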

To the extent that there are other folks interested in understanding this, or in articulating some guarantees (perhaps outside “stability guarantees”, just with the goal of putting words to intent), I’d love to hear from you.

The “bug report” part of this is that I currently get segfaults (at runtime) when using workspaces to link the building of the library, server binary, and example dylibs together. If I unlink them and build separately, the segfaults seem to go away (though difficult to know if they are just deferred, because I am not sure whether what I am doing is intended to work or not). My guess is that workspaces are meant to be very similar to building each of the projects, and if there is a different result that is interesting, but if it is segfaulting then it is probably UB and so maybe not out of spec in any way.


#2

The “coherence” guarantee we make is that if all the types are “available” in some base .so library (i.e. you can write them there), and you link all of your crates against the same .so library, then the types will be laid out the same way.
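
Where that guarantee is hard to rely on, a more conservative option is to pin the layout explicitly with `#[repr(C)]`. This is a different technique from the shared-base-library one described above. A minimal sketch, in which `GraphRequest`, its fields, and `graph_request_layout` are hypothetical names and not types from the project:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical request type shared by the server and a plugin.
/// `#[repr(C)]` fixes field order and padding to the C rules, so the
/// layout does not depend on the compiler's field reordering.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GraphRequest {
    pub nodes: u32,
    pub edges: u32,
    pub batch: u32,
}

/// Returns (size, alignment) in bytes, so each side can report what it
/// was compiled with and the two can be compared at load time.
pub fn graph_request_layout() -> (usize, usize) {
    (size_of::<GraphRequest>(), align_of::<GraphRequest>())
}
```

Comparing the pair reported by the server with the pair reported by the library is a cheap sanity check, though it would not catch two layouts that happen to have the same size and alignment.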

I don’t know cargo, so I don’t know what workspaces do. You are probably linking libstd several times, which breaks things.


#3

More details on the segfault vs non-segfault issue with workspaces.

I’m working off of the server branch of https://github.com/frankmcsherry/differential-dataflow/, which at least for the moment exhibits the following behavior:

git clone https://github.com/frankmcsherry/differential-dataflow/
cd differential-dataflow
cd server
git checkout server
rustup override set nightly-2017-11-29
cargo build --release
cd dataflows/random_graph
cargo build --release
cd ../..

Now if we spin up the server binary and attempt to load the random_graph dylib, we get the intended non-segfaulting experience:

Echidnatron% cargo run --release
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/server`
load dataflows/random_graph/target/release/librandom_graph.dylib build <graph_name> 1000 1000 1000
worker 0: received command: ["load", "dataflows/random_graph/target/release/librandom_graph.dylib", "build", "<graph_name>", "1000", "1000", "1000"]
handles set

If we repeat the process from scratch, but with edits to Cargo.toml uncommenting the workspace binding,

git clone https://github.com/frankmcsherry/differential-dataflow/
cd differential-dataflow
cd server
git checkout server
rustup override set nightly-2017-11-29
%% edit Cargo.toml to uncomment workspace links.
cargo build --release --all

when we run things we get a less awesome experience

Echidnatron% cargo run --release
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/server`
load ./target/release/librandom_graph.dylib build <graph_name> 1000 1000 1000
worker 0: received command: ["load", "./target/release/librandom_graph.dylib", "build", "<graph_name>", "1000", "1000", "1000"]
handles set
zsh: segmentation fault  cargo run --release
Echidnatron%

The handles set line is produced in the dylib, so we are landing in there in both cases, but something horrible is happening along the way.

As Ariel mentions, std is probably being linked statically, and I have no clue whether this does a bad thing in this case or not. The librandom_graph.dylib was 825,632 bytes with workspaces, and 2,291,824 bytes without, so something fundamentally different seems to be happening beyond exploding (do workspaces institute a -C prefer-dynamic?).


#4

In terms of “just getting this to work” one of the trickiest pieces may be the standard library, where especially if you’re sharing ownership across dynamic library boundaries you’ll want to be sure that libstd is itself linked in dynamically. That runs into the world of “the dylib crate type is full of bugs and crazy hard to get right” which isn’t always that fun to deal with…

IIRC Cargo does have logic for -C prefer-dynamic if you’re the dylib crate type and not the “main package”, where the idea of a “main package” can change a lot depending on workspaces. That may help explain the loss in size there? It would also mean that libstd may or may not be getting shared depending on how the server itself is implemented.

In general from what I’ve seen you’re sort of in “you’re on your own” territory in this sort of plugin architecture. Rust should work just fine but there’s lots of “unknown unknowns” in a sense which may impede progress and require structuring the binaries/linking a bit differently. For a more bullet-proof experience I’d recommend IPC w/ serde or something like that, but it’s obviously much more difficult to implement sometimes!


#5

Thanks Alex! (and Ariel!),

I think the IPC w/serde option is a non-starter in this case (there is fundamentally shared state, rather than light chitchat, and thread-local scheduling).

I just tried out the non-workspace setting as a cdylib, and there was a substantial reduction in size (down to 724,672 bytes, so less than the workspace case). And no segfaults, so yay? I have to admit being very unclear on how this all works (does the binary I build also link dynamically against std; shouldn’t it; how do I make that happen).

It seems to me like this is something that really could be helpful to sort out. Do you have a sense for whether it is (i) todo, (ii) never again, or (iii) over your dead body? Or perhaps something on the spectrum from “you’ll figure it out if you keep trying and ask questions” to “this breaks whenever we do anything and will only bring tears”.

In searching I’ve seen a few posts (e.g. https://users.rust-lang.org/t/dynamic-linking-of-sharedlibs-with-cargo/4756/10, but this was only the most recent) and it seems there are others keen for something similar. Do you (Mozilla) not have these use cases, or do you just have enough local expertise that you can make it work even if it isn’t 100% clear how it should work to others?


#6

Ah interesting! The cdylib and dylib crate types are subtly different wrt symbol visibility and defaults in Cargo, so that may be what you’re seeing here perhaps? In general though a lot of this boils down to what interface you’d like to have. If you’re working exclusively through a C API (e.g. unmangled symbols etc) then cdylib is probably the way to go (especially if you can keep ownership within a shared object). If you, however, have objects flying all over the place it gets a lot trickier because for a guaranteed-to-work situation you’ll need to make sure the appropriate pieces are all shared (dynamically linked) depending on the definition of “appropriate” for your use case.

Overall I think this is definitely the sort of situation we’d like to work in Rust (or at least I’d like to see working). That being said, I think there’s a broad spectrum of how well dynamically loadable plugins should be expected to work. On one end you have the compiler, which mostly works but has caveats (you can’t ever unload anything), and on the other end you have a clean/workable but difficult-to-write system with #[no_mangle] and totally isolated shared objects (no ownership transfer between them).
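
A minimal sketch of what that second end of the spectrum could look like: the library owns its state and the host only holds an opaque handle, so nothing is ever allocated on one side and freed on the other. `Counter` and the `counter_*` functions are made-up names for illustration. In a real cdylib each function would also need `#[no_mangle]` so that the loader can find it by name; that attribute is left out here so the snippet compiles unchanged across editions.

```rust
/// State owned entirely by the plugin; the host only ever sees a raw pointer.
pub struct Counter {
    total: u64,
}

/// Allocates a counter inside the plugin and hands back an opaque handle.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { total: 0 }))
}

/// Adds `value` to the counter and returns the new total (0 for a null handle).
///
/// # Safety
/// `handle` must be null or a live pointer obtained from `counter_new`.
pub unsafe extern "C" fn counter_add(handle: *mut Counter, value: u64) -> u64 {
    match unsafe { handle.as_mut() } {
        Some(counter) => {
            counter.total = counter.total.wrapping_add(value);
            counter.total
        }
        None => 0,
    }
}

/// Frees the counter using the same allocator that created it.
///
/// # Safety
/// `handle` must be null or a pointer from `counter_new` that has not
/// already been freed.
pub unsafe extern "C" fn counter_free(handle: *mut Counter) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

The host would call `counter_new`, pass the handle back into `counter_add` as often as it likes, and finish with `counter_free`, all before the library is unloaded. The cost is that every operation on the state has to be spelled out as a function like these.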

AFAIK Gecko doesn’t do much in the area of dynamic loading, but I could be wrong! I’m at least not personally aware of any folks doing this sort of stuff internally.

“It seems to me like this is something that really could be helpful to sort out. Do you have a sense for whether it is (i) todo, (ii) never again, or (iii) over your dead body?”

At least for me personally (assuming that “it” is the whole system here) I’d have to dig in a lot more to understand what’s going on to be able to say how aligned it is with something that I might at least expect to work. From the descriptions here it’s not quite enough unfortunately :( (although that’s more on me for not digging into more code, not on you!)


#7

Unloading dynamic libraries is always a “you’re on your own”-type thing. IIRC musl doesn’t even support it.