When can we have rdylib?

rdylib is a hypothetical compiler option to create a Rust binary that can be loaded and unloaded at runtime by another Rust program. It would communicate with the host program via a Rust-native ABI. The goal is to ensure that if both programs are proven safe, and the compiler/loader has checked that the ABI constraints are fulfilled, then the combined system is still safe when they work together.

Today, Rust is still not able to dynamically link two binaries written in Rust, except through the unsafe C ABI.

This is the biggest obstacle to replacing all unsafe C programs with Rust. The Redox OS team has been forced to use static linking from the beginning. Recently, I saw a Gentoo blogger blame Rust for its lack of support for dynamic linking.

It would not be possible to ship native dynamic linking in the 2021 edition. However, if we do not start working now, we will not see it in the 2024 edition either. So I really want to see it become the first priority for post-2021 development.

The following are some thoughts on how to achieve this.


Apart from the variants of the C ABI, the only sufficiently popular alternative is the COM ABI, which is Windows-only. I haven't dug into the relatively new Swift ABI and have no comment on it. But I am familiar with the COM ABI, which, from Rust's point of view, is based on passing Arc<RefCell<Box<dyn Any>>> (or Rc if using single-threaded apartments) everywhere. When the COM ABI was invented, dynamic dispatch was becoming popular thanks to Smalltalk, C++ and OOP.

It is interesting that there have been no successful attempts to bring generics to an ABI. I can see the huge difficulty: if we allow generics to appear in the ABI-exported entry points, we need a code generator that runs at link time.

So I guess the Rust-native ABI should be more like the COM ABI: a subset of the language without generics. However, this ABI should still support the following:

  1. Concrete structs, enums and tuples.
  2. Lifetimes. This is a big challenge, since lifetimes are erased at compile time. But without lifetimes, we lose the ability to check at load time that the ABI is in fact safe, and this would be no better than an unsafe C ABI. So this is what makes the ABI different. Any progress in this direction would be very beneficial: the same idea could be used in Web APIs, RPC APIs, etc.
  3. Dynamic dispatch. The COM ABI proved this can work, so we should be able to do the same.

(I didn't put async in the list because I think it is implied by the above.)

On top of the lifetimes in the ABI, the unsafety that exists today in many dynamic linking systems includes dangling pointers to DLL-owned data after the DLL is unloaded. This creates another big challenge: a pointer that is 'static in a DLL is not 'static in the host program.

Despite the issues I have described here, I believe we will overcome them and find a good solution.

6 Likes
  1. rdylib makes me think that it is a dylib (not an executable) with Rust metadata. This is exactly what the dylib crate type is.
  2. Rust does support dynamic linking (the dylib crate type) in itself. It does not support swapping dynamically linked dependencies behind the back of rustc. The ABI is not the only thing that needs to match for dynamically linked libraries to be swappable: all const items, const fn, #[inline] functions and generic functions need to match too, as the compilation of the downstream executable depends on their definitions. It should theoretically be possible to define a hash that, like the SVH (strict version hash), hashes all these things, but unlike the SVH ignores the bodies of functions whose body is not important for cross-crate usage. The dylib could then define a symbol with this hash as its name, and the executable could require this symbol to be defined. This is likely not easy though, and it still requires the rustc version to be the same.

This is not because Rust doesn't support dynamic linking, but because at first Redox OS didn't have a dynamic linker, and when it did get one, it was too broken to use for most things. I believe it is mostly fixed now, but I am not sure.

You mean at runtime, when the program starts? This code generator would depend on rustc as a whole, plus LLVM. In addition, it would be way too slow: depending on the dependency size, it could take minutes to compile everything. This is simply infeasible.

3 Likes

This is one of the problems that rdylib wants to address.

Another problem: we want the programmer to control what is exported and what is imported. And, as with the COM ABI, we want it to be accessible from a C binary (in that case all safety checks will be skipped; we rely on the C programmer to maintain safety).

I guess being stuck with a specific compiler version for everything contributes to the brokenness.

Yes, that's why I didn't recommend including generics in the ABI.

That is exactly the set of pub items.

The specific compiler version didn't matter. What happened was that, for example, gcc crashed due to a certain relocation type being implemented incorrectly. Enabling dynamic linking for Redox OS would not mean making libstd dynamically linked. It would mean making the libc used by Redox OS, and only libc, dynamically linked. That libc uses the C ABI and as such can be replaced as easily as any other libc when dynamically linking.

That would make this feature useless for pretty much every crate. Even libstd exports generics, and it is a crate that distribution maintainers will very likely want to swap out when a security issue is found in it.

1 Like

Not for rdylib if generics can't be exported.

This is just a symptom of compiler incompatibility, or of being stuck with only compatible compilers. We can't prevent the compiler from evolving; we can only define a new stable format that doesn't change from version to version.

The COM ABI used to be quite successful without generics. It created its own ecosystem that does not depend on system core APIs like Kernel32.dll (which is always linked through the C ABI, not the COM ABI); I don't see why we couldn't do the same. Furthermore, when I said this ABI might not support generics, that does not mean it cannot export specialized types and functions. For example, we could export all Vec<T> types where T is in a "native" type set. The programmer should also be able to specify that a specific instantiation must be exported.

Regarding Swift, it achieved a stable ABI for generics by basically turning everything into dyn Trait at the ABI boundary, and it has extra metadata and methods in the vtable so that dyn can work with unsized types as if they were sized.

https://gankra.github.io/blah/swift-abi/

Personally, I think it would be fine if Rust had a partial stable ABI. Simply start by forbidding all the hard parts (no public inline, no pub const, no generics with unknown types).

https://docs.rs/abi_stable/0.9.3/abi_stable/

3 Likes

This has nothing to do with compiler incompatibility. It was literally a bug in the dynamic linker of Redox OS. I have been trying to say that Redox OS not using dynamic linking has nothing to do with rustc. Static linking is much easier to implement as a libc implementer. This is the only reason Redox OS had to be statically linked previously. The same would have been the case for an OS written in C.

COM is not Rust. COM can easily handle dll swapping as it doesn't have generics and thus by definition no code uses them. Rust does have generics and they are used by pretty much everyone.

The COM ABI goes one step further to simplify: it turns everything into dyn Any. But I don't know how this would work, as Any requires the underlying type to be 'static, and in the context of dynamic linking 'static can be relative, as I said in the top post.

Interesting, I may try it some day. But to address the complaint from the Gentoo maintainer, we need more people to use it, until we have enough crates that are dynamically linkable.

I also wonder how it handles unloading of code, since this involves the problem of relative 'static.

There are good use-cases for Rust having a stable ABI. However, the particular blog post you're referring to was IMHO more of a complaint that not all languages are C, and not all programming environments want to depend on Linux. Maintainers don't like that they have to change their old tools and old processes to adjust to this fact.

For example, the oft-cited "What if we need to update openssl? Impossible with static linking!" is not really impossible, merely unsupported by their existing setup. It's quite possible if they kept an archive of lockfiles to find which binaries used the affected version, and rebuilt those binaries.

It's true that the existing system-specific package managers struggle to work with language-specific package managers. Cargo and npm can handle way more packages and support semver with multiple versions. But that doesn't mean all languages have to go back to being like C. It may mean that the old system package managers need to adapt to the newer ways of managing dependencies.

9 Likes

There is already a topic discussing ABI issues, along with a project wiki, so people are aware of (some) of the issues. Unfortunately, while lots of people seem to be interested in this problem (me included), none of us have sufficient time to actually work on it. If you do have time, please update the wiki with your thoughts and comments!

As a note, what you want to do is hard. The issue is that what I think we're all after is a safe ABI, one that allows a linker/loader to decide if the semantics of a library have changed between versions. If the entire planet perfectly followed semver.org requirements, and if all compiled binaries that used the ABI also ensured that all of Rust's safety guarantees were met, then we could possibly do this just by requiring all binaries expose an interface that included a machine-readable version string. However, I think this may not be sufficient; it may actually be an undecidable problem. E.g., as the maintainer of some library, I may notice some function has a bug in it that I consider to be small, so I fix it, bump the patch version number, and make a release. Package maintainers look at the change logs, release version number, etc., and agree that it was a small fix, so the next time you update your system, your libraries also get updated. Well, it turns out that some binary that depends on the library depended on the 'buggy' behavior to operate correctly, so now it's broken. So, was the fix small, or did it warrant a major version number bump? Eye of the beholder... More importantly, what information could we have encoded into the binary that would have allowed a loader to decide that the change in functionality would have an effect on the behavior of the binary?

Note: This is a bit tangential.

One bone I have to pick with semver is that it naturally is a social, rather than semantic, versioning system. This isn't inherently a bad thing — social decision making rounds the sharp corners of any automated system — but it makes it so that semantic versioning isn't strictly semantic, which is kinda its entire raison d'être. Another issue with semver is that anything goes below 1.0: given that 5 of the 10 most downloaded crates on crates.io are below version 1.0, it doesn't seem like the strict semantic guarantees semver provides are fully enforced by even the most popular crates.

Which is why I think we need a versioning system that, while still allowing for social flexibility when needed, adheres to strict API-enforced versioning guarantees. Especially for low-level APIs like binary interfaces, where 'socially acceptable' minor breaking changes can quite literally result in undefined behavior. Humor me for a second as I introduce a small example of what an actually semantically-sound versioning system might look like:

(Click to expand) Let's call this system Mark Versioning. It aims to separate and preserve both social and semantic aspects of versioning:

A Mark is a specific version of a piece of software. This number is decided by social consensus and represents major iterations/changes to the software. It starts at 0 and is incremented by 1 whenever appropriate.

A Revision is similar to a Mark, but it represents a strict change in API. This may be a breaking or a non-breaking change. Revisions start at 0 for each Mark, and are incremented by 1 whenever an API change is detected. This should be done automatically by package management software.

A Hash is the SHA-3 hash (in hex form, truncated to 8 characters) of the source code of a software version. A Revision may have multiple different hashes. Alternatively, one can start from 0 with each Revision and automatically increment by 1 with each change. This should be done automatically.

Whatever, I've written it out and tossed it up here.

This isn't to say we should all drop what we're doing and adopt Mark Versioning. This is by no means a fully formulated idea/proposal. I bring it up to show that other versioning-system constructions exist that are as-or-more useful than semantic versioning.

If we were to keep semver around for things like ABIs, at minimum I think API changes should be enforced by tools like cargo to meet strict semver requirements: e.g. adding a public function should require a minor version bump. More controversially, I think stricter requirements should be at least partially enforced for even versions below 1.0.

So, on the topic of a safe ABI: It's impossible to determine the total behavior of code in all circumstances, indeed, the issue is isomorphic to the halting problem (in anything other than trivial cases). The best we can do is ensure type safety and safety guarantees, reflect that in the version, and entrust conventions and requirements are fully checked at the boundary.

2 Likes

How would this be verified? We know that the versions will be different between the binaries (that's the point of upgrading), so we can't just compare hashes or diffs, they'd be meaningless. We can't rely on the programmers as that would be (as you said) a social version number. We can't even rely on exhaustive, brute force analysis of all possible states to compare the old and new outputs; we know that the behavior of the function changed, that was why the version number was bumped, because there was a bug that needed to be fixed.

The issue is that our original function didn't meet the original semantic specification, but the new function does (or is supposed to). So, to do perfect semantic versioning, we need a perfect specification language, and we need programmers that are able to use that language perfectly, so that we can verify that the implementation of a function perfectly meets its specification. Even with proof checkers and other tools, this is a very difficult task...

You can effectively get your rdylib by just using the existing dylib. You don't get compiler assistance in avoiding the parts of Rust that require being instantiated in the caller's crate, but you could in theory avoid them manually.

I highly suggest you check out the abi_stable crate family. It implements basically exactly what you're asking for in user-space, and clearly discusses the tradeoffs involved.

I think there is merit in providing a stable ABI subset functionality, similar to abi_stable, directly assisted by the language. But I also think it isn't a 2024 edition thing, but a 2027 edition thing at the earliest. I also think it doesn't have any need to be tied to an edition.

(The edition change that would be required is migrating from fn(Args...) -> Ret function pointers to &'static fn(Args...) -> Ret function references, if you want to support ever unloading Rust libraries. IIRC lang team is lukewarm towards making that change if someone were to demonstrate benefit to it and describe how the fn unsized type would behave.)

2 Likes

At the risk of getting bogged down in the weeds... I think having fn mean different things in different editions is a bad idea. The new unsized type could have a different name, but having both fn and Fn is confusing enough without adding a third thing. So… why not take a cue from trait objects and have the syntax fn(Args...) -> Ret + 'a ?

The "advantage" of using &fn() is getting normal lifetime elision, getting *const fn(), and it just generally behaving like a normal reference. &fn() I could see being regularly used syntax; the others feel much more like "oops, maybe this is actually necessary after all" (especially dealing with reference vs. pointer semantics).

That said, I generally agree with you. Were someone to give me the ability to make one request of pre-1.0 Rust devs, it'd be to use &fn() for function pointers rather than fn(). But it's probably not the right solution for post-1.0 Rust.

Really, &fn() is just a general way of referring to making function pointers more like references. The idea is needed (in order to support code unloading), the syntax is just the vehicle to explain. dyn fn() or fn() + 'a or whatever syntax is just a means to a (probably uncommon enough to be syntactically salty) end.

Plus, DLL loading will be wrapped up into &'a Dll almost certainly, so the only people who the syntax should matter to are people implementing the dynamic binding, which can be wrapped in a nice proc macro package.

(The language part is then dealing with 'static in the face of module unloading.)

1 Like

I'll quote the last paragraph of my first response:

So, on the topic of a safe ABI: It's impossible to determine the total behavior of code in all circumstances, indeed, the issue is isomorphic to the halting problem (in anything other than trivial cases). The best we can do is ensure type safety and safety guarantees, reflect that in the version, and entrust conventions and requirements are fully checked at the boundary. [emphasis mine]

Although we can't verify that the behavior is correct, we can at least enforce type safety and Rust's safety guarantees at the boundary. This doesn't ensure semantic non-breakage, but it does ensure interface compatibility.

In the context of Mark Versioning, a change that doesn't touch type signatures, or that only introduces new items, would only result in an updated Hash. A change that touches type signatures or removes items would introduce a new Revision.

If a new public API is semantically compatible with an old one, and code that uses the general API checks assumptions at the boundary, we can ensure that version changes are loosely compatible. (Additionally, one can always specify exact version hashes as needed.)

Out of curiosity, is anyone familiar enough with .NET's CLI to know whether it defines anything like an ABI that could be reused for a high-level language such as Rust? If my memory serves, .NET offers dynamic linking for F#, C# and other high-level languages, which is encouraging, but it's not clear to me how much is provided by the ABI and how much by the runtime and JIT.

I don't agree on this point, and Mark Versioning seems like a bad idea to me. Versioning is inherently hard; semantic versioning is therefore hard as well. However, semantic versioning is honest: it does not lie about the difficulties and does not offer well-intended shortcuts to hell. Instead it provides a set of rules to keep the problem manageable. It does not make the problem easy, but it makes it manageable.

That said, semantic versioning may seem deceptively simple at first, and then it may seem impossibly difficult and not working at all. I believe that is because it is rarely explained and understood fully. Perhaps the missing piece for many people is the concept of consumer vs. provider types. I found it explained well in OSGi's technical whitepaper on semantic versioning.

OSGi is a Java technology standard founded twenty years ago to define a solid service-oriented module system based on semantic versioning. I find it truly amazing, because it still holds to its mission, and I don't know any better example of a really working, scalable and long-lived practical application of this approach. Even if Java is not your cup of tea, OSGi might be interesting as a source of inspiration for how to build things that last and how to apply (semantic) versioning in particular.

1 Like

I can't help with that, but I wonder how difficult it would be to let Rust code run within the JVM :grinning: It seems much more plausible to me, since there is GraalVM/Truffle with support for running LLVM bitcode. While interesting (or funny), I doubt this is the actual way to achieve acceptable dynamic linking for Rust.

.NET will JIT monomorphizations at runtime, allowing it to get past some of these problems, but I don't think that's the direction most people want for "normal" Rust.

(I do think it'd be cool to have a CLR backend for Rust, though.)