Hi everyone! I've finally finished a pre-RFC for a solution to the problem raised in issue #73632. Thanks to everyone listed in the acknowledgements for very helpful feedback. Feel free to comment with your thoughts on this proposal!
Edit: I've posted version 2, incorporating all your helpful feedback!
Edit 2: Version 3 is up! It introduces the notion of standard library bundles, which specifies a full standardized solution for linking Rust binaries using the native Rust linker.
RFC: Stabilize a version of the rlib format
Patrick Walton pcwalton@fb.com
Pre-RFC version 3
Summary
Large projects written in multiple languages need to be able to link together multiple Rust crates that form complex dependency trees, each compiled with separate invocations of the Rust compiler. The current staticlib format exports symbols from dependencies of the crate being compiled, which can cause multiple-definition errors at link time. This RFC specifies an opt-in stable rlib format that external build systems can use to produce a library including only symbols from the crate being compiled, and no others, avoiding possible link errors.
Motivation
Frequently, Rust code is just one part of a large binary or dynamic library, perhaps built with a language-neutral build system other than Cargo, such as Bazel or Buck. In these projects, there may be arbitrary combinations of Rust and C++ code such that the same crate arises as a dependency at multiple points in the graph. The investment in the toolchain and workflow for these projects frequently predates the introduction of Rust by years. Thus it is desirable to preserve a standard linking setup, in which the build system directly invokes the system linker (e.g. ld), in order to build a binary containing Rust code alongside code written in other languages.
Right now, the documented way to achieve this is by compiling the crate with the --crate-type=staticlib switch (or crate-type = ["staticlib"] in Cargo.toml). This works well for small projects. However, it has the fundamental problem that dependencies of the Rust crate being compiled are included in the resulting native library. This causes problems with diamond dependencies. Suppose that we have the following dependency hierarchy specified in the native build system:
Rust crates B and C, both compiled with staticlib, depend on the Rust crate A, while the C++ target D depends on B and C. Because of the semantics of staticlib, the contents of A will be duplicated into B and C. This can cause D to fail to link, because the linker can see definitions from A twice and exit with a "multiple definition" error. (Note that multiple definition errors are not guaranteed in the above scenario, because linkers are "lazy" and will only bring in symbols as requested. The success of the link is determined by the particular symbols in use in these four targets, as well as the number and makeup of each package's codegen units.)
The simplest way to solve this problem is to provide a supported way for the Rust compiler to produce artifacts that export only the symbols from the crate being compiled. That way, the build system, which has complete knowledge of the dependency graph, can produce a final link line that guarantees each crate is included only once in the resulting binary. In fact, Rust has a mechanism that is nearly perfect for handling this already (and which Cargo uses to solve this exact problem): the rlib format, which does not include symbols from dependent crates. However, the contents of rlibs are unstable, so external build systems can't technically use them without depending on implementation details of the compiler.
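To make the build-system side of this concrete, here is a minimal sketch of how a tool that knows the full dependency graph might emit each rlib exactly once. Everything in it is an assumption made for illustration: the function name link_line, the in-memory map of dependencies, and the lib<name>.rlib file naming are hypothetical and not taken from Buck, Bazel, or this RFC.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: walks the dependency graph depth-first from `roots`
/// and returns every reachable crate exactly once, with dependents listed
/// before their dependencies (the order Unix linkers expect for archives).
fn link_line<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>, roots: &[&'a str]) -> Vec<String> {
    fn visit<'a>(
        name: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        seen: &mut BTreeSet<&'a str>,
        post_order: &mut Vec<&'a str>,
    ) {
        // A crate that was already visited is never emitted a second time.
        if !seen.insert(name) {
            return;
        }
        if let Some(children) = deps.get(name) {
            for &child in children {
                visit(child, deps, seen, post_order);
            }
        }
        post_order.push(name);
    }

    let mut seen = BTreeSet::new();
    let mut post_order = Vec::new();
    for &root in roots {
        visit(root, deps, &mut seen, &mut post_order);
    }
    // Reversed post-order is a topological order of the graph.
    post_order.reverse();
    post_order
        .iter()
        .map(|name| format!("lib{}.rlib", name))
        .collect()
}
```

For the diamond above (B and C both depending on A), calling link_line with roots B and C yields libC.rlib, libB.rlib, libA.rlib: A appears once, after both of its dependents.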
This RFC proposes an opt-in mechanism that external build systems can use to produce rlib files with a stable format. It's intentionally minimal and avoids stabilizing any more than is absolutely necessary for external build systems to work properly. Additionally, this RFC proposes a simple mechanism for packaging foundational Rust libraries such as the standard library into a format that the system linker can link against, allowing the creation of usable Rust binaries without rustc driving the linker.
Guide-level explanation
Compiler switch
A new compilation switch, -C rlib-version, is added to the compiler to control the contents of .rlib archives. It takes one of two values, with more possible in future versions of Rust:
- -C rlib-version=unstable: The default value. This option indicates that the contents of .rlib archives are unspecified. External tools should not rely on .rlib files conforming to any particular format.
- -C rlib-version=v0: This value indicates that .rlib files conform to the version 0 format defined here.
Version 0 rlibs
A version 0 rlib is an archive file in the native format of the target, with the usual extension (.lib or .a) replaced by .rlib. The native format is the usual file format for statically-linked libraries on the target, which for all targets is some variation of the common ar archive format. (The format of WebAssembly rlibs is unspecified in this RFC.)
Inside the rlib file, any number of object files may be present that provide code and data for symbols defined by the Rust crate being compiled. Other files may also be present, such as .rmeta files. This RFC makes no guarantees whatsoever about what these files may or may not contain: in particular, this RFC doesn't stabilize any kind of metadata format. External tools such as linkers should ignore any non-object files, as their contents are unstable.
There must be a file inside the archive whose name begins with the string _rlib_v00. The contents are typically empty. The name of this file allows tools to determine the version of the rlib.
The object files inside a version 0 rlib must collectively contain global definitions for all the non-generic functions and statics defined by the crate being compiled. Global definitions must not be provided for any upstream dependencies of the crate, to avoid symbol collisions when linking. It's OK for functions from upstream dependencies to be present, but such symbols must be marked local to the archive. Symbol names should be appropriately mangled; in the case of v0 symbol mangling, they should follow Rust RFC 2603.
rlib files often contain undefined symbols with definitions in other objects, whether those objects be rlib files (i.e. crate dependencies) or other libraries such as native static or dynamic libraries. This RFC intentionally doesn't provide a way for an external tool to locate those dependencies. That's assumed to be the job of the build system.
These requirements are designed to allow non-rustc linkers to link executables created by the Rust compiler, driven by a variety of build systems, in a way that doesn't result in symbol conflicts when diamond dependencies are involved.
Standard library bundles
In order to successfully produce a binary containing both Rust code and native code, a way to link to the Rust standard library is needed. This RFC specifies a simple mechanism for doing so: compile an empty crate (an empty lib.rs file is fine) as a staticlib with the flag -C emit-std-bundle=yes. Any desired crate-level metadata and/or compiler flags can be supplied in the process of compiling this standard library bundle, for example #![no_std] to omit the standard library, or -C target-feature to enable specific CPU features. The resulting artifact will be a linkable version of the standard library.
An example workflow is as follows:
$ cargo new --lib stdrust
$ cd stdrust
$ echo "[lib]" >>Cargo.toml
$ echo "crate-type = ['staticlib']" >>Cargo.toml
$ RUSTFLAGS="-C emit-std-bundle=yes" cargo build --release
$ ls -l target/release/libstdrust.a
-rw------- 1 pcwalton staff 17031504 Jan 19 19:37 target/release/libstdrust.a
The resulting libstdrust.a may be installed into the library search path, at which point -lstdrust may be added to the link line in order to link Rust executables.
Reference-level explanation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. They are not capitalized, for clarity.
rlib versioning
When the compiler is instructed to produce rlib output, the contents of the resulting artifact depend on the rlib version in use. The rlib version is specified by a compiler switch with the syntax -C rlib-version=VERSION, with VERSION replaced by one of the following:
- unstable: When the rlib version is unstable, the contents of the rlib file are completely unspecified by this RFC. In particular, the resulting rlib files may, or may not, actually be in v0 format. External tools should not assume that rlibs with version unstable conform to any specific format.
Note (non-normative): Most likely, unstable will result in a version 0 rlib being produced initially. The primary reason why unstable is left unspecified is so as not to preclude the possibility of MIR-only rlibs in the future.
- v0: When the rlib version is v0, the contents of the rlib must match the definition supplied in the following section.
Other valid values of VERSION may exist. Their semantics are unspecified by this RFC.
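As an illustration of how a build tool might model these values, here is a small sketch. The enum and function names are hypothetical and exist only for this example; the only strings taken from the RFC are rlib-version, unstable, and v0. Unknown values are kept rather than rejected, since other values of VERSION may exist.

```rust
/// Values of the proposed `-C rlib-version=VERSION` switch, as seen by a
/// hypothetical external tool.
#[derive(Debug, PartialEq, Eq)]
enum RlibVersion {
    /// Contents of the rlib are unspecified; tools must not inspect them.
    Unstable,
    /// Contents follow the version 0 format described in this RFC.
    V0,
    /// Any other value; its semantics are not defined by this RFC.
    Other(String),
}

/// Parses the argument that follows `-C`, e.g. "rlib-version=v0".
/// Returns `None` if the argument is some other codegen option.
fn parse_rlib_version(codegen_arg: &str) -> Option<RlibVersion> {
    let value = codegen_arg.strip_prefix("rlib-version=")?;
    Some(match value {
        "unstable" => RlibVersion::Unstable,
        "v0" => RlibVersion::V0,
        other => RlibVersion::Other(other.to_string()),
    })
}
```

A tool built this way would treat a missing switch the same as unstable, because unstable is the default.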
Version 0 rlib contents
A version 0 rlib must be an archive file in the native format of the target. All supported targets use some variant of the common ar archive format. In particular, all supported targets begin their archive format with the string !<arch> followed by a newline character: i.e. the bytes 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A. The precise on-disk format of the archive file is unspecified by this RFC, but it must contain linkable object files as well as a symbol table.
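The magic-number check described above is simple enough to sketch directly. The constant and function names below are hypothetical; only the eight bytes come from the text.

```rust
/// The global header that begins every common `ar` archive: "!<arch>\n".
const AR_MAGIC: &[u8; 8] = b"!<arch>\n";

/// Returns true if `bytes` starts with the `ar` global header. This says
/// nothing about whether the rest of the file is a well-formed archive.
fn looks_like_ar_archive(bytes: &[u8]) -> bool {
    bytes.starts_with(AR_MAGIC)
}
```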
For targets that do not use a variant of the common ar archive format, as well as WebAssembly, this RFC does not define the format of a version 0 rlib. Such platforms may or may not support version 0 rlibs at all.
In this section, we make reference to concepts of the BFD library. This provides a convenient way to abstract over the concepts that correspond to one another in different object formats.
Note (non-normative): BSD, System V, and Windows use incompatible mechanisms for specifying symbol tables inside the ar format, so we must use an abstraction.
The target crate is the crate that the current invocation of the compiler is compiling and producing an rlib artifact for.
A global symbol is a symbol with the BFD BSF_GLOBAL flag set. An rlib library defines whatever global symbols are required to link, subject to the three conditions below.
A local symbol is a symbol with the BFD BSF_LOCAL flag set. An rlib library may contain any number of local symbols. Their names and contents are unspecified by this RFC.
This RFC does not define the contents of the set of global symbols exported by an rlib archive. Instead, it requires that some number of global symbols shall be exported such that the following three conditions are fulfilled:
- If some Rust crate B depends on Rust crate A, and both A and B are in .rlib format, the .rlib files corresponding to A and B shall be successfully linkable using the system linker, notwithstanding the requirements specified in the "additional linking requirements" section.
- Any set of crates in .rlib format compiled by the same Rust compiler (including compiler version) must be linkable together as long as the following conditions are fulfilled:
a. For each crate, all dependencies of that crate must be in the set.
b. All conditions specified in the "additional linking requirements" section are met.
c. The set contains each .rlib file no more than once.
There are two exceptions to this rule:
(i) Multiple crates that define the same language item may not be linkable together.
(ii) Multiple crates that define identically-named items marked with #[no_mangle] may not be linkable together.
- For each item with a #[no_mangle] annotation, a global symbol must be present in the archive with a name matching that of an identically-named C symbol definition on the target.
Note (non-normative): Some binary formats mark C symbols in some way (e.g. Mach-O represents them with a leading _).
Note (non-normative): "Linkable using the system linker" implies that there are neither undefined nor multiply-defined symbols.
Note (non-normative): Rust RFC 2603 specifies a mangling scheme for symbols.
Note (non-normative): Symbols relating to global allocation and panic handling must not be defined in the .rlib unless the crate itself defines those symbols.
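Conditions (a) and (c) of the second rule above are mechanical enough that a build system could check them before invoking the linker. The following is a rough sketch under stated assumptions: the function name check_rlib_set and the map-based graph representation are invented for this example, and condition (b) as well as the two exceptions are not modeled at all.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical pre-link check for a proposed set of rlibs.
/// `set` lists the crates to be linked; `deps` maps a crate to its direct
/// dependencies. Returns an error describing the first violated condition.
fn check_rlib_set(set: &[&str], deps: &BTreeMap<&str, Vec<&str>>) -> Result<(), String> {
    let mut seen = BTreeSet::new();
    for &krate in set {
        // Condition (c): the set contains each rlib no more than once.
        if !seen.insert(krate) {
            return Err(format!("{} appears more than once", krate));
        }
    }
    for &krate in set {
        // Condition (a): every dependency of every crate is in the set.
        if let Some(children) = deps.get(krate) {
            for &dep in children {
                if !seen.contains(dep) {
                    return Err(format!("{} depends on {}, which is missing", krate, dep));
                }
            }
        }
    }
    Ok(())
}
```

Because condition (a) is applied to every crate in the set, checking direct dependencies is enough to cover transitive ones.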
The object files containing the symbols that the target crate defines shall be present inside the rlib archive. Any other files necessary for the Rust compiler to link to the target crate and use it as a dependency must also be present. The object files must be linkable in a format that the target supports.
Note (non-normative): Examples of linkable object files on various platforms include but are not limited to ELF, Mach-O, PE/COFF, LLVM bitcode for full LTO, LLVM bitcode for ThinLTO, and LLVM bitcode wrapped in a native object file.
Note (non-normative): The most important non-object file is the .rmeta file, which contains data necessary for downstream Rust invocations to use the crate as a dependency, such as the types and contents of inlined functions. The system linker ignores this information.
Additionally, a file with a name beginning with the string _rlib_v00 must be present inside the rlib archive. The contents of this file are unspecified. It allows build systems and other tools to determine that the rlib is in version 0 format.
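A tool could detect the marker by scanning member names in the archive. The sketch below is a simplified illustration, not a complete ar reader: it assumes the common 60-byte member header layout, handles short names and BSD-style "#1/<len>" extended names, and does not resolve names stored in the GNU long-name table (so a marker whose name is too long for the 16-byte name field in a GNU archive would be missed). The function name has_v0_marker is hypothetical.

```rust
/// Returns true if `archive` (the raw bytes of an `ar` file) contains a
/// member whose name begins with `_rlib_v00`.
fn has_v0_marker(archive: &[u8]) -> bool {
    const MAGIC: &[u8] = b"!<arch>\n";
    const HEADER_LEN: usize = 60;
    if !archive.starts_with(MAGIC) {
        return false;
    }
    let mut pos = MAGIC.len();
    while pos + HEADER_LEN <= archive.len() {
        let header = &archive[pos..pos + HEADER_LEN];
        // Bytes 0..16 hold the space-padded name; 48..58 the decimal size.
        let name_field = std::str::from_utf8(&header[..16]).unwrap_or("").trim_end();
        let size: usize = match std::str::from_utf8(&header[48..58])
            .ok()
            .and_then(|s| s.trim().parse().ok())
        {
            Some(n) => n,
            None => return false, // malformed header
        };
        let data_start = pos + HEADER_LEN;
        let data_end = match data_start.checked_add(size) {
            Some(end) if end <= archive.len() => end,
            _ => return false, // truncated archive
        };
        let data = &archive[data_start..data_end];
        let bsd_len = name_field
            .strip_prefix("#1/")
            .and_then(|n| n.parse::<usize>().ok());
        let name: &[u8] = match bsd_len {
            // BSD extended name: the real name is the first `len` data bytes.
            Some(len) if len <= data.len() => &data[..len],
            // Short name; GNU archives terminate it with '/'.
            _ => name_field.trim_end_matches('/').as_bytes(),
        };
        if name.starts_with(b"_rlib_v00") {
            return true;
        }
        // Member data is padded to an even length.
        pos = data_end + (size & 1);
    }
    false
}
```

Only the name is inspected; the contents of the marker file are unspecified and are never read.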
Any other files may also be present inside the rlib archive. Whether these files exist, and what they contain, is unspecified by this RFC.
All rlib files that are to be linked together must be built in a compatible manner. The precise definition of compatible manner is unspecified by this RFC.
Note (non-normative): The reason for not defining the term compatible manner precisely is so that new linkage restrictions may be added in the future. For example, one could imagine a later version of Rust introducing multiple ABIs such as those suggested in the interoperable_abi proposal. In this case, the ABI of the rlibs and the ABI of the standard library bundle would all need to match for the link to succeed. This RFC is intended to preserve maximum flexibility for such changes in the future.
Standard library bundles
In order to successfully perform the final link of an executable containing Rust code, the core library must be present on the link line. Frequently, the std library must be present as well. This RFC specifies a mechanism to produce a standard library bundle containing core or std, appropriately configured to match the given target.
A crate that contains no Rust symbols and no dependencies other than core or std, and that is successfully compiled as a staticlib with the -C emit-std-bundle=yes flag, is known as a standard library bundle. The native system linker must be able to successfully link an executable containing Rust code if:
- All such crates containing Rust code are supplied precisely once to the system linker.
- All transitive rlib dependencies of all such crates are supplied precisely once to the system linker.
- Exactly one of the core or std standard library bundles is supplied to the system linker. This standard library bundle must have been built in a compatible manner with all rlibs to be linked.
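Two of the requirements in the list above (no rlib supplied twice, exactly one standard library bundle) can be checked by inspecting the final link inputs. The sketch below assumes the build system already tags each input with its kind; the InputKind enum and check_final_link function are hypothetical, and neither dependency completeness nor the "compatible manner" requirement is checked.

```rust
use std::collections::BTreeSet;

/// How a hypothetical build system classifies each file on the link line.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputKind {
    Rlib,
    StdBundle,
    Native,
}

/// Verifies that every rlib appears once and that exactly one standard
/// library bundle is supplied.
fn check_final_link(inputs: &[(&str, InputKind)]) -> Result<(), String> {
    let mut rlibs = BTreeSet::new();
    let mut bundles = 0;
    for &(path, kind) in inputs {
        match kind {
            InputKind::Rlib => {
                if !rlibs.insert(path) {
                    return Err(format!("{} is supplied more than once", path));
                }
            }
            InputKind::StdBundle => bundles += 1,
            // Native objects and libraries are outside the scope of this check.
            InputKind::Native => {}
        }
    }
    if bundles != 1 {
        return Err(format!(
            "expected exactly one standard library bundle, found {}",
            bundles
        ));
    }
    Ok(())
}
```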
Note (non-normative): At the time of writing, the -C emit-std-bundle=yes flag can simply be a no-op, as the Rust compiler can successfully create such staticlibs already by compiling an empty crate. The purpose of the flag is to ensure that this behavior is preserved in the future in an opt-in fashion.
The exact symbols that are exposed in a standard library bundle are unspecified by this RFC. In general, they are expected to change with every Rust release and may change depending on the manner in which the standard library channel was compiled.
Note (non-normative): The standard library bundle approach allows this RFC to avoid specifying details like the behavior of allocator shims, raw-dylib, bundled static libraries, #[global_allocator], the allocation error handler, -C panic=abort and -C panic=unwind, and so forth.
Drawbacks
This scheme doesn't preclude Rust changing the rlib format (for example, introducing MIR-only rlibs), but if Rust does so, under this RFC the compiler will need to retain support for the version 0 rlib format described here behind a compiler switch. This may add some amount of maintenance burden.
Addressing potential issues
- rustc now supports the concept of bundled static libraries, which are native libraries placed inside a .rlib file. The v0 rlib format doesn't support such libraries; generally, native build systems would prefer to keep libraries separate, for better interoperability with native code. This can be revisited with future rlib versions if need be.
- An issue regarding static initializers was raised during the discussion: they don't reliably work unless --whole-archive is provided when linking the rlib. However, --whole-archive is not available on AIX. AIX is currently not a supported platform for Rust; if and when it becomes one, the RFC for support for that platform can specify what to do here. Additionally, this is an issue that would be present regardless of whether the rlib format is specified.
Alternatives
- Keep the format of .rlib files unstable officially, but have external build systems depend on their format anyway. This wouldn't immediately have any ill effects, as external build systems like Buck and Bazel could depend on the contents of .rlib files and things would probably continue to work for some time. It would also have the advantage of avoiding the complexity of extra compiler switches and would allow the compiler to make a clean switch to MIR-only rlibs someday. However, this would cause breakage if Rust ever decides to change the format of rlibs.
- Have external build systems invoke rustc instead of ld to perform the final link. This would allow Rust to make a clean break with the past if it switches to MIR-only rlibs. It would also potentially obviate the need for the build system to be aware of the dependency graph, including standard library crates. However, it would force large C++ projects to switch linkers whenever they link in any Rust at all, which would significantly reduce the willingness of many C++ projects to incrementally adopt Rust by burdening the build system with extra logic. It would also be incompatible with any other language wanting to "take over" linking in this way; only one language can be in charge of the last linking stage, and the advantage of system ld is that it's language-neutral.
- Have external build systems invoke rustc to bundle all Rust dependencies together into one library, which is then linked into the final binary. This is similar to the previous alternative, except it adds an extra step. It would have the advantage of allowing rustc to automatically add extra libraries that need to be added to the final link line, such as allocator shims and native bundled libraries, without having to duplicate that logic into the external build system. However, this has potential performance issues due to needing to process Rust code twice, once with the rustc-invoked linker and once with the native linker. Additionally, this would complicate the common task of introducing Rust components to two unrelated portions of a large binary by requiring the build system to track every binary to determine whether Rust is involved and adding an extra global linking step if so.
- Add a new crate type, staticlib-nobundle or similar, which works like staticlib but without marking symbols from dependent crates global. This would mean that the same crate cannot be officially used from both Rust and C++, despite being essentially identical. In projects that have both Rust and C++ upstream crates that depend on a single Rust downstream crate, this would result in duplicate symbol errors as both the rlib and the staticlib-nobundle would be linked into the final binary.
- Use --emit=obj instead of using staticlibs or rlibs. With this approach, there is no obvious place for the Rust compiler to emit metadata (.rmeta files). Without metadata, the crate would no longer be linkable from Rust, only from C or C++, meaning that a library meant to be used from both C/C++ and Rust would need to be built twice. Additionally, this forces the number of codegen units to 1, causing compilation performance problems. Finally, this would have the same problem as staticlib-nobundle in that if both Rust and C++ link to the same crate, duplicate symbol errors would result, as Rust would be linking to an rlib and C++ would be linking to an object file with the same symbols.
- Use --emit=obj, and add support for multiple codegen units when using --emit=obj to rustc by having the compiler generate the object files separately and then use ld -r to link them together. The -r (relocatable) switch to ld allows multiple .o files to be combined into another .o file that can then be further linked into a binary. Unfortunately, this was tried early on in Rust's development and it was discovered that ld -r is often poorly supported by OS toolchains on account of how seldom the feature is used. Furthermore, this inherits the same problems mentioned before regarding metadata and duplicate symbols.
- Use a flag that doesn't carry a version number, like -C rlib-format=platform. This would be essentially the same as this RFC, but would not leave room for different versions in the future. For example, platforms might introduce new library formats in the future, or we might want to add some extra information to the .rlib format consumable by outside tools. In these cases, the ability to release a v1 version and beyond would be useful.
- Instruct the linker to discard duplicate Rust symbols instead of emitting errors, and have external build systems use --crate-type=staticlib. The COMDAT feature in the ELF format (exposed as linkonce_odr in LLVM) can be used for this. This is what C++ does to avoid duplicate symbol errors when different object files include expansions of the same template. This solution gets the job done in practice, but it means that static libraries duplicate their dependencies, which results in extra needless I/O during the compilation (quadratic blow-up in the worst case). Moreover, it's inelegant.
- Stabilize the contents of .rlib files in perpetuity. This would prevent Rust from adopting MIR-only rlibs in the future, which are a commonly-discussed feature. The goal of this RFC isn't to hinder experimentation with alternative rlib formats.
References
See GitHub issue #73632 and the pre-RFC Discourse thread for the discussion that led up to this RFC.
Acknowledgements
Thanks to Jeremy Fitzhardinge, Matt Hammerly, Dana Jansens, Augie Fackler, Marcel Hlopko, and bjorn3 for feedback on this RFC, and everyone who took part in GitHub issue #73632 and the Discourse thread.