Hi everyone! I've finally finished a pre-RFC for a solution to the problem raised in issue #73632. Thanks to everyone listed in the acknowledgements for very helpful feedback. Feel free to comment with your thoughts on this proposal!
Edit: I've posted version 2, incorporating all your helpful feedback!
Patrick Walton — email@example.com
Pre-RFC version 2
Large projects written in multiple languages need to be able to link together multiple Rust crates that form complex dependency trees, each compiled with separate invocations of the Rust compiler. The current staticlib format exports symbols from dependencies of the crate being compiled, which can cause multiple-definition errors at link time. This RFC specifies an opt-in stable rlib format that external build systems can use to produce a library including only symbols from the crate being compiled, and no others, avoiding possible link errors.
Frequently, Rust code is just one part of a large binary or dynamic library, perhaps built with a language-neutral build system other than Cargo, such as Bazel or Buck. In these projects, there may be arbitrary combinations of Rust and C++ code such that the same crate arises as a dependency at multiple points in the graph. The toolchains and workflows of these projects, and the investment in them, frequently predate the introduction of Rust by years. Thus it is desirable to preserve a standard linking setup, in which the build system directly invokes the system linker (e.g. ld) to build a binary containing Rust code alongside code written in other languages.
Right now, the documented way to achieve this is to compile the crate with the --crate-type=staticlib switch (or crate-type = ["staticlib"] in Cargo.toml). This works well for small projects. However, it has the fundamental problem that dependencies of the Rust crate being compiled are included in the resulting native library. This causes problems with diamond dependencies. Suppose that we have the following dependency hierarchy specified in the native build system: Rust crates B and C, both compiled with staticlib, depend on the Rust crate A, while the C++ target D depends on B and C. Because of the semantics of staticlib, the contents of A will be duplicated into B and C. This can cause D to fail to link, because the linker can see definitions from A twice and exit with a "multiple definition" error. (Note that multiple-definition errors are not guaranteed in this scenario, because linkers are "lazy" and will only bring in symbols as requested. Whether the link succeeds depends on the particular symbols used by these four targets, as well as on the number and makeup of each crate's codegen units.)
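The failure mode can be reproduced with a deliberately simplified model. This sketch assumes an eager linker that considers every global definition in every input. Real archive linkers are lazy, as noted above, so the model overstates how often the error fires. It also uses made-up symbol names such as a::helper in place of real mangled names. It counts how many link inputs define each global symbol under staticlib-style and rlib-style packaging.

```rust
use std::collections::HashMap;

/// A simplified link input: a library file and the global symbols it
/// defines. Symbol names are illustrative placeholders, not real mangled names.
struct LinkInput {
    name: &'static str,
    global_defs: Vec<&'static str>,
}

/// Returns every symbol that more than one input defines, paired with the
/// inputs that define it, sorted so the output is deterministic.
fn duplicate_definitions(inputs: &[LinkInput]) -> Vec<(String, Vec<&'static str>)> {
    let mut owners: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for input in inputs {
        for &sym in &input.global_defs {
            owners.entry(sym).or_default().push(input.name);
        }
    }
    let mut dups: Vec<(String, Vec<&'static str>)> = owners
        .into_iter()
        .filter(|(_, libs)| libs.len() > 1)
        .map(|(sym, libs)| (sym.to_string(), libs))
        .collect();
    dups.sort();
    dups
}

fn main() {
    // staticlib semantics: B and C each carry their own copy of A's code.
    let staticlibs = [
        LinkInput { name: "libb.a", global_defs: vec!["a::helper", "b::run"] },
        LinkInput { name: "libc.a", global_defs: vec!["a::helper", "c::run"] },
    ];
    // rlib semantics: each archive defines only its own crate's symbols, and
    // the build system passes A's archive exactly once.
    let rlibs = [
        LinkInput { name: "liba.rlib", global_defs: vec!["a::helper"] },
        LinkInput { name: "libb.rlib", global_defs: vec!["b::run"] },
        LinkInput { name: "libc.rlib", global_defs: vec!["c::run"] },
    ];
    println!("staticlib: {:?}", duplicate_definitions(&staticlibs));
    println!("rlib: {:?}", duplicate_definitions(&rlibs));
}
```

In this model, the staticlib packaging reports a::helper as defined by both libb.a and libc.a, and the rlib packaging reports nothing.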
The simplest way to solve this problem is to provide a supported way for the Rust compiler to produce artifacts that export only the symbols from the crate being compiled. That way, the build system, which has complete knowledge of the dependency graph, can produce a final link line that guarantees each crate is included only once in the resulting binary. In fact, Rust already has a mechanism that is nearly perfect for this (and which Cargo uses to solve this exact problem): the rlib format, which does not include symbols from dependent crates. However, the contents of rlibs are unstable, so external build systems can't officially use them without depending on implementation details of the compiler.
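A build system that knows the crate graph can compute such a link line itself. The sketch below makes several assumptions:

- Graph, link_order, and the lib<name>.rlib naming are hypothetical names used only for illustration.
- The graph is acyclic, as Rust crate graphs are.
- Archives are ordered dependents-first, which is what traditional single-pass Unix linkers expect.

Each crate is emitted exactly once, so the diamond from the earlier example yields a single copy of A.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical dependency graph: crate name -> its direct dependencies.
type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Returns every crate reachable from `roots`, each exactly once, ordered so
/// that a crate always comes before the crates it depends on.
/// Assumes the graph is acyclic.
fn link_order(graph: &Graph, roots: &[&'static str]) -> Vec<&'static str> {
    fn visit(
        krate: &'static str,
        graph: &Graph,
        seen: &mut HashSet<&'static str>,
        postorder: &mut Vec<&'static str>,
    ) {
        // A crate reached a second time (e.g. A via both B and C) is skipped;
        // this is what keeps diamond dependencies from being duplicated.
        if !seen.insert(krate) {
            return;
        }
        for &dep in graph.get(krate).map(Vec::as_slice).unwrap_or(&[]) {
            visit(dep, graph, seen, postorder);
        }
        // All of this crate's dependencies have already been recorded.
        postorder.push(krate);
    }

    let mut seen = HashSet::new();
    let mut postorder = Vec::new();
    for &root in roots {
        visit(root, graph, &mut seen, &mut postorder);
    }
    // Reverse postorder: dependents first, dependencies last.
    postorder.reverse();
    postorder
}

fn main() {
    // The diamond: B and C depend on A; the C++ target D links B and C.
    let mut graph = Graph::new();
    graph.insert("a", vec![]);
    graph.insert("b", vec!["a"]);
    graph.insert("c", vec!["a"]);

    let archives: Vec<String> = link_order(&graph, &["b", "c"])
        .iter()
        .map(|krate| format!("lib{krate}.rlib"))
        .collect();
    // prints: libc.rlib libb.rlib liba.rlib
    println!("{}", archives.join(" "));
}
```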
This RFC proposes an opt-in mechanism that external build systems can use to produce rlib files with a stable format. It's intentionally minimal and avoids stabilizing any more than is absolutely necessary for external build systems to work properly.
A new compilation switch, -C rlib-version, is added to the compiler to control the contents of .rlib archives. It takes one of two values, with more possible in future versions of Rust:
- -C rlib-version=unstable: The default value. This option indicates that the contents of .rlib archives are unspecified. External tools should not rely on .rlib files conforming to any particular format.
- -C rlib-version=v0: This value indicates that .rlib files conform to the version 0 format defined here.
A version 0 rlib is an archive file in the native format of the target, with the usual extension (.a) replaced by .rlib. The native format is the usual file format for statically-linked libraries on the target, which for all targets is some variation of the common ar archive format. (The format of WebAssembly rlibs is unspecified in this RFC.)
Inside a version 0 rlib file, any number of object files may be present that provide code and data for symbols defined by the Rust crate being compiled. Other files, such as .rmeta files, may also be present. This RFC makes no guarantees whatsoever about what these files may or may not contain; in particular, it doesn't stabilize any kind of metadata format. External tools such as linkers should ignore any non-object files, as their contents are unstable.
There must be a file inside the archive whose name begins with the string _rlib_v00. Its contents are typically empty. The name of this file allows tools to determine the version of the rlib.
The object files inside a version 0 rlib must collectively contain global definitions for all the non-generic functions and statics defined by the crate being compiled. Global definitions must not be provided for any upstream dependencies of the crate, to avoid symbol collisions when linking. Code for functions from upstream dependencies may be present, but such symbols must be marked local rather than global. Symbol names should be appropriately mangled; in the case of v0 symbol mangling, they should follow Rust RFC 2603.
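The rule can be phrased as a check over an archive's symbol table. This is a sketch under strong assumptions. The Symbol struct and the function name are hypothetical, and the sketch presumes that the owning crate of each symbol is already known. A real tool would need to read the object files' symbol tables and demangle the names (for example per RFC 2603) to recover that information.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Binding {
    Global,
    Local,
}

/// A pre-digested symbol-table entry. In a real tool, `binding` and
/// `defined` would come from the object file's symbol table, and `owner`
/// would have to be recovered by demangling `name`; here all three are
/// simply given (an assumption of this sketch).
struct Symbol {
    name: &'static str,
    binding: Binding,
    defined: bool,
    owner: &'static str,
}

/// Returns the names of symbols that violate the version 0 rule: a global
/// definition of something owned by a crate other than the target crate.
/// Undefined global references to upstream crates and local copies of
/// upstream code are both allowed.
fn upstream_global_definitions(target: &str, symbols: &[Symbol]) -> Vec<&'static str> {
    symbols
        .iter()
        .filter(|s| s.defined && s.binding == Binding::Global && s.owner != target)
        .map(|s| s.name)
        .collect()
}

fn main() {
    let symbols = [
        Symbol { name: "b::run", binding: Binding::Global, defined: true, owner: "b" },
        // A reference to an upstream function, resolved at link time.
        Symbol { name: "a::helper", binding: Binding::Global, defined: false, owner: "a" },
        // A local (non-exported) copy of upstream code, which the rule allows.
        Symbol { name: "a::small", binding: Binding::Local, defined: true, owner: "a" },
        // Not allowed: a global definition of an upstream symbol.
        Symbol { name: "a::leaked", binding: Binding::Global, defined: true, owner: "a" },
    ];
    println!("{:?}", upstream_global_definitions("b", &symbols));
}
```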
Version 0 rlib files often contain undefined symbols whose definitions live in other rlib files (i.e. crate dependencies). This RFC intentionally doesn't provide a way for an external tool to locate those dependencies; that's assumed to be the job of the build system.
These requirements are designed to allow linkers not driven by rustc to link executables containing code produced by the Rust compiler, under a variety of build systems, in a way that doesn't result in symbol conflicts when diamond dependencies are involved.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. They are not capitalized, for clarity.
When the compiler is instructed to produce rlib output, the contents of the resulting artifact depend on the rlib version in use. The rlib version is specified by a compiler switch with the syntax -C rlib-version=VERSION, with VERSION replaced by one of the following:
- unstable: When the rlib version is unstable, the contents of the rlib file are completely unspecified by this RFC. In particular, the resulting rlib files may or may not actually be in v0 format. External tools should not assume that rlibs with version unstable conform to any specific format.
Note (non-normative): Most likely, unstable will result in a version 0 rlib being produced initially. The primary reason why unstable is left unspecified is so as not to preclude the possibility of MIR-only rlibs in the future.
- v0: When the rlib version is v0, the contents of the rlib must match the definition supplied in the following section.
Other valid values of VERSION may exist. Their semantics are unspecified by this RFC.
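A tool that wraps the compiler might model the switch proposed here as follows. The names RlibVersion, parse_rlib_version and has_stable_format are hypothetical. The sketch only encodes the rules above:

- unstable is the default.
- Only v0 promises a format.
- Any other value is reported as unspecified rather than guessed at.

```rust
/// The values of the proposed `-C rlib-version` switch defined by this RFC.
#[derive(Debug, PartialEq)]
enum RlibVersion {
    Unstable,
    V0,
}

/// Interprets the switch's value; `None` means the switch was not passed.
/// Other values may exist in some compiler, but their meaning is outside
/// this RFC, so a tool should not guess at them.
fn parse_rlib_version(value: Option<&str>) -> Result<RlibVersion, String> {
    match value {
        None | Some("unstable") => Ok(RlibVersion::Unstable), // the default
        Some("v0") => Ok(RlibVersion::V0),
        Some(other) => Err(format!("rlib version `{other}` is not specified by this RFC")),
    }
}

/// Only a version 0 rlib has a format that external tools may rely on.
fn has_stable_format(version: &RlibVersion) -> bool {
    *version == RlibVersion::V0
}

fn main() {
    for value in [None, Some("unstable"), Some("v0"), Some("v1")] {
        match parse_rlib_version(value) {
            Ok(v) => println!("{value:?}: {v:?}, stable format: {}", has_stable_format(&v)),
            Err(e) => println!("{value:?}: {e}"),
        }
    }
}
```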
A version 0 rlib must be an archive file in the native format of the target. All supported targets use some variant of the common ar archive format. In particular, all supported targets begin their archive format with the string !<arch> followed by a newline character, i.e. the bytes 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A. The precise on-disk format of the archive file is unspecified by this RFC, but it must contain linkable object files as well as a symbol table.
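The fixed header makes a cheap sanity check possible before a tool parses anything else. Here is a minimal sketch; the function names are hypothetical. It checks only the shared eight-byte header and says nothing about which ar variant follows. GNU thin archives, which begin with !<thin> instead, are rejected.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The global header that begins every common `ar` archive: "!<arch>\n".
const AR_MAGIC: [u8; 8] = [0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A];

/// True if `bytes` starts with the `ar` global header.
fn has_ar_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(&AR_MAGIC)
}

/// Reads just the first eight bytes of a file and checks them.
/// Files shorter than eight bytes are reported as "not an archive".
fn file_has_ar_magic(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 8];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(has_ar_magic(&header)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Usage: pass one or more archive paths on the command line.
    for arg in std::env::args().skip(1) {
        println!("{arg}: {}", file_has_ar_magic(Path::new(&arg))?);
    }
    Ok(())
}
```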
For targets that do not use a variant of the common ar archive format, as well as for WebAssembly, this RFC does not define the format of a version 0 rlib. Such platforms may or may not support version 0 rlibs at all.
In this section, we make reference to concepts of the BFD library. This provides a convenient way to abstract over the concepts that correspond to one another in different object formats.
Note (non-normative): BSD, System V, and Windows use incompatible mechanisms for specifying symbol tables inside the ar format, so we must use an abstraction.
The target crate is the crate that the current invocation of the compiler is compiling and producing an rlib artifact for.
A global symbol is a symbol with the BFD BSF_GLOBAL flag set. An rlib defines whatever global symbols are required to link, subject to the three conditions below.
A local symbol is a symbol with the BFD BSF_LOCAL flag set. An rlib may contain any number of local symbols. Their names and contents are unspecified by this RFC.
This RFC does not define the contents of the set of global symbols exported by an rlib archive. Instead, it requires that some number of global symbols be exported such that the following three conditions are fulfilled:
- If some Rust crate B depends on Rust crate A, and both A and B are in .rlib format, then the .rlib files corresponding to A and B shall be successfully linkable using the system linker, subject to the requirements specified in the "additional linking requirements" section.
- Any set of crates in .rlib format compiled by the same Rust compiler (including compiler version) must be linkable together as long as the following conditions are fulfilled:
a. For each crate, all dependencies of that crate must be in the set.
b. All conditions specified in the "additional linking requirements" section are met.
c. The set contains each .rlib file at most once.
There are two exceptions to this rule:
(i) Multiple crates that define the same language item may not be linkable together.
(ii) Multiple crates that define identically-named items marked with #[no_mangle] may not be linkable together.
- For each item with a #[no_mangle] annotation, a global symbol must be present in the archive with a name matching that of an identically-named C symbol definition on the target.
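To make the third condition concrete: the required global symbol is whatever name a C compiler would emit for the same identifier, and that name differs by platform. The sketch below uses hypothetical names (ObjectFormat, c_symbol_name) and covers only a few well-known cases. There is no decoration on ELF or 64-bit x86 Windows, and there is a leading underscore on Mach-O and 32-bit x86 Windows. Real targets have further cases, such as stdcall decoration on 32-bit Windows, so this is not a complete mapping.

```rust
/// A deliberately small model of object formats, for illustration only.
#[derive(Clone, Copy, Debug)]
enum ObjectFormat {
    Elf,
    MachO,
    /// 32-bit x86 Windows, where C (cdecl) names get a leading underscore.
    CoffX86,
    /// 64-bit x86 Windows, where they do not.
    CoffX64,
}

/// The symbol a C compiler would emit for a C definition named `name` on the
/// given format, i.e. the global symbol a `#[no_mangle]` item with that name
/// has to produce.
fn c_symbol_name(format: ObjectFormat, name: &str) -> String {
    match format {
        ObjectFormat::Elf | ObjectFormat::CoffX64 => name.to_string(),
        ObjectFormat::MachO | ObjectFormat::CoffX86 => format!("_{name}"),
    }
}

fn main() {
    let formats = [
        ObjectFormat::Elf,
        ObjectFormat::MachO,
        ObjectFormat::CoffX86,
        ObjectFormat::CoffX64,
    ];
    for format in formats {
        // "my_exported_fn" is a hypothetical #[no_mangle] function name.
        println!("{format:?}: {}", c_symbol_name(format, "my_exported_fn"));
    }
}
```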
Note (non-normative): Some binary formats mark C symbols in some way (e.g. Mach-O represents them with a leading underscore).
Note (non-normative): "Linkable using the system linker" implies that there are neither undefined nor multiply-defined symbols.
Note (non-normative): Rust RFC 2603 specifies a mangling scheme for symbols.
Note (non-normative): Symbols relating to global allocation and panic handling must not be defined in the .rlib unless the crate itself defines those symbols.
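Conditions (a) and (c) of the second requirement are mechanical enough for a build system to check before invoking the linker. Here is a minimal sketch. It assumes hypothetical build-system metadata (RlibInput, check_link_set) that records each crate's direct dependencies. It does not check condition (b), the same-compiler requirement, or exceptions (i) and (ii).

```rust
use std::collections::HashSet;

/// One .rlib proposed for a link, described by build-system metadata
/// (hypothetical: the RFC leaves tracking dependencies to the build system).
struct RlibInput {
    krate: &'static str,
    deps: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
enum LinkSetError {
    /// Violates (c): the same crate's .rlib appears more than once.
    Duplicate(&'static str),
    /// Violates (a): `krate` depends on `missing`, which is not in the set.
    MissingDependency { krate: &'static str, missing: &'static str },
}

/// Checks conditions (a) and (c) only.
fn check_link_set(inputs: &[RlibInput]) -> Result<(), LinkSetError> {
    let mut present = HashSet::new();
    for input in inputs {
        if !present.insert(input.krate) {
            return Err(LinkSetError::Duplicate(input.krate));
        }
    }
    for input in inputs {
        for &dep in &input.deps {
            if !present.contains(dep) {
                return Err(LinkSetError::MissingDependency { krate: input.krate, missing: dep });
            }
        }
    }
    Ok(())
}

fn main() {
    let diamond = [
        RlibInput { krate: "a", deps: vec![] },
        RlibInput { krate: "b", deps: vec!["a"] },
        RlibInput { krate: "c", deps: vec!["a"] },
    ];
    let missing_a = [RlibInput { krate: "b", deps: vec!["a"] }];
    println!("{:?}", check_link_set(&diamond));
    println!("{:?}", check_link_set(&missing_a));
}
```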
The object files containing the symbols that the target crate defines shall be present inside the rlib archive. Any other files necessary for the Rust compiler to link to the target crate and use it as a dependency must also be present. The object files must be in a linkable format that the target supports.
Note (non-normative): Examples of linkable object files on various platforms include but are not limited to ELF, Mach-O, PE/COFF, LLVM bitcode for full LTO, LLVM bitcode for ThinLTO, and LLVM bitcode wrapped in a native object file.
Note (non-normative): The most important non-object file is the .rmeta file, which contains data necessary for downstream Rust invocations to use the crate as a dependency, such as the types and contents of inlined functions. The system linker ignores this information.
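External tools must skip non-object members such as .rmeta files, so they need some way to tell members apart. The RFC does not prescribe a method. One plausible approach, assumed here, is to sniff each member's leading magic bytes. The names are hypothetical. COFF objects have no fixed magic number and are classified as Other by this sketch, so a Windows-capable tool would need an additional check.

```rust
/// Rough classification of an archive member by its leading bytes.
#[derive(Debug, PartialEq)]
enum MemberKind {
    Elf,
    MachO,
    LlvmBitcode,
    Other,
}

fn classify_member(data: &[u8]) -> MemberKind {
    match data {
        [0x7F, b'E', b'L', b'F', ..] => MemberKind::Elf,
        // 32- and 64-bit Mach-O magic (0xFEEDFACE / 0xFEEDFACF) as stored
        // on little-endian targets.
        [0xCE, 0xFA, 0xED, 0xFE, ..] | [0xCF, 0xFA, 0xED, 0xFE, ..] => MemberKind::MachO,
        // Raw LLVM bitcode ("BC" 0xC0DE) and the bitcode wrapper header
        // (0x0B17C0DE, little-endian).
        [b'B', b'C', 0xC0, 0xDE, ..] | [0xDE, 0xC0, 0x17, 0x0B, ..] => MemberKind::LlvmBitcode,
        // Everything else, including .rmeta and the _rlib_v00 marker. COFF
        // objects also land here; a real tool would check the machine field.
        _ => MemberKind::Other,
    }
}

fn main() {
    // Hypothetical member names and truncated contents.
    let members: [(&str, &[u8]); 3] = [
        ("b.o", &[0x7F, b'E', b'L', b'F', 2, 1]),
        ("b.bc", b"BC\xC0\xDE"),
        ("lib.rmeta", b"not an object"),
    ];
    for (name, data) in members {
        let kind = classify_member(data);
        let action = if kind == MemberKind::Other { "skip" } else { "link" };
        println!("{name}: {kind:?} ({action})");
    }
}
```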
Additionally, a file with a name beginning with the string _rlib_v00 must be present inside the rlib archive. The contents of this file are unspecified. It allows build systems and other tools to determine that the rlib is in version 0 format.
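Detecting the marker requires listing the archive's member names. That in turn means handling the different ways ar variants store names longer than the 16-byte header field. The sketch below is an illustration rather than a complete ar reader:

- It handles GNU short and long names and the BSD #1/<len> convention.
- It skips symbol-table members.
- It ignores thin archives.

The ar_member helper exists only to build small test archives, and all function names are hypothetical.

```rust
/// Reads a space-padded ASCII field from a 60-byte member header.
fn header_field(header: &[u8], range: std::ops::Range<usize>) -> Result<&str, String> {
    std::str::from_utf8(&header[range]).map(str::trim_end).map_err(|e| e.to_string())
}

/// Returns member names of an in-memory `ar` archive, skipping symbol tables.
fn ar_member_names(archive: &[u8]) -> Result<Vec<String>, String> {
    const MAGIC: &[u8] = b"!<arch>\n";
    const HEADER_LEN: usize = 60;
    if !archive.starts_with(MAGIC) {
        return Err("missing !<arch> header".to_string());
    }
    let mut pos = MAGIC.len();
    let mut long_names: &[u8] = &[];
    let mut names = Vec::new();
    while pos + HEADER_LEN <= archive.len() {
        let header = &archive[pos..pos + HEADER_LEN];
        // Fixed-width fields: name[0..16], size[48..58], terminator[58..60].
        if &header[58..60] != b"`\n" {
            return Err(format!("bad member header at offset {pos}"));
        }
        let raw_name = header_field(header, 0..16)?;
        let size = header_field(header, 48..58)?.parse::<usize>().map_err(|e| e.to_string())?;
        let start = pos + HEADER_LEN;
        let end = start
            .checked_add(size)
            .filter(|&end| end <= archive.len())
            .ok_or("member extends past the end of the archive")?;
        let data = &archive[start..end];

        let name = if raw_name == "/" || raw_name == "/SYM64/" {
            None // GNU/COFF symbol table
        } else if raw_name == "//" {
            long_names = data; // GNU long-name table
            None
        } else if let Some(len) = raw_name.strip_prefix("#1/") {
            // BSD: the real name is stored at the start of the member data.
            let len = len.parse::<usize>().map_err(|e| e.to_string())?;
            let bytes = data.get(..len).ok_or("BSD name longer than member")?;
            Some(String::from_utf8_lossy(bytes).trim_end_matches('\0').to_string())
        } else if let Some(offset) = raw_name.strip_prefix('/') {
            // GNU: "/<offset>" into the long-name table; entries end in "/\n".
            let offset = offset.parse::<usize>().map_err(|e| e.to_string())?;
            let rest = long_names.get(offset..).ok_or("bad long-name offset")?;
            let len = rest.iter().position(|&b| b == b'\n').unwrap_or(rest.len());
            Some(String::from_utf8_lossy(&rest[..len]).trim_end_matches('/').to_string())
        } else {
            Some(raw_name.trim_end_matches('/').to_string())
        };
        // BSD symbol tables are named "__.SYMDEF" or "__.SYMDEF SORTED".
        if let Some(name) = name.filter(|n| !n.starts_with("__.SYMDEF")) {
            names.push(name);
        }
        pos = end + size % 2; // member data is padded to an even offset
    }
    Ok(names)
}

/// True if the archive has a member whose name begins with `_rlib_v00`.
fn is_v0_rlib(archive: &[u8]) -> bool {
    ar_member_names(archive)
        .map(|names| names.iter().any(|n| n.starts_with("_rlib_v00")))
        .unwrap_or(false)
}

/// Builds one GNU-style member (test helper; `name` must fit in 16 bytes).
fn ar_member(name: &str, data: &[u8]) -> Vec<u8> {
    assert!(name.len() <= 16, "use the long-name table for longer names");
    let header = format!("{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n", name, 0, 0, 0, 644, data.len());
    let mut out = header.into_bytes();
    out.extend_from_slice(data);
    if data.len() % 2 == 1 {
        out.push(b'\n'); // padding byte
    }
    out
}

fn main() {
    let mut archive = b"!<arch>\n".to_vec();
    archive.extend(ar_member("_rlib_v00/", b""));
    archive.extend(ar_member("lib.rmeta/", b"xyz"));
    println!("{:?}", ar_member_names(&archive));
    println!("version 0 rlib: {}", is_v0_rlib(&archive));
}
```

Checking for the prefix rather than an exact name follows the wording above, which only requires the name to begin with _rlib_v00.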
Any other files may also be present inside the rlib archive. Whether these files exist, and what they contain, is unspecified by this RFC.
Definitions of additional std-internal symbols that the compiler generates calls to may be required in order to link a Rust target. The names of such symbols must begin with __ (a double underscore). Other than that restriction, this RFC does not define anything about these symbols. External build systems may be required to include object files that provide definitions for them in the final link.
Crates not shipped with the Rust compiler must not attempt to define their own
Note (non-normative): For the most part, these symbols have to do with allocation. They have names like
When using the raw-dylib feature on Windows, one or more import libraries may need to be supplied to the linker in order to link successfully. The contents of such libraries are unspecified by this RFC.
This scheme doesn't preclude Rust changing the rlib format (for example, by introducing MIR-only rlibs), but if Rust does so, under this RFC the compiler will need to retain the version 0 rlib support described here behind a compiler switch. This may add some maintenance burden.
Right now, there is no officially supported way for the build system to find where standard library crates such as core, or foundational crates such as panic_abort, reside. One might reasonably ask what the point of stabilizing the rlib format is if the precise list of crates needed to link Rust code is still unstable. This RFC acknowledges that the interface to an external build system is incomplete without addressing this point, but stabilization of the rlib format will be a necessary component of any solution. It's better to have a partial solution that makes progress toward the goal of a standard interface to external build systems than no solution at all.
- Note that the std-aware Cargo working group is making progress toward providing a solution that enables crates to specify explicit dependencies on the standard library. The outcome of this work may well be useful for external build systems like Buck and Bazel as well.
This RFC doesn't specify a way to actually build standard library crates in v0 format. That's intentional, in the interests of avoiding overspecification. Presumably it will be provided by some opt-in mechanism in the
rustc now supports the concept of bundled static libraries, which are native libraries placed inside an rlib. The version 0 rlib format doesn't support such libraries; generally, native build systems would prefer to keep native libraries separate, for better interoperability with native code. This can be revisited in future rlib versions if need be.
An issue regarding static initializers was raised during the discussion: they don't reliably work unless --whole-archive is provided when linking the rlib. However, --whole-archive is not available on AIX. AIX is currently not a supported platform for Rust; if and when it becomes one, the RFC adding support for that platform can specify what to do here. Additionally, this issue would be present regardless of whether the rlib format is specified.
Keep the format of .rlib files officially unstable, but have external build systems depend on their format anyway. This wouldn't immediately have any ill effects, as external build systems like Buck and Bazel could depend on the contents of .rlib files and things would probably continue to work for some time. It would also avoid the complexity of extra compiler switches and would allow the compiler to make a clean switch to MIR-only rlibs someday. However, this would cause breakage if Rust ever decides to change the format of .rlib files.
Have external build systems invoke rustc, rather than ld, to perform the final link. This would allow Rust to make a clean break with the past if it switches to MIR-only rlibs. It would also potentially obviate the need for the build system to be aware of the dependency graph, including standard library crates. However, it would force large C++ projects to switch linkers whenever they link in any Rust at all, burdening the build system with extra logic and significantly reducing the willingness of many C++ projects to adopt Rust incrementally. It would also be incompatible with any other language wanting to "take over" linking in this way; only one language can be in charge of the last linking stage, and the advantage of the system ld is that it's language-neutral.
Have external build systems invoke rustc to bundle all Rust dependencies together into one library, which is then linked into the final binary. This is similar to the previous alternative, except that it adds an extra step. It would have the advantage of allowing rustc to automatically add extra libraries that need to be added to the final link line, such as allocator shims and native bundled libraries, without having to duplicate that logic in the external build system. However, this has potential performance issues due to needing to process Rust code twice, once with the rustc-invoked linker and once with the native linker. Additionally, this would complicate the common task of introducing Rust components into two unrelated portions of a large binary, because the build system would have to track every binary to determine whether Rust is involved and add an extra global linking step if so.
Add a new crate type, staticlib-nobundle or similar, which works like staticlib but without marking symbols from dependent crates global. This would mean that the same crate could not officially be used from both Rust and C++, even though the rlib and staticlib-nobundle outputs would be essentially identical. In projects where both Rust and C++ downstream crates depend on a single upstream Rust crate, this would result in duplicate symbol errors, as both the rlib and the staticlib-nobundle would be linked into the final binary.
Use --emit=obj instead of using rlibs. With this approach, there is no obvious place for the Rust compiler to emit metadata (.rmeta files). Without metadata, the crate would no longer be linkable from Rust, only from C or C++, meaning that a library meant to be used from both C/C++ and Rust would need to be built twice. Additionally, this forces the number of codegen units to 1, causing compilation performance problems. Finally, this would have the same problem as staticlib-nobundle: if both Rust and C++ link to the same crate, duplicate symbol errors would result, as Rust would be linking to an rlib and C++ would be linking to an object file with the same symbols.
Use --emit=obj, and add support for multiple codegen units by having the compiler generate the object files separately and then use ld -r to link them together. The -r (relocatable) switch to ld allows multiple .o files to be combined into another .o file that can then be further linked into a binary. Unfortunately, this was tried early on in Rust's development, and it was discovered that ld -r is often poorly supported by OS toolchains on account of how seldom the feature is used. Furthermore, this inherits the same problems mentioned above regarding metadata and duplicate symbols.
Use a flag that doesn't carry a version number, like -C rlib-format=platform. This would be essentially the same as this RFC, but would not leave room for different versions in the future. For example, platforms might introduce new library formats, or we might want to add some extra information to the .rlib format that outside tools can consume. In these cases, the ability to release a v1 version and beyond would be useful.
Instruct the linker to discard duplicate Rust symbols instead of emitting errors, and have external build systems use --crate-type=staticlib. The COMDAT feature in the ELF format (exposed as linkonce_odr in LLVM) can be used for this. This is what C++ does to avoid duplicate symbol errors when different object files include expansions of the same template. This solution gets the job done in practice, but it means that static libraries duplicate their dependencies, which results in extra needless I/O during compilation (quadratic blow-up in the worst case). Moreover, it's inelegant.
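This alternative relies on the linker treating duplicates differently depending on how they were emitted. A toy model of that behaviour shows both halves of the trade-off. The link succeeds, but every duplicate copy was still read and then thrown away. The names are illustrative only. For simplicity, the model assumes a first-copy-wins policy and treats mixing strong and COMDAT definitions as an error; real linkers have their own selection rules.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Linkage {
    /// An ordinary global definition: a second copy is an error.
    Strong,
    /// A COMDAT (LLVM linkonce_odr) definition: copies are assumed to be
    /// identical, so the linker keeps one and discards the rest.
    Comdat,
}

/// Toy symbol resolution over (symbol, input, linkage) triples in link
/// order. Returns which input supplied each symbol plus the number of
/// discarded duplicate copies, or an error for a conflicting definition.
fn resolve(
    defs: &[(&'static str, &'static str, Linkage)],
) -> Result<(HashMap<&'static str, &'static str>, usize), String> {
    let mut chosen: HashMap<&'static str, (&'static str, Linkage)> = HashMap::new();
    let mut discarded = 0;
    for &(sym, input, linkage) in defs {
        match chosen.get(sym).copied() {
            None => {
                chosen.insert(sym, (input, linkage));
            }
            // Both copies are COMDAT: keep the first, drop this one.
            Some((_, Linkage::Comdat)) if linkage == Linkage::Comdat => discarded += 1,
            Some((first, _)) => {
                return Err(format!("duplicate symbol `{sym}` in {first} and {input}"));
            }
        }
    }
    let winners = chosen.into_iter().map(|(sym, (input, _))| (sym, input)).collect();
    Ok((winners, discarded))
}

fn main() {
    // Two staticlibs that each embed A's code, emitted as COMDAT.
    let defs = [
        ("a::helper", "libb.a", Linkage::Comdat),
        ("b::run", "libb.a", Linkage::Strong),
        ("a::helper", "libc.a", Linkage::Comdat),
        ("c::run", "libc.a", Linkage::Strong),
    ];
    match resolve(&defs) {
        Ok((winners, discarded)) => {
            println!("a::helper from {}, {discarded} copy discarded", winners["a::helper"])
        }
        Err(e) => println!("link error: {e}"),
    }
}
```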
Stabilize the contents of .rlib files in perpetuity. This would prevent Rust from adopting MIR-only rlibs, a commonly discussed feature, in the future. The goal of this RFC isn't to hinder experimentation with alternative rlib formats.
Thanks to Jeremy Fitzhardinge, Matt Hammerly, Dana Jansens, Augie Fackler, Marcel Hlopko, and bjorn3 for feedback on this RFC, and everyone who took part in GitHub issue #73632 and the Discourse thread.