Pre-RFC: Stabilize a version of the `rlib` format

Just like how it reads .rmeta files: read out the .rmeta section of the object file.
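
As a rough illustration of what "read out the .rmeta section" could involve, here is a std-only sketch that locates a named section in a 64-bit little-endian ELF object file. It assumes the metadata lives in a section literally named `.rmeta`, as suggested above. The helper names `bytes` and `find_section` are hypothetical, and real tooling would more likely use an existing object-file parsing crate and also handle Mach-O and COFF.

```rust
/// Read `N` bytes at `off`, or `None` if that would run past the end of `b`.
fn bytes<const N: usize>(b: &[u8], off: usize) -> Option<[u8; N]> {
    b.get(off..off.checked_add(N)?)?.try_into().ok()
}

/// Return the contents of the section named `wanted` in a 64-bit, little-endian
/// ELF object file. Sketch only: no Mach-O/COFF, big-endian, 32-bit, or
/// extended (SHN_XINDEX) section numbering support.
fn find_section<'a>(elf: &'a [u8], wanted: &str) -> Option<&'a [u8]> {
    // e_ident: magic bytes, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if elf.get(0..4)? != b"\x7fELF" || *elf.get(4)? != 2 || *elf.get(5)? != 1 {
        return None;
    }
    // Fixed ELF64 header offsets: e_shoff, e_shentsize, e_shnum, e_shstrndx.
    let shoff = usize::try_from(u64::from_le_bytes(bytes(elf, 0x28)?)).ok()?;
    let shentsize = u16::from_le_bytes(bytes(elf, 0x3A)?) as usize;
    let shnum = u16::from_le_bytes(bytes(elf, 0x3C)?) as usize;
    let shstrndx = u16::from_le_bytes(bytes(elf, 0x3E)?) as usize;

    // (sh_name, sh_offset, sh_size) of section header `i`.
    let header = |i: usize| -> Option<(usize, usize, usize)> {
        let base = shoff.checked_add(i.checked_mul(shentsize)?)?;
        let name = u32::from_le_bytes(bytes(elf, base)?) as usize;
        let off = u64::from_le_bytes(bytes(elf, base.checked_add(24)?)?);
        let size = u64::from_le_bytes(bytes(elf, base.checked_add(32)?)?);
        Some((name, usize::try_from(off).ok()?, usize::try_from(size).ok()?))
    };

    // Section names are NUL-terminated strings in the section-name string table.
    let (_, str_off, str_size) = header(shstrndx)?;
    let strtab = elf.get(str_off..str_off.checked_add(str_size)?)?;

    for i in 0..shnum {
        let (name_off, off, size) = header(i)?;
        let rest = strtab.get(name_off..)?;
        let name = &rest[..rest.iter().position(|&c| c == 0)?];
        if name == wanted.as_bytes() {
            return elf.get(off..off.checked_add(size)?);
        }
    }
    None
}
```

The point is that this is no harder than reading a `.rmeta` archive member. The metadata is simply addressed by section name instead of by archive member name.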

I suppose the alternative on these platforms would be to require thin local LTO. I think the only cases where we don't run thin local LTO are a single CGU, -Copt-level=0, and when cross-crate LTO is enabled. For the latter cases we can simply run thin local LTO at opt level 0, so everything is still merged into a single object file.

I argued against archives because a staticlib really isn't an ideal format for this purpose. The fact that rlibs are archives caused #47384. I used the synthetic symbols.o to fix that issue for binary/dylib artifacts, but the issue is still present for archives.

A Rust crate is supposed to be a single compilation unit, and that's not really how archives work. It must be noted that simply adding --whole-archive is not equivalent to generating a single object file, because the former will actually force all static libraries that the rlib links to to be selected as well (an rlib unpacks all static libraries it links to and packs them into the rlib).

And for archives, the symbol export level issue remains unaddressed: all Rust symbols that are referenced from other CGUs are exported, even though they really shouldn't be.

Thin local LTO still produces multiple object files. The whole difference between fat and thin LTO is that fat LTO merges all modules into a single one, while thin LTO merely extracts a summary from external modules and uses it to optimize the local module in pretty much the same way it otherwise would.

This is no longer the case I believe. AFAIK it directly bundles the staticlib as a single archive member rather than unpacking the individual object files.

I confess I have not caught up on this thread, but I wanted to add one constraint:

I am in favor of allowing people to use their own linker, but I think it should be "opt-in".

In other words, rather than saying that rlibs are stable (which I see as an internal implementation detail of the compiler), I'd rather say that we are stabilizing some new target type, akin to static libs or what have you.

The idea is that I do not want to commit that the internal byproducts of running cargo build can be consumed by any tool other than rustc; I would want to keep the option of having special features that require linking support (but which therefore could not be used in conjunction with such targets).


(This is a key part of the RFC above: it lets you opt in by asking for a specific rlib target type, though it works via a separate flag rather than crate-type, I think. I don't think that violates the spirit of your constraint.)

Manish covered it above—this is opt-in behind an -C rlib-version switch.


At the risk of coming across like bikeshedding: I think there's a meaningful difference between "let's stabilize a version of rlib" and "let's define a format for third-party linkers to use".


Keep in mind that one important design goal is to make the format that third-party linkers use also the format that rustc can link against. Otherwise we run into the problem that we'd have to build crates that are used by both Rust and C++ twice. (This is all covered in the alternatives section.)

So, in a sense, defining a format for third-party linkers to use isn't really the problem we're trying to solve here: it really is about stabilizing a type of rlib.


Do you have any references for issues with ld -r on iOS? Hmm... I see that lld does not support it, but Apple's default linker does, and if there are bugs with that support, maybe they can just be fixed.

Even if we could fix the ld -r issues, we would still have the problem of two different formats for "Rust linking to Rust" and "C++ linking to Rust", which would require either building crates twice, or having rustc support two different library formats for no particularly compelling reason that I can see.

It seems like you're using "rlib" as a synonym for "format rustc knows how to link".

I wasn't suggesting using a format that rustc can't handle. Rather, I was suggesting using a format that rustc does know how to handle, but building it up from "what's the minimum we need to support both Rust and third-party linkers" rather than trying to stabilize most of rlib.

Last time I tried it (for inline asm support in cg_clif) ld -r on macOS simply ate at least some of the symbols exported by the individual object files.

I wasn't aware there was any other format that properly supports metadata; as far as I can tell, rlib is that format. Also the RFC intentionally only stabilizes enough of rlib to allow Rust and third-party linkers to work together, leaving much unspecified. It's not a goal to hinder evolution of the rlib format.

It sounds like we're basically already on the same page here.


😀 I don't disagree that we're kinda saying similar things in different ways here. Like I said, I don't mean to bikeshed, but I do think this is as much about what semantic/directional information it conveys to call this a stabilized version of rlib versus a separate format. The former implies some degree of commitment to support everything rlib does, and to stabilize a v2, v3, etc. A separate format, on the other hand (even if it shares much of rlib), feels more like saying "here's something that may support only a subset of what Rust supports, and might support more things in the future, but there might be features incompatible with it in the future".


I don't see an issue teaching rustc to know how to embed and extract metadata from object files. Why does it have to be an archive file?

Yeah, that's more or less how I think of it, except I don't think the different library types are fundamentally different crate types - they're all just libraries. So I'd like to see something like:

--crate-type lib --emit staticlib,metadata,deps,...

instead of --crate-type staticlib for example. The library type doesn't really directly affect codegen or other backend aspects, so in principle the object code generation could be identical - it's just how the results are packaged. (With -Crelocation-model controlling PIC etc.)

(Though at least in the ELF world, .so files can also be pretty close to binaries in functionality and role they play in the build graph, so it's not completely straightforward.)

Ideally I'd like to see all the other emitted files consolidated under --emit as well, like save-analysis (even if it remains unstable).
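
To make the "one library, several packagings" idea concrete, here is a minimal sketch of how a driver might model it. Everything here is hypothetical: today's rustc `--emit` accepts kinds such as `obj`, `metadata`, `link`, and `dep-info`, not `rlib`/`staticlib`/`dylib`/`deps`, and the `Emit` enum, `parse_emit`, and `plan` are illustrative names. It also assumes, as the proposal does, that object code generation can be shared across all packagings.

```rust
use std::collections::BTreeSet;

/// Hypothetical packaging outputs for a `--crate-type lib --emit ...` model.
/// None of these are real `--emit` values in today's rustc.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Emit {
    Rlib,
    Staticlib,
    Dylib,
    Metadata,
    Deps,
}

/// Parse a comma-separated list such as "staticlib,metadata,deps".
fn parse_emit(arg: &str) -> Result<BTreeSet<Emit>, String> {
    arg.split(',')
        .map(|s| match s.trim() {
            "rlib" => Ok(Emit::Rlib),
            "staticlib" => Ok(Emit::Staticlib),
            "dylib" => Ok(Emit::Dylib),
            "metadata" => Ok(Emit::Metadata),
            "deps" => Ok(Emit::Deps),
            other => Err(format!("unknown emit kind `{other}`")),
        })
        .collect()
}

/// Codegen runs at most once; each requested kind only repackages its results.
fn plan(emits: &BTreeSet<Emit>) -> Vec<String> {
    let needs_objects = emits
        .iter()
        .any(|e| matches!(e, Emit::Rlib | Emit::Staticlib | Emit::Dylib));
    let mut steps = Vec::new();
    if needs_objects {
        steps.push("codegen object files (once)".to_string());
    }
    for e in emits {
        let step = match e {
            Emit::Rlib => "package objects + metadata as an rlib",
            Emit::Staticlib => "bundle objects into a static archive",
            Emit::Dylib => "link objects into a shared library",
            Emit::Metadata => "write crate metadata (.rmeta)",
            Emit::Deps => "write dependency info",
        };
        steps.push(step.to_string());
    }
    steps
}
```

As noted in the reply below, crate type currently does influence codegen (symbol visibility, mono item roots), so this single-codegen plan is the aspiration, not how rustc behaves today.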

It technically does, as it affects, for example, symbol visibility and the set of root mono items in the monomorphization collector. IMHO this is a bug, especially because the changes affect all emitted artifacts even if only one crate type needs them, but it's not one that I think is avoidable without duplicating codegen between the crate types.

Yeah, it would have been nicer to call what I called crate type groups as crate types.

Yes - the context here is that we're integrating Rust into other existing build systems, so the assumption is that the build system is directly invoking rustc with appropriate flags (assuming they exist).

The goal here is that if a given crate appears any number of times in the overall dependency graph, it should appear exactly once on the linker command line so that there's no possibility of duplicate symbols or other link-time conflicts. (Not to mention the compile-time cost of compiling the same code multiple times.)

The problem with staticlib or any other new --crate-type option is that it introduces the possibility that the crate could be built and linked multiple times depending on exactly how it relates to the other objects in the dependency graph. We don't want distinct crate foos for "foo as used by C++ module X" vs "foo as used by Rust crate Y".
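
From the build system's side, the "exactly once on the linker command line" goal can be sketched as a depth-first walk over the dependency graph. This is a minimal sketch under stated assumptions: the graph is acyclic, nodes are hypothetical library names (C++ and Rust alike), and the linker is a traditional single-pass Unix linker that wants each library before the libraries it depends on.

```rust
use std::collections::{BTreeMap, HashSet};

/// Compute a link-line order from `roots`: every library appears exactly once,
/// and each one precedes all of its dependencies. Assumes an acyclic graph.
fn link_order<'a>(graph: &BTreeMap<&'a str, Vec<&'a str>>, roots: &[&'a str]) -> Vec<&'a str> {
    // Post-order DFS: a node is emitted only after all of its dependencies.
    fn visit<'a>(
        n: &'a str,
        g: &BTreeMap<&'a str, Vec<&'a str>>,
        seen: &mut HashSet<&'a str>,
        post: &mut Vec<&'a str>,
    ) {
        if !seen.insert(n) {
            return; // already placed: never emit a library twice
        }
        for &d in g.get(n).map(|v| v.as_slice()).unwrap_or(&[]) {
            visit(d, g, seen, post);
        }
        post.push(n);
    }

    let mut seen = HashSet::new();
    let mut post = Vec::new();
    for &r in roots {
        visit(r, graph, &mut seen, &mut post);
    }
    // Reverse post-order puts dependents before their dependencies.
    post.reverse();
    post
}
```

For example, if a binary `app` uses a C++ module `cxx_x` and a Rust crate `rust_y`, and both use the Rust crate `foo`, the result is `app, rust_y, cxx_x, foo`: a single `foo` serves both users, which only works if the C++ and Rust consumers accept the same artifact.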


I would turn the question on its head: why would we need two linkable-from-Rust formats (rlib and .o), when one will do?

What do people think about the idea of, at least for now, having rustc in charge of bundling libstd (and other foundational crates like libcore and liballoc) into a single .a (which I'll call libstdrust.a—we can bikeshed the name)? rustc can provide the link flags for libstd based on the panic mode, allocator, and so forth. This should be OK for us, though not ideal. It would bypass all of the problems of allocator shims, raw-dylib, bundled static libraries, #[global_allocator], and the alloc error handler. It would not solve the problem of Linux kernel modules, though presumably the Linux kernel doesn't use libstd anyway so the problem is easier there.

In this model, the stable rlib format would be for "regular" non-std crates. To build a binary with Rust you need to have the libstdrust.a present on the link line, but that should be all you need to do. There may be different flavors of libstdrust.a depending on whether you want panic=unwind, panic=abort, different allocators, etc. That will result in some amount of on-disk duplication between the different flavors of libstdrust.a, but it shouldn't have any impact on the final compilation artifacts because all Rust in the process has to be built using a single set of these attributes anyway.

Note that you can generate libstdrust.a today by simply compiling a 0-byte .rs file with --crate-type=staticlib. I've tested that I can build working Rust binaries with that and regular old gcc/clang as the linker.
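
The libstdrust.a recipe above is easy to drive from a build system. Here is a minimal sketch, assuming the build system shells out to rustc: it only constructs the `std::process::Command` for a given panic strategy and does not run it. The flags used (`--crate-type=staticlib`, `-C panic=...`, `--print native-static-libs`, `-o`) are real rustc options; the `Panic` enum, the function name, and the per-flavor output file name are hypothetical, and allocator selection is left out.

```rust
use std::process::Command;

/// Panic strategy flavor of the std bundle (hypothetical type).
#[derive(Clone, Copy)]
enum Panic {
    Unwind,
    Abort,
}

/// Build (but do not run) the rustc invocation that turns an empty .rs file
/// into a static library containing just std and its dependencies.
fn libstdrust_command(empty_rs: &str, panic: Panic, out: &str) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.arg("--crate-type=staticlib")
        .arg("-C")
        .arg(match panic {
            Panic::Unwind => "panic=unwind",
            Panic::Abort => "panic=abort",
        })
        // Ask rustc to report the native libraries the final link must add.
        .arg("--print")
        .arg("native-static-libs")
        .arg(empty_rs)
        .arg("-o")
        .arg(out);
    cmd
}
```

`--print native-static-libs` is one existing way to get the extra system link flags mentioned above, so the build system can append them when linking with gcc/clang directly.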


Indeed. In fact, this strategy is already in use for the rarely used dylib variant of rlib (crate-type dylib).

Why does it have to be an archive file?

Archive files exist for a reason: merging everything into a single object file requires extra effort due to the partial linking step needed. (It also doesn't really solve the issue of preventing rustc from calling the linker itself.) Archives also offer a convenient way to provide code that is linked only when it is actually needed.
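
The "link only when actually needed" behavior, and how --whole-archive differs from it, can be modeled in a few lines. This is a simplified sketch, not any particular linker's algorithm: it ignores weak and common symbols, the archive symbol index, and ordering across multiple archives. `Member` and `select_members` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// A simplified archive member: the symbols it defines and the ones it references.
struct Member {
    name: &'static str,
    defines: Vec<&'static str>,
    references: Vec<&'static str>,
}

/// Members a classic linker would extract from one archive: a member is pulled in
/// only if it defines a still-undefined symbol, and its own references may pull in
/// further members, so the archive is rescanned until nothing changes.
/// With `whole_archive`, every member is included unconditionally.
fn select_members(archive: &[Member], initially_undefined: &[&'static str], whole_archive: bool) -> Vec<&'static str> {
    let mut chosen = vec![false; archive.len()];
    let mut defined: BTreeSet<&str> = BTreeSet::new();
    let mut undefined: BTreeSet<&str> = initially_undefined.iter().copied().collect();
    loop {
        let mut progress = false;
        for (i, m) in archive.iter().enumerate() {
            if chosen[i] {
                continue;
            }
            if whole_archive || m.defines.iter().any(|s| undefined.contains(s)) {
                chosen[i] = true;
                progress = true;
                defined.extend(m.defines.iter().copied());
                undefined.extend(m.references.iter().copied().filter(|s| !defined.contains(s)));
                undefined.retain(|s| !defined.contains(s));
            }
        }
        if !progress {
            break;
        }
    }
    archive.iter().zip(chosen).filter(|(_, c)| *c).map(|(m, _)| m.name).collect()
}
```

With members `a.o` (defines `foo`, references `bar`), `b.o` (defines `bar`), and `unused.o` (defines `baz`), an undefined `foo` pulls in only `a.o` and `b.o`, while whole-archive mode pulls in all three. A single merged object file has no such granularity.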

You can argue that crates are compilation units and that in C/C++ one compilation unit produces one .o file, but then you ignore the fact that crates are commonly much larger than your typical C/C++ compilation unit.
