librustc_driver.so not reproducible

Hello,

While building the Rust compiler on a Linux system (where host and target are the same) in two different build directories, we saw a reproducibility issue with the librustc_driver.so file, which we were able to fix by using the --remap-path-prefix option.

In our Yocto project we also build the Rust compiler, and there we faced the same reproducibility issue (here the host and target are different, i.e., the Rust compiler is cross-compiled for a different target). The --remap-path-prefix solution does not work here. A section called '.rustc' in librustc_driver.so differs, and that makes the binary comparison between the two build directories fail.

Can I get some info from the community on the questions below?

  1. What is this .rustc section and what does it contain? (I couldn't disassemble this section and couldn't find much info in the docs either.)
  2. Why does this section depend on the build path?
  3. What could be the reasons for --remap-path-prefix not working? (I can see this option is passed during the Yocto build as well for Rust compilation.)

Thanks...


Are the respective host and target at least consistent between the builds you're comparing?

I know that Cargo includes the host in its crate metadata hash, so you will end up with different library and symbol hashes if you build the same target from different hosts.

This sounds like a terrible idea for having reproducible builds. That should really be opt-in, but is there at least a flag to turn it off?

For more context, I have approached the host metadata issue before:

In my case, I ended up working around this in my rpm builds by only building the cross-target parts from x86_64, but the resulting target libraries are still usable from other hosts.

In the general case two different rustc builds are not ABI-compatible with each other, e.g. they could be built with different target flags and thus have different call ABIs or with different struct layout randomization. Two different builds producing identical output is the exception, not the norm.

Yes, those are consistent (the host and target in Yocto are both for the x86_64 architecture, i.e., "x86_64-poky-linux"). Is this the consistency you asked about?

Also, I am not building the target compiler with different hosts, but in different build paths on the same build machine with the same host. The rlibs and .so files are generated with identical hashes between the two build directories.

I can only see a difference in the ".rustc" section of the objects generated in the different Rust build stages.

Below is an instance of the diff (buildX and buildY are the build directories):

stage0-sysroot: libcstr-94890041bc3cec88.so

buildX: 0000000000000000 g O .rustc 0000000000000822 rust_metadata_cstr_39367acaf44167b3

buildY: 0000000000000000 g O .rustc 0000000000000820 rust_metadata_cstr_39367acaf44167b3

stage2: librustc_driver-62e98be45f5bb4e6.so

buildX: 0000000000000000 g O .rustc 00000000000028e1 rust_metadata_rustc_driver_d1b873a1521f7a0e

buildY: 0000000000000000 g O .rustc 00000000000028e6 rust_metadata_rustc_driver_d1b873a1521f7a0e
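To make the size difference in the lines above easier to read, here is a minimal Python sketch that parses such symbol-table lines and reports the difference in bytes. The field order is assumed from the quoted lines only, and the helper names `parse_symbol_line` and `size_delta` are made up for this illustration.

```python
def parse_symbol_line(line):
    """Parse one symbol-table line in the layout quoted above.

    Assumed field order (taken from the quoted lines, not from a spec):
    address, binding flag, type flag, section, size, symbol name.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"unexpected symbol line: {line!r}")
    address, _binding, _kind, section, size, name = fields
    return {
        "address": int(address, 16),
        "section": section,
        "size": int(size, 16),
        "name": name,
    }


def size_delta(line_x, line_y):
    """Return size(line_y) - size(line_x) in bytes for the same symbol."""
    sym_x = parse_symbol_line(line_x)
    sym_y = parse_symbol_line(line_y)
    if sym_x["name"] != sym_y["name"]:
        raise ValueError("lines describe different symbols")
    return sym_y["size"] - sym_x["size"]


# The two librustc_driver lines quoted above, without the buildX/buildY labels.
DRIVER_X = "0000000000000000 g O .rustc 00000000000028e1 rust_metadata_rustc_driver_d1b873a1521f7a0e"
DRIVER_Y = "0000000000000000 g O .rustc 00000000000028e6 rust_metadata_rustc_driver_d1b873a1521f7a0e"

print(size_delta(DRIVER_X, DRIVER_Y))  # 0x28e6 - 0x28e1 = 5
```

The symbol names match in both builds while the metadata sizes differ by only a few bytes, which is why the later posts focus on the contents of the .rustc section rather than on the crate hashes.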

Ok, then it doesn't look like you're facing the issue I knew, but I don't have a guess for what's causing your difference.

Can we see a comparison of the output of readelf -x .rustc [library] for both versions of librustc_driver, please?

I cannot attach the text of the dump, so I have attached an image of the diff instead.

Could you remove the first 16 bytes (a header added by rustc) and then decompress it? It is snappy compressed. That should make it easier to see what the actual change is without getting a cascade of differences. It would also show whether the non-determinism is in the compression rather than in the data being compressed. By the way, starting from rustc 1.73, the crate metadata in dylibs is no longer compressed, to avoid a perf regression in a bugfix.
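As a rough illustration of the suggestion above, the following Python sketch strips a fixed-size header from two raw section dumps and reports the first offset at which they differ. The 16-byte header length comes from the post above and may not hold for other rustc versions; the helper names are hypothetical. The raw bytes would have to be extracted first (for example with GNU objcopy's `--dump-section`), and snappy decompression is deliberately not shown because it needs a third-party package.

```python
HEADER_LEN = 16  # assumption taken from the post above, not from a spec


def strip_header(section_bytes, header_len=HEADER_LEN):
    """Drop the leading header so only the payload is compared."""
    if len(section_bytes) < header_len:
        raise ValueError("section is shorter than the assumed header")
    return section_bytes[header_len:]


def first_difference(a, b):
    """Return the offset of the first differing byte, or None if identical.

    If one input is a strict prefix of the other, the offset is the
    length of the shorter input.
    """
    for offset, (byte_a, byte_b) in enumerate(zip(a, b)):
        if byte_a != byte_b:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def hex_context(data, offset, radius=8):
    """Return a hex dump of the bytes surrounding `offset`."""
    start = max(0, offset - radius)
    return data[start:offset + radius].hex(" ")
```

Running `first_difference` on the stripped (and, for older compilers, decompressed) payloads narrows the comparison to the first real change instead of the cascade of differences a compressed stream would show.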

@bjorn3 I tried a build with Rust 1.73 and still see some compressed data (maybe not compressed with Snappy, as I couldn't see that string now). How can I decompress this section to plain text?

I tried to decompress the previous data (generated with 1.70.0) with snappy, but it gave errors like 'unknown start of token: \' and 'unterminated character literal', even though that was the data generated by the Rust compiler in the .so file.

There shouldn't be any compression. This place looks like it is the list of dependencies. Could you take libdigest-<somehash>.rmeta for both compilations and show the diff? This is a crate which according to this diff you posted has a different crate hash and it shouldn't have any dependencies beyond the standard library. And could you see if libcore-<somehash>.rlib has any differences already?

@bjorn3 I compared the libdigest-<somehash>.rmeta, libdigest-<somehash>.rlib and libcore-<somehash>.rlib files generated during stage0 and stage1. Only the stage0-rustc artifacts differ, and only in the .rmeta file. Since an rmeta file is not an object file, I couldn't dump it. Below is the diff.

As per the Rust docs, rmeta files do not support linking, since they do not contain compiled object files. The docs also say that a dylib is a platform-specific shared library that includes the rustc metadata in a special link section called .rustc in a compressed format.

Do the rmeta changes influence the final binaries, and why does the rmeta content change with the build path?

Is there any tool available to dump the rmeta files?

The .rmeta files are identical to the .rustc section in dylibs and the lib.rmeta files in rlibs.

rustc -Zls /path/to/file (rustc -Zls=full /path/to/file for newer nightly, I extended it in https://github.com/rust-lang/rust/pull/115735) can dump some information from the crate metadata of a .rmeta, .rlib or rust .so. Make sure to use the same rustc as the one that originally compiled it. Most information in the crate metadata can't be dumped using rustc -Zls though and there is no external tool which works with current rustc versions.
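A small Python sketch of how the `rustc -Zls` output of two builds could be compared, assuming both artifacts are listed with the same rustc that compiled them and that this rustc accepts unstable `-Z` options. The function names `list_metadata` and `diff_listings` are invented for this example; only the `rustc -Zls <file>` invocation comes from the post above.

```python
import difflib
import subprocess


def list_metadata(rustc, artifact):
    """Run `rustc -Zls <artifact>` and return its stdout as text."""
    result = subprocess.run(
        [rustc, "-Zls", artifact],
        capture_output=True,
        text=True,
        check=True,  # raise if rustc rejects the flag or the file
    )
    return result.stdout


def diff_listings(text_x, text_y, label_x="buildX", label_y="buildY"):
    """Return a unified diff of two metadata listings as a list of lines."""
    return list(
        difflib.unified_diff(
            text_x.splitlines(),
            text_y.splitlines(),
            fromfile=label_x,
            tofile=label_y,
            lineterm="",
        )
    )
```

A possible use is `diff_listings(list_metadata(rustc, path_x), list_metadata(rustc, path_y))`, where `rustc`, `path_x` and `path_y` are placeholders for the compiler and the two .rmeta, .rlib or .so files being compared. An empty result means the listed information is identical, which does not rule out differences in the parts of the metadata that `-Zls` cannot dump.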

@bjorn3, thanks for the info; this helps me analyze the issue further.

I've checked the diff of the metadata info pulled by the -Zls=y option for the libdigest rmeta file, and the differences are in the libcrypto_common-<somehash>.rmeta, libgeneric_array-<somehash>.rmeta, libtypenum-<somehash>.rmeta and libblock_buffer-<somehash>.rmeta entries. I tried comparing a few other rmeta files as well, and those also differ in the same set of files. Of these, libtypenum-<somehash>.rmeta shows no diff in its dependencies but still shows a different fingerprint/hash.

How should I understand this fingerprint/hash diff, i.e., is it expected to change when the build path of the Rust sources changes? I had a look at cargo/src/cargo/core/compiler/fingerprint/mod.rs, and the comments there show that the fingerprint depends on the path of the source file.

Let me explain my observations in a bit more detail.

I built Rust in the following directories:

/home/workspace/rustc/buildX & buildY (buildX & buildY are my build directories), and the results differ in the librustc_driver-<somehash>.so file. (File path - <>/x86_64-unknown-linux-gnu/stage2/lib/)

I added a --remap-path-prefix option (mapping the build directory paths to a fixed string) in the fn cargo function in src/bootstrap/builder.rs, and this generated an identical librustc_driver-<somehash>.so file in the buildX & buildY directories.
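The idea behind this remapping can be shown with a toy Python model: each build maps its own directory prefix to the same fixed string, so any path embedded in the output no longer depends on where the build ran. This is only a simplified illustration, not rustc's actual implementation. The function `remap_path`, the replacement string `/rustc/build` and the file `src/lib.rs` are hypothetical, and the sketch applies the first matching prefix without modeling rustc's own precedence rules.

```python
def remap_path(path, mappings):
    """Replace the first matching prefix of `path` (toy model only).

    `mappings` is a list of (old_prefix, new_prefix) pairs, loosely
    mirroring the FROM=TO form of --remap-path-prefix.
    """
    for old_prefix, new_prefix in mappings:
        if path.startswith(old_prefix):
            return new_prefix + path[len(old_prefix):]
    return path  # no prefix matched, so the absolute path leaks through


# Each build passes its own directory as the prefix to replace.
path_x = remap_path(
    "/home/workspace/rustc/buildX/src/lib.rs",
    [("/home/workspace/rustc/buildX", "/rustc/build")],
)
path_y = remap_path(
    "/home/workspace/rustc/buildY/src/lib.rs",
    [("/home/workspace/rustc/buildY", "/rustc/build")],
)
print(path_x == path_y)  # both become /rustc/build/src/lib.rs
```

The last line of `remap_path` is the interesting case for reproducibility: any path that is not covered by a mapping stays absolute and differs between build directories.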

I rebuilt the same Rust sources in another directory,

/home/workspace/extended-path/rustc/buildX & buildY, with the '--remap-path-prefix' option. Now, when I compared librustc_driver-<somehash>.so in

/home/workspace/rustc/buildX & /home/workspace/extended-path/rustc/buildX

they differ again in the .rustc section. (I also read the fingerprints of some libs in the two directories, and those are entirely different.)

So it seems to me that Rust generates the fingerprint based on the build path, and it changes when the build path changes.

Is there any option that fixes this issue with the fingerprint/.rustc sections, or is it a bug in Rust?

Please let me know if any additional info is needed or anything is unclear.

With remap-debuginfo enabled in config.toml everything should be reproducible when changing the source dir location. If that isn't the case like here, that is a bug.

By the way, I just noticed that you were manually passing --remap-path-prefix instead of enabling remap-debuginfo.

Could you show the diff of this file?

@bjorn3 Thanks a lot for letting me know about remap-debuginfo. That actually fixed the issue, and the .rustc section is now identical.

Here is the libtypenum-<somehash>.rmeta diff.

However, another section, .dynstr, still differs; it holds the absolute rpath. (This is in our Yocto project, where we pass -Clink-args=-Wl,-rpath,$ORIGIN/../lib, and this rpath is added to the .dynstr section.) Do the remap-debuginfo & --remap-path-prefix options not apply to the entire object file?

Great! I'm honestly surprised it is not the default for the dist profile.

Rust's build system sets rpath by default too (you can use the rpath config option to disable this). We should be using a relative rpath though:

{libdir} here is lib by default and needs to be explicitly changed in config.toml if you want a different value.
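To illustrate why a relative rpath is independent of the build location, here is a minimal Python sketch that derives an `$ORIGIN`-based rpath from the directory of a binary and the directory of its libraries. It is not the bootstrap code; the function name `origin_relative_rpath` and the `stage2/bin` and `stage2/lib` layout are assumptions made for the example.

```python
import posixpath


def origin_relative_rpath(binary_dir, lib_dir):
    """Express `lib_dir` relative to `binary_dir` as an $ORIGIN rpath."""
    relative = posixpath.relpath(lib_dir, start=binary_dir)
    if relative == ".":
        return "$ORIGIN"
    return "$ORIGIN/" + relative


# The same layout under two different build roots gives the same rpath.
rpath_short = origin_relative_rpath(
    "/home/workspace/rustc/buildX/stage2/bin",
    "/home/workspace/rustc/buildX/stage2/lib",
)
rpath_long = origin_relative_rpath(
    "/home/workspace/extended-path/rustc/buildX/stage2/bin",
    "/home/workspace/extended-path/rustc/buildX/stage2/lib",
)
print(rpath_short, rpath_long)
```

Because only the relative layout enters the string, an rpath of this form contributes the same bytes to .dynstr regardless of the absolute build path, whereas an absolute rpath would not.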
