Name mangling different on different architectures?

Hello!

I'm cross-compiling for cortex-m4 and I noticed that rustc 1.80.0 on aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu hosts generates different names for the same function:

  _ZN4util4zero17h5e1c088916b790b9E
  _ZN4util4zero17h2c45c9b7b1c70b92E

I believe that this in turn generates slightly different binaries. I would like my firmware to be reproducible so that others can confirm that the shipped binary is compiled from the open sources.

I thought those last numbers were a hash of the function's content, but the content didn't change, and the hash differs for every function in the compiled file.
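
For anyone comparing such names by hand: in the legacy scheme a symbol is `_ZN`, then length-prefixed path components, then `E`, and the final component (`h` plus 16 hex digits) is the hash. Below is a minimal Rust sketch that splits a symbol into those components. It assumes plain ASCII names without escape sequences, and `legacy_components` is a hypothetical helper name, not something the toolchain provides.

```rust
/// Splits a legacy-mangled symbol (`_ZN<len><name>...E`) into its components.
/// Returns `None` if the input does not follow that simple shape.
/// Assumption: no escape sequences or generics, only length-prefixed names.
pub fn legacy_components(symbol: &str) -> Option<Vec<String>> {
    let mut rest = symbol.strip_prefix("_ZN")?.strip_suffix('E')?;
    let mut parts = Vec::new();
    while !rest.is_empty() {
        // Each component starts with its decimal length.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        rest = &rest[digits..];
        // `get` fails cleanly if the declared length runs past the end.
        let part = rest.get(..len)?;
        parts.push(part.to_string());
        rest = &rest[len..];
    }
    Some(parts)
}
```

Applied to the two symbols above, this yields `util`, `zero` and one hash component each, so the hash is the only part that differs.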

Does anyone know if I can make these hashes deterministic?

Regards

Try enabling -C symbol-mangling-version=v0 to see the actual symbol being hashed. It probably contains ABI or something platform-specific like that.

The output changed, but unfortunately it is still non-deterministic:

_RNvCsfGR1dQmR8Ix_4util4zero
_RNvCsizA82bcACvt_4util4zero

Checking to confirm: you're using different host toolchains, but you're using the same cross-compilation targets, and getting different binaries?

That seems like a reasonable thing to expect. I think what may be happening is that the mangling is using a hash of the Rust toolchain, and so different compiled binaries of the same rustc sources are producing different ABIs. It would take some amount of additional work to make the hash that's used as input to the mangling be the same for two different builds of the same stable rustc sources, but it seems worth doing.

should fix this. It will be part of rust 1.81.

Yes, exactly! I installed the exact same version of rustc on two different hosts (Ubuntu Linux on x86_64, and Linux on aarch64 running in QEMU on macOS).

Cool, lucky me that it was already in the pipeline!

I tested 1.81.0-beta.5, but unfortunately the mangling still differs. The difference in the .text section is only 224 bytes now however, so that is much better.

_RNvCs5l0CocSwM29_4util4zero
_RNvCsiZidjvELykd_4util4zero

rustc 1.81.0-beta.5 (6c199c89c 2024-08-15)
binary: rustc
commit-hash: 6c199c89c72db4c1912674f872eabd34753b19f8
commit-date: 2024-08-15
host: aarch64-unknown-linux-gnu
release: 1.81.0-beta.5
LLVM version: 18.1.7

rustc 1.81.0-beta.5 (6c199c89c 2024-08-15)
binary: rustc
commit-hash: 6c199c89c72db4c1912674f872eabd34753b19f8
commit-date: 2024-08-15
host: x86_64-unknown-linux-gnu
release: 1.81.0-beta.5
LLVM version: 18.1.7

It is hard to find where the actual differences come from when the names are mangled differently.

$ bloaty -d symbols gcc-13-rustc-1.81-aarch64/firmware.elf -- Drop\ Box/gcc-13-rustc-1.81-x86_64/firmware.elf
    FILE SIZE        VM SIZE
 --------------  --------------
  [NEW] +32.8Ki  [NEW] +30.0Ki    _RNCNvNtNtCsjnXcpNX9h2k_13bitbox02_rust3hww3api11process_api0CsjHBc3zuapjH_15bitbox02_rust_c
  [NEW] +11.4Ki  [NEW] +10.7Ki    _RNCNvNtNtNtNtCsjnXcpNX9h2k_13bitbox02_rust3hww3api7bitcoin6signtx8__process0CsjHBc3zuapjH_15bitbox02_rust_c
  +0.9% +6.93Ki  [ = ]       0    [section .debug_line]
 -99.9% +6.90Ki -98.8% +3.75Ki    [2846 Others]
  [NEW] +6.33Ki  [NEW] +5.92Ki    _RNvMs0_NtNtNtNtCsjnXcpNX9h2k_13bitbox02_rust3hww3api7bitcoin8policiesNtB5_12ParsedPolicy17derive_at_keypath
  [NEW] +5.83Ki  [NEW] +5.39Ki    _RNvXsc_NtCsghQgz8ZvA2k_10miniscript10miniscriptINtNtB5_7private10MiniscriptNtNtCsbzOWeDbTIik_5alloc6string6StringNtNtB5_7context8Segwitv0ENtNtB7_10expression8FromTree9from_treeCsjnXcpNX9h2k_13bitbox02_rust
  [NEW] +4.97Ki  [NEW] +4.68Ki    _RNvXs_NtCs1Zwuxpl8oER_14bitcoin_hashes9ripemd160NtB4_10HashEngineNtB6_10HashEngine5input
  [NEW] +4.38Ki  [NEW] +4.05Ki    _RNCNvNtCsjnXcpNX9h2k_13bitbox02_rust3hww14process_packet0CsjHBc3zuapjH_15bitbox02_rust_c
  [NEW] +4.35Ki  [NEW] +4.06Ki    _RNvXs_NtCs1Zwuxpl8oER_14bitcoin_hashes6sha256NtB4_10HashEngineNtB6_10HashEngine5input
  [NEW] +4.31Ki  [NEW] +3.81Ki    _RNCINvNtNtNtNtCsjnXcpNX9h2k_13bitbox02_rust3hww3api8ethereum14sign_typed_msg13encode_memberINtNtNtCseP2c4Uu0nmS_6digest8core_api7wrapper11CoreWrapperNtCsNs7FEKU4RL_4sha313Keccak256CoreEE0CsjHBc3zuapjH_15bitbox02_rust_c
  [NEW] +4.13Ki  [NEW] +3.83Ki    _RNvMs1_NtNtNtNtCsjsVzd7dveep_16curve25519_dalek7backend6serial3u326scalarNtB5_8Scalar293mul
  [DEL] -3.91Ki  [DEL] -3.52Ki    _RNCNvNtNtNtNtCs31AKljRVKSG_13bitbox02_rust3hww3api8ethereum4sign7process0CshNzm7exkWu0_15bitbox02_rust_c
  [DEL] -4.13Ki  [DEL] -3.83Ki    _RNvMs1_NtNtNtNtCsePSYukMvsJy_16curve25519_dalek7backend6serial3u326scalarNtB5_8Scalar293mul
  [DEL] -4.34Ki  [DEL] -3.81Ki    _RNCINvNtNtNtNtCs31AKljRVKSG_13bitbox02_rust3hww3api8ethereum14sign_typed_msg13encode_memberINtNtNtCslAQ6jb1r6CW_6digest8core_api7wrapper11CoreWrapperNtCsbkFcGwbn2o6_4sha313Keccak256CoreEE0CshNzm7exkWu0_15bitbox02_rust_c
  [DEL] -4.35Ki  [DEL] -4.06Ki    _RNvXs_NtCs7H41KIA2VY_14bitcoin_hashes6sha256NtB4_10HashEngineNtB6_10HashEngine5input
  [DEL] -4.38Ki  [DEL] -4.05Ki    _RNCNvNtCs31AKljRVKSG_13bitbox02_rust3hww14process_packet0CshNzm7exkWu0_15bitbox02_rust_c
  [DEL] -4.97Ki  [DEL] -4.68Ki    _RNvXs_NtCs7H41KIA2VY_14bitcoin_hashes9ripemd160NtB4_10HashEngineNtB6_10HashEngine5input
  [DEL] -5.83Ki  [DEL] -5.39Ki    _RNvXsc_NtCser2FUcbI2eS_10miniscript10miniscriptINtNtB5_7private10MiniscriptNtNtCs833s49SVoLx_5alloc6string6StringNtNtB5_7context8Segwitv0ENtNtB7_10expression8FromTree9from_treeCs31AKljRVKSG_13bitbox02_rust
  [DEL] -6.33Ki  [DEL] -5.92Ki    _RNvMs0_NtNtNtNtCs31AKljRVKSG_13bitbox02_rust3hww3api7bitcoin8policiesNtB5_12ParsedPolicy17derive_at_keypath
  [DEL] -11.4Ki  [DEL] -10.7Ki    _RNCNvNtNtNtNtCs31AKljRVKSG_13bitbox02_rust3hww3api7bitcoin6signtx8__process0CshNzm7exkWu0_15bitbox02_rust_c
  [DEL] -32.8Ki  [DEL] -30.0Ki    _RNCNvNtNtCs31AKljRVKSG_13bitbox02_rust3hww3api11process_api0CshNzm7exkWu0_15bitbox02_rust_c
  +0.1% +9.89Ki  +0.0%    +224    TOTAL
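
The [NEW]/[DEL] pairs in this output appear to differ only in the `Cs<base62>_` parts, which in the v0 scheme are the crate disambiguators. One way to line the symbols up again is to strip those parts before diffing. Below is a rough Rust sketch of that idea. It is a heuristic scan and not a real v0 demangler, and `strip_crate_disambiguators` is a hypothetical helper name. It could misfire on an identifier that happens to contain `Cs` followed by alphanumerics, an underscore and a digit.

```rust
/// Heuristically removes crate disambiguators (`Cs<base62>_`) from a
/// v0-mangled symbol so that two builds can be compared by name.
/// Assumption: this is a pattern scan, not a parser for the v0 grammar.
pub fn strip_crate_disambiguators(symbol: &str) -> String {
    let chars: Vec<char> = symbol.chars().collect();
    let mut out = String::with_capacity(symbol.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == 'C' && chars.get(i + 1) == Some(&'s') {
            // Skip over the base-62 digits that follow "Cs".
            let mut j = i + 2;
            while j < chars.len() && chars[j].is_ascii_alphanumeric() {
                j += 1;
            }
            // A disambiguator ends with '_' and is followed by a
            // length-prefixed crate name, which starts with a digit.
            let ends_with_underscore = chars.get(j) == Some(&'_');
            let name_follows = chars.get(j + 1).map_or(false, |c| c.is_ascii_digit());
            if ends_with_underscore && name_follows {
                out.push('C');
                i = j + 1;
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}
```

With the names normalized this way, both `util::zero` symbols become `_RNvC4util4zero`, and what remains in a diff should be actual code or size differences.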

For the different symbol names, can you build with -v and diff the rustc invocations between both builds? Using -j1 may help to get the order deterministic. Otherwise, sorting the output would work too.
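
If the verbose logs are too long to diff by eye, something like the following Rust sketch could pull out just the `-C metadata=` value per crate and list the crates where the two builds disagree. It assumes each rustc invocation sits on one log line and contains `--crate-name <name>` plus either `-C metadata=<hash>` or `-Cmetadata=<hash>`. The function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Collects the metadata hashes seen for each crate name in a `cargo build -v` log.
/// A crate may be compiled more than once, so every hash is kept in a set.
pub fn metadata_by_crate(log: &str) -> BTreeMap<String, BTreeSet<String>> {
    let mut map: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for line in log.lines() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        // The crate name is the token right after `--crate-name`.
        let name = tokens
            .iter()
            .position(|t| *t == "--crate-name")
            .and_then(|i| tokens.get(i + 1).copied());
        // Accept both `-C metadata=...` (two tokens) and `-Cmetadata=...`.
        let meta = tokens.iter().copied().find_map(|t| {
            t.strip_prefix("-Cmetadata=")
                .or_else(|| t.strip_prefix("metadata="))
        });
        if let (Some(name), Some(meta)) = (name, meta) {
            map.entry(name.to_string())
                .or_default()
                .insert(meta.trim_matches('`').to_string());
        }
    }
    map
}

/// Returns the crates present in both logs whose metadata hashes differ.
pub fn differing_crates(log_a: &str, log_b: &str) -> Vec<String> {
    let a = metadata_by_crate(log_a);
    let b = metadata_by_crate(log_b);
    a.iter()
        .filter(|(name, hashes)| b.get(*name).map_or(false, |other| other != *hashes))
        .map(|(name, _)| name.clone())
        .collect()
}
```

Because the result is keyed by crate name in a sorted map, the build order no longer matters, so -j1 is not needed for this comparison.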

For the .text changes, are you using --remap-path-prefix to ensure that paths that end up being embedded in the binary are the same on both machines?
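
As a reminder of why paths matter here: source paths can end up in the binary through macros such as `file!()` and through panic locations. A small Rust sketch of both follows. The function names are made up for illustration.

```rust
/// Returns the path of this source file as rustc recorded it at compile time.
/// The string is stored in the binary, so it changes with the build directory
/// unless the prefix is remapped.
pub fn embedded_source_path() -> &'static str {
    file!()
}

/// Panic messages carry a location too. This returns the file part of the
/// caller's location, which is embedded in the binary in the same way.
#[track_caller]
pub fn caller_source_path() -> &'static str {
    std::panic::Location::caller().file()
}
```

Building the same code in /home/a/project and /home/b/project would embed two different strings, which is the kind of difference --remap-path-prefix is meant to remove.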

Hmm, yeah, there are some interesting differences.

Are the sources of rust-std different?

I checked and these folders are identical on both machines, except of course the path due to the toolchain name:

$ ls /opt/rustup/toolchains/beta-aarch64-unknown-linux-gnu/lib/rustlib/thumbv7em-none-eabi/lib/
liballoc-b8e640c80c99247d.rlib  libcompiler_builtins-679ee573caf6d8a5.rlib  libcore-afaa5e9723996f9c.rlib  librustc_std_workspace_core-10319abd33b68d85.rlib

Yes I'm using --remap-path-prefix. Perhaps I need to use that for the rlibs as well?

edit: Even if I add --remap-path-prefix=/opt/rustup/toolchains/beta-x86_64-unknown-linux-gnu= and --remap-path-prefix=/opt/rustup/toolchains/beta-aarch64-unknown-linux-gnu= the metadata hash still ends up being different. Anything else I can try?

The paths of rlibs never end up in anything anyway, so --remap-path-prefix doesn't help for them.

Are the source paths and -Cmetadata/-Cextra-filename arguments the only differences? If so the issue is somewhere in the way cargo determines the identity of the crates it builds.

Are you using -Zbuild-std? If not, the exact same rlibs for the standard library should be used. If you are, does it reproduce without -Zbuild-std (e.g. for a tier 1 or tier 2 target)? If it doesn't, the remaining difference may be specific to -Zbuild-std.

One possible cause is that -Zbuild-std doesn't lock the dependencies of the standard library and instead takes the latest semver-compatible versions from crates.io. Until very recently the standard library sources were not shipped with a lockfile that -Zbuild-std could use. "Consider doing a forced lock of the standard library. · Issue #38 · rust-lang/wg-cargo-std-aware · GitHub" tracks cargo support now that we do ship a usable lockfile for the standard library on recent nightlies.

Yeah, that is the only difference.

Yes. I'm locking down the deps by copying the lockfile that is in the rust-src "workspace" before running cargo vendor.

Even without -Zbuild-std I get the same result, i.e. different -Cmetadata. I'll try to reduce it to a small test-case.

Is there any way to print some info from rustc to figure out what it uses as input to calculate the -Cmetadata?

You mean to get info from cargo? I believe you can use CARGO_LOG=trace (this is very verbose).

What about glibc and linker versions? For reproducible builds you need exactly the same versions of libraries to which your binary links.

In general, I would recommend to use Docker/Podman for reproducible builds, it removes a LOT of headache.

I'm running in docker, but this is the first time I'm trying two different host architectures in docker.

I'm currently building a minimal repro, and I think it relates to when dependencies are listed in a workspace.

Are you using proc macros by any chance? It is possible they get a different ID depending on the host triple, which would then result in all dependent crates getting a different ID and thus a different -Cmetadata argument.

Not that I know of. I have a reproduction now; you can see the different metadata hashes in the last stages of these two builds:

The crate autocfg for example:

  • -C metadata=ff7757ee20672155
  • -C metadata=8d956e1242b16064

I'll make another one that doesn't use a workspace and doesn't have this problem.

edit: Hmm, maybe autocfg is the culprit.

If proc-macros get different IDs depending on the host triple, that sounds like a bug too. The same goes for other host crates needed by build scripts or as dependencies of proc-macros.

I noticed that panic-halt gets the same ID in both runs, and panic-halt doesn't depend on autocfg. So I guess all of the other ones get different IDs because autocfg does.

edit: I opened an issue in the issue tracker to see if the author has any ideas: Getting different crate ids with different host compilers · Issue #69 · cuviper/autocfg · GitHub

edit2: Hmm, looking a bit more carefully. Autocfg is compiled for the host compiler. Could the issue then be with num-traits? Could it be that autocfg generates something slightly different for num-traits?

edit3: Found this now, thanks to the author of autocfg. It seems the problem is indeed build scripts and proc-macros, and it is being worked on: Non-reproducible -C metadata=hash passed to rustc depending on the compiling OS · Issue #8140 · rust-lang/cargo · GitHub
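
To illustrate why one host-compiled crate is enough to change almost every ID, here is a toy model in Rust. It is not cargo's actual hashing, which takes many more inputs. It only assumes that a unit's ID mixes in the IDs of its dependencies, and that units built for the host also mix in the host triple. `Unit` and `toy_metadata` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A compilation unit in the toy model (not cargo's real data structure).
pub struct Unit {
    pub name: &'static str,
    /// True for build scripts, proc-macros and the crates they depend on.
    pub for_host: bool,
    pub deps: Vec<Unit>,
}

/// Computes a toy ID: hash of the name, the host triple (host units only),
/// and the IDs of all dependencies. Assumption: a simplified stand-in for
/// the value passed as `-C metadata=`.
pub fn toy_metadata(unit: &Unit, host: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    unit.name.hash(&mut hasher);
    if unit.for_host {
        host.hash(&mut hasher);
    }
    for dep in &unit.deps {
        // A change in any dependency's ID changes this unit's ID too.
        toy_metadata(dep, host).hash(&mut hasher);
    }
    hasher.finish()
}
```

In this model a `panic-halt` unit with no host dependencies gets the same ID on both hosts, while a `num-traits` unit that depends on a host-built `autocfg` gets a different one, which matches what the two builds showed.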
