How do declarations work across Crates, esp. with mangling?

In a C compiler, usage of an external function can be split into two completely separate steps: A header file provides a (forward) declaration, while the actual implementation will not be relevant until linking.

With Rust, its module system and Crates, those two steps are more entwined: A use or extern crate statement does both at once, making the external symbol available (declared) and getting its Crate linked (plus stuff like monomorphization, which is irrelevant for now).

However, the declaration part will not work if the external Crate is available in binary format only. So I suspect an rlib does provide the information required to use its symbols (function signatures etc.), probably as part of Crate metadata. Am I right here? Does Crate metadata really contain information about function signatures?

This gets even more interesting when taking name mangling into account. The mangling v0 RFC points out that the current (legacy) mangling scheme depends on compiler internals and does not generate predictable names:

Since the current scheme generates its hash from the values of various compiler internal data structures, an alternative compiler implementation could not predict the symbol name, even for simple cases.

In my experience, this means that the mangled name of a function depends on a lot of obscure factors and can't even be derived by the same compiler implementation without access to the full source code. This raises the question of how the compiler knows the mangled names of external functions.
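For orientation, here is roughly what a legacy-mangled symbol looks like: `_ZN`, then each path component prefixed by its length, then a 17-byte `h<16 hex digits>` hash component, then `E`. The sketch below only reproduces that layout. The helper name `legacy_layout`, the path and the hash value are made up for illustration, and real rustc output also involves character escaping and generics, which are ignored here.

```rust
/// Assemble a string in the shape of rustc's legacy mangling.
/// Hypothetical helper: no character escaping, no generics.
fn legacy_layout(path: &[&str], hash: u64) -> String {
    let mut symbol = String::from("_ZN");
    // Each path component is written as <byte length><component>.
    for component in path {
        symbol.push_str(&component.len().to_string());
        symbol.push_str(component);
    }
    // The hash is just another component: 'h' followed by 16 hex digits.
    let hash_component = format!("h{:016x}", hash);
    symbol.push_str(&hash_component.len().to_string());
    symbol.push_str(&hash_component);
    symbol.push('E');
    symbol
}

fn main() {
    // Placeholder hash, not a value rustc would actually compute.
    let symbol = legacy_layout(&["mycrate", "math", "add"], 0x0123_4567_89ab_cdef);
    println!("{}", symbol);
}
```

The path part is easy to predict. The trailing hash is the piece the RFC quote is about.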

Once again, I would expect this information to be part of Crate metadata. But is it really?

Unfortunately, documentation on both Crate metadata and mangling is pretty sparse, so I appreciate any clues!

The compiler generates rmeta files, which contain metadata about crates.

I am aware of rmeta files; they are what I had been referring to by "Crate metadata". However, I find it hard to identify what is contained in these metadata files and what is not.

The docs you linked to for rustc_metadata::rmeta seem to be the most extensive information available. Within that module, FnData looks like the structure closest to my question. But FnData only has a param_names field, with no trace of param types or mangling information.

I'm not sure either, but CrateRoot also has a "disambiguator":

https://doc.rust-lang.org/stable/nightly-rustc/rustc_span/crate_disambiguator/struct.CrateDisambiguator.html

and a list of exported symbols:

https://doc.rust-lang.org/stable/nightly-rustc/rustc_middle/middle/exported_symbols/enum.ExportedSymbol.html

fn_sigs are encoded here, separate from the FnData, which is handled earlier in that same function.

I'm pretty sure rustc recomputes the symbols of upstream functions when it needs them, and the symbols are not stored directly in the metadata. The computation of the disambiguating hash for the symbol is based on things like def path and fn_sig which are all available in the metadata.
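To illustrate the idea (not rustc's actual algorithm): if the hash depends only on data that is stored in the metadata, then the crate that defines a function and the crate that calls it can each compute the same symbol on their own. In the toy model below, `FnMetadata`, its fields, `example_metadata` and the use of `DefaultHasher` are all assumptions made for the sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for what a downstream crate can read from upstream metadata.
#[derive(Hash)]
struct FnMetadata {
    crate_disambiguator: u64,
    def_path: Vec<String>,
    fn_sig: String,
}

/// Recompute a symbol-like name purely from metadata (illustrative only).
fn symbol_name(meta: &FnMetadata) -> String {
    // DefaultHasher::new() uses fixed keys, so equal inputs give equal hashes.
    let mut hasher = DefaultHasher::new();
    meta.hash(&mut hasher);
    format!("{}::h{:016x}", meta.def_path.join("::"), hasher.finish())
}

/// Hypothetical metadata record for `mycrate::add` with the given signature.
fn example_metadata(fn_sig: &str) -> FnMetadata {
    FnMetadata {
        crate_disambiguator: 0xfeed_beef, // made-up value
        def_path: vec!["mycrate".to_string(), "add".to_string()],
        fn_sig: fn_sig.to_string(),
    }
}

fn main() {
    // "Upstream" computes the name when compiling the defining crate.
    let upstream = symbol_name(&example_metadata("fn(i32, i32) -> i32"));
    // "Downstream" decodes the same metadata later and recomputes it.
    let downstream = symbol_name(&example_metadata("fn(i32, i32) -> i32"));
    assert_eq!(upstream, downstream);
    println!("{}", upstream);
}
```

Nothing needs to be stored beyond the inputs themselves, which fits the observation that the metadata has def paths and fn_sigs but no table of mangled names.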

I wrote a chapter on how cross-crate metadata is handled and stored, "Libraries and Metadata" in the Guide to Rustc Development, if you want a high-level overview.

I believe each time the compiler needs a symbol from a remote crate, it uses the symbol_name query to recompute it. The query is defined here: rust/lib.rs at 36a4d14c7edba21bba14df00b9e6e4a111dfc6f2 · rust-lang/rust · GitHub
