Bridging the Gap Between Rust's Theoretical IR Advantages and Real-World Performance

Hi everyone,

I’ve been diving deep into the LLVM IR generated by rustc and comparing it with the output from Clang for equivalent C/C++ code. This has led me to some questions regarding the current state of Rust's optimization pipeline.

The Premise: The "Information Dividend"

In theory, Rust generates LLVM IR that contains significantly richer semantic information than C/C++. This should, logically, allow the backend to perform more aggressive optimizations:

  1. Aliasing Guarantees: Rust's &mut T maps to noalias, and &T (absent interior mutability) maps to noalias plus readonly. This provides stronger guarantees than C's restrict keyword (which is rarely used, and easy to use incorrectly) and should unlock aggressive load/store reordering, LICM, and auto-vectorization (see the sketch just after this list).
  2. Type Constraints: References are guaranteed nonnull, and enums/booleans have strict value ranges (noundef, !range metadata).
  3. Immutability: Shared references allow for confident Common Subexpression Elimination (CSE).
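
To make the aliasing point concrete, here is a minimal sketch (the function is hypothetical, not from a benchmark) of the kind of optimization noalias should unlock:

```rust
// Because `dst: &mut i32` is emitted as `noalias`, LLVM may assume the store
// through `dst` cannot modify `*src`, so the second read of `*src` can reuse
// the first load. The equivalent C function without `restrict` must reload.
pub fn accumulate(dst: &mut i32, src: &i32) {
    *dst += *src; // load *src once
    *dst += *src; // noalias lets LLVM reuse that load instead of re-reading
}
```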

The Reality

Despite these theoretical advantages, real-world benchmarks often show Rust performing on par with highly tuned C/C++, or sometimes slightly trailing it (~1-5%). This "information dividend" does not yet seem to be fully translating into runtime performance.

Hypotheses on Obstacles

Based on my analysis of the generated assembly and IR, I’ve hypothesized a few reasons for this. I’d love to know if these assessments align with the compiler team’s observations:

1. LLVM's "C-Bias" in Heuristics The LLVM backend has been tuned for decades to recognize C/C++ code patterns. Rust-generated IR, while valid, often produces patterns (e.g., heavy Option/Result unwrapping, complex closure expansions in iterators) that may not trigger optimization heuristics designed around C-style loops and branches.
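
As a hypothetical illustration (my own function, not a measured case), a pipeline like this expands into nested closures and Option-returning next() calls in the IR, rather than the plain induction-variable loop that many of LLVM's loop heuristics were tuned on:

```rust
// Each adapter adds a closure layer; the IR only resembles a C-style counted
// loop after inlining and simplification succeed, which is not guaranteed.
pub fn sum_even_squares(xs: &[u32]) -> u32 {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}
```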

2. Implicit memcpy & Stack Inefficiency Due to Rust's move semantics and the current implementation of Box::new, release builds still exhibit a significant number of memcpy calls and heavy stack usage. While LLVM's MemCpyOpt pass exists, it seems insufficient to eliminate copies across function boundaries or complex control flow, leading to unnecessary memory traffic.
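
A minimal sketch of the copy problem (the size is illustrative):

```rust
// The 1 MiB array is materialized on the stack and then memcpy'd into the
// heap allocation returned by Box::new. Eliding that copy is left to LLVM
// and does not reliably succeed; in debug builds the stack write always
// happens, which can overflow small stacks for large enough buffers.
pub fn boxed_buffer() -> Box<[u8; 1 << 20]> {
    Box::new([0u8; 1 << 20])
}
```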

3. Bounds Checking Overhead While iterators eliminate many checks, index-based access still incurs overhead. C compilers often generate tighter code by leveraging Undefined Behavior (assuming no OOB access), a luxury Rust cannot afford by default.
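
To illustrate with hypothetical functions (assuming a typical release build):

```rust
// The check on `src[i]` cannot simply be hoisted out of the loop, because a
// panic partway through the writes is observable; this often blocks
// auto-vectorization.
pub fn add_indexed(dst: &mut [u32], src: &[u32]) {
    for i in 0..dst.len() {
        dst[i] += src[i]; // bounds check against src.len() each iteration
    }
}

// The zipped form is check-free by construction and vectorizes readily.
pub fn add_zipped(dst: &mut [u32], src: &[u32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d += *s;
    }
}
```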


Questions for the Team

To understand the roadmap for unlocking Rust's full potential, I am curious about the following directions:

Q1: MIR Optimizations vs. LLVM Upstreaming Is the long-term strategy to rely more on MIR Optimizations (like Destination Propagation) to "clean up" the code before it reaches LLVM, or is the focus shifting towards upstreaming Rust-specific optimization passes (or tuning heuristics) directly into LLVM?

Q2: The "Memcpy" Problem Regarding excessive stack copies (and Box::new stack overflows): Is a language-level "Placement New" feature still being actively explored, or is the consensus to solve this entirely through compiler smarts like improved NRVO and Destination Propagation?

Q3: Rust-Specific LLVM Tuning Are there active efforts to collaborate with the LLVM team to introduce heuristics specifically for Rust IR patterns? For example, better handling of the complex branch/switch structures often generated by match expressions on enums.
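
For example (a hypothetical enum of my own, just to show the shape I mean), a match like this lowers to a switch over the discriminant plus payload extraction, which differs from the if/else chains C front ends typically emit:

```rust
pub enum Shape {
    Circle(f64),
    Rect(f64, f64),
    Point,
}

// Lowers to a `switch` on the discriminant, with per-arm field loads.
pub fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect(w, h) => w * h,
        Shape::Point => 0.0,
    }
}
```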

Q4: Code Bloat & Polymorphization What is the current status of Polymorphization? Is it viewed as the primary solution to Rust's binary size bloat and Instruction Cache pressure caused by aggressive monomorphization?
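
A minimal sketch of the duplication I understand polymorphization to have targeted (an illustrative function, not taken from the compiler):

```rust
// The body never inspects T: a slice is a (pointer, length) pair, so the
// machine code is identical for every instantiation. Monomorphization still
// emits one copy per concrete T; polymorphization aimed to share them.
pub fn slice_len<T>(items: &[T]) -> usize {
    items.len()
}
```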

I would greatly appreciate any insights from the compiler team or contributors working on these fronts.

Thanks!


The old polymorphization implementation got removed per


Please don't use large language models to write your posts. It creates a large asymmetry between the time put in by you and the time expected from others.


Do you have an example? Speaking in the abstract often goes less well than just looking specifically at one particular comparison where there can be a specific thing to address.

MIR is a poor IR for writing complex optimizations, so anything beyond relatively simple passes is best done in LLVM (or potentially in a hypothetical "LIR" low-level IR).
