Pre-ACP: Un-specialize impl ToString

Arguments::as_str is a similar "just for the purpose of optimization" API, so there is some precedent for adding shortcuts for the "essentially just a string" case.
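For illustration, a minimal sketch of how that shortcut can be used. The helper name message_to_string is made up for this example; fmt::Arguments::as_str and std::fmt::format are the real std APIs.

```rust
use std::fmt;

/// Hypothetical helper (not a std API): turn format arguments into a String,
/// skipping the formatting machinery when the arguments are "essentially
/// just a string".
pub fn message_to_string(args: fmt::Arguments<'_>) -> String {
    match args.as_str() {
        // Shortcut: the arguments are a single static string piece,
        // so no Formatter is needed.
        Some(s) => s.to_owned(),
        // General case: fall back to the full fmt machinery.
        None => fmt::format(args),
    }
}
```

Either branch produces the same String; whether as_str returns Some for a given format_args! invocation is an optimization detail, so callers should not rely on which path is taken.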

The main reason this is difficult, AIUI, is the dynamism involved. Even if we ignore potential difficulties around reference validity guarantees[1], and even though it could be somewhat straightforward to replace fn fmt(&self, &mut Formatter<'_>) -> Result with fn fmt(&self, &mut (dyn Write + '_)) -> Result, the devirtualization needed to remove the dyn dispatch (required to actually DCE the fmt machinery) is much less straightforward.
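A rough sketch of what the dyn Write shaped signature could look like, to make the dispatch problem concrete. WriteDisplay and to_string_via_dyn are hypothetical names invented here, not std items or a proposed API.

```rust
use std::fmt::{self, Write};

/// Hypothetical trait (not part of std): a Display-like trait whose fmt
/// takes a type-erased writer instead of a Formatter.
pub trait WriteDisplay {
    fn fmt(&self, out: &mut (dyn Write + '_)) -> fmt::Result;
}

impl WriteDisplay for str {
    fn fmt(&self, out: &mut (dyn Write + '_)) -> fmt::Result {
        // Dynamic call through the Write vtable.
        out.write_str(self)
    }
}

/// Hypothetical to_string built on top of the trait above.
pub fn to_string_via_dyn<T: WriteDisplay + ?Sized>(value: &T) -> String {
    let mut buf = String::new();
    // &mut String coerces to &mut dyn Write here. Turning the write_str
    // call above into a direct String::push_str requires the optimizer to
    // inline fmt and then devirtualize through that coercion.
    value.fmt(&mut buf).expect("writing to a String does not fail");
    buf
}
```

Even in this tiny case the concrete writer type is only visible after inlining, which is the part that is hard to guarantee in general.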

That said, an MIR pass which attempts some amount of devirtualization would be an interesting project. AIUI most MIR opts have been focused on reducing the amount of IR passed to LLVM, and devirtualization would usually move in the other direction, but perhaps there's a heuristic that rustc could use that could remain a net positive?

After current MIR inlining, calls to <&String as Display>::fmt(s, f) and <str as Display>::fmt(s, f) with s: &&String, f: &mut Formatter look the same, modulo debug information. In contrast, a call to <&String as ToString>::to_string remains just a call, whereas <str as ToString>::to_string is fully inlined. (playground links)
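For reference, the two to_string calls being compared can be written out as below. The wrapper function names are made up for this example, and which of the two gets inlined depends on the compiler and std version in use.

```rust
/// Goes through the blanket ToString impl with Self = &String,
/// which formats via <&String as Display>::fmt.
pub fn via_ref_string(s: &&String) -> String {
    <&String as ToString>::to_string(s)
}

/// Goes through ToString with Self = str.
pub fn via_str(s: &str) -> String {
    <str as ToString>::to_string(s)
}
```

Both return the same String; the difference discussed here is only in how much of the formatting path survives in the optimized MIR.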

ToString::to_string is already marked #[inline] with a note that while unconventional for a generic impl, it has significant perf impact (ref: #74852). <&_ as Display>::fmt does not have such an #[inline] annotation; perhaps adding it would enable <&String as ToString>::to_string to be inlined?
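To show the shape of the annotation in question, here is a sketch using a stand-in wrapper type, since the impl for &T lives in core and cannot be modified from user code. Ref is a hypothetical type, and whether #[inline] on the real impl would have the hoped-for effect is untested here.

```rust
use std::fmt;

/// Hypothetical stand-in for &T, used only to show a forwarding impl.
pub struct Ref<'a, T: ?Sized>(pub &'a T);

impl<T: fmt::Display + ?Sized> fmt::Display for Ref<'_, T> {
    // The attribute under discussion: hint that this thin forwarding
    // function should be inlined into callers, including across crates.
    #[inline]
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self.0, f)
    }
}
```

Calling to_string on a Ref value then goes through the blanket ToString impl and this forwarding fmt, analogous to the &String case.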

I recall seeing that the heuristic for auto-#[inline] is roughly that no MIR call statements exist in the optimized MIR. <str as ToString>::to_string obviously does include call ops (into allocation, as well as Vec::deref, interestingly[2]).

Subobservation: Vec::deref isn't known to not unwind. I would've hoped it'd just've been that MIR always includes unwind edges, but Vec::deref has -> [unwind continue] where a #[rustc_nounwind] call has -> [unwind unreachable]. An MIR pass/opt to record cross-crate functions known to never unwind could potentially unblock some hidden optimizations, if not at the LLVM level, then at least at the MIR level.

Edit to add: reported Vec::deref MIR inlining regression as an issue


  1. I'm not sure exactly how relevant it is here, but it can be difficult to automatically optimize fn(&Scalar) into fn(Scalar) because while it's a validity requirement for the reference to be dereferenceable to sufficient bytes, there's no validity requirement for the bytes to be a valid instance of the scalar (currently; disclaimer: undecided, my own non-normative recollection, etc) even if we derive proof that the address is irrelevant.

  2. And this is despite the function being marked as #[inline]. Here it is open coded (shows there aren't any reachable unwinding edges). Gut guess: the call to std::slice::from_raw_parts::precondition_check is blocking inlining 🙁 Justification: on stable, it inlines and doesn't include that call, making it a single straightline basic block, whereas it does include the UB check on beta and doesn't inline there.
