The current machinery in core::fmt is built so that any expansion of format!/write!/etc. keeps most of the code out of the call site and inside core::fmt::write(). To do this, it has to pass an array of arguments, and since the arguments may be of different types, it uses (sort-of-but-different) trait objects to do so. The result is that formatting is noticeably slower than pushing values onto a string manually.
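As a concrete illustration of the comparison (the function names here are made up for the example), these two functions build the same string; the first goes through the format! machinery, the second pushes the pieces directly:

fn with_format(name: &str, place: &str) -> String {
    // Routed through core::fmt::write and the erased-argument machinery.
    format!("Hello, {}, welcome to {}", name, place)
}

fn by_hand(name: &str, place: &str) -> String {
    // The manual version: push each piece straight into the String.
    let mut s = String::new();
    s.push_str("Hello, ");
    s.push_str(name);
    s.push_str(", welcome to ");
    s.push_str(place);
    s
}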
After some rusty archaeology, the best I could find was this issue, stating:
It should not emit lots of inline code to bloat up function bodies, instead delegating to core::extfmt functions marked with #[inline], leaving the amount of inlining up to LLVM.
I couldn't find any other discussion, really; if there is some, I'd welcome a link!
v2
Instead of creating a bunch of core::fmt::ArgumentV1s, putting them in an array (core::fmt::Arguments), and calling into core::fmt::write(buf, args), we could expand the code directly at the macro call site. Take this example:
let s = format!("Hello, {}, and welcome back to {}. You last visited on {}.", name, place, date);
With v1:
let s = ::std::fmt::format(::std::fmt::Arguments::new_v1(
    {
        static __STATIC_FMTSTR: &'static [&'static str] =
            &["Hello, ",
              ", and welcome back to ",
              ". You last visited on ",
              "."];
        __STATIC_FMTSTR
    },
    &match (&name, &place, &date) {
        (__arg0, __arg1, __arg2) =>
            [::std::fmt::ArgumentV1::new(__arg0, ::std::fmt::Display::fmt),
             ::std::fmt::ArgumentV1::new(__arg1, ::std::fmt::Display::fmt),
             ::std::fmt::ArgumentV1::new(__arg2, ::std::fmt::Display::fmt)],
    },
));
With a proposed v2:
let s = match (&name, &place, &date) {
    (__arg0, __arg1, __arg2) => do catch {
        let mut __buf = String::new();
        __buf.push_str("Hello, ");
        Display::fmt(__arg0, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(", and welcome back to ");
        Display::fmt(__arg1, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(". You last visited on ");
        Display::fmt(__arg2, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(".");
        Ok(__buf)
    }.expect("a formatting trait implementation returned an error")
};
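Note that Formatter::new_v2 doesn't exist yet; Formatter has private fields and is only constructed inside core::fmt::write today. Purely to illustrate the shape such a constructor would need, here is a hypothetical stand-in type (the name BufFormatter is made up) that forwards writes straight into the caller's String instead of going through an Arguments array:

use std::fmt;

// Hypothetical stand-in for what a Formatter built by new_v2 would do:
// every write lands directly in the caller's buffer.
struct BufFormatter<'a> {
    buf: &'a mut String,
}

impl<'a> fmt::Write for BufFormatter<'a> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.buf.push_str(s);
        Ok(())
    }
}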
With this specific example, v2 runs about 4x faster on my machine. The performance difference grows with the number of arguments in the macro call. If you switch to positional arguments ("Hello, {1}, and welcome back to {2}. You last visited on {0}."), v1 takes an additional 10%, while v2 stays constant: v1 looks up the position of each argument in the Arguments array at run time, whereas v2 could determine it at compile time, as sketched below.
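For completeness, here is what the hypothetical v2 expansion of that positional variant could look like; the only change from the expansion above is that the macro pairs each placeholder with its argument while expanding, so the bindings come out reordered:

let s = match (&name, &place, &date) {
    (__arg0, __arg1, __arg2) => do catch {
        let mut __buf = String::new();
        __buf.push_str("Hello, ");
        // {1} names the second argument, so __arg1 is emitted directly;
        // no index lookup happens at run time.
        Display::fmt(__arg1, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(", and welcome back to ");
        Display::fmt(__arg2, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(". You last visited on ");
        Display::fmt(__arg0, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
        __buf.push_str(".");
        Ok(__buf)
    }.expect("a formatting trait implementation returned an error")
};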
Motivation
The design of v1 feels like the opposite of the stance Rust takes in most other situations. You're usually encouraged to take generics (allowing monomorphization), and to reach for trait objects only if you worry about code bloat. fmt v1 says: use trait objects, and if you want it faster, don't use fmt.
To be consistent, and therefore less surprising ("don't use format!("{}", s), it's slower than just s.to_string()!"), we could change to a v2 that does the above. Since pretty much all of the internals of fmt are marked unstable, I believe this can be done with zero breaking changes.
Anyone who wants the original behavior in order to cut down code bloat can do what's suggested in every other situation in Rust: explicitly choose a trait object.
let s = format!("{}", &foo as &Display);
Can we do this?