Changing core::fmt for speed


#1

The current machinery in core::fmt is built so that any expansion of format!/write!/etc. keeps most of the code out of the call site and inside core::fmt::write(). To do this, it has to pass an array of arguments, and since the arguments may be of different types, it uses (sort-of-but-different) trait objects to do so. This makes formatting noticeably slower than pushing values to a string manually.
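For anyone unsure what "pushing values to a string manually" means here, a minimal sketch follows. The function names `with_format` and `by_hand` are made up for illustration, and the snippet only shows that the two approaches produce the same string; it makes no timing claim of its own.

```rust
// Goes through the fmt machinery: the macro builds a fmt::Arguments value
// and hands it to the shared formatting routine.
fn with_format(name: &str, place: &str) -> String {
    format!("Hello, {}, and welcome back to {}.", name, place)
}

// Pushes each piece onto the String directly, with no fmt::Arguments involved.
fn by_hand(name: &str, place: &str) -> String {
    let mut s = String::new();
    s.push_str("Hello, ");
    s.push_str(name);
    s.push_str(", and welcome back to ");
    s.push_str(place);
    s.push('.');
    s
}

fn main() {
    // Both routes yield identical output.
    assert_eq!(with_format("Ann", "Oslo"), by_hand("Ann", "Oslo"));
    println!("{}", by_hand("Ann", "Oslo"));
}
```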

After some rusty archaeology, the best I could find was this issue, stating:

It should not emit lots of inline code to bloat up function bodies, instead delegating to core::extfmt functions marked with #[inline], leaving the amount of inlining up to LLVM.

I couldn’t really find any other discussion. If there is some, I’d welcome a link!

v2

Instead of creating a bunch of core::fmt::ArgumentV1s, putting them in an array (core::fmt::Arguments), and calling into core::fmt::write(buf, args), we could expand the code directly at the macro call site. Take this example:

let s = format!("Hello, {}, and welcome back to {}. You last visited on {}.", name, place, date);

With v1:

let s = ::std::fmt::format(::std::fmt::Arguments::new_v1({
    static __STATIC_FMTSTR:
              &'static [&'static str]
              =
           &["Hello, ",
             ", and welcome back to ",
             ". You last visited on ",
             "."];
       __STATIC_FMTSTR
    },
    &match (&name,
           &place,
           &date)
        {
        (__arg0,
         __arg1,
         __arg2)
        =>
        [::std::fmt::ArgumentV1::new(__arg0,
                                     ::std::fmt::Display::fmt),
         ::std::fmt::ArgumentV1::new(__arg1,
                                     ::std::fmt::Display::fmt),
         ::std::fmt::ArgumentV1::new(__arg2,
                                     ::std::fmt::Display::fmt)],
    }));

With a proposed v2:

let s = match (&name,
           &place,
           &date)
        {
        (__arg0,
         __arg1,
         __arg2)
        => do catch {
            let mut __buf = String::new();
            __buf.push_str("Hello, ");
            Display::fmt(__arg0, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
            __buf.push_str(", and welcome back to ");
            Display::fmt(__arg1, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
            __buf.push_str(". You last visited on ");
            Display::fmt(__arg2, &mut ::std::fmt::Formatter::new_v2(&mut __buf))?;
            __buf.push_str(".");
            Ok(__buf)
        }.expect("a formatting trait implementation returned an error")
        };

With this specific example, v2 runs about 4x faster on my machine. The performance difference grows the more arguments you have in the macro call. If you add in positional arguments ("Hello, {1}, and welcome back to {2}. You last visited on {0}."), v1 takes about 10% longer, while v2 stays constant: v1 calculates the position of each argument in the Arguments array at run time, whereas v2 could determine it at compile time.

Motivation

The design of v1 feels like the opposite of the stance Rust takes in most other situations. Usually you’re encouraged to take generics (allowing monomorphization), and to switch to trait objects if you worry about code bloat. fmt v1 says: use trait objects, and if you want it faster, don’t use fmt.

To be consistent, and therefore less surprising (“don’t use format!("{}", s), it’s slower than just s.to_string()!”), we could change to a v2 that does what is suggested above. Since pretty much all of the internals of fmt are marked unstable, I believe it can be done with 0 breaking changes.

Anyone who wants the original behavior in order to avoid code bloat can do what is suggested in every other situation in Rust: explicitly choose to use a trait object.

let s = format!("{}", &foo as &Display);
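A small runnable version of that opt-in, as a sketch. Note that `&Display` is the older spelling of a trait object type; current Rust writes it as `&dyn Display`, which is what the snippet uses. The helper `render_dyn` is a made-up name for illustration.

```rust
use std::fmt::Display;

// Hypothetical helper: takes the value as a trait object, so this function
// is compiled once rather than once per concrete argument type.
fn render_dyn(value: &dyn Display) -> String {
    format!("{}", value)
}

fn main() {
    let foo = 42u32;

    // Explicit cast at the call site, as in the post (with `dyn` added).
    let s = format!("{}", &foo as &dyn Display);
    assert_eq!(s, "42");

    // Or let the coercion happen at a function boundary.
    assert_eq!(render_dyn(&foo), "42");
    assert_eq!(render_dyn(&"hi"), "hi");
}
```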

Can we do this?


#2

Perhaps as a compromise, the macro could write your v2 code in a local fn and call that? This way the “bloat” is held at arm’s length, and it’s up to LLVM whether it’s worth inlining.
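Roughly what that could look like, as a sketch under a few assumptions. The names `welcome` and `__format` are invented here. The local fn is generic because a macro presumably would not know the argument types when it expands. Stable Rust has no way to construct a Formatter directly, so each argument is written with write!, which still goes through the existing machinery; the snippet therefore shows only the code layout, not the proposed speedup.

```rust
use std::fmt::{self, Display, Write};

fn welcome(name: &str, place: &str, date: &str) -> String {
    // Hypothetical expansion: the formatting body lives in a local fn, so the
    // caller's body contains only a call. Whether to inline is left to LLVM.
    fn __format<A, B, C>(a: &A, b: &B, c: &C) -> Result<String, fmt::Error>
    where
        A: Display + ?Sized,
        B: Display + ?Sized,
        C: Display + ?Sized,
    {
        let mut buf = String::new();
        buf.push_str("Hello, ");
        write!(buf, "{}", a)?;
        buf.push_str(", and welcome back to ");
        write!(buf, "{}", b)?;
        buf.push_str(". You last visited on ");
        write!(buf, "{}", c)?;
        buf.push_str(".");
        Ok(buf)
    }

    __format(name, place, date)
        .expect("a formatting trait implementation returned an error")
}

fn main() {
    println!("{}", welcome("Ann", "Oslo", "Monday"));
}
```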


#3

How would that work with slog (not formatting till it’s needed)?


#4

Put differently, what does format_args! expand to? Some equivalent of fmt::Arguments is needed.


#5

Seeing as everything about fmt::Arguments except for its name and its fmt::Debug/fmt::Display implementations is unstable, it can be changed to just hold a single closure (&Fn(&mut Formatter) -> Result) that has the same expansion inside.

let args = format_args!("Hello {}!", name);

Would expand to:

let args = ::std::fmt::Arguments {
    f: &match (&name,) {
        (__arg0,) => |__f| {
            __f.write_str("Hello ")?;
            Display::fmt(__arg0, __f)?;
            __f.write_str("!")?;
            Ok(())
        }
    }
};

It’d be slightly slower than inlining it completely, but it allows you to delay the formatting, just like you can now.

(I struggled for a while to get the lifetimes to work, until I realized that the example expression fails even in stable Rust. You must always pass the Arguments to another function with a lifetime set, and this v2 expansion works correctly in the same places as v1 does.)
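The closure idea can be tried on stable Rust with a stand-in type. This is only a sketch: `ArgumentsV2`, `render`, and `greet` are hypothetical names, not anything in std, and the closure is written by hand where the macro would generate it. The value is built directly in the call to another function, matching the lifetime restriction described above.

```rust
use std::fmt::{self, Display, Formatter};

// Hypothetical stand-in for a closure-based fmt::Arguments.
struct ArgumentsV2<'a> {
    f: &'a dyn Fn(&mut Formatter<'_>) -> fmt::Result,
}

impl<'a> Display for ArgumentsV2<'a> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Formatting is deferred until someone actually displays the value.
        (self.f)(f)
    }
}

// Any consumer that accepts the arguments and formats them later.
fn render(args: ArgumentsV2<'_>) -> String {
    args.to_string()
}

fn greet(name: &str) -> String {
    // Hand-written equivalent of the proposed format_args!("Hello {}!", name).
    // The closure is a temporary, so it is passed straight into `render`.
    render(ArgumentsV2 {
        f: &|f: &mut Formatter<'_>| {
            f.write_str("Hello ")?;
            Display::fmt(name, f)?;
            f.write_str("!")
        },
    })
}

fn main() {
    println!("{}", greet("world"));
}
```

Because nothing runs until `Display::fmt` is called on the `ArgumentsV2`, a logger that decides to drop the message never pays for the formatting.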


#6

I don’t have an opinion on the implementation, but I fully support the goal. Since format! and friends are macros, I’ve been assuming they’re expanded inline anyway.


#7

The closure idea is the first form of this I’ve seen that is actually feasible IMO.

That is, a closure won’t need more code in the original function, in fact there should be only the minimal amount of stack data necessary to point to all the values being printed.

There is another way to get this minimal result, that results in static data as today, but AFAICT it requires VG or some form of typeof to infer all the types.

Now that said, there will be more code generated in total, which LLVM will have to optimize, and if you allow inlining it’s going to bloat up the callsites.

I have no idea how bad it will be in practice nowadays since I haven’t touched it in years, so experimentation is welcome, but I’d be surprised if the results were satisfying.


#8

I fully support this goal as well. Formatters are going to continue to be leaned on heavily by engineers coming to Rust from other langs, and they won’t know when to use format! over string building. I think that in 90% of cases, call-site bloat is not on one’s mind, and people believe the macros expand code inline anyway. The other 10% (and I think I’m being generous here) who are conscientious about function bloat have the escape hatch of trait objects.


#9

This looks really promising. I’d love to see this.