Optimize Int to String Conversions

arthurprs · June 15, 2015, 6:41pm

As of today rust uses a naive implementation for converting integers to decimal strings in Int Debug/Display methods. So I propose adding an optimization for the most commonly used cases, that is, converting to decimal characters.

This should be non measurable overall but might give us a minor speedup on serde and rustc serializers.

I wrote a reasonably optimized version here. Further optimizations are possible but I tried to keep the code size small (which I think is important), it’s a road of diminished gains.

The benchmarks are nothing scientific but I tried to avoid the most common pitfalls. Timing includes formating machinery (buffer is always preallocated though) so it should be reasonably close to the real world code path.

Running with rust 1.2 nightly @ x64 Linux - Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz

test tests::new_08    ... bench:       562 ns/iter (+/- 111) (-18%)
test tests::new_16    ... bench:      1424 ns/iter (+/- 60)  (-08%)
test tests::new_32    ... bench:      3342 ns/iter (+/- 92)  (-16%)
test tests::new_64    ... bench:      7692 ns/iter (+/- 373) (-48%)
test tests::stdlib_08 ... bench:       626 ns/iter (+/- 12)
test tests::stdlib_16 ... bench:      1540 ns/iter (+/- 113)
test tests::stdlib_32 ... bench:      3887 ns/iter (+/- 72)
test tests::stdlib_64 ... bench:     11436 ns/iter (+/- 317)

llogiq · June 15, 2015, 7:01pm

One problem I can see with your benchmarks is the 0ns baseline – this basically means your code got inclined to nothing (which you usually want to avoid in benchmarks).

Apart from that, good job. I see that you use a lookup table for the common case. AFAIK Java also does this, though on some platforms direct arithmetic conversion is faster.

As Kirk Pepperdine says: "Measure, don’t guess!"™

arthurprs · June 15, 2015, 7:17pm

The baseline (_warmup) is just to warm-up the processor (get it to a stable frequency), it takes 0ns because it’s a noop loop. I’ll remove that line to avoid further confusion.

As much as modern processors have a highly parallel ALU unit I doubt the arithmetic alternatives will be faster . I’ll give it a try though!

llogiq · June 15, 2015, 9:08pm

On my system, a noop loop takes more than 0ns – so either there's something wrong with my system, or your _warmup || 1 closure gets inlined away. I'll investigate and come back with my findings.

Edit: I had forgotten to compile my benchmark with -O

sanxiyn · June 16, 2015, 4:33am

Are you aware of rust-strconv?

arthurprs · June 16, 2015, 12:15pm

I am. Heroic efforts on the float to string side.

I wanted a small/fast self contained version for integrating in stdlib though.

sanxiyn · June 16, 2015, 12:53pm

I was pointing out that rust-strconv has int to string conversion, in addition to float to string conversion.

arthurprs · June 16, 2015, 3:33pm

I see. Judging by a quick look at the repo code I think my proposal is both smaller and faster, probably not in all cases though.

I’m fine with anything that’s faster than what we have today.

llogiq · June 16, 2015, 5:25pm

Judging by a quick look at the repo code I think my proposal is both smaller and faster,

"Measure, don't guess"™ — Kirk Pepperdine (and yes, he has trademarked that phrase)

arthurprs · June 16, 2015, 7:39pm

Fair enough, here are some more exaustive results.

Code here I used a fixed seed for a rng generator to make sure I got a good sample for each range.

test bench::new_u08     ... bench:     76928 ns/iter (+/- 8229)
test bench::new_u16     ... bench:    380763 ns/iter (+/- 22530)
test bench::new_u32     ... bench:   4091478 ns/iter (+/- 163228)
test bench::new_u64     ... bench:   5547373 ns/iter (+/- 413232)
test bench::stdlib_u08  ... bench:     83228 ns/iter (+/- 4394)
test bench::stdlib_u16  ... bench:    392138 ns/iter (+/- 28264)
test bench::stdlib_u32  ... bench:   4540254 ns/iter (+/- 332782)
test bench::stdlib_u64  ... bench:   6813518 ns/iter (+/- 478703)
test bench::strconv_u08 ... bench:     86298 ns/iter (+/- 6344)
test bench::strconv_u16 ... bench:    390341 ns/iter (+/- 17149)
test bench::strconv_u32 ... bench:   4564609 ns/iter (+/- 299032)
test bench::strconv_u64 ... bench:   6710341 ns/iter (+/- 341379)

arthurprs · June 17, 2015, 1:25pm

I updated the benchmarks with a small number skewed verion, which reflects real world usage much better.

Example length histogram for 32 bit numbers: [0, 4627, 451, 502, 551, 576, 647, 751, 757, 853, 285, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

test bench::skewed_new_u08     ... bench:     70107 ns/iter (+/- 4295)
test bench::skewed_new_u16     ... bench:    375911 ns/iter (+/- 32367)
test bench::skewed_new_u32     ... bench:   4408632 ns/iter (+/- 177293)
test bench::skewed_new_u64     ... bench:   5185259 ns/iter (+/- 233081)
test bench::skewed_stdlib_u08  ... bench:     77604 ns/iter (+/- 2557)
test bench::skewed_stdlib_u16  ... bench:    432629 ns/iter (+/- 24703)
test bench::skewed_stdlib_u32  ... bench:   5285474 ns/iter (+/- 262226)
test bench::skewed_stdlib_u64  ... bench:   6496465 ns/iter (+/- 225418)
test bench::skewed_strconv_u08 ... bench:     79549 ns/iter (+/- 2836)
test bench::skewed_strconv_u16 ... bench:    433645 ns/iter (+/- 19394)
test bench::skewed_strconv_u32 ... bench:   5243582 ns/iter (+/- 285716)
test bench::skewed_strconv_u64 ... bench:   6464695 ns/iter (+/- 240468)

arthurprs · June 25, 2015, 10:37pm

Issue here for those who might be interested https://github.com/rust-lang/rust/issues/26578

system · March 25, 2019, 8:24am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Proposal: integer conversion methods	27	3234	May 12, 2019
Optimizing Fallback Algorithms for Float Parsing internals	6	434	January 28, 2025
No sane way to generate [1., 1.5, 2., 2.5].iter().cloned()...?	3	401	July 12, 2024
About optimizations of `for` loops compiler	6	1154	August 25, 2023
[Review request] Floating point formatting libs	1	923	March 25, 2019

Optimize Int to String Conversions

Related topics