Optimize Int to String Conversions


#1

As of today rust uses a naive implementation for converting integers to decimal strings in Int Debug/Display methods. So I propose adding an optimization for the most commonly used cases, that is, converting to decimal characters.

This should be non measurable overall but might give us a minor speedup on serde and rustc serializers.

I wrote a reasonably optimized version here. Further optimizations are possible but I tried to keep the code size small (which I think is important), it’s a road of diminished gains.

The benchmarks are nothing scientific but I tried to avoid the most common pitfalls. Timing includes formating machinery (buffer is always preallocated though) so it should be reasonably close to the real world code path.

Running with rust 1.2 nightly @ x64 Linux - Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz

test tests::new_08    ... bench:       562 ns/iter (+/- 111) (-18%)
test tests::new_16    ... bench:      1424 ns/iter (+/- 60)  (-08%)
test tests::new_32    ... bench:      3342 ns/iter (+/- 92)  (-16%)
test tests::new_64    ... bench:      7692 ns/iter (+/- 373) (-48%)
test tests::stdlib_08 ... bench:       626 ns/iter (+/- 12)
test tests::stdlib_16 ... bench:      1540 ns/iter (+/- 113)
test tests::stdlib_32 ... bench:      3887 ns/iter (+/- 72)
test tests::stdlib_64 ... bench:     11436 ns/iter (+/- 317)

#2

One problem I can see with your benchmarks is the 0ns baseline – this basically means your code got inclined to nothing (which you usually want to avoid in benchmarks).

Apart from that, good job. I see that you use a lookup table for the common case. AFAIK Java also does this, though on some platforms direct arithmetic conversion is faster.

As Kirk Pepperdine says: "Measure, don’t guess!"™


#3

The baseline (_warmup) is just to warm-up the processor (get it to a stable frequency), it takes 0ns because it’s a noop loop. I’ll remove that line to avoid further confusion.

As much as modern processors have a highly parallel ALU unit I doubt the arithmetic alternatives will be faster . I’ll give it a try though!


#4

On my system, a noop loop takes more than 0ns – so either there’s something wrong with my system, or your _warmup || 1 closure gets inlined away. I’ll investigate and come back with my findings.

Edit: I had forgotten to compile my benchmark with -O :blush:


#5

Are you aware of rust-strconv?


#6

I am. Heroic efforts on the float to string side.

I wanted a small/fast self contained version for integrating in stdlib though.


#7

I was pointing out that rust-strconv has int to string conversion, in addition to float to string conversion.


#8

I see. Judging by a quick look at the repo code I think my proposal is both smaller and faster, probably not in all cases though.

I’m fine with anything that’s faster than what we have today.


#9

Judging by a quick look at the repo code I think my proposal is both smaller and faster,

"Measure, don’t guess"™ — Kirk Pepperdine (and yes, he has trademarked that phrase)


#10

Fair enough, here are some more exaustive results.

Code here I used a fixed seed for a rng generator to make sure I got a good sample for each range.

test bench::new_u08     ... bench:     76928 ns/iter (+/- 8229)
test bench::new_u16     ... bench:    380763 ns/iter (+/- 22530)
test bench::new_u32     ... bench:   4091478 ns/iter (+/- 163228)
test bench::new_u64     ... bench:   5547373 ns/iter (+/- 413232)
test bench::stdlib_u08  ... bench:     83228 ns/iter (+/- 4394)
test bench::stdlib_u16  ... bench:    392138 ns/iter (+/- 28264)
test bench::stdlib_u32  ... bench:   4540254 ns/iter (+/- 332782)
test bench::stdlib_u64  ... bench:   6813518 ns/iter (+/- 478703)
test bench::strconv_u08 ... bench:     86298 ns/iter (+/- 6344)
test bench::strconv_u16 ... bench:    390341 ns/iter (+/- 17149)
test bench::strconv_u32 ... bench:   4564609 ns/iter (+/- 299032)
test bench::strconv_u64 ... bench:   6710341 ns/iter (+/- 341379)

#11

I updated the benchmarks with a small number skewed verion, which reflects real world usage much better.

Example length histogram for 32 bit numbers: [0, 4627, 451, 502, 551, 576, 647, 751, 757, 853, 285, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

test bench::skewed_new_u08     ... bench:     70107 ns/iter (+/- 4295)
test bench::skewed_new_u16     ... bench:    375911 ns/iter (+/- 32367)
test bench::skewed_new_u32     ... bench:   4408632 ns/iter (+/- 177293)
test bench::skewed_new_u64     ... bench:   5185259 ns/iter (+/- 233081)
test bench::skewed_stdlib_u08  ... bench:     77604 ns/iter (+/- 2557)
test bench::skewed_stdlib_u16  ... bench:    432629 ns/iter (+/- 24703)
test bench::skewed_stdlib_u32  ... bench:   5285474 ns/iter (+/- 262226)
test bench::skewed_stdlib_u64  ... bench:   6496465 ns/iter (+/- 225418)
test bench::skewed_strconv_u08 ... bench:     79549 ns/iter (+/- 2836)
test bench::skewed_strconv_u16 ... bench:    433645 ns/iter (+/- 19394)
test bench::skewed_strconv_u32 ... bench:   5243582 ns/iter (+/- 285716)
test bench::skewed_strconv_u64 ... bench:   6464695 ns/iter (+/- 240468)

#12

Issue here for those who might be interested https://github.com/rust-lang/rust/issues/26578