Implementing a Fast, Correct Float Parser

~2 years ago, I posted about making Rust float-parsing fast and correct. Although I had planned on working towards integrating these changes into the Rust core library, due to personal reasons (mental health), I needed to take a step back. I'm much better suited to implement the necessary changes now, and also, have improved various algorithms along the way. In addition, for integration into a popular crate, I've created a barebones library that should be a sufficient starting point for integration this functionality into core.

If this is of interest, please respond here so I can find appropriate channels to work towards integration this into the rust core library.

Background

First, a quick bit of background, in order to ensure everyone reading this is familiarized with the issues in accurate float parsing. IEEE-754 floating point numbers have a sign bit, exponent bits, and then mantissa (significant digit) bits, and the precision is 1+mantissa_size, due to an implicit hidden bit. Although float parsing is generally simple, decimal (base-10) numbers cannot be exactly represented in a fixed-width binary (base-2) float, and it can take up to 767 digits to unambiguously determine how to round the number.

A simple example of this phenomenon is, for single-precision (f32) floats:

  • 16777216.0 => 100000000000000000000000 0
  • 16777217.0 => 100000000000000000000000 1
  • 16777218.0 => 100000000000000000000001 0
  • 16777219.0 => 100000000000000000000001 1
  • 16777220.0 => 100000000000000000000010 0

The first bit is the implicit, hidden bit. The next 23 bits are the significant digits, prior to rounding. The bit after the space is truncated, prior to rounding. IEEE-754 defaults to round-nearest, tie-even, so the examples would therefore round to:

  • 16777216.0 => 100000000000000000000000 0 => 16777216.0
  • 16777217.0 => 100000000000000000000000 1 => 16777216.0
  • 16777218.0 => 100000000000000000000001 0 => 16777218.0
  • 16777219.0 => 100000000000000000000001 1 => 16777220.0
  • 16777220.0 => 100000000000000000000010 0 => 16777220.0

This can get significantly more complex with larger more complicated numbers, since we can require 112 digits for f32, or 767 digits for f64 to determine the correct way to round.

For terminology, for near-halfway cases we will use the terms b to represent the float, rounded-down, b+h to represent the halfway case, and b+u to represent the next, positive float.

Approach

A correct float parser is generally broken into 3 general algorithms:

  • A fast path algorithm, using native floats.
  • 0 or more moderate path algorithms, using extended-precision, fixed-width floats.
  • A slow path algorithm, using arbitrary-precision arithmetic.

We'll assume we've parsed the number at this stage, and extracted the significant digits, the raw exponent, and the exponent relative to the significant digits. This expects all leading zeros in the mantissa to be trimmed, and all trailing zeros (after the decimal point) to similarly be trimmed.

For example, "1.2345e5" would be:

let mantissa = "12345";
let raw_exponent = 5;
let mantissa_exponent = 1;
let digits = mantissa.parse::<u64>().unwrap();

Just a quick notice: all the estimates of performance below are estimated on an i7-6560U CPU @ 2.20GHz, using the rust target x86_64-unknown-linux-gnu.

Fast Path

A decimal number can be exactly represented as a binary float if the number of digits can fit exactly in the mantissa of the float, without truncation, and the exponent can also be represented as a native float. For an f32, the exponent limits are [-10, 10], inclusive, and for an f64, they are [-22, 22], inclusive. This is simply ⌊(mantissa_size + 1) / log2(5)⌋, since we can remove a power-of-two (since it can be exactly represented).

If the significant digits can be represented exactly without truncation (digits >> 53 == 0), and the exponent can be represented exactly (mantissa_exponent >= -22 && mantissa_exponent <= 22), then we have a valid float, and can simply multiply digits * 10**exponent, where the power is precomputed to ensure accuracy.

If the exponent is larger than this limit, we might still have a disguised fast-path algorithm, and can shift digits from the exponent to the mantissa. If the exponent is >10 && <= 17 (for f32) or > 2 &&2 <= 37 (for f64), we can try to shift digits over, see if we still have an untruncated mantissa, and then multiply digits * 10**exponent as before.

Parsing using the fast-path can take ~20-45ns, depending on the length of the input, and almost entirely depends on the speed of parsing the mantissa. Rust already has a fast-path algorithm, although I am unsure if it handles it does not handle disguised fast-path cases.

Moderate Path

If we cannot use the fast-path algorithm, we need to fall back to 0 or more moderate-path algorithms, which are considerably faster than the slow-path algorithms. My library uses extended-precision float representations, and there are 2 algorithms I'd like to propose for inclusion (of which, any or both of them may be included).

  • An extended-float representation using 64-bits for the mantissa and 16-bits for the binary exponent.
  • The Eisel-Lemire algorithm, which creates a 128-bit (fallback 192-bit) representation of the mantissa and infers the binary exponent from the mantissa exponent.

Both algorithms use 64-bit multiplication to scale the normalized mantissa (the MSB is 1) by the decimal exponent, using precomputed powers of 10. However, the extended-float representation truncates intermediate results to 64-bits, and calculates the error from this truncation, to estimate if the errors could lead to ambiguity in halfway cases. An implementation of this algorithm can be found here, although this is based off the Go implementation. For licensing reasons, I would likely explain the algorithm to a core developer ( which are trivial to implement), add the necessary routines in the ExtendedFloat struct to core (which I can contribute freely). Meanwhile, the Eisel-Lemire algorithm computes the high and low bits of the 64-bit multiplication, and uses them to create a 128-bit representation of the float. If this representation cannot be differentiated from halfway cases, a 192-bit representation is attempted. An implementation of this algorithm can be found here, and it is similarly based off the Go implementation.

Both algorithms, however, are trivial to implement and may be done in <80 lines of code. Parsing using the fast-path can take ~50-60ns, depending on the length of the input, and handles the vast majority of the remaining cases. Note that there is not 100% overlap between both algorithms, so if performance is the primary concern, both may be implemented together for optimal speed.

If the extended-float algorithm is used, it requires of 76 u64s for precomputed powers of 10. If the Eisel-Lemire algorithm is used, it requires ~1300 u64s for precomputed powers of 10. If both are used, the extended-float algorithm can use Eisel-Lemire's precomputed powers, so no additional storage is required.

Slow Path

Although initially (as described in the post) I used a modified form of Algorithm-M (which Rust currently uses, to my knowledge), I have implemented my own algorithms with much faster performance here. The general approach is quite simple: create an arbitrary-precision integer (with at most ~3600 bits of precision), parse the significant digits into the big integer.

If the value is larger than 1, we can use large_atof, which then scales the significant digits by a power-of-ten, extracts the high 64-bits of the number, creates an ExtendedFloat from this representation, and rounds-nearest, ties-even.

If the value is less than 1 (including denormal floats), we use the create a theoretical representation of the bits of b (generated from the moderate path). We then scale the two numbers to the same binary exponent (first by removing a power-of-two) and significant digits, compare the digits, and round up or down accordingly.

These algorithms are significantly faster than rust core (by up to 500x), and are more comprehensive (can parse large and subnormal floats that are beyond the scope of core right now). The speed of these algorithms for near-halfway cases is ~200ns (for 20-30 significant digits), ~1.2μs for 400 digits, and ~6.5μs for 6400 digits. This is orders of magnitude faster than rust's existing implementation, and comparable with GCC's strtod for very large inputs (but considerably faster for small inputs).

Validation

In order to ensure the float parser is correct, I've done extensive validation, using a curated test-suite of near-halfway cases and other difficult-to-parse examples, submitted it to the comprehensive float tests from the rust core library, and the curated tests used by Wuzz and Go for validation. Furthermore, these algorithms have been used in serde-json, nom, and downloaded more than 5 million times with no reported issues in correctness since very initial versions. If there are errors in correctness, it is extremely unlikely.

Additional Considerations

The last time I proposed integrating slow-path algorithms into rust, I learned that the internal big-integer types do not do normalization after operations (this may have changed recently). As my algorithms only need normalization to the nearest digit (so there are no zero most-significant-digits), this should be trivial to alleviate with a normalize function.

Appendix - Radix Conversions

The algorithm for determining the number of digits required to round is the following, in Python:

import math

def digits(emin, p2, b=10):
    return -emin + p2 + math.floor((emin+ 1)*math.log(2, b)-math.log(1-2**(-p2), b))

f32_digits = digits(-126, 24)
f64_digits = digits(-1022, 53)

Appendix - Repository

I've created a minimal repository (so I don't have to build Rust everytime I make changes) tracking these changes, with each branch representing changes to improve each alghorithm.

Appendix - Open Issues and Pull Requests

#85198: Improved Performance for Disguised Fast-Path Cases in Float Parsing

#85214 Infer, Rather than Store, Binary Exponents when Float Parsing

#85234 Improve Float Parsing Speeds by up to 99.94% Through Improvements to the Bellerophon Algorithm

16 Likes

Looking more carefully at the fast path implementation, it does not seem core currently handles disguised fast path cases. This would be a trivial PR, and would be a first step towards improving float parsing in rust.

2 Likes

I've since opened 3 issues with improvements to the fast-path and moderate-path algorithms. These focus on relatively minor, incremental changes with major performance impacts, since Hanna Kruppe, who originally authored Rust's float parsing algorithms, has retired from Rust core development.

Any review would be greatly appreciated. In addition, another library has considered integrating its float-parsing algorithms into Rust:

6 Likes

I've submitted a PR implementing a modified version of fast-float-rust for better safety guarantees and improved code generation to core. Feedback is more than appreciated.

10 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.