~2 years ago, I posted about making Rust float parsing fast and correct. Although I had planned on working towards integrating these changes into the Rust core library, I needed to take a step back due to personal reasons (mental health). I'm much better suited to implement the necessary changes now, and I have also improved various algorithms along the way. In addition to its integration into a popular crate, I've created a bare-bones library that should be a sufficient starting point for integrating this functionality into core.
If this is of interest, please respond here so I can find appropriate channels to work towards integrating this into the Rust core library.
Background
First, a quick bit of background, to ensure everyone reading this is familiar with the issues in accurate float parsing. IEEE 754 floating-point numbers have a sign bit, exponent bits, and then mantissa (significant digit) bits, and the precision is 1 + mantissa_size, due to an implicit hidden bit. Although float parsing is generally simple, decimal (base-10) numbers cannot always be exactly represented in a fixed-width binary (base-2) float, and it can take up to 767 digits to unambiguously determine how to round the number.
A simple example of this phenomenon, for single-precision (f32) floats:

16777216.0
=> 100000000000000000000000 0

16777217.0
=> 100000000000000000000000 1

16777218.0
=> 100000000000000000000001 0

16777219.0
=> 100000000000000000000001 1

16777220.0
=> 100000000000000000000010 0
The first bit is the implicit, hidden bit. The next 23 bits are the significant digits, prior to rounding. The bit after the space is truncated prior to rounding. IEEE 754 defaults to round-nearest, ties-even, so the examples would therefore round to:

16777216.0
=> 100000000000000000000000 0
=> 16777216.0

16777217.0
=> 100000000000000000000000 1
=> 16777216.0

16777218.0
=> 100000000000000000000001 0
=> 16777218.0

16777219.0
=> 100000000000000000000001 1
=> 16777220.0

16777220.0
=> 100000000000000000000010 0
=> 16777220.0
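The rounding above can be sanity-checked against Rust's existing parser, which is already correct for these values:

```rust
fn main() {
    // 16777217 is exactly halfway between 16777216 and 16777218 in f32,
    // so round-nearest, ties-even rounds down to the even mantissa.
    assert_eq!("16777217.0".parse::<f32>().unwrap(), 16777216.0);
    // 16777219 is also a halfway case, but here ties-even rounds up.
    assert_eq!("16777219.0".parse::<f32>().unwrap(), 16777220.0);
    // Exactly representable values parse unchanged.
    assert_eq!("16777218.0".parse::<f32>().unwrap(), 16777218.0);
}
```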
This can get significantly more complex with larger, more complicated numbers, since we can require up to 112 digits for f32, or 767 digits for f64, to determine the correct way to round.
For terminology, for near-halfway cases we will use the term b to represent the float, rounded down; b+h to represent the halfway case; and b+u to represent the next, positive float.
Approach
A correct float parser is generally broken into 3 general algorithms:
- A fast-path algorithm, using native floats.
- 0 or more moderate-path algorithms, using extended-precision, fixed-width floats.
- A slow-path algorithm, using arbitrary-precision arithmetic.
We'll assume we've parsed the number at this stage, and extracted the significant digits, the raw exponent, and the exponent relative to the significant digits. This expects all leading zeros in the mantissa to be trimmed, and all trailing zeros (after the decimal point) to similarly be trimmed.
For example, "1.2345e5" would be:
let mantissa = "12345";
let raw_exponent = 5;
let mantissa_exponent = 1;
let digits = mantissa.parse::<u64>().unwrap();
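A minimal sketch of this extraction step, with a hypothetical helper name (parse_parts) and simplified input handling (no signs, no error recovery):

```rust
// Hypothetical sketch: split a decimal string into significant digits,
// the raw exponent, and the exponent relative to the significant digits.
fn parse_parts(s: &str) -> (u64, i32, i32) {
    // Split off an optional exponent suffix ("e5" or "E5").
    let (num, raw_exponent) = match s.split_once(|c| c == 'e' || c == 'E') {
        Some((n, e)) => (n, e.parse::<i32>().unwrap()),
        None => (s, 0),
    };
    let (int, frac) = num.split_once('.').unwrap_or((num, ""));
    // Trim leading zeros before the point and trailing zeros after it.
    let int = int.trim_start_matches('0');
    let frac = frac.trim_end_matches('0');
    let digits: String = int.chars().chain(frac.chars()).collect();
    let mantissa = digits.parse::<u64>().unwrap();
    // Each fractional digit shifts the exponent down by one.
    let mantissa_exponent = raw_exponent - frac.len() as i32;
    (mantissa, raw_exponent, mantissa_exponent)
}

fn main() {
    assert_eq!(parse_parts("1.2345e5"), (12345, 5, 1));
}
```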
Just a quick notice: all the performance estimates below were measured on an i7-6560U CPU @ 2.20GHz, using the Rust target x86_64-unknown-linux-gnu.
Fast Path
A decimal number can be exactly represented as a binary float if the number of digits can fit exactly in the mantissa of the float, without truncation, and the exponent can also be represented as a native float. For an f32, the exponent limits are [-10, 10], inclusive, and for an f64, they are [-22, 22], inclusive. This is simply ⌊(mantissa_size + 1) / log2(5)⌋, since we can remove a power-of-two (since it can be exactly represented).
If the significant digits can be represented exactly without truncation (digits >> 53 == 0 for f64), and the exponent can be represented exactly (mantissa_exponent >= -22 && mantissa_exponent <= 22), then we have a valid float, and we can simply multiply digits * 10**exponent, where the power is precomputed to ensure accuracy.
If the exponent is larger than this limit, we might still have a disguised fast-path case, and can shift digits from the exponent to the mantissa. If the exponent is > 10 && <= 17 (for f32) or > 22 && <= 37 (for f64), we can try to shift digits over, see if we still have an untruncated mantissa, and then multiply digits * 10**exponent as before.
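A sketch of the disguised fast path for f64, under the same illustrative naming (a real implementation would try the ordinary fast path first for exponents <= 22):

```rust
// Sketch: move the excess decimal exponent into the mantissa, then
// check whether the shifted digits still fit in 53 bits exactly.
fn disguised_fast_path(digits: u64, mantissa_exponent: i32) -> Option<f64> {
    if mantissa_exponent > 22 && mantissa_exponent <= 37 {
        let shift = (mantissa_exponent - 22) as u32;
        // Multiply the digits by 10^shift; bail out on u64 overflow.
        let shifted = digits.checked_mul(10u64.pow(shift))?;
        if shifted >> 53 == 0 {
            // Still untruncated: shifted * 10^22 with both operands exact.
            return Some(shifted as f64 * 1e22);
        }
    }
    None
}

fn main() {
    assert_eq!(disguised_fast_path(1, 30), Some(1e30));
    // Shifting overflows the 53-bit mantissa, so no disguised fast path.
    assert_eq!(disguised_fast_path(1 << 53, 23), None);
}
```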
Parsing using the fast path can take ~20-45ns, depending on the length of the input, and almost entirely depends on the speed of parsing the mantissa. Rust already has a fast-path algorithm, although I am unsure if it handles disguised fast-path cases.
Moderate Path
If we cannot use the fast-path algorithm, we need to fall back to 0 or more moderate-path algorithms, which are considerably faster than the slow-path algorithms. My library uses extended-precision float representations, and there are 2 algorithms I'd like to propose for inclusion (either or both of which may be included).
- An extended-float representation using 64 bits for the mantissa and 16 bits for the binary exponent.
- The Eisel-Lemire algorithm, which creates a 128-bit (fallback 192-bit) representation of the mantissa and infers the binary exponent from the mantissa exponent.
Both algorithms use 64-bit multiplication to scale the normalized mantissa (the MSB is 1) by the decimal exponent, using precomputed powers of 10. However, the extended-float representation truncates intermediate results to 64 bits, and calculates the error from this truncation, to estimate if the errors could lead to ambiguity in halfway cases. An implementation of this algorithm can be found here, although it is based off the Go implementation. For licensing reasons, I would likely explain the algorithm to a core developer (it is trivial to implement), and add the necessary routines in the ExtendedFloat struct to core (which I can contribute freely). Meanwhile, the Eisel-Lemire algorithm computes the high and low bits of the 64-bit multiplication, and uses them to create a 128-bit representation of the float. If this representation cannot be differentiated from halfway cases, a 192-bit representation is attempted. An implementation of this algorithm can be found here, and it is similarly based off the Go implementation.
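The shared core of both moderate-path algorithms is a 64x64 -> 128-bit multiply; a minimal sketch using u128 (the helper name is illustrative):

```rust
// Multiply a normalized 64-bit mantissa by a precomputed 64-bit
// (truncated) power of 10, keeping the high and low 64-bit halves.
fn mul_hi_lo(x: u64, y: u64) -> (u64, u64) {
    let wide = (x as u128) * (y as u128);
    ((wide >> 64) as u64, wide as u64)
}

fn main() {
    // 2^63 * 2 == 2^64: high half is 1, low half is 0.
    assert_eq!(mul_hi_lo(1 << 63, 2), (1, 0));
    // (2^64 - 1)^2 == 2^128 - 2^65 + 1.
    assert_eq!(mul_hi_lo(u64::MAX, u64::MAX), (u64::MAX - 1, 1));
}
```

The extended-float algorithm keeps only the high half plus an error bound, while Eisel-Lemire keeps both halves to form its 128-bit representation.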
Both algorithms, however, are trivial to implement and may be done in <80 lines of code. Parsing using the moderate path can take ~50-60ns, depending on the length of the input, and handles the vast majority of the remaining cases. Note that there is not 100% overlap between both algorithms, so if performance is the primary concern, both may be implemented together for optimal speed.
If the extended-float algorithm is used, it requires 76 u64s for precomputed powers of 10. If the Eisel-Lemire algorithm is used, it requires ~1300 u64s for precomputed powers of 10. If both are used, the extended-float algorithm can use Eisel-Lemire's precomputed powers, so no additional storage is required.
Slow Path
Although initially (as described in the post) I used a modified form of Algorithm M (which Rust currently uses, to my knowledge), I have since implemented my own algorithms with much faster performance here. The general approach is quite simple: create an arbitrary-precision integer (with at most ~3600 bits of precision), and parse the significant digits into that big integer.
If the value is larger than 1, we can use large_atof, which scales the significant digits by a power-of-ten, extracts the high 64 bits of the number, creates an ExtendedFloat from this representation, and rounds nearest, ties-even.
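The final rounding step can be sketched as follows, assuming we already have the high 64 bits of the scaled significand (MSB normalized) plus a sticky flag for any truncated lower bits; the function name is illustrative:

```rust
// Sketch of round-nearest, ties-even: keep the top 53 bits for f64,
// round on the next bit, and break ties toward an even mantissa.
fn round_nearest_ties_even(hi64: u64, truncated_nonzero: bool) -> u64 {
    let mantissa = hi64 >> 11;          // top 53 bits
    let round_bit = (hi64 >> 10) & 1;   // first discarded bit
    // Sticky: any discarded bit below the round bit is nonzero.
    let sticky = (hi64 & 0x3FF) != 0 || truncated_nonzero;
    let is_odd = mantissa & 1 == 1;
    if round_bit == 1 && (sticky || is_odd) {
        mantissa + 1 // caller must renormalize on mantissa overflow
    } else {
        mantissa
    }
}

fn main() {
    // No bits below the mantissa: value is exact.
    assert_eq!(round_nearest_ties_even(1 << 63, false), 1 << 52);
    // Exact halfway, even mantissa: ties-even rounds down.
    assert_eq!(round_nearest_ties_even((1 << 63) | (1 << 10), false), 1 << 52);
    // Exact halfway, odd mantissa: ties-even rounds up.
    assert_eq!(
        round_nearest_ties_even((1 << 63) | (1 << 11) | (1 << 10), false),
        (1 << 52) + 2
    );
}
```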
If the value is less than 1 (including denormal floats), we create a theoretical representation of the bits of b (generated from the moderate path). We then scale the two numbers to the same binary exponent (first by removing a power-of-two) and significant digits, compare the digits, and round up or down accordingly.
These algorithms are significantly faster than Rust core (by up to 500x), and are more comprehensive (they can parse large and subnormal floats that are beyond the scope of core right now). The speed of these algorithms for near-halfway cases is ~200ns (for 20-30 significant digits), ~1.2μs for 400 digits, and ~6.5μs for 6400 digits. This is orders of magnitude faster than Rust's existing implementation, and comparable with GCC's strtod for very large inputs (but considerably faster for small inputs).
Validation
In order to ensure the float parser is correct, I've done extensive validation: using a curated test suite of near-halfway cases and other difficult-to-parse examples, submitting it to the comprehensive float tests from the Rust core library, and running the curated tests used by Wuzz and Go for validation. Furthermore, these algorithms have been used in serde-json and nom, and downloaded more than 5 million times, with no reported issues in correctness since the very initial versions. Errors in correctness are therefore extremely unlikely.
Additional Considerations
The last time I proposed integrating slow-path algorithms into Rust, I learned that the internal big-integer types do not do normalization after operations (this may have changed recently). As my algorithms only need normalization to the nearest digit (so there are no zero most-significant digits), this should be trivial to alleviate with a normalize function.
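Assuming the big integer stores little-endian u64 limbs (an assumption about the internal representation), the normalize step amounts to:

```rust
// Sketch: strip zero most-significant limbs from a little-endian
// limb vector, so the highest limb is always nonzero (or the
// vector is empty, representing zero).
fn normalize(limbs: &mut Vec<u64>) {
    while limbs.last() == Some(&0) {
        limbs.pop();
    }
}

fn main() {
    let mut limbs = vec![1u64, 2, 0, 0];
    normalize(&mut limbs);
    assert_eq!(limbs, vec![1, 2]);
}
```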
Appendix - Radix Conversions
The algorithm for determining the number of digits required to round correctly is the following, in Python:
import math

def digits(emin, p2, b=10):
    return -emin + p2 + math.floor((emin + 1)*math.log(2, b) - math.log(1 - 2**(-p2), b))

f32_digits = digits(-126, 24)   # 112
f64_digits = digits(-1022, 53)  # 767
Appendix - Repository
I've created a minimal repository (so I don't have to build Rust every time I make changes) tracking these changes, with each branch representing changes to improve each algorithm.
Appendix - Open Issues and Pull Requests
#85198: Improved Performance for Disguised Fast-Path Cases in Float Parsing
#85214: Infer, Rather than Store, Binary Exponents when Float Parsing
#85234: Improve Float Parsing Speeds by up to 99.94% Through Improvements to the Bellerophon Algorithm