With the recent discussion of, and major advances in, the float-parsing algorithms in Rust core, I believe it's time to consider a major use case of float parsing in Rust: parsing floats in storage formats such as ENDF-6, TOML, JSON, and many more.
These decimal strings representing floats may differ significantly in syntax from what Rust considers a valid float string. For example, in JSON, we can annotate the following strings as valid or invalid floats:
"NaN" // invalid
"nan" // invalid
"1.23" // valid
"1.23e" // invalid
"1." // invalid
".1" // invalid
"1.23e5" // valid
"+1.23e5" // invalid
"-1.23e5" // valid
Meanwhile, the same strings are annotated as follows when using str::parse:
"NaN" // valid
"nan" // invalid
"1.23" // valid
"1.23e" // invalid
"1." // valid
".1" // valid
"1.23e5" // valid
"+1.23e5" // valid
"-1.23e5" // valid
In short, these implementations cannot use Rust core's float parser as-is, due to the design choices of the core library. Although this might be fine in most languages, Rust is a common choice for implementing high-performance parsers for data interchange formats, and the performance and abundance of features in the standard library are two of the major reasons for this. There are therefore two common alternatives when parsing a data interchange format:
- Create your own float parser (or fork an existing implementation, such as serde-json).
- Tokenize the float, and re-format it so it can be passed on to str::parse.
Neither of these solutions is ideal for high-performance parsing, either due to the complexity of correct float parsers in the former, or the major performance issues in the latter.
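To make the second alternative concrete, here is a minimal sketch, assuming the input is a JSON number as defined by RFC 8259. The names is_json_number and parse_json_float are hypothetical and not taken from any existing crate. The input is scanned once to enforce the JSON grammar and then scanned a second time by str::parse. For JSON, the accepted strings should already be valid input for str::parse, so no re-formatting is needed; a format with its own exponent or separator syntax would additionally have to build a new string before the second step.

```rust
/// Hypothetical helper: returns true if `s` matches the JSON number grammar
/// [ "-" ] int [ "." 1*DIGIT ] [ ("e" / "E") [ "+" / "-" ] 1*DIGIT ]
fn is_json_number(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // Optional leading minus; a leading plus is not allowed.
    if b.get(i) == Some(&b'-') {
        i += 1;
    }

    // Integer part: a single "0", or a non-zero digit followed by digits.
    match b.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => {
            while i < b.len() && b[i].is_ascii_digit() {
                i += 1;
            }
        }
        _ => return false,
    }

    // Optional fraction: "." followed by at least one digit.
    if b.get(i) == Some(&b'.') {
        i += 1;
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // Optional exponent: "e" or "E", optional sign, at least one digit.
    if matches!(b.get(i), Some(b'e') | Some(b'E')) {
        i += 1;
        if matches!(b.get(i), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let start = i;
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return false;
        }
    }

    // The whole input must have been consumed.
    i == b.len()
}

/// Hypothetical helper: first pass validates the syntax, second pass
/// (inside `str::parse`) walks the same bytes again to produce the value.
fn parse_json_float(s: &str) -> Option<f64> {
    if !is_json_number(s) {
        return None;
    }
    s.parse::<f64>().ok()
}
```

With this sketch, parse_json_float("1.23e5") yields Some(123000.0), while "1.", ".1", and "+1.23e5" yield None even though str::parse on its own accepts them.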
A solution that would satisfy the vast majority of cases would be as follows:
pub trait FloatFromParts {
    fn from_parts(integral: &str, fractional: &str, exponent: i64, negative: bool) -> Self;
}

impl FloatFromParts for f32 {
    ...
}

impl FloatFromParts for f64 {
    ...
}
This has major advantages:
- It allows significant code re-use with dec2flt, since we already need to parse the integral and fractional digits separately, parse an exponent to an i64, and determine if the float is negative. All the internal algorithms will therefore share the same code.
- It covers the vast majority of cases, without adding performance penalties. The only common case this does not cover is floats with digit separators.
This would allow numerous data-interchange parsers (as well as compilers written in Rust) to use the Rust core library for float parsing. This would also require minimal additions to dec2flt. This would not include special values (which is a feature, since many data interchange formats, such as JSON, do not support special floats).
Note: This would only encompass decimal strings, so float strings like C/C++ hexadecimal strings would not be included.
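To illustrate the intended semantics of the proposed from_parts, here is a naive reference sketch. It is not how dec2flt would implement the trait: it rebuilds a string and delegates to str::parse, which reintroduces exactly the re-formatting cost the proposal is meant to avoid. The helper rebuild and the choice to panic on non-digit input are assumptions made for this example only; the sketch assumes the caller's tokenizer has already validated the digits.

```rust
pub trait FloatFromParts {
    fn from_parts(integral: &str, fractional: &str, exponent: i64, negative: bool) -> Self;
}

/// Hypothetical helper: rebuilds a decimal string that `str::parse` accepts.
/// A "0" is added before the integral digits and after the fractional digits
/// so that either part may be empty without changing the value.
fn rebuild(integral: &str, fractional: &str, exponent: i64, negative: bool) -> String {
    // Assumption: the caller has already tokenized the input, so both parts
    // contain only ASCII digits. Anything else is treated as a caller bug.
    assert!(
        integral.bytes().chain(fractional.bytes()).all(|b| b.is_ascii_digit()),
        "integral and fractional parts must contain only ASCII digits"
    );
    let sign = if negative { "-" } else { "" };
    format!("{}0{}.{}0e{}", sign, integral, fractional, exponent)
}

impl FloatFromParts for f32 {
    fn from_parts(integral: &str, fractional: &str, exponent: i64, negative: bool) -> Self {
        rebuild(integral, fractional, exponent, negative)
            .parse()
            .expect("rebuilt string should be a valid float")
    }
}

impl FloatFromParts for f64 {
    fn from_parts(integral: &str, fractional: &str, exponent: i64, negative: bool) -> Self {
        rebuild(integral, fractional, exponent, negative)
            .parse()
            .expect("rebuilt string should be a valid float")
    }
}
```

Under these assumptions, a JSON tokenizer that has split "-1.23e5" into its parts would call f64::from_parts("1", "23", 5, true) and get -123000.0. Special values such as NaN cannot be expressed through this interface, matching the note above.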