[algorithm] new float/double to string algorithm

xjb714 · October 31, 2025, 5:41pm

xjb714/xjb: fast float to string algorithm.

This is my work. Everyone is welcome to learn about it and use it.

Vorpal · October 31, 2025, 5:51pm

It is not in Rust, so it is hard to tell how well it would port over.
I only see microbenchmarks. Those are often not very useful: in a real program, an algorithm that is slower on paper but uses significantly less instruction cache can be faster if the program as a whole is cache-bound or memory-bandwidth-bound.

This last point is especially important: lookup tables tend to be terrible in real code but look good in benchmarks. And you have some really big lookup tables. No thanks!

zackw · October 31, 2025, 6:15pm

I endorse what @Vorpal said, particularly about the gigantic lookup tables. It looks like the biggest one contains precalculated powers of ten. You might be able to compact this down considerably by computing an approximation to the correct value and then having the table store only a correction. I'd guess that a lookup table on the order of 16 bits per exponent would be an acceptable cost in terms of cache footprint.
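To make the "approximation plus correction" idea concrete, here is a toy Rust sketch under loudly stated assumptions; it is not how xjb, or any existing library, builds its table. It relies on the fact that the normalized significand of 10^k equals that of 5^k, because 10^k = 5^k * 2^k. It keeps one full 64-bit entry every `STRIDE` powers, rebuilds the others by multiplying by an exact small power of 5, and stores a small per-entry correction. The names `STRIDE`, `exact_sig`, `approx_sig`, and `CompressedTable` are hypothetical, and the range is limited to 5^0 through 5^55 (the largest power of 5 that fits in a `u128`) so that the exact reference needs no bignum crate.

```rust
// Hypothetical toy: compress a table of 64-bit significands of 5^k
// (equivalently 10^k) into sparse full entries plus 2-bit corrections.
// Only valid for k <= 55, the largest power of 5 that fits in a u128.

const STRIDE: u32 = 8; // one full entry every STRIDE powers (hypothetical choice)

/// Exact reference: top 64 bits of 5^k, shifted so the top bit is set (truncated).
fn exact_sig(k: u32) -> u64 {
    let p = 5u128.pow(k); // overflows (panics in debug) for k > 55
    ((p << p.leading_zeros()) >> 64) as u64
}

/// Approximate 5^(base + r) from the stored entry for 5^base times the exact
/// small factor 5^r (r < 8, so 5^r < 2^17 and the product fits in a u128).
fn approx_sig(base_sig: u64, r: u32) -> u64 {
    let prod = base_sig as u128 * 5u128.pow(r);
    ((prod << prod.leading_zeros()) >> 64) as u64
}

struct CompressedTable {
    bases: Vec<u64>,      // full 64-bit entries for 5^0, 5^8, 5^16, ...
    corrections: Vec<u8>, // exact - approx for every k; each fits in 2 bits
}

impl CompressedTable {
    fn build(max_k: u32) -> Self {
        let bases: Vec<u64> = (0..=max_k).step_by(STRIDE as usize).map(exact_sig).collect();
        let corrections: Vec<u8> = (0..=max_k)
            .map(|k| {
                let approx = approx_sig(bases[(k / STRIDE) as usize], k % STRIDE);
                // Truncating the stored base can only make the estimate too small.
                let diff = exact_sig(k).checked_sub(approx).expect("approximation overshot");
                assert!(diff < 4, "correction for 5^{k} does not fit in 2 bits");
                diff as u8
            })
            .collect();
        CompressedTable { bases, corrections }
    }

    /// Reconstruct the full 64-bit significand of 5^k from the compressed form.
    fn sig(&self, k: u32) -> u64 {
        let approx = approx_sig(self.bases[(k / STRIDE) as usize], k % STRIDE);
        approx + self.corrections[k as usize] as u64
    }
}
```

With `STRIDE = 8`, the toy table holds 7 full entries plus 56 corrections of at most 2 bits each, instead of 56 full 64-bit entries. A real binary64 converter needs 128-bit entries and negative exponents as well, so the stride and correction width would have to be re-derived and exhaustively checked for that range.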

Also, I looked through your linked repository and I don't see any concrete analysis or even testing of the accuracy of the result of the conversion. You do mention the key accuracy criteria in your paper:

1. Information preservation: The print result can be parsed back to the original floating-point number.
2. Minimum length: The print result should be as short as possible.
3. Correct rounding: On the basis of satisfying 1 and 2, if there are two candidate values they should be correctly rounded ~~(i.e., the even value should be selected)~~.[1]

but I don't see anywhere you actually verified that either your algorithm or its implementation has these properties. You need to do that verification and document the results.
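One way to start on that verification is a property-test harness that checks each quoted criterion directly. The Rust sketch below is an illustration under assumptions, not the author's test suite: `check`, `run`, and `sig_digits` are hypothetical names, the converter under test is passed in as a closure (xjb itself would have to be ported or called through FFI), and std's shortest `{:e}` output is assumed to be a trustworthy reference for the minimum-length and rounding checks. Random bit patterns come from a xorshift64 generator so that no crates are needed.

```rust
/// Significant decimal digits of "1.2345e-7", "0.00120", "1200", etc.,
/// with sign, decimal point, exponent, and leading/trailing zeros removed.
fn sig_digits(s: &str) -> String {
    let mantissa = s.split(|c: char| c == 'e' || c == 'E').next().unwrap_or("");
    let digits: String = mantissa.chars().filter(|c| c.is_ascii_digit()).collect();
    digits.trim_start_matches('0').trim_end_matches('0').to_string()
}

/// Check one value against the three criteria.
fn check(x: f64, candidate: &dyn Fn(f64) -> String) -> Result<(), String> {
    let s = candidate(x);
    // 1. Information preservation: the text parses back to the same bits.
    let back = s
        .parse::<f64>()
        .map_err(|e| format!("{x:e}: {s:?} does not parse: {e}"))?;
    if back.to_bits() != x.to_bits() {
        return Err(format!("{x:e}: {s:?} parses back to {back:e}"));
    }
    // Reference digits: std's shortest output (assumed shortest and closest).
    let want = sig_digits(&format!("{x:e}"));
    let got = sig_digits(&s);
    // 2. Minimum length.
    if got.len() > want.len() {
        return Err(format!("{:e}: {:?} has {} digits, shortest is {}", x, s, got.len(), want.len()));
    }
    // 3. Same choice as the reference when several shortest strings round-trip.
    if got != want {
        return Err(format!("{x:e}: digits {got} differ from reference {want}"));
    }
    Ok(())
}

/// Check `n` pseudo-random bit patterns, skipping NaN, infinities, and zeros.
fn run(n: usize, candidate: &dyn Fn(f64) -> String) -> Result<(), String> {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // arbitrary xorshift64 seed
    for _ in 0..n {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let x = f64::from_bits(state);
        if x.is_finite() && x != 0.0 {
            check(x, candidate)?;
        }
    }
    Ok(())
}
```

For example, `run(20_000, &|x: f64| x.to_string())` should succeed, while `run(1_000, &|x: f64| format!("{x:.3e}"))` fails the round-trip check and `{x:.17e}` fails the minimum-length check. For `f32`, the 2^32 bit patterns are few enough to check exhaustively, which is much stronger evidence than random sampling.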

Also also, float-to-text conversion in practice is almost always paired with rounding to a specific number of decimal places. To be a compelling replacement for existing conversion routines, your algorithm should handle this: both for efficiency -- you should be able to speed things up when only a few decimal places are wanted -- and to avoid double rounding.
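The double-rounding problem is easy to reproduce: taking the shortest round-trip digits first and then rounding that string to a fixed number of places can give a different answer from rounding the exact binary value once. The sketch below is illustrative only. `naive_round` is a hypothetical helper that applies round-half-up to a positive, non-exponent decimal string such as Rust's `Display` output, and std's `format!("{:.2}", x)`, which rounds from the exact binary value, serves as the single-rounding reference.

```rust
/// Hypothetical helper: round a positive decimal string like "2.675"
/// half-up to `places` fraction digits. Applied to shortest output,
/// this is the second rounding that causes the problem.
fn naive_round(shortest: &str, places: usize) -> String {
    let (int_part, frac_part) = shortest.split_once('.').unwrap_or((shortest, ""));
    let mut digits: Vec<u8> = int_part.bytes().map(|b| b - b'0').collect();
    let frac: Vec<u8> = frac_part.bytes().map(|b| b - b'0').collect();
    // Keep `places` fraction digits, padding with zeros if there are fewer.
    for i in 0..places {
        digits.push(*frac.get(i).unwrap_or(&0));
    }
    // Round half up on the first dropped digit, propagating carries.
    if frac.get(places).map_or(false, |&d| d >= 5) {
        let mut i = digits.len();
        loop {
            if i == 0 {
                digits.insert(0, 1); // e.g. "9.995" -> "10.00"
                break;
            }
            i -= 1;
            if digits[i] == 9 {
                digits[i] = 0;
            } else {
                digits[i] += 1;
                break;
            }
        }
    }
    let split = digits.len() - places;
    if places == 0 {
        digits_to_string(&digits)
    } else {
        format!("{}.{}", digits_to_string(&digits[..split]), digits_to_string(&digits[split..]))
    }
}

fn digits_to_string(ds: &[u8]) -> String {
    ds.iter().map(|&d| (b'0' + d) as char).collect()
}
```

The double closest to 2.675 is slightly below it (2.67499999999999982236431605997495353221893310546875), so the single, correct rounding to two places is `2.67`, while rounding the shortest string `2.675` a second time gives `2.68`. The same happens with 1.005, which prints as `1.005` but correctly rounds to `1.00`.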

[1] "Correctly rounded" does not always mean "the last digit should be even". Rather, floating point to text conversion should honor the rounding mode of the thread's floating point environment, and it may be desirable to be able to override that on a case-by-case basis with arguments to the conversion routine.

xjb714 · October 31, 2025, 6:40pm

Test link:

Floating-point number printing is very complex. I merely propose an algorithm whose output is identical to that of algorithms such as Ryu, Dragonbox, and Schubfach. If you are interested, you can first learn about those algorithms and then about mine.

zackw · October 31, 2025, 6:43pm

One more thing: allow me to suggest some more realistic whole-program benchmarks.

Write a large matrix of numbers to disk as a CSV file.
Write a large array of records, where some but not all of the fields are floats, to disk as a CSV file.
Convert a large set of geographic polygons from "well-known binary" to "well-known text" representation and write the result to disk.

For the first two, "large" should be at least tens of megabytes for the in-RAM representation, and you should not stop hacking until you're significantly faster than vroom. For the last, "large" means something like a relatively fine-resolution map of the world, e.g. the Natural Earth 1:50m or 1:10m vector maps, and I don't have a good reference to suggest off the top of my head (GIS libraries in general are slow beasts).
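A minimal std-only Rust sketch of the first suggested benchmark follows; `write_csv`, `std_fmt`, and `bench` are hypothetical names, the matrix is uniform random data from a xorshift64 generator, and output goes to `io::sink()` so that only formatting and buffering are timed. Swapping the sink for a `std::fs::File` would include the disk, and dimensions such as 100_000 rows by 40 columns (4 million `f64`s, 32 MB in RAM) reach the suggested size. The converter under test plugs in through the `fmt` callback.

```rust
use std::fmt::Write as _; // lets write! append to a String
use std::io::{self, BufWriter, Write};
use std::time::{Duration, Instant};

/// Append one value's text to `buf` using std's shortest formatting.
/// Replace this with the converter being benchmarked.
fn std_fmt(x: f64, buf: &mut String) {
    write!(buf, "{x}").expect("writing to a String cannot fail");
}

/// Write a row-major matrix with `cols` columns (cols > 0) as CSV.
fn write_csv<W: Write>(out: W, data: &[f64], cols: usize, fmt: &dyn Fn(f64, &mut String)) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    let mut line = String::new(); // reused per row to avoid reallocating
    for row in data.chunks(cols) {
        line.clear();
        for (i, &x) in row.iter().enumerate() {
            if i > 0 {
                line.push(',');
            }
            fmt(x, &mut line);
        }
        line.push('\n');
        w.write_all(line.as_bytes())?;
    }
    w.flush()
}

/// Time writing a `rows x cols` matrix of values in [0, 1000) to a sink.
fn bench(rows: usize, cols: usize, fmt: &dyn Fn(f64, &mut String)) -> io::Result<Duration> {
    let mut state: u64 = 0x2545_F491_4F6C_DD1D; // arbitrary xorshift64 seed
    let data: Vec<f64> = (0..rows * cols)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state >> 11) as f64 / (1u64 << 53) as f64 * 1000.0
        })
        .collect();
    let start = Instant::now();
    write_csv(io::sink(), &data, cols, fmt)?;
    Ok(start.elapsed())
}
```

For example, `bench(100_000, 40, &std_fmt)` times the std baseline; running it again with another callback gives a whole-program comparison rather than a per-call microbenchmark.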

xjb714 · October 31, 2025, 6:45pm

Please note that some of the data in the project is outdated because I haven't updated it yet. The latest results are in the paper.

zackw · October 31, 2025, 6:45pm

Please understand that this comes across as "I had a clever idea but I'm not interested in doing the bulk of the work required to make it actually useful in practice." You're entitled to take that position but it is likely that no one else will be interested in doing that work either.

xjb714 · October 31, 2025, 6:53pm

Printing a floating-point number usually involves two steps: (1) binary to decimal; (2) decimal to ASCII. I have only completed the first and most crucial step. Writing data to a CSV file usually includes printing floating-point numbers, so I will run that test after completing the second step.
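The two-step split can be sketched as follows; this is an illustration, not xjb's code. Step 1 (binary to decimal) is stood in for by parsing std's shortest `{:e}` output, and step 2 (decimal to ASCII) renders `sig * 10^exp10` in scientific notation. `decimal_via_std` and `to_ascii` are hypothetical names, and only positive, finite, nonzero inputs are handled.

```rust
/// Stand-in for step 1 (binary -> decimal), borrowing std's shortest `{:e}`
/// output for a positive finite `x`: returns (sig, exp10) where the shortest
/// decimal for x is sig * 10^exp10.
fn decimal_via_std(x: f64) -> (u64, i32) {
    let s = format!("{x:e}"); // e.g. "1.2345e-7"
    let (mantissa, exp) = s.split_once('e').expect("LowerExp output contains 'e'");
    let frac_len = mantissa.split_once('.').map_or(0, |(_, f)| f.len());
    let digits: String = mantissa.chars().filter(|c| c.is_ascii_digit()).collect();
    let sig: u64 = digits.parse().expect("at most 17 significant digits");
    let exp10 = exp.parse::<i32>().expect("integer exponent") - frac_len as i32;
    (sig, exp10)
}

/// Step 2 (decimal -> ASCII): append `sig * 10^exp10` in scientific notation,
/// e.g. (12345, -11) -> "1.2345e-7". Shortest step-1 output has no trailing
/// zeros in `sig`, so none are printed.
fn to_ascii(sig: u64, exp10: i32, out: &mut Vec<u8>) {
    assert!(sig > 0, "zero needs separate handling");
    // Peel off digits, least significant first (a u64 has at most 20).
    let mut buf = [0u8; 20];
    let mut n = 0;
    let mut s = sig;
    while s > 0 {
        buf[n] = b'0' + (s % 10) as u8;
        s /= 10;
        n += 1;
    }
    out.push(buf[n - 1]); // leading digit
    if n > 1 {
        out.push(b'.');
        out.extend(buf[..n - 1].iter().rev()); // remaining digits, most significant first
    }
    out.push(b'e');
    let sci_exp = exp10 + n as i32 - 1;
    out.extend_from_slice(sci_exp.to_string().as_bytes());
}
```

Keeping the steps separate lets each be tested on its own, while a round-trip test through both (parse the final text and compare with the input) only passes when both halves are correct.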
