Unroll the integer-part digit scan (straight-line for the common 1-5 digit case) by fcostaoliveira · Pull Request #381 · fastfloat/fast_float

fcostaoliveira · 2026-06-01T09:52:11Z

The integer part of a number is scanned one byte at a time, while the fractional
part already uses the 8-digit SWAR loop (loop_parse_if_eight_digits). Integer parts
are usually short (1–5 digits), so the loop back-edge is a large share of the cost.
This peels the first five iterations into straight-line ifs and falls through to the
original loop for longer inputs. The arithmetic is unchanged (i = 10*i + digit), so
behavior is identical; one file, +29/−6, in the UC-templated path.
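Here is a minimal sketch of the peel, assuming a simplified stand-alone scanner. `scan_integer_peeled` and `is_digit` are hypothetical names. This is not the code from the patch, which writes the peel as nested ifs inside the UC-templated path. The five peeled steps are written here as a short-circuit `&&` chain over a small lambda. Once the lambda is inlined, that chain should lower to the same straight-line compare-and-branch sequence as nested ifs, though that depends on the compiler.

```cpp
#include <cstdint>

// Illustrative digit test; stands in for the library's own helper.
template <typename UC>
inline bool is_digit(UC c) {
  return c >= UC('0') && c <= UC('9');
}

// Hypothetical sketch of the peeled integer-part scan. Accumulates the decimal
// digits at the front of [p, pend) into i and returns a pointer to the first
// non-digit. Overflow is not checked: unsigned arithmetic wraps, so a caller
// must count digits separately for runs longer than 19.
template <typename UC>
const UC *scan_integer_peeled(const UC *p, const UC *pend, uint64_t &i) {
  // One iteration of i = 10*i + digit; returns false when no digit is left.
  auto step = [&]() -> bool {
    if (p == pend || !is_digit(*p)) {
      return false;
    }
    i = 10 * i + uint64_t(*p - UC('0'));
    ++p;
    return true;
  };
  // Five peeled iterations with no loop back-edge between them.
  if (step() && step() && step() && step() && step()) {
    // Six or more digits: fall through to the original loop.
    while (p != pend && is_digit(*p)) {
      i = 10 * i + uint64_t(*p - UC('0'));
      ++p;
    }
  }
  return p;
}
```

For `"12345678.9"` this consumes five digits in the peel and three in the fallback loop, leaving `i == 12345678` and the returned pointer on the `'.'`.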

Benchmark — m8g.metal-24xl (Graviton4), -O3 -march=native,
simple_fastfloat_benchmark, from_chars→double, base vs patch measured
back-to-back (mean of 2 runs):

dataset	gcc 13 (change vs base, + = faster)	clang 18 (change vs base, + = faster)
canada.txt	+3.1%	+2.8%
mesh.txt	+5.4%	+5.1%
random [0,1]	~0%	~0%

The random dataset holds values of the form 0.xxx (a 1-digit integer part), so it is unaffected, as expected. No
regression on any input.

For completeness I also tried reusing loop_parse_if_eight_digits for the integer
part, and a counted for (k < 5) loop; both were slower here (the 8-digit SWAR setup
does not pay off for short integer parts, and clang optimized the counted loop less
well), so this keeps the explicit peel.
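The sketch below illustrates the SWAR trade-off with an 8-digit step in the style of loop_parse_if_eight_digits. The function names are hypothetical. The byte-by-byte load is an endian-independent stand-in for whatever load the library really performs. The validation and conversion constants follow the widely used trick that fast_float's parse_eight_digits_unrolled also applies. For a short integer part such as `123`, the 8-byte load and check are pure overhead: fewer than 8 bytes are digits, so the code falls back to the byte loop anyway.

```cpp
#include <cstdint>

// Assemble 8 bytes with the first character in the lowest byte, regardless
// of host endianness (a real implementation would use a single load).
inline uint64_t load8_le(const char *p) {
  uint64_t v = 0;
  for (int k = 0; k < 8; ++k) {
    v |= uint64_t(static_cast<unsigned char>(p[k])) << (8 * k);
  }
  return v;
}

// True iff all 8 bytes are ASCII '0'..'9'.
inline bool all_eight_digits(uint64_t v) {
  return !(((v + 0x4646464646464646) | (v - 0x3030303030303030)) &
           0x8080808080808080);
}

// Convert 8 already-validated ASCII digits to their value.
inline uint32_t parse_eight_digits(uint64_t v) {
  const uint64_t mask = 0x000000FF000000FF;
  const uint64_t mul1 = 0x000F424000000064; // 100 + (1000000 << 32)
  const uint64_t mul2 = 0x0000271000000001; // 1 + (10000 << 32)
  v -= 0x3030303030303030;                  // ASCII -> digit values
  v = (v * 10) + (v >> 8);                  // even bytes hold two-digit pairs
  v = (((v & mask) * mul1) + (((v >> 16) & mask) * mul2)) >> 32;
  return uint32_t(v);
}

// Hypothetical SWAR-first integer scan: the 8-digit step only helps when at
// least 8 bytes remain and all of them are digits; otherwise it is wasted work.
inline const char *scan_integer_swar(const char *p, const char *pend,
                                     uint64_t &i) {
  while (pend - p >= 8) {
    uint64_t v = load8_le(p);
    if (!all_eight_digits(v)) {
      break;
    }
    i = i * 100000000 + parse_eight_digits(v);
    p += 8;
  }
  // Byte-at-a-time tail, identical to the original integer-part loop.
  while (p != pend && *p >= '0' && *p <= '9') {
    i = 10 * i + uint64_t(*p - '0');
    ++p;
  }
  return p;
}
```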

Tests: FASTFLOAT_TEST 14/14 and FASTFLOAT_EXHAUSTIVE (exhaustive32 / 32_64 /
midpoint / long variants) all pass. Builds clean on gcc and clang at C++11 and C++20
under -Werror -Wall -Wextra -Weffc++ -Wconversion -Wsign-conversion -Wshadow,
clang-format clean. No new multi-byte reads, so big-endian (s390x) is unaffected.

…digit case)

parse_number_string scans the integer part one byte at a time in a while loop, while the fraction already uses the 8-digit SWAR loop. Most integer parts are 1-5 digits, so the loop back-edge dominates. Peel the first five iterations into nested ifs, falling through to the original while for longer runs. Semantics are identical (i = 10*i + digit, advancing p); no behavior change.

AWS m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark, from_chars->double. Base vs patch measured back-to-back, mean of 2 runs:

canada: gcc +3.1%, clang +2.8%
mesh: gcc +5.4%, clang +5.1%
random: ~flat (1-digit integer part)

No regression; gcc and clang agree.

Alternatives benchmarked and rejected: reusing loop_parse_if_eight_digits for the integer part regressed 5-8% (integer parts are too short for the 8-digit SWAR setup to pay off); a counted for (k<5) loop matched on gcc, but clang optimized it worse (canada -0.9%). The explicit peel is the only form solidly positive on both compilers.
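For comparison, here is a sketch of the rejected counted-loop variant, assuming a simplified stand-alone scanner. `scan_integer_counted` is a hypothetical name, and this is not the code that was benchmarked. The five-iteration bound lives in the loop condition, so whether the loop becomes straight-line code is left to each compiler's unroller. That is consistent with the gcc/clang difference reported above.

```cpp
#include <cstdint>

// Hypothetical sketch of the counted-loop alternative: at most five iterations
// of i = 10*i + digit under an explicit trip-count bound, then the original
// open-ended loop for any remaining digits.
template <typename UC>
const UC *scan_integer_counted(const UC *p, const UC *pend, uint64_t &i) {
  for (int k = 0; k < 5 && p != pend && *p >= UC('0') && *p <= UC('9'); ++k) {
    i = 10 * i + uint64_t(*p - UC('0'));
    ++p;
  }
  // If the bounded loop stopped on a non-digit, this exits after one check.
  while (p != pend && *p >= UC('0') && *p <= UC('9')) {
    i = 10 * i + uint64_t(*p - UC('0'));
    ++p;
  }
  return p;
}
```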

lemire

Will merge once tests pass.

lemire approved these changes Jun 1, 2026


lemire merged commit 0f682cd into fastfloat:main Jun 1, 2026
35 checks passed

fcostaoliveira mentioned this pull request Jun 1, 2026

GCC: parsed_number_string marshaling dominates short-float parsing on aarch64 #384

Open

BrewTestBot mentioned this pull request Jun 1, 2026

fast_float 8.2.6 Homebrew/homebrew-core#285770

Merged

fcostaoliveira mentioned this pull request Jun 2, 2026

Use ffc (pure-C99) as the RESP3 double parser instead of strtod redis/hiredis#1328

Merged

lemire merged 1 commit into fastfloat:main from redis-performance:pr/integer-scan-unroll
