Arrow: Fix truncation of decimals with precision larger than 18 by wombatu-kun · Pull Request #16627 · apache/iceberg

wombatu-kun · 2026-05-31T04:30:28Z

Problem

Reading a decimal column through the vectorized Arrow reader silently corrupts values whose unscaled value falls outside the range of a Java long (above Long.MAX_VALUE or below Long.MIN_VALUE). This affects any decimal with precision larger than 18 (for example decimal(38, 0)) that holds such a value. No error is raised; the returned BigDecimal is simply wrong, and often negative.

Root cause

Decimals with precision larger than 18 are stored as a binary / FIXED_LEN_BYTE_ARRAY and read into a FixedSizeBinaryVector. The binary-backed decimal accessors decode the bytes into the correct BigDecimal and then hand it to JavaDecimalFactory.ofBigDecimal, which rebuilds it as BigDecimal.valueOf(value.unscaledValue().longValue(), scale). BigInteger.longValue() keeps only the low 64 bits, so any unscaled value beyond Long range is truncated. The incoming value is already the correct BigDecimal (it carries the right unscaled value and scale), so this round-trip is both unnecessary and lossy.
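BigInteger.longValue() behaves like a narrowing primitive conversion: it keeps the low 64 bits and reinterprets them as a signed long. The standalone sketch below reproduces that arithmetic explicitly, so the wrong value reported in the Tests section can be derived by hand: 12345678901234567890 - 2^64 = -6101065172474983726. The class and helper names are illustrative only and are not part of Iceberg.

```java
import java.math.BigInteger;

public class LowBitsDemo {
  // Signed value of the low 64 bits of x, computed explicitly:
  // take x mod 2^64, then reinterpret values >= 2^63 as negative.
  static BigInteger low64Signed(BigInteger x) {
    BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
    BigInteger r = x.mod(twoTo64); // always 0 <= r < 2^64
    return r.testBit(63) ? r.subtract(twoTo64) : r;
  }

  public static void main(String[] args) {
    BigInteger unscaled = new BigInteger("12345678901234567890");
    System.out.println(unscaled.longValue());  // -6101065172474983726
    System.out.println(low64Signed(unscaled)); // -6101065172474983726
  }
}
```

Because the unscaled value lies between 2^63 and 2^64, bit 63 of the low word is set, which is why the truncated result comes out negative.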

The ofLong path, used for INT32/INT64-backed decimals (precision up to 18), is unaffected. That is why only high-precision decimals are corrupted, and why the existing tests, which use decimal(9, 2), never caught the bug.
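Precision 18 is the cutoff because the largest 18-digit unscaled value, 10^18 - 1, is below Long.MAX_VALUE (9223372036854775807), while 10^19 - 1 is not. The sketch below checks this per precision; since the unscaled range is symmetric around zero, checking the largest magnitude is enough. Names are illustrative.

```java
import java.math.BigInteger;

public class PrecisionRangeDemo {
  // Largest unscaled value for a given precision: 10^precision - 1.
  static BigInteger maxUnscaled(int precision) {
    return BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
  }

  // True if every unscaled value of this precision fits in a Java long,
  // i.e. the largest magnitude needs at most 63 bits plus a sign bit.
  static boolean fitsInLong(int precision) {
    return maxUnscaled(precision).bitLength() <= 63;
  }

  public static void main(String[] args) {
    for (int p : new int[] {9, 18, 19, 38}) {
      System.out.println("decimal(" + p + ", s) fits in long: " + fitsInLong(p));
    }
  }
}
```

This prints true for 9 and 18 and false for 19 and 38, so decimal(9, 2) tests cannot reach the truncating path.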

Fix

Make JavaDecimalFactory.ofBigDecimal return value unchanged. The decoded BigDecimal already carries the correct unscaled value and scale, which matches how the Spark accessor factory preserves the full value.
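A quick way to see that the fix only changes results that were previously wrong is to compare the old rebuild against returning the value as-is. The sketch below models both behaviors as plain static methods; it is not the actual JavaDecimalFactory code, whose signature may differ, and the method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class OfBigDecimalFixSketch {
  // Old behavior as described in the root cause: rebuild the value from
  // the low 64 bits of its unscaled value.
  static BigDecimal before(BigDecimal value) {
    return BigDecimal.valueOf(value.unscaledValue().longValue(), value.scale());
  }

  // Fixed behavior: the decoded value is already correct, so return it as-is.
  static BigDecimal after(BigDecimal value) {
    return value;
  }

  public static void main(String[] args) {
    BigDecimal[] samples = {
      new BigDecimal("1234567.89"),           // decimal(9, 2): fits in a long
      BigDecimal.valueOf(Long.MIN_VALUE),     // edge of the long range
      new BigDecimal("12345678901234567890"), // above Long.MAX_VALUE
      // most negative decimal(38, 0) value: -(10^38 - 1)
      new BigDecimal(BigInteger.TEN.pow(38).subtract(BigInteger.ONE).negate())
    };
    for (BigDecimal v : samples) {
      boolean oldPathCorrect = before(v).equals(after(v));
      System.out.println(v + ": old=" + before(v) + ", old path correct=" + oldPathCorrect);
    }
  }
}
```

For the first two samples both paths agree; only the out-of-range values differ, which is exactly the corrupted case.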

Tests

Added TestArrowReader.testHighPrecisionDecimalIsReadCorrectly, which writes a decimal(38, 0) Parquet file with values larger than Long.MAX_VALUE and asserts they round-trip through the vectorized reader. It fails before the fix (expected 12345678901234567890 but was -6101065172474983726) and passes after.
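The property the test relies on can be sketched without Iceberg. Parquet stores a FIXED_LEN_BYTE_ARRAY decimal as its unscaled value in big-endian two's complement, sign-extended to the column's fixed width; decimal(38, s) needs 16 bytes. The standalone sketch below encodes values beyond the long range at that width and decodes them back. It is not the actual TestArrowReader code, which goes through the Parquet writer and the vectorized reader, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class FixedLenDecimalRoundTrip {
  // Byte width of a FIXED_LEN_BYTE_ARRAY column for decimal(38, s).
  static final int BYTE_WIDTH = 16;

  // Encode the unscaled value as big-endian two's complement,
  // sign-extended on the left to exactly `width` bytes.
  static byte[] encode(BigDecimal value, int width) {
    byte[] minimal = value.unscaledValue().toByteArray();
    if (minimal.length > width) {
      throw new IllegalArgumentException("value does not fit in " + width + " bytes");
    }
    byte[] out = new byte[width];
    byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
    Arrays.fill(out, 0, width - minimal.length, pad);
    System.arraycopy(minimal, 0, out, width - minimal.length, minimal.length);
    return out;
  }

  // Decode without going through a long, so no bits are lost.
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  public static void main(String[] args) {
    BigDecimal[] values = {
      new BigDecimal("12345678901234567890"),
      new BigDecimal(Long.MAX_VALUE).add(BigDecimal.ONE),
      new BigDecimal("-12345678901234567890")
    };
    for (BigDecimal v : values) {
      BigDecimal back = decode(encode(v, BYTE_WIDTH), v.scale());
      System.out.println(v + " -> " + back + ", equal=" + v.equals(back));
    }
  }
}
```

The decode step stays in BigInteger throughout, which is the same reason returning the decoded BigDecimal unchanged preserves the full value.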

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Commit 1f9dc27: Arrow: Fix truncation of decimals with precision larger than 18

github-actions Bot added the arrow label May 31, 2026

wombatu-kun wants to merge 1 commit into apache:main from wombatu-kun:arrow-fix-decimal-truncation
