Arrow: Fix truncation of decimals with precision larger than 18 #16627

Open
wombatu-kun wants to merge 1 commit into apache:main from wombatu-kun:arrow-fix-decimal-truncation

@wombatu-kun
Contributor

Problem

Reading a decimal column through the vectorized Arrow reader silently corrupts values whose unscaled magnitude exceeds Long.MAX_VALUE. This affects any decimal with precision larger than 18 (for example decimal(38, 0)) holding a sufficiently large value. No error is raised; the returned BigDecimal is simply wrong, and often negative.

Root cause

Decimals with precision larger than 18 are stored as BINARY or FIXED_LEN_BYTE_ARRAY and read into a FixedSizeBinaryVector. The binary-backed decimal accessors decode the bytes into the correct BigDecimal and then hand it to JavaDecimalFactory.ofBigDecimal, which rebuilds it as BigDecimal.valueOf(value.unscaledValue().longValue(), scale). BigInteger.longValue() keeps only the low 64 bits, so any unscaled value outside the long range is truncated. The incoming value is already the correct BigDecimal (it carries the right unscaled value and scale), so this round-trip is both unnecessary and lossy.
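The truncation can be reproduced with the JDK alone. The sketch below is illustrative and is not the Iceberg code: the class DecimalTruncationDemo and its helpers decode and lossyRebuild are hypothetical names, and decode assumes the bytes are a big-endian two's-complement unscaled value.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalTruncationDemo {

  // Hypothetical helper: decodes big-endian two's-complement bytes into a BigDecimal.
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  // Hypothetical helper mirroring the lossy rebuild described above:
  // longValue() keeps only the low 64 bits of the unscaled value.
  static BigDecimal lossyRebuild(BigDecimal value) {
    return BigDecimal.valueOf(value.unscaledValue().longValue(), value.scale());
  }

  public static void main(String[] args) {
    // An unscaled value larger than Long.MAX_VALUE (9223372036854775807).
    BigInteger unscaled = new BigInteger("12345678901234567890");

    BigDecimal decoded = decode(unscaled.toByteArray(), 0);
    System.out.println(decoded);               // 12345678901234567890
    System.out.println(lossyRebuild(decoded)); // -6101065172474983726
  }
}
```

The negative result equals 12345678901234567890 minus 2^64, which is the low 64 bits reinterpreted as a signed long.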

The ofLong path used for INT32/INT64-backed decimals (precision up to 18) is unaffected, which is why only high-precision decimals are corrupted and the existing tests, which use decimal(9, 2), never caught it.

Fix

JavaDecimalFactory.ofBigDecimal now returns its value argument unchanged. The argument already represents the decimal with the correct unscaled value and scale, and passing it through matches how the Spark accessor factory preserves the full value.
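As a rough sketch of the shape of the fix, assuming a factory with one method per backing type: the class DecimalFactorySketch and its (value, precision, scale) parameter lists are assumptions made for illustration, not the actual Iceberg signatures.

```java
import java.math.BigDecimal;

public class DecimalFactorySketch {

  // Path for INT32/INT64-backed decimals: the unscaled value always fits in a long,
  // so building the BigDecimal from a long loses nothing.
  static BigDecimal ofLong(long unscaled, int precision, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  // Path for binary-backed decimals after the fix: the decoded value is passed through
  // as-is instead of being rebuilt from unscaledValue().longValue().
  static BigDecimal ofBigDecimal(BigDecimal value, int precision, int scale) {
    return value;
  }

  public static void main(String[] args) {
    BigDecimal big = new BigDecimal("12345678901234567890");
    System.out.println(ofBigDecimal(big, 38, 0).equals(big)); // true
    System.out.println(ofLong(12345L, 9, 2));                 // 123.45
  }
}
```

Passing the value through also avoids an extra allocation per row, although correctness is the motivation stated in this PR.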

Tests

Added TestArrowReader.testHighPrecisionDecimalIsReadCorrectly, which writes a decimal(38, 0) Parquet file with values larger than Long.MAX_VALUE and asserts they round-trip through the vectorized reader. It fails before the fix (expected 12345678901234567890 but was -6101065172474983726) and passes after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the arrow label May 31, 2026