Problem
parse_iso8601_date_core() in src/fread.c uses C's / and % operators for 400-year calendar cycle arithmetic (lines 1091–1092). In C, integer division truncates toward zero and % takes the sign of the dividend. For negative (BCE) years, both operators therefore give results that differ from the floored values the cycle arithmetic needs. This causes:
- Out-of-bounds array access: year % 400 is negative for negative years (e.g., -1 % 400 = -1), so cumDaysCycleYears[year % 400] reads before the start of the array.
- Wrong cycle calculation: year / 400 truncates toward zero (e.g., -1 / 400 = 0 rather than the floored -1), placing negative years in the wrong 400-year cycle.
- Signed integer overflow: the garbage value from the out-of-bounds read participates in the addition on lines 1090–1094 and can overflow int32_t.
Under UBSAN, this manifests as SIGILL (illegal instruction trap). Without sanitizers, it silently produces wrong dates.
Minimal reproducer
```r
library(data.table)
fread(text = "-1-01-01\n")
#> V1
#> <IDat>
#> 1: 370-01-01 # wrong: year -1 becomes 370
fread(text = "-100-06-15\n")
#> V1
#> <IDat>
#> 1: -558-01-23 # wrong year, month, and day
```
Any negative year that passes the range check on line 1072 (year >= -5877640) triggers this.
Root cause
```c
// fread.c, lines 1090-1094
*target =
  (year / 400 - 4) * cumDaysCycleYears[400] +
  cumDaysCycleYears[year % 400] +  // BUG: negative index for negative years
  (isLeapYear ? cumDaysCycleMonthsLeap[month - 1] : cumDaysCycleMonthsNorm[month - 1]) +
  day - 1;
```
C's % preserves the sign of the dividend: -1 % 400 = -1, not 399. The cumDaysCycleYears array has indices 0–400, so negative indices are out of bounds.
Suggested fix
Use floored division and non-negative modulo:
```c
int32_t cycle_year = year % 400;            // truncated remainder, in (-400, 400)
if (cycle_year < 0) cycle_year += 400;      // now in [0, 400): the floored remainder
int32_t cycle = (year - cycle_year) / 400;  // exact division, equals floor(year / 400)
*target =
  (cycle - 4) * cumDaysCycleYears[400] +
  cumDaysCycleYears[cycle_year] +
  (isLeapYear ? cumDaysCycleMonthsLeap[month - 1] : cumDaysCycleMonthsNorm[month - 1]) +
  day - 1;
```
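The leap year check on line 1076 also applies % to negative years (year % 4 and year % 100), but it does not index an array, so it cannot cause UB. If those remainders are only compared against zero, as in the standard Gregorian rule, truncated and floored remainders give the same answer. A remainder is zero exactly when the divisor divides the dividend, regardless of sign. Negative years are therefore probably already classified correctly, but this is worth confirming when applying the fix.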
Found by
AFL++ fuzzing of data.table::fread() with ASAN + UBSAN instrumentation.