fread: undefined behavior in parse_iso8601_date_core for negative years #7704

@kevinushey

Problem

parse_iso8601_date_core() in src/fread.c uses C's % and / operators for 400-year calendar cycle arithmetic (lines 1091–1092). In C (C99 and later), / truncates toward zero and % takes the sign of the dividend, so for negative (BCE) years the remainder is negative and the quotient is rounded up instead of down. This causes:

  1. Out-of-bounds array access: year % 400 is negative for negative years (e.g., -1 % 400 = -1), so cumDaysCycleYears[year % 400] reads before the start of the array.
  2. Wrong cycle calculation: year / 400 truncates toward zero (e.g., -1 / 400 = 0), placing negative years in the wrong 400-year cycle.
  3. Signed integer overflow: the garbage value from the OOB read participates in the addition on lines 1090–1094, overflowing int32_t.

Under UBSAN, this manifests as SIGILL (illegal instruction trap). Without sanitizers, it silently produces wrong dates.
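
The truncation behavior behind items 1 and 2 can be checked in isolation. The following is a minimal sketch; the helper names are hypothetical and do not exist in fread.c, they only mirror the two expressions year % 400 and year / 400.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; neither exists in fread.c. */

/* Index the current code would use for cumDaysCycleYears. */
static int32_t truncated_cycle_index(int32_t year) { return year % 400; }

/* Cycle number the current code would use. */
static int32_t truncated_cycle(int32_t year) { return year / 400; }

/* Documents the C99 rules: / truncates toward zero, % follows the dividend. */
static void check_truncation_rules(void) {
  assert(truncated_cycle_index(-1) == -1);    /* not 399: out of bounds as an index */
  assert(truncated_cycle(-1) == 0);           /* not -1: lands in the cycle of years 0..399 */
  assert(truncated_cycle_index(-400) == 0);   /* exact multiples of 400 are unaffected */
  assert(truncated_cycle(-400) == -1);
  assert(truncated_cycle_index(2024) == 24);  /* non-negative years are unaffected */
  assert(truncated_cycle(2024) == 5);
}
```

Calling check_truncation_rules() from a test passes on any C99-conforming compiler, which is what makes the negative index reachable.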

Minimal reproducer

library(data.table)

fread(text = "-1-01-01\n")
#>           V1
#>       <IDat>
#> 1: 370-01-01   # wrong — year -1 becomes 370

fread(text = "-100-06-15\n")
#>            V1
#>        <IDat>
#> 1: -558-01-23  # wrong: year, month, and day all differ

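Any negative year that passes the range check on line 1072 (year >= -5877640) and is not an exact multiple of 400 triggers this. For exact multiples of 400, truncated and floored arithmetic give the same quotient and a zero remainder.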

Root cause

// fread.c, lines 1090-1094
*target =
    (year / 400 - 4) * cumDaysCycleYears[400] +
    cumDaysCycleYears[year % 400] +              // BUG: negative index for negative years
    (isLeapYear ? cumDaysCycleMonthsLeap[month - 1] : cumDaysCycleMonthsNorm[month - 1]) +
    day - 1;

C's % preserves the sign of the dividend: -1 % 400 = -1, not 399. The cumDaysCycleYears array has indices 0–400, so negative indices are out of bounds.

Suggested fix

Use floored division and non-negative modulo:

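int32_t cycle_year = year % 400;            // truncated remainder, in (-400, 400)
if (cycle_year < 0) cycle_year += 400;      // shift into [0, 399]
int32_t cycle = (year - cycle_year) / 400;  // exact: year - cycle_year is a multiple of 400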

*target =
    (cycle - 4) * cumDaysCycleYears[400] +
    cumDaysCycleYears[cycle_year] +
    (isLeapYear ? cumDaysCycleMonthsLeap[month - 1] : cumDaysCycleMonthsNorm[month - 1]) +
    day - 1;
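
The same idea can be factored into two small helpers, which makes it easy to test separately from the date parser. This is a sketch only: floor_mod and floor_div are hypothetical names, not existing fread.c functions, and the quotient is adjusted directly instead of subtracting the remainder first, which is an equivalent formulation of the fix above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not in fread.c). The divisor d must be positive. */

/* Remainder in [0, d), regardless of the sign of a. */
static int32_t floor_mod(int32_t a, int32_t d) {
  assert(d > 0);
  int32_t r = a % d;         /* truncated remainder; sign follows a */
  return r < 0 ? r + d : r;  /* shift negative remainders into [0, d) */
}

/* Quotient rounded toward negative infinity. */
static int32_t floor_div(int32_t a, int32_t d) {
  assert(d > 0);
  int32_t q = a / d;             /* truncated quotient */
  return (a % d < 0) ? q - 1 : q; /* step down when truncation rounded up */
}
```

With these, cycle_year corresponds to floor_mod(year, 400) and cycle to floor_div(year, 400), and the identity floor_div(y, 400) * 400 + floor_mod(y, 400) == y holds for every year in range.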

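The leap year check on line 1076 also applies % to the year (year % 4 and year % 100). That cannot cause UB, since no array is indexed. Whether it can misclassify negative years depends on how the remainder is compared: a test of the form year % n == 0 gives the same answer under truncated and floored modulo, so only a comparison against a nonzero remainder would be affected.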
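
One way to settle the leap year question is to compare a truncated-modulo check against a floored-modulo check over a range of years. The sketch below assumes the check has the usual Gregorian form year % 4 == 0 && (year % 100 != 0 || year % 400 == 0). That form is an assumption about line 1076, not a quote from it, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder in [0, d) for positive d. */
static int32_t nonneg_mod(int32_t a, int32_t d) {
  assert(d > 0);
  int32_t r = a % d;
  return r < 0 ? r + d : r;
}

/* Assumed form of the existing check, using C's truncating %. */
static int is_leap_truncated(int32_t y) {
  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

/* Same rule with a non-negative modulo. */
static int is_leap_floored(int32_t y) {
  return nonneg_mod(y, 4) == 0 &&
         (nonneg_mod(y, 100) != 0 || nonneg_mod(y, 400) == 0);
}

/* Returns 1 if both checks agree for every year in [lo, hi]. */
static int leap_rules_agree(int32_t lo, int32_t hi) {
  for (int32_t y = lo; y <= hi; y++)
    if (is_leap_truncated(y) != is_leap_floored(y)) return 0;
  return 1;
}
```

Under that assumption the two checks agree, because a remainder is zero under truncation exactly when it is zero under flooring. If the real check compares against a nonzero remainder, the comparison would need to be rerun against the actual expression.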

Found by

AFL++ fuzzing of data.table::fread() with ASAN + UBSAN instrumentation.
