Read 64-bit multi-page SAS catalog files and skip format default values by hpoettker · Pull Request #370 · WizardMac/ReadStat

hpoettker · 2026-04-29T18:43:21Z

Summary

This PR addresses three issues with reading SAS catalog files:

It fixes XLSR records on pages after the first not being found in 64-bit files because of a wrong offset.
It skips informats in the file, which currently cause parsing errors.
It skips the default values of custom formats even when the default value is not written last.

I've tested the PR locally with 64-bit files on Linux and 32-bit files on Windows.

Offset for `XLSR` record on later pages

The current implementation always uses a page offset of 16 when looking for XLSR records on later pages. However, 16 is only correct for 32-bit files; for 64-bit files, the offset is 32.

As a result, XLSR records on later pages are currently missed in 64-bit files, and so are any formats referenced from those records.
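A minimal C sketch of the corrected lookup, assuming the XLSR signature is checked at a single fixed position on each later page, as the description above implies. The helper names `xlsr_page_offset` and `page_has_xlsr` are hypothetical and not part of ReadStat; only the offsets 16 and 32 come from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset of the XLSR signature on pages after the first, as stated in this PR:
 * 16 bytes in 32-bit catalogs, 32 bytes in 64-bit catalogs. */
static size_t xlsr_page_offset(int is_64bit) {
    return is_64bit ? 32 : 16;
}

/* Hypothetical helper (not ReadStat's API): returns 1 if the page holds the
 * "XLSR" signature at the bitness-dependent offset, 0 otherwise. */
static int page_has_xlsr(const unsigned char *page, size_t page_len, int is_64bit) {
    size_t off = xlsr_page_offset(is_64bit);
    if (page_len < off + 4)
        return 0;  /* page too short to contain the 4-byte signature */
    return memcmp(page + off, "XLSR", 4) == 0;
}
```

For a 64-bit page that carries the signature at byte 32, the check succeeds with the 64-bit offset and fails with the fixed offset of 16, which is the case that was previously missed.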

Informats in catalog files

The current implementation only handles formats, which map numbers to either numbers or strings. Informats, which can also map from strings, therefore lead to parsing errors when they are encountered.

This PR simply skips informats, which can be identified in the catalog file by names starting with `@`.
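The skipping rule can be sketched in C as a name check applied before an entry is parsed. This is a hedged illustration: `is_informat_name` and `count_formats_to_parse` are hypothetical helpers, entry names are assumed to be available as NUL-terminated strings, and only the `@` prefix rule comes from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: per this PR, informat entries in a SAS catalog can be
 * recognised by a name that starts with '@'. */
static int is_informat_name(const char *name) {
    return name != NULL && name[0] == '@';
}

/* Hypothetical driver (not ReadStat's API): walk the entry names found in a
 * catalog and count the ones a parser would still process as formats,
 * skipping informats instead of attempting to parse them. */
static size_t count_formats_to_parse(const char *const *names, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_informat_name(names[i]))
            continue;  /* informat: skipped for now */
        count++;
    }
    return count;
}
```

For the names `MYFMT`, `@MYINF` and `OTHERFMT`, two entries would be parsed and the informat would be skipped.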

I might contribute code for parsing informats in a follow-up PR, but that would first require a discussion on how to integrate informats into the existing API. Informats are not value labels meant for presenting data; they are mappings applied to input data to derive an internal representation. I don't know whether an equivalent concept exists for SPSS or Stata files.

Default values in custom formats

This PR fixes reading custom formats from SAS catalog files when a format's default value is not stored as the last value in the physical file.

Currently, ReadStat correctly skips the default value in a format created like this:

```sas
proc format;
  value myfmt
    1 = 'Yes'
    2 = 'No'
    other = 'Unknown';
run;
```

because SAS writes the labels in the order in which they are encountered.

But for a format created like this:

```sas
proc format;
  value myfmt
    other = 'Unknown'
    1 = 'Yes'
    2 = 'No';
run;
```

which logically defines the same format, ReadStat currently reads the format as mapping

1 to Unknown,
2 to Yes,
and drops the label No entirely.
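The failure mode can be illustrated with a small C sketch. It assumes, purely for illustration, that a format's values are paired in order with its labels in the order they are stored, and that the default label can be recognised by a flag; the type `label_entry_t` and both functions are hypothetical, and the real catalog layout is more involved. The variant that assumes the default comes last reproduces the mapping described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one label entry in the order it is stored in the
 * catalog. How the real parser recognises the default ("other") label is an
 * assumption here; it is modelled as a flag. */
typedef struct {
    const char *label;
    int is_default;
} label_entry_t;

/* Buggy variant, for comparison: assumes the default label is always written
 * last, so it pairs the n_values values with the first n_values entries. */
static size_t pair_labels_default_last(size_t n_values,
                                       const label_entry_t *entries, size_t n_entries,
                                       const char **out, size_t max_out) {
    size_t n = 0;
    while (n < n_values && n < n_entries && n < max_out) {
        out[n] = entries[n].label;
        n++;
    }
    return n;
}

/* Fixed variant: skips the default label wherever it appears and pairs the
 * values, in order, with the remaining labels. Bounded by max_out so a
 * malformed entry count cannot overflow the output array. */
static size_t pair_labels(size_t n_values,
                          const label_entry_t *entries, size_t n_entries,
                          const char **out, size_t max_out) {
    size_t n = 0;
    for (size_t e = 0; e < n_entries && n < n_values && n < max_out; e++) {
        if (entries[e].is_default)
            continue;  /* default label: not paired with any value */
        out[n++] = entries[e].label;
    }
    return n;
}
```

With the stored order Unknown (default), Yes, No and the values 1 and 2, `pair_labels` yields Yes and No, while `pair_labels_default_last` yields Unknown and Yes.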

It would be nice to also expose the default value through the API, but that would require discussing the API change first: as far as I can tell, the current value-label handlers do not accept default values. I don't know whether default value labels exist as a concept for SPSS or Stata files.

hpoettker · 2026-04-29T19:44:59Z

Sorry for the spam from the failed fuzzing. I'll look into running the fuzzer locally and/or on push before opening PRs.

I probably introduced the problem by changing the bound on the counter `i` in pass 2 of `sas7bcat_parse_value_labels`: it used to be bounded by both `label_count_used` and `label_count_capacity`. With the PR, it is bounded only by `label_count_used`, since that is larger by 1 than `label_count_capacity` for formats with a default value.

I've now added the missing checks to guard against buffer overflows, and the fuzzing passes.
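The kind of guard involved can be sketched in C. The record layout (a 1-byte length prefix followed by the payload) and the function name `walk_records` are assumptions made for illustration and do not reflect the actual catalog format or ReadStat's code; the point is that once the loop counter is bounded only by a count read from the file, every read needs its own check against the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: walk label_count_used length-prefixed records in buf.
 * Returns the number of records walked, or -1 as soon as a record would read
 * past the end of the buffer. */
static long walk_records(const uint8_t *buf, size_t buf_len, size_t label_count_used) {
    size_t pos = 0;
    for (size_t i = 0; i < label_count_used; i++) {
        if (pos >= buf_len)
            return -1;  /* no room left for the length prefix */
        size_t len = buf[pos];
        if (len > buf_len - pos - 1)
            return -1;  /* payload would run past the end of the buffer */
        pos += 1 + len;
    }
    return (long)label_count_used;
}
```

The two checks are kept separate so that the subtraction `buf_len - pos - 1` is only evaluated once `pos < buf_len` is known, which rules out an unsigned underflow.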

hpoettker added 3 commits April 29, 2026 18:38

9bdfa58 Read 64-bit multi-page SAS catalog files and skip format default values
9d173a3 Address fuzzing issue with additional buffer check
06514dd Address fuzzing issue with split buffer check

hpoettker mentioned this pull request Apr 29, 2026 in: Upgrade GitHub action upload-artifact #371 (Merged)

hpoettker mentioned this pull request May 31, 2026 in: SAS formats with string values of length longer than 10 are parsed incorrectly #378 (Open)

hpoettker wants to merge 3 commits into WizardMac:dev from hpoettker:sas-formats
