Skip to content

SystemError / wrong behavior: 15+ sites conflate error-return sentinels with data (PyObject_IsTrue, PyObject_IsInstance, PyLong_AsSsize_t) #293

@devdanzin

Description

@devdanzin

Summary

Three Python/C-API functions — PyObject_IsTrue, PyObject_IsInstance, and PyLong_AsSsize_t — return -1 to signal an error and -1 can be a legitimate data value. zstandard does not distinguish the two at ~15 sites. Observable consequences:

  • PyObject_IsTrue: -1 is stored as an int field value (closefd = -1, read_across_frames = -1, ...) or used in !r / r == 0 comparisons, returning success with a pending exception → SystemError: <method> returned a result with an exception set (fatal abort on debug builds).
  • PyObject_IsInstance: !r maps -1 to 0 (false), silently replacing the caller's exception with TypeError("dict_data must be zstd.ZstdCompressionDict").
  • PyLong_AsSsize_t: legitimate -1 is conflated with the error sentinel without checking PyErr_Occurred(); from_level(dict_size=-1) takes the error branch without setting an exception → SystemError: returned NULL without setting an exception (fatal abort on debug builds).

Impact

  • Severity: Fatal SystemError on debug builds; silent wrong behavior / clobbered exceptions on release.
  • Reachability: Standard Python — pass a __bool__ that raises, a negative dict_size, or a __instancecheck__ that raises. No unusual setup.
  • Version: 0.25.0 (commit 7a77a75).
  • Platform: Confirmed Linux x86_64 / CPython 3.14 debug; bug is platform-independent.

Reproducers

PyObject_IsTrue — 11 sites, one trigger pattern

import zstandard, io

class BadBool:
    def __bool__(self):
        raise ValueError("bad bool")

comp = zstandard.ZstdCompressor()
comp.stream_writer(io.BytesIO(), closefd=BadBool())
# Fatal Python error: <method 'stream_writer'> returned a result with an exception set
# SystemError: <method 'stream_writer'> returned a result with an exception set

All 11 confirmed sites:

comp.stream_reader(source, closefd=BadBool())
comp.stream_writer(dest, closefd=BadBool())
comp.stream_writer(dest, write_return_read=BadBool())
decomp.stream_reader(source, closefd=BadBool())
decomp.stream_reader(source, read_across_frames=BadBool())
decomp.stream_writer(dest, closefd=BadBool())
decomp.stream_writer(dest, write_return_read=BadBool())
decomp.decompress(data, allow_extra_data=BadBool())
zstandard.ZstdCompressor(write_checksum=BadBool())
zstandard.ZstdCompressor(write_content_size=BadBool())
zstandard.ZstdCompressor(write_dict_id=BadBool())

PyLong_AsSsize_tfrom_level(dict_size=-1) aborts

import zstandard
zstandard.ZstdCompressionParameters.from_level(3, dict_size=-1)
# Fatal Python error: returned NULL without setting an exception
# SystemError: <built-in method from_level> returned NULL without setting an exception

PyObject_IsInstance — 3 sites

Pattern: !PyObject_IsInstance(obj, ...) in validation for the dict_data kwarg on compressor / decompressor init. Confirmed by static review — reproducing requires a non-cooperative __instancecheck__, which is less idiomatic to construct than BadBool. Impact is exception clobbering rather than a crash: caller's exception from __instancecheck__ is replaced with TypeError("dict_data must be zstd.ZstdCompressionDict").

Root cause

All three APIs return -1 on error and -1 can be a valid value. The correct patterns:

  • PyObject_IsTrue / PyObject_IsInstance: check < 0 before using the return as a boolean.
  • PyLong_AsSsize_t: when the result is -1, check PyErr_Occurred() to distinguish a legitimate -1 from an error.

zstandard stores the PyObject_IsTrue return directly into an int field (result->closefd = PyObject_IsTrue(obj);), uses !PyObject_IsInstance(...) without a prior < 0 check, and takes a goto cleanup on PyLong_AsSsize_t == -1 without consulting PyErr_Occurred.

Affected sites

PyObject_IsTrue — 11

File Line(s) Kwarg / field
c-ext/compressor.c 487 write_checksum, write_content_size, write_dict_id (init)
c-ext/compressor.c 752-754 closefd, write_return_read on stream_reader / stream_writer
c-ext/decompressor.c 381 closefd on stream_reader
c-ext/decompressor.c 431 read_across_frames
c-ext/decompressor.c 562-563 closefd, write_return_read on stream_writer
c-ext/decompressor.c 614-615 allow_extra_data on decompress()

PyObject_IsInstance — 3

File Line Context
c-ext/compressor.c 128 dict_data type check in ZstdCompressor_init
c-ext/compressor.c 140 dict_data type check (2nd occurrence)
c-ext/decompressor.c 78 dict_data type check in Decompressor_init

PyLong_AsSsize_t — reported site

File Line Context
c-ext/compressionparams.c 271 dict_size parameter to from_level()

The PyLong_AsSsize_t calls in copy_stream (c-ext/compressor.c around 411; c-ext/decompressor.c around 247) interact with an OOM cleanup path to produce a separate Py_DECREF(NULL) — noted here for completeness; details in the full report linked below.

Suggested fix

Three mechanical patterns; one per API.

PyObject_IsTrue:

int r = PyObject_IsTrue(obj);
if (r < 0) goto except;
result->closefd = r ? 1 : 0;

PyObject_IsInstance:

int r = PyObject_IsInstance(obj, (PyObject *)&ZstdCompressionDict_Type);
if (r < 0) goto except;
if (!r) {
    PyErr_SetString(PyExc_TypeError, "dict_data must be zstd.ZstdCompressionDict");
    goto except;
}

PyLong_AsSsize_t:

Py_ssize_t dict_size_val = PyLong_AsSsize_t(dict_size);
if (dict_size_val == -1 && PyErr_Occurred()) goto cleanup;
if (dict_size_val < 0) {
    PyErr_SetString(PyExc_ValueError, "dict_size must be non-negative");
    goto cleanup;
}

All three can land in a single PR. If you prefer finer-grained review, happy to split into 3 PRs (one per API).

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). 11 of 11 BadBool() sites produce a SystemError fatal abort on CPython 3.14.3 debug. from_level(dict_size=-1) also verified on the same build. PyObject_IsInstance sites confirmed via static review. Happy to open a PR.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions