Summary
PyType_GenericNew creates zero-initialized instances, but zstandard's method implementations assume the C-level pointer fields (cctx, dctx, params, ...) are non-NULL. T.__new__(T).method(...) therefore crashes the interpreter for 12 of the 13 extension types, with nothing more than importing the package.
Impact
- Severity: Segfault (some paths hit an assertion abort on debug builds).
- Reachability: A single line of pure Python on a public type. No ZSTD-library involvement, no unusual input.
- Version: 0.25.0 (commit 7a77a75); the pattern is very likely present in prior releases as well.
- Platform: Confirmed Linux x86_64 / CPython 3.14 debug; the bug is platform-independent.
Reproducers
After the import, each line below crashes the interpreter on its own (run them one at a time):
import zstandard as zstd
zstd.ZstdCompressor.__new__(zstd.ZstdCompressor).compress(b'x')
zstd.ZstdDecompressor.__new__(zstd.ZstdDecompressor).decompress(b'x')
zstd.ZstdCompressionParameters.__new__(zstd.ZstdCompressionParameters).estimated_compression_context_size()
zstd.ZstdCompressionWriter.__new__(zstd.ZstdCompressionWriter).write(b'x')
zstd.ZstdDecompressionWriter.__new__(zstd.ZstdDecompressionWriter).write(b'x')
zstd.ZstdCompressionReader.__new__(zstd.ZstdCompressionReader).read(10)
zstd.BufferWithSegmentsCollection.__new__(zstd.BufferWithSegmentsCollection)[0]
Five more types aren't in the top-level namespace but are reachable via type() introspection and crash the same way:
c = zstd.ZstdCompressor()
T = type(c.compressobj()) # ZstdCompressionObj
T.__new__(T).compress(b'x') # segfault
# likewise: ZstdDecompressionObj, ZstdCompressorIterator,
# ZstdDecompressorIterator, ZstdCompressionChunker
ZstdDecompressionReader (the 13th affected type) does not crash — read() returns b'' because input.pos == input.size == 0 takes an early-return branch. The instance is still in an invalid state; any future method that doesn't short-circuit on this state will crash.
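Because every reproducer above takes the interpreter down with it, it can be convenient to run each one in a child process and report how that child exited. The following is a minimal sketch, not part of zstandard: run_isolated and REPRODUCERS are hypothetical names, and the signal decoding assumes a POSIX platform, where subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 30.0) -> str:
    """Run `snippet` in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX: a negative return code means the child died from a signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode == 0:
        return "ok"
    return f"exit code {proc.returncode}"


# Two of the reproducers from this report, as source strings. They are only
# executed inside a child process, never in the calling interpreter.
PREFIX = "import zstandard as zstd; "
REPRODUCERS = [
    PREFIX + "zstd.ZstdCompressor.__new__(zstd.ZstdCompressor).compress(b'x')",
    PREFIX + "zstd.ZstdDecompressor.__new__(zstd.ZstdDecompressor).decompress(b'x')",
]
```

With zstandard installed, looping over REPRODUCERS and printing run_isolated(snippet) for each should show which ones die from a signal, without losing the parent session.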
Root cause
All 13 type specs install {Py_tp_new, PyType_GenericNew}. PyType_GenericNew zero-initializes the instance. tp_init is where real allocation (ZSTD_createCCtx, ZSTD_createDCtx, PyMem_Malloc, ...) happens — skip __init__ and the pointers stay NULL. Methods then do things like ZSTD_CCtx_reset(self->cctx, ...) on NULL.
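The same two-step construction exists in pure Python, which makes the mechanism easy to see. In the sketch below, Compressor is a hypothetical pure-Python stand-in and not zstandard's type. Skipping __init__ leaves the attribute unset, and Python reports that as an AttributeError. In the C extension, the equivalent unset state is a NULL pointer, and dereferencing it crashes the process instead of raising.

```python
class Compressor:
    """Hypothetical pure-Python stand-in for a type that sets up state in __init__."""

    def __init__(self, level: int = 3) -> None:
        # Analogous to tp_init allocating self->cctx.
        self.level = level

    def compress(self, data: bytes) -> bytes:
        # Analogous to a method that assumes self->cctx is non-NULL.
        return bytes([self.level]) + data


normal = Compressor()                    # __new__ followed by __init__
bare = Compressor.__new__(Compressor)    # __new__ only; __init__ never runs

try:
    bare.compress(b"x")
    outcome = "no error"
except AttributeError as exc:
    # Pure Python fails safely here; the C code has no such safety net.
    outcome = f"AttributeError: {exc}"
```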
Affected types
| Type | File | NULL field / observable | Trigger |
|---|---|---|---|
| ZstdCompressor | c-ext/compressor.c | cctx | .compress(b'x') |
| ZstdDecompressor | c-ext/decompressor.c | dctx (via ensure_dctx) | .decompress(b'x') |
| ZstdCompressionParameters | c-ext/compressionparams.c | params | any accessor |
| ZstdCompressionWriter | c-ext/compressionwriter.c | compressor | .write(b'x') |
| ZstdDecompressionWriter | c-ext/decompressionwriter.c | decompressor | .write(b'x') |
| ZstdCompressionReader | c-ext/compressionreader.c | assertion abort | .read(10) |
| BufferWithSegmentsCollection | c-ext/bufferutil.c | firstElements | [0] |
| ZstdCompressionObj | c-ext/compressobj.c | compressor | via type() |
| ZstdDecompressionObj | c-ext/decompressobj.c | decompressor | via type() |
| ZstdCompressorIterator | c-ext/compressoriterator.c | compressor/reader | via type() |
| ZstdDecompressorIterator | c-ext/decompressoriterator.c | decompressor/reader | via type() |
| ZstdCompressionChunker | c-ext/compressionchunker.c | compressor | via type() |
| ZstdDecompressionReader (silent) | c-ext/decompressionreader.c | zero-state early return | .read(10) → b'' |
Suggested fix
Two options; Option A is recommended. It yields a usable object immediately after __new__ and composes cleanly with a re-init leak finding noted below.
Option A — allocate in tp_new
Replace {Py_tp_new, PyType_GenericNew} with a type-specific tp_new that installs the minimum non-NULL state the methods need:
static PyObject *
ZstdCompressor_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    ZstdCompressor *self = (ZstdCompressor *)type->tp_alloc(type, 0);
    if (!self) {
        return NULL;
    }
    self->cctx = ZSTD_createCCtx();
    if (!self->cctx) {
        Py_DECREF(self);
        PyErr_NoMemory();
        return NULL;
    }
    /* other fields stay zero-initialized; tp_init will configure them. */
    return (PyObject *)self;
}

static PyType_Slot ZstdCompressor_slots[] = {
    {Py_tp_dealloc, ZstdCompressor_dealloc},
    {Py_tp_methods, ZstdCompressor_methods},
    {Py_tp_init, (initproc)ZstdCompressor_init},
    {Py_tp_new, ZstdCompressor_new}, /* was: PyType_GenericNew */
    {0, 0},
};
tp_init then only configures the already-allocated context. For types whose tp_init also allocates (ZstdCompressionParameters, BufferWithSegmentsCollection, ...), the allocation moves into tp_new and tp_init is reduced to argument parsing + configuration.
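The division of labor in Option A can be modeled in pure Python, where __new__ plays the role of tp_new and __init__ the role of tp_init. This is only an illustration under stated assumptions: FakeContext and Compressor are hypothetical classes, and the default level of 3 is chosen for the example rather than taken from the zstandard source.

```python
class FakeContext:
    """Hypothetical stand-in for a ZSTD_CCtx; not a zstandard API."""

    def __init__(self) -> None:
        self.level = 3  # assumed default, for illustration only


class Compressor:
    def __new__(cls, *args, **kwargs):
        # Like Option A's tp_new: create the minimum state every method needs.
        self = super().__new__(cls)
        self._ctx = FakeContext()
        return self

    def __init__(self, level: int = 3) -> None:
        # Like tp_init after the change: configure only, never allocate.
        self._ctx.level = level

    def compress(self, data: bytes) -> bytes:
        return bytes([self._ctx.level]) + data


bare = Compressor.__new__(Compressor)   # __init__ skipped on purpose
configured = Compressor(level=9)
```

In this model, bare is usable with default settings even though __init__ never ran, and calling configured.__init__(level=5) a second time reuses the existing context instead of creating a new one.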
Option B — guard every method
Add a NULL check at the top of every method that dereferences an initializable field:
static PyObject *
ZstdCompressor_compress(ZstdCompressor *self, PyObject *args)
{
    if (!self->cctx) {
        PyErr_SetString(PyExc_ValueError,
                        "ZstdCompressor not initialized - call __init__");
        return NULL;
    }
    /* ... */
}
This is less invasive, but it requires auditing every method on every affected type, and every new method has to remember the guard.
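To gauge the size of that audit, the members each type defines can be listed from Python. The helper below is a rough sketch: own_public_members and Example are hypothetical names, and it assumes the extension types expose their methods and attribute accessors in the type's own __dict__, as CPython types normally do.

```python
def own_public_members(tp: type) -> list[str]:
    """Names defined directly on `tp` that do not start with an underscore."""
    return sorted(name for name in vars(tp) if not name.startswith("_"))


class Example:
    """Hypothetical type used only to demonstrate the helper."""

    def compress(self, data):
        ...

    def flush(self):
        ...

    @property
    def level(self):
        ...

    def _internal(self):
        ...


checklist = own_public_members(Example)
```

With zstandard installed, own_public_members(zstd.ZstdCompressor) would produce the per-type checklist of entry points to inspect. Inherited members and dunder slots such as __getitem__ are deliberately excluded, so they would need a separate pass.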
Related / follow-ups
- This composes with a re-init leak finding (comp.__init__(...) on an already-initialized instance allocates new contexts without freeing the previous ones, leaking ~5.5 KB per call) covered in the full report linked below. Option A cleanly separates "create fresh state" (tp_new) from "configure from kwargs" (tp_init), making it easy to either reject re-init or free-then-recreate.
- The ZstdCompressionObj / ZstdDecompressionObj / iterator / chunker types are returned from factories and not directly constructable, but their __new__ remains reachable via type(). Same fix applies.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). All 7 direct-namespace reproducers and 5 type()-introspection reproducers were verified live on CPython 3.14.3 debug build. Happy to open a PR — the Option A change is mechanical across the 13 specs. I'd propose a single PR with the atomic set of changes, but can split into a 2-commit PR (simple-allocation types first, more-complex-init types second) if you prefer.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605