
Memory leaks in stream lifecycle: 4 __exit__ + 2 close() discard CallMethod returns; 5 types leak on __init__ re-call (~5.5 KB); 2 Py_buffer leaks on closed streams #296

@devdanzin


Summary

Four distinct memory-leak patterns in stream/resource lifecycle. Three are mechanical (discarded CallMethod return values, missing PyBuffer_Release on a closed-stream branch); one is semantic (re-callable __init__ leaks the prior ZSTD contexts). Filing together because all four surface as "memory grows during normal usage of stream writers/readers", but each has a distinct fix and can be addressed independently.

Impact

  • Severity: Memory leak — no crash. Magnitude per occurrence ranges from ~30 bytes (discarded CallMethod returns) to ~5.5 KB (re-init contexts).
  • Reachability: Standard idioms — with comp.stream_writer(...):, explicit .close(), .write() after close.
  • Version: 0.25.0 (commit 7a77a75).
  • Platform: Confirmed Linux x86_64 / CPython 3.14 debug; bug is platform-independent.

Leak 1: 4 __exit__ methods discard close() return — ~31 B per with exit

PyObject_CallMethod(self, "close", NULL) returns a new reference; all 4 __exit__ implementations discard it without Py_DECREF.

Reproducer:

import zstandard, tracemalloc, gc
tracemalloc.start(); gc.collect()
s1 = tracemalloc.take_snapshot()

for _ in range(5000):
    comp = zstandard.ZstdCompressor()
    with comp.stream_writer(open('/dev/null', 'wb')) as w:
        w.write(b'hello' * 100)

gc.collect()
s2 = tracemalloc.take_snapshot()
diff = sum(s.size_diff for s in s2.compare_to(s1, 'lineno') if s.size_diff > 0)
print(f"{diff/5000:.1f} bytes per __exit__")   # ~31.1

Sites:

  • c-ext/compressionreader.c:57 (compressionreader_exit)
  • c-ext/compressionwriter.c:53 (ZstdCompressionWriter_exit)
  • c-ext/decompressionwriter.c:41 (ZstdDecompressionWriter_exit)
  • c-ext/decompressionreader.c:57 (decompressionreader_exit)

Fix:

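PyObject *result = PyObject_CallMethod((PyObject *)self, "close", NULL);
if (result == NULL) { return NULL; }  /* propagate the exception raised by close() */
Py_DECREF(result);                    /* drop the new reference close() returned */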

Leak 2: 2 close() methods discard flush() return — ~32 B per close

Same pattern as Leak 1, different method. close() calls self.flush() via PyObject_CallMethod and discards the return.

Reproducer:

import zstandard, tracemalloc, gc
tracemalloc.start(); gc.collect()
s1 = tracemalloc.take_snapshot()

for _ in range(5000):
    comp = zstandard.ZstdCompressor()
    w = comp.stream_writer(open('/dev/null', 'wb'))
    w.write(b'hello' * 100)
    w.close()

gc.collect()
s2 = tracemalloc.take_snapshot()
diff = sum(s.size_diff for s in s2.compare_to(s1, 'lineno') if s.size_diff > 0)
print(f"{diff/5000:.1f} bytes per close")      # ~31.7

Sites:

  • c-ext/compressionwriter.c:219 (ZstdCompressionWriter_close)
  • c-ext/decompressionwriter.c:155 (ZstdDecompressionWriter_close)

Fix: Py_XDECREF(result); after each PyObject_CallMethod(..., "flush", ...) call.
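The reproducers for Leaks 1 and 2 share the same snapshot-diff loop. Below is a minimal sketch that factors it into a reusable helper. The name `bytes_per_call` is hypothetical, not part of zstandard or cext-review-toolkit. It also filters out tracemalloc's own bookkeeping allocations so that they do not inflate the per-call figure.

```python
import gc
import tracemalloc


def bytes_per_call(fn, iterations=5000):
    """Average tracemalloc-visible growth, in bytes, per call of fn().

    Mirrors the Leak 1/2 reproducers: snapshot, call fn() repeatedly,
    snapshot again, and sum the positive per-line size differences.
    Allocations attributed to tracemalloc.py itself are filtered out.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        ignore_self = (tracemalloc.Filter(False, tracemalloc.__file__),)
        gc.collect()
        before = tracemalloc.take_snapshot().filter_traces(ignore_self)
        for _ in range(iterations):
            fn()
        gc.collect()
        after = tracemalloc.take_snapshot().filter_traces(ignore_self)
        growth = sum(
            stat.size_diff
            for stat in after.compare_to(before, "lineno")
            if stat.size_diff > 0
        )
    finally:
        # Leave tracing in the state the caller had it.
        if not was_tracing:
            tracemalloc.stop()
    return growth / iterations
```

To use it, wrap one loop iteration of either reproducer in a function and pass that function in. Because the helper only sees allocations made through CPython's allocators, it cannot detect Leak 3.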

Leak 3: Re-callable __init__ leaks ZSTD contexts — ~5.5 KB per re-init

Calling comp.__init__(...) on an already-initialized instance allocates a new cctx and params (via ZSTD_createCCtx / ZSTD_createCCtxParams) without freeing the prior ones. These allocations go through libc malloc rather than CPython's allocator, so tracemalloc does not observe them; only RSS grows.

Affected types: ZstdCompressor, ZstdDecompressor, ZstdCompressionDict, BufferWithSegments, BufferWithSegmentsCollection.

Reproducer:

import zstandard, resource, gc
gc.collect()
r1 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

comp = zstandard.ZstdCompressor()
for _ in range(50000):
    comp.__init__()
gc.collect()
r2 = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"{(r2-r1)*1024/50000:.0f} bytes per re-init")   # ~5541 (ru_maxrss is in KiB on Linux)
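The `* 1024` in the reproducer assumes Linux, where getrusage() reports ru_maxrss in kilobytes. macOS reports it in bytes. Here is a small sketch of a portable wrapper. `peak_rss_bytes` is a hypothetical helper name, and it assumes kilobytes on every platform other than macOS. The resource module is Unix-only.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of the current process, in bytes.

    ru_maxrss is in kilobytes on Linux but in bytes on macOS; this
    sketch assumes kilobytes on every non-macOS platform.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak
    return peak * 1024
```

ru_maxrss is a high-water mark, so the difference between two readings only counts growth beyond the earlier peak. That is adequate for a steadily growing leak like this one. It under-reports, however, if the process previously peaked higher than the point where the first reading is taken.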

Fix options:

Option A — free prior state at top of tp_init

if (self->cctx) { ZSTD_freeCCtx(self->cctx); self->cctx = NULL; }
if (self->params) { ZSTD_freeCCtxParams(self->params); self->params = NULL; }
/* ... then allocate ... */

Option B — reject re-init

if (self->cctx) {
    PyErr_SetString(PyExc_RuntimeError, "already initialized");
    return -1;
}

Option B composes cleanly with a separately reported __new__() fix: if the context is allocated in tp_new, tp_init reduces to argument parsing and runtime configuration, and a repeated __init__ call naturally becomes an error.

Leak 4: 2 Py_buffer leaks in writer methods on closed streams

The y* argument format acquires a Py_buffer on the input data. The "stream is closed" check returns NULL before PyBuffer_Release is called, so the buffer export is never released. A later bytearray.extend() on the same object therefore raises BufferError. Because the Py_buffer also holds a reference to the exporting object, that object and its memory are never freed.

Reproducer:

import zstandard
comp = zstandard.ZstdCompressor()
writer = comp.stream_writer(open('/dev/null', 'wb'))
writer.write(b'hello')
writer.close()

data = bytearray(1000)
try:
    writer.write(data)          # Py_buffer acquired, error, never released
except ValueError:
    pass

data.extend(b'x')               # BufferError: Existing exports of data

Sites:

  • c-ext/compressionwriter.c:85 (ZstdCompressionWriter_write, closed-check path)
  • c-ext/decompressionwriter.c:73 (ZstdDecompressionWriter_write, closed-check path)
  • c-ext/decompressionwriter.c:103 (same function; a related leak of output.dst, not a Py_buffer, when the underlying writer.write() call raises)

Fix:

if (self->closed) {
    PyBuffer_Release(&source);
    PyErr_SetString(PyExc_ValueError, "stream is closed");
    return NULL;
}
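The observable in the reproducer can be shown without zstandard. A memoryview holds a buffer export on a bytearray in the same way as the leaked Py_buffer. The difference is that a memoryview can be released from Python, whereas the Py_buffer leaked inside the C extension cannot.

```python
data = bytearray(1000)
view = memoryview(data)   # acquires a buffer export, like the y* converter

try:
    data.extend(b"x")     # resizing is refused while an export is alive
except BufferError as exc:
    print("BufferError:", exc)

view.release()            # Python-level counterpart of PyBuffer_Release()
data.extend(b"x")         # succeeds once the export is gone
print(len(data))          # 1001
```

After the fix, the zstandard reproducer should behave like the code after `view.release()`: `data.extend(b'x')` succeeds once the closed-stream ValueError has been raised.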

Suggested PR shape

Four independent patches; happy to bundle in one PR or split by leak. The Py_XDECREF-on-CallMethod fixes (Leaks 1 + 2) are trivial; Leak 3 is a semantic choice (free-and-reinit vs. reject-reinit); Leak 4 is mechanical.

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). All four leaks verified live on CPython 3.14.3 debug build. Leaks 1 + 2 measured via tracemalloc (per-call deltas match the reference-count-size overhead exactly). Leak 3 measured via resource.getrusage(ru_maxrss) because the allocator is libc malloc, outside CPython's tracking. Leak 4 verified via the BufferError observable. Happy to open a PR.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
