Commit cbae620

gh-51067: Document reliable space reclamation in ZipFile.repack()
Clarify that passing removed= reclaims a member's space regardless of how its local file entry was written, while the scan used when removed is omitted may leave entries in place (e.g. unsigned data descriptors under the default strict_descriptor=True). Adjust the remove() note so it no longer implies a bare repack() always removes the data, and add a short remove-then-repack(removed) example.
1 parent 9376c2d commit cbae620

1 file changed

Lines changed: 14 additions & 1 deletion

Doc/library/zipfile.rst
@@ -570,7 +570,9 @@ ZipFile objects
 making it inaccessible to most tools. The member's local file entry,
 including content and metadata, remains in the archive and is still
 recoverable using forensic tools. Call :meth:`repack` afterwards to
-completely remove the member and reclaim space.
+remove the local file entry and reclaim space; pass the returned
+:class:`ZipInfo` to :meth:`repack` to ensure the data is removed
+regardless of how the entry was written.
 
 .. versionadded:: next
 
@@ -587,6 +589,17 @@ ZipFile objects
 locate and remove local file entries that are no longer referenced in the
 central directory.
 
+Passing *removed* is the most reliable way to reclaim space: the
+corresponding local file entries are located directly from the central
+directory and removed regardless of how they were written, whereas the scan
+used when *removed* is omitted may leave some entries in place (see
+*strict_descriptor* below). To remove members and reclaim their space in a
+single step::
+
+   with ZipFile('spam.zip', 'a') as myzip:
+       removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
+       myzip.repack(removed)
+
 When scanning, *strict_descriptor* controls how entries written with an
 unsigned *data descriptor* are handled. A data descriptor is an optional
 record holding an entry's CRC and sizes, stored just after the entry's data;
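
Editorial illustration (not part of the commit): the data descriptor mentioned in the last hunk can be observed with the already released ``zipfile`` API, without the ``remove()`` and ``repack()`` methods this change documents. The sketch below assumes that writing an archive to a stream lacking ``tell()`` and ``seek()`` makes ``zipfile`` set general-purpose flag bit 3 and append a signed descriptor after the member's data. ``WriteOnlyStream`` and ``build_archive_with_descriptor`` are hypothetical names introduced only for this example.

```python
import io
import struct
import zipfile


class WriteOnlyStream:
    """Hypothetical unseekable sink: offers write() and flush() only."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data):
        return self._buf.write(data)

    def flush(self):
        pass

    def getvalue(self):
        return self._buf.getvalue()


def build_archive_with_descriptor(name, payload):
    """Write one stored member to an unseekable stream; return raw bytes."""
    sink = WriteOnlyStream()
    with zipfile.ZipFile(sink, "w") as zf:
        zf.writestr(name, payload)
    return sink.getvalue()


raw = build_archive_with_descriptor("ham.txt", b"spam and eggs")

# Read the archive back and inspect what the central directory recorded.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    info = zf.getinfo("ham.txt")
    # Bit 3 (0x08) means CRC and sizes follow the data in a descriptor.
    uses_descriptor = bool(info.flag_bits & 0x08)
    crc, csize, usize = info.CRC, info.compress_size, info.file_size

# A signed descriptor begins with the optional signature 0x08074B50.
# An unsigned one would omit those four bytes, which is the harder case
# for a scanner that has to find entry boundaries without the central
# directory.
signed_descriptor = struct.pack("<LLLL", 0x08074B50, crc, csize, usize)
has_signed = signed_descriptor in raw
```

Because the descriptor sits after the data and its signature is optional, a scan cannot always tell where such an entry ends. That is consistent with the commit's point that locating entries from the central directory via *removed* is the more dependable path.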
