Commit 49125bd

gh-51067: Rephrase doc for ZipFile.{remove,repack} (GH-151932)
1 parent aec0aed commit 49125bd

1 file changed

Lines changed: 25 additions & 33 deletions

Doc/library/zipfile.rst

@@ -554,10 +554,9 @@ ZipFile objects
 
 Removes a member entry from the archive's central directory.
 *zinfo_or_arcname* may be the full path of the member or a :class:`ZipInfo`
-instance. If multiple members share the same full path and the path is
-given as a string, only one of them is removed and which one is unspecified;
-it should not be relied upon. Pass the specific :class:`ZipInfo` instance to
-remove a particular member.
+instance. If multiple members share the same path and a string is provided,
+only one unspecified entry is removed; pass a specific :class:`ZipInfo`
+instance to guarantee which member is removed.
 
 The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
 
@@ -569,10 +568,9 @@ ZipFile objects
 This method only removes the member's entry from the central directory,
 making it inaccessible to most tools. The member's local file entry,
 including content and metadata, remains in the archive and is still
-recoverable using forensic tools. Call :meth:`repack` afterwards to
-remove the local file entry and reclaim space; pass the returned
-:class:`ZipInfo` to :meth:`repack` to ensure the data is removed
-regardless of how the entry was written.
+forensically recoverable. To completely delete the data and reclaim
+space, call :meth:`repack` afterwards (preferably passing the returned
+:class:`ZipInfo` instance).
 
 .. versionadded:: next
 
@@ -585,37 +583,31 @@ ZipFile objects
 
 If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
 representing the recently removed members, and only their corresponding
-local file entries will be removed. Otherwise, the archive is scanned to
-locate and remove local file entries that are no longer referenced in the
-central directory.
-
-Passing *removed* is the most reliable way to reclaim space: the
-corresponding local file entries are located directly from the central
-directory and removed regardless of how they were written, whereas the scan
-used when *removed* is omitted may leave some entries in place (see
-*strict_descriptor* below). To remove members and reclaim their space in a
-single step::
+local file entries will be removed. This is the most efficient and reliable
+way to reclaim space. A brief example looks like::
 
     with ZipFile('spam.zip', 'a') as myzip:
         removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
         myzip.repack(removed)
 
-When scanning, *strict_descriptor* controls how entries written with an
-unsigned *data descriptor* are handled. A data descriptor is an optional
-record holding an entry's CRC and sizes, stored just after the entry's data;
-it is used when the archive is written to a non-seekable stream, and is
-*signed* when it begins with a marker signature or *unsigned* otherwise.
+If *removed* is omitted, the archive is scanned to locate and remove local
+file entries that are no longer referenced in the central directory.
+
+When scanning, *strict_descriptor* controls how entries with an unsigned
+data descriptor are handled. A data descriptor is an optional record
+(mostly used for non-seekable streaming) stored after an entry's data, and
+can be either signed (beginning with a magic signature) or unsigned.
 Unsigned descriptors have been deprecated by the `PKZIP Application Note`_
-since version 6.3.0 (released in 2006) and are written only by some legacy
-tools; signed descriptors—written by Python and other modern tools—are always
-detected. When *strict_descriptor* is true (the default), only signed data
-descriptors are detected, so an unreferenced entry written with an unsigned
-descriptor is not located and its space is not reclaimed by the scan.
-Setting ``strict_descriptor=False`` additionally detects unsigned
-descriptors, at the cost of a significantly slower scan—around 100 to 1000
-times in the worst case—which may be exploitable as a denial-of-service
-vector on untrusted input. This does not affect entries without a data
-descriptor, and is not needed when *removed* is provided.
+since version 6.3.0 (released in 2006) and are rarely produced by modern
+tools.
+
+When *strict_descriptor* is true (the default), unsigned descriptors are
+not detectable, and unreferenced entries using them are not recognized and
+their space is not reclaimed. Setting ``strict_descriptor=False`` allows
+such entries to be properly handled, at the cost of a significantly slower
+scan—around 100 to 1000 times in the worst case—which may be exploitable
+as a denial-of-service vector on untrusted input. Entries without a
+descriptor or with a signed descriptor are unaffected.
 
 *chunk_size* may be specified to control the buffer size when moving
 entry data (default is 1 MiB).
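
The signed versus unsigned distinction described in the last hunk can be seen with the released `zipfile` API alone. The sketch below is not part of the commit and does not call the new `remove()` or `repack()` methods. It writes one member to a stream without `seek()` or `tell()`, then checks whether the entry carries a data descriptor and whether that descriptor begins with the marker signature. `WriteOnly` and `build_streamed_archive` are hypothetical helper names. The constants are assumptions taken from the PKZIP Application Note (general purpose flag bit 3 and the signature value 0x08074b50), not from this diff.

```python
import io
import zipfile

# Assumed from the PKZIP Application Note, not from this commit:
DD_SIGNATURE = b"PK\x07\x08"   # 0x08074b50 stored little-endian
FLAG_DATA_DESCRIPTOR = 0x08    # general purpose bit 3


class WriteOnly:
    """Hypothetical helper: a sink with only write() and flush(),
    so zipfile has to treat it as a non-seekable stream."""

    def __init__(self):
        self.buffer = io.BytesIO()

    def write(self, data):
        return self.buffer.write(data)

    def flush(self):
        pass


def build_streamed_archive():
    """Write a single stored member through the non-seekable sink."""
    sink = WriteOnly()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("ham.txt", b"hello")
    return sink.buffer.getvalue()


raw = build_streamed_archive()

# Re-open the bytes normally and inspect the entry's flags.
info = zipfile.ZipFile(io.BytesIO(raw)).infolist()[0]
uses_descriptor = bool(info.flag_bits & FLAG_DATA_DESCRIPTOR)

# Count occurrences of the descriptor marker in the raw archive.
signed_count = raw.count(DD_SIGNATURE)

print(uses_descriptor, signed_count)
```

If the flag is set and the marker is present, the entry uses a signed descriptor, which is the kind the new text says is unaffected by the default `strict_descriptor` setting.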
