diff --git a/Doc/library/zipfile.rst b/Doc/library/zipfile.rst
index 98d2a5e5cdf00e..f7e3245b71d85a 100644
--- a/Doc/library/zipfile.rst
+++ b/Doc/library/zipfile.rst
@@ -554,10 +554,9 @@ ZipFile objects
 
    Removes a member entry from the archive's central directory.
    *zinfo_or_arcname* may be the full path of the member or a :class:`ZipInfo`
-   instance. If multiple members share the same full path and the path is
-   given as a string, only one of them is removed and which one is unspecified;
-   it should not be relied upon. Pass the specific :class:`ZipInfo` instance to
-   remove a particular member.
+   instance. If multiple members share the same path and a string is provided,
+   only one unspecified entry is removed; pass a specific :class:`ZipInfo`
+   instance to guarantee which member is removed.
 
    The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.
 
@@ -569,10 +568,9 @@ ZipFile objects
    This method only removes the member's entry from the central directory,
    making it inaccessible to most tools. The member's local file entry,
    including content and metadata, remains in the archive and is still
-   recoverable using forensic tools. Call :meth:`repack` afterwards to
-   remove the local file entry and reclaim space; pass the returned
-   :class:`ZipInfo` to :meth:`repack` to ensure the data is removed
-   regardless of how the entry was written.
+   forensically recoverable. To completely delete the data and reclaim
+   space, call :meth:`repack` afterwards (preferably passing the returned
+   :class:`ZipInfo` instance).
 
    .. versionadded:: next
 
@@ -585,37 +583,31 @@ ZipFile objects
 
    If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
    representing the recently removed members, and only their corresponding
-   local file entries will be removed. Otherwise, the archive is scanned to
-   locate and remove local file entries that are no longer referenced in the
-   central directory.
-
-   Passing *removed* is the most reliable way to reclaim space: the
-   corresponding local file entries are located directly from the central
-   directory and removed regardless of how they were written, whereas the scan
-   used when *removed* is omitted may leave some entries in place (see
-   *strict_descriptor* below). To remove members and reclaim their space in a
-   single step::
+   local file entries will be removed. This is the most efficient and reliable
+   way to reclaim space. A brief example looks like::
 
       with ZipFile('spam.zip', 'a') as myzip:
           removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
           myzip.repack(removed)
 
-   When scanning, *strict_descriptor* controls how entries written with an
-   unsigned *data descriptor* are handled. A data descriptor is an optional
-   record holding an entry's CRC and sizes, stored just after the entry's data;
-   it is used when the archive is written to a non-seekable stream, and is
-   *signed* when it begins with a marker signature or *unsigned* otherwise.
+   If *removed* is omitted, the archive is scanned to locate and remove local
+   file entries that are no longer referenced in the central directory.
+
+   When scanning, *strict_descriptor* controls how entries with an unsigned
+   data descriptor are handled. A data descriptor is an optional record
+   (mostly used for non-seekable streaming) stored after an entry's data, and
+   can be either signed (beginning with a magic signature) or unsigned.
    Unsigned descriptors have been deprecated by the `PKZIP Application Note`_
-   since version 6.3.0 (released in 2006) and are written only by some legacy
-   tools; signed descriptors—written by Python and other modern tools—are always
-   detected. When *strict_descriptor* is true (the default), only signed data
-   descriptors are detected, so an unreferenced entry written with an unsigned
-   descriptor is not located and its space is not reclaimed by the scan.
-   Setting ``strict_descriptor=False`` additionally detects unsigned
-   descriptors, at the cost of a significantly slower scan—around 100 to 1000
-   times in the worst case—which may be exploitable as a denial-of-service
-   vector on untrusted input. This does not affect entries without a data
-   descriptor, and is not needed when *removed* is provided.
+   since version 6.3.0 (released in 2006) and are rarely produced by modern
+   tools.
+
+   When *strict_descriptor* is true (the default), unsigned descriptors are
+   not detectable, and unreferenced entries using them are not recognized and
+   their space is not reclaimed. Setting ``strict_descriptor=False`` allows
+   such entries to be properly handled, at the cost of a significantly slower
+   scan—around 100 to 1000 times in the worst case—which may be exploitable
+   as a denial-of-service vector on untrusted input. Entries without a
+   descriptor or with a signed descriptor are unaffected.
 
    *chunk_size* may be specified to control the buffer size when moving entry
    data (default is 1 MiB).
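
Reviewer note: the revised text describes data descriptors only briefly, so a small illustration of where they come from may help readers of this patch. The sketch below is not part of the change and does not use the new ``remove()`` or ``repack()`` methods; it relies only on the existing ``zipfile`` API. ``WriteOnlyStream`` is a hypothetical helper invented here to stand in for a non-seekable stream, and the check assumes current CPython behaviour, where writing to such a stream sets general-purpose flag bit 3 and emits a signed descriptor.

```python
import io
import zipfile


class WriteOnlyStream:
    """Hypothetical non-seekable sink: it offers write() and flush() only."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def write(self, data):
        return self._buffer.write(data)

    def flush(self):
        pass

    def getvalue(self):
        return self._buffer.getvalue()


# Without tell()/seek(), zipfile cannot go back and patch the local header,
# so it records the CRC and sizes in a data descriptor after the entry data.
sink = WriteOnlyStream()
with zipfile.ZipFile(sink, "w") as archive:
    archive.writestr("ham.txt", b"ham")

raw = sink.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    info = archive.getinfo("ham.txt")

# Bit 3 (0x08) of the general-purpose flags marks "sizes are in a descriptor".
uses_data_descriptor = bool(info.flag_bits & 0x08)
# A signed descriptor starts with the marker bytes PK\x07\x08.
has_signed_descriptor = b"PK\x07\x08" in raw

print(uses_data_descriptor, has_signed_descriptor)
```

An entry written this way carries a signed descriptor, which is the kind the default ``strict_descriptor=True`` scan is described as handling; unsigned descriptors would have to come from another tool.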