58 changes: 25 additions & 33 deletions Doc/library/zipfile.rst
@@ -554,10 +554,9 @@ ZipFile objects

 Removes a member entry from the archive's central directory.
 *zinfo_or_arcname* may be the full path of the member or a :class:`ZipInfo`
-instance. If multiple members share the same full path and the path is
-given as a string, only one of them is removed and which one is unspecified;
-it should not be relied upon. Pass the specific :class:`ZipInfo` instance to
-remove a particular member.
+instance. If multiple members share the same path and a string is provided,
+only one unspecified entry is removed; pass a specific :class:`ZipInfo`
+instance to guarantee which member is removed.

 The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.

@@ -569,10 +568,9 @@ ZipFile objects
 This method only removes the member's entry from the central directory,
 making it inaccessible to most tools. The member's local file entry,
 including content and metadata, remains in the archive and is still
-recoverable using forensic tools. Call :meth:`repack` afterwards to
-remove the local file entry and reclaim space; pass the returned
-:class:`ZipInfo` to :meth:`repack` to ensure the data is removed
-regardless of how the entry was written.
+forensically recoverable. To completely delete the data and reclaim
+space, call :meth:`repack` afterwards (preferably passing the returned
+:class:`ZipInfo` instance).

 .. versionadded:: next

@@ -585,37 +583,31 @@ ZipFile objects

 If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
 representing the recently removed members, and only their corresponding
-local file entries will be removed. Otherwise, the archive is scanned to
-locate and remove local file entries that are no longer referenced in the
-central directory.
-
-Passing *removed* is the most reliable way to reclaim space: the
-corresponding local file entries are located directly from the central
-directory and removed regardless of how they were written, whereas the scan
-used when *removed* is omitted may leave some entries in place (see
-*strict_descriptor* below). To remove members and reclaim their space in a
-single step::
+local file entries will be removed. This is the most efficient and reliable
+way to reclaim space. A brief example looks like::

 with ZipFile('spam.zip', 'a') as myzip:
     removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
     myzip.repack(removed)

-When scanning, *strict_descriptor* controls how entries written with an
-unsigned *data descriptor* are handled. A data descriptor is an optional
-record holding an entry's CRC and sizes, stored just after the entry's data;
-it is used when the archive is written to a non-seekable stream, and is
-*signed* when it begins with a marker signature or *unsigned* otherwise.
+If *removed* is omitted, the archive is scanned to locate and remove local
+file entries that are no longer referenced in the central directory.
+
+When scanning, *strict_descriptor* controls how entries with an unsigned
+data descriptor are handled. A data descriptor is an optional record
+(mostly used for non-seekable streaming) stored after an entry's data, and
+can be either signed (beginning with a magic signature) or unsigned.
 Unsigned descriptors have been deprecated by the `PKZIP Application Note`_
-since version 6.3.0 (released in 2006) and are written only by some legacy
-tools; signed descriptors—written by Python and other modern tools—are always
-detected. When *strict_descriptor* is true (the default), only signed data
-descriptors are detected, so an unreferenced entry written with an unsigned
-descriptor is not located and its space is not reclaimed by the scan.
-Setting ``strict_descriptor=False`` additionally detects unsigned
-descriptors, at the cost of a significantly slower scan—around 100 to 1000
-times in the worst case—which may be exploitable as a denial-of-service
-vector on untrusted input. This does not affect entries without a data
-descriptor, and is not needed when *removed* is provided.
+since version 6.3.0 (released in 2006) and are rarely produced by modern
+tools.
+
+When *strict_descriptor* is true (the default), unsigned descriptors are
+not detectable, and unreferenced entries using them are not recognized and
+their space is not reclaimed. Setting ``strict_descriptor=False`` allows
+such entries to be properly handled, at the cost of a significantly slower
+scan—around 100 to 1000 times in the worst case—which may be exploitable
+as a denial-of-service vector on untrusted input. Entries without a
+descriptor or with a signed descriptor are unaffected.

 *chunk_size* may be specified to control the buffer size when moving
 entry data (default is 1 MiB).
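
The signed data descriptor that the last hunk describes can be observed with the existing ``zipfile`` API alone, without the proposed ``remove`` and ``repack`` methods. The following is a minimal sketch, not part of the patch. ``WriteOnlyStream`` is a hypothetical helper introduced here only to give ``ZipFile`` a sink without ``seek()`` or ``tell()``, and the two constants are illustrative names for the flag bit and signature defined by the ZIP format. It assumes current CPython behavior when writing to such a stream.

```python
import io
import zipfile

DATA_DESCRIPTOR_FLAG = 0x08       # general-purpose flag bit 3
DD_SIGNATURE = b"PK\x07\x08"      # optional data descriptor signature


class WriteOnlyStream:
    """Hypothetical helper: a byte sink with no seek() or tell()."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data
        return len(data)

    def flush(self):
        pass


# Writing to a non-seekable sink makes zipfile defer the CRC and sizes
# to a data descriptor placed after the entry's data.
sink = WriteOnlyStream()
with zipfile.ZipFile(sink, "w") as archive:   # ZIP_STORED by default
    archive.writestr("ham.txt", b"hello")

raw = bytes(sink.buffer)
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    info = archive.getinfo("ham.txt")
    uses_descriptor = bool(info.flag_bits & DATA_DESCRIPTOR_FLAG)

# The payload is stored uncompressed, so a signed descriptor shows up as
# the signature bytes immediately after the payload.
is_signed = (b"hello" + DD_SIGNATURE) in raw
print(uses_descriptor, is_signed)
```

Both values being true would indicate that the entry carries a data descriptor and that the descriptor starts with the signature, which is the case the documentation says is unaffected by *strict_descriptor*.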