Commit Graph

140 Commits

Author SHA1 Message Date
T. Wouters
1858db7cbd
GH-105808: Fix a regression introduced in GH-101251 (#105910)
Fix a regression introduced in GH-101251, causing GzipFile.flush() to
not flush the compressor (nor pass along the zlib_mode argument).
2023-06-19 17:09:04 +00:00
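A minimal sketch of the behaviour this fix restores: after `GzipFile.flush()`, everything written so far should have passed through the compressor into the underlying file object. The helper name `flushed_prefix_is_readable` is hypothetical and not part of the commit.

```python
import gzip
import io
import zlib


def flushed_prefix_is_readable(payload: bytes) -> bool:
    """Return True if flush() pushed all written data into the raw stream."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
        gz.write(payload)
        # Z_SYNC_FLUSH asks the compressor to emit all pending output.
        gz.flush(zlib.Z_SYNC_FLUSH)
        # Snapshot taken before close(), so no gzip trailer is present yet.
        snapshot = raw.getvalue()
    # wbits=31 tells zlib to expect a gzip header on the partial stream.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(snapshot) == payload
```

On an interpreter affected by the regression, this check would be expected to return False, because the compressor output never reaches the raw stream.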
Arjun
9af485436b
gh-89550: Buffer GzipFile.write to reduce execution time by ~15% (#101251)
Use `io.BufferedWriter` to buffer gzip writes.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2023-05-08 17:55:59 +00:00
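The general effect of `io.BufferedWriter` can be sketched outside of gzip: many small writes are coalesced into fewer calls on the underlying raw stream. The class `CountingRaw` and the function `count_raw_writes` are hypothetical names used only for this illustration, which does not reproduce the gzip internals.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)


def count_raw_writes(chunks, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Write chunks through a BufferedWriter; return (raw calls, data)."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for chunk in chunks:
        buffered.write(chunk)
    buffered.flush()
    return raw.calls, bytes(raw.data)
```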
Ruben Vorderman
eae7dad402
gh-95534: Improve gzip reading speed by 10% (#97664)
Change summary:
+ There is now a `gzip.READ_BUFFER_SIZE` constant set to 128 KiB. Other programs that read in 128 KiB chunks include pigz and cat, so this appears to be common practice among well-behaved programs. It is also faster than reading 8 KiB chunks.
+ A `zlib._ZlibDecompressor` was added. This is the `_bz2.BZ2Decompressor` ported to zlib. Since the `zlib.Decompress` object is better suited to in-memory decompression, the `_ZlibDecompressor` is hidden. It only makes sense for file decompression, which the gzip library already implements, so there is no need to expose it to users.
+ The `_ZlibDecompressor` uses the older CPython `arrange_output_buffer` functions, as those are faster and more appropriate for the use case.
+ `GzipFile.read` has been optimized. There is no longer an `unconsumed_tail` member to write back to the padded file. This is instead handled by the `_ZlibDecompressor` itself, which has an internal buffer. `_add_read_data` has been inlined, as it was just two calls.

EDIT: While I am adding improvements anyway, I figured I could add another one-liner optimization to the `python -m gzip` application. It previously read chunks of `io.DEFAULT_BUFFER_SIZE`, but now reads chunks of `READ_BUFFER_SIZE`.
2022-10-16 19:10:58 -07:00
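As a rough sketch of chunked reading with the new constant, the loop below decompresses a stream in `READ_BUFFER_SIZE` pieces. The function name `decompress_stream` is hypothetical, and the `getattr` fallback to 128 KiB is an assumption for interpreters that predate the constant.

```python
import gzip
import io

# Fall back to 128 KiB on interpreters without gzip.READ_BUFFER_SIZE.
CHUNK_SIZE = getattr(gzip, "READ_BUFFER_SIZE", 128 * 1024)


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Copy decompressed data from src to dst; return the byte count."""
    total = 0
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```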
Victor Stinner
d3a27e4c93
gh-94196: Remove gzip.GzipFile.filename attribute (#94197)
gzip: Remove the filename attribute of gzip.GzipFile,
deprecated since Python 2.6; use the name attribute instead. In write
mode, the filename attribute added the '.gz' file extension if it was not
present.
2022-06-24 11:59:32 +02:00
Ilya Leoshkevich
943ca5e1d6
gh-90839: Forward gzip.compress() compresslevel to zlib (gh-31215) 2022-04-12 22:46:40 +09:00
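A small sketch of how the `compresslevel` argument of `gzip.compress()` affects output size. The helper `sizes_by_level` is a hypothetical name for this illustration; exact sizes depend on the zlib build, so none are quoted here.

```python
import gzip


def sizes_by_level(data: bytes, levels=(0, 1, 9)):
    """Map each compression level to the size of the gzip output."""
    return {
        level: len(gzip.compress(data, compresslevel=level))
        for level in levels
    }
```

Level 0 stores the data without compression, so for repetitive input its output should be larger than the output at level 9.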
Ruben Vorderman
0ff3d95b98
bpo-45507: EOFErrors should be thrown for truncated gzip members (GH-29029) 2021-11-19 19:07:05 +01:00
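A minimal sketch of detecting a truncated member by catching `EOFError` from `gzip.decompress()`. The helper name `is_truncated` is hypothetical.

```python
import gzip


def is_truncated(blob: bytes) -> bool:
    """Return True if decompression hits end-of-data prematurely."""
    try:
        gzip.decompress(blob)
    except EOFError:
        return True
    return False
```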
Inada Naoki
0a4c82ddd3
bpo-45475: Revert __iter__ optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
This reverts commit d2a8e69c2c.
2021-10-19 11:51:48 +09:00
Ruben Vorderman
ea23e7820f
bpo-43613: Faster implementation of gzip.compress and gzip.decompress (GH-27941)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
2021-09-02 17:02:59 +02:00
Ma Lin
bc6c12c72a
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764)
No longer use len() to get the length of the input data. For some buffer protocol objects,
the length obtained by using len() is wrong.
2021-06-22 10:04:23 +03:00
Ashwin Ramaswami
de367378f6
Fix typo in comment (GH-26162) 2021-05-16 16:35:41 +01:00
Inada Naoki
d2a8e69c2c
bpo-43787: Add __iter__ to GzipFile, BZ2File, and LZMAFile (GH-25353) 2021-04-13 13:51:49 +09:00
Inada Naoki
4827483f47
bpo-43510: Implement PEP 597 opt-in EncodingWarning. (GH-19481)
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).

* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open() and TextIOWrapper() emit EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as the fallback default encoding when the locale module cannot be imported (as happens while building Python).
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
2021-03-29 12:28:14 +09:00
Ruben Vorderman
7956ef8849
bpo-43317: Use io.DEFAULT_BUFFER_SIZE instead of 1024 in gzip CLI (#24645)
This improves the performance slightly.
2021-02-26 21:17:51 +09:00
Inada Naoki
9525a18b5b
bpo-43316: gzip: Fix sys.exit() usage. (GH-24652) 2021-02-26 11:09:06 +09:00
Ruben Vorderman
cc3df6368d
bpo-43316: gzip: CLI uses non-zero return code on error. (GH-24647)
Exit code is now 1 instead of 0. A message is printed to stderr instead of stdout. This is
the proper behaviour for a tool that can be used in scripts.
2021-02-25 20:30:24 +09:00
William Chargin
eab3b3f1c6 bpo-39389: gzip: fix compression level metadata (GH-18077)
As described in RFC 1952, section 2.3.1, the XFL (eXtra FLags) byte of a
gzip member header should indicate whether the DEFLATE algorithm was
tuned for speed or compression ratio. Prior to this patch, archives
emitted by the `gzip` module always indicated maximum compression.
2020-01-21 13:25:24 +02:00
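The XFL byte sits at offset 8 of a gzip member header (after the 2 magic bytes, CM, FLG, and the 4-byte MTIME). The sketch below reads it back from a stream written by `GzipFile`; the helper name `xfl_byte` is hypothetical, and the values in the usage note assume an interpreter that includes this fix.

```python
import gzip
import io


def xfl_byte(compresslevel: int) -> int:
    """Return the XFL header byte written for the given compression level."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode="wb",
                       compresslevel=compresslevel) as gz:
        gz.write(b"example")
    # Header layout per RFC 1952: ID1 ID2 CM FLG MTIME(4) XFL OS
    return raw.getvalue()[8]
```

Per RFC 1952, 2 indicates maximum compression and 4 indicates the fastest algorithm, so level 9 should yield 2 and level 1 should yield 4.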
Serhiy Storchaka
a0652328a2
bpo-28286: Deprecate opening GzipFile for writing implicitly. (GH-16417)
Always specify the mode argument for writing.
2019-11-16 18:56:57 +02:00
Zackery Spytz
cf599f6f6f bpo-6584: Add a BadGzipFile exception to the gzip module. (GH-13022)
Co-Authored-By: Filip Gruszczyński <gruszczy@gmail.com>
Co-Authored-By: Michele Orrù <maker@tumbolandia.net>
2019-05-13 10:50:52 +03:00
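A short sketch of using the `BadGzipFile` exception to distinguish invalid gzip data from valid streams. The helper name `looks_like_gzip` is hypothetical.

```python
import gzip


def looks_like_gzip(blob: bytes) -> bool:
    """Return False if the data is rejected as not being gzip."""
    try:
        gzip.decompress(blob)
    except gzip.BadGzipFile:
        return False
    return True
```

`BadGzipFile` is a subclass of `OSError`, so existing handlers that catch `OSError` continue to work.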
Maximilian Nöthe
4f5a3493b5 fix typo in gzip.py (GH-12928) 2019-04-24 18:21:02 +09:00
guoci
0e7497cb46 bpo-34898: Add mtime parameter to gzip.compress(). (GH-9704)
Without setting mtime, time.time() is used as the timestamp, which
ends up in the compressed data, so the output of the compress() function
varies from one invocation to the next.
2018-11-07 11:50:23 +02:00
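A minimal sketch of using the `mtime` parameter to get reproducible output: with a fixed timestamp, compressing the same input twice yields identical bytes. The helper name `deterministic_gzip` is hypothetical.

```python
import gzip


def deterministic_gzip(data: bytes) -> bytes:
    """Compress data with a fixed timestamp so the output is reproducible."""
    return gzip.compress(data, mtime=0)
```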
Stéphane Wirtel
3e28eed9ec bpo-34969: Add --fast, --best on the gzip CLI (GH-9833) 2018-11-03 16:24:23 +01:00
Stéphane Wirtel
e8bbc52deb bpo-23596: Use argparse for the command line of gzip (GH-9781)
Co-authored-by: Antony Lee <anntzer.lee@gmail.com>
2018-10-10 00:41:33 +02:00
Victor Stinner
8c663fd60e
Replace KB unit with KiB (#4293)
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.

Same change for MB and GB which become MiB and GiB.

Change the output of Tools/iobench/iobench.py.

Round also the size of the documentation from 5.5 MB to 5 MiB.
2017-11-08 14:44:44 -08:00
Berker Peksag
03020cfa97 Issue #28227: gzip now supports pathlib
Patch by Ethan Furman.
2016-10-02 13:47:58 +03:00
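A brief sketch of passing a `pathlib.Path` directly to `gzip.open()`. The function name `roundtrip_via_path` and the file name `sample.gz` are hypothetical.

```python
import gzip
from pathlib import Path


def roundtrip_via_path(directory, payload: bytes) -> bytes:
    """Write and re-read a gzip file addressed by a Path object."""
    target = Path(directory) / "sample.gz"
    with gzip.open(target, "wb") as f:
        f.write(payload)
    with gzip.open(target, "rb") as f:
        return f.read()
```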
Serhiy Storchaka
5f1a5187f7 Use sequence repetition instead of bytes constructor with integer argument. 2016-09-11 14:41:02 +03:00
Martin Panter
8f26565ba9 Fix spelling (inital), grammar (may translates) in documentation, comments 2016-04-19 04:03:41 +00:00
Martin Panter
b82032f935 Issue #22341: Drop Python 2 workaround and document CRC initial value
Also align the parameter naming in binascii to be consistent with zlib.
2015-12-11 05:19:29 +00:00
Antoine Pitrou
2dbc6e6bce Issue #23529: Limit the size of decompressed data when reading from
GzipFile, BZ2File or LZMAFile.  This defeats denial of service attacks
using compressed bombs (i.e. compressed payloads which decompress to a huge
size).

Patch by Martin Panter and Nikolaus Rath.
2015-04-11 00:31:01 +02:00
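One way a caller can benefit from bounded reads is to request at most `limit + 1` bytes and reject anything larger, rather than decompressing an untrusted payload in full. This is a sketch of that caller-side pattern, not of the patch itself; the function name `read_at_most` is hypothetical.

```python
import gzip
import io


def read_at_most(blob: bytes, limit: int) -> bytes:
    """Decompress blob, refusing payloads that expand beyond limit bytes."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        # Ask for one byte more than allowed to detect oversized payloads.
        data = gz.read(limit + 1)
    if len(data) > limit:
        raise ValueError("decompressed data exceeds the allowed size")
    return data
```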
Serhiy Storchaka
2116b12da5 Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If multiple resources need to be released, they are released
even if errors occur.
2015-04-10 13:29:28 +03:00
Serhiy Storchaka
7e7a3dba5f Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If multiple resources need to be released, they are released
even if errors occur.
2015-04-10 13:24:41 +03:00
Serhiy Storchaka
bca63b362d Issue #23688: Added support for arbitrary bytes-like objects and avoided
unnecessary copying of memoryview in gzip.GzipFile.write().
Original patch by Wolfgang Maier.
2015-03-23 14:59:48 +02:00
Serhiy Storchaka
d4c2ac8394 Issue #21560: An attempt to write data of the wrong type no longer causes
GzipFile corruption.  Original patch by Wolfgang Maier.
2015-03-23 15:25:43 +02:00
Ned Deily
e5127299c8 Issue #20875: Merge from 3.3 2014-03-09 14:47:58 -07:00
Ned Deily
6120739f0c Issue #20875: Prevent possible gzip "'read' is not defined" NameError.
Patch by Claudiu Popa.
2014-03-09 14:44:34 -07:00
Nadeem Vawda
ee1be99e05 Issue #19222: Add support for the 'x' mode to the gzip module.
Original patch by Tim Heaney.
2013-10-19 00:11:13 +02:00
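A minimal sketch of exclusive creation with the 'x' mode: the open fails with `FileExistsError` if the target already exists. The helper name `create_exclusively` is hypothetical.

```python
import gzip


def create_exclusively(path, payload: bytes) -> bool:
    """Create a new gzip file; return False if the path already exists."""
    try:
        with gzip.open(path, "xb") as f:
            f.write(payload)
    except FileExistsError:
        return False
    return True
```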
Serhiy Storchaka
48e6a8c88a Issue #18743: Fix references to nonexistent "StringIO" module
in docstrings and comments.
2013-08-29 11:39:48 +03:00
Serhiy Storchaka
50254c57cd Issue #18743: Fix references to nonexistent "StringIO" module
in docstrings and comments.
2013-08-29 11:35:43 +03:00
Georg Brandl
b3bd624a55 Back out patch for #1159051, which caused backwards compatibility problems. 2013-05-12 11:57:26 +02:00
Serhiy Storchaka
ffcd339aac Close #17666: Fix reading gzip files with an extra field. 2013-04-08 22:37:15 +03:00
Serhiy Storchaka
7e69f0085e Close #17666: Fix reading gzip files with an extra field. 2013-04-08 22:35:02 +03:00
Serhiy Storchaka
cc0172c007 Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip, bzip2, and lzma files.
2013-01-22 17:11:07 +02:00
Serhiy Storchaka
57f9b7a124 Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip, bzip2, and lzma files.
2013-01-22 17:07:49 +02:00
Serhiy Storchaka
7c3922f44c Issue #1159051: GzipFile now raises EOFError when reading a corrupted file
with truncated header or footer.
Added tests for reading truncated gzip and bzip2 files.
2013-01-22 17:01:59 +02:00
Serhiy Storchaka
fc6e8aabf5 #15546: Fix GzipFile.peek()'s handling of pathological input data.
This is a backport of changeset 8c07ff7f882f.
2013-01-22 15:54:48 +02:00
Andrew Svetlov
f7a17b48d7 Replace IOError with OSError (#16715) 2012-12-25 16:47:37 +02:00
Nadeem Vawda
6ff262e18f Issue #15677: Document that zlib and gzip accept a compression level of 0 to mean 'no compression'.
Patch by Brian Brazil.
2012-11-11 14:14:47 +01:00
Nadeem Vawda
19e568d254 Issue #15677: Document that zlib and gzip accept a compression level of 0 to mean 'no compression'.
Patch by Brian Brazil.
2012-11-11 14:04:14 +01:00
Antoine Pitrou
2a021c80ce Issue #15800: fix the closing of input / output files when gzip is used as a script. 2012-08-30 00:30:14 +02:00
Antoine Pitrou
ecc4757b79 Issue #15800: fix the closing of input / output files when gzip is used as a script. 2012-08-30 00:29:24 +02:00
Nadeem Vawda
043540088a #15546: Also fix GzipFile.peek(). 2012-08-05 14:45:41 +02:00