mirrors/zstd

mirror of https://github.com/facebook/zstd.git synced 2024-12-05 04:26:45 +08:00

Author	SHA1	Message	Date
W. Felix Handte	6cb2454646	Remove CParams from Block Compressor Functions' Args	2018-09-28 17:10:42 -07:00
W. Felix Handte	76ef87ed9d	Add ZSTD_compressionParameters to ZSTD_matchState_t	2018-09-28 17:10:42 -07:00
Nick Terrell	5e580de6da	[zstd] Fix seqStore growth We could undersize the literals buffer by up to 11 bytes, due to a combination of 2 bugs: * The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra space, like it is supposed to. * We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.	2018-08-28 13:24:44 -07:00
Nick Terrell	924944e471	[zstd] Reuse the ZSTD_CCtx more often with small data.	2018-08-23 17:48:06 -07:00
W. Felix Handte	01bb1c1016	Add CCtx Param Controlling Dict Attachment Behavior	2018-06-21 17:29:25 -04:00
Yann Collet	fa41bcc2c2	grouped debug functions into debug.h There were 2 competing sets of debug functions within zstd_internal.h and bitstream.h. They were mostly duplicates, and required care to avoid messing with each other. There is now a single implementation, shared by both. Significant change : The macro variable ZSTD_DEBUG no longer exists, it has been replaced by DEBUGLEVEL, which required modifying several source files.	2018-06-13 15:43:09 -04:00
Yann Collet	3050733042	Merge branch 'dev' into negLevels	2018-06-07 15:51:35 -07:00
Yann Collet	a57b4df85f	removed literalCompression directive in this version, literal compression is always disabled for ZSTD_fast strategy. Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()	2018-06-07 15:24:12 -07:00
Yann Collet	e5e17d009f	changed member name to workSpaceOversizedDuration	2018-06-06 15:00:27 -07:00
Yann Collet	3d523c741b	added workSpaceTooLarge and workSpaceWasteful also : slightly increased speed of test fuzzer.16	2018-06-05 11:42:48 -07:00
Yann Collet	2108decb41	Fixed a nasty corruption bug recently introduced into the new dictionary mode. The bug could be reproduced with this command : ./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639 error was in function ZSTD_count_2segments() : the beginning of the 2nd segment corresponds to prefixStart and not the beginning of the current block (istart == src). This would result in comparing the wrong byte.	2018-06-01 18:54:34 -07:00
Yann Collet	463a0fe38b	simplified optimal parser removed "cached" structure. prices are now saved in the optimal table. Primarily done for simplification. Might improve speed by a little. But actually, and surprisingly, also improves ratio in some circumstances.	2018-05-29 14:07:25 -07:00
Yann Collet	f6ad59ab5c	Merge branch 'dev' into staticDictCost	2018-05-24 16:21:02 -07:00
W. Felix Handte	298d24fa57	Make loadedDictEnd an Index, not the Dict Len	2018-05-23 17:53:03 -04:00
W. Felix Handte	3ba70cc759	Clear the Dictionary When Sliding the Window	2018-05-23 17:53:03 -04:00
W. Felix Handte	191fc74a51	Rename 'hasDict' to 'dictMode'	2018-05-23 17:53:03 -04:00
W. Felix Handte	ae4fcf7816	Respond to PR Comments; Formatting/Style/Lint Fixes	2018-05-23 17:53:03 -04:00
W. Felix Handte	b67196f30d	Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff	2018-05-23 17:53:03 -04:00
W. Felix Handte	265c2869d1	Split Wrapper Functions to Cause Inlining	2018-05-23 17:53:03 -04:00
W. Felix Handte	8d24ff0353	Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx	2018-05-23 17:53:03 -04:00
W. Felix Handte	d18a405779	Refer to the Dictionary Match State In-Place (Sometimes)	2018-05-23 17:53:03 -04:00
Nick Terrell	e3959d5eba	Fixes	2018-05-22 16:06:33 -07:00
Nick Terrell	49cf880513	Approximate FSE encoding costs for selection Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and `set_repeat`, and select the one with the lowest cost. * The cost of `set_basic` is computed using the cross-entropy cost function `ZSTD_crossEntropyCost()`, using the normalized default count and the count. * The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the previous table to see if it is able to represent the distribution. * The cost of `set_compressed` is computed with the entropy cost function `ZSTD_entropyCost()`, together with the cost of writing the normalized count `ZSTD_NCountCost()`.	2018-05-22 14:33:22 -07:00
Yann Collet	a95e9e80d1	adding some debug functions to observe statistics	2018-05-18 14:09:42 -07:00
Yann Collet	8572b4d09f	fixed a pretty complex bug when combining ldm + btultra	2018-05-17 16:13:53 -07:00
Yann Collet	a243020d37	slightly improved weight calculation translating into a tiny compression ratio improvement	2018-05-17 11:19:44 -07:00
Yann Collet	18fc3d3cd5	introduced bit-fractional cost evaluation this improves compression ratio by a tiny amount. It also reduces speed by a small amount. Consequently, bit-fractional evaluation is only turned on for btultra.	2018-05-16 14:53:35 -07:00
Yann Collet	2c26df0e13	opt: removed static prices after testing, it's actually always better to use dynamic prices albeit initialised from dictionary.	2018-05-14 18:04:08 -07:00
Yann Collet	761758982e	replaced FSE_count by FSE_count_simple to reduce usage of stack memory. Also : tweaked a few comments, as suggested by @terrelln	2018-05-11 16:03:37 -07:00
Yann Collet	74b1c75d64	btopt : minor adjustment of update frequencies	2018-05-10 16:32:36 -07:00
Yann Collet	338f738c24	pass entropy tables to optimal parser for proper estimation of symbol's weights when using dictionary compression. Note : using only huffman costs is not good enough, presumably because sequence symbol costs are incorrect.	2018-05-08 15:37:06 -07:00
Yann Collet	a155061328	minor code refactor for readability removed some useless operations from optimal parser (should not change performance, too small a difference)	2018-05-08 12:32:44 -07:00
Nick Terrell	295ab0dbfa	Only load extra table positions for CDicts Zstdmt uses prefixes to load the overlap between segments. Loading extra positions makes compression non-deterministic, depending on the previous job the context was used for. Since loading extra position takes extra time as well, only do it when creating a `ZSTD_CDict`. Fixes #1077.	2018-04-02 14:41:30 -07:00
Yann Collet	a99c4a3621	Merge branch 'dev' into advancedDecompress	2018-03-21 06:08:28 -07:00
Yann Collet	87b0cf05bd	Merge pull request #1057 from facebook/lrmSettings LRM parameters	2018-03-21 05:59:39 -07:00
Yann Collet	6873fec658	changed dictMore for dictContentType which seems clearer to describe what the variable/argument is about.	2018-03-20 15:13:14 -07:00
Yann Collet	6f4d0778a5	make it possible to express compression parameters in any order	2018-03-19 14:41:23 -07:00
Nick Terrell	4af1fafeb8	Restore setting loadedDictEnd Setting `loadedDictEnd` was accidentally removed from `ZSTD_loadDictionaryContent()`, which means that dictionary compression will only be able to reference the parts of the dictionary within the window. The spec allows us to reference the entire dictionary so long as even one byte is in the window. `ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd` beyond the window, even once the dictionary was out of range. When overflow protection kicked in, the check `current > loadedDictEnd + maxDist` is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset below the value, which would incorrectly allow references beyond the window. This bug is present in `master`, but is very hard to trigger, since it requires both dictionaries and data which triggers overflow correction.	2018-03-16 14:54:06 -07:00
Nick Terrell	1908c92c46	Merge remote-tracking branch 'upstream/dev' into extern-seq * upstream/dev: Fix overflow protection with wlog=31	2018-03-14 17:26:31 -07:00
Nick Terrell	a9a6dcba63	Expose reference external sequence API * Expose the reference external sequences API for zstdmt. Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered.	2018-03-14 12:29:31 -07:00
Nick Terrell	33fb966e56	Fix overflow protection with wlog=31 The overflow protection is broken when the window log is `> (3U << 29)`, so 31. It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`, and the assertion `current > newCurrent` fails. This happens when the same context is used many times over, but with a large window log, like in zstdmt. Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`. The added test fails before the patch, and passes after.	2018-03-14 11:45:44 -07:00
Yann Collet	a146ee04ae	added negative compression levels negative compression levels trade compression ratio for more compression speed. They turn off huffman compression of literals, and use row 0 as baseline with a stepSize = -cLevel. added associated test in fuzzer also added : new advanced parameter ZSTD_p_literalCompression	2018-03-11 05:21:53 -07:00
Nick Terrell	0a0e64c641	LDM manages its own window round buffer	2018-02-27 12:13:23 -08:00
Nick Terrell	7e5e226cbf	Split the window state into substructure	2018-02-26 13:29:57 -08:00
Nick Terrell	af866b3a58	Split block compressor out of long range matcher * `ZSTD_ldm_generateSequences()` generates the LDM sequences and stores them in a table. It should work with any chunk size, but is currently only called one block at a time. * `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and instead of encoding the literals directly, it passes them to a secondary block compressor. The code to handle chunk sizes greater than the block size is currently commented out, since it is unused. The next PR will uncomment and exercise this code. * During optimal parsing, ensure LDM `minMatchLength` is at least `targetLength`. Also don't emit repcode matches in the LDM block compressor. Enabling the LDM with the optimal parser now actually improves the compression ratio. * The compression ratio is very similar to before. It is very slightly different, because the repcode handling is slightly different. If I remove immediate repcode checking in both branches the compressed size is exactly the same. * The speed looks to be the same or better than before. Up Next (in a separate PR) -------------------------- Allow sequence generation to happen prior to compression, and produce more than a block worth of sequences. Expose some API for zstdmt to consume. This will test out some currently untested code in `ZSTD_ldm_blockCompress()`.	2018-02-22 15:18:41 -08:00
Nick Terrell	6e128d3534	[BMI2] Add comments to the bmi2 variable in the contexts	2018-02-20 14:12:11 -08:00
Nick Terrell	b58f01537e	[compress] Support BMI2	2018-02-14 19:20:32 -08:00
Yann Collet	9945e60ac4	Merge branch 'dev' into flexibleLevel	2018-02-10 11:54:49 -08:00
Yann Collet	de68c2ff10	Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex() as it's faster, due to one memory scan instead of two (confirmed by microbenchmark). Note : as ZSTD_reduceIndex() is rarely invoked, it does not translate into a visible gain. Consider it an exercise in auto-vectorization and micro-benchmarking.	2018-02-07 14:22:35 -08:00
Yann Collet	5188749e1c	ensure compression parameters are updated when only compression level is changed	2018-02-02 16:31:20 -08:00

72 Commits