clarified dictionary in format description

Yann Collet 2016-09-02 17:04:49 -07:00
parent 2b26ad1947
commit 855766d73d
4 changed files with 19 additions and 18 deletions

NEWS

@ -2,6 +2,7 @@ v1.0.1
New : contrib/pzstd, parallel version of zstd, by Nick Terrell
Fixed : CLI -d output to stdout by default when input is stdin (#322)
Fixed : CLI correctly detects console on Mac OS-X
Fixed : compatibility with OpenBSD, reported by Juan Francisco Cantero Hurtado (#319)
Fixed : zstd-pgo, reported by octoploid (#329)
v1.0.0


@ -8,7 +8,6 @@
*/
/*-*******************************************************
* Compiler specifics
*********************************************************/


@ -463,12 +463,6 @@ static U32 ZDICT_dictSize(const dictItem* dictList)
}
-#define DISPLAYUPDATE(l, ...) if (g_displayLevel>=l) { \
- if (ZDICT_clockSpan(displayClock) > refreshRate) \
- { displayClock = clock(); DISPLAY(__VA_ARGS__); \
- if (g_displayLevel>=4) fflush(stdout); } }
-static const clock_t refreshRate = CLOCKS_PER_SEC * 3 / 10;
static size_t ZDICT_trainBuffer(dictItem* dictList, U32 dictListSize,
const void* const buffer, size_t bufferSize, /* buffer must end with noisy guard band */
const size_t* fileSizes, unsigned nbFiles,
@ -481,6 +475,12 @@ static size_t ZDICT_trainBuffer(dictItem* dictList, U32 dictListSize,
U32* filePos = (U32*)malloc(nbFiles * sizeof(*filePos));
size_t result = 0;
clock_t displayClock = 0;
+clock_t const refreshRate = CLOCKS_PER_SEC * 3 / 10;
+# define DISPLAYUPDATE(l, ...) if (g_displayLevel>=l) { \
+ if (ZDICT_clockSpan(displayClock) > refreshRate) \
+ { displayClock = clock(); DISPLAY(__VA_ARGS__); \
+ if (g_displayLevel>=4) fflush(stdout); } }
/* init */
DISPLAYLEVEL(2, "\r%70s\r", ""); /* clean display line */


@ -551,7 +551,7 @@ Let's presume the following Huffman tree must be described :
The tree depth is 4, since its smallest element uses 4 bits.
Value `5` will not be listed, nor will values above `5`.
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits`.
-Weight formula is :
+Weight formula is :
```
Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
```
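
As an illustration of the formula above, here is a minimal C sketch of the conversion in both directions. The function names `weight_from_bits` and `bits_from_weight` are hypothetical and are not taken from the zstd sources; `max_bits` stands for `Max_Number_of_Bits`, i.e. the tree depth.

```c
/* Hypothetical helper (not part of zstd): Number_of_Bits -> Weight.
 * A symbol that is absent from the tree (0 bits) gets Weight 0. */
unsigned weight_from_bits(unsigned number_of_bits, unsigned max_bits)
{
    return number_of_bits ? (max_bits + 1 - number_of_bits) : 0;
}

/* Hypothetical inverse: Weight -> Number_of_Bits.
 * The formula is symmetric, so the same expression applies. */
unsigned bits_from_weight(unsigned weight, unsigned max_bits)
{
    return weight ? (max_bits + 1 - weight) : 0;
}
```

With a tree depth of 4, a symbol coded on 4 bits gets `Weight` 1, a symbol coded on 1 bit gets `Weight` 4, and an unused symbol keeps `Weight` 0.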
@ -779,7 +779,7 @@ which specifies `Baseline` and `Number_of_Bits` to add.
_Codes_ are FSE compressed,
and interleaved with raw additional bits in the same bitstream.
-##### Literals length codes
+##### Literals length codes
Literals length codes are values ranging from `0` to `35` included.
They define lengths from 0 to 131071 bytes.
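
The mapping from a code to a length can be sketched in C as follows. This is an illustration only: the `Baseline` and `Number_of_Bits` values are not part of this excerpt and are assumed here from the Zstandard format description, and the names `LL_baseline`, `LL_extra_bits` and `literals_length_decode` are hypothetical.

```c
#include <stdint.h>

#define LL_CODE_MAX 35

/* Assumed from the Zstandard format description, not from this excerpt:
 * Baseline for each literals length code. Codes 0..15 map to themselves. */
static const uint32_t LL_baseline[LL_CODE_MAX + 1] = {
     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
    16, 18, 20, 22, 24, 28, 32, 40, 48, 64,
    0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
    0x2000, 0x4000, 0x8000, 0x10000
};

/* Assumed likewise: number of raw additional bits read after each code. */
static const uint8_t LL_extra_bits[LL_CODE_MAX + 1] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16
};

/* Hypothetical helper: length = Baseline + additional bits.
 * Returns 0 on success, -1 if the code is out of range or if `extra`
 * does not fit in the number of bits allowed for that code. */
int literals_length_decode(unsigned code, uint32_t extra, uint32_t *length)
{
    if (code > LL_CODE_MAX) return -1;
    if (extra >> LL_extra_bits[code]) return -1;
    *length = LL_baseline[code] + extra;
    return 0;
}
```

Under these assumed tables, code `35` with all 16 additional bits set gives 65536 + 65535 = 131071, which matches the upper bound stated above.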
@ -1126,10 +1126,10 @@ When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1`.
Dictionary format
-----------------
-`zstd` is compatible with "pure content" dictionaries, free of any format restriction.
+`zstd` is compatible with "raw content" dictionaries, free of any format restriction.
But dictionaries created by `zstd --train` follow a format, described here.
-__Pre-requisites__ : a dictionary has a known length,
+__Pre-requisites__ : a dictionary has a size,
defined either by a buffer limit, or a file size.
| `Magic_Number` | `Dictionary_ID` | `Entropy_Tables` | `Content` |
@ -1151,20 +1151,21 @@ _Reserved ranges :_
- high range : >= (2^31)
__`Entropy_Tables`__ : following the same format as a [compressed blocks].
-They are stored in following order :
-Huffman tables for literals, FSE table for offsets,
-FSE table for match lengths, and FSE table for literals lengths.
-It's finally followed by 3 offset values, populating recent offsets,
-stored in order, 4-bytes little-endian each, for a total of 12 bytes.
+They are stored in following order :
+Huffman tables for literals, FSE table for offsets,
+FSE table for match lengths, and FSE table for literals lengths.
+It's finally followed by 3 offset values, populating recent offsets,
+stored in order, 4-bytes little-endian each, for a total of 12 bytes.
-__`Content`__ : Where the actual dictionary content is.
-Content size depends on Dictionary size.
+__`Content`__ : The rest of the dictionary is its content.
+The content act as a "past" in front of data to compress or decompress.
[compressed blocks]: #the-format-of-compressed_block
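
To make the layout above more concrete, here is a minimal C sketch that tells a formatted dictionary from a "raw content" one and reads the 3 recent offsets. It is an illustration under stated assumptions, not the zstd implementation: the magic value `0xEC30A437` is assumed from the Zstandard format description (it does not appear in this excerpt), both header fields are assumed to be 4 bytes little-endian, and the names `read_le32`, `dict_is_formatted` and `read_recent_offsets` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: dictionary Magic_Number, stored as 4 bytes little-endian. */
#define DICT_MAGIC 0xEC30A437u

/* Reads a 4-byte little-endian value, independently of host endianness. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 and fills *dict_id when the buffer starts
 * with Magic_Number followed by Dictionary_ID; returns 0 otherwise, in which
 * case the whole buffer would be handled as raw content. */
int dict_is_formatted(const uint8_t *dict, size_t size, uint32_t *dict_id)
{
    if (size < 8 || read_le32(dict) != DICT_MAGIC) return 0;
    *dict_id = read_le32(dict + 4);
    return 1;
}

/* Hypothetical helper: reads the 3 offset values that close Entropy_Tables,
 * stored in order, 4 bytes little-endian each (12 bytes in total).
 * `p` must point at the first of those 12 bytes. */
void read_recent_offsets(const uint8_t *p, uint32_t offsets[3])
{
    for (int i = 0; i < 3; i++)
        offsets[i] = read_le32(p + 4 * i);
}
```

Locating the 12 offset bytes requires decoding the Huffman and FSE tables that precede them, which this sketch leaves out. Everything after those 12 bytes is the `Content`.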
Version changes
---------------
- 0.2.1 : clarify field names, by Przemyslaw Skibinski
- 0.2.0 : numerous format adjustments for zstd v0.8
- 0.1.2 : limit Huffman tree depth to 11 bits
- 0.1.1 : reserved dictID ranges