Description of the NTFS (de)compression algorithm (based on a modified LZ77
algorithm)

Copyright (c) 2001 Anton Altaparmakov

This document is published under the GNU General Public License.

Credits: This is based on notes taken from various places (most notably from
Regis Duchesne's NTFS documentation and from various LZ77 descriptions) and
further refined by looking at a few compressed streams to figure out some
uncertainties.

Note: You should also read the runlist description with regard to compression
in linux-ntfs/include/layout.h. Just search for "Attribute compression".
FIXME: Should merge the info from there into this document some time.

Compressed data is organized in logical "compression" blocks (cb). Each cb has
a size (cb_size) of 2^compression_unit clusters. In all versions of Windows
NTFS (NT/2k/XP, NTFS 1.2-3.1), the only valid compression_unit is 4, IOW, each
cb is 2^4 = 16 clusters in size.
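
To make the arithmetic concrete, here is a minimal sketch (the helper name is
made up for illustration, not taken from the driver) of how the cb size in
bytes follows from the cluster size and the compression unit:

	#include <stdint.h>

	/*
	 * Hypothetical helper: the cb size is 2^compression_unit
	 * clusters. With compression_unit = 4 and 4096 byte clusters
	 * this yields 16 * 4096 bytes = 64kiB.
	 */
	static uint32_t cb_size_bytes(uint32_t cluster_size,
			unsigned int compression_unit)
	{
		return cluster_size << compression_unit;
	}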

We detect and warn about a compression_unit != 4 but we try to decompress the
data anyway.

Compression is only supported for cluster sizes between 512 bytes and 4096
bytes. Thus a cb can be between 8kiB and 64kiB in size.

Each cb is independent of the other cbs and is thus the minimal unit we have
to parse even if we wanted to decompress only one byte.

Also, a cb can be totally uncompressed or it can be totally sparse, and either
case would be indicated in the runlist.

Thus, we need to look at the runlist of the compressed data stream, starting
at the beginning of the first cb overlapping @page. So we convert the page
offset into units of clusters (vcn) and round the vcn down to a multiple of
cb_size clusters.
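
As a sketch of that calculation (names and types are illustrative, not the
driver's), assuming the usual 16 cluster cbs and power-of-two cluster sizes:

	#include <stdint.h>

	typedef int64_t VCN;	/* virtual cluster number */

	/*
	 * Illustrative only: the vcn of the first cb overlapping byte
	 * offset @ofs, for clusters of 2^cluster_size_bits bytes and
	 * cbs of 16 clusters. As 16 is a power of two, rounding the
	 * vcn down to a multiple of cb_size is a simple mask.
	 */
	static VCN first_cb_vcn(int64_t ofs, unsigned int cluster_size_bits)
	{
		VCN vcn = ofs >> cluster_size_bits;

		return vcn & ~(VCN)15;
	}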

We then scan the runlist for the appropriate position. Based on what we find
there, we decide how to proceed.

If the cb is not compressed at all, and covers the whole of @page, we pretend
to be accessing an uncompressed file, so we fall back to what we do in
aops.c::ntfs_file_readpage(), i.e. we do:

	return block_read_full_page(page, ntfs_file_get_block);

If the cb is completely sparse, and covers the whole of @page, we can just
zero out @page and complete the io (set @page up-to-date, unlock it, and
finally return 0).

In all other cases we initiate the decompression engine, but first some more
on the compression algorithm.

Before compression, the data of each cb is further divided into 4kiB blocks,
which we call "sub compression" blocks (sb), each including a header
specifying its compressed length. So we could just scan the cb for the first
sb overlapping @page and skip the sbs before that, or we could decompress the
whole cb, injecting the superfluous decompressed pages into the page cache as
a form of read-ahead (this is what zisofs does, for example).

In either case, we then need to read and decompress all sbs overlapping @page,
potentially having to decompress one or more other cbs, too.

As soon as @page is completed we could either stop or continue until we finish
the current cb, injecting pages as we go along (again following the zisofs
example).

Because the sbs follow each other directly, we need to read in the whole cb
anyway in order to scan through it and find the first sb overlapping @page, so
it does make sense to follow the zisofs approach of decompressing the whole cb
and injecting pages as we go along. All discussion from now on will assume
that we are going to do that. It might, however, make sense not to decompress
any sbs located before @page, because this would be a kind of "read-behind",
which is probably silly unless someone is reading the file backwards.
Performing read-ahead by decompressing all sbs following @page, OTOH, is very
likely to be a good idea.

So, we read the whole cb from disk and start at the first sb.

As mentioned above, each sb starts with a header. The header is 16 bits, of
which the lower twelve bits (i.e. bits 0 to 11) are the length (L) - 3 of the
sb, where L includes the two bytes of the header itself (equivalently, the
stored value is the data length minus 1 when not counting the two header
bytes). The higher four bits are set to 1011 (0xb) by the compressor for a
compressed block, or to 0000 for an uncompressed block, but the decompressor
only checks the most significant bit, taking a 1 to signify a compressed block
and a 0 an uncompressed block.
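
As a sketch, decoding the header could look like this (hypothetical helper;
the header is assumed to have already been converted from little endian, see
the note at the end of this document):

	#include <stdint.h>

	/*
	 * Illustrative only: return the total size L in bytes of the
	 * sb described by @header, including the two header bytes,
	 * and report whether the sb is compressed.
	 */
	static unsigned int sb_length(uint16_t header, int *compressed)
	{
		*compressed = (header & 0x8000) != 0;	/* check MSB only */

		return (header & 0x0fff) + 3;	/* low 12 bits hold L - 3 */
	}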

So from the header we know how many compressed bytes we need to decompress to
obtain the next 4kiB of uncompressed data, and if we didn't want to decompress
this sb, we could just seek to the next one using the length read from the
header. We could then continue seeking until we reach the first sb overlapping
@page.
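
A sketch of such a skip loop (again with made-up names; @cb points into an
in-memory copy of the cb and @target is the wanted uncompressed byte offset
within the cb):

	#include <stdint.h>

	/*
	 * Illustrative only: skip whole sbs until the one containing
	 * uncompressed offset @target. Each sb decompresses to 4096
	 * bytes (the last sb of a cb may cover less; not handled here).
	 */
	static const uint8_t *seek_to_sb(const uint8_t *cb,
			unsigned int target)
	{
		unsigned int pos = 0;	/* uncompressed offset of current sb */

		while (target >= pos + 0x1000) {
			uint16_t header = cb[0] | (cb[1] << 8);

			cb += (header & 0x0fff) + 3;	/* advance by L bytes */
			pos += 0x1000;
		}
		return cb;
	}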

In either case, we will reach a sb which we want to decompress.

Having dealt with the 16-bit header of the sb, we now have length bytes of
compressed data to decompress. This compressed stream is further split into
tokens, which are organized into groups of eight tokens. Each token group (tg)
starts with a tag byte, an eight-bit bitmap whose bits specify the type of
each of the following eight tokens. The least significant bit (LSB)
corresponds to the first token and the most significant bit (MSB) corresponds
to the last token.

The two types of tokens are symbol tokens, specified by a zero bit, and phrase
tokens, specified by a set bit.

A symbol token (st) is a single byte and is to be taken literally and copied
into the sliding window (the decompressed data).
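
At this point the overall shape of the token loop can be sketched (names are
made up; decode_pt() is a hypothetical stand-in for phrase token handling,
which is described next):

	#include <stdint.h>

	void decode_pt(const uint8_t **src, uint8_t **dst);	/* hypothetical */

	/*
	 * Illustrative only: consume one tag byte and handle the eight
	 * tokens it describes, least significant bit first. A real
	 * decoder must also stop mid-group once the sb's input or
	 * output is exhausted.
	 */
	static void decode_token_group(const uint8_t **src, uint8_t **dst)
	{
		uint8_t tag = *(*src)++;
		int i;

		for (i = 0; i < 8; i++) {
			if (tag & (1 << i))
				decode_pt(src, dst);	/* set bit: phrase token */
			else
				*(*dst)++ = *(*src)++;	/* zero bit: literal st */
		}
	}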

A phrase token (pt) is a pointer back into the sliding window (in bytes),
together with a length (again in bytes), starting at the byte the back pointer
points to. Thus a phrase token defines a sequence of bytes in the sliding
window which needs to be copied to the current position in the sliding window
(the decompressed data stream).

Each pt consists of 2 bytes split into the back pointer (p) and the length (l),
each of variable bit width (but the sum of the widths of p and l is fixed at
16 bits). p is at least 4 bits and l is at most 12 bits.

The most significant bits contain the back pointer (p), while the least
significant bits contain the length (l).

l is actually stored as the number of bytes minus 3 (unsigned), as anything
shorter than that would be at least as long as the 2 bytes needed for the
actual pt, so no compression would be achieved.

p is stored as the positive number of bytes minus 1 (unsigned), as going zero
bytes back is meaningless.
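
In code, extracting both fields then reduces to a shift and a mask plus the
two offsets just described (illustrative sketch; l_mask and p_shift are
computed by the algorithm given further below):

	#include <stdint.h>

	/*
	 * Illustrative only: split a pt (already little endian
	 * converted) into the number of bytes to go back and the
	 * number of bytes to copy.
	 */
	static void decode_pt_fields(uint16_t pt, uint16_t l_mask,
			unsigned int p_shift, unsigned int *back,
			unsigned int *len)
	{
		*back = (pt >> p_shift) + 1;	/* p is stored as back - 1 */
		*len = (pt & l_mask) + 3;	/* l is stored as len - 3 */
	}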

Note that decompression has to occur byte by byte, as it is possible that some
of the bytes pointed to by the pt will only be generated in the sliding window
as the byte sequence pointed to by the pt is being copied into it!
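
So applying a pt has to be a byte-by-byte copy; memcpy() would give wrong
results for such overlapping ranges. A minimal sketch:

	#include <stdint.h>

	/*
	 * Illustrative only: copy @len bytes from @back bytes behind
	 * the current position @dst. With back = 1, every byte copied
	 * was itself produced by the previous iteration, which is
	 * exactly what makes run-length-like phrases possible.
	 */
	static uint8_t *copy_phrase(uint8_t *dst, unsigned int back,
			unsigned int len)
	{
		const uint8_t *src = dst - back;

		while (len--)
			*dst++ = *src++;
		return dst;
	}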

To give a concrete example: a block full of the letter A would be compressed
by storing the byte A once as a symbol token, followed by a single phrase
token with back pointer -1 (p = 0, therefore go back by -(0 + 1) bytes) and
length 4095 (l = 0xffc, therefore length 0xffc + 3 bytes).
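
Working this example through to actual bytes (under the header layout
described above, assuming the compressor uses the 0xb flag nibble): counting
from the LSB, token 1 is a symbol and token 2 a phrase, so the tag byte is
0x02. At cur_pos = 1 the widths are still p_shift = 12 and l_mask = 0xfff (see
the algorithm below), so the pt is (0 << 12) | 0xffc = 0x0ffc. That makes 4
bytes of sb data, hence L = 6 and a stored length of 6 - 3 = 3, and the whole
sb could look like:

	03 b0	sb header 0xb003: compressed, stored length 3
	02	tag byte: of the eight tokens, only the second is a pt
	41	symbol token: the letter A
	fc 0f	phrase token 0x0ffc: p = 0, l = 0xffc

Decompression stops after the pt: the sb's input, per the header, is exhausted
(and the output has reached 1 + 4095 = 4096 bytes), even though the tag byte
nominally described eight tokens.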

The widths of p and l are determined from the current position within the
decompressed data (cur_pos). We don't actually care about the widths as such,
however, but instead we want the mask (l_mask) with which to AND the pt to
obtain l, and the number of bits (p_shift) by which to right shift the pt to
obtain p. These are determined using the following algorithm:

	for (i = cur_pos, l_mask = 0xfff, p_shift = 12; i >= 0x10; i >>= 1) {
		/* Each doubling of cur_pos from 0x10 onwards takes one
		   bit away from l and gives it to p. */
		l_mask >>= 1;
		p_shift--;
	}
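
For instance, at cur_pos = 0x500 the loop above runs seven times, leaving
l_mask = 0x1f and p_shift = 5: p then occupies 11 bits and l occupies 5 bits,
so a pt at that position can encode lengths of 3 to 34 bytes and backward
distances of 1 to 2048 bytes.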

Note that, as usual in NTFS, both the sb header and each pt are stored in
little endian format.