`make all` triggers several sub-directory Makefiles,
and several of them need `liblz4.a`.
When building with `-j#`, this results in several concurrent builds of `liblz4.a`.
This patch makes `liblz4.a` a dependency that is built once,
before recursing into the sub-directory Makefiles.
According to GNU Makefile conventions,
the Makefile should feature a `make check` target
to self-test the generated program:
https://www.gnu.org/prep/standards/html_node/Standard-Targets.html .
This check is much less thorough and less taxing than `make test`,
and can be run on any target platform in a reasonable timeframe (several seconds).
Having a dedicated file for the optimal parser
made sense during its creation:
it allowed Przemyslaw to work more freely on lz4opt, with less dependency on lz4hc.
Moreover, the optimal parser was more complex, with its own search functions.
Since the optimal parser was rewritten last year, it is now a lot lighter,
and it makes more sense to integrate it directly inside lz4hc.c.
This makes it easier to edit (editors tend to get "lost" inside a `*.h` whose meaning depends on its #include position),
and it also reduces the number of files in the project,
which fits well with lz4 objectives
(adding lz4hc requires "just" lz4hc.h and lz4hc.c).
The LZ4 block format specification
states that the last match must start
at a minimum distance of 12 bytes from the end of the block.
However, out of an abundance of caution,
the reference implementation would actually stop searching for matches
at 13 bytes from the end of the block.
This patch fixes that small detail.
The new version is now able to properly compress a limit case
such as `aaaaaaaabaaa\n`,
as reported by Gao Xiang (@hsiangkao).
Obviously, this doesn't change much:
it's just one additional match candidate per block, with a maximum match length of 7 (since the last 5 bytes must remain literals).
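To make the boundary explicit, here is a minimal sketch of the arithmetic, using constant names (MFLIMIT, LASTLITERALS) in the style of the reference implementation; the helper itself is purely illustrative, not the actual lz4.c code:

```c
/* Illustrative sketch only, not the actual lz4.c code. */
#define MFLIMIT       12   /* spec: the last match must start >= 12 bytes from the end */
#define LASTLITERALS   5   /* spec: the last 5 bytes are always literals */

/* A match may start at any position up to (iend - MFLIMIT), inclusive,
 * and can therefore cover at most MFLIMIT - LASTLITERALS = 7 bytes. */
int canStartMatch(const char* ip, const char* const iend)
{
    return ip <= iend - MFLIMIT;      /* correct bound: 12 bytes from the end */
    /* the over-cautious version effectively behaved like
     *     return ip <  iend - MFLIMIT;
     * i.e. it stopped 13 bytes from the end, missing one candidate per block */
}
```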
With the default policy, blocks are 4 MB long, so this doesn't happen too often.
Compressing silesia.tar at default level 1 saves 5 bytes (100930101 -> 100930096).
At max level 12, it saves a grand total of 16 bytes (77389871 -> 77389855).
The impact is a bit more visible when blocks are smaller, hence more numerous.
For example, compressing silesia with blocks of 64 KB (using -12 -B4D) saves 543 bytes (77304583 -> 77304040).
So the smaller the packet size, the more visible the impact.
And it so happens that we have a ton of scenarios with small blocks using LZ4 compression ...
And a useless "hooray" side note:
the patch improves the LZ4 compression record on silesia (using -12 -B7D --no-frame-crc) by 16 bytes (77270672 -> 77270656),
and the record on enwik9 by 44 bytes (371680396 -> 371680352), previously claimed by [smallz4](http://create.stephan-brumme.com/smallz4/).
The previous method produced too many time() invocations,
which became a significant fraction of the measured workload.
The new strategy is to call time() only once per batch,
and to dynamically resize the batch so that each round lasts approximately 1 second.
This only matters for small inputs.
Measurements for large files (such as silesia.tar) are much less impacted
(though decoding speed is so fast that even medium-size files will notice an improvement).
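As a rough sketch of the batching idea (the workload function is hypothetical, and clock() merely stands in for whichever timer the benchmark actually uses):

```c
#include <stddef.h>
#include <time.h>

extern void run_workload_once(void);   /* hypothetical: one decompression pass */

/* Grow the batch until one timed round lasts about a second,
 * so the timer is read once per batch instead of once per run. */
size_t calibrate_batch_size(void)
{
    size_t nbLoops = 1;
    for (;;) {
        clock_t const start = clock();
        for (size_t i = 0; i < nbLoops; i++) run_workload_once();
        {   clock_t const elapsed = clock() - start;
            if (elapsed >= CLOCKS_PER_SEC) return nbLoops;  /* ~1 second reached */
            /* round too short: scale the batch up (double it if unmeasurable) */
            nbLoops = (elapsed > 0)
                    ? (size_t)(((double)nbLoops * CLOCKS_PER_SEC) / elapsed) + 1
                    : nbLoops * 2;
        }
    }
}
```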