The bug is a read up to 2 bytes past the end of the buffer.
There are three cases for this bug, one for each test case added.
* An empty input causes `token = *ip++` to read one byte too far.
* A one byte input with `(token >> ML_BITS) == RUN_MASK` causes
one extra byte to be read without validation. This could be
combined with the first bug to cause 2 extra bytes to be read.
* The case pointed out in issue #508, where `ip == iend` at the
beginning of the loop after taking the shortcut.
Benchmarks show no regressions on clang or gcc-7 on both my mac
and devserver.
Fixes#508.
The notes about "security guarantee" and "malicious inputs" seemed
a bit non-technical to me, so I took the liberty to tone them down
and instead describe the actual risks in technical terms. Namely,
the function never writes past the end of the output buffer, so
a direct hostile takeover (resulting in arbitrary code execution
soon after the return from the function) is not possible. However,
the application can crash because of reads from unmapped pages.
I also took the liberty to describe what I believe is the only sensible
usage scenario for the function: "This function is only usable if the
originalSize of uncompressed data is known in advance," etc.
The simple change from
`matchIndex+MAX_DISTANCE < current`
towards
`current - matchIndex > MAX_DISTANCE`
is enough to generate a 10% performance drop under clang.
Quite massive.
(I missed as my eyes were concentrated on gcc performance at that time).
The second version is more robust, because it also survives a situation where
`matchIndex > current`
due to overflows.
The first version requires matchIndex to not overflow.
Hence were added `assert()` conditions.
The only case where this can happen is with dictCtx compression,
in the case where the dictionary context is not initialized before loading the dictionary.
So it's enough to always initialize the context while loading the dictionary.