* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
as suggested in #1441.
generally U32 and unsigned are the same thing,
except when they are not ...
case : 32-bit compilation for MIPS (uint32_t == unsigned long)
A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).
Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
but the caller user a pointer to U32 instead.
These fixes are useful.
However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.
As a consequence, the amount of code changed is larger than I would expect for such a patch.
Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.
So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.
I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.
Thoughts ?
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.
Also : enable BIT_DEBUG in /tests
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.
* The cost of `set_basic` is computed using the cross-entropy cost
function `ZSTD_crossEntropyCost()`, using the normalized default count
and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
`ZSTD_entropyCost()`, together with the cost of writing the normalized
count `ZSTD_NCountCost()`.
for FSE symbols.
While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
clearer separation between variables and buffers
clearer buffers category
kept static buffers at the beginning, favoring cache locality
(it will be easier to add FSE tables there later)
This break a few assumptions that hashTable was always at the beginning.
This is fixed.
And remaining assumptions (namely that tables stand next to each other in memory)
are now tested with assert.