Commit Graph

8 Commits

Author SHA1 Message Date
Yann Collet
7a225c0c46 internal benchmark: can select size of generated synthetic sample 2024-02-20 15:47:09 -08:00
Yann Collet
83598aa106 datagen generates lorem ipsum by default 2024-02-20 15:24:25 -08:00
Yann Collet
7003c9905e increase word dictionary
for higher variety of messages.
Now, level 5 compresses better than level 4 (by a hair).
2024-02-20 13:27:36 -08:00
Yann Collet
3dbd861b7d runtime weight distribution table
and made small words a bit more common.
2024-02-20 12:26:37 -08:00
Yann Collet
5a1bb4a4e0 add question marks
and (slightly) longer sentences.
2024-02-20 00:37:21 -08:00
Yann Collet
40874d4aea enriched vocabulary again
using real Latin sentences from Cicero.

Compression ratio lower again, closer to "real" text,
now level 6 is way better than level 4.
level 5 is still lower than level 4,
but at least it's now higher than level 3.
2024-02-20 00:30:29 -08:00
Yann Collet
1e046ce7fa increase vocabulary size
makes compression a bit less good,
hence a bit more comparable with real text (though still too easy to compress).
level 6 is now stronger than level 4, by a hair.
However, there is still a ratio dip at level 5.
2024-02-20 00:12:32 -08:00
Yann Collet
d0b7da30e2 add a lorem ipsum generator
this generator replaces the statistical generator
for the general case when no statistic is requested.

Generated data features a compression level speed / ratio curve
which is more in line with expectation.
2024-01-29 15:00:32 -08:00