git/t/helper
Garima Singh f1294eaf7f bloom.c: introduce core Bloom filter constructs
Introduce the constructs for Bloom filters, Bloom filter keys
and Bloom filter settings.
For details on what Bloom filters are and how they work, refer
to Dr. Derrick Stolee's blog post [1]. It provides a concise
explanation of the adoption of Bloom filters as described in
[2] and [3].

Implementation specifics:
1. We currently use 7 hashes per key and 10 bits per entry. These
   served as good starting values; the mathematical details behind
   this choice are described in [1] and [4]. The implementation does
   not yet expose these settings, but it is flexible enough to allow
   for tweaking them in the future.

   Note: The performance gains we have observed with these values
   are significant enough that we did not need to tweak these
   settings. The performance numbers are included in the cover letter
   of this series and in the commit message of the subsequent commit
   where we use Bloom filters to speed up `git log -- path`.

2. As described in [1] and [3], we do not need 7 independent hashing
   functions. We use the Murmur3 hashing scheme, seed it twice, and
   then combine the two results to derive an arbitrary number of
   hash values.

3. The filters will be sized according to the number of changes in
   each commit, in multiples of 8-bit words.
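
The three points above can be sketched in C as follows. This is a
minimal illustration written from the public description of 32-bit
Murmur3 and of double hashing, not a copy of bloom.c. Every
identifier (the sketch_* functions and SKETCH_* macros), the two seed
values, the h0 + i * h1 combination, and the round-up rule in
sketch_filter_len() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_HASHES 7      /* hashes per key (item 1) */
#define SKETCH_BITS_PER_ENTRY 10 /* bits per changed path (item 1) */

static uint32_t rotl32(uint32_t x, int r)
{
	return (x << r) | (x >> (32 - r));
}

/* 32-bit Murmur3 over little-endian 4-byte blocks (illustrative). */
static uint32_t sketch_murmur3(uint32_t seed, const unsigned char *data,
			       size_t len)
{
	const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
	const unsigned char *tail = data + (len & ~(size_t)3);
	uint32_t h = seed, k;
	size_t i, rem = len & 3;

	for (i = 0; i + 4 <= len; i += 4) {
		k = (uint32_t)data[i] | (uint32_t)data[i + 1] << 8 |
		    (uint32_t)data[i + 2] << 16 | (uint32_t)data[i + 3] << 24;
		k = rotl32(k * c1, 15) * c2;
		h = rotl32(h ^ k, 13) * 5 + 0xe6546b64;
	}

	/* Mix in the 1 to 3 trailing bytes, if any. */
	k = 0;
	if (rem >= 3)
		k ^= (uint32_t)tail[2] << 16;
	if (rem >= 2)
		k ^= (uint32_t)tail[1] << 8;
	if (rem >= 1) {
		k ^= tail[0];
		h ^= rotl32(k * c1, 15) * c2;
	}

	/* Final avalanche. */
	h ^= (uint32_t)len;
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/* Item 2: seed the hash twice, then derive all 7 values from the pair. */
static void sketch_fill_key(const char *path,
			    uint32_t hashes[SKETCH_NUM_HASHES])
{
	const unsigned char *p = (const unsigned char *)path;
	size_t len = strlen(path);
	uint32_t h0 = sketch_murmur3(0x12345678, p, len); /* arbitrary seed */
	uint32_t h1 = sketch_murmur3(0x9abcdef0, p, len); /* arbitrary seed */
	int i;

	for (i = 0; i < SKETCH_NUM_HASHES; i++)
		hashes[i] = h0 + (uint32_t)i * h1;
}

/* Item 3: filter length in 8-bit words, rounded up. */
static size_t sketch_filter_len(size_t nr_changes)
{
	return (nr_changes * SKETCH_BITS_PER_ENTRY + 7) / 8;
}

/* Set the bit selected by each of the key's hash values. */
static void sketch_add_key(unsigned char *filter, size_t len,
			   const uint32_t *hashes)
{
	int i;

	assert(len > 0);
	for (i = 0; i < SKETCH_NUM_HASHES; i++) {
		size_t bit = hashes[i] % (len * 8);
		filter[bit / 8] |= (unsigned char)(1u << (bit % 8));
	}
}

/* Return 0 if the key is definitely absent, 1 if it may be present. */
static int sketch_contains(const unsigned char *filter, size_t len,
			   const uint32_t *hashes)
{
	int i;

	assert(len > 0);
	for (i = 0; i < SKETCH_NUM_HASHES; i++) {
		size_t bit = hashes[i] % (len * 8);
		if (!(filter[bit / 8] & (1u << (bit % 8))))
			return 0;
	}
	return 1;
}
```

Under these assumptions, a commit that changes 8 paths gets a filter
of sketch_filter_len(8) = 10 words. A caller would fill a key for
each changed path with sketch_fill_key() and add it with
sketch_add_key(). A later sketch_contains() that returns 0 means the
path was definitely not changed, while 1 means it may have been.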

[1] Derrick Stolee
    "Supercharging the Git Commit Graph IV: Bloom Filters"
    https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/

[2] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
    "An Improved Construction for Counting Bloom Filters"
    http://theory.stanford.edu/~rinap/papers/esa2006b.pdf
    https://doi.org/10.1007/11841036_61

[3] Peter C. Dillinger and Panagiotis Manolios
    "Bloom Filters in Probabilistic Verification"
    http://www.ccs.neu.edu/home/pete/pub/Bloom-filters-verification.pdf
    https://doi.org/10.1007/978-3-540-30494-4_26

[4] Thomas Mueller Graf, Daniel Lemire
    "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters"
    https://arxiv.org/abs/1912.08258

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
.gitignore t/helper: ignore only executable files 2019-09-20 11:13:13 -07:00
test-advise.c advice: revamp advise API 2020-03-05 06:15:02 -08:00
test-bloom.c bloom.c: introduce core Bloom filter constructs 2020-03-30 09:59:53 -07:00
test-chmtime.c
test-config.c config: provide access to the current line number 2020-02-10 10:52:10 -08:00
test-ctype.c
test-date.c test_date.c: remove reference to GIT_TEST_DATE_NOW 2019-09-18 14:15:01 -07:00
test-delta.c
test-dir-iterator.c Merge branch 'mt/dir-iterator-updates' 2019-08-09 10:13:14 -07:00
test-drop-caches.c Sync with 2.17.3 2019-12-06 16:29:15 +01:00
test-dump-cache-tree.c
test-dump-fsmonitor.c fsmonitor: change last update timestamp on the index_state to opaque token 2020-01-13 14:58:43 -08:00
test-dump-split-index.c t/helper/test-dump-split-index: initialize git repository 2020-02-24 09:33:24 -08:00
test-dump-untracked-cache.c
test-example-decorate.c object: convert lookup_unknown_object() to use object_id 2019-06-20 10:06:19 -07:00
test-fake-ssh.c
test-genrandom.c
test-genzeros.c tests: teach the test-tool to generate NUL bytes and use it 2019-02-19 10:22:21 -08:00
test-hash-speed.c
test-hash.c
test-hashmap.c hashmap_entry: remove first member requirement from docs 2019-10-07 10:20:12 +09:00
test-index-version.c
test-json-writer.c
test-lazy-init-name-hash.c OFFSETOF_VAR macro to simplify hashmap iterators 2019-10-07 10:20:11 +09:00
test-line-buffer.c
test-match-trees.c match-trees.c: remove the_repo from shift_tree*() 2019-06-27 12:45:17 -07:00
test-mergesort.c
test-mktemp.c
test-oidmap.c test-oidmap: remove 'add' subcommand 2019-07-01 10:26:28 -07:00
test-online-cpus.c
test-parse-options.c parse-options: add testcases for OPT_CMDMODE() 2020-02-20 13:20:40 -08:00
test-parse-pathspec-file.c t: directly test parse_pathspec_file() 2020-01-15 12:14:20 -08:00
test-path-utils.c real_path: remove unsafe API 2020-03-10 11:41:40 -07:00
test-pkt-line.c
test-prio-queue.c test-prio-queue: use xmalloc 2019-04-12 13:34:17 +09:00
test-progress.c test-progress: fix test failures on big-endian systems 2019-10-21 09:53:49 +09:00
test-reach.c
test-read-cache.c test-read-cache: drop namelen variable 2019-09-06 11:03:39 -07:00
test-read-graph.c commit-graph.h: use odb in 'load_commit_graph_one_fd_st' 2020-02-04 11:36:51 -08:00
test-read-midx.c
test-ref-store.c Merge branch 'cc/test-ref-store-typofix' 2019-02-05 14:26:13 -08:00
test-regex.c
test-repository.c t/helper: make repository tests hash independent 2020-02-24 09:33:27 -08:00
test-revision-walking.c
test-run-command.c Merge branch 'js/mingw-inherit-only-std-handles' 2019-12-10 13:11:42 -08:00
test-scrap-cache-tree.c
test-serve-v2.c Turn git serve into a test helper 2019-04-19 14:03:24 +09:00
test-sha1-array.c
test-sha1.c
test-sha1.sh
test-sha256.c
test-sigchain.c
test-strcmp-offset.c
test-string-list.c
test-submodule-config.c
test-submodule-nested-repo-config.c
test-subprocess.c
test-svn-fe.c
test-tool.c bloom.c: add the murmur3 hash implementation 2020-03-30 09:59:53 -07:00
test-tool.h bloom.c: add the murmur3 hash implementation 2020-03-30 09:59:53 -07:00
test-trace2.c trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh 2019-02-22 15:28:22 -08:00
test-urlmatch-normalization.c
test-wildmatch.c
test-windows-named-pipe.c use strpbrk(3) to search for characters from a given set 2020-02-24 09:30:31 -08:00
test-write-cache.c
test-xml-encode.c