Move the oid khash table definition to khash.h and define a typedef for
it, similar to the one we have for unsigned char pointers. Define
variants that are maps as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to inline oidset_init(), as it's typically only called
twice in the lifetime of an oidset (once at the beginning and at the end
by oidset_clear()) and kh_resize_* is quite big, so move its definition
to oidset.c. Document it while we're at it.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement oidset using khash.h in order to reduce its memory footprint
and make it faster.
Performance of a command that mainly checks for duplicate objects using
an oidset, with master and Clang 6.0.1:
$ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"
$ /usr/bin/time $cmd >/dev/null
0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
0inputs+0outputs (0major+11204minor)pagefaults 0swaps
$ hyperfine "$cmd"
Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
Time (mean ± σ): 250.0 ms ± 6.0 ms [User: 225.9 ms, System: 23.6 ms]
Range (min … max): 242.0 ms … 261.1 ms
And with this patch:
$ /usr/bin/time $cmd >/dev/null
0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
0inputs+0outputs (0major+8318minor)pagefaults 0swaps
$ hyperfine "$cmd"
Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
Time (mean ± σ): 151.9 ms ± 4.9 ms [User: 130.5 ms, System: 21.2 ms]
Range (min … max): 148.2 ms … 170.4 ms
Initial-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
c3a9ad3117 ("oidset: add iterator methods to oidset", 2017-11-21)
introduced a 'oidset_init()' function in oidset.h, which has void as
return type, but returns an expression.
This makes the solaris compiler fail with:
"oidset.h", line 30: void function cannot return value
As the return type is void, and even the return type of the expression
we're trying to return (oidmap_init) is void just remove the return
statement to fix the compiler error.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the usual iterator methods to oidset.
Add oidset_remove().
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to using the hashmap in hashmap.c, but with an
easier-to-use API. In particular, custom entry comparisons no longer
need to be written, and lookups can be done without constructing a
temporary entry structure.
This is implemented as a thin wrapper over the hashmap API. In
particular, this means that there is an additional 4-byte overhead due
to the fact that the first 4 bytes of the hash is redundantly stored.
For now, I'm taking the simpler approach, but if need be, we can
reimplement oidmap without affecting the callers significantly.
oidset has been updated to use oidmap.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to many of our uses of sha1-array, but it
overcomes one limitation of a sha1-array: when you are
de-duplicating a large input with relatively few unique
entries, sha1-array uses 20 bytes per non-unique entry.
Whereas this set will use memory linear in the number of
unique entries (albeit a few more than 20 bytes due to
hashmap overhead).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>