Commit Graph

208 Commits

Author SHA1 Message Date
Jeff King
12a58f9014 xdiff: drop unused mmfile parameters from xdl_do_patience_diff()
The entry point to the patience-diff algorithm takes two mmfile_t
structs with the original file contents, but it doesn't actually do
anything useful with them. This is similar to the case recently cleaned
up in the histogram code via f1d019071e (xdiff: drop unused mmfile
parameters from xdl_do_histogram_diff(), 2022-08-19), but there's a bit
more subtlety going on.

We pass them into the recursive patience_diff(), which in turn passes
them into fill_hashmap(), which stuffs the pointers into a struct. But
the only thing which reads the struct fields is our recursion into
patience_diff()!

So it's unlikely that something like -Wunused-parameter could find this
case: it would have to detect the circular dependency caused by the
recursion (not to mention tracing across struct field assignments).

But once found, it's easy to have the compiler confirm what's going on:

  1. Drop the "file1" and "file2" fields from the hashmap struct
     definition. Remove the assignments in fill_hashmap(), and
     temporarily substitute NULL in the recursive call to
     patience_diff(). Compiling shows that no other code touched those
     fields.

  2. Now fill_hashmap() will trigger -Wunused-parameter. Drop "file1"
     and "file2" from its definition and callsite.

  3. Now patience_diff() will trigger -Wunused-parameter. Drop them
     there, too. One of the callsites is the recursion with our
     NULL values, so those temporary values go away.

  4. Now xdl_do_patience_diff() will trigger -Wunused-parameter. Drop
     them there. And we're done.

Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-20 14:14:55 -07:00
Jeff King
f1d019071e xdiff: drop unused mmfile parameters from xdl_do_histogram_diff()
These are no longer used since 9df0fc3d57 (xdiff: fix a memory leak,
2022-02-16), as the caller is expected to call xdl_prepare_env() itself.
After that change the histogram code only examines the prepared
xdfenv_t, not the original buffers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:20:24 -07:00
Phillip Wood
f7b587bf65 xdiff: introduce XDL_ALLOC_GROW()
Add a helper to grow an array. This is analogous to ALLOC_GROW() in
the rest of the codebase but returns −1 on allocation failure to
accommodate other users of libxdiff such as libgit2. It will also
return a error if the multiplication overflows while calculating the
new allocation size. Note that this keeps doubling on reallocation
like the code it is replacing rather than increasing the existing size
by half like ALLOC_GROW(). It does however copy ALLOC_GROW()'s trick
of adding a small amount to the new allocation to avoid a lot of
reallocations at small sizes.

Note that xdl_alloc_grow_helper() uses long rather than size_t for
`nr` and `alloc` to match the existing code.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08 09:34:30 -07:00
Phillip Wood
848fd5ae5b xdiff: introduce XDL_CALLOC_ARRAY()
Add a helper for allocating an array and initialize the elements to
zero. This is analogous to CALLOC_ARRAY() in the rest of the codebase
but it returns NULL on allocation failures rather than dying to
accommodate other users of libxdiff such as libgit2.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08 09:34:30 -07:00
Phillip Wood
18aae7e21e xdiff: introduce xdl_calloc
In preparation for introducing XDL_CALLOC_ARRAY() use calloc() to
obtain zeroed out memory rather than malloc() followed by memset(). To
try and keep the lines a reasonable length this commit also stops
casting the pointer returned by calloc() as this is unnecessary.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08 09:34:30 -07:00
Phillip Wood
abf04bdaa8 xdiff: introduce XDL_ALLOC_ARRAY()
Add a helper to allocate an array that automatically calculates the
allocation size. This is analogous to ALLOC_ARRAY() in the rest of the
codebase but returns NULL if the allocation fails to accommodate other
users of libxdiff such as libgit2. The helper will also return NULL if
the multiplication in the allocation calculation overflows.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08 09:34:30 -07:00
Junio C Hamano
538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano
2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano
afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Ævar Arnfjörð Bjarmason
8c0cfadd9a xdiff/xmacros.h: remove unused XDL_PTRFREE
This macro was added in 3443546f6e (Use a *real* built-in diff
generator, 2006-03-24), but none of the xdiff code uses it, it uses
xdl_free() directly.

If we need its functionality again we'll use the FREE_AND_NULL() macro
added in 481df65f4f (git-compat-util: add a FREE_AND_NULL() wrapper
around free(ptr); ptr = NULL, 2017-06-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-01 10:16:06 -07:00
Phillip Wood
43ad3af380 xdiff: handle allocation failure when merging
Other users of xdiff such as libgit2 need to be able to handle
allocation failures. These allocation failures were previously
ignored.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 10:58:16 -08:00
Phillip Wood
4a37b80e88 xdiff: refactor a function
Use the standard "goto out" pattern rather than repeating very similar
code after checking for each error. This will simplify the next commit
that starts handling allocation failures that are currently ignored.
On error xdl_do_diff() frees the environment so we need to take care
to avoid a double free in that case. xdl_build_script() does not
assign a result unless it is successful so there is no possibility of
a double free if it fails.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 10:58:15 -08:00
Phillip Wood
61f883965f xdiff: handle allocation failure in patience diff
Other users of libxdiff such as libgit2 need to be able to handle
allocation failures. As NULL is a valid return value the function
signature is changed to be able report allocation failures.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 10:58:13 -08:00
Phillip Wood
9df0fc3d57 xdiff: fix a memory leak
Although the patience and histogram algorithms initialize the
environment they do not free it if there is an error. In contrast for
the Myers algorithm the environment is initalized in xdl_do_diff() and
it is freed if there is an error. Fix this by always initializing the
environment in xdl_do_diff() and freeing it there if there is an
error. Remove the comment in do_patience_diff() about the environment
being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
does not do that if there is an error and (b) xdl_diff() is not the
only caller.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 10:58:05 -08:00
Junio C Hamano
a0f5ca94dd Merge branch 'pw/xdiff-classify-record-in-histogram'
"diff --histogram" optimization.

* pw/xdiff-classify-record-in-histogram:
  xdiff: drop unused flags parameter from recs_match
  xdiff: drop xpparam_t parameter from histogram cmp_recs()
  xdiff: drop CMP_ENV macro from xhistogram
  xdiff: simplify comparison
  xdiff: avoid unnecessary memory allocations
  diff histogram: intern strings
2021-12-21 15:03:14 -08:00
Jeff King
1e45db1214 xdiff: drop unused flags parameter from recs_match
Since 6b13bc3232 (xdiff: simplify comparison, 2021-11-17), we do not
call xdl_recmatch() from xdiffi.c's recs_match(), so we no longer need
the "flags" parameter. That in turn lets us drop the flags parameters
from the group-slide functions which call it.

There's no functional change here; it's just making these functions a
little simpler.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-04 23:27:53 -08:00
Jeff King
25449450c0 xdiff: drop xpparam_t parameter from histogram cmp_recs()
Since 663c5ad035 (diff histogram: intern strings, 2021-11-17), our
cmp_recs() does not call xdl_recmatch(), and thus no longer needs an
xpparam_t struct from which to get the flags. We can drop the unused
parameter from the function, as well as the macro which wraps it.

There's no functional change here; it's just simplifying things (and
making -Wunused-parameter happier).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-04 23:27:53 -08:00
Jeff King
cf0b26d90c xdiff: drop CMP_ENV macro from xhistogram
This macro has been unused since 43ca7530df (xdiff/xhistogram: rely on
xdl_trim_ends(), 2011-08-01). The function that called it went away
there, and its use in the CMP() macro was inlined. It probably should
have been deleted then, but nobody noticed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-04 23:27:53 -08:00
Phillip Wood
4496526f80 xdiff: implement a zealous diff3, or "zdiff3"
"zdiff3" is identical to ordinary diff3 except that it allows compaction
of common lines on the two sides of history at the beginning or end of
the conflict hunk.  For example, the following diff3 conflict:

    1
    2
    3
    4
    <<<<<<
    A
    B
    C
    D
    E
    ||||||
    5
    6
    ======
    A
    X
    C
    Y
    E
    >>>>>>
    7
    8
    9

has common lines 'A', 'C', and 'E' on the two sides.  With zdiff3, one
would instead get the following conflict:

    1
    2
    3
    4
    A
    <<<<<<
    B
    C
    D
    ||||||
    5
    6
    ======
    X
    C
    Y
    >>>>>>
    E
    7
    8
    9

Note that the common lines, 'A', and 'E' were moved outside the
conflict.  Unlike with the two-way conflicts from the 'merge'
conflictStyle, the zdiff3 conflict is NOT split into multiple conflict
regions to allow the common 'C' lines to be shown outside a conflict,
because zdiff3 shows the base version too and the base version cannot be
reasonably split.

Note also that the removing of lines common to the two sides might make
the remaining text inside the conflict region match the base text inside
the conflict region (for example, if the diff3 conflict had '5 6 E' on
the right side of the conflict, then the common line 'E' would be moved
outside and both the base and right side's remaining conflict text would
be the lines '5' and '6').  This has the potential to surprise users and
make them think there should not have been a conflict, but there
definitely was a conflict and it should remain.

Based-on-patch-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Co-authored-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-01 14:45:58 -08:00
Phillip Wood
6b13bc3232 xdiff: simplify comparison
Now that the histogram algorithm calls xdl_classify_record() it is no
longer necessary to use xdl_recmatch() to compare lines, it is
sufficient just to compare the hash values. This has a negligible
effect on performance.

Test                                  HEAD~1            HEAD
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline)          0.19(0.12+0.07)   0.18(0.14+0.04) -5.3%
4000.2: log --raw -3000 (tree-only)   0.98(0.81+0.16)   0.98(0.79+0.18) +0.0%
4000.3: log -p -3000 (Myers)          4.81(4.23+0.56)   4.80(4.26+0.53) -0.2%
4000.4: log -p -3000 --histogram      5.83(5.11+0.70)   5.82(5.15+0.65) -0.2%
4000.5: log -p -3000 --patience       5.31(4.61+0.69)   5.30(4.54+0.75) -0.2%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 22:23:49 -08:00
Phillip Wood
b82dd3f7f2 xdiff: avoid unnecessary memory allocations
rindex and ha are only used by xdl_cleanup_records() which is not
called by the histogram or patience algorithms. The perf test results
show a small reduction in run time but that is probably within the
noise.

Test                                  HEAD^             HEAD
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline)          0.19(0.17+0.02)   0.19(0.12+0.07) +0.0%
4000.2: log --raw -3000 (tree-only)   0.98(0.78+0.20)   0.98(0.81+0.16) +0.0%
4000.3: log -p -3000 (Myers)          4.81(4.15+0.64)   4.81(4.23+0.56) +0.0%
4000.4: log -p -3000 --histogram      5.87(5.19+0.66)   5.83(5.11+0.70) -0.7%
4000.5: log -p -3000 --patience       5.35(4.60+0.73)   5.31(4.61+0.69) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 22:23:45 -08:00
Phillip Wood
663c5ad035 diff histogram: intern strings
Histogram is the only diff algorithm not to call
xdl_classify_record(). xdl_classify_record() ensures that the hash
values of two strings that are not equal differ which means that it is
not necessary to use xdl_recmatch() when comparing lines, all that is
necessary is to compare the hash values. This gives a 7% reduction in
the runtime of "git log --patch" when using the histogram diff
algorithm.

Test                                  HEAD^             HEAD
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline)          0.18(0.14+0.04)   0.19(0.17+0.02) +5.6%
4000.2: log --raw -3000 (tree-only)   0.99(0.77+0.21)   0.98(0.78+0.20) -1.0%
4000.3: log -p -3000 (Myers)          4.84(4.31+0.51)   4.81(4.15+0.64) -0.6%
4000.4: log -p -3000 --histogram      6.34(5.86+0.46)   5.87(5.19+0.66) -7.4%
4000.5: log -p -3000 --patience       5.39(4.60+0.76)   5.35(4.60+0.73) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 22:23:31 -08:00
Junio C Hamano
4da281e84d Merge branch 'ab/pickaxe-pcre2'
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.

* ab/pickaxe-pcre2: (22 commits)
  xdiff-interface: replace discard_hunk_line() with a flag
  xdiff users: use designated initializers for out_line
  pickaxe -G: don't special-case create/delete
  pickaxe -G: terminate early on matching lines
  xdiff-interface: allow early return from xdiff_emit_line_fn
  xdiff-interface: prepare for allowing early return
  pickaxe -S: slightly optimize contains()
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe: refactor function selection in diffcore-pickaxe()
  perf: add performance test for pickaxe
  pickaxe/style: consolidate declarations and assignments
  diff.h: move pickaxe fields together again
  pickaxe: die when --find-object and --pickaxe-all are combined
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: add test for diffgrep_consume() internals
  ...
2021-07-13 16:52:50 -07:00
Junio C Hamano
8e48f115aa Merge branch 'ab/xdiff-bug-cleanup'
Code clean-up.

* ab/xdiff-bug-cleanup:
  xdiff: use BUG(...), not xdl_bug(...)
2021-07-08 13:15:04 -07:00
Ævar Arnfjörð Bjarmason
546096a5cb xdiff: use BUG(...), not xdl_bug(...)
The xdl_bug() function was introduced in
e8adf23d1e (xdl_change_compact(): introduce the concept of a change
group, 2016-08-22), let's use our usual BUG() function instead.

We'll now have meaningful line numbers if we encounter bugs in xdiff,
and less code duplication.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-08 10:09:28 +09:00
Junio C Hamano
daffa8961b Merge branch 'pw/patience-diff-clean-up'
Code clean-up.

* pw/patience-diff-clean-up:
  patience diff: remove unused variable
  patience diff: remove unnecessary string comparisons
2021-05-14 08:26:08 +09:00
Ævar Arnfjörð Bjarmason
5d93460024 xdiff-interface: replace discard_hunk_line() with a flag
Remove the dummy discard_hunk_line() function added in
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.

Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.

It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.

It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.

The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940 (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Phillip Wood
f91371b948 patience diff: remove unused variable
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 18:56:48 +09:00
Phillip Wood
204aa2d24d patience diff: remove unnecessary string comparisons
xdl_prepare_env() calls xdl_classify_record() which arranges for the
hashes of non-matching lines to be different so lines can be tested
for equality by comparing just their hashes.

This reduces the time taken to calculate the diff of v2.28.0 to
v2.29.0 by ~3-4%.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 18:56:48 +09:00
Michał Kępień
296d4a94e7 diff: add -I<regex> that ignores matching changes
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Michał Kępień
ec7967cfaf merge-base, xdiff: zero out xpparam_t structures
xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen.  Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Junio C Hamano
f0070a7df9 Merge branch 'rs/xdiff-ignore-ws-w-func-context'
The "diff" machinery learned not to lose added/removed blank lines
in the context when --ignore-blank-lines and --function-context are
used at the same time.

* rs/xdiff-ignore-ws-w-func-context:
  xdiff: unignore changes in function context
2019-12-16 13:08:32 -08:00
René Scharfe
0bb313a552 xdiff: unignore changes in function context
Changes involving only blank lines are hidden with --ignore-blank-lines,
unless they appear in the context lines of other changes.  This is
handled by xdl_get_hunk() for context added by --inter-hunk-context, -u
and -U.

Function context for -W and --function-context added by xdl_emit_diff()
doesn't pay attention to such ignored changes; it relies fully on
xdl_get_hunk() and shows just the post-image of ignored changes
appearing in function context.  That's inconsistent and confusing.

Improve the result of using --ignore-blank-lines and --function-context
together by fully showing ignored changes if they happen to fall within
function context.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05 09:30:06 -08:00
Johannes Schindelin
03d3b1297c xdiffi: fix typos and touch up comments
Inspired by the thoroughly stale https://github.com/git/git/pull/159,
this patch fixes a couple of typos, rewraps and clarifies some comments.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-09 10:58:34 +09:00
Junio C Hamano
d163b6ae6b Merge branch 'cb/xdiff-no-system-includes-in-dot-c'
Compilation fix.

* cb/xdiff-no-system-includes-in-dot-c:
  xdiff: remove duplicate headers from xpatience.c
  xdiff: remove duplicate headers from xhistogram.c
  xdiff: drop system includes in xutils.c
2019-07-31 14:38:56 -07:00
Junio C Hamano
c3d4c20c7d Merge branch 'jk/xdiff-clamp-funcname-context-index'
The internal diff machinery can be made to read out of bounds while
looking for --funcion-context line in a corner case, which has been
corrected.

* jk/xdiff-clamp-funcname-context-index:
  xdiff: clamp function context indices in post-image
2019-07-29 12:39:13 -07:00
Carlo Marcelo Arenas Belón
81ed2b405c xdiff: remove duplicate headers from xpatience.c
92b7de93fb (Implement the patience diff algorithm, 2009-01-07) added them
but were already part of xinclude.h

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-28 21:51:22 -07:00
Carlo Marcelo Arenas Belón
29a0f9038e xdiff: remove duplicate headers from xhistogram.c
8c912eea94 ("teach --histogram to diff", 2011-07-12) included them, but
were already part of xinclude.h

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-28 21:51:21 -07:00
Carlo Marcelo Arenas Belón
0d0e1e85aa xdiff: drop system includes in xutils.c
After b46054b374 ("xdiff: use git-compat-util", 2019-04-11), two system
headers added with 6942efcfa9 ("xdiff: load full words in the inner loop
of xdl_hash_record", 2012-04-06) to xutils.c are no longer needed and
could conflict as shown below from an OpenIndiana build:

In file included from xdiff/xinclude.h:26:0,
                 from xdiff/xutils.c:25:
./git-compat-util.h:4:0: warning: "_FILE_OFFSET_BITS" redefined
 #define _FILE_OFFSET_BITS 64

In file included from /usr/include/limits.h:37:0,
                 from xdiff/xutils.c:23:
/usr/include/sys/feature_tests.h:231:0: note: this is the location of the previous definition
 #define _FILE_OFFSET_BITS 32

Make sure git-compat-util.h is the first header (through xinclude.h)

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-28 21:51:19 -07:00
Jeff King
b777f3fd61 xdiff: clamp function context indices in post-image
After finding a function line for --function-context in the pre-image,
xdl_emit_diff() calculates the equivalent line in the post-image.  It
assumes that the lines between changes are the same on both sides.  If
the option --ignore-blank-lines was also given then this is not
necessarily true.

Clamp the calculation results for start and end of the function context
to prevent out-of-bounds array accesses.

Note that this _just_ fixes the case where our mismatch sends us off the
beginning of the file. There are likely other cases where our assumption
causes us to go to the wrong line within the file. Nobody has developed
a test case yet, and the ultimate fix is likely more complicated than
this patch. But this at least prevents a segfault in the meantime.

Credit for finding the bug goes to "Liu Wei of Tencent Security Xuanwu
Lab".

Reported-by: 刘炜 <lw17qhdz@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-23 14:26:13 -07:00
Jeff King
36c8319724 xdiff: use xmalloc/xrealloc
Most of xdiff uses a bare malloc() to allocate memory, and returns an
error when we get NULL. However, there are a few spots which don't check
the return value and may segfault, including at least xdl_merge() and
xpatience.c's find_longest_common_sequence().

Let's use xmalloc() everywhere instead, so that we get a graceful die()
for these cases, without having to do further auditing. This does mean
the existing cases which check errors will now die() instead of
returning an error up the stack. But:

  - that's how the rest of Git behaves already for malloc errors

  - all of the callers of xdi_diff(), etc, die upon seeing an error

So while we might one day want to fully lib-ify the diff code and make
it possible to use as part of a long-running process, we're not close to
that now. And because we're just tweaking the xdl_malloc() macro here,
we're not really moving ourselves any further away from that. We
could, for example, simplify some of the functions which handle malloc()
errors which can no longer occur. But that would probably be taking us
in the wrong direction.

This also makes our malloc handling more consistent with the rest of
Git, including enforcing GIT_ALLOC_LIMIT and trying to reclaim pack
memory when needed.

Reported-by: 王健强 <jianqiang.wang@securitygossip.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-12 13:34:17 +09:00
Jeff King
b46054b374 xdiff: use git-compat-util
Since the xdiff library was not originally part of Git, it does its own
system includes. Let's instead use git-compat-util, which has two
benefits:

  1. It adjusts for any system-specific quirks in how or what we should
     include (though xdiff's needs are light enough that this hasn't
     been a problem in the past).

  2. It lets us use wrapper functions like xmalloc().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-12 13:34:17 +09:00
Jeff King
611e42a598 xdiff: provide a separate emit callback for hunks
The xdiff library always emits hunk header lines to our callbacks as
formatted strings like "@@ -a,b +c,d @@\n". This is convenient if we're
going to output a diff, but less so if we actually need to compute using
those numbers, which requires re-parsing the line.

In preparation for moving away from this, let's teach xdiff a new
callback function which gets the broken-out hunk information. To help
callers that don't want to use this new callback, if it's NULL we'll
continue to format the hunk header into a string.

Note that this function renames the "outf" callback to "out_line", as
well. This isn't strictly necessary, but helps in two ways:

  1. Now that there are two callbacks, it's nice to use more descriptive
     names.

  2. Many callers did not zero the emit_callback_data struct, and needed
     to be modified to set ecb.out_hunk to NULL. By changing the name of
     the existing struct member, that guarantees that any new callers
     from in-flight topics will break the build and be examined
     manually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02 20:43:02 +09:00
Junio C Hamano
791ad49483 Merge branch 'sb/indent-heuristic-optim'
"git diff --indent-heuristic" had a bad corner case performance.

* sb/indent-heuristic-optim:
  xdiff: reduce indent heuristic overhead
2018-08-17 13:09:57 -07:00
Junio C Hamano
57fbd8efb0 Merge branch 'sb/histogram-less-memory'
"git diff --histogram" had a bad memory usage pattern, which has
been rearranged to reduce the peak usage.

* sb/histogram-less-memory:
  xdiff/histogram: remove tail recursion
  xdiff/xhistogram: move index allocation into find_lcs
  xdiff/xhistogram: factor out memory cleanup into free_index()
  xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff
2018-08-15 15:08:25 -07:00
Stefan Beller
301ef85401 xdiff: reduce indent heuristic overhead
Skip searching for better indentation heuristics if we'd slide a hunk more
than its size. This is the easiest fix proposed in the analysis[1] in
response to a patch that mercurial took for xdiff to limit searching
by a constant. Using a performance test as:

     #!python
     open('a', 'w').write(" \n" * 1000000)
     open('b', 'w').write(" \n" * 1000001)

This patch reduces the execution of "git diff --no-index a b" from
0.70s to 0.31s. However limiting the sliding to the size of the diff hunk,
which was proposed as a solution (that I found easiest to implement for
now) is not optimal for cases like

     open('a', 'w').write(" \n" * 1000000)
     open('b', 'w').write(" \n" * 2000000)

as then we'd still slide 1000000 times.

In addition to limiting the sliding to size of the hunk, also limit by a
constant. Choose 100 lines as the constant as that fits more than a screen,
which really means that the diff sliding is probably not providing a lot
of benefit anyway.

[1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/

Reported-by: Jun Wu <quark@fb.com>
Analysis-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 13:36:22 -07:00
Stefan Beller
79cb2ebb92 xdiff/histogram: remove tail recursion
When running the same reproduction script as the previous patch,
it turns out the stack is too small, which can be easily avoided.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 10:12:16 -07:00
Stefan Beller
64c4e8bccd xdiff/xhistogram: move index allocation into find_lcs
This fixes a memory issue when recursing a lot, which can be reproduced as

    seq 1   100000 >one
    seq 1 4 100000 >two
    git diff --no-index --histogram one two

Before this patch, histogram_diff would call itself recursively before
calling free_index, which would mean a lot of memory is allocated during
the recursion and only freed afterwards. By moving the memory allocation
(and its free call) into find_lcs, the memory is free'd before we recurse,
such that memory is reused in the next step of the recursion instead of
using new memory.

This addresses only the memory pressure, not the run time complexity,
that is also awful for the corner case outlined above.

Helpful in understanding the code (in addition to the sparse history of
this file), was https://stackoverflow.com/a/32367597 which reproduces
most of the code comments of the JGit implementation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-19 12:46:03 -07:00
Stefan Beller
c671d4b599 xdiff/xhistogram: factor out memory cleanup into free_index()
This will be useful in the next patch as we'll introduce multiple
callers.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-19 12:46:01 -07:00
Stefan Beller
282098506f xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff
By passing the 'xpp' and 'env' argument directly to the function
'fall_back_to_classic_diff', we eliminate an occurrence of the 'index'
in histogram_diff, which will prove useful in a bit.

While at it, move it up in the file. This will make the diff of
one of the next patches more legible.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-19 12:46:00 -07:00