2006-04-05 17:03:58 +08:00
#ifndef XDIFF_INTERFACE_H
#define XDIFF_INTERFACE_H
2024-06-14 14:50:32 +08:00
#include "hash.h"
2006-04-05 17:03:58 +08:00
#include "xdiff/xdiff.h"
xdiff: reject files larger than ~1GB

The xdiff code is not prepared to handle extremely large
files. It uses "int" in many places, which can overflow if
we have a very large number of lines or even bytes in our
input files. This can cause us to produce incorrect diffs,
with no indication that the output is wrong. Or worse, we
may even underallocate a buffer whose size is the result of
an overflowing addition.

We're much better off to tell the user that we cannot diff
or merge such a large file. This patch covers both cases,
but in slightly different ways:

  1. For merging, we notice the large file and cleanly fall
     back to a binary merge (which is effectively "we cannot
     merge this").

  2. For diffing, we make the binary/text distinction much
     earlier, and in many different places. For this case,
     we'll use the xdi_diff as our choke point, and reject
     any diff there before it hits the xdiff code.

     This means in most cases we'll die() immediately after.
     That's not ideal, but in practice we shouldn't
     generally hit this code path unless the user is trying
     to do something tricky. We already consider files
     larger than core.bigfilethreshold to be binary, so this
     code would only kick in when that is circumvented
     (either by bumping that value, or by using a
     .gitattributes entry to mark a file as diffable).

     In other words, we can avoid being "nice" here, because
     there is already nice code that tries to do the right
     thing. We are adding the suspenders to the nice code's
     belt, so that we notice when it has been worked around
     (both to protect the user from malicious inputs, and
     because it is better to die() than generate bogus output).

The maximum size was chosen after experimenting with feeding
large files to the xdiff code. It's just under a gigabyte,
which leaves room for two obvious cases:

  - a diff3 merge conflict result on files of maximum size X
    could be 3*X plus the size of the markers, which would
    still be only about 3G, which fits in a 32-bit int.

  - some of the diff code allocates arrays of one int per
    record. Even if each file consists only of blank lines,
    then a file smaller than 1G will have fewer than 1G
    records, and therefore the int array will fit in 4G.

Since the limit is arbitrary anyway, I chose to go under a
gigabyte, to leave a safety margin (e.g., we would not want
to overflow by allocating "(records + 1) * sizeof(int)" or
similar).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 07:12:45 +08:00
/*
 * xdiff isn't equipped to handle content over a gigabyte;
 * we make the cutoff 1GB - 1MB to give some breathing
 * room for constant-sized additions (e.g., merge markers)
 */
#define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)
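The size limit above can be illustrated with a minimal sketch. It assumes a hypothetical `struct sketch_file` in place of xdiff's `mmfile_t`, and the helper names `sketch_check_sizes` and `sketch_worst_case_merge_bytes` are invented for illustration; this is not the code in xdiff-interface.c. It shows the kind of check a choke point could perform, plus the 3*X arithmetic from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the header: 1GB - 1MB. */
#define MAX_XDIFF_SIZE (1024UL * 1024 * 1023)

/* Hypothetical stand-in for xdiff's mmfile_t; only the size matters here. */
struct sketch_file {
	const char *ptr;
	unsigned long size;
};

/* Returns 0 if both inputs are small enough to diff, -1 otherwise. */
static int sketch_check_sizes(const struct sketch_file *a,
			      const struct sketch_file *b)
{
	if (a->size > MAX_XDIFF_SIZE || b->size > MAX_XDIFF_SIZE)
		return -1;
	return 0;
}

/* Size in bytes of three maximum-size inputs, ignoring conflict markers. */
static uint64_t sketch_worst_case_merge_bytes(void)
{
	return (uint64_t)3 * MAX_XDIFF_SIZE;
}
```

The 64-bit intermediate in the second helper is a choice made for the sketch so the multiplication itself cannot wrap on platforms where `unsigned long` is 32 bits wide.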
xdiff-interface: allow early return from xdiff_emit_line_fn

Finish the change started in the preceding commit and allow an early
return from "xdiff_emit_line_fn" callbacks. This will allow
diffcore-pickaxe.c to save itself redundant work.

Our xdiff interface also had the limitation of not being able to abort
early since the beginning, see d9ea73e0564 (combine-diff: refactor
built-in xdiff interface., 2006-04-05). Although at that time
"xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.

There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd4 (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf1 (diff: use hunk callback for word-diff,
2018-11-02).

In combination those two changes allow us to not do any work on the
hunks and diff at all, but didn't change the status quo with regard
to consumers that e.g. want the diff lines, but might want to abort
early.

Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
if that's all we needed.

This interface is rather scary, as noted in the comment being added to
xdiff-interface.h here. As noted there, a future change
could add more exit codes, and hack xdl_emit_diff() and friends to
ignore or skip things more selectively as a result.

I did not see an inherent reason for why xdl_emit_{diffrec,record}()
could not be changed to ferry the "xdiff_emit_line_fn" error code
upwards instead of returning -1 on all "ret < 0".

But doing so would require corresponding changes in xdl_emit_diff(),
xdl_diff(). I didn't see any issue with narrowly doing that to
accomplish what I needed here, but it would leave xdiff's own return
values in an inconsistent state.

Instead I've left it at returning a more conventional (for git's own
codebase) 1 for an early return, and translating it (or rather, all
non-zero) to -1 for xdiff's consumption.

The reason for most of the "stop" complexity in xdiff_outf() is
because we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13 01:15:25 +08:00
/**
 * The `xdiff_emit_line_fn` function can return 1 to abort early, or 0
 * to continue processing. Note that doing so is an all-or-nothing
 * affair, as returning 1 will return all the way to the top-level,
 * e.g. the xdi_diff_outf() call to generate the diff.
 *
 * Thus returning 1 means you won't be getting any more diff lines. If
 * you need something in-between those two options you'll need to use
 * `xdl_emit_hunk_consume_func_t` and implement your own version of
 * xdl_emit_diff().
 *
 * We may extend the interface in the future to understand other more
 * granular return values. While you should return 1 to exit early,
 * doing so will currently make your early return indistinguishable
 * from an error internal to xdiff: xdiff itself will see that
 * non-zero return and translate it to -1.
2021-04-13 01:15:26 +08:00
 *
 * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
 * this, i.e. using the "consume_callback_data" to note the desired
 * early return.
2021-04-13 01:15:25 +08:00
 */
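The early-return contract described in the comment can be sketched as follows. The `struct sketch_state`, `sketch_line_cb` and `sketch_emit` names are hypothetical, and `sketch_emit` only stands in for the xdiff machinery to show the "any non-zero becomes -1" translation; the real call path goes through xdi_diff_outf(). The callback records in its private data that it stopped on purpose, which is the general idea behind the workaround the comment points to.

```c
#include <string.h>

typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);

/* Hypothetical callback state: remembers whether we stopped on purpose. */
struct sketch_state {
	int hit;        /* set when a removed ("-") line was seen */
	int lines_seen; /* number of lines handed to the callback */
};

/* Stop at the first "-" line: 1 means "abort early", 0 means "continue". */
static int sketch_line_cb(void *priv, char *line, unsigned long len)
{
	struct sketch_state *st = priv;

	st->lines_seen++;
	if (len > 0 && line[0] == '-') {
		st->hit = 1;
		return 1;
	}
	return 0;
}

/*
 * Hypothetical driver standing in for xdiff: any non-zero return
 * from the callback is reported to the caller as -1.
 */
static int sketch_emit(char **lines, int nr, xdiff_emit_line_fn fn, void *priv)
{
	int i;

	for (i = 0; i < nr; i++)
		if (fn(priv, lines[i], (unsigned long)strlen(lines[i])))
			return -1;
	return 0;
}
```

A caller that gets -1 back can inspect `hit` to tell a deliberate early exit apart from a genuine failure.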
2021-04-13 01:15:24 +08:00
typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
2018-11-02 14:35:45 +08:00
typedef void (*xdiff_emit_hunk_fn)(void *data,
				   long old_begin, long old_nr,
				   long new_begin, long new_nr,
				   const char *func, long funclen);
2006-04-05 17:03:58 +08:00
2007-12-14 05:25:07 +08:00
int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
Make xdi_diff_outf interface for running xdiff_outf diffs

To prepare for the need to initialize and release resources for an
xdi_diff with the xdiff_outf output function, make a new function to
wrap this usage.

Old:

	ecb.outf = xdiff_outf;
	ecb.priv = &state;
	...
	xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);

New:

	xdi_diff_outf(file_p, file_o, &state.xm, &xpp, &xecfg, &ecb);

Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-14 13:36:50 +08:00
int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
2018-11-02 14:35:45 +08:00
		  xdiff_emit_hunk_fn hunk_fn,
		  xdiff_emit_line_fn line_fn,
		  void *consume_callback_data,
2010-05-05 04:41:34 +08:00
		  xpparam_t const *xpp, xdemitconf_t const *xecfg);
2006-12-21 00:37:07 +08:00
int read_mmfile(mmfile_t *ptr, const char *filename);
2016-09-06 04:08:02 +08:00
void read_mmblob(mmfile_t *ptr, const struct object_id *oid);
2007-06-05 10:36:11 +08:00
int buffer_is_binary(const char *ptr, unsigned long size);
2006-04-05 17:03:58 +08:00
2019-04-29 16:28:14 +08:00
void xdiff_set_find_func(xdemitconf_t *xecfg, const char *line, int cflags);
void xdiff_clear_find_func(xdemitconf_t *xecfg);
config: add ctx arg to config_fn_t

Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).

In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.

Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:

- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed

Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.

The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:

- trace2/tr2_cfg.c:tr2_cfg_set_fl()

  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().

- builtin/checkout.c:checkout_main()

  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.

Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:22 +08:00
struct config_context;
2024-03-15 01:05:03 +08:00
int parse_conflict_style_name(const char *value);
2023-06-29 03:26:22 +08:00
int git_xmerge_config(const char *var, const char *value,
		      const struct config_context *ctx, void *cb);
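As a rough illustration of the four-argument callback shape declared above, here is a minimal sketch. Everything prefixed `sketch_` or `SKETCH_` is hypothetical, including the numeric style codes and the choice to store the result through `cb`; the real git_xmerge_config() is not reproduced here. The sketch assumes the config key is "merge.conflictstyle" with the values "merge", "diff3" and "zdiff3", and it ignores `ctx`, which the forward declaration lets us pass around without knowing its layout.

```c
#include <string.h>

struct config_context; /* opaque here, as in the header's forward declaration */

/* Hypothetical style codes for this sketch only. */
enum sketch_style {
	SKETCH_STYLE_MERGE = 0,
	SKETCH_STYLE_DIFF3 = 1,
	SKETCH_STYLE_ZDIFF3 = 2
};

/* Map a style name to a code, or -1 if the name is unknown. */
static int sketch_parse_style(const char *value)
{
	if (!strcmp(value, "merge"))
		return SKETCH_STYLE_MERGE;
	if (!strcmp(value, "diff3"))
		return SKETCH_STYLE_DIFF3;
	if (!strcmp(value, "zdiff3"))
		return SKETCH_STYLE_ZDIFF3;
	return -1;
}

/* Callback with the same parameter list as the declaration above. */
static int sketch_xmerge_config(const char *var, const char *value,
				const struct config_context *ctx, void *cb)
{
	int *style = cb;
	int parsed;

	(void)ctx; /* unused in this sketch */
	if (strcmp(var, "merge.conflictstyle"))
		return 0; /* not a key this callback handles */
	if (!value)
		return -1;
	parsed = sketch_parse_style(value);
	if (parsed < 0)
		return -1;
	*style = parsed;
	return 0;
}
```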
2008-08-30 01:49:56 +08:00
extern int git_xmerge_style;
2007-07-06 15:45:10 +08:00
2017-10-26 02:49:11 +08:00
/*
 * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
 * Returns 1 if the strings are deemed equal, 0 otherwise.
 * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
2019-11-06 01:07:23 +08:00
 * are treated for the comparison.
2017-10-26 02:49:11 +08:00
 */
2019-04-29 16:28:14 +08:00
int xdiff_compare_lines(const char *l1, long s1,
2019-04-29 16:28:23 +08:00
			const char *l2, long s2, long flags);
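To make the "1 if equal, 0 otherwise" contract concrete, here is a minimal sketch of a whitespace-aware comparison. It is not the implementation behind xdiff_compare_lines(): the `SKETCH_IGNORE_WHITESPACE` flag and the `sketch_compare_lines` function are hypothetical, and the flag's value is unrelated to any XDF_* constant. With the flag set, all whitespace is skipped on both sides; without it, the comparison is byte-for-byte.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flag for this sketch only; not the value of any XDF_* constant. */
#define SKETCH_IGNORE_WHITESPACE 1L

/* Returns 1 if the two buffers are deemed equal, 0 otherwise. */
static int sketch_compare_lines(const char *l1, long s1,
				const char *l2, long s2, long flags)
{
	long i1 = 0, i2 = 0;

	if (!(flags & SKETCH_IGNORE_WHITESPACE))
		return s1 == s2 && !memcmp(l1, l2, (size_t)s1);

	for (;;) {
		/* Skip any run of whitespace on either side. */
		while (i1 < s1 && isspace((unsigned char)l1[i1]))
			i1++;
		while (i2 < s2 && isspace((unsigned char)l2[i2]))
			i2++;
		if (i1 == s1 || i2 == s2)
			break;
		if (l1[i1++] != l2[i2++])
			return 0;
	}
	/* Equal only if both sides ran out of non-whitespace bytes together. */
	return i1 == s1 && i2 == s2;
}
```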
2017-10-26 02:49:11 +08:00

/*
 * Returns a hash of the string s of length len.
 * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
 * are treated for the hash.
 */
2019-04-29 16:28:14 +08:00
unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
2017-10-26 02:49:11 +08:00
2006-04-05 17:03:58 +08:00
#endif