mirrors/git

mirror of https://github.com/git/git.git synced 2024-12-12 03:14:11 +08:00

Author	SHA1	Message	Date
Junio C Hamano	65c18913de	Merge branch 'pw/word-diff-zero-width-matches' The word-diff mode has been taught to work better with a word regexp that can match an empty string. * pw/word-diff-zero-width-matches: word diff: handle zero length matches	2021-05-14 08:26:06 +09:00
Phillip Wood	0324e8fc6b	word diff: handle zero length matches If find_word_boundaries() encounters a zero length match (which can be caused by matching a newline or using '' instead of '+' in the regex) we stop splitting the input into words which generates an inaccurate diff. To fix this increment the start point when there is a zero length match and try a new match. This is safe as posix regular expressions always return the longest available match so a zero length match means there are no longer matches available from the current position. Commit `bf82940dbf` (color-words: enable REG_NEWLINE to help user, 2009-01-17) prevented matching newlines in negated character classes but it is still possible for the user to have an explicit newline match in the regex which could cause a zero length match. One could argue that having explicit newline matches or using '' rather than '+' are user errors but it seems to be better to work round them than produce inaccurate diffs. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-05 18:53:42 +09:00
Atharva Raykar	a437390310	userdiff: add support for Scheme Add a diff driver for Scheme-like languages which recognizes top level and local `define` forms, whether it is a function definition, binding, syntax definition or a user-defined `define-xyzzy` form. Also supports R6RS `library` forms, `module` forms along with class and struct declarations used in Racket (PLT Scheme). Alternate "def" syntax such as those in Gerbil Scheme are also supported, like defstruct, defsyntax and so on. The rationale for picking `define` forms for the hunk headers is because it is usually the only significant form for defining the structure of the program, and it is a common pattern for schemers to have local function definitions to hide their visibility, so it is not only the top level `define`'s that are of interest. Schemers also extend the language with macros to provide their own define forms (for example, something like a `define-test-suite`) which is also captured in the hunk header. Since it is common practice to extend syntax with variants of a form like `module+`, `class*` etc, those have been supported as well. The word regex is a best-effort attempt to conform to R7RS[1] valid identifiers, symbols and numbers. [1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1) Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 13:56:09 -07:00
Ævar Arnfjörð Bjarmason	ebd73f50c6	test libs: rename "diff-lib" to "lib-diff" Rename the "diff-lib" to "lib-diff". With this rename and preceding commits there is no remaining t/lib which doesn't follow the convention of being called t/lib-*. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-12 11:58:21 -08:00
Martin Ågren	c76b84a121	t: don't spuriously close and reopen quotes In the test scripts, the recommended style is, e.g.: test_expect_success 'name' ' do-something somehow && do-some-more testing ' When using this style, any single quote in the multi-line test section is actually closing the lone single quotes that surround it. It can be a non-issue in practice: test_expect_success 'sed a little' ' sed -e 's/hi/lo/' in >out # "ok": no whitespace in s/hi/lo/ ' Or it can be a bug in the test, e.g., because variable interpolation happens before the test even begins executing: v=abc test_expect_success 'variable interpolation' ' v=def && echo '"$v"' # abc ' Change several such in-test single quotes to use double quotes instead or, in a few cases, drop them altogether. These were identified using some crude grepping. We're not fixing any test bugs here, but we're hopefully making these tests slightly easier to grok and to maintain. There are legitimate use cases for closing a quote and opening a new one, e.g., both '\'' and '"'"' can be used to produce a literal single quote. I'm not touching any of those here. In t9401, tuck the redirecting ">" to the filename while we're touching those lines. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-06 15:14:32 -07:00
brian m. carlson	0253e126a2	t4034: abstract away SHA-1-specific constants Adjust the test so that it computes variables for object IDs instead of using hard-coded hashes. Move some expected result heredocs around so that they can use computed variables. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 11:34:58 +09:00
Stephen Boyd	3c81760bc6	userdiff: add a builtin pattern for dts files The Linux kernel receives many patches to the devicetree files each release. The hunk header for those patches typically show nothing, making it difficult to figure out what node is being modified without applying the patch or opening the file and seeking to the context. Let's add a builtin 'dts' pattern to git so that users can get better diff output on dts files when they use the diff=dts driver. The regex has been constructed based on the spec at devicetree.org[1] and with some help from Johannes Sixt. [1] https://github.com/devicetree-org/devicetree-specification/releases/latest Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-21 15:09:34 -07:00
William Duclot	0719f3eecd	userdiff: add built-in pattern for CSS CSS is widely used, motivating it being included as a built-in pattern. It must be noted that the word_regex for CSS (i.e. the regex defining what is a word in the language) does not consider '.' and '#' characters (in CSS selectors) to be part of the word. This behavior is documented by the test t/t4018/css-rule. The logic behind this behavior is the following: identifiers in CSS selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#' character are not part of the identifier, but an indicator of the nature of the identifier in HTML/XML (class or id). Diffing ".class1" and ".class2" must show that the class name is changed, but we still are selecting a class. Logic behind the "pattern" regex is: 1. reject lines ending with a colon/semicolon (properties) 2. if a line begins with a name in column 1, pick the whole line Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most of the tests. Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-03 14:45:56 -07:00
Yann Droneaud	ff73aa405f	t4034: use test_config/test_unconfig to set/unset git config variables Instead of using construct such as: test_when_finished "git config --unset <key>" git config <key> <value> uses test_config <key> <value> The latter takes care of removing <key> at the end of the test. Additionally, instead of git config <key> "" or git config --unset <key> uses test_unconfig <key> The latter doesn't failed if <key> is not defined. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-25 08:50:53 -07:00
Adrian Johnson	e90d065e64	Add userdiff patterns for Ada Add Ada xfuncname and wordRegex patterns to the list of builtin patterns. Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-09-16 21:54:47 -07:00
Thomas Rast	6440d3417c	diff: tweak a _copy_ of diff_options with word-diff When using word diff, the code sets the word_regex from various defaults if it was not set already. The problem is that it does this on the original diff_options, which will also be used in subsequent diffs. This means that when the word_regex is not given on the command line, only the first diff for which a setting for word_regex (either from attributes or diff.wordRegex) ever takes effect. This value then propagates to the rest of the diff runs and in particular prevents further attribute lookups. Fix the problem of changing diff state once and for all, by working with a _copy_ of the diff_options. Noticed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-14 14:41:20 -07:00
Johannes Sixt	62d39359af	t4034: diff.*.wordregex should not be "sticky" in --word-diff The test case applies a custom wordRegex to one file in a diff, and expects that the default word splitting applies to the second file in the diff. But the custom wordRegex is also incorrectly used for the second file. Helped-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-14 14:39:01 -07:00
Junio C Hamano	05c65cb116	Merge branch 'tr/maint-word-diff-incomplete-line' * tr/maint-word-diff-incomplete-line: word-diff: ignore '\ No newline at eof' marker	2012-01-18 15:16:19 -08:00
Thomas Rast	c7c2bc0ac9	word-diff: ignore '\ No newline at eof' marker The word-diff logic accumulates + and - lines until another line type appears (normally [ @\]), at which point it generates the word diff. This is usually correct, but it breaks when the preimage does not have a newline at EOF: $ printf "%s" "a a a" >a $ printf "%s\n" "a ab a" >b $ git diff --no-index --word-diff a b diff --git 1/a 2/b index 9f68e94..6a7c02f 100644 --- 1/a +++ 2/b @@ -1 +1 @@ [-a a a-] No newline at end of file {+a ab a+} Because of the order of the lines in a unified diff @@ -1 +1 @@ -a a a \ No newline at end of file +a ab a the '\' line flushed the buffers, and the - and + lines were never matched with each other. A proper fix would defer such markers until the end of the hunk. However, word-diff is inherently whitespace-ignoring, so as a cheap fix simply ignore the marker (and hide it from the output). We use a prefix match for '\ ' to parallel the logic in apply.c:parse_fragment(). We currently do not localize this string (just accept other variants of it in git-apply), but this should be future-proof. Noticed-by: Ivan Shirokoff <shirokoff@yandex-team.ru> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-12 11:27:41 -08:00
Gustaf Hendeby	53b10a1405	Add built-in diff patterns for MATLAB code MATLAB is often used in industry and academia for scientific computations motivating it being included as a built-in pattern. Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:11:52 -08:00
Jim Meyering	42536dd9b9	do not read beyond end of malloc'd buffer With diff.suppress-blank-empty=true, "git diff --word-diff" would output data that had been read from uninitialized heap memory. The problem was that fn_out_consume did not account for the possibility of a line with length 1, i.e., the empty context line that diff.suppress-blank-empty=true converts from " \n" to "\n". Since it assumed there would always be a prefix character (the space), it decremented "len" unconditionally, thus passing len=0 to emit_line, which would then blindly call emit_line_0 with len=-1 which would pass that value on to fwrite as SIZE_MAX. Boom. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 11:39:49 -07:00
Junio C Hamano	5269edf170	t4034 (diff --word-diff): add a minimum Perl drier test vector Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-18 09:44:22 -08:00
Jonathan Nieder	5094d15874	t4034 (diff --word-diff): style suggestions Rearrange code to be easier to browse: - first data - then functions - then test assertions Mark up inline test vectors as cat >vector <<-\EOF data data EOF for visual scannability. Use words like "set up" for tests that set up for other tests, to make it obvious which tests are safe to skip. Use repeated function calls instead of a loop for the language-specific tests, so the invocations can be easily tweaked individually (for example if one starts to fail). This means if you add a new subdirectory to t4034/, it will not be automatically used. I think that's worth it for the added explicitness. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-18 09:02:23 -08:00
Thomas Rast	8d96e7288f	t4034: bulk verify builtin word regex sanity The builtin word regexes should be tested with some simple examples against simple issues. Do this in bulk. Mainly due to a lack of language knowledge and inspiration, most of the test cases (cpp, csharp, java, objc, pascal, php, python, ruby) are directly based off a C operator precedence table to verify that all operators are split correctly. This means that they are probably incomplete or inaccurate except for 'cpp' itself. Still, they are good enough to already have uncovered a typo in the python and ruby patterns. 'fortran' is based on my anecdotal knowledge of the DO10I parsing rules, and thus probably useless. The rest (bibtex, html, tex) are an ad-hoc test of what I consider important splits in those languages. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-01-18 08:51:58 -08:00
Junio C Hamano	b3ff808b71	Merge branch 'en/and-cascade-tests' * en/and-cascade-tests: (25 commits) t4124 (apply --whitespace): use test_might_fail t3404: do not use 'describe' to implement test_cmp_rev t3404 (rebase -i): introduce helper to check position of HEAD t3404 (rebase -i): move comment to description t3404 (rebase -i): unroll test_commit loops t3301 (notes): use test_expect_code for clarity t1400 (update-ref): use test_must_fail t1502 (rev-parse --parseopt): test exit code from "-h" t6022 (renaming merge): chain test commands with && test-lib: introduce test_line_count to measure files tests: add missing &&, batch 2 tests: add missing && Introduce sane_unset and use it to ensure proper && chaining t7800 (difftool): add missing && t7601 (merge-pull-config): add missing && t7001 (mv): add missing && t6016 (rev-list-graph-simplify-history): add missing && t5602 (clone-remote-exec): add missing && t4026 (color): remove unneeded and unchained command t4019 (diff-wserror): add lots of missing && ... Conflicts: t/t7006-pager.sh	2010-11-24 15:51:49 -08:00
Jonathan Nieder	a48fcd8369	tests: add missing && Breaks in a test assertion's && chain can potentially hide failures from earlier commands in the chain. Commands intended to fail should be marked with !, test_must_fail, or test_might_fail. The examples in this patch do not require that. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-09 11:59:49 -08:00
Kevin Ballard	a471833d51	test-lib: extend test_decode_color to handle more color codes Enhance the test_decode_color function to handle all common color codes, including background colors and escapes that contain multiple codes. This change necessitates changing <WHITE> to <BOLD>, so update t4034 as well. This change is necessary for the next commit in order to test background colors properly. Signed-off-by: Kevin Ballard <kevin@sb.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-20 16:10:14 -07:00
Thomas Rast	882749a04f	diff: add --word-diff option that generalizes --color-words This teaches the --color-words engine a more general interface that supports two new modes: * --word-diff=plain, inspired by the 'wdiff' utility (most similar to 'wdiff -n <old> <new>'): uses delimiters [-removed-] and {+added+} * --word-diff=porcelain, which generates an ad-hoc machine readable format: - each diff unit is prefixed by [-+ ] and terminated by newline as in unified diff - newlines in the input are output as a line consisting only of a tilde '~' Both of these formats still support color if it is enabled, using it to highlight the differences. --color-words becomes a synonym for --word-diff=color, which is the color-only format. Also adds some compatibility/convenience options. Thanks to Junio C Hamano and Miles Bader for good ideas. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-14 10:56:53 -07:00
Junio C Hamano	c2ff10c98e	Merge branch 'jk/1.7.0-status' * jk/1.7.0-status: status/commit: do not suggest "reset HEAD <path>" while merging commit/status: "git add <path>" is not necessarily how to resolve commit/status: check $GIT_DIR/MERGE_HEAD only once t7508-status: test all modes with color t7508-status: status --porcelain ignores relative paths setting status: reduce duplicated setup code status: disable color for porcelain format status -s: obey color.status builtin-commit: refactor short-status code into wt-status.c t7508-status.sh: Add tests for status -s status -s: respect the status.relativePaths option docs: note that status configuration affects only long format commit: support alternate status formats status: add --porcelain output format status: refactor format option parsing status: refactor short-mode printing to its own function status: typo fix in usage git status: not "commit --dry-run" anymore git stat -s: short status output git stat: the beginning of "status that is not a dry-run of commit" Conflicts: t/t4034-diff-words.sh wt-status.c	2009-12-27 23:01:32 -08:00
Michael J Gruber	68cfc6f551	t7508-status: test all modes with color Move a useful script function to decode colored output to text form from t4034 and use it in this test as well. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-12-08 21:52:47 -08:00
Bert Wesarg	89cb73a19a	Give the hunk comment its own color Inspired by the coloring of quilt. Introduce a separate color and paint the hunk comment part, i.e. the name of the function, in a separate color "diff.func" (defaults to plain). Whitespace between hunk header and hunk comment is printed in plain color. Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-11-28 10:05:44 -08:00
Junio C Hamano	06a4755270	emit_line(): don't emit an empty <SET><RESET> followed by a newline When emit_line() is called with an empty line (but non-zero length, as we send line terminating LF or CRLF to the function), it used to emit <SET><RESET> followed by a newline. Stop the wastefulness. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-11-27 22:33:53 -08:00
Johannes Schindelin	a4ca1465ec	diff --color-words -U0: fix the location of hunk headers Colored word diff without context lines firstly printed all the hunk headers among each other and then printed the diff. This was due to the code relying on getting at least one context line at the end of each hunk, where the colored words would be flushed (it is done that way to be able to ignore rewrapped lines). Noticed by Markus Heidelberg. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-10-30 09:42:56 -07:00
Markus Heidelberg	168eff3c80	t4034-diff-words: add a test for word diff without context Signed-off-by: Markus Heidelberg <markus.heidelberg@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-10-30 09:42:52 -07:00
Boyd Stephen Smith Jr	ae3b970ac3	Change the spelling of "wordregex". Use "wordRegex" for configuration variable names. Use "word_regex" for C language tokens. Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-21 23:52:16 -08:00
Boyd Stephen Smith Jr	98a4d87b87	color-words: Support diff.wordregex config option When diff is invoked with --color-words (w/o =regex), use the regular expression the user has configured as diff.wordregex. diff drivers configured via attributes take precedence over the diff.wordregex-words setting. If the user wants to change them, they have their own configuration variables. Signed-off-by: Boyd Stephen Smith Jr <bss@iguanasuicide.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-21 00:51:12 -08:00
Thomas Rast	80c49c3de2	color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-17 10:44:21 -08:00
Johannes Schindelin	2b6a5417d7	color-words: take an optional regular expression describing words In some applications, words are not delimited by white space. To allow for that, you can specify a regular expression describing what makes a word with git diff --color-words='[A-Za-z0-9]+' Note that words cannot contain newline characters. As suggested by Thomas Rast, the words are the exact matches of the regular expression. Note that a regular expression beginning with a '^' will match only a word at the beginning of the hunk, not a word at the beginning of a line, and is probably not what you want. This commit contains a quoting fix by Thomas Rast. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-17 10:43:08 -08:00
Johannes Schindelin	2e5d2003b2	color-words: change algorithm to allow for 0-character word boundaries Up until now, the color-words code assumed that word boundaries are identical to white space characters. Therefore, it could get away with a very simple scheme: it copied the hunks, substituted newlines for each white space character, called libxdiff with the processed text, and then identified the text to output by the offsets (which agreed since the original text had the same length). This code was ugly, for a number of reasons: - it was impossible to introduce 0-character word boundaries, - we had to print everything word by word, and - the code needed extra special handling of newlines in the removed part. Fix all of these issues by processing the text such that - we build word lists, separated by newlines, - we remember the original offsets for every word, and - after calling libxdiff on the wordlists, we parse the hunk headers, and find the corresponding offsets, and then - we print the removed/added parts in one go. The pre and post samples in the test were provided by Santi Béjar. Note that there is some strange special handling of hunk headers where one line range is 0 due to POSIX: in this case, the start is one too low. In other words a hunk header '@@ -1,0 +2 @@' actually means that the line must be added after the _second_ line of the pre text, _not_ the first. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-17 10:42:41 -08:00

34 Commits