mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-24 10:26:17 +08:00

Author	SHA1	Message	Date
Patrick Steinhardt	8c1bc2a71a	revision: add new parameter to exclude hidden refs Users can optionally hide refs from remote users in git-upload-pack(1), git-receive-pack(1) and others via the `transfer.hideRefs`, but there is not an easy way to obtain the list of all visible or hidden refs right now. We'll require just that though for a performance improvement in our connectivity check. Add a new option `--exclude-hidden=` that excludes any hidden refs from the next pseudo-ref like `--all` or `--branches`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-17 16:22:52 -05:00
Patrick Steinhardt	1e9f273ac0	revision: introduce struct to handle exclusions The functions that handle exclusion of refs work on a single string list. We're about to add a second mechanism for excluding refs though, and it makes sense to reuse much of the same architecture for both kinds of exclusion. Introduce a new `struct ref_exclusions` that encapsulates all the logic related to excluding refs and move the `struct string_list` that holds all wildmatch patterns of excluded refs into it. Rename functions that operate on this struct to match its name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-17 16:22:52 -05:00
Patrick Steinhardt	05b9425960	revision: move together exclusion-related functions Move together the definitions of functions that handle exclusions of refs so that related functionality sits in a single place, only. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-17 16:22:51 -05:00
Ævar Arnfjörð Bjarmason	916ebb327c	revisions API: extend the nascent REV_INFO_INIT macro Have the REV_INFO_INIT macro added in [1] declare more members of "struct rev_info" that we can initialize statically, and have repo_init_revisions() do so with the memcpy(..., &blank) idiom introduced in [2]. As the comment for the "REV_INFO_INIT" macro notes this still isn't sufficient to initialize a "struct rev_info" for use yet. But we are getting closer to that eventual goal. Even though we can't fully initialize a "struct rev_info" with REV_INFO_INIT it's useful for readability to clearly separate those things that we can statically initialize, and those that we can't. This change could replace the: list_objects_filter_init(&revs->filter); In the repo_init_revisions() with this line, at the end of the REV_INFO_INIT deceleration in revisions.h: .filter = LIST_OBJECTS_FILTER_INIT, \ But doing so would produce a minor conflict with an outstanding topic[3]. Let's skip that for now. I have follow-ups to initialize more of this statically, e.g. changes to get rid of grep_init(). We can initialize more members with the macro in a future series. 1. `f196c1e908` (revisions API users: use release_revisions() needing REV_INFO_INIT, 2022-04-13) 2. `5726a6b401` (.c _init(): define in terms of corresponding *_INIT macro, 2021-07-01) 3. https://lore.kernel.org/git/265b292ed5c2de19b7118dfe046d3d9d932e2e89.1667901510.git.ps@pks.im/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-08 16:34:01 -05:00
Junio C Hamano	b988427918	Merge branch 'rs/diff-caret-bang-with-parents' "git diff rev^!" did not show combined diff to go to the rev from its parents. * rs/diff-caret-bang-with-parents: diff: support ^! for merges revisions.txt: unspecify order of resolved parts of ^! revision: use strtol_i() for exclude_parent	2022-10-25 17:11:43 -07:00
Ævar Arnfjörð Bjarmason	82dd01d81b	CodingGuidelines: allow declaring variables in for loops Since `44ba10d671` (revision: use C99 declaration of variable in for() loop, 2021-11-14) released with v2.35.0 we've had a variable declared with in a for loop. Since then we've had inadvertent follow-ups to that with at least `cb2607759e` (merge-ort: store more specific conflict information, 2022-06-18) released with v2.38.0. As November 2022 is within the window of this upcoming release, let's update the guideline to allow this. We can have the promised "revisit" discussion while this patch cooks, and drop it if it turns out that it is still premature, which is not expected to happen at this moment. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-10 13:41:11 -07:00
René Scharfe	793c21182e	revision: use strtol_i() for exclude_parent Avoid silent overflow of the int exclude_parent by using the appropriate function, strtol_i(), to parse its value. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-01 15:58:33 -07:00
Junio C Hamano	298a958224	Merge branch 'jk/list-objects-filter-cleanup' A couple of bugfixes with code clean-up. * jk/list-objects-filter-cleanup: list-objects-filter: convert filter_spec to a strbuf list-objects-filter: add and use initializers list-objects-filter: handle null default filter spec list-objects-filter: don't memset after releasing filter struct	2022-09-19 14:35:24 -07:00
Junio C Hamano	dd407f1c7c	Merge branch 'ab/unused-annotation' Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths. * ab/unused-annotation: git-compat-util.h: use "deprecated" for UNUSED variables git-compat-util.h: use "UNUSED", not "UNUSED(var)"	2022-09-14 12:56:39 -07:00
Junio C Hamano	a6b42ec0c6	Merge branch 'jk/unused-annotation' Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on. * jk/unused-annotation: is_path_owned_by_current_uid(): mark "report" parameter as unused run-command: mark unused async callback parameters mark unused read_tree_recursive() callback parameters hashmap: mark unused callback parameters config: mark unused callback parameters streaming: mark unused virtual method parameters transport: mark bundle transport_options as unused refs: mark unused virtual method parameters refs: mark unused reflog callback parameters refs: mark unused each_ref_fn parameters git-compat-util: add UNUSED macro	2022-09-14 12:56:39 -07:00
Junio C Hamano	655e494047	Merge branch 'jk/rev-list-verify-objects-fix' "git rev-list --verify-objects" ought to inspect the contents of objects and notice corrupted ones, but it didn't when the commit graph is in use, which has been corrected. * jk/rev-list-verify-objects-fix: rev-list: disable commit graph with --verify-objects lookup_commit_in_graph(): use prepare_commit_graph() to check for graph	2022-09-13 11:38:24 -07:00
Jeff King	2a01bdedf8	list-objects-filter: add and use initializers In `7e2619d8ff` (list_objects_filter_options: plug leak of filter_spec strings, 2022-09-08), we noted that the filter_spec string_list was inconsistent in how it handled memory ownership of strings stored in the list. The fix there was a bit of a band-aid to set the "strdup_strings" variable right before adding anything. That works OK, and it lets the users of the API continue to zero-initialize the struct. But it makes the code a bit hard to follow and accident-prone, as any other spots appending the filter_spec need to think about whether to set the strdup_strings value, too (there's one such spot in partial_clone_get_default_filter_spec(), which is probably a possible memory leak). So let's do that full cleanup now. We'll introduce a LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as appropriate (though it is for the "_options" struct, this matches the corresponding list_objects_filter_release() function). This is harder than it seems! Many other structs, like git_transport_data, embed the filter struct. So they need to initialize it themselves even if the rest of the enclosing struct is OK with zero-initialization. I found all of the relevant spots by grepping manually for declarations of list_objects_filter_options. And then doing so recursively for structs which embed it, and ones which embed those, and so on. I'm pretty sure I got everything, but there's no change that would alert the compiler if any topics in flight added new declarations. To catch this case, we now double-check in the parsing function that things were initialized as expected and BUG() if appropriate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-12 08:38:59 -07:00
Jeff King	9a8c3c4a5f	parse_object(): check commit-graph when skip_hash set If the caller told us that they don't care about us checking the object hash, then we're free to implement any optimizations that get us the parsed value more quickly. An obvious one is to check the commit graph before loading an object from disk. And in fact, both of the callers who pass in this flag are already doing so before they call parse_object()! So we can simplify those callers, as well as any possible future ones, by moving the logic into parse_object(). There are two subtle things to note in the diff, but neither has any impact in practice: - it seems least-surprising here to do the graph lookup on the git-replace'd oid, rather than the original. This is in theory a change of behavior from the earlier code, as neither caller did a replace lookup itself. But in practice it doesn't matter, as we disable the commit graph entirely if there are any replace refs. - the caller in get_reference() passes the skip_hash flag only if revs->verify_objects isn't set, whereas it would look in the commit graph unconditionally. In practice this should not matter as we should disable the commit graph entirely when using verify_objects (and that was done recently in another patch). So this should be a pure cleanup with no behavior change. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-07 12:27:02 -07:00
Jeff King	0bc2557951	upload-pack: skip parse-object re-hashing of "want" objects Imagine we have a history with commit C pointing to a large blob B. If a client asks us for C, we can generally serve both objects to them without accessing the uncompressed contents of B. In upload-pack, we figure out which commits we have and what the client has, and feed those tips to pack-objects. In pack-objects, we traverse the commits and trees (or use bitmaps!) to find the set of objects needed, but we never open up B. When we serve it to the client, we can often pass the compressed bytes directly from the on-disk packfile over the wire. But if a client asks us directly for B, perhaps because they are doing an on-demand fetch to fill in the missing blob of a partial clone, we end up much slower. Upload-pack calls parse_object() on the oid we receive, which opens up the object and re-checks its hash (even though if it were a commit, we might skip this parse entirely in favor of the commit graph!). And then we feed the oid directly to pack-objects, which again calls parse_object() and opens the object. And then finally, when we write out the result, we may send bytes straight from disk, but only after having unnecessarily uncompressed and computed the sha1 of the object twice! This patch teaches both code paths to use the new SKIP_HASH_CHECK flag for parse_object(). You can see the speed-up in p5600, which does a blob:none clone followed by a checkout. The savings for git.git are modest: Test HEAD^ HEAD ---------------------------------------------------------------------- 5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9% But the savings scale with the number of bytes. So on a repository like linux.git with more files, we see more improvement (in both absolute and relative numbers): Test HEAD^ HEAD ---------------------------------------------------------------------------- 5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5% And here's an even more extreme case. This is the android gradle-plugin repository, whose tip checkout has ~3.7GB of files: Test HEAD^ HEAD -------------------------------------------------------------------------- 5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3% Keep in mind that these timings are of the whole checkout operation. So they count the client indexing the pack and actually writing out the files. If we want to see just the server's view, we can hack up the GIT_TRACE_PACKET output from those operations and replay it via upload-pack. For the gradle example, that gives me: Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s] Range (min … max): 50.608 s … 51.025 s 3 runs Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s] Range (min … max): 9.618 s … 9.842 s 3 runs Summary 'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran 5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input' So a server would see an 80% reduction in CPU serving the initial checkout of a partial clone for this repository. Or possibly even more depending on the packing; most of the time spent in the faster one were objects we had to open during the write phase. In both cases skipping the extra hashing on the server should be pretty safe. The client doesn't trust the server anyway, so it will re-hash all of the objects via index-pack. There is one thing to note, though: the change in get_reference() affects not just pack-objects, but rev-list, git-log, etc. We could use a flag to limit to index-pack here, but we may already skip hash checks in this instance. For commits, we'd skip anything we load via the commit-graph. And while before this commit we would check a blob fed directly to rev-list on the command-line, we'd skip checking that same blob if we found it by traversing a tree. The exception for both is if --verify-objects is used. In that case, we'll skip this optimization, and the new test makes sure we do this correctly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-07 12:20:02 -07:00
Jeff King	b27ccae34b	rev-list: disable commit graph with --verify-objects Since the point of --verify-objects is to actually load and checksum the bytes of each object, optimizing out reads using the commit graph runs contrary to our goal. The most targeted way to implement this would be for the revision traversal code to check revs->verify_objects and avoid using the commit graph. But it's difficult to be sure we've hit all of the correct spots. For instance, I started this patch by writing the first of the included test cases, where the corrupted commit is directly on rev-list's command line. And that is easy to fix by teaching get_reference() to check revs->verify_objects before calling lookup_commit_in_graph(). But that doesn't cover the second test case: when we traverse to a corrupted commit, we'd parse the parent in process_parents(). So we'd need to check there, too. And it keeps going. In handle_commit() we sometimes parses commits, too, though I couldn't figure out a way to trigger it that did not already parse via get_reference() or tag peeling. And try_to_simplify_commit() has its own parse call, and so on. So it seems like the safest thing is to just disable the commit graph for the whole process when we see the --verify-objects option. We can do that either in builtin/rev-list.c, where we use the option, or in revision.c, where we parse it. There are some subtleties: - putting it in rev-list.c is less surprising in some ways, because there we know we are just doing a single traversal. In a command which does multiple traversals in a single process, it's rather unexpected to globally disable the commit graph. - putting it in revision.c is less surprising in some ways, because the caller does not have to remember to disable the graph themselves. But this is already tricky! The verify_objects flag in rev_info doesn't do anything by itself. The caller has to provide an object callback which does the right thing. - for that reason, in practice nobody but rev-list uses this option in the first place. So the distinction is probably not important either way. Arguably it should just be an option of rev-list, and not the general revision machinery; right now you can run "git log --verify-objects", but it does not actually do anything useful. - checking for a parsed revs.verify_objects flag in rev-list.c is too late. By that time we've already passed the arguments to setup_revisions(), which will have parsed the commits using the graph. So this commit disables the graph as soon as we see the option in revision.c. That's a pretty broad hammer, but it does what we want, and in practice nobody but rev-list is using this flag anyway. The tests cover both the "tip" and "parent" cases. Obviously our hammer hits them both in this case, but it's good to check both in case somebody later tries the more focused approach. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-07 09:44:30 -07:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in `2174b8c75d` (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Jeff King	02c3c59e62	hashmap: mark unused callback parameters Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:55 -07:00
Jeff King	c006e9fa59	refs: mark unused reflog callback parameters Functions used with for_each_reflog_ent() need to conform to a particular interface, but not every function needs all of the parameters. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:54 -07:00
Jeff King	63e14ee2d6	refs: mark unused each_ref_fn parameters Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:54 -07:00
Elijah Newren	257418c590	revision: allow --ancestry-path to take an argument We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 10:45:08 -07:00
Junio C Hamano	83489a5b20	Merge branch 'ab/plug-revisions-leak' Plug a bit more leaks in the revisions API. * ab/plug-revisions-leak: revisions API: don't leak memory on argv elements that need free()-ing bisect.c: partially fix bisect_rev_setup() memory leak log: refactor "rev.pending" code in cmd_show() log: fix a memory leak in "git show <revision>..." test-fast-rebase helper: use release_revisions() (again) bisect.c: add missing "goto" for release_revisions()	2022-08-12 13:19:08 -07:00
Junio C Hamano	a6aeb2fef9	Merge branch 'jc/resolve-undo' into maint The resolve-undo information in the index was not protected against GC, which has been corrected. source: <xmqq35f7kzad.fsf@gitster.g> * jc/resolve-undo: fsck: do not dereference NULL while checking resolve-undo data revision: mark blobs needed for resolve-undo as reachable	2022-08-10 21:52:32 -07:00
Junio C Hamano	87098a047b	Merge branch 'sa/cat-file-mailmap' "git cat-file" learned an option to use the mailmap when showing commit and tag objects. * sa/cat-file-mailmap: cat-file: add mailmap support ident: rename commit_rewrite_person() to apply_mailmap_to_header() ident: move commit_rewrite_person() to ident.c revision: improve commit_rewrite_person()	2022-08-03 13:36:08 -07:00
Ævar Arnfjörð Bjarmason	f92dbdbc6a	revisions API: don't leak memory on argv elements that need free()-ing Add a "free_removed_argv_elements" member to "struct setup_revision_opt", and use it to fix several memory leaks. We have various memory leaks in APIs that take and munge "const char **argv", e.g. parse_options(). Sometimes these APIs are given the "argv" we get to the "main" function, in which case we don't leak memory, but other times we're giving it the "v" member of a "struct strvec" we created. There's several potential ways to fix those sort of leaks, we could add a "nodup" mode to "struct strvec", which would work for the cases where we push constant strings to it. But that wouldn't work as soon as we used strvec_pushf(), or otherwise needed to duplicate or create a string for that "struct strvec". Let's instead make it the responsibility of the revisions API. If it's going to clobber elements of argv it can also free() them, which it will now do if instructed to do so via "free_removed_argv_elements". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-03 11:12:36 -07:00
Junio C Hamano	418aef9055	Merge branch 'jc/resolve-undo' The resolve-undo information in the index was not protected against GC, which has been corrected. * jc/resolve-undo: fsck: do not dereference NULL while checking resolve-undo data revision: mark blobs needed for resolve-undo as reachable	2022-07-19 16:40:16 -07:00
Siddharth Asthana	66a8a95315	ident: rename commit_rewrite_person() to apply_mailmap_to_header() commit_rewrite_person() takes a commit buffer and replaces the idents in the header with their canonical versions using the mailmap mechanism. The name "commit_rewrite_person()" is misleading as it doesn't convey what kind of rewrite are we going to do to the buffer. It also doesn't clearly mention that the function will limit itself to the header part of the buffer. The new name, "apply_mailmap_to_header()", expresses the functionality of the function pretty clearly. We intend to use apply_mailmap_to_header() in git-cat-file to replace idents in the headers of commit and tag object buffers. So, we will be extending this function to take tag objects buffer as well and replace idents on the tagger header using the mailmap mechanism. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 12:55:53 -07:00
Siddharth Asthana	dc88e349a2	ident: move commit_rewrite_person() to ident.c commit_rewrite_person() and rewrite_ident_line() are static functions defined in revision.c. Their usages are as follows: - commit_rewrite_person() takes a commit buffer and replaces the author and committer idents with their canonical versions using the mailmap mechanism - rewrite_ident_line() takes author/committer header lines from the commit buffer and replaces the idents with their canonical versions using the mailmap mechanism. This patch moves commit_rewrite_person() and rewrite_ident_line() to ident.c which contains many other functions related to idents like split_ident_line(). By moving commit_rewrite_person() to ident.c, we also intend to use it in git-cat-file to replace committer and author idents from the headers to their canonical versions using the mailmap mechanism. The function is moved as is for now to make it clear that there are no other changes, but it will be renamed in a following commit. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 12:55:53 -07:00
Siddharth Asthana	e9c1b0e38c	revision: improve commit_rewrite_person() The function, commit_rewrite_person(), is designed to find and replace an ident string in the header part, and the way it avoids a random occurrence of "author A U Thor <author@example.com" in the text is by insisting "author" to appear at the beginning of line by passing "\nauthor " as "what". The implementation also doesn't make any effort to limit itself to the commit header by locating the blank line that appears after the header part and stopping the search there. Also, the interface forces the caller to make multiple calls if it wants to rewrite idents on multiple headers. It shouldn't be the case. To support the existing caller better, update commit_rewrite_person() to: - Make a single pass in the input buffer to locate headers named "author" and "committer" and replace idents on them. - Stop at the end of the header, ensuring that nothing in the body of the commit object is modified. The return type of the function commit_rewrite_person() has also been changed from int to void. This has been done because the caller of the function doesn't do anything with the return value of the function. By simplifying the interface of the commit_rewrite_person(), we also intend to expose it as a public function. We will also be renaming the function in a future commit to a different name which clearly tells that the function replaces idents in the header of the commit buffer. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 12:55:53 -07:00
Junio C Hamano	5a5ea141e7	revision: mark blobs needed for resolve-undo as reachable The resolve-undo extension was added to the index in `cfc5789a` (resolve-undo: record resolved conflicts in a new index extension section, 2009-12-25). This extension records the blob object names and their modes of conflicted paths when the path gets resolved (e.g. with "git add"), to allow "undoing" the resolution with "checkout -m path". These blob objects should be guarded from garbage-collection while we have the resolve-undo information in the index (otherwise unresolve operation may try to use a blob object that has already been pruned away). But the code called from mark_reachable_objects() for the index forgets to do so. Teach add_index_objects_to_pending() helper to also add objects referred to by the resolve-undo extension. Also make matching changes to "fsck", which has code that is fairly similar to the reachability stuff, but have parallel implementations for all these stuff, which may (or may not) someday want to be unified. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-09 16:45:07 -07:00
Junio C Hamano	2da81d1efb	Merge branch 'ab/plug-leak-in-revisions' Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...	2022-06-07 14:10:56 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Miklos Vajna	96697781e0	log: "--since-as-filter" option is a non-terminating "--since" variant The "--since=<time>" option of "git log" limits the commits displayed by the command by stopping the traversal once it sees a commit whose timestamp is older than the given time and not digging further into its parents. This is OK in a history where a commit always has a newer timestamp than any of its parents'. Once you see a commit older than the given <time>, all ancestor commits of it are even older than the time anyway. It poses, however, a problem when there is a commit with a wrong timestamp that makes it appear older than its parents. Stopping traversal at the "incorrectly old" commit will hide its ancestors that are newer than that wrong commit and are newer than the cut-off time given with the --since option. --max-age and --after being the synonyms to --since, they share the same issue. Add a new "--since-as-filter" option that is a variant of "--since=<time>". Instead of stopping the traversal to hide an old enough commit and its all ancestors, exclude commits with an old timestamp from the output but still keep digging the history. Without other traversal stopping options, this will force the command in "git log" family to dig down the history to the root. It may be an acceptable cost for a small project with short history and many commits with screwy timestamps. It is quite unlikely for us to add traversal stopper other than since, so have this as a --since-as-filter option, rather than a separate --as-filter, that would be probably more confusing. Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-23 09:36:07 -07:00
Ævar Arnfjörð Bjarmason	54c8a7c379	revisions API: add a TODO for diff_free(&revs->diffopt) Add a TODO comment indicating that we should release "diffopt" in release_revisions(). In a preceding commit we started releasing the "pruning" member of the same type, but handling "diffopt" will require us to untangle the "no_free" conditions I added in `e900d494dc` (diff: add an API for deferred freeing, 2021-02-11). Let's leave a TODO comment to that effect, and so that we don't forget refactor code that was changed to use release_revisions() in earlier commits to stop using the "diffopt" member after a call to release_revisions(). This works currently, but would become a logic error as soon as we started freeing "diffopt". Doing that change now doesn't harm anything, and future-proofs us against a later change to release_revisions(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason	ae1b383dfa	revisions API: have release_revisions() release "topo_walk_info" Refactor the existing reset_topo_walk() into a thin wrapper for a release_revisions_topo_walk_info() + resetting the member to "NULL", and call release_revisions_topo_walk_info() from release_revisions(). This fixes memory leaks that have been with us ever since "topo_walk_info" was added to revision.[ch] in `f0d9cc4196` (revision.c: begin refactoring --topo-order logic, 2018-11-01). Due to various other leaks this makes no tests pass in their entirety, but e.g. before this running this on git.git: ./git -P log --pretty=tformat:"%P %H \| %s" --parents --full-history --topo-order -3 -- README.md Would report under SANITIZE=leak: SUMMARY: LeakSanitizer: 531064 byte(s) leaked in 6 allocation(s). Now we'll free all of that memory. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason	9d5a7df332	revisions API: have release_revisions() release "date_mode" Extend the the release_revisions() function so that it frees the "date_mode" in the "struct ref_info". This uses the date_mode_release() function added in `974c919d36` (date API: add and use a date_mode_release(), 2022-02-16). As that commit notes "t7004-tag.sh" tests for the leaks that are being fixed here. That test now fails "only" 44 tests, instead of the 46 it failed before this change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason	6ab75ac839	revisions API: call diff_free(&revs->pruning) in revisions_release() Call diff_free() on the "pruning" member of "struct rev_info". Doing so makes several tests pass under SANITIZE=leak. This was also the last missing piece that allows us to remove the UNLEAK() in "cmd_diff" and "cmd_diff_index", which allows us to use those commands as a canary for general leaks in the revisions API. See [1] for further rationale, and `886e1084d7` (builtin/: add UNLEAKs, 2017-10-01) for the commit that added the UNLEAK() there. 1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason	81ffbf8380	revisions API: release "reflog_info" in release revisions() Add a missing reflog_walk_info_release() to "reflog-walk.c" and use it in release_revisions(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason	ab1f6926e9	revisions API: clear "boundary_commits" in release_revisions() Clear the "boundary_commits" object_array in release_revisions(). This makes a few more tests pass under SANITIZE=leak, including "t/t4126-apply-empty.sh" which started failed as an UNLEAK() in cmd_format_patch() was removed in a preceding commit. This also re-marks the various tests relying on "git format-patch" as passing under "SANITIZE=leak", in the preceding "revisions API users: use release_revisions() in builtin/log.c" commit those were marked as failing as we removed the UNLEAK(rev) from cmd_format_patch() in "builtin/log.c". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	689a8e80dd	revisions API: have release_revisions() release "prune_data" Extend the the release_revisions() function so that it frees the "prune_data" in the "struct rev_info". This means that any code that calls "release_revisions()" already can get rid of adjacent calls to clear_pathspec(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	f41fb662f5	revisions API: have release_revisions() release "grep_filter" Extend the the release_revisions() function so that it frees the "grep_filter" in the "struct rev_info".This allows us to mark a test as passing under "TEST_PASSES_SANITIZE_LEAK=true". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	e75d2f7f73	revisions API: have release_revisions() release "filter" Extend the the release_revisions() function so that it frees the "filter" in the "struct rev_info". This in combination with a preceding change to free "cmdline" means that we can mark another set of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true". The "filter" member was added recently in `ffaa137f64` (revision: put object filter into struct rev_info, 2022-03-09), and this fixes leaks intruded in the subsequent leak `7940941de1` (pack-objects: use rev.filter when possible, 2022-03-09) and `105c6f14ad` (bundle: parse filter capability, 2022-03-09). The "builtin/pack-objects.c" leak in `7940941de1` was effectively with us already, but the variable was referred to by a "static" file-scoped variable. The "bundle.c " leak in `105c6f14ad` was newly introduced with the new "filter" feature for bundles. The "t5600-clone-fail-cleanup.sh" change here to add "TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where run-command.c in not carrying the abort() exit code upwards would have had that test passing before, but now it actually passes[1]. We should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual leaks some other time, but it's an existing edge case, let's just mark the really-passing test as passing for now. 1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	7a98d9ab00	revisions API: have release_revisions() release "cmdline" Extend the the release_revisions() function so that it frees the "cmdline" in the "struct rev_info". This in combination with a preceding change to free "commits" and "mailmap" means that we can whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true". There was a proposal in [1] to do away with xstrdup()-ing this add_rev_cmdline(), perhaps that would be worthwhile, but for now let's just free() it. We could also make that a "char " in "struct rev_cmdline_entry" itself, but since we own it let's expose it as a constant to outside callers. I proposed that in [2] but have since changed my mind. See `14d30cdfc0` (ref-filter: fix memory leak in `free_array_item()`, 2019-07-10), `c514c62a4f` (checkout: fix leak of non-existent branch names, 2020-08-14) and other log history hits for "free((char )" for prior art. This includes the tests we had false-positive passes on before my `6798b08e84` (perl Git.pm: don't ignore signalled failure in _cmd_close(), 2022-02-01), now they pass for real. Since there are 66 tests matching t/t[0-9]git-svn.sh it's easier to list those that don't pass than to touch most of those 66. So let's introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true. This change also marks all the tests that we removed "TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to removing the UNLEAK() from cmd_format_patch(), we can now assert that its API use doesn't leak any "struct rev_info" memory. This change also made commit "t5503-tagfollow.sh" pass on current master, but that would regress when combined with ps/fetch-atomic-fixup's `de004e848a` (t5503: simplify setup of test which exercises failure of backfill, 2022-03-03) (through no fault of that topic, that change started using "git clone" in the test, which has an outstanding leak). Let's leave that test out for now to avoid in-flight semantic conflicts. 1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/ 2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	a52f07afcb	revisions API: have release_revisions() release "mailmap" Extend the the release_revisions() function so that it frees the "mailmap" in the "struct rev_info". The log family of functions now calls the clear_mailmap() function added in fa8afd18e5a (revisions API: provide and use a release_revisions(), 2021-09-19), allowing us to whitelist some tests with "TEST_PASSES_SANITIZE_LEAK=true". Unfortunately having a pointer to a mailmap in "struct rev_info" instead of an embedded member that we "own" get a bit messy, as can be seen in the change to builtin/commit.c. When we free() this data we won't be able to tell apart a pointer to a "mailmap" on the heap from one on the stack. As seen in `ea57bc0d41` (log: add --use-mailmap option, 2013-01-05) the "log" family allocates it on the heap, but in the find_author_by_nickname() code added in `ea16794e43` (commit: search author pattern against mailmap, 2013-08-23) we allocated it on the stack instead. Ideally we'd simply change that member to a "struct string_list mailmap" and never free() the "mailmap" itself, but that would be a much larger change to the revisions API. We have code that needs to hand an existing "mailmap" to a "struct rev_info", while we could change all of that, let's not go there now. The complexity isn't in the ownership of the "mailmap" per-se, but that various things assume a "rev_info.mailmap == NULL" means "doesn't want mailmap", if we changed that to an init'd "struct string_list we'd need to carefully refactor things to change those assumptions. Let's instead always free() it, and simply declare that if you add such a "mailmap" it must be allocated on the heap. Any modern libc will correctly panic if we free() a stack variable, so this should be safe going forward. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	e966fc5a89	revisions API: have release_revisions() release "commits" Extend the the release_revisions() function so that it frees the "commits" in the "struct rev_info". We don't expect to use this "struct rev_info" again, so there's no reason to NULL out revs->commits, as e.g. simplify_merges() and create_boundary_commit_list() do. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason	296a143845	revision.[ch]: document and move code declared around "init" A subsequent commit will add "REV_INFO_INIT" macro adjacent to repo_init_revisions(), unfortunately between the "struct rev_info" itself and that function we've added various miscellaneous code between the two over the years. Let's move that code either lower in revision.h, giving it API docs while we're at it, or in cases where it wasn't public API at all move it into revision.c No lines of code are changed here, only moved around. The only changes are the addition of new API comments. The "tree_difference" variable could also be declared like this, which I think would be a lot clearer, but let's leave that for now to keep this a move-only change: static enum { REV_TREE_SAME, REV_TREE_NEW, /* Only new files / REV_TREE_OLD, / Only files removed / REV_TREE_DIFFERENT, / Mixed changes */ } tree_difference = REV_TREE_SAME; Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason	1878b5edc0	revision.[ch]: provide and start using a release_revisions() The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as the release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c, subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason	bf20fe4ca8	cocci: add and apply free_commit_list() rules Add and apply coccinelle rules to remove "if (E)" before "free_commit_list(E)", the function can accept NULL, and further change cases where "E = NULL" followed to also be unconditionally. The code changes in this commit were entirely made by the coccinelle rule being added here, and applied with: make contrib/coccinelle/free.cocci.patch patch -p1 <contrib/coccinelle/free.cocci.patch The only manual intervention here is that the the relevant code in commit.c has been manually re-indented. Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-13 23:56:08 -07:00
Derrick Stolee	cc91044256	list-objects-filter: remove CL_ARG__FILTER We have established the command-line interface for the --[no-]filter options for a while now, so we do not need a helper to make this editable in the future. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-23 13:13:17 -07:00
Junio C Hamano	7391ecd338	Merge branch 'ds/partial-bundles' Bundle file format gets extended to allow a partial bundle, filtered by similar criteria you would give when making a partial/lazy clone. * ds/partial-bundles: clone: fail gracefully when cloning filtered bundle bundle: unbundle promisor packs bundle: create filtered bundles rev-list: move --filter parsing into revision.c bundle: parse filter capability list-objects: handle NULL function pointers MyFirstObjectWalk: update recommended usage list-objects: consolidate traverse_commit_list[_filtered] pack-bitmap: drop filter in prepare_bitmap_walk() pack-objects: use rev.filter when possible revision: put object filter into struct rev_info list-objects-filter-options: create copy helper index-pack: document and test the --promisor option	2022-03-21 15:14:24 -07:00
Derrick Stolee	c4ea513f4a	rev-list: move --filter parsing into revision.c Now that 'struct rev_info' has a 'filter' member and most consumers of object filtering are using that member instead of an external struct, move the parsing of the '--filter' option out of builtin/rev-list.c and into revision.c. This use within handle_revision_pseudo_opt() allows us to find the option within setup_revisions() if the arguments are passed directly. In the case of a command such as 'git blame', the arguments are first scanned and checked with parse_revision_opt(), which complains about the option, so 'git blame --filter=blob:none <file>' does not become valid with this change. Some commands, such as 'git diff' gain this option without having it make an effect. And 'git diff --objects' was already possible, but does not actually make sense in that builtin. The key addition that is coming is 'git bundle create --filter=<X>' so we can create bundles containing promisor packs. More work is required to make them fully functional, but that will follow. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-09 10:25:27 -08:00
Junio C Hamano	5b84280c65	Merge branch 'ab/grep-patterntype' Some code clean-up in the "git grep" machinery. * ab/grep-patterntype: grep: simplify config parsing and option parsing grep.c: do "if (bool && memchr())" not "if (memchr() && bool)" grep.h: make "grep_opt.pattern_type_option" use its enum grep API: call grep_config() after grep_init() grep.c: don't pass along NULL callback value built-ins: trust the "prefix" from run_builtin() grep tests: add missing "grep.patternType" config tests grep tests: create a helper function for "BRE" or "ERE" log tests: check if grep_config() is called by "log"-like cmds grep.h: remove unused "regex_t regexp" from grep_opt	2022-02-25 15:47:36 -08:00
Junio C Hamano	8813596531	Merge branch 'ah/log-no-graph' "git log --graph --graph" used to leak a graph structure, and there was no way to countermand "--graph" that appear earlier on the command line. A "--no-graph" option has been added and resource leakage has been plugged. * ah/log-no-graph: log: add a --no-graph option log: fix memory leak if --graph is passed multiple times	2022-02-23 16:58:03 -08:00
Ævar Arnfjörð Bjarmason	04bf052eef	grep: simplify config parsing and option parsing Simplify the parsing of "grep.patternType" and "grep.extendedRegexp". This changes no behavior, but gets rid of complex parsing logic that isn't needed anymore. When "grep.patternType" was introduced in `84befcd0a4` (grep: add a grep.patternType configuration setting, 2012-08-03) we promised that: 1. You can set "grep.patternType", and "[setting it to] 'default' will return to the default matching behavior". In that context "the default" meant whatever the configuration system specified before that change, i.e. via grep.extendedRegexp. 2. We'd support the existing "grep.extendedRegexp" option, but ignore it when the new "grep.patternType" option is set. We said we'd only ignore the older "grep.extendedRegexp" option "when the `grep.patternType` option is set to a value other than 'default'". In a preceding commit we changed grep_config() to be called after grep_init(), which means that much of the complexity here can go away. As before both "grep.patternType" and "grep.extendedRegexp" are last-one-wins variable, with "grep.extendedRegexp" yielding to "grep.patternType", except when "grep.patternType=default". Note that as the previously added tests indicate this cannot be done on-the-fly as we see the config variables, without introducing more state keeping. I.e. if we see: -c grep.extendedRegexp=false -c grep.patternType=default -c extendedRegexp=true We need to select ERE, since grep.patternType=default unselects that variable, which normally has higher precedence, but we also need to select BRE in cases of: -c grep.extendedRegexp=true \ -c grep.extendedRegexp=false Which would not be the case for this, which select ERE: -c grep.patternType=extended \ -c grep.extendedRegexp=false Therefore we cannot do this on-the-fly in grep_config without also introducing tracking variables for not only the pattern type, but what the source of that pattern type was. So we need to decide on the pattern after our config was fully parsed. Let's do that by deferring the decision on the pattern type until it's time to compile it in compile_regexp(). By that time we've not only parsed the config, but also handled the command-line options. Those will set "opt.pattern_type_option" (not "opt.extended_regexp_option"!). At that point all we need to do is see if "grep.patternType" was UNSPECIFIED in the end (including an explicit "=default"), if so we'll use the "grep.extendedRegexp" configuration, if any. See my `07a3d41173` (grep: remove regflags from the public grep_opt API, 2017-06-29) for addition of the two comments being removed here, i.e. the complexity noted in that commit is now going away. 1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-15 18:00:50 -08:00
Ævar Arnfjörð Bjarmason	9725c8dda2	built-ins: trust the "prefix" from run_builtin() Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the "prefix" passed from "run_builtin()". The "prefix" we get from setup.c is either going to be NULL or a string of length >0, never "". So we can drop the "prefix && prefix" checks added for "builtin/grep.c" in `0d042fecf2` (git-grep: show pathnames relative to the current directory, 2006-08-11), and for "builtin/ls-tree.c" in `a69dd585fc` (ls-tree: chomp leading directories when run from a subdirectory, 2005-12-23). As seen in code in revision.c that was added in `cd676a5136` (diff --relative: output paths as relative to the current subdirectory, 2008-02-12) we already have existing code that does away with this assertion. This makes it easier to reason about a subsequent change to the "prefix_length" code in grep.c in a subsequent commit, and since we're going to the trouble of doing that let's leave behind an assert() to promise this to any future callers. For "builtin/grep.c" it would be painful to pass the "prefix" down the callchain of: cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid -> grep_source_name So for the code that needs it in grep_source_name() let's add a "grep_prefix" variable similar to the existing "ls_tree_prefix". While at it let's move the code in cmd_ls_tree() around so that we assign to the "ls_tree_prefix" right after declaring the variables, and stop assigning to "prefix". We only subsequently used that variable later in the function after clobbering it. Let's just use our own "grep_prefix" instead. Let's also add an assert() in git.c, so that we'll make this promise about the "prefix" to any current and future callers, as well as to any readers of the code. Code history: The strlen() in "grep.c" hasn't been used since `493b7a08d8` (grep: accept relative paths outside current working directory, 2009-09-05). When that code was added in `0d042fecf2` (git-grep: show pathnames relative to the current directory, 2006-08-11) we used the length. But since `493b7a08d8` we haven't used it for anything except a boolean check that we could have done on the "prefix" member itself. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-15 18:00:50 -08:00
Alex Henrie	087c745833	log: add a --no-graph option It's useful to be able to countermand a previous --graph option, for example if `git log --graph` is run via an alias. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-11 10:06:41 -08:00
Alex Henrie	dccf6c16f1	log: fix memory leak if --graph is passed multiple times Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-11 10:06:40 -08:00
Jerry Zhang	9d505b7b49	git-rev-list: add --exclude-first-parent-only flag It is useful to know when a branch first diverged in history from some integration branch in order to be able to enumerate the user's local changes. However, these local changes can include arbitrary merges, so it is necessary to ignore this merge structure when finding the divergence point. In order to do this, teach the "rev-list" family to accept "--exclude-first-parent-only", which restricts the traversal of excluded commits to only follow first parent links. -A-----E-F-G--main \ / / B-C-D--topic In this example, the goal is to return the set {B, C, D} which represents a topic branch that has been merged into main branch. `git rev-list topic ^main` will end up returning no commits since excluding main will end up traversing the commits on topic as well. `git rev-list --exclude-first-parent-only topic ^main` however will return {B, C, D} as desired. Add docs for the new flag, and clarify the doc for --first-parent to indicate that it applies to traversing the set of included commits only. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-12 11:08:42 -08:00
Junio C Hamano	c17de5a505	Merge branch 'ja/i18n-similar-messages' Similar message templates have been consolidated so that translators need to work on fewer number of messages. * ja/i18n-similar-messages: i18n: turn even more messages into "cannot be used together" ones i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom" i18n: factorize "--foo outside a repository" i18n: refactor "unrecognized %(foo) argument" strings i18n: factorize "no directory given for --foo" i18n: factorize "--foo requires --bar" and the like i18n: tag.c factorize i18n strings i18n: standardize "cannot open" and "cannot read" i18n: turn "options are incompatible" into "cannot be used together" i18n: refactor "%s, %s and %s are mutually exclusive" i18n: refactor "foo and bar are mutually exclusive"	2022-01-10 11:52:56 -08:00
Junio C Hamano	2043ce828e	Merge branch 'rs/log-invert-grep-with-headers' "git log --invert-grep --author=<name>" used to exclude commits written by the given author, but now "--invert-grep" only affects the matches made by the "--grep=<pattern>" option. * rs/log-invert-grep-with-headers: log: let --invert-grep only invert --grep	2022-01-05 14:01:30 -08:00
Jean-Noël Avila	6fa00ee843	i18n: factorize "--foo requires --bar" and the like They are all replaced by "the option '%s' requires '%s'", which is a new string but replaces 17 previous unique strings. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-05 13:31:00 -08:00
Jean-Noël Avila	12909b6b8a	i18n: turn "options are incompatible" into "cannot be used together" Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-05 13:29:23 -08:00
Junio C Hamano	5a4069a1d8	Merge branch 'jc/c99-var-decl-in-for-loop' Weather balloon to find compilers that do not grok variable declaration in the for() loop. * jc/c99-var-decl-in-for-loop: revision: use C99 declaration of variable in for() loop	2021-12-21 15:03:15 -08:00
René Scharfe	794c000267	log: let --invert-grep only invert --grep The option --invert-grep is documented to filter out commits whose messages match the --grep filters. However, it also affects the header matches (--author, --committer), which is not intended. Move the handling of that option to grep.c, as only the code there can distinguish between matches in the header from those in the message body. If --invert-grep is given then enable extended expressions (not the regex type, we just need git grep's --not to work), negate the body patterns and check if any of them match by piggy-backing on the collect_hits mechanism of grep_source_1(). Collecting the matches in struct grep_opt is a bit iffy, but with "last_shown" we have a precedent for writing state information to that struct. Reported-by: Dotan Cohen <dotancohen@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-17 14:13:08 -08:00
Junio C Hamano	44ba10d671	revision: use C99 declaration of variable in for() loop There are certain C99 features that might be nice to use in our code base, but we've hesitated to do so in order to avoid breaking compatibility with older compilers. But we don't actually know if people are even using pre-C99 compilers these days. One way to figure that out is to introduce a very small use of a feature, and see if anybody complains, and we've done so to probe the portability for a few features like "trailing comma in enum declaration", "designated initializer for struct", and "designated initializer for array". A few years ago, we tried to use a handy for (int i = 0; i < n; i++) use(i); to introduce a new variable valid only in the loop, but found that some compilers we cared about didn't like it back then. Two years is a long-enough time, so let's try it again. If this patch can survive a few releases without complaint, then we can feel more confident that variable declaration in for() loop is supported by the compilers our user base use. And if we do get complaints, then we'll have gained some data and we can easily revert this patch. Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-03 10:16:00 -08:00
Junio C Hamano	8996d68ac7	Merge branch 'ps/connectivity-optim' Regression fix. * ps/connectivity-optim: Revert "connected: do not sort input revisions"	2021-11-12 15:29:24 -08:00
Junio C Hamano	a7df4f52af	Revert "connected: do not sort input revisions" This reverts commit `f45022dc2f`, as this is like breakage in the traversal more likely. In a history with 10 single strand of pearls, 1-->2-->3--...->7-->8-->9-->10 asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4" fails to paint the bottom 1 uninteresting as the traversal stops, without completing the propagation of uninteresting bit starting at 4 down through 3 and 2 to 1.	2021-11-11 12:34:41 -08:00
Junio C Hamano	404c4a5462	Merge branch 'ab/designated-initializers' Code clean-up. * ab/designated-initializers: cbtree.h: define cb_init() in terms of CBTREE_INIT .h: move some _INIT to designated initializers .h _INIT macros: don't specify fields equal to 0 .[ch] *_INIT macros: use { 0 } for a "zero out" idiom submodule-config.h: remove unused SUBMODULE_INIT macro	2021-10-11 10:21:48 -07:00
Junio C Hamano	f6c075ad71	Merge branch 'jk/ref-paranoia' The ref iteration code used to optionally allow dangling refs to be shown, which has been tightened up. * jk/ref-paranoia: refs: drop "broken" flag from for_each_fullref_in() ref-filter: drop broken-ref code entirely ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN repack, prune: drop GIT_REF_PARANOIA settings refs: turn on GIT_REF_PARANOIA by default refs: omit dangling symrefs when using GIT_REF_PARANOIA refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag refs-internal.h: reorganize DO_FOR_EACH_* flag documentation refs-internal.h: move DO_FOR_EACH_* flags next to each other t5312: be more assertive about command failure t5312: test non-destructive repack t5312: create bogus ref as necessary t5312: drop "verbose" helper t5600: provide detached HEAD for corruption failures t5516: don't use HEAD ref for invalid ref-deletion tests t7900: clean up some more broken refs	2021-10-11 10:21:47 -07:00
Junio C Hamano	921c795c25	Merge branch 'jt/add-submodule-odb-clean-up' More code paths that use the hack to add submodule's object database to the set of alternate object store have been cleaned up. * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback	2021-10-06 13:40:11 -07:00
Ævar Arnfjörð Bjarmason	9865b6e6a4	.[ch] _INIT macros: use { 0 } for a "zero out" idiom In C it isn't required to specify that all members of a struct are zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will accomplish that. Let's also change code that provided N zero'd fields to just provide one, and change e.g. "{ NULL }" to "{ 0 }" for consistency. I.e. even if the first member is a pointer let's use "0" instead of "NULL". The point of using "0" consistently is to pick one, and to not have the reader wonder why we're not using the same pattern everywhere. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 14:47:59 -07:00
Jeff King	67985e4e4a	refs: drop "broken" flag from for_each_fullref_in() No callers pass in anything but "0" here. Likewise to our sibling functions. Note that some of them ferry along the flag, but none of their callers pass anything but "0" either. Nor is anybody likely to change that. Callers which really want to see all of the raw refs use for_each_rawref(). And anybody interested in iterating a subset of the refs will likely be happy to use the now-default behavior of showing broken refs, but omitting dangling symlinks. So we can get rid of this whole feature. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 12:36:45 -07:00
Jonathan Tan	10a0d6ae64	revision: remove "submodule" from opt struct Clean up a TODO in revision.h by removing the "submodule" field from struct setup_revision_opt. This field is only used to specify the ref store to use, so use rev_info->repo to determine the ref store instead. The only users of this field are merge-ort.c and merge-recursive.c. However, both these files specify the superproject as rev_info->repo and the submodule as setup_revision_opt->submodule. In order to be able to pass the submodule as rev_info->repo, all commits must be parsed with the submodule explicitly specified; this patch does that as well. (An incremental solution in which only some commits are parsed with explicit submodule will not work, because if the same commit is parsed twice in different repositories, there will be 2 heap-allocated object structs corresponding to that commit, and any flag set by the revision walking mechanism on one of them will not be reflected onto the other.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-09 14:09:30 -07:00
Patrick Steinhardt	f559d6d45e	revision: avoid hitting packfiles when commits are in commit-graph When queueing references in git-rev-list(1), we try to optimize parsing of commits via the commit-graph. To do so, we first look up the object's type, and if it is a commit we call `repo_parse_commit()` instead of `parse_object()`. This is quite inefficient though given that we're always uncompressing the object header in order to determine the type. Instead, we can opportunistically search the commit-graph for the object ID: in case it's found, we know it's a commit and can directly fill in the commit object without having to uncompress the object header. Expose a new function `lookup_commit_in_graph()`, which tries to find a commit in the commit-graph by ID, and convert `get_reference()` to use this function. This provides a big performance win in cases where we load references in a repository with lots of references pointing to commits. The following has been executed in a real-world repository with about 2.2 million refs: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.458 s ± 0.044 s [User: 4.115 s, System: 0.342 s] Range (min … max): 4.409 s … 4.534 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 3.089 s ± 0.015 s [User: 2.768 s, System: 0.321 s] Range (min … max): 3.061 s … 3.105 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-09 09:51:12 -07:00
Patrick Steinhardt	bf9c0cbddb	revision: stop retrieving reference twice When queueing up references for the revision walk, `handle_one_ref()` will resolve the reference's object ID via `get_reference()` and then queue the ID as pending object via `add_pending_oid()`. But given that `add_pending_oid()` is only a thin wrapper around `add_pending_object()` which fist calls `get_reference()`, we effectively resolve the reference twice and thus duplicate some of the work. Fix the issue by instead calling `add_pending_object()` directly, which takes the already-resolved object as input. In a repository with lots of refs, this translates into a near 10% speedup: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 5.015 s ± 0.038 s [User: 4.698 s, System: 0.316 s] Range (min … max): 4.970 s … 5.089 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.606 s ± 0.029 s [User: 4.260 s, System: 0.345 s] Range (min … max): 4.565 s … 4.657 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-09 09:51:12 -07:00
Patrick Steinhardt	f45022dc2f	connected: do not sort input revisions In order to compute whether objects reachable from a set of tips are all connected, we do a revision walk with these tips as positive references and `--not --all`. `--not --all` will cause the revision walk to load all preexisting references as uninteresting, which can be very expensive in repositories with many references. Benchmarking the git-rev-list(1) command highlights that by far the most expensive single phase is initial sorting of the input revisions: after all references have been loaded, we first sort commits by author date. In a real-world repository with about 2.2 million references, it makes up about 40% of the total runtime of git-rev-list(1). Ultimately, the connectivity check shouldn't really bother about the order of input revisions at all. We only care whether we can actually walk all objects until we hit the cut-off point. So sorting the input is a complete waste of time. Introduce a new "--unsorted-input" flag to git-rev-list(1) which will cause it to not sort the commits and adjust the connectivity check to always pass the flag. This results in the following speedups, executed in a clone of gitlab-org/gitlab [1]: Benchmark #1: git rev-list --objects --quiet --not --all --not $(cat newrev) Time (mean ± σ): 7.639 s ± 0.065 s [User: 7.304 s, System: 0.335 s] Range (min … max): 7.543 s … 7.742 s 10 runs Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.995 s ± 0.044 s [User: 4.657 s, System: 0.337 s] Range (min … max): 4.909 s … 5.048 s 10 runs Summary 'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran 1.53 ± 0.02 times faster than 'git rev-list --objects --quiet --not --all --not $newrev' [1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs are visible to clients. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-09 09:51:12 -07:00
Patrick Steinhardt	29ef1f27fe	revision: separate walk and unsorted flags The `--no-walk` flag supports two modes: either it sorts the revisions given as input input or it doesn't. This is reflected in a single `no_walk` flag, which reflects one of the three states "walk", "don't walk but without sorting" and "don't walk but with sorting". Split up the flag into two separate bits, one indicating whether we should walk or not and one indicating whether the input should be sorted or not. This will allow us to more easily introduce a new flag `--unsorted-input`, which only impacts the sorting bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-05 09:37:28 -07:00
Jeff King	1d72b604ef	add_pending_object_with_path(): work around "gcc -O3" complaint When compiling with -O3, some gcc versions (10.2.1 here) complain about an out-of-bounds subscript: revision.c: In function ‘do_add_index_objects_to_pending’: revision.c:321:22: error: array subscript [1, 2147483647] is outside array bounds of ‘char[1]’ [-Werror=array-bounds] 321 \| if (0 < len && name[len] && buf.len) \| ~~~~^~~~~ The "len" parameter here comes from calling interpret_branch_name(), which intends to return the number of characters of "name" it parsed. But the compiler doesn't realize this. It knows the size of the empty string "name" passed in from do_add_index_objects_to_pending(), but it has no clue that the "len" we get back will be constrained to "0" in that case. And I don't think the warning is telling us about some subtle or clever bug. The implementation of interpret_branch_name() is in another file entirely, and the compiler can't see it (you can even verify there is no clever LTO going on by replacing it with "return 0" and still getting the warning). We can work around this by replacing our "did we hit the trailing NUL" subscript dereference with a length check. We do not even have to pay the cost for an extra strlen(), as we can pass our new length into interpret_branch_name(), which was converting our "0" into a call to strlen() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-11 12:45:37 +09:00
Junio C Hamano	936e58851a	Merge branch 'ah/plugleaks' Plug various leans reported by LSAN. * ah/plugleaks: builtin/rm: avoid leaking pathspec and seen builtin/rebase: release git_format_patch_opt too builtin/for-each-ref: free filter and UNLEAK sorting. mailinfo: also free strbuf lists when clearing mailinfo builtin/checkout: clear pending objects after diffing builtin/check-ignore: clear_pathspec before returning builtin/bugreport: don't leak prefixed filename branch: FREE_AND_NULL instead of NULL'ing real_ref bloom: clear each bloom_key after use ls-files: free max_prefix when done wt-status: fix multiple small leaks revision: free remainder of old commit list in limit_list	2021-05-07 12:47:41 +09:00
Junio C Hamano	8585d6c04a	Merge branch 'ps/rev-list-object-type-filter' "git rev-list" learns the "--filter=object:type=<type>" option, which can be used to exclude objects of the given kind from the packfile generated by pack-objects. * ps/rev-list-object-type-filter: rev-list: allow filtering of provided items pack-bitmap: implement combined filter pack-bitmap: implement object type filter list-objects: implement object type filter list-objects: support filtering by tag and commit list-objects: move tag processing into its own function revision: mark commit parents as NOT_USER_GIVEN uploadpack.txt: document implication of `uploadpackfilter.allow`	2021-05-07 12:47:41 +09:00
Junio C Hamano	8e97852919	Merge branch 'ds/sparse-index-protections' Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...	2021-04-30 13:50:26 +09:00
Andrzej Hunt	db69bf608d	revision: free remainder of old commit list in limit_list limit_list() iterates over the original revs->commits list, and consumes many of its entries via pop_commit. However we might stop iterating over the list early (e.g. if we realise that the rest of the list is uninteresting). If we do stop iterating early, list will be pointing to the unconsumed portion of revs->commits - and we need to free this list to avoid a leak. (revs->commits itself will be an invalid pointer: it will have been free'd during the first pop_commit.) However the list pointer is later reused to iterate over our new list, but only for the limiting_can_increase_treesame() branch. We therefore need to introduce a new variable for that branch - and while we're here we can rename the original list to original_list as that makes its purpose more obvious. This leak was found while running t0090. It's not likely to be very impactful, but it can happen quite early during some checkout invocations, and hence seems to be worth fixing: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 #1 0x9ac084 in do_xmalloc wrapper.c:41:8 #2 0x9ac05a in xmalloc wrapper.c:62:9 #3 0x7175d6 in commit_list_insert commit.c:540:33 #4 0x71800f in commit_list_insert_by_date commit.c:604:9 #5 0x8f8d2e in process_parents revision.c:1128:5 #6 0x8f2f2c in limit_list revision.c:1418:7 #7 0x8f210e in prepare_revision_walk revision.c:3577:7 #8 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6 #9 0x512f05 in switch_branches builtin/checkout.c:1250:3 #10 0x50f8de in checkout_branch builtin/checkout.c:1646:9 #11 0x50ba12 in checkout_main builtin/checkout.c:2003:9 #12 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8 #13 0x4cd91d in run_builtin git.c:467:11 #14 0x4cb5f3 in handle_builtin git.c:719:3 #15 0x4ccf47 in run_argv git.c:808:4 #16 0x4caf49 in cmd_main git.c:939:19 #17 0x69dc0e in main common-main.c:52:11 #18 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349) Indirect leak of 48 byte(s) in 3 object(s) allocated from: #0 0x49a85d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 #1 0x9ac084 in do_xmalloc wrapper.c:41:8 #2 0x9ac05a in xmalloc wrapper.c:62:9 #3 0x717de6 in commit_list_append commit.c:1609:35 #4 0x8f1f9b in prepare_revision_walk revision.c:3554:12 #5 0x514170 in orphaned_commit_warning builtin/checkout.c:1185:6 #6 0x512f05 in switch_branches builtin/checkout.c:1250:3 #7 0x50f8de in checkout_branch builtin/checkout.c:1646:9 #8 0x50ba12 in checkout_main builtin/checkout.c:2003:9 #9 0x5086c0 in cmd_checkout builtin/checkout.c:2055:8 #10 0x4cd91d in run_builtin git.c:467:11 #11 0x4cb5f3 in handle_builtin git.c:719:3 #12 0x4ccf47 in run_argv git.c:808:4 #13 0x4caf49 in cmd_main git.c:939:19 #14 0x69dc0e in main common-main.c:52:11 #15 0x7faaabd0e349 in __libc_start_main (/lib64/libc.so.6+0x24349) Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-28 09:25:44 +09:00
Derrick Stolee	f5fed74fb2	revision: ensure full index Before iterating over all index entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. This case could be integrated later by ensuring that we walk the tree in the sparse-directory entry, but the current behavior is only expecting blobs. Save this integration for later when it can be properly tested. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:47:48 -07:00
Jeff King	c1fa951d7e	revision: avoid parsing with --exclude-promisor-objects When --exclude-promisor-objects is given, before traversing any objects we iterate over all of the objects in any promisor packs, marking them as UNINTERESTING and SEEN. We turn the oid we get from iterating the pack into an object with parse_object(), but this has two problems: - it's slow; we are zlib inflating (and reconstructing from deltas) every byte of every object in the packfile - it leaves the tree buffers attached to their structs, which means our heap usage will grow to store every uncompressed tree simultaneously. This can be gigabytes. We can obviously fix the second by freeing the tree buffers after we've parsed them. But we can observe that the function doesn't look at the object contents at all! The only reason we call parse_object() is that we need a "struct object" on which to set the flags. There are two options here: - we can look up just the object type via oid_object_info(), and then call the appropriate lookup_foo() function - we can call lookup_unknown_object(), which gives us an OBJ_NONE struct (which will get auto-converted later by object_as_type() via calls to lookup_commit(), etc). The first one is closer to the current code, but we do pay the price to look up the type for each object. The latter should be more efficient in CPU, though it wastes a little bit of memory (the "unknown" object structs are a union of all object types, so some of the structs are bigger than they need to be). It also runs the risk of triggering a latent bug in code that calls lookup_object() directly but isn't ready to handle OBJ_NONE (such code would already be buggy, but we use lookup_unknown_object() infrequently enough that it might be hiding). I went with the second option here. I don't think the risk is high (and we'd want to find and fix any such bugs anyway), and it should be more efficient overall. The new tests in p5600 show off the improvement (this is on git.git): Test HEAD^ HEAD ------------------------------------------------------------------------------- 5600.5: count commits 0.37(0.37+0.00) 0.38(0.38+0.00) +2.7% 5600.6: count non-promisor commits 11.74(11.37+0.37) 0.04(0.03+0.00) -99.7% The improvement is particularly big in this script because _every_ object in the newly-cloned partial repo is a promisor object. So after marking them all, there's nothing left to traverse. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-13 13:22:37 -07:00
Patrick Steinhardt	b2025da38b	revision: mark commit parents as NOT_USER_GIVEN The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly provided by the user or not. The most important use case for this is when filtering objects: only objects that were not explicitly requested will get filtered. The flag is currently only set for blobs and trees, which has been fine given that there are no filters for tags or commits currently. We're about to extend filtering capabilities to add object type filter though, which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's not set, the object wouldn't get filtered at all. Mark unseen commit parents as NOT_USER_GIVEN when processing parents. Like this, explicitly provided parents stay user-given and thus unfiltered, while parents which get loaded as part of the graph walk can be filtered. This commit shouldn't have any user-visible impact yet as there is no logic to filter commits yet. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-10 23:03:20 -07:00
Junio C Hamano	2744383cbd	Merge branch 'tb/geometric-repack' "git repack" so far has been only capable of repacking everything under the sun into a single pack (or split by size). A cleverer strategy to reduce the cost of repacking a repository has been introduced. * tb/geometric-repack: builtin/pack-objects.c: ignore missing links with --stdin-packs builtin/repack.c: reword comment around pack-objects flags builtin/repack.c: be more conservative with unsigned overflows builtin/repack.c: assign pack split later t7703: test --geometric repack with loose objects builtin/repack.c: do not repack single packs with --geometric builtin/repack.c: add '--geometric' option packfile: add kept-pack cache for find_kept_pack_entry() builtin/pack-objects.c: rewrite honor-pack-keep logic p5303: measure time to repack with keep p5303: add missing &&-chains builtin/pack-objects.c: add '--stdin-packs' option revision: learn '--no-kept-objects' packfile: introduce 'find_kept_pack_entry()'	2021-03-24 14:36:27 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Taylor Blau	c9fff00016	revision: learn '--no-kept-objects' A future caller will want to be able to perform a reachability traversal which terminates when visiting an object found in a kept pack. The closest existing option is '--honor-pack-keep', but this isn't quite what we want. Instead of halting the traversal midway through, a full traversal is always performed, and the results are only trimmed afterwords. Besides needing to introduce a new flag (since culling results post-facto can be different than halting the traversal as it's happening), there is an additional wrinkle handling the distinction in-core and on-disk kept packs. That is: what kinds of kept pack should stop the traversal? Introduce '--no-kept-objects[=<on-disk\|in-core>]' to specify which kinds of kept packs, if any, should stop a traversal. This can be useful for callers that want to perform a reachability analysis, but want to leave certain packs alone (for e.g., when doing a geometric repack that has some "large" packs which are kept in-core that it wants to leave alone). Note that this option is not guaranteed to produce exactly the set of objects that aren't in kept packs, since it's possible the traversal order may end up in a situation where a non-kept ancestor was "cut off" by a kept object (at which point we would stop traversing). But, we don't care about absolute correctness here, since this will eventually be used as a purely additive guide in an upcoming new repack mode. Explicitly avoid documenting this new flag, since it is only used internally. In theory we could avoid even adding it rev-list, but being able to spell this option out on the command-line makes some special cases easier to test without promising to keep it behaving consistently forever. Those tricky cases are exercised in t6114. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 23:30:52 -08:00
Junio C Hamano	8b4701ae4f	Merge branch 'ak/corrected-commit-date' The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters	2021-02-17 17:21:40 -08:00
Junio C Hamano	c9f94ab4fa	Merge branch 'ab/lose-grep-debug' Lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths. * ab/lose-grep-debug: grep/log: remove hidden --debug and --grep-debug options	2021-02-10 14:48:31 -08:00
Junio C Hamano	aac006aa99	Merge branch 'so/log-diff-merge' "git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...	2021-02-05 16:40:44 -08:00
Ævar Arnfjörð Bjarmason	15c9649730	grep/log: remove hidden --debug and --grep-debug options Remove the hidden "grep --debug" and "log --grep-debug" options added in `17bf35a3c7` (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in `e01b4dab01` (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is `c581e4a749` (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier `b65abcafc7` (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug $ -e foo --and -e bar $ --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 11:36:20 -08:00
Junio C Hamano	b69bed22c5	Merge branch 'jk/log-cherry-pick-duplicate-patches' When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries	2021-01-25 14:19:19 -08:00
Abhishek Kumar	d7f92784c6	commit-graph: return 64-bit generation number In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	2f9bbb6d91	revision: parse parent in indegree_walk_step() In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Jeff King	c9e3a4e76d	patch-ids: handle duplicate hashmap entries This fixes a bug introduced in `dfb7a1b4d0` (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to `dfb7a1b4d0`, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In `683f17ec44` (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit `b3dfeebb92` (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 11:13:32 -08:00
Derrick Stolee	90b666da60	revision: trace topo-walk statistics We trace statistics about the effectiveness of changed-path Bloom filters since `42e50e78` (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Add similar tracing for the topo-walk algorithm that uses generation numbers to limit the walk size. This information can help investigate and describe benefits to heuristics and other changes. The information that is printed is in JSON format and can be formatted nicely to present as follows: { "count_explort_walked":2603, "count_indegree_walked":2603, "count_topo_walked":473 } Each of these values count the number of commits are visited by each of the three "stages" of the topo-walk as detailed in `b4542418` (revision.c: generation-based topo-order algorithm, 2018-11-01). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:18:22 -08:00
Sergey Organov	a6e66af923	diff-merges: get rid of now empty diff_merges_init_revs() After getting rid of 'ignore_merges' field, the diff_merges_init_revs() function became empty. Get rid of it. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	18f09473bf	diff-merges: rename all functions to have common prefix Use the same "diff_merges" prefix for all the diff merges function names. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00

1 2 3 4 5 ...

989 Commits