git/patch-ids.h

#ifndef PATCH_IDS_H
#define PATCH_IDS_H

#include "diff.h"
#include "hashmap.h"

struct commit;
struct object_id;
struct repository;

struct patch_id {
	struct hashmap_entry ent;
	struct object_id patch_id;
	struct commit *commit;
};

struct patch_ids {
	struct hashmap patches;
	struct diff_options diffopts;
};

int commit_patch_id(struct commit *commit, struct diff_options *options,
		    struct object_id *oid, int);
int init_patch_ids(struct repository *, struct patch_ids *);
int free_patch_ids(struct patch_ids *);

/* Add a patch_id for a single commit to the set. */
struct patch_id *add_commit_patch_id(struct commit *, struct patch_ids *);

/* Returns true if the patch-id of "commit" is present in the set. */
int has_commit_patch_id(struct commit *commit, struct patch_ids *);

/*
 * Iterate over all commits in the set whose patch id matches that of
 * "commit", like:
 *
 *   struct patch_id *cur;
 *   for (cur = patch_id_iter_first(commit, ids);
 *        cur;
 *        cur = patch_id_iter_next(cur, ids) {
 *           ... look at cur->commit
 *   }
 */
struct patch_id *patch_id_iter_first(struct commit *commit, struct patch_ids *);
struct patch_id *patch_id_iter_next(struct patch_id *cur, struct patch_ids *);

#endif /* PATCH_IDS_H */
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`#ifndef PATCH_IDS_H`
			`#define PATCH_IDS_H`

Add missing includes and forward declarations I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-16 01:54:05 +08:00			`#include "diff.h"`
			`#include "hashmap.h"`

			`struct commit;`
			`struct object_id;`
patch-ids.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-21 23:57:30 +08:00			`struct repository;`
Add missing includes and forward declarations I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-08-16 01:54:05 +08:00
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`struct patch_id {`
patch-ids: stop using a hand-rolled hashmap implementation This change will use the hashmap from the hashmap.h to keep track of the patch_ids that have been encountered instead of using an internal implementation. This simplifies the implementation of the patch ids. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-07-30 00:19:17 +08:00			`struct hashmap_entry ent;`
patch-ids: convert to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2017-05-31 01:30:53 +08:00			`struct object_id patch_id;`
patch-ids: replace the seen indicator with a commit pointer The cherry_pick_list was looping through the original side checking the seen indicator and setting the cherry_flag on the commit. If we save off the commit in the patch_id we can set the cherry_flag on the correct commit when running through the other side when a patch_id match is found. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-07-30 00:19:18 +08:00			`struct commit *commit;`
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`};`

			`struct patch_ids {`
patch-ids: stop using a hand-rolled hashmap implementation This change will use the hashmap from the hashmap.h to keep track of the patch_ids that have been encountered instead of using an internal implementation. This simplifies the implementation of the patch ids. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-07-30 00:19:17 +08:00			`struct hashmap patches;`
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`struct diff_options diffopts;`
			`};`

patch-ids: make commit_patch_id() a public helper function Make commit_patch_id() available to other builtins. Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-26 15:51:21 +08:00			`int commit_patch_id(struct commit commit, struct diff_options options,`
patch-id: use stable patch-id for rebases Git doesn't persist patch-ids during the rebase process, so there is no need to specifically invoke the unstable variant. Use the stable logic for all internal patch-id calculations to minimize the number of code paths and improve test coverage. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2022-10-25 04:07:40 +08:00			`struct object_id *oid, int);`
patch-ids.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2018-09-21 23:57:30 +08:00			`int init_patch_ids(struct repository , struct patch_ids );`
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`int free_patch_ids(struct patch_ids *);`
patch-ids: handle duplicate hashmap entries This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-01-12 23:52:32 +08:00
			`/* Add a patch_id for a single commit to the set. */`
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00			`struct patch_id add_commit_patch_id(struct commit , struct patch_ids *);`
patch-ids: handle duplicate hashmap entries This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-01-12 23:52:32 +08:00
			`/* Returns true if the patch-id of "commit" is present in the set. */`
			`int has_commit_patch_id(struct commit commit, struct patch_ids );`

			`/*`
			`* Iterate over all commits in the set whose patch id matches that of`
			`* "commit", like:`
			`*`
			`* struct patch_id *cur;`
			`* for (cur = patch_id_iter_first(commit, ids);`
			`* cur;`
			`* cur = patch_id_iter_next(cur, ids) {`
			`* ... look at cur->commit`
			`* }`
			`*/`
			`struct patch_id patch_id_iter_first(struct commit commit, struct patch_ids *);`
			`struct patch_id patch_id_iter_next(struct patch_id cur, struct patch_ids *);`
Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-10 08:01:27 +08:00
			`#endif /* PATCH_IDS_H */`