global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from gaining new dependencies on it in future
changes, be they explicit or implicit.
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "builtin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:50:23 +08:00
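The opt-in mechanism described in this commit message can be sketched in a single file. This is a minimal illustration, not the project's actual header layout: `toy_repo`, `repo_gitdir()` and `legacy_gitdir()` are hypothetical names, and the guarded block stands in for whatever shared header would declare the global.

```c
#include <assert.h>
#include <stddef.h>

/* This code unit opts in. A unit that omits this line cannot name the global. */
#define USE_THE_REPOSITORY_VARIABLE

struct repository {
	const char *gitdir;
};

/*
 * Stand-in for a shared header (assumption): the global only exists
 * for code units that defined the macro before this point.
 */
#ifdef USE_THE_REPOSITORY_VARIABLE
static struct repository toy_repo = { ".git" };
static struct repository *the_repository = &toy_repo;
#endif

/* Preferred style: the repository is passed down as a parameter. */
const char *repo_gitdir(const struct repository *r)
{
	assert(r != NULL);
	return r->gitdir;
}

/*
 * Legacy style: relies on the global. Removing the #define above turns
 * this function into a compile error, which is the point of the guard.
 */
const char *legacy_gitdir(void)
{
	return repo_gitdir(the_repository);
}
```

With this layout, a reviewer can check whether a unit depends on the global by looking for the `#define` rather than reading every call.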
|
|
|
#define USE_THE_REPOSITORY_VARIABLE
|
|
|
|
|
2023-02-24 08:09:24 +08:00
|
|
|
#include "git-compat-util.h"
|
2018-07-21 00:33:02 +08:00
|
|
|
#include "commit.h"
|
2018-07-21 00:33:08 +08:00
|
|
|
#include "commit-graph.h"
|
2018-07-21 00:33:06 +08:00
|
|
|
#include "decorate.h"
|
2023-02-24 08:09:27 +08:00
|
|
|
#include "hex.h"
|
2018-07-21 00:33:06 +08:00
|
|
|
#include "prio-queue.h"
|
2018-08-29 05:36:57 +08:00
|
|
|
#include "ref-filter.h"
|
2018-07-21 00:33:06 +08:00
|
|
|
#include "revision.h"
|
|
|
|
#include "tag.h"
|
2018-07-21 00:33:02 +08:00
|
|
|
#include "commit-reach.h"
|
commit-reach: implement ahead_behind() logic
Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.
The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.
This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.
This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.
It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.
Now, let's discuss the ahead/behind counting algorithm.
The commits in the first array are considered the starting commits. The
index within that array will play a critical role.
We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.
At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."
We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.
As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.
The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count. This increment is the reason we _absolutely need_ to
walk commits at most once.
The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 19:26:53 +08:00
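The bit-pushing walk described in this commit message can be sketched on a toy history. The sketch makes several assumptions that are not in the source: commits are plain array indices ordered so that every parent has a smaller index than its children (standing in for the priority queue ordered by generation number), each "bitmap" is a single `unsigned`, and the STALE early-termination is left out. All `toy_*` names are hypothetical.

```c
#include <assert.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)
#define MAX_COMMITS 64

struct toy_commit {
	int parents[MAX_PARENTS]; /* parent indices, or NO_PARENT */
};

struct toy_count {
	int base_index, tip_index; /* indices into starts[] */
	int ahead, behind;
};

/*
 * commits[] must be ordered so that parents come before children.
 * Visiting indices from high to low then never visits a child after
 * its parent, mimicking "pop the maximum generation first".
 */
void toy_ahead_behind(const struct toy_commit *commits, int nr_commits,
		      const int *starts, int nr_starts,
		      struct toy_count *counts, int nr_counts)
{
	unsigned bits[MAX_COMMITS] = { 0 };
	int i, j, c;

	assert(nr_commits <= MAX_COMMITS && nr_starts <= 32);

	/* The ith starting commit can reach itself. */
	for (i = 0; i < nr_starts; i++)
		bits[starts[i]] |= 1u << i;

	for (c = nr_commits - 1; c >= 0; c--) {
		if (!bits[c])
			continue; /* not reachable from any starting commit */

		/* First step: count the commit if exactly one side reaches it. */
		for (j = 0; j < nr_counts; j++) {
			int from_base = !!(bits[c] & (1u << counts[j].base_index));
			int from_tip = !!(bits[c] & (1u << counts[j].tip_index));

			if (from_tip && !from_base)
				counts[j].ahead++;
			else if (from_base && !from_tip)
				counts[j].behind++;
		}

		/* Second step: push this commit's bits to its parents. */
		for (j = 0; j < MAX_PARENTS; j++) {
			int p = commits[c].parents[j];
			if (p != NO_PARENT)
				bits[p] |= bits[c];
		}
	}
}
```

Because each commit is visited exactly once and only after all of its children, its bitmask is complete when it is counted, which is the property the message relies on.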
|
|
|
#include "ewah/ewok.h"
|
2018-07-21 00:33:02 +08:00
|
|
|
|
|
|
|
/* Remember to update object flag allocation in object.h */
|
|
|
|
#define PARENT1 (1u<<16)
|
|
|
|
#define PARENT2 (1u<<17)
|
|
|
|
#define STALE (1u<<18)
|
|
|
|
#define RESULT (1u<<19)
|
|
|
|
|
|
|
|
static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
|
|
|
|
|
2021-02-19 20:34:08 +08:00
|
|
|
static int compare_commits_by_gen(const void *_a, const void *_b)
|
|
|
|
{
|
|
|
|
const struct commit *a = *(const struct commit * const *)_a;
|
|
|
|
const struct commit *b = *(const struct commit * const *)_b;
|
|
|
|
|
|
|
|
timestamp_t generation_a = commit_graph_generation(a);
|
|
|
|
timestamp_t generation_b = commit_graph_generation(b);
|
|
|
|
|
|
|
|
if (generation_a < generation_b)
|
|
|
|
return -1;
|
|
|
|
if (generation_a > generation_b)
|
|
|
|
return 1;
|
2021-02-19 20:34:09 +08:00
|
|
|
if (a->date < b->date)
|
|
|
|
return -1;
|
|
|
|
if (a->date > b->date)
|
|
|
|
return 1;
|
2021-02-19 20:34:08 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-21 00:33:02 +08:00
|
|
|
static int queue_has_nonstale(struct prio_queue *queue)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < queue->nr; i++) {
|
|
|
|
struct commit *commit = queue->array[i].data;
|
|
|
|
if (!(commit->object.flags & STALE))
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* all input commits in one and twos[] must have been parsed! */
|
2024-02-28 17:44:11 +08:00
|
|
|
static int paint_down_to_common(struct repository *r,
|
|
|
|
struct commit *one, int n,
|
|
|
|
struct commit **twos,
|
|
|
|
timestamp_t min_generation,
|
|
|
|
int ignore_missing_commits,
|
|
|
|
struct commit_list **result)
|
2018-07-21 00:33:02 +08:00
|
|
|
{
|
|
|
|
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
|
|
|
|
int i;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
|
2018-07-21 00:33:02 +08:00
|
|
|
|
commit-reach: use corrected commit dates in paint_down_to_common()
091f4cf (commit: don't use generation numbers if not needed,
2018-08-30) changed paint_down_to_common() to use commit dates instead
of generation numbers v1 (topological levels) as the performance
regressed on certain topologies. With generation number v2 (corrected
commit dates) implemented, we no longer have to rely on commit dates and
can use generation numbers.
For example, the command `git merge-base v4.8 v4.9` on the Linux
repository walks 167468 commits, taking 0.135s for committer date and
167496 commits, taking 0.157s for corrected committer date respectively.
While Git walks nearly the same number of commits with corrected commit
dates as with commit dates, the process is slower because each comparison
has to access a commit-slab (for corrected committer date) instead of
a struct member (for committer date).
This change incidentally broke the fragile t6404-recursive-merge test.
t6404-recursive-merge sets up a unique repository where all commits have
the same committer date without a well-defined merge-base.
While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.
With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
use the corrected committer date, which changes the order in which
commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.
While resulting repositories are equivalent, 6404.4 'virtual trees were
processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
different merge-bases and thus have different object ids for the
intermediate commits.
As this has already caused problems (as noted in 859fdc0 (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
within t6404-recursive-merge.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-17 02:11:17 +08:00
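The comparator choice discussed in this commit message can be sketched as follows. This is an illustration only: `toy_commit` keeps the generation as a plain struct member (the message notes that the real lookup goes through a commit-slab), the comparators use `qsort()` conventions rather than the priority queue's, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_commit {
	unsigned long generation; /* stand-in for a corrected commit date */
	unsigned long date;       /* committer date */
};

typedef int (*toy_compare_fn)(const void *, const void *);

/* Newest committer date first. */
int toy_by_date(const void *a_, const void *b_)
{
	const struct toy_commit *a = a_, *b = b_;

	if (a->date != b->date)
		return a->date > b->date ? -1 : 1;
	return 0;
}

/* Highest generation first, committer date as the tie-breaker. */
int toy_by_gen_then_date(const void *a_, const void *b_)
{
	const struct toy_commit *a = a_, *b = b_;

	if (a->generation != b->generation)
		return a->generation > b->generation ? -1 : 1;
	return toy_by_date(a_, b_);
}

/*
 * Fall back to plain dates only when the caller set no generation
 * cutoff and no corrected commit dates are available.
 */
toy_compare_fn toy_pick_compare(unsigned long min_generation,
				int corrected_dates_enabled)
{
	assert(corrected_dates_enabled == 0 || corrected_dates_enabled == 1);
	if (!min_generation && !corrected_dates_enabled)
		return toy_by_date;
	return toy_by_gen_then_date;
}
```

The two orderings can disagree when dates are skewed relative to the history, which is why switching comparators can change the order in which merge bases are found.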
|
|
|
if (!min_generation && !corrected_commit_dates_enabled(r))
|
2018-09-18 04:53:52 +08:00
|
|
|
queue.compare = compare_commits_by_commit_date;
|
|
|
|
|
2018-07-21 00:33:02 +08:00
|
|
|
one->object.flags |= PARENT1;
|
|
|
|
if (!n) {
|
2024-02-28 17:44:11 +08:00
|
|
|
commit_list_append(one, result);
|
|
|
|
return 0;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
prio_queue_put(&queue, one);
|
|
|
|
|
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
twos[i]->object.flags |= PARENT2;
|
|
|
|
prio_queue_put(&queue, twos[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
while (queue_has_nonstale(&queue)) {
|
|
|
|
struct commit *commit = prio_queue_get(&queue);
|
|
|
|
struct commit_list *parents;
|
|
|
|
int flags;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t generation = commit_graph_generation(commit);
|
2018-07-21 00:33:02 +08:00
|
|
|
|
2020-06-17 17:14:11 +08:00
|
|
|
if (min_generation && generation > last_gen)
|
2021-01-17 02:11:13 +08:00
|
|
|
BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
|
2020-06-17 17:14:11 +08:00
|
|
|
generation, last_gen,
|
2018-07-21 00:33:02 +08:00
|
|
|
oid_to_hex(&commit->object.oid));
|
2020-06-17 17:14:11 +08:00
|
|
|
last_gen = generation;
|
2018-07-21 00:33:02 +08:00
|
|
|
|
2020-06-17 17:14:11 +08:00
|
|
|
if (generation < min_generation)
|
2018-07-21 00:33:02 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
|
|
|
|
if (flags == (PARENT1 | PARENT2)) {
|
|
|
|
if (!(commit->object.flags & RESULT)) {
|
|
|
|
commit->object.flags |= RESULT;
|
2024-02-28 17:44:11 +08:00
|
|
|
commit_list_insert_by_date(commit, result);
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
/* Mark parents of a found merge stale */
|
|
|
|
flags |= STALE;
|
|
|
|
}
|
|
|
|
parents = commit->parents;
|
|
|
|
while (parents) {
|
|
|
|
struct commit *p = parents->item;
|
|
|
|
parents = parents->next;
|
|
|
|
if ((p->object.flags & flags) == flags)
|
|
|
|
continue;
|
2024-02-28 17:44:07 +08:00
|
|
|
if (repo_parse_commit(r, p)) {
|
|
|
|
clear_prio_queue(&queue);
|
2024-02-28 17:44:11 +08:00
|
|
|
free_commit_list(*result);
|
|
|
|
*result = NULL;
|
commit-reach(paint_down_to_common): prepare for handling shallow commits
When `git fetch --update-shallow` needs to test for commit ancestry, it
can naturally run into a missing object (e.g. if it is a parent of a
shallow commit). For the purpose of `--update-shallow`, this needs to be
treated as if the child commit did not even have that parent, i.e. the
commit history needs to be clamped.
For all other scenarios, clamping the commit history is actually a bug,
as it would hide repository corruption (for an analysis regarding
shallow and partial clones, see the analysis further down).
Add a flag to optionally ask the function to ignore missing commits, as
`--update-shallow` needs it to, while detecting missing objects as a
repository corruption error by default.
This flag is needed, and cannot be replaced by `is_repository_shallow()`
to indicate that situation, because that function would return 0 in the
`--update-shallow` scenario: There is not actually a `shallow` file in
that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
with receive.updateshallow on") and t5538.4 ("add new shallow root with
receive.updateshallow on").
Note: shallow commits' parents are set to `NULL` internally already,
therefore there is no need to special-case shallow repositories here, as
the merge-base logic will not try to access parent commits of shallow
commits.
Likewise, partial clones aren't an issue either: If a commit is missing
during the revision walk in the merge-base logic, it is fetched via
`promisor_remote_get_direct()`. And not only the single missing commit
object: Due to the way the "promised" objects are fetched (in
`fetch_objects()` in `promisor-remote.c`, using `fetch
--filter=blob:none`), there is no actual way to fetch a single commit
object, as the remote side will pass that commit OID to `pack-objects
--revs [...]` which in turn passes it to `rev-list` which interprets
this as a commit _range_ instead of a single object. Therefore, in
partial clones (unless they are shallow in addition), all commits
reachable from a commit that is in the local object database are also
present in that local database.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 17:44:10 +08:00
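The two behaviours this commit message distinguishes, clamping the history at a missing commit versus reporting an error, can be sketched on a toy first-parent chain. The sketch shows only the semantics described in the message, not the control flow of the real function. `toy_commit`, its `present` field and `toy_walk()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct toy_commit {
	int present; /* 0 if the object is missing from the object database */
	int parent;  /* index of the first parent, or -1 for a root */
};

/*
 * Walk the first-parent chain from `start`. Returns the number of
 * commits visited, or -1 if a missing commit is hit while
 * `ignore_missing_commits` is unset.
 */
int toy_walk(const struct toy_commit *commits, int start,
	     int ignore_missing_commits)
{
	int visited = 0, c;

	assert(commits != NULL);

	for (c = start; c != -1; c = commits[c].parent) {
		if (!commits[c].present) {
			if (ignore_missing_commits)
				break; /* clamp: as if the child had no such parent */
			fprintf(stderr, "error: could not parse commit %d\n", c);
			return -1;
		}
		visited++;
	}
	return visited;
}
```

Making the lenient behaviour an explicit flag keeps the strict path as the default, so a missing object is reported as corruption unless the caller asked otherwise.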
|
|
|
/*
|
|
|
|
* At this stage, we know that the commit is
|
|
|
|
* missing: `repo_parse_commit()` uses
|
|
|
|
* `OBJECT_INFO_DIE_IF_CORRUPT` and therefore
|
|
|
|
* corrupt commits would already have been
|
|
|
|
* dispatched with a `die()`.
|
|
|
|
*/
|
2024-02-28 17:44:11 +08:00
|
|
|
if (ignore_missing_commits)
|
|
|
|
return 0;
|
|
|
|
return error(_("could not parse commit %s"),
|
|
|
|
oid_to_hex(&p->object.oid));
|
2024-02-28 17:44:07 +08:00
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
p->object.flags |= flags;
|
|
|
|
prio_queue_put(&queue, p);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
clear_prio_queue(&queue);
|
2024-02-28 17:44:11 +08:00
|
|
|
return 0;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
2024-02-28 17:44:12 +08:00
|
|
|
static int merge_bases_many(struct repository *r,
|
|
|
|
struct commit *one, int n,
|
|
|
|
struct commit **twos,
|
|
|
|
struct commit_list **result)
|
2018-07-21 00:33:02 +08:00
|
|
|
{
|
|
|
|
struct commit_list *list = NULL;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < n; i++) {
|
2024-02-28 17:44:12 +08:00
|
|
|
if (one == twos[i]) {
|
2018-07-21 00:33:02 +08:00
|
|
|
/*
|
|
|
|
* We do not mark this even with RESULT so we do not
|
|
|
|
* have to clean it up.
|
|
|
|
*/
|
2024-02-28 17:44:12 +08:00
|
|
|
*result = commit_list_insert(one, result);
|
|
|
|
return 0;
|
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
2024-02-28 17:44:12 +08:00
|
|
|
if (!one)
|
|
|
|
return 0;
|
2018-11-14 08:12:52 +08:00
|
|
|
if (repo_parse_commit(r, one))
|
2024-02-28 17:44:12 +08:00
|
|
|
return error(_("could not parse commit %s"),
|
|
|
|
oid_to_hex(&one->object.oid));
|
2018-07-21 00:33:02 +08:00
|
|
|
for (i = 0; i < n; i++) {
|
2024-02-28 17:44:12 +08:00
|
|
|
if (!twos[i])
|
|
|
|
return 0;
|
2018-11-14 08:12:52 +08:00
|
|
|
if (repo_parse_commit(r, twos[i]))
|
2024-02-28 17:44:12 +08:00
|
|
|
return error(_("could not parse commit %s"),
|
|
|
|
oid_to_hex(&twos[i]->object.oid));
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
2024-02-28 17:44:11 +08:00
|
|
|
if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
|
|
|
|
free_commit_list(list);
|
2024-02-28 17:44:12 +08:00
|
|
|
return -1;
|
2024-02-28 17:44:11 +08:00
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
|
|
|
|
while (list) {
|
|
|
|
struct commit *commit = pop_commit(&list);
|
|
|
|
if (!(commit->object.flags & STALE))
|
2024-02-28 17:44:12 +08:00
|
|
|
commit_list_insert_by_date(commit, result);
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
2024-02-28 17:44:12 +08:00
|
|
|
return 0;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
2024-02-28 17:44:15 +08:00
|
|
|
int get_octopus_merge_bases(struct commit_list *in, struct commit_list **result)
|
2018-07-21 00:33:02 +08:00
|
|
|
{
|
2024-02-28 17:44:15 +08:00
|
|
|
struct commit_list *i, *j, *k;
|
2018-07-21 00:33:02 +08:00
|
|
|
|
|
|
|
if (!in)
|
2024-02-28 17:44:15 +08:00
|
|
|
return 0;
|
2018-07-21 00:33:02 +08:00
|
|
|
|
2024-02-28 17:44:15 +08:00
|
|
|
commit_list_insert(in->item, result);
|
2018-07-21 00:33:02 +08:00
|
|
|
|
|
|
|
for (i = in->next; i; i = i->next) {
|
|
|
|
struct commit_list *new_commits = NULL, *end = NULL;
|
|
|
|
|
2024-02-28 17:44:15 +08:00
|
|
|
for (j = *result; j; j = j->next) {
|
2024-02-28 17:44:14 +08:00
|
|
|
struct commit_list *bases = NULL;
|
|
|
|
if (repo_get_merge_bases(the_repository, i->item,
|
|
|
|
j->item, &bases) < 0) {
|
|
|
|
free_commit_list(bases);
|
2024-02-28 17:44:15 +08:00
|
|
|
free_commit_list(*result);
|
|
|
|
*result = NULL;
|
|
|
|
return -1;
|
2024-02-28 17:44:14 +08:00
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
if (!new_commits)
|
|
|
|
new_commits = bases;
|
|
|
|
else
|
|
|
|
end->next = bases;
|
|
|
|
for (k = bases; k; k = k->next)
|
|
|
|
end = k;
|
|
|
|
}
|
2024-02-28 17:44:15 +08:00
|
|
|
free_commit_list(*result);
|
|
|
|
*result = new_commits;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
2024-02-28 17:44:15 +08:00
|
|
|
return 0;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
commit-reach: use one walk in remove_redundant()
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.
For example, in the Linux kernel repository, I tested the performance
by passing all tags:
git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)
Here is the performance improvement introduced by this change:
Before: 16.4s
After: 1.1s
This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().
The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag indicates commits that are reachable from one of the inputs,
not counting an input as reachable from itself. Then, walk commits down
to the minimum generation number, pushing the STALE flag throughout.
At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.
This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 20:34:07 +08:00
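The single-walk idea in this commit message can be sketched on a toy history. The assumptions are the sketch's own: commits are array indices ordered so that parents have smaller indices than their children (a stand-in for generation numbers), the walk is a plain loop instead of a depth-first search, and `TOY_STALE` and the `toy_*` names are hypothetical.

```c
#include <assert.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)
#define TOY_STALE 1u

struct toy_commit {
	int parents[MAX_PARENTS]; /* parent indices, or NO_PARENT */
	unsigned flags;
};

static void toy_mark_parents(struct toy_commit *commits, int c)
{
	int j;

	for (j = 0; j < MAX_PARENTS; j++) {
		int p = commits[c].parents[j];
		if (p != NO_PARENT)
			commits[p].flags |= TOY_STALE;
	}
}

/*
 * Keep only the commits in array[] that are not reachable from another
 * entry, preserving their relative order. Returns the new count.
 */
int toy_remove_redundant(struct toy_commit *commits, int nr_commits,
			 int *array, int cnt)
{
	int i, c, kept = 0, min_index = nr_commits;

	/* Mark the parents of every input, never the input itself. */
	for (i = 0; i < cnt; i++) {
		assert(array[i] >= 0 && array[i] < nr_commits);
		if (array[i] < min_index)
			min_index = array[i];
		toy_mark_parents(commits, array[i]);
	}

	/* One walk: push STALE downwards, stopping at the lowest input. */
	for (c = nr_commits - 1; c > min_index; c--)
		if (commits[c].flags & TOY_STALE)
			toy_mark_parents(commits, c);

	/* Move the non-stale inputs to the front. */
	for (i = 0; i < cnt; i++)
		if (!(commits[array[i]].flags & TOY_STALE))
			array[kept++] = array[i];

	/* Clear the marks so the function is safe to call again. */
	for (c = 0; c < nr_commits; c++)
		commits[c].flags &= ~TOY_STALE;

	return kept;
}
```

The cost is one pass over the commits above the lowest input, rather than one painting walk per input.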
|
|
|
static int remove_redundant_no_gen(struct repository *r,
|
|
|
|
struct commit **array, int cnt)
|
2018-07-21 00:33:02 +08:00
|
|
|
{
|
|
|
|
struct commit **work;
|
|
|
|
unsigned char *redundant;
|
|
|
|
int *filled_index;
|
|
|
|
int i, j, filled;
|
|
|
|
|
2021-03-14 00:17:22 +08:00
|
|
|
CALLOC_ARRAY(work, cnt);
|
2018-07-21 00:33:02 +08:00
|
|
|
redundant = xcalloc(cnt, 1);
|
|
|
|
ALLOC_ARRAY(filled_index, cnt - 1);
|
|
|
|
|
|
|
|
for (i = 0; i < cnt; i++)
|
2018-11-14 08:12:53 +08:00
|
|
|
repo_parse_commit(r, array[i]);
|
2018-07-21 00:33:02 +08:00
|
|
|
for (i = 0; i < cnt; i++) {
|
2024-02-28 17:44:11 +08:00
|
|
|
struct commit_list *common = NULL;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t min_generation = commit_graph_generation(array[i]);
|
2018-07-21 00:33:02 +08:00
|
|
|
|
|
|
|
if (redundant[i])
|
|
|
|
continue;
|
|
|
|
for (j = filled = 0; j < cnt; j++) {
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t curr_generation;
|
2018-07-21 00:33:02 +08:00
|
|
|
if (i == j || redundant[j])
|
|
|
|
continue;
|
|
|
|
filled_index[filled] = j;
|
|
|
|
work[filled++] = array[j];
|
|
|
|
|
2020-06-17 17:14:11 +08:00
|
|
|
curr_generation = commit_graph_generation(array[j]);
|
|
|
|
if (curr_generation < min_generation)
|
|
|
|
min_generation = curr_generation;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
2024-02-28 17:44:11 +08:00
|
|
|
if (paint_down_to_common(r, array[i], filled,
|
|
|
|
work, min_generation, 0, &common)) {
|
|
|
|
clear_commit_marks(array[i], all_flags);
|
|
|
|
clear_commit_marks_many(filled, work, all_flags);
|
|
|
|
free_commit_list(common);
|
|
|
|
free(work);
|
|
|
|
free(redundant);
|
|
|
|
free(filled_index);
|
|
|
|
return -1;
|
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
if (array[i]->object.flags & PARENT2)
|
|
|
|
redundant[i] = 1;
|
|
|
|
for (j = 0; j < filled; j++)
|
|
|
|
if (work[j]->object.flags & PARENT1)
|
|
|
|
redundant[filled_index[j]] = 1;
|
|
|
|
clear_commit_marks(array[i], all_flags);
|
|
|
|
clear_commit_marks_many(filled, work, all_flags);
|
|
|
|
free_commit_list(common);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Now collect the result */
|
|
|
|
COPY_ARRAY(work, array, cnt);
|
|
|
|
for (i = filled = 0; i < cnt; i++)
|
|
|
|
if (!redundant[i])
|
|
|
|
array[filled++] = work[i];
|
|
|
|
free(work);
|
|
|
|
free(redundant);
|
|
|
|
free(filled_index);
|
|
|
|
return filled;
|
|
|
|
}
|
|
|
|
|
2021-02-19 20:34:07 +08:00
|
|
|
static int remove_redundant_with_gen(struct repository *r,
|
|
|
|
struct commit **array, int cnt)
|
|
|
|
{
|
2021-02-19 20:34:09 +08:00
|
|
|
int i, count_non_stale = 0, count_still_independent = cnt;
|
2021-02-19 20:34:07 +08:00
|
|
|
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
|
commit-reach: stale commits may prune generation further
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.
This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.
However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.
Since commits with lower generation are expected to become STALE more
often, we can optimize further by increasing that min_generation boundary
upon discovery of the commit with minimum generation.
We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).
This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.
This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:
git merge-base --independent 2ecedd756908 d2360a398f0b \
1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e
The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.
Since remove_redundant_with_gen() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.
commit-graph at v5.5:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 864ms |
| *_with_gen() (before) | 858ms |
| *_with_gen() (after)  | 810ms |
commit-graph at v5.7:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 625ms |
| *_with_gen() (before) | 572ms |
| *_with_gen() (after)  | 517ms |
commit-graph at v5.9:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 268ms |
| *_with_gen() (before) | 224ms |
| *_with_gen() (after)  | 202ms |
commit-graph at v5.10:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 72ms  |
| *_with_gen() (before) | 37ms  |
| *_with_gen() (after)  | 9ms   |
Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 20:34:10 +08:00
	struct commit **walk_start, **sorted;
commit-reach: use one walk in remove_redundant()
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.
For example, in the Linux kernel repository, I tested the performance
by passing all tags:
git merge-base --independent $(git for-each-ref refs/tags --format="%(refname)")
(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)
Here is the performance improvement introduced by this change:
Before: 16.4s
After: 1.1s
This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().
The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag indicates commits that are reachable from one of the inputs,
excluding the inputs themselves. Then, walk commits down to the minimum
generation number, pushing the STALE flag along the way.
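A simplified, standalone sketch of this single-walk idea follows. It is not the code from commit-reach.c: `struct toy_commit`, `TOY_STALE`, `push_stale()` and `toy_remove_redundant()` are hypothetical names, commits have at most two parents, inputs are assumed to be distinct, and the sketch neither terminates early nor clears the flags afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STALE (1u << 0)
#define TOY_MAX_PARENTS 2
#define TOY_MAX_STACK 64

/* Hypothetical stand-in for a commit with a generation number. */
struct toy_commit {
	unsigned long generation;
	unsigned flags;
	struct toy_commit *parents[TOY_MAX_PARENTS]; /* unused slots are NULL */
};

/* Mark 'start' and everything reachable from it, down to min_generation. */
static void push_stale(struct toy_commit *start, unsigned long min_generation)
{
	struct toy_commit *stack[TOY_MAX_STACK];
	size_t depth = 0;

	if (start->flags & TOY_STALE)
		return; /* already covered by an earlier walk */
	start->flags |= TOY_STALE;
	stack[depth++] = start;

	while (depth) {
		struct toy_commit *c = stack[--depth];
		size_t i;

		/* Nothing below the minimum generation can be an input. */
		if (c->generation < min_generation)
			continue;

		for (i = 0; i < TOY_MAX_PARENTS && c->parents[i]; i++) {
			struct toy_commit *p = c->parents[i];

			if (p->flags & TOY_STALE)
				continue;
			p->flags |= TOY_STALE;
			assert(depth < TOY_MAX_STACK);
			stack[depth++] = p;
		}
	}
}

/*
 * Move the independent commits to the front of 'array', preserving
 * their relative order, and return how many there are.
 */
static size_t toy_remove_redundant(struct toy_commit **array, size_t cnt)
{
	unsigned long min_generation;
	size_t i, j, kept = 0;

	if (!cnt)
		return 0;

	min_generation = array[0]->generation;
	for (i = 1; i < cnt; i++)
		if (array[i]->generation < min_generation)
			min_generation = array[i]->generation;

	/* Start from the parents, so an input never marks itself. */
	for (i = 0; i < cnt; i++)
		for (j = 0; j < TOY_MAX_PARENTS && array[i]->parents[j]; j++)
			push_stale(array[i]->parents[j], min_generation);

	for (i = 0; i < cnt; i++)
		if (!(array[i]->flags & TOY_STALE))
			array[kept++] = array[i];
	return kept;
}
```

Because each commit is marked before it is pushed, every commit is visited at most once no matter how many inputs there are, which is what avoids the quadratic behavior of repeated walks.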
At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.
This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 20:34:07 +08:00
	size_t walk_start_nr = 0, walk_start_alloc = cnt;
	int min_gen_pos = 0;

	/*
	 * Sort the input by generation number, ascending. This allows
	 * us to increase the "min_generation" limit when we discover
	 * the commit with lowest generation is STALE. The index
	 * min_gen_pos points to the current position within 'array'
	 * that is not yet known to be STALE.
	 */
2023-01-02 05:16:48 +08:00
	DUP_ARRAY(sorted, array, cnt);
	QSORT(sorted, cnt, compare_commits_by_gen);
	min_generation = commit_graph_generation(sorted[0]);

	ALLOC_ARRAY(walk_start, walk_start_alloc);

	/* Mark all parents of the input as STALE */
	for (i = 0; i < cnt; i++) {
		struct commit_list *parents;

		repo_parse_commit(r, array[i]);
2021-02-19 20:34:09 +08:00
		array[i]->object.flags |= RESULT;
		parents = array[i]->parents;

		while (parents) {
			repo_parse_commit(r, parents->item);
			if (!(parents->item->object.flags & STALE)) {
				parents->item->object.flags |= STALE;
				ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
				walk_start[walk_start_nr++] = parents->item;
			}
			parents = parents->next;
		}
	}

	QSORT(walk_start, walk_start_nr, compare_commits_by_gen);

	/* remove STALE bit for now to allow walking through parents */
	for (i = 0; i < walk_start_nr; i++)
		walk_start[i]->object.flags &= ~STALE;

	/*
	 * Start walking from the highest generation. Hopefully, it will
	 * find all other items during the first-parent walk, and we can
	 * terminate early. Otherwise, we will do the same amount of work
	 * as before.
	 */
	for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) {
		/* push the STALE bits up to min generation */
		struct commit_list *stack = NULL;

		commit_list_insert(walk_start[i], &stack);
		walk_start[i]->object.flags |= STALE;

		while (stack) {
			struct commit_list *parents;
			struct commit *c = stack->item;

			repo_parse_commit(r, c);

			if (c->object.flags & RESULT) {
				c->object.flags &= ~RESULT;
				if (--count_still_independent <= 1)
					break;
				if (oideq(&c->object.oid, &sorted[min_gen_pos]->object.oid)) {
					while (min_gen_pos < cnt - 1 &&
					       (sorted[min_gen_pos]->object.flags & STALE))
						min_gen_pos++;
					min_generation = commit_graph_generation(sorted[min_gen_pos]);
				}
			}

			if (commit_graph_generation(c) < min_generation) {
				pop_commit(&stack);
				continue;
			}

			parents = c->parents;
			while (parents) {
				if (!(parents->item->object.flags & STALE)) {
					parents->item->object.flags |= STALE;
					commit_list_insert(parents->item, &stack);
					break;
				}
				parents = parents->next;
			}

			/* pop if all parents have been visited already */
			if (!parents)
				pop_commit(&stack);
		}
		free_commit_list(stack);
2021-02-19 20:34:07 +08:00
	}
commit-reach: stale commits may prune generation further
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.
This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.
However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.
Since commits with lower generation are expected to become STALE more
often, we can optimize further by raising the min_generation boundary
once the walk discovers the commit with minimum generation.
We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).
This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.
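A minimal sketch of the two ideas above, assuming a simplified commit type:
sort a copy of the pointer array so that 'array' keeps its order, then advance
the cutoff past entries that have already become stale. The names
'struct toy_commit', toy_sorted_copy() and toy_next_min_generation() are
hypothetical and only loosely model the logic in remove_redundant_with_gen().

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a commit with a generation number and a flag. */
struct toy_commit {
	unsigned generation;
	int stale;
};

static int toy_compare_generation(const void *a, const void *b)
{
	const struct toy_commit *x = *(struct toy_commit *const *)a;
	const struct toy_commit *y = *(struct toy_commit *const *)b;

	return (x->generation > y->generation) - (x->generation < y->generation);
}

/*
 * Return a newly allocated copy of 'array' ordered by ascending generation.
 * 'array' itself is left untouched. Returns NULL for an empty input or on
 * allocation failure; the caller frees the result.
 */
static struct toy_commit **toy_sorted_copy(struct toy_commit **array, size_t cnt)
{
	struct toy_commit **sorted;

	if (!cnt)
		return NULL;
	sorted = malloc(cnt * sizeof(*sorted));
	if (!sorted)
		return NULL;
	memcpy(sorted, array, cnt * sizeof(*sorted));
	qsort(sorted, cnt, sizeof(*sorted), toy_compare_generation);
	return sorted;
}

/*
 * Skip entries that are already stale and return the generation of the
 * lowest remaining one, or UINT_MAX if every entry is stale.
 */
static unsigned toy_next_min_generation(struct toy_commit **sorted, size_t cnt,
					size_t *min_pos)
{
	assert(sorted && min_pos);
	while (*min_pos < cnt && sorted[*min_pos]->stale)
		(*min_pos)++;
	return *min_pos < cnt ? sorted[*min_pos]->generation : UINT_MAX;
}
```

A walk would call toy_next_min_generation() whenever the commit at the current
position becomes stale, and then stop descending below the returned generation.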
This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:
git merge-base --independent 2ecedd756908 d2360a398f0b \
1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e
The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.
Since remove_redundant_with_gen() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.
commit-graph at v5.5:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 864ms |
| *_with_gen() (before) | 858ms |
| *_with_gen() (after)  | 810ms |
commit-graph at v5.7:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 625ms |
| *_with_gen() (before) | 572ms |
| *_with_gen() (after)  | 517ms |
commit-graph at v5.9:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 268ms |
| *_with_gen() (before) | 224ms |
| *_with_gen() (after)  | 202ms |
commit-graph at v5.10:
| Method                | Time  |
|-----------------------+-------|
| *_no_gen()            | 72ms  |
| *_with_gen() (before) | 37ms  |
| *_with_gen() (after)  | 9ms   |
Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 20:34:10 +08:00
	free(sorted);
2021-02-19 20:34:07 +08:00

2021-02-19 20:34:09 +08:00
	/* clear result */
	for (i = 0; i < cnt; i++)
		array[i]->object.flags &= ~RESULT;

2021-02-19 20:34:07 +08:00
	/* rearrange array */
	for (i = count_non_stale = 0; i < cnt; i++) {
		if (!(array[i]->object.flags & STALE))
			array[count_non_stale++] = array[i];
	}

	/* clear marks */
	clear_commit_marks_many(walk_start_nr, walk_start, STALE);
	free(walk_start);

	return count_non_stale;
}

static int remove_redundant(struct repository *r, struct commit **array, int cnt)
{
	/*
	 * Some commit in the array may be an ancestor of
	 * another commit. Move the independent commits to the
	 * beginning of 'array' and return their number. Callers
	 * should not rely upon the contents of 'array' after
	 * that number.
	 */
	if (generation_numbers_enabled(r)) {
		int i;

		/*
		 * If we have a single commit with finite generation
		 * number, then the _with_gen algorithm is preferred.
		 */
		for (i = 0; i < cnt; i++) {
			if (commit_graph_generation(array[i]) < GENERATION_NUMBER_INFINITY)
				return remove_redundant_with_gen(r, array, cnt);
		}
	}

	return remove_redundant_no_gen(r, array, cnt);
}

2024-02-28 17:44:13 +08:00
static int get_merge_bases_many_0(struct repository *r,
				  struct commit *one,
				  int n,
				  struct commit **twos,
				  int cleanup,
				  struct commit_list **result)
2018-07-21 00:33:02 +08:00
{
	struct commit_list *list;
	struct commit **rslt;
	int cnt, i;

2024-02-28 17:44:13 +08:00
	if (merge_bases_many(r, one, n, twos, result) < 0)
		return -1;
2018-07-21 00:33:02 +08:00
	for (i = 0; i < n; i++) {
		if (one == twos[i])
2024-02-28 17:44:13 +08:00
			return 0;
2018-07-21 00:33:02 +08:00
	}
2024-02-28 17:44:13 +08:00
	if (!*result || !(*result)->next) {
2018-07-21 00:33:02 +08:00
		if (cleanup) {
			clear_commit_marks(one, all_flags);
			clear_commit_marks_many(n, twos, all_flags);
		}
2024-02-28 17:44:13 +08:00
		return 0;
2018-07-21 00:33:02 +08:00
	}

	/* There are more than one */
2024-02-28 17:44:13 +08:00
	cnt = commit_list_count(*result);
2021-03-14 00:17:22 +08:00
	CALLOC_ARRAY(rslt, cnt);
2024-02-28 17:44:13 +08:00
	for (list = *result, i = 0; list; list = list->next)
2018-07-21 00:33:02 +08:00
		rslt[i++] = list->item;
2024-02-28 17:44:13 +08:00
	free_commit_list(*result);
	*result = NULL;
2018-07-21 00:33:02 +08:00

	clear_commit_marks(one, all_flags);
	clear_commit_marks_many(n, twos, all_flags);

2018-11-14 08:12:54 +08:00
	cnt = remove_redundant(r, rslt, cnt);
2024-02-28 17:44:11 +08:00
	if (cnt < 0) {
		free(rslt);
2024-02-28 17:44:13 +08:00
		return -1;
2024-02-28 17:44:11 +08:00
	}
2018-07-21 00:33:02 +08:00
	for (i = 0; i < cnt; i++)
2024-02-28 17:44:13 +08:00
		commit_list_insert_by_date(rslt[i], result);
2018-07-21 00:33:02 +08:00
	free(rslt);
2024-02-28 17:44:13 +08:00
	return 0;
2018-07-21 00:33:02 +08:00
}

2024-02-28 17:44:16 +08:00
int repo_get_merge_bases_many(struct repository *r,
			      struct commit *one,
			      int n,
			      struct commit **twos,
			      struct commit_list **result)
2018-07-21 00:33:02 +08:00
{
2024-02-28 17:44:16 +08:00
	return get_merge_bases_many_0(r, one, n, twos, 1, result);
2018-07-21 00:33:02 +08:00
}

2024-02-28 17:44:17 +08:00
int repo_get_merge_bases_many_dirty(struct repository *r,
				    struct commit *one,
				    int n,
				    struct commit **twos,
				    struct commit_list **result)
2018-07-21 00:33:02 +08:00
{
2024-02-28 17:44:17 +08:00
	return get_merge_bases_many_0(r, one, n, twos, 0, result);
2018-07-21 00:33:02 +08:00
}

2024-02-28 17:44:14 +08:00
int repo_get_merge_bases(struct repository *r,
			 struct commit *one,
			 struct commit *two,
			 struct commit_list **result)
2018-07-21 00:33:02 +08:00
{
2024-02-28 17:44:14 +08:00
	return get_merge_bases_many_0(r, one, 1, &two, 1, result);
2018-07-21 00:33:02 +08:00
}

/*
 * Is "commit" a descendant of one of the elements on the "with_commit" list?
 */
2020-06-24 02:42:22 +08:00
int repo_is_descendant_of(struct repository *r,
			  struct commit *commit,
			  struct commit_list *with_commit)
2018-07-21 00:33:02 +08:00
{
	if (!with_commit)
		return 1;

libs: use "struct repository *" argument, not "the_repository"
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change in "read-cache.c" from "get_index_format_default(the_repository)"
to the "r" variable should arguably have been part of [1], or of the
subsequent cleanup in [2]. Let's do it here: as the initial code in [3]
shows, it's not important that we use "the_repository" there, and we
would prefer to always use the current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad0 (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e7534 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 21:58:58 +08:00
	if (generation_numbers_enabled(r)) {
commit-reach: use can_all_from_reach
The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:
1. The performance is quadratic in worst-case.
2. A single in_merge_bases() call requires walking beyond the target
commit in order to find the full set of boundary commits that may be
merge-bases.
The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.
Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
naturally found cutoffs to avoid walking the entire commit graph. Since
we want to always return the correct result, we cannot use the
min_commit_date cutoff in can_all_from_reach. We then rely on generation
numbers to provide the cutoff.
Since not all repos will have a commit-graph file, nor will we always
have generation numbers computed for a commit-graph file, create a new
method, generation_numbers_enabled(), that checks for a commit-graph
file and sees if the first commit in the file has a non-zero generation
number. In the case that we do not have generation numbers, use the old
logic for is_descendant_of().
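To illustrate why a generation cutoff lets one walk replace a merge-base
computation per target, here is a minimal sketch on a toy graph. It is not
can_all_from_reach(): 'struct toy_commit', toy_reaches_target() and
toy_is_descendant_of() are hypothetical names, commits are array indices, and
every commit is assumed to have a valid generation number.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical stand-in for git's 'struct commit'. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];	/* indices into the graph, -1 if unused */
	unsigned generation;		/* 1 + the largest parent generation */
};

/* Depth-first search from 'c' that never descends below 'min_gen'. */
static int toy_reaches_target(const struct toy_commit *graph, int c,
			      const unsigned char *is_target,
			      unsigned char *seen, unsigned min_gen)
{
	int i;

	if (c < 0 || seen[c] || graph[c].generation < min_gen)
		return 0;
	if (is_target[c])
		return 1;
	seen[c] = 1;
	for (i = 0; i < TOY_MAX_PARENTS; i++)
		if (toy_reaches_target(graph, graph[c].parents[i],
				       is_target, seen, min_gen))
			return 1;
	return 0;
}

/*
 * Return 1 if 'commit' can reach at least one of 'targets', 0 if it cannot,
 * and -1 if the scratch allocation fails.
 */
static int toy_is_descendant_of(const struct toy_commit *graph, int graph_nr,
				int commit, const int *targets, int nr_targets)
{
	unsigned char *marks;	/* first half: is_target, second half: seen */
	unsigned min_gen = UINT_MAX;
	int i, result;

	if (!nr_targets)
		return 1;	/* same convention as an empty "with_commit" list */
	assert(graph_nr > 0);
	marks = calloc(2 * (size_t)graph_nr, 1);
	if (!marks)
		return -1;

	/* An ancestor always has a smaller generation than its descendant. */
	for (i = 0; i < nr_targets; i++) {
		marks[targets[i]] = 1;
		if (graph[targets[i]].generation < min_gen)
			min_gen = graph[targets[i]].generation;
	}

	result = toy_reaches_target(graph, commit, marks, marks + graph_nr, min_gen);
	free(marks);
	return result;
}
```

The cutoff is safe because a commit whose generation is below that of every
target can neither be a target nor reach one, so the walk visits each commit at
most once regardless of how many targets are given.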
Performance was measured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command with this input:
A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X:v3.0
Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.
Before: 0.26 s
After: 0.21 s
Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer' with this input:
A:v4.9
B:v3.19
Before: 0.10 s
After: 0.08 s
By adding a new commit with parent v3.19, we test the non-reachable case
of ref_newer:
Before: 0.09 s
After: 0.08 s
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-21 00:33:30 +08:00
		struct commit_list *from_list = NULL;
		int result;
		commit_list_insert(commit, &from_list);
		result = can_all_from_reach(from_list, with_commit, 0);
		free_commit_list(from_list);
		return result;
	} else {
		while (with_commit) {
			struct commit *other;
2024-02-28 17:44:09 +08:00
			int ret;
2018-07-21 00:33:30 +08:00

			other = with_commit->item;
			with_commit = with_commit->next;
2024-02-28 17:44:09 +08:00
			ret = repo_in_merge_bases_many(r, other, 1, &commit, 0);
			if (ret)
				return ret;
2018-07-21 00:33:30 +08:00
		}
		return 0;
2018-07-21 00:33:02 +08:00
	}
}

/*
 * Is "commit" an ancestor of one of the "references"?
 */
2018-11-14 08:12:56 +08:00
int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
commit-reach(repo_in_merge_bases_many): optionally expect missing commits
Currently this function treats unrelated commit histories the same way
as commit histories with missing commit objects.
Typically, missing commit objects constitute a corrupt repository,
though, and should be reported as such. The next commits will make it
so, but there is one exception: In `git fetch --update-shallow` we
_expect_ commit objects to be missing, and we do want to treat the
now-incomplete commit histories as unrelated.
To allow for that, let's introduce an additional parameter that is
passed to `repo_in_merge_bases_many()` to trigger this behavior, and use
it in the two callers in `shallow.c`.
This commit changes behavior slightly: unless called from the
`shallow.c` functions that set the `ignore_missing_commits` bit, any
non-existing tip commit that is passed to `repo_in_merge_bases_many()`
will now result in an error.
Note: When encountering missing commits while traversing the commit
history in search for merge bases, with this commit there won't be a
change in behavior just yet, their children will still be interpreted as
root commits. This bug will get fixed by follow-up commits.
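The return convention described above (1 for "ancestor", 0 for "not an
ancestor", -1 for an error, with the flag downgrading a missing tip from an
error to "unrelated") can be sketched as follows. This is a toy model, not the
real function: toy_in_merge_bases() and toy_describe_result() are hypothetical,
and the two boolean inputs stand in for work the real code does by parsing
commits and walking history.

```c
#include <assert.h>

/*
 * 'tip_is_missing' models a tip commit that cannot be parsed and
 * 'is_ancestor' models the outcome of the history walk.
 */
static int toy_in_merge_bases(int tip_is_missing, int is_ancestor,
			      int ignore_missing_commits)
{
	if (tip_is_missing)
		return ignore_missing_commits ? 0 : -1;
	return is_ancestor ? 1 : 0;
}

/* How a caller might distinguish the three outcomes. */
static const char *toy_describe_result(int ret)
{
	assert(ret >= -1 && ret <= 1);
	if (ret < 0)
		return "error: missing commit";
	return ret ? "ancestor" : "not an ancestor";
}
```

Callers that previously tested the result only for truthiness need care with
this convention, since -1 is also non-zero.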
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 17:44:08 +08:00
			     int nr_reference, struct commit **reference,
			     int ignore_missing_commits)
2018-07-21 00:33:02 +08:00
{
2024-02-28 17:44:11 +08:00
	struct commit_list *bases = NULL;
2018-07-21 00:33:02 +08:00
	int ret = 0, i;
2021-01-17 02:11:13 +08:00
	timestamp_t generation, max_generation = GENERATION_NUMBER_ZERO;
2018-07-21 00:33:02 +08:00

2018-11-14 08:12:56 +08:00
	if (repo_parse_commit(r, commit))
2024-02-28 17:44:08 +08:00
		return ignore_missing_commits ? 0 : -1;
2018-07-21 00:33:02 +08:00
	for (i = 0; i < nr_reference; i++) {
2018-11-14 08:12:56 +08:00
		if (repo_parse_commit(r, reference[i]))
2024-02-28 17:44:08 +08:00
			return ignore_missing_commits ? 0 : -1;
2020-06-17 17:14:11 +08:00

		generation = commit_graph_generation(reference[i]);
commit-reach: fix in_merge_bases_many bug
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.
Some code moves since then have changed this method to
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".
In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.
This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.
This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!
Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.
The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().
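The corrected heuristic can be sketched in isolation, assuming plain unsigned
generation numbers. toy_walk_can_be_skipped() is a hypothetical helper, not a
function in commit-reach.c, and it ignores the special "infinite" generation
that commits outside the commit-graph carry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 when 'commit_generation' is above the generation of every
 * reference, in which case no reference can have the commit as an ancestor
 * and the walk can be skipped. Comparing against the minimum instead would
 * skip the walk as soon as a single reference is below the commit.
 */
static int toy_walk_can_be_skipped(unsigned commit_generation,
				   const unsigned *reference_generations,
				   size_t nr_reference)
{
	unsigned max_generation = 0;
	size_t i;

	assert(reference_generations || !nr_reference);
	for (i = 0; i < nr_reference; i++)
		if (reference_generations[i] > max_generation)
			max_generation = reference_generations[i];
	return commit_generation > max_generation;
}
```

For a commit at generation 5 and references at generations 3 and 8, the
minimum-based check would have skipped the walk even though the reference at
generation 8 may well contain the commit; the maximum-based check does not.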
Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-02 22:58:56 +08:00
		if (generation > max_generation)
			max_generation = generation;
2018-07-21 00:33:02 +08:00
	}

2020-06-17 17:14:11 +08:00
	generation = commit_graph_generation(commit);
2020-10-02 22:58:56 +08:00
|
|
|
if (generation > max_generation)
|
2018-07-21 00:33:02 +08:00
|
|
|
return ret;
|
|
|
|
|
2024-02-28 17:44:11 +08:00
|
|
|
if (paint_down_to_common(r, commit,
|
|
|
|
nr_reference, reference,
|
|
|
|
generation, ignore_missing_commits, &bases))
|
|
|
|
ret = -1;
|
|
|
|
else if (commit->object.flags & PARENT2)
|
2018-07-21 00:33:02 +08:00
|
|
|
ret = 1;
|
|
|
|
clear_commit_marks(commit, all_flags);
|
|
|
|
clear_commit_marks_many(nr_reference, reference, all_flags);
|
|
|
|
free_commit_list(bases);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Is "commit" an ancestor of (i.e. reachable from) the "reference"?
|
|
|
|
*/
|
2018-11-14 08:12:56 +08:00
|
|
|
int repo_in_merge_bases(struct repository *r,
|
|
|
|
struct commit *commit,
|
|
|
|
struct commit *reference)
|
2018-07-21 00:33:02 +08:00
|
|
|
{
|
commit-reach: use fast logic in repo_in_merge_base
The repo_is_descendant_of() method is aware of the existence of the
commit-graph file. It checks for generation_numbers_enabled() before
deciding on using can_all_from_reach() or repo_in_merge_bases()
depending on the situation. The reason here is that can_all_from_reach()
uses a depth-first search that is limited by the minimum generation
number of the target commits, and that algorithm can be very slow when
generation numbers are not present. The alternative uses
paint_down_to_common() which will walk the entire merge-base boundary,
which is typically slower.
This method is used by commands like "git tag --contains" and "git
branch --contains" for very fast results when a commit-graph file
exists. Unfortunately, it is _not_ used in commands like "git merge-base
--is-ancestor" which is doing an even simpler request.
This issue was raised recently [1] with respect to a change to how
generation numbers are stored, but was also reported much earlier [2]
before commit-reach.c existed to simplify these reachability queries.
[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/
The root cause is that builtin/merge-base.c has a method
handle_is_ancestor() that calls in_merge_bases(), an older version of
repo_in_merge_bases(). It would be better if every caller of
in_merge_bases() used the logic in can_all_from_reach() when possible.
This is where things get a little tricky: repo_is_descendant_of() calls
repo_in_merge_bases() in the non-generation numbers enabled case! If we
simply update repo_in_merge_bases() to call repo_is_descendant_of()
instead of repo_in_merge_bases_many(), then we will get a recursive call
loop. Thankfully, this is caught by the test suite in the default mode
(i.e. GIT_TEST_COMMIT_GRAPH=0).
The trick, then, is to make the non-generation number case for
repo_is_descendant_of() call repo_in_merge_bases_many() directly,
skipping the non-_many version. This allows us to take advantage of this
faster code path, when possible.
The easiest way to measure the performance impact is to test the
following command on the Linux kernel repository:
git merge-base --is-ancestor <A> <B>
| A | B | Time Before | Time After |
|------|------|-------------|------------|
| v3.0 | v5.7 | 0.459s | 0.028s |
| v4.0 | v5.7 | 0.267s | 0.021s |
| v5.0 | v5.7 | 0.074s | 0.013s |
Note that each of these samples returns success. The old code performed
the same operation when <A> and <B> are swapped. However,
can_all_from_reach() will return immediately if the generation numbers
show that <A> has larger generation number than <B>. Thus, the time for
the swapped case is 0.004s in each case.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-18 01:24:29 +08:00
|
|
|
int res;
|
|
|
|
struct commit_list *list = NULL;
|
|
|
|
struct commit_list **next = &list;
|
|
|
|
|
|
|
|
next = commit_list_append(commit, next);
|
|
|
|
res = repo_is_descendant_of(r, reference, list);
|
|
|
|
free_commit_list(list);
|
|
|
|
|
|
|
|
return res;
|
2018-07-21 00:33:02 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
struct commit_list *reduce_heads(struct commit_list *heads)
|
|
|
|
{
|
|
|
|
struct commit_list *p;
|
|
|
|
struct commit_list *result = NULL, **tail = &result;
|
|
|
|
struct commit **array;
|
|
|
|
int num_head, i;
|
|
|
|
|
|
|
|
if (!heads)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/* Uniquify */
|
|
|
|
for (p = heads; p; p = p->next)
|
|
|
|
p->item->object.flags &= ~STALE;
|
|
|
|
for (p = heads, num_head = 0; p; p = p->next) {
|
|
|
|
if (p->item->object.flags & STALE)
|
|
|
|
continue;
|
|
|
|
p->item->object.flags |= STALE;
|
|
|
|
num_head++;
|
|
|
|
}
|
2021-03-14 00:17:22 +08:00
|
|
|
CALLOC_ARRAY(array, num_head);
|
2018-07-21 00:33:02 +08:00
|
|
|
for (p = heads, i = 0; p; p = p->next) {
|
|
|
|
if (p->item->object.flags & STALE) {
|
|
|
|
array[i++] = p->item;
|
|
|
|
p->item->object.flags &= ~STALE;
|
|
|
|
}
|
|
|
|
}
|
2018-11-14 08:12:53 +08:00
|
|
|
num_head = remove_redundant(the_repository, array, num_head);
|
2024-02-28 17:44:11 +08:00
|
|
|
if (num_head < 0) {
|
|
|
|
free(array);
|
|
|
|
return NULL;
|
|
|
|
}
|
2018-07-21 00:33:02 +08:00
|
|
|
for (i = 0; i < num_head; i++)
|
|
|
|
tail = &commit_list_insert(array[i], tail)->next;
|
|
|
|
free(array);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
void reduce_heads_replace(struct commit_list **heads)
|
|
|
|
{
|
|
|
|
struct commit_list *result = reduce_heads(*heads);
|
|
|
|
free_commit_list(*heads);
|
|
|
|
*heads = result;
|
|
|
|
}
|
2018-07-21 00:33:06 +08:00
|
|
|
|
|
|
|
int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
|
|
|
|
{
|
|
|
|
struct object *o;
|
|
|
|
struct commit *old_commit, *new_commit;
|
2018-07-21 00:33:27 +08:00
|
|
|
struct commit_list *old_commit_list = NULL;
|
2020-06-19 21:13:46 +08:00
|
|
|
int ret;
|
2018-07-21 00:33:06 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Both new_commit and old_commit must be commit-ish and new_commit is descendant of
|
|
|
|
* old_commit. Otherwise we require --force.
|
|
|
|
*/
|
|
|
|
o = deref_tag(the_repository, parse_object(the_repository, old_oid),
|
|
|
|
NULL, 0);
|
|
|
|
if (!o || o->type != OBJ_COMMIT)
|
|
|
|
return 0;
|
|
|
|
old_commit = (struct commit *) o;
|
|
|
|
|
|
|
|
o = deref_tag(the_repository, parse_object(the_repository, new_oid),
|
|
|
|
NULL, 0);
|
|
|
|
if (!o || o->type != OBJ_COMMIT)
|
|
|
|
return 0;
|
|
|
|
new_commit = (struct commit *) o;
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
if (repo_parse_commit(the_repository, new_commit) < 0)
|
2018-07-21 00:33:06 +08:00
|
|
|
return 0;
|
|
|
|
|
2018-07-21 00:33:27 +08:00
|
|
|
commit_list_insert(old_commit, &old_commit_list);
|
2020-07-07 13:09:15 +08:00
|
|
|
ret = repo_is_descendant_of(the_repository,
|
2020-06-24 02:42:22 +08:00
|
|
|
new_commit, old_commit_list);
|
2024-02-28 17:44:09 +08:00
|
|
|
if (ret < 0)
|
|
|
|
exit(128);
|
2020-06-19 21:13:46 +08:00
|
|
|
free_commit_list(old_commit_list);
|
|
|
|
return ret;
|
2018-07-21 00:33:06 +08:00
|
|
|
}
|
2018-07-21 00:33:08 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Mimicking the real stack, this stack lives on the heap, avoiding stack
|
|
|
|
* overflows.
|
|
|
|
*
|
|
|
|
* At each recursion step, the stack items point to the commits whose
|
|
|
|
* ancestors are to be inspected.
|
|
|
|
*/
|
|
|
|
struct contains_stack {
|
|
|
|
int nr, alloc;
|
|
|
|
struct contains_stack_entry {
|
|
|
|
struct commit *commit;
|
|
|
|
struct commit_list *parents;
|
|
|
|
} *contains_stack;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int in_commit_list(const struct commit_list *want, struct commit *c)
|
|
|
|
{
|
|
|
|
for (; want; want = want->next)
|
2018-10-03 05:19:21 +08:00
|
|
|
if (oideq(&want->item->object.oid, &c->object.oid))
|
2018-07-21 00:33:08 +08:00
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Test whether the candidate is contained in the list.
|
|
|
|
* Do not recurse to find out, though, but return -1 if inconclusive.
|
|
|
|
*/
|
|
|
|
static enum contains_result contains_test(struct commit *candidate,
|
|
|
|
const struct commit_list *want,
|
|
|
|
struct contains_cache *cache,
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t cutoff)
|
2018-07-21 00:33:08 +08:00
|
|
|
{
|
|
|
|
enum contains_result *cached = contains_cache_at(cache, candidate);
|
|
|
|
|
|
|
|
/* If we already have the answer cached, return that. */
|
|
|
|
if (*cached)
|
|
|
|
return *cached;
|
|
|
|
|
|
|
|
/* or are we it? */
|
|
|
|
if (in_commit_list(want, candidate)) {
|
|
|
|
*cached = CONTAINS_YES;
|
|
|
|
return CONTAINS_YES;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Otherwise, we don't know; prepare to recurse */
|
|
|
|
parse_commit_or_die(candidate);
|
|
|
|
|
2020-06-17 17:14:10 +08:00
|
|
|
if (commit_graph_generation(candidate) < cutoff)
|
2018-07-21 00:33:08 +08:00
|
|
|
return CONTAINS_NO;
|
|
|
|
|
|
|
|
return CONTAINS_UNKNOWN;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void push_to_contains_stack(struct commit *candidate, struct contains_stack *contains_stack)
|
|
|
|
{
|
|
|
|
ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, contains_stack->alloc);
|
|
|
|
contains_stack->contains_stack[contains_stack->nr].commit = candidate;
|
|
|
|
contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
|
|
|
|
}
|
|
|
|
|
|
|
|
static enum contains_result contains_tag_algo(struct commit *candidate,
|
|
|
|
const struct commit_list *want,
|
|
|
|
struct contains_cache *cache)
|
|
|
|
{
|
|
|
|
struct contains_stack contains_stack = { 0, 0, NULL };
|
|
|
|
enum contains_result result;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t cutoff = GENERATION_NUMBER_INFINITY;
|
2018-07-21 00:33:08 +08:00
|
|
|
const struct commit_list *p;
|
|
|
|
|
|
|
|
for (p = want; p; p = p->next) {
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t generation;
|
2018-07-21 00:33:08 +08:00
|
|
|
struct commit *c = p->item;
|
|
|
|
load_commit_graph_info(the_repository, c);
|
2020-06-17 17:14:11 +08:00
|
|
|
generation = commit_graph_generation(c);
|
|
|
|
if (generation < cutoff)
|
|
|
|
cutoff = generation;
|
2018-07-21 00:33:08 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
result = contains_test(candidate, want, cache, cutoff);
|
|
|
|
if (result != CONTAINS_UNKNOWN)
|
|
|
|
return result;
|
|
|
|
|
|
|
|
push_to_contains_stack(candidate, &contains_stack);
|
|
|
|
while (contains_stack.nr) {
|
|
|
|
struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
|
|
|
|
struct commit *commit = entry->commit;
|
|
|
|
struct commit_list *parents = entry->parents;
|
|
|
|
|
|
|
|
if (!parents) {
|
|
|
|
*contains_cache_at(cache, commit) = CONTAINS_NO;
|
|
|
|
contains_stack.nr--;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* If we just popped the stack, parents->item has been marked,
|
|
|
|
* therefore contains_test will return a meaningful yes/no.
|
|
|
|
*/
|
|
|
|
else switch (contains_test(parents->item, want, cache, cutoff)) {
|
|
|
|
case CONTAINS_YES:
|
|
|
|
*contains_cache_at(cache, commit) = CONTAINS_YES;
|
|
|
|
contains_stack.nr--;
|
|
|
|
break;
|
|
|
|
case CONTAINS_NO:
|
|
|
|
entry->parents = parents->next;
|
|
|
|
break;
|
|
|
|
case CONTAINS_UNKNOWN:
|
|
|
|
push_to_contains_stack(parents->item, &contains_stack);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
free(contains_stack.contains_stack);
|
|
|
|
return contains_test(candidate, want, cache, cutoff);
|
|
|
|
}
|
|
|
|
|
|
|
|
int commit_contains(struct ref_filter *filter, struct commit *commit,
|
|
|
|
struct commit_list *list, struct contains_cache *cache)
|
|
|
|
{
|
|
|
|
if (filter->with_commit_tag_algo)
|
|
|
|
return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
|
2020-06-24 02:42:22 +08:00
|
|
|
return repo_is_descendant_of(the_repository, commit, list);
|
2018-07-21 00:33:08 +08:00
|
|
|
}
|
2018-07-21 00:33:13 +08:00
|
|
|
|
|
|
|
int can_all_from_reach_with_flag(struct object_array *from,
|
|
|
|
unsigned int with_flag,
|
|
|
|
unsigned int assign_flag,
|
2018-07-21 00:33:28 +08:00
|
|
|
time_t min_commit_date,
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t min_generation)
|
2018-07-21 00:33:13 +08:00
|
|
|
{
|
2018-07-21 00:33:28 +08:00
|
|
|
struct commit **list = NULL;
|
2018-07-21 00:33:13 +08:00
|
|
|
int i;
|
2018-09-21 23:05:26 +08:00
|
|
|
int nr_commits;
|
2018-07-21 00:33:28 +08:00
|
|
|
int result = 1;
|
2018-07-21 00:33:13 +08:00
|
|
|
|
2018-07-21 00:33:28 +08:00
|
|
|
ALLOC_ARRAY(list, from->nr);
|
2018-09-21 23:05:26 +08:00
|
|
|
nr_commits = 0;
|
2018-07-21 00:33:13 +08:00
|
|
|
for (i = 0; i < from->nr; i++) {
|
2018-09-21 23:05:26 +08:00
|
|
|
struct object *from_one = from->objects[i].item;
|
2018-07-21 00:33:13 +08:00
|
|
|
|
2018-09-21 23:05:26 +08:00
|
|
|
if (!from_one || from_one->flags & assign_flag)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
from_one = deref_tag(the_repository, from_one,
|
|
|
|
"a from object", 0);
|
|
|
|
if (!from_one || from_one->type != OBJ_COMMIT) {
|
2018-09-25 21:27:41 +08:00
|
|
|
/*
|
|
|
|
* no way to tell if this is reachable by
|
2018-09-21 23:05:26 +08:00
|
|
|
* looking at the ancestry chain alone, so
|
|
|
|
* leave a note to ourselves not to worry about
|
|
|
|
* this object anymore.
|
|
|
|
*/
|
|
|
|
from->objects[i].item->flags |= assign_flag;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
list[nr_commits] = (struct commit *)from_one;
|
2023-03-28 21:58:48 +08:00
|
|
|
if (repo_parse_commit(the_repository, list[nr_commits]) ||
|
2020-06-17 17:14:10 +08:00
|
|
|
commit_graph_generation(list[nr_commits]) < min_generation) {
|
2018-09-21 23:05:26 +08:00
|
|
|
result = 0;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
nr_commits++;
|
2018-07-21 00:33:13 +08:00
|
|
|
}
|
2018-07-21 00:33:28 +08:00
|
|
|
|
2018-09-21 23:05:26 +08:00
|
|
|
QSORT(list, nr_commits, compare_commits_by_gen);
|
2018-07-21 00:33:28 +08:00
|
|
|
|
2018-09-21 23:05:26 +08:00
|
|
|
for (i = 0; i < nr_commits; i++) {
|
2018-07-21 00:33:28 +08:00
|
|
|
/* DFS from list[i] */
|
|
|
|
struct commit_list *stack = NULL;
|
|
|
|
|
|
|
|
list[i]->object.flags |= assign_flag;
|
|
|
|
commit_list_insert(list[i], &stack);
|
|
|
|
|
|
|
|
while (stack) {
|
|
|
|
struct commit_list *parent;
|
|
|
|
|
commit-reach: fix first-parent heuristic
The algorithm in can_all_from_reach_with_flag() performs a
depth-first search, terminated by generation number, intending to use
a heuristic that "important" commits are found in the first-parent
history. This heuristic is valuable in scenarios like fetch
negotiation.
However, there is a problem! After the search finds a target commit,
it should pop all commits off the stack and mark them as "can reach".
This logic is incorrect, so the algorithm instead walks all reachable
commits above the generation-number cutoff.
The existing algorithm is still an improvement over the previous
algorithm, as the worst-case complexity went from quadratic to linear.
The performance measurement at the time was good, but not dramatic.
By fixing this heuristic, we reduce the number of walked commits.
We can also re-run the performance tests from commit 4fbcca4e
"commit-reach: make can_all_from_reach... linear".
Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.
Small Case:
4fbcca4e~1: 0.85 s
4fbcca4e: 0.26 s (num_walked: 1,011,035)
HEAD: 0.14 s (num_walked: 8,601)
Large Case:
4fbcca4e~1: 24.0 s
4fbcca4e: 0.12 s (num_walked: 503,925)
HEAD: 0.06 s (num_walked: 217,243)
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
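The pop-and-mark behaviour described above can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not Git's implementation: `struct toy_node`, the `TOY_*` flags and `toy_can_reach()` are hypothetical names, nodes have at most two parents, and the generation-number and commit-date cutoffs are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX    16   /* hypothetical upper bound on graph size */
#define TOY_SEEN   1u   /* plays the role of assign_flag */
#define TOY_TARGET 2u   /* plays the role of with_flag */
#define TOY_RESULT 4u   /* "this node can reach a target" */

struct toy_node {
	int parents[2];  /* indices into the graph, -1 means "no parent" */
	unsigned flags;
};

/*
 * Depth-first search from "start", preferring the first parent. Returns 1
 * if a TOY_TARGET node is reachable. *walked counts the nodes pushed onto
 * the stack, to show how early the search terminates.
 */
int toy_can_reach(struct toy_node *g, size_t nr, int start, size_t *walked)
{
	int stack[TOY_MAX];
	size_t depth = 0;

	assert(nr <= TOY_MAX && start >= 0 && (size_t)start < nr);

	g[start].flags |= TOY_SEEN;
	stack[depth++] = start;
	*walked = 1;

	while (depth) {
		int top = stack[depth - 1];
		int k, pushed = 0;

		if (g[top].flags & (TOY_TARGET | TOY_RESULT)) {
			/* Pop, and tell the node below that it can reach too. */
			depth--;
			if (depth)
				g[stack[depth - 1]].flags |= TOY_RESULT;
			continue;
		}

		for (k = 0; k < 2; k++) {
			int p = g[top].parents[k];

			if (p < 0)
				continue;
			if (g[p].flags & (TOY_TARGET | TOY_RESULT))
				g[top].flags |= TOY_RESULT;
			if (!(g[p].flags & TOY_SEEN)) {
				g[p].flags |= TOY_SEEN;
				stack[depth++] = p;
				(*walked)++;
				pushed = 1;
				break;
			}
		}

		/* All parents already seen: nothing left to explore here. */
		if (!pushed)
			depth--;
	}

	return !!(g[start].flags & (TOY_TARGET | TOY_RESULT));
}
```

Once a target is found through the first-parent chain, every node on the stack is popped and marked in turn, so second parents that lead away from the target are never pushed.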
2018-10-19 01:24:40 +08:00
|
|
|
if (stack->item->object.flags & (with_flag | RESULT)) {
|
2018-07-21 00:33:28 +08:00
|
|
|
pop_commit(&stack);
|
2018-10-19 01:24:40 +08:00
|
|
|
if (stack)
|
|
|
|
stack->item->object.flags |= RESULT;
|
2018-07-21 00:33:28 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (parent = stack->item->parents; parent; parent = parent->next) {
|
|
|
|
if (parent->item->object.flags & (with_flag | RESULT))
|
|
|
|
stack->item->object.flags |= RESULT;
|
|
|
|
|
|
|
|
if (!(parent->item->object.flags & assign_flag)) {
|
|
|
|
parent->item->object.flags |= assign_flag;
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
if (repo_parse_commit(the_repository, parent->item) ||
|
2018-07-21 00:33:28 +08:00
|
|
|
parent->item->date < min_commit_date ||
|
2020-06-17 17:14:10 +08:00
|
|
|
commit_graph_generation(parent->item) < min_generation)
|
2018-07-21 00:33:28 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
commit_list_insert(parent->item, &stack);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!parent)
|
|
|
|
pop_commit(&stack);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!(list[i]->object.flags & (with_flag | RESULT))) {
|
|
|
|
result = 0;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
cleanup:
|
2018-09-25 21:27:41 +08:00
|
|
|
clear_commit_marks_many(nr_commits, list, RESULT | assign_flag);
|
2018-09-21 23:05:27 +08:00
|
|
|
free(list);
|
|
|
|
|
2023-02-11 19:15:26 +08:00
|
|
|
for (i = 0; i < from->nr; i++) {
|
|
|
|
struct object *from_one = from->objects[i].item;
|
|
|
|
|
|
|
|
if (from_one)
|
|
|
|
from_one->flags &= ~assign_flag;
|
|
|
|
}
|
2018-09-21 23:05:27 +08:00
|
|
|
|
2018-07-21 00:33:28 +08:00
|
|
|
return result;
|
2018-07-21 00:33:13 +08:00
|
|
|
}
|
2018-07-21 00:33:23 +08:00
|
|
|
|
|
|
|
int can_all_from_reach(struct commit_list *from, struct commit_list *to,
|
|
|
|
int cutoff_by_min_date)
|
|
|
|
{
|
|
|
|
struct object_array from_objs = OBJECT_ARRAY_INIT;
|
|
|
|
time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0;
|
|
|
|
struct commit_list *from_iter = from, *to_iter = to;
|
|
|
|
int result;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
|
2018-07-21 00:33:23 +08:00
|
|
|
|
|
|
|
while (from_iter) {
|
|
|
|
add_object_array(&from_iter->item->object, NULL, &from_objs);
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
if (!repo_parse_commit(the_repository, from_iter->item)) {
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t generation;
|
2018-07-21 00:33:23 +08:00
|
|
|
if (from_iter->item->date < min_commit_date)
|
|
|
|
min_commit_date = from_iter->item->date;
|
2018-07-21 00:33:28 +08:00
|
|
|
|
2020-06-17 17:14:11 +08:00
|
|
|
generation = commit_graph_generation(from_iter->item);
|
|
|
|
if (generation < min_generation)
|
|
|
|
min_generation = generation;
|
2018-07-21 00:33:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
from_iter = from_iter->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (to_iter) {
|
2023-03-28 21:58:48 +08:00
|
|
|
if (!repo_parse_commit(the_repository, to_iter->item)) {
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t generation;
|
2018-07-21 00:33:23 +08:00
|
|
|
if (to_iter->item->date < min_commit_date)
|
|
|
|
min_commit_date = to_iter->item->date;
|
2018-07-21 00:33:28 +08:00
|
|
|
|
2020-06-17 17:14:11 +08:00
|
|
|
generation = commit_graph_generation(to_iter->item);
|
|
|
|
if (generation < min_generation)
|
|
|
|
min_generation = generation;
|
2018-07-21 00:33:23 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
to_iter->item->object.flags |= PARENT2;
|
|
|
|
|
|
|
|
to_iter = to_iter->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
|
2018-07-21 00:33:28 +08:00
|
|
|
min_commit_date, min_generation);
|
2018-07-21 00:33:23 +08:00
|
|
|
|
|
|
|
while (from) {
|
|
|
|
clear_commit_marks(from->item, PARENT1);
|
|
|
|
from = from->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (to) {
|
|
|
|
clear_commit_marks(to->item, PARENT2);
|
|
|
|
to = to->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
object_array_clear(&from_objs);
|
|
|
|
return result;
|
|
|
|
}
|
2018-11-02 21:14:45 +08:00
|
|
|
|
|
|
|
struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
|
|
|
|
struct commit **to, int nr_to,
|
|
|
|
unsigned int reachable_flag)
|
|
|
|
{
|
|
|
|
struct commit **item;
|
|
|
|
struct commit *current;
|
|
|
|
struct commit_list *found_commits = NULL;
|
|
|
|
struct commit **to_last = to + nr_to;
|
|
|
|
struct commit **from_last = from + nr_from;
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
|
2018-11-02 21:14:45 +08:00
|
|
|
int num_to_find = 0;
|
|
|
|
|
|
|
|
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
|
|
|
|
|
|
|
|
for (item = to; item < to_last; item++) {
|
2021-01-17 02:11:13 +08:00
|
|
|
timestamp_t generation;
|
2018-11-02 21:14:45 +08:00
|
|
|
struct commit *c = *item;
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_parse_commit(the_repository, c);
|
2020-06-17 17:14:11 +08:00
|
|
|
generation = commit_graph_generation(c);
|
|
|
|
if (generation < min_generation)
|
|
|
|
min_generation = generation;
|
2018-11-02 21:14:45 +08:00
|
|
|
|
|
|
|
if (!(c->object.flags & PARENT1)) {
|
|
|
|
c->object.flags |= PARENT1;
|
|
|
|
num_to_find++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
for (item = from; item < from_last; item++) {
|
|
|
|
struct commit *c = *item;
|
|
|
|
if (!(c->object.flags & PARENT2)) {
|
|
|
|
c->object.flags |= PARENT2;
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_parse_commit(the_repository, c);
|
2018-11-02 21:14:45 +08:00
|
|
|
|
|
|
|
prio_queue_put(&queue, *item);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
|
|
|
|
struct commit_list *parents;
|
|
|
|
|
|
|
|
if (current->object.flags & PARENT1) {
|
|
|
|
current->object.flags &= ~PARENT1;
|
|
|
|
current->object.flags |= reachable_flag;
|
|
|
|
commit_list_insert(current, &found_commits);
|
|
|
|
num_to_find--;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (parents = current->parents; parents; parents = parents->next) {
|
|
|
|
struct commit *p = parents->item;
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_parse_commit(the_repository, p);
|
2018-11-02 21:14:45 +08:00
|
|
|
|
2020-06-17 17:14:10 +08:00
|
|
|
if (commit_graph_generation(p) < min_generation)
|
2018-11-02 21:14:45 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (p->object.flags & PARENT2)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
p->object.flags |= PARENT2;
|
|
|
|
prio_queue_put(&queue, p);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-06-03 08:28:19 +08:00
|
|
|
clear_prio_queue(&queue);
|
|
|
|
|
2018-11-02 21:14:45 +08:00
|
|
|
clear_commit_marks_many(nr_to, to, PARENT1);
|
|
|
|
clear_commit_marks_many(nr_from, from, PARENT2);
|
|
|
|
|
|
|
|
return found_commits;
|
|
|
|
}
|
commit-reach: implement ahead_behind() logic
Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.
The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.
This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.
This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.
It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.
Now, let's discuss the ahead/behind counting algorithm.
The first array of commits are considered the starting commits. The
index within that array will play a critical role.
We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.
At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."
We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.
As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.
The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count. This increment is the reason we _absolutely need_ to
walk commits at most once.
The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
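The counting walk described above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the code that follows: `struct toy_commit`, `struct toy_count` and `toy_ahead_behind()` are hypothetical, each commit's bitmap is a single 32-bit word, the queue is a linear scan for the maximum generation, and the STALE early-termination is omitted, so the walk always continues to the roots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

struct toy_commit {
	unsigned generation;           /* strictly greater than each parent's */
	int parents[TOY_MAX_PARENTS];  /* indices into the graph, -1 = none */
};

struct toy_count {
	size_t tip_index, base_index;  /* indices into the starting list */
	unsigned ahead, behind;
};

void toy_ahead_behind(const struct toy_commit *graph, size_t graph_nr,
		      const int *starts, size_t starts_nr,
		      struct toy_count *counts, size_t counts_nr)
{
	uint32_t bits[TOY_MAX_COMMITS] = { 0 };        /* "which starts reach me?" */
	unsigned char queued[TOY_MAX_COMMITS] = { 0 };
	size_t i;

	assert(graph_nr <= TOY_MAX_COMMITS && starts_nr <= 32);

	for (i = 0; i < counts_nr; i++) {
		assert(counts[i].tip_index < starts_nr);
		assert(counts[i].base_index < starts_nr);
		counts[i].ahead = counts[i].behind = 0;
	}

	/* The ith starting commit can reach itself. */
	for (i = 0; i < starts_nr; i++) {
		assert(starts[i] >= 0 && (size_t)starts[i] < graph_nr);
		bits[starts[i]] |= (uint32_t)1 << i;
		queued[starts[i]] = 1;
	}

	for (;;) {
		int c = -1, k;

		/* Pop the queued commit with the maximum generation number. */
		for (i = 0; i < graph_nr; i++)
			if (queued[i] &&
			    (c < 0 || graph[i].generation > graph[c].generation))
				c = (int)i;
		if (c < 0)
			break;
		queued[c] = 0;

		/* First step: count the commit for each base/tip pair. */
		for (i = 0; i < counts_nr; i++) {
			int from_tip = !!(bits[c] & ((uint32_t)1 << counts[i].tip_index));
			int from_base = !!(bits[c] & ((uint32_t)1 << counts[i].base_index));

			if (from_tip ^ from_base) {
				if (from_base)
					counts[i].behind++;
				else
					counts[i].ahead++;
			}
		}

		/* Second step: push this commit's bits down to its parents. */
		for (k = 0; k < TOY_MAX_PARENTS; k++) {
			int p = graph[c].parents[k];

			if (p < 0)
				continue;
			bits[p] |= bits[c];
			queued[p] = 1;
		}
	}
}
```

Because the commit with the largest generation is always taken first, every child of a commit has been processed before the commit itself, so its bitmap is complete when it is counted and no commit is counted twice.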
2023-03-20 19:26:53 +08:00
define_commit_slab(bit_arrays, struct bitmap *);
static struct bit_arrays bit_arrays;

static void insert_no_dup(struct prio_queue *queue, struct commit *c)
{
	if (c->object.flags & PARENT2)
		return;
	prio_queue_put(queue, c);
	c->object.flags |= PARENT2;
}

static struct bitmap *get_bit_array(struct commit *c, int width)
{
	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
	if (!*bitmap)
		*bitmap = bitmap_word_alloc(width);
	return *bitmap;
}

static void free_bit_array(struct commit *c)
{
	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
	if (!*bitmap)
		return;
	bitmap_free(*bitmap);
	*bitmap = NULL;
}

void ahead_behind(struct repository *r,
		  struct commit **commits, size_t commits_nr,
		  struct ahead_behind_count *counts, size_t counts_nr)
{
	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);

	if (!commits_nr || !counts_nr)
		return;

	for (size_t i = 0; i < counts_nr; i++) {
		counts[i].ahead = 0;
		counts[i].behind = 0;
	}

	ensure_generations_valid(r, commits, commits_nr);

	init_bit_arrays(&bit_arrays);

	for (size_t i = 0; i < commits_nr; i++) {
		struct commit *c = commits[i];
		struct bitmap *bitmap = get_bit_array(c, width);

		bitmap_set(bitmap, i);
		insert_no_dup(&queue, c);
	}

	while (queue_has_nonstale(&queue)) {
		struct commit *c = prio_queue_get(&queue);
		struct commit_list *p;
		struct bitmap *bitmap_c = get_bit_array(c, width);

		for (size_t i = 0; i < counts_nr; i++) {
			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);

			if (reach_from_tip ^ reach_from_base) {
				if (reach_from_base)
					counts[i].behind++;
				else
					counts[i].ahead++;
			}
		}

		for (p = c->parents; p; p = p->next) {
			struct bitmap *bitmap_p;

			repo_parse_commit(r, p->item);

			bitmap_p = get_bit_array(p->item, width);
			bitmap_or(bitmap_p, bitmap_c);

			/*
			 * If this parent is reachable from every starting
			 * commit, then none of its ancestors can contribute
			 * to the ahead/behind count. Mark it as STALE, so
			 * we can stop the walk when every commit in the
			 * queue is STALE.
			 */
			if (bitmap_popcount(bitmap_p) == commits_nr)
				p->item->object.flags |= STALE;

			insert_no_dup(&queue, p->item);
		}

		free_bit_array(c);
	}

	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
	repo_clear_commit_marks(r, PARENT2 | STALE);
2024-05-27 19:46:54 +08:00
	while (prio_queue_peek(&queue)) {
		struct commit *c = prio_queue_get(&queue);
		free_bit_array(c);
	}
commit-reach: implement ahead_behind() logic
2023-03-20 19:26:53 +08:00
	clear_bit_arrays(&bit_arrays);
	clear_prio_queue(&queue);
}
commit-reach: add tips_reachable_from_bases()
Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.
However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.
For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:
Test                                           HEAD~1   HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged    0.13     0.02 -84.6%
1500.6: contains: git branch --merged          0.14     0.02 -85.7%
1500.7: contains: git tag --merged             0.15     0.03 -80.0%
(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)
The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_bases(). This method takes an array of tip commits
and sets the caller-supplied 'mark' flag on each tip that the base
commits can reach.
Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.
This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.
The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.
With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 19:26:55 +08:00
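The moving generation cutoff described above can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not the code in commit-reach.c: commits are array indices with at most two parents, the graph holds at most 32 commits, there is a single base, and results are written to a found[] array instead of an object flag. The names toy_commit, toy_tip and toy_tips_reachable are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX 32

struct toy_commit {
	int gen;        /* generation number, larger than every parent's */
	int parents[2]; /* parent indices, -1 when absent */
};

struct toy_tip {
	int commit;     /* index into the graph */
	int gen;
	size_t index;   /* position in the caller's tips[] */
};

static int toy_cmp_tip(const void *va, const void *vb)
{
	const struct toy_tip *a = va, *b = vb;
	return (a->gen > b->gen) - (a->gen < b->gen);
}

/* Sets found[i] to 1 when tips[i] is reachable from 'base', else 0. */
static void toy_tips_reachable(const struct toy_commit *g, size_t g_nr, int base,
			       const int *tips, size_t tips_nr, int *found)
{
	struct toy_tip sorted[TOY_MAX];
	int stack[TOY_MAX], seen[TOY_MAX] = { 0 };
	size_t sp = 0, min_index = 0;
	int min_gen;

	assert(g_nr <= TOY_MAX && tips_nr > 0 && tips_nr <= TOY_MAX);

	for (size_t i = 0; i < tips_nr; i++) {
		sorted[i].commit = tips[i];
		sorted[i].gen = g[tips[i]].gen;
		sorted[i].index = i;
		found[i] = 0;
	}
	qsort(sorted, tips_nr, sizeof(*sorted), toy_cmp_tip);
	min_gen = sorted[0].gen;

	seen[base] = 1;
	stack[sp++] = base;
	while (sp) {
		int c = stack[sp - 1], pushed = 0;

		/* Only tips at or below c's generation can match c. */
		for (size_t j = min_index; j < tips_nr && sorted[j].gen <= g[c].gen; j++) {
			if (sorted[j].commit != c)
				continue;
			found[sorted[j].index] = 1;
			if (j == min_index) {
				/* Lowest target found: raise the cutoff. */
				size_t k = j + 1;
				while (k < tips_nr && found[sorted[k].index])
					k++;
				if (k >= tips_nr)
					return; /* every tip was found */
				min_index = k;
				min_gen = sorted[k].gen;
			}
		}

		/* Descend into the first unexplored parent above the cutoff. */
		for (size_t i = 0; i < 2 && !pushed; i++) {
			int p = g[c].parents[i];
			if (p < 0 || seen[p] || g[p].gen < min_gen)
				continue;
			seen[p] = 1;
			stack[sp++] = p;
			pushed = 1;
		}
		if (!pushed)
			sp--;
	}
}
```

In a history 0 <- 1 <- 2 <- 4 with a side commit 3 whose parent is 1, searching from base 4 for tips { 1, 3, 2 } finds 1 and 2 but not 3. Once commit 1 (the lowest-generation tip) is found, the cutoff rises to the generation of commit 3, so the root commit 0 is never visited.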
struct commit_and_index {
	struct commit *commit;
	unsigned int index;
	timestamp_t generation;
};

static int compare_commit_and_index_by_generation(const void *va, const void *vb)
{
	const struct commit_and_index *a = (const struct commit_and_index *)va;
	const struct commit_and_index *b = (const struct commit_and_index *)vb;

	if (a->generation > b->generation)
		return 1;
	if (a->generation < b->generation)
		return -1;
	return 0;
}

void tips_reachable_from_bases(struct repository *r,
			       struct commit_list *bases,
			       struct commit **tips, size_t tips_nr,
			       int mark)
{
	struct commit_and_index *commits;
	size_t min_generation_index = 0;
	timestamp_t min_generation;
	struct commit_list *stack = NULL;

	if (!bases || !tips || !tips_nr)
		return;

	/*
	 * Do a depth-first search starting at 'bases' to search for the
	 * tips. Stop at the lowest (un-found) generation number. When
	 * finding the lowest commit, increase the minimum generation
	 * number to the next lowest (un-found) generation number.
	 */

	CALLOC_ARRAY(commits, tips_nr);

	for (size_t i = 0; i < tips_nr; i++) {
		commits[i].commit = tips[i];
		commits[i].index = i;
		commits[i].generation = commit_graph_generation(tips[i]);
	}

	/* Sort with generation number ascending. */
	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
	min_generation = commits[0].generation;

	while (bases) {
		repo_parse_commit(r, bases->item);
		commit_list_insert(bases->item, &stack);
		bases = bases->next;
	}

	while (stack) {
		int explored_all_parents = 1;
		struct commit_list *p;
		struct commit *c = stack->item;
		timestamp_t c_gen = commit_graph_generation(c);

		/* Does it match any of our tips? */
		for (size_t j = min_generation_index; j < tips_nr; j++) {
			if (c_gen < commits[j].generation)
				break;

			if (commits[j].commit == c) {
				tips[commits[j].index]->object.flags |= mark;

				if (j == min_generation_index) {
					unsigned int k = j + 1;
					while (k < tips_nr &&
					       (tips[commits[k].index]->object.flags & mark))
						k++;

					/* Terminate early if all found. */
					if (k >= tips_nr)
						goto done;

					min_generation_index = k;
					min_generation = commits[k].generation;
				}
			}
		}

		for (p = c->parents; p; p = p->next) {
			repo_parse_commit(r, p->item);

			/* Have we already explored this parent? */
			if (p->item->object.flags & SEEN)
				continue;

			/* Is it below the current minimum generation? */
			if (commit_graph_generation(p->item) < min_generation)
				continue;

			/* Ok, we will explore from here on. */
			p->item->object.flags |= SEEN;
			explored_all_parents = 0;
			commit_list_insert(p->item, &stack);
			break;
		}

		if (explored_all_parents)
			pop_commit(&stack);
	}

done:
	free(commits);
	repo_clear_commit_marks(r, SEEN);
2024-08-01 18:41:15 +08:00
	free_commit_list(stack);
commit-reach: add tips_reachable_from_bases()
2023-03-20 19:26:55 +08:00
}
commit-reach: add get_branch_base_for_tip
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing). Even in this case, the previous commit commonly appears in the
first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
    Among a list of candidate base branches, select the candidate that
    minimizes the number of commits in the first-parent history of the source
    that are not in the first-parent history of the candidate.
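The criterion stated above can be written down directly as a brute-force reference. This is a minimal sketch under stated assumptions, not the commit-graph walk that this change adds: history is modeled as a hypothetical first_parent[] array (-1 for a root commit), candidates whose first-parent history never meets the source's are skipped, and ties go to the candidate listed first. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Is 'needle' on the first-parent chain that starts at 'from'? */
static int toy_on_first_parent_chain(const int *first_parent, int from, int needle)
{
	for (int c = from; c >= 0; c = first_parent[c])
		if (c == needle)
			return 1;
	return 0;
}

/* Returns the index of the best candidate in bases[], or -1 if none. */
static int toy_best_base_bruteforce(const int *first_parent, int source,
				    const int *bases, size_t bases_nr)
{
	int best = -1;
	size_t best_cost = 0;

	assert(first_parent && bases);

	for (size_t i = 0; i < bases_nr; i++) {
		size_t cost = 0;
		int meets = 0;

		/*
		 * Count source commits until the two chains meet. From the
		 * meeting commit downwards both chains are identical, so
		 * nothing after it can add to the count.
		 */
		for (int c = source; c >= 0; c = first_parent[c]) {
			if (toy_on_first_parent_chain(first_parent, bases[i], c)) {
				meets = 1;
				break;
			}
			cost++;
		}
		if (meets && (best < 0 || cost < best_cost)) {
			best = (int)i;
			best_cost = cost;
		}
	}
	return best;
}
```

This version rescans one chain per source commit, so it is only useful for checking small cases by hand. With a trunk 0 <- 1 <- 2 <- 3, a release branch 1 <- 4 <- 5 and a topic 2 <- 6 <- 7, the source 7 costs 3 commits against the release tip 5 and 2 commits against the trunk tip 3, so the trunk is selected.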
Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
Given current command-line interface options, this optimization criterion is
not easy to detect directly. Even using the command
    git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as their first parent, leading to multiple
updates of best_index. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 18:31:27 +08:00
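The integer encoding described above (0 for unseen, -1 for "reached from the source", i + 1 for "reached from the ith base") can be exercised on a toy first-parent graph. This is a simplified sketch under stated assumptions, not the function that follows: history is a hypothetical first_parent[] array with at most 64 commits, every parent has a smaller index than its children, and walking from the highest index down replaces the priority queue ordered by generation number. The name toy_branch_base is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 64

static int toy_branch_base(const int *first_parent, size_t graph_nr,
			   int tip, const int *bases, size_t bases_nr)
{
	int best[TOY_MAX] = { 0 }; /* 0: unseen, -1: tip, i + 1: bases[i] */
	int best_index = -1, branch_point = -1;

	assert(graph_nr <= TOY_MAX);

	best[tip] = -1;
	for (size_t i = 0; i < bases_nr; i++) {
		if (best[bases[i]] == -1)
			return (int)i; /* this base is the tip itself */
		if (!best[bases[i]])
			best[bases[i]] = (int)i + 1;
	}

	for (size_t c = graph_nr; c-- > 0; ) {
		int best_for_c = best[c], parent, best_for_p, positive;

		if (!best_for_c)
			continue; /* not on any tracked first-parent chain */
		if ((int)c == branch_point)
			break; /* all children were seen; the answer is final */
		parent = first_parent[c];
		if (parent < 0)
			continue;
		best_for_p = best[parent];

		if (!best_for_p) {
			best[parent] = best_for_c; /* new commit: inherit */
			continue;
		}
		if (best_for_p > 0 && best_for_c > 0) {
			if (best_for_c < best_for_p)
				best[parent] = best_for_c; /* prefer lower index */
			continue;
		}

		/* Signs differ: tip and base histories meet at 'parent'. */
		positive = best_for_c < 0 ? best_for_p : best_for_c;
		if (best_index < 0 || positive < best_index)
			best_index = positive;
		best[parent] = -1;
		branch_point = parent;
	}
	return best_index > 0 ? best_index - 1 : -1;
}
```

With a trunk 0 <- 1 <- 2 <- 3, a release branch 1 <- 4 <- 5 and a topic 2 <- 6 <- 7, calling it with tip 7 and bases { 5, 3 } returns 1: the trunk tip 3 reaches commit 2 after the topic has already marked it with -1, which makes 2 the branch point.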
/*
 * This slab initializes integers to zero, so use "-1" for "tip is best" and
 * "i + 1" for "bases[i] is best".
 */
define_commit_slab(best_branch_base, int);
static struct best_branch_base best_branch_base;
#define get_best(c) (*best_branch_base_at(&best_branch_base, (c)))
#define set_best(c,v) (*best_branch_base_at(&best_branch_base, (c)) = (v))

int get_branch_base_for_tip(struct repository *r,
			    struct commit *tip,
			    struct commit **bases,
			    size_t bases_nr)
{
	int best_index = -1;
	struct commit *branch_point = NULL;
	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
	int found_missing_gen = 0;

	if (!bases_nr)
		return -1;

	repo_parse_commit(r, tip);
	if (commit_graph_generation(tip) == GENERATION_NUMBER_INFINITY)
		found_missing_gen = 1;

	/* Check for missing generation numbers. */
	for (size_t i = 0; i < bases_nr; i++) {
		struct commit *c = bases[i];
		repo_parse_commit(r, c);
		if (commit_graph_generation(c) == GENERATION_NUMBER_INFINITY)
			found_missing_gen = 1;
	}

	if (found_missing_gen) {
		struct commit **commits;
		size_t commits_nr = bases_nr + 1;

		CALLOC_ARRAY(commits, commits_nr);
		COPY_ARRAY(commits, bases, bases_nr);
		commits[bases_nr] = tip;
		ensure_generations_valid(r, commits, commits_nr);
		free(commits);
	}

	/* Initialize queue and slab now that generations are guaranteed. */
	init_best_branch_base(&best_branch_base);
	set_best(tip, -1);
	prio_queue_put(&queue, tip);

	for (size_t i = 0; i < bases_nr; i++) {
		struct commit *c = bases[i];
		int best = get_best(c);

		/* Has this already been marked as best by another commit? */
		if (best) {
			if (best == -1) {
				/* We agree at this position. Stop now. */
				best_index = i + 1;
				goto cleanup;
			}
			continue;
		}

		set_best(c, i + 1);
		prio_queue_put(&queue, c);
	}

	while (queue.nr) {
		struct commit *c = prio_queue_get(&queue);
		int best_for_c = get_best(c);
		int best_for_p, positive;
		struct commit *parent;

		/* Have we reached a known branch point? It's optimal. */
		if (c == branch_point)
			break;

		repo_parse_commit(r, c);
		if (!c->parents)
			continue;

		parent = c->parents->item;
		repo_parse_commit(r, parent);
		best_for_p = get_best(parent);

		if (!best_for_p) {
			/* 'parent' is new, so pass along best_for_c. */
			set_best(parent, best_for_c);
			prio_queue_put(&queue, parent);
			continue;
		}

		if (best_for_p > 0 && best_for_c > 0) {
			/* Collision among bases. Minimize. */
			if (best_for_c < best_for_p)
				set_best(parent, best_for_c);
			continue;
		}

		/*
		 * At this point, we have reached a commit that is reachable
		 * from the tip, either from 'c' or from an earlier commit to
		 * have 'parent' as its first parent.
		 *
		 * Update 'best_index' to match the minimum of all base indices
		 * to reach 'parent'.
		 */

		/* Exactly one is positive due to initial conditions. */
		positive = (best_for_c < 0) ? best_for_p : best_for_c;

		if (best_index < 0 || positive < best_index)
			best_index = positive;

		/* No matter what, track that the parent is reachable from tip. */
		set_best(parent, -1);
		branch_point = parent;
	}

cleanup:
	clear_best_branch_base(&best_branch_base);
	clear_prio_queue(&queue);
	return best_index > 0 ? best_index - 1 : -1;
}