fast-export: fix surprising behavior with --first-parent

The revision traversal machinery typically processes and returns all
children before any parent.  fast-export needs to operate in the
reverse fashion, handling parents before any of their children in
order to build up the history starting from the root commit(s).  This
would be a clear case where we could just use the revision traversal
machinery's "reverse" option to achieve this desired affect.

However, this wasn't what the code did.  It added its own array for
queuing.  The obvious hand-rolled solution would be to just push all
the commits into the array and then traverse afterwards, but it didn't
quite do that either.  It instead attempted to process anything it
could as soon as it could, and once it could, check whether it could
process anything that had been queued.  As far as I can tell, this was
an effort to save a little memory in the case of multiple root commits
since it could process some commits before queueing all of them.  This
involved some helper functions named has_unshown_parent() and
handle_tail().  For typical invocations of fast-export, this
alternative essentially amounted to a hand-rolled method of reversing
the commits -- it was a bunch of work to duplicate the revision
traversal machinery's "reverse" option.

This hand-rolled reversing mechanism is actually somewhat difficult to
reason about.  It takes some time to figure out how it ensures in
normal cases that it will actually process all traversed commits
(rather than just dropping some and not printing anything for them).

And it turns out there are some cases where the code does drop commits
without handling them, and not even printing an error or warning for
the user.  Due to the has_unshown_parent() checks, some commits could
be left in the array at the end of the "while...get_revision()" loop
which would be unprocessed.  This could be triggered for example with
    git fast-export main -- --first-parent
or non-sensical traversal rules such as
    git fast-export main -- --grep=Merge --invert-grep

While most traversals that don't include all parents should likely
trigger errors in fast-export (or at least require being used in
combination with --reference-excluded-parents), the --first-parent
traversal is at least reasonable and it'd be nice if it didn't just drop
commits. It'd also be nice for future readers of the code to have a
simpler "reverse traversal" mechanism. Use the "reverse" option of the
revision traversal machinery to achieve both.

Even for the non-sensical traversal flags like the --grep one above,
this would be an improvement. For example, in that case, the code
previously would have silently truncated history to only those commits
that do not have an ancestor containing "Merge" in their commit message.
After this code change, that case would include all commits without
"Merge" in their commit message -- but any commit that previously had a
"Merge"-mentioning parent would lose that parent
(likely resulting in many new root commits). While the new behavior is
still odd, it is at least understandable given that
--reference-excluded-parents is not the default.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: William Sprent <williams@unity3d.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
William Sprent 2021-12-16 16:23:09 +00:00 committed by Junio C Hamano
parent e9d7761bb9
commit 726a228dfb
2 changed files with 36 additions and 36 deletions

View File

@ -107,18 +107,6 @@ static int parse_opt_reencode_mode(const struct option *opt,
static struct decoration idnums;
static uint32_t last_idnum;
static int has_unshown_parent(struct commit *commit)
{
struct commit_list *parent;
for (parent = commit->parents; parent; parent = parent->next)
if (!(parent->item->object.flags & SHOWN) &&
!(parent->item->object.flags & UNINTERESTING))
return 1;
return 0;
}
struct anonymized_entry {
struct hashmap_entry hash;
const char *anon;
@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
return strbuf_detach(&out, NULL);
}
static void handle_tail(struct object_array *commits, struct rev_info *revs,
struct string_list *paths_of_changed_objects)
{
struct commit *commit;
while (commits->nr) {
commit = (struct commit *)object_array_pop(commits);
if (has_unshown_parent(commit)) {
/* Queue again, to be handled later */
add_object_array(&commit->object, NULL, commits);
return;
}
handle_commit(commit, revs, paths_of_changed_objects);
}
}
static void handle_tag(const char *name, struct tag *tag)
{
@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
int cmd_fast_export(int argc, const char **argv, const char *prefix)
{
struct rev_info revs;
struct object_array commits = OBJECT_ARRAY_INIT;
struct commit *commit;
char *export_filename = NULL,
*import_filename = NULL,
@ -1283,18 +1256,13 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
revs.reverse = 1;
revs.diffopt.format_callback = show_filemodify;
revs.diffopt.format_callback_data = &paths_of_changed_objects;
revs.diffopt.flags.recursive = 1;
while ((commit = get_revision(&revs))) {
if (has_unshown_parent(commit)) {
add_object_array(&commit->object, NULL, &commits);
}
else {
handle_commit(commit, &revs, &paths_of_changed_objects);
handle_tail(&commits, &revs, &paths_of_changed_objects);
}
}
while ((commit = get_revision(&revs)))
handle_commit(commit, &revs, &paths_of_changed_objects);
handle_tags_and_duplicates(&extra_refs);
handle_tags_and_duplicates(&tag_refs);

View File

@ -750,4 +750,36 @@ test_expect_success 'merge commit gets exported with --import-marks' '
)
'
test_expect_success 'fast-export --first-parent outputs all revisions output by revision walk' '
git init first-parent &&
(
cd first-parent &&
test_commit A &&
git checkout -b topic1 &&
test_commit B &&
git checkout main &&
git merge --no-ff topic1 &&
git checkout -b topic2 &&
test_commit C &&
git checkout main &&
git merge --no-ff topic2 &&
test_commit D &&
git fast-export main -- --first-parent >first-parent-export &&
git fast-export main -- --first-parent --reverse >first-parent-reverse-export &&
test_cmp first-parent-export first-parent-reverse-export &&
git init import &&
git -C import fast-import <first-parent-export &&
git log --format="%ad %s" --first-parent main >expected &&
git -C import log --format="%ad %s" --all >actual &&
test_cmp expected actual &&
test_line_count = 4 actual
)
'
test_done