clean: optimize and document cases where we recurse into subdirectories

Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
    if (remove_directories)
        dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code.  Add some explanations to
the code to make it clearer.

Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean.  With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories.  git-2.26.0 clearly was not doing
that.  I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into.  As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out.  What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.

However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time.  Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report.  Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.
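
The cost difference described above can be illustrated with a toy walk.
This is only a sketch under stated assumptions: `struct toy_entry`,
`toy_walk()` and the `TOY_*` bit values are hypothetical names invented
for illustration and are not git's data structures or fill_directory()
logic.  It models nothing more than "an ignored directory is entered
unless the matching mode is set".

```c
#include <stddef.h>

/* Hypothetical bit values; git defines the real flags in dir.h. */
enum {
	TOY_SHOW_IGNORED_TOO               = 1 << 0,
	TOY_SHOW_IGNORED_TOO_MODE_MATCHING = 1 << 1
};

/* Toy directory entry, invented for this sketch. */
struct toy_entry {
	const char *name;
	int is_dir;
	int is_ignored;
	const struct toy_entry *children;
	size_t nr_children;
};

/* Count how many entries a walk examines under the given flags. */
static size_t toy_walk(const struct toy_entry *e, unsigned flags)
{
	size_t visited = 1; /* the entry itself is always examined */
	size_t i;

	if (!e->is_dir)
		return visited;

	/* In matching mode an ignored directory is noted but not entered. */
	if (e->is_ignored && (flags & TOY_SHOW_IGNORED_TOO_MODE_MATCHING))
		return visited;

	for (i = 0; i < e->nr_children; i++)
		visited += toy_walk(&e->children[i], flags);
	return visited;
}
```

In this model the saving grows with the size of the ignored directory,
since none of its contents are examined once the directory itself
matches an ignore rule.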

Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren 2020-06-11 06:59:33 +00:00 committed by Junio C Hamano
parent f7f5c6c0ba
commit 7233f17577

@@ -955,8 +955,37 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		remove_directories = 1;
 	}
-	if (remove_directories && !ignored_only)
-		dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
+	if (remove_directories && !ignored_only) {
+		/*
+		 * We need to know about ignored files too:
+		 *
+		 * If (ignored), then we will delete ignored files as well.
+		 *
+		 * If (!ignored), then even though we are not doing
+		 * anything with ignored files, we need to know about them
+		 * so that we can avoid deleting a directory of untracked
+		 * files that also contains an ignored file within it.
+		 *
+		 * For the (!ignored) case, since we only need to avoid
+		 * deleting ignored files, we can set
+		 * DIR_SHOW_IGNORED_TOO_MODE_MATCHING in order to avoid
+		 * recursing into a directory which is itself ignored.
+		 */
+		dir.flags |= DIR_SHOW_IGNORED_TOO;
+		if (!ignored)
+			dir.flags |= DIR_SHOW_IGNORED_TOO_MODE_MATCHING;
+		/*
+		 * Let the fill_directory() machinery know that we aren't
+		 * just recursing to collect the ignored files; we want all
+		 * the untracked ones so that we can delete them.  (Note:
+		 * we could also set DIR_KEEP_UNTRACKED_CONTENTS when
+		 * ignored_only is true, since DIR_KEEP_UNTRACKED_CONTENTS
+		 * only has effect in combination with DIR_SHOW_IGNORED_TOO.  It makes
+		 * the code clearer to exclude it, though.)
+		 */
+		dir.flags |= DIR_KEEP_UNTRACKED_CONTENTS;
+	}
 	if (read_cache() < 0)
 		die(_("index file corrupt"));
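
The flag selection in the hunk above can be read as a small truth table
over the three option variables (which correspond to `git clean`'s `-d`,
`-x` and `-X`).  The following is a minimal standalone sketch of that
table, not code from git: `sketch_clean_flags()` and the `SKETCH_*` bit
values are hypothetical stand-ins for the real `DIR_*` flags defined in
git's dir.h.

```c
/* Hypothetical bit values; the real definitions live in git's dir.h. */
#define SKETCH_SHOW_IGNORED_TOO               (1u << 0)
#define SKETCH_KEEP_UNTRACKED_CONTENTS        (1u << 1)
#define SKETCH_SHOW_IGNORED_TOO_MODE_MATCHING (1u << 2)

/*
 * Mirror the branch structure of the hunk: return the extra flags
 * that would be OR-ed into dir.flags for a given option combination.
 */
static unsigned sketch_clean_flags(int remove_directories, int ignored,
				   int ignored_only)
{
	unsigned flags = 0;

	if (remove_directories && !ignored_only) {
		/* Ignored paths must be visible in both cases. */
		flags |= SKETCH_SHOW_IGNORED_TOO;
		/* Only skip ignored directories when they are kept. */
		if (!ignored)
			flags |= SKETCH_SHOW_IGNORED_TOO_MODE_MATCHING;
		/* Untracked contents are still wanted for deletion. */
		flags |= SKETCH_KEEP_UNTRACKED_CONTENTS;
	}
	return flags;
}
```

Under these assumptions, the `(1, 0, 0)` case sets all three bits, the
`(1, 1, 0)` case leaves the matching mode unset, and any call with
`ignored_only` set returns 0, matching the note in the comment that
DIR_KEEP_UNTRACKED_CONTENTS is deliberately left out there.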