Commit Graph

132 Commits

Author SHA1 Message Date
Jeff King
42efa1231a filter-branch: drop $_x40 glob
When checking whether a commit was rewritten to a single object id, we
use a glob that insists on a 40-hex result. This works for sha1, but
fails t7003 when run with GIT_TEST_DEFAULT_HASH=sha256.

Since the previous commit simplified the case statement here, we only
have two arms: an empty string or a single object id. We can just loosen
our glob to match anything, and still distinguish those cases (we lose
the ability to notice bogus input, but that's not a problem; we are the
one who wrote the map in the first place, and anyway update-ref will
complain loudly if the input isn't a valid hash).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-10 14:16:58 -08:00
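
For the $_x40 change above, the loosened match can be pictured with a small sketch. The function name describe_mapping and the sample id are hypothetical and are not taken from git-filter-branch.sh; the sketch only shows that an empty-string arm plus a catch-all arm still separates the two cases for any hash length.

```shell
# Hypothetical helper: classify the content of one map entry.
# An empty string means the commit was pruned; anything else is
# treated as a single object id, whatever the hash length.
describe_mapping () {
	case "$1" in
	'')
		echo "pruned"
		;;
	*)
		echo "rewritten to $1"
		;;
	esac
}

describe_mapping ""
describe_mapping "42efa1231a"
```

As the message notes, a catch-all arm no longer rejects bogus input; that check is left to whatever consumes the id afterwards.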
Jeff King
98fe9e666f filter-branch: drop multiple-ancestor warning
When a ref maps to a commit that is neither rewritten nor kept by
filter-branch (e.g., because it was eliminated by rev-list's pathspec
selection), we rewrite it to its nearest ancestor.

Since the initial commit in 6f6826c52b (Add git-filter-branch,
2007-06-03), we have warned when there are multiple such ancestors in
the map file. However, the warning code is impossible to trigger these
days. Since a0e46390d3 (filter-branch: fix ref rewriting with
--subdirectory-filter, 2008-08-12), we find the ancestor using "rev-list
-1", so it can only ever have a single value.

This code is made doubly confusing by the fact that we append to the map
file when mapping ancestors. However, this can never yield multiple
values because:

  - we explicitly check whether the map already exists, and if so, do
    nothing (so our "append" will always be to a file that does not
    exist)

  - even if we were to try mapping twice, the process to do so is
    deterministic. I.e., we'd always end up with the same ancestor for a
    given sha1. So warning about it would be pointless; there is no
    ambiguity.

So swap out the warning code for a BUG (which we'll simplify further in
the next commit). And let's stop using the append operator to make the
ancestor-mapping code less confusing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-10 14:14:52 -08:00
Elijah Newren
9df53c5de6 Recommend git-filter-repo instead of git-filter-branch
filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison and I would ultimately just say that
filter-repo can do everything BFG can, so ultimately it seems that it
is just better to remove that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 13:01:48 -07:00
Junio C Hamano
676c7e50b1 Merge branch 'mb/filter-branch-optim'
"git filter-branch" when used with the "--state-branch" option
still attempted to rewrite the commits whose filtered result is
known from the previous attempt (which is recorded on the state
branch); the command has been corrected not to waste cycles doing
so.

* mb/filter-branch-optim:
  filter-branch: skip commits present on --state-branch
2018-07-18 12:20:32 -07:00
Michael Barabanov
709cfe848a filter-branch: skip commits present on --state-branch
The commits in state:filter.map have already been processed, so don't
filter them again. This makes incremental git filter-branch much faster.

Also add tests for --state-branch option.

Signed-off-by: Michael Barabanov <michael.barabanov@gmail.com>
Acked-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-26 15:44:53 -07:00
brian m. carlson
03a7f388da Update shell scripts to compute empty tree object ID
Several of our shell scripts hard-code the object ID of the empty tree.
To avoid any problems when changing hashes, compute this value on
startup of the script.  For performance, store the value in a variable
and reuse it throughout the life of the script.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:53 +09:00
Junio C Hamano
9aa3a4c406 Merge branch 'yk/filter-branch-non-committish-refs'
When refs that do not point at committish are given, "git
filter-branch" gave misleading error messages.  This has been
corrected.

* yk/filter-branch-non-committish-refs:
  filter-branch: fix errors caused by refs that point at non-committish
2018-04-10 16:28:23 +09:00
Junio C Hamano
cb3e97dae8 Merge branch 'ml/filter-branch-no-op-error'
"git filter-branch" learned to use a different exit code to allow
the callers to tell the case where there were no new commits to
rewrite from other error cases.

* ml/filter-branch-no-op-error:
  filter-branch: return 2 when nothing to rewrite
2018-04-10 08:25:44 +09:00
Yuki Kokubun
f78ab355e7 filter-branch: fix errors caused by refs that point at non-committish
"git filter-branch -- --all" prints error messages when processing refs that
point at objects that are not committish. Such refs can be created by
"git replace" with trees or blobs. And also "git tag" with trees or blobs can
create such refs.

Filter these problematic refs out early, before they are seen by the logic to
see which refs have been modified and which have been left intact (which is
where the unwanted error messages come from), and warn that these refs are left
unwritten while doing so.

Signed-off-by: Yuki Kokubun <orga.chem.job@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-25 10:12:27 -07:00
Michele Locati
206a6ae013 filter-branch: use printf instead of echo -e
In order to echo a tab character, it's better to use printf instead of
"echo -e", because it's more portable (for instance, "echo -e" doesn't work
as expected on a Mac).

This solves the "fatal: Not a valid object name" error in git-filter-branch
when using the --state-branch option.

Furthermore, let's switch from "/bin/echo" to just "echo", so that the
built-in echo command is used where available.

Signed-off-by: Michele Locati <michele@locati.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-19 10:59:28 -07:00
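
The portability point in the printf commit above can be illustrated with a minimal sketch. The helper name tab_join and its arguments are made up for illustration; only the use of printf with a \t in the format string reflects the commit message.

```shell
# Join two fields with a literal tab. printf interprets \t in its
# format string in every POSIX shell, which "echo -e" does not guarantee.
tab_join () {
	printf '%s\t%s\n' "$1" "$2"
}

tab_join "old-id" "new-id"
```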
Michele Locati
0a0eb2e585 filter-branch: return 2 when nothing to rewrite
Using the --state-branch option allows us to perform incremental filtering.
This may lead to having nothing to rewrite in subsequent filtering, so we need
a way to recognize this case.
So, let's exit with 2 instead of 1 when this "error" occurs.

Signed-off-by: Michele Locati <michele@locati.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 10:41:51 -07:00
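
A caller doing incremental filtering could tell the cases apart roughly as follows. This is a sketch under assumptions: interpret_status is a hypothetical helper, and fixed numbers stand in for the "$?" that a real script would read right after running git filter-branch.

```shell
# Hypothetical wrapper logic: map an exit status to an outcome.
# Status 2 is the "nothing to rewrite" case described above.
interpret_status () {
	case "$1" in
	0) echo "history rewritten" ;;
	2) echo "nothing to rewrite" ;;
	*) echo "failed" ;;
	esac
}

# Fixed values stand in for the real exit status here.
interpret_status 0
interpret_status 2
interpret_status 1
```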
Junio C Hamano
e336afdfb6 Merge branch 'dg/filter-branch-filter-order-doc'
Update the documentation for "git filter-branch" so that the filter
options are listed in the same order as they are applied, as
described in an earlier part of the doc.

* dg/filter-branch-filter-order-doc:
  doc: list filter-branch subdirectory-filter first
2017-10-19 14:45:45 +09:00
David Glasser
07c4984508 doc: list filter-branch subdirectory-filter first
The docs claim that filters are applied in the listed order, so
subdirectory-filter should come first.

For consistency, apply the same order to the SYNOPSIS and the script's usage, as
well as the switch while parsing arguments.

Add missing --prune-empty to the script's usage.

Signed-off-by: David Glasser <glasser@davidglasser.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-18 09:10:15 +09:00
Ian Campbell
b2c1ca6b4b filter-branch: use hash-object instead of mktag
This allows us to recreate even historical tags which would now be considered
invalid, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree which
lack the `tagger` header.

    $ git rev-parse v2.6.12-rc2
    9e734775f7c22d2f89943ad6c745571f1930105f
    $ git cat-file tag v2.6.12-rc2 | git mktag
    error: char76: could not find "tagger "
    fatal: invalid tag signature file
    $ git cat-file tag v2.6.12-rc2 | git hash-object -t tag -w --stdin
    9e734775f7c22d2f89943ad6c745571f1930105f

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:57:45 +09:00
Ian Campbell
bd2c79fbfe filter-branch: stash away ref map in a branch
With "--state-branch=<branchname>" option, the mapping from old object names
and filtered ones in ./map/ directory is stashed away in the object database,
and the one from the previous run is read to populate the ./map/ directory,
allowing for incremental updates of large trees.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:57:43 +09:00
Ian Campbell
7b1378bd95 filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_*
These are modified by set_ident() but a subsequent patch would like to operate
on their original values.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:57:42 +09:00
Ian Campbell
d24813c460 filter-branch: reset $GIT_* before cleaning up
This is pure code motion to enable a subsequent patch to add code which needs
to happen with the reset $GIT_* but before the temporary directory has been
cleaned up.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:57:40 +09:00
Andreas Heiduk
d612975e8e filter-branch: add [--] to usage
Signed-off-by: Andreas Heiduk <asheiduk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-12 09:49:52 -07:00
Andreas Heiduk
3b117f7301 filter-branch: add --setup step
A `--setup` step in `git filter-branch` makes it much easier to
define the initial values of variables used in the real filters.
Also sourcing/defining utility functions here instead of
`--env-filter` improves performance and minimizes clogging the
output in case of errors.

Signed-off-by: Andreas Heiduk <asheiduk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-12 09:44:54 -07:00
Jean-Noel Avila
6963893943 git-filter-branch: be more direct in an error message
git-filter-branch requires the specification of a branch by one way or
another. If no branch appears to have been specified, we know the user
got the usage wrong but we don't know what they were trying to do ---
e.g. maybe they specified the ref to rewrite but in the wrong place.

In this case, just state that the branch specification is missing.

Signed-off-by: Jean-Noel Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-12 15:27:10 +09:00
Devin J. Pohly
a582a82d24 filter-branch: fix --prune-empty on parentless commits
Previously, the git_commit_non_empty_tree function would always pass any
commit with no parents to git-commit-tree, regardless of whether the
tree was nonempty.  The new commit would then be recorded in the
filter-branch revision map, and subsequent commits which leave the tree
untouched would be correctly filtered.

With this change, parentless commits with an empty tree are correctly
pruned, and an empty file is recorded in the revision map, signifying
that it was rewritten to "no commits."  This works naturally with the
parent mapping for subsequent commits.

Signed-off-by: Devin J. Pohly <djpohly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-03 12:43:37 -08:00
Junio C Hamano
a2ec9484c1 Merge branch 'jk/filter-branch-no-index'
A recent optimization to filter-branch in v2.7.0 introduced a
regression when --prune-empty filter is used, which has been
corrected.

* jk/filter-branch-no-index:
  filter-branch: resolve $commit^{tree} in no-index case
2016-01-28 16:10:12 -08:00
Jeff King
1dc413ebe5 filter-branch: resolve $commit^{tree} in no-index case
Commit 348d4f2 (filter-branch: skip index read/write when
possible, 2015-11-06) taught filter-branch to optimize out
the final "git write-tree" when we know we haven't touched
the tree with any of our filters. It does so by simply putting
the literal text "$commit^{tree}" into the "$tree" variable,
avoiding a useless rev-parse call.

However, when we pass this to git_commit_non_empty_tree(),
it gets confused; it resolves "$commit^{tree}" itself, and
compares our string to the 40-hex sha1, which obviously
doesn't match. As a result, "--prune-empty" (or any custom
filter using git_commit_non_empty_tree) will fail to drop
an empty commit (when filter-branch is used without a tree
or index filter).

Let's resolve $tree to the 40-hex ourselves, so that
git_commit_non_empty_tree can work. Unfortunately, this is a
bit slower due to the extra process overhead:

  $ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh
  [...]
  Test                  348d4f2           HEAD
  --------------------------------------------------------------
  7000.2: noop filter   3.76(0.24+0.26)   4.54(0.28+0.24) +20.7%

We could try to make git_commit_non_empty_tree more clever.
However, the value of $tree here is technically
user-visible. The user can provide arbitrary shell code at
this stage, which could itself have a similar assumption to
what is in git_commit_non_empty_tree. So the conservative
choice to fix this regression is to take the 20% hit and
give the pre-348d4f2 behavior. We still end up much faster
than before the optimization:

  $ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh
  [...]
  Test                  348d4f2^          HEAD
  --------------------------------------------------------------
  7000.2: noop filter   9.51(4.32+0.40)   4.51(0.28+0.23) -52.6%

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-19 14:20:56 -08:00
Junio C Hamano
2e5adec97a Merge branch 'jk/filter-branch-no-index'
Speed up filter-branch for cases where we only care about rewriting
commits, not tree data.

* jk/filter-branch-no-index:
  filter-branch: skip index read/write when possible
2015-12-04 11:19:10 -08:00
Jeff King
40fdcc5357 Merge branch 'maint'
* maint:
  http: treat config options sslCAPath and sslCAInfo as paths
  Documentation/diff: give --word-diff-regex=. example
  filter-branch: deal with object name vs. pathname ambiguity in tree-filter
  check-ignore: correct documentation about output
  git-p4: clean up after p4 submit failure
  git-p4: work with a detached head
  git-p4: add option to system() to return subshell status
  git-p4: add failing test for submit from detached head
  remote-http(s): support SOCKS proxies
  t5813: avoid creating urls that break on cygwin
  Escape Git's exec path in contrib/rerere-train.sh script
  allow hooks to ignore their standard input stream
  rebase-i-exec: Allow space in SHELL_PATH
  Documentation: make environment variable formatting more consistent
2015-12-01 17:32:38 -05:00
SZEDER Gábor
4d2a3646d1 filter-branch: deal with object name vs. pathname ambiguity in tree-filter
'git filter-branch' fails complaining about an ambiguous argument, if
a tree-filter renames a path and the new pathname happens to match an
existing object name.

After the tree-filter has been applied, 'git filter-branch' looks for
changed paths by running:

  git diff-index -r --name-only --ignore-submodules $commit

which then, because of the lack of disambiguating double-dash, can't
decide whether to treat '$commit' as revision or path and errors out.

Add that disambiguating double-dash after 'git diff-index's revision
argument to make sure that '$commit' is interpreted as a revision.

Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-24 18:37:50 -05:00
Jeff King
348d4f2fc5 filter-branch: skip index read/write when possible
If the user specifies an index filter but not a tree filter,
filter-branch cleverly avoids checking out the tree
entirely. But we don't do the next level of optimization: if
you have no index or tree filter, we do not need to read the
index at all.

This can greatly speed up cases where we are only changing
the commit objects (e.g., cementing a graft into place).
Here are numbers from the newly-added perf test:

  Test                  HEAD^              HEAD
  ---------------------------------------------------------------
  7000.2: noop filter   13.81(4.95+0.83)   5.43(0.42+0.43) -60.7%

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-06 09:35:49 -08:00
Junio C Hamano
1551511bdb Merge branch 'jk/filter-branch-use-of-sed-on-incomplete-line'
A recent "filter-branch --msg-filter" broke skipping of the commit
object header, which is fixed.

* jk/filter-branch-use-of-sed-on-incomplete-line:
  filter-branch: remove multi-line headers in msg filter
2015-10-16 14:42:47 -07:00
James McCoy
a5a4b3ff4d filter-branch: remove multi-line headers in msg filter
df062010 (filter-branch: avoid passing commit message through sed)
introduced a regression when filtering commits with multi-line headers,
if the header contains a blank line.  An example of this is a gpg-signed
commit:

  $ git cat-file commit signed-commit
  tree 3d4038e029712da9fc59a72afbfcc90418451630
  parent 110eac945dc1713b27bdf49e74e5805db66971f0
  author A U Thor <author@example.com> 1112912413 -0700
  committer C O Mitter <committer@example.com> 1112912413 -0700
  gpgsig -----BEGIN PGP SIGNATURE-----
   Version: GnuPG v1

   iEYEABECAAYFAlYXADwACgkQE7b1Hs3eQw23CACgldB/InRyDgQwyiFyMMm3zFpj
   pUsAnA+f3aMUsd9mNroloSmlOgL6jIMO
   =0Hgm
   -----END PGP SIGNATURE-----

  Adding gpg

As a consequence, "filter-branch --msg-filter cat" (which should leave the
commit message unchanged) spills the signature (after the internal blank
line) into the original commit message.

The reason is that although the signature is indented, making the line a
whitespace only line, the "read" call is splitting the line based on
the shell's IFS, which defaults to <space><tab><newline>.  The leading
space is consumed and $header_line is empty, causing the "skip header
lines" loop to exit.

The rest of the commit object is then re-used as the rewritten commit
message, causing the new message to include the signature of the
original commit.

Set IFS to an empty string for the "read" call, thus disabling the word
splitting, which causes $header_line to be set to the non-empty value ' '.
This allows the loop to fully consume the header lines before
emitting the original, intact commit message.

[jc: this is literally based on MJG's suggestion]

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: James McCoy <vega.james@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-12 11:23:19 -07:00
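
The IFS behaviour described in the multi-line header fix above can be reproduced with a short sketch. The function name skip_header and the sample input are invented for illustration and are not the script's actual code; the variable name header_line follows the commit message.

```shell
# Skip header lines until the first truly empty line, then emit the rest.
# "IFS= read -r" keeps leading whitespace, so a continuation line that
# holds only a single space is not mistaken for the blank separator line.
skip_header () {
	while IFS= read -r header_line
	do
		test -n "$header_line" || break
	done
	cat
}

# A made-up commit-like object whose multi-line header contains a
# whitespace-only continuation line.
printf 'tree 1234\ngpgsig BEGIN\n \n END\n\nAdding gpg\n' | skip_header
```

With a plain "read header_line" (default IFS), the line holding a single space would come back empty and the loop would stop early, leaking the rest of the header into the message.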
Junio C Hamano
71400d97b1 filter-branch: make report-progress more readable
The name of some variables that are used very locally in this
function were overly long; they were making the lines harder to read
and the longer names didn't add much more information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21 15:19:06 -07:00
Gabor Bernat
6a9d16a0a8 filter-branch: add passed/remaining seconds on progress
Add elapsed seconds and estimated remaining seconds to the progress
output, if getting the current timestamp is supported by the
date +%s command.

Signed-off-by: Gabor Bernat <gabor.bernat@gravityrd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21 15:19:06 -07:00
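
The progress arithmetic above might look something like the sketch below. It assumes a simple linear extrapolation, which the commit message does not spell out; estimate_remaining and its parameters are hypothetical names.

```shell
# Hypothetical helper: given elapsed seconds, items done, and total
# items, estimate the seconds remaining by linear extrapolation.
estimate_remaining () {
	elapsed=$1 done_count=$2 total=$3
	if test "$done_count" -le 0
	then
		echo 0
		return
	fi
	echo $(( elapsed * (total - done_count) / done_count ))
}

# date +%s is not available everywhere, so check that it yields digits
# before relying on it.
now=$(date +%s 2>/dev/null)
case "$now" in
''|*[!0-9]*) echo "timestamps unsupported" ;;
*) echo "timestamps supported" ;;
esac

estimate_remaining 10 25 100
```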
Jeff King
df0620108b filter-branch: avoid passing commit message through sed
On some systems (like OS X), if sed encounters input without
a trailing newline, it will silently add it. As a result,
"git filter-branch" on such systems may silently rewrite
commit messages that omit a trailing newline. Even though
this is not something we generate ourselves with "git
commit", it's better for filter-branch to preserve the
original data as closely as possible.

We're using sed here only to strip the header fields from
the commit object. We can accomplish the same thing with a
shell loop. Since shell "read" calls are slow (usually one
syscall per byte), we use "cat" once we've skipped past the
header. Depending on the size of your commit messages, this
is probably faster (you pay the cost to fork, but then read
the data in saner-sized chunks). This idea is shamelessly
stolen from Junio.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-29 10:01:04 -07:00
Charles Bailey
79bc4ef368 filter-branch: eliminate duplicate mapped parents
When multiple parents of a merge commit get mapped to the same
commit, filter-branch used to pass all instances of the parent
commit to the parent and commit filters and to "git commit-tree" or
"git_commit_non_empty_tree".

This can often happen when extracting a small project from a large
repository; merges can join history with no commits on any branch
which affect the paths being retained.  Once the intermediate
commits have been filtered out, all the immediate parents of the
merge commit can end up being mapped to the same commit - either the
original merge-base or an ancestor of it.

"git commit-tree" would display an error but write the commit with
the normalized parents in any case.  "git_commit_non_empty_tree"
would fail to notice that the commit being made was in fact a
non-merge commit and would retain it even if a further pass with
"--prune-empty" would discard the commit as empty.

Ensure that duplicate parents are pruned before the parent filter to
make "--prune-empty" idempotent, removing all empty non-merge
commits in a single pass.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-01 08:30:41 -07:00
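
Pruning duplicate parents, as described in the commit above, can be sketched in plain shell. The helper dedupe_parents and the placeholder ids are hypothetical; the real script may do this differently.

```shell
# Hypothetical helper: print each parent id once, keeping first-seen order.
dedupe_parents () {
	seen=
	for parent in "$@"
	do
		case " $seen " in
		*" $parent "*) ;;              # already recorded, skip it
		*) seen="$seen $parent" ;;
		esac
	done
	# Drop the leading space left by the first append.
	echo "${seen# }"
}

dedupe_parents aaa bbb aaa ccc bbb
```

If every parent of a merge collapses to the same id, the result is a single parent, which is what lets an empty-commit check treat the commit as a non-merge.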
Junio C Hamano
f52752d36a Merge branch 'lc/filter-branch-too-many-refs'
"git filter-branch" in a repository with many refs blew the limit of
command line length.

* lc/filter-branch-too-many-refs:
  Allow git-filter-branch to process large repositories with lots of branches.
2013-10-17 15:55:12 -07:00
Lee Carver
3361a548db Allow git-filter-branch to process large repositories with lots of branches.
A recommended way to move trees between repositories is to use
git-filter-branch to revise the history for a single tree:

However, this can lead to "argument list too long" errors when the
original repository has many retained branches (>6k)

    /usr/local/git/libexec/git-core/git-filter-branch: line 270:
    /usr/local/git/libexec/git-core/git: Argument list too long
    Could not get the commits

Saving the output from rev-parse and feeding it into rev-list from
its standard input avoids this problem, since the rev-parse output
is not processed as a command line argument.

Signed-off-by: Lee Carver <Lee.Carver@servicenow.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-12 11:00:51 -07:00
Jeff King
83bd7437ca write_index: optionally allow broken null sha1s
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

Some existing repositories may have invalid trees that
contain null sha1s already, though.  Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries.  That now fails
when filter-branch tries to write out the index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.

It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke

	GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null
sha1s.  That would be slightly safer, but requires some
specialized knowledge from the user.  So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees.  Advice on using filter-branch to
remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on
recent versions of git.

Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries.  A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 20:54:43 -07:00
Junio C Hamano
9a11f13d9e Merge branch 'jk/filter-branch-come-back-to-original'
When used with "-d temporary-directory" option, "git filter-branch"
failed to come back to the original working tree to perform the
final clean-up procedure.

* jk/filter-branch-come-back-to-original:
  filter-branch: return to original dir after filtering
2013-04-07 14:29:34 -07:00
Jeff King
97276019bb filter-branch: return to original dir after filtering
The first thing filter-branch does is to create a temporary
directory, either ".git-rewrite" in the current directory
(which may be the working tree or the repository if bare),
or in a directory specified by "-d". We then chdir to
$tempdir/t as our temporary working directory in which to run
tree filters.

After finishing the filter, we then attempt to go back to
the original directory with "cd ../..". This works in the
.git-rewrite case, but if "-d" is used, we end up in a
random directory. The only thing we do after this chdir is
to run git-read-tree, but that means that:

  1. The working directory is not updated to reflect the
     filtered history.

  2. We dump random files into "$tempdir/.." (e.g., if you
     use "-d /tmp/foo", we dump junk into /tmp).

Fix it by recording the full path to the original directory
and returning there explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-02 13:34:55 -07:00
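
The fix described above, recording the full path and returning there explicitly, can be sketched as follows. The function name run_in_tempdir and the scratch directory are assumptions for illustration; only the "$tempdir/t" layout comes from the commit message.

```shell
# Record the absolute starting directory, work elsewhere, then return
# explicitly instead of relying on a relative "cd ../..".
run_in_tempdir () {
	orig_dir=$(pwd)
	tempdir=$1
	mkdir -p "$tempdir/t" &&
	cd "$tempdir/t" &&
	: "filters would run here" &&
	cd "$orig_dir"
}

scratch=$(mktemp -d)
run_in_tempdir "$scratch/rewrite"
pwd
rm -rf "$scratch"
```

A relative "cd ../.." only lands in the right place when the temporary directory sits directly below the starting directory, which is why an absolute path is the safer choice once "-d" can point anywhere.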
Jeff King
3c730fab2c filter-branch: use git-sh-setup's ident parsing functions
This saves us some code, but it also reduces the number of
processes we start for each filtered commit. Since we can
parse both author and committer in the same sed invocation,
we save one process. And since the new interface avoids tr,
we save 4 processes.

It also avoids using "tr", which has had some odd
portability problems reported with Solaris's xpg6
version.

We also tweak one of the tests in t7003 to double-check that
we are properly exporting the variables (because test-lib.sh
exports GIT_AUTHOR_NAME, it will be automatically exported
in subprograms. We override this to make sure that
filter-branch handles it properly itself).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-18 15:43:49 -07:00
Junio C Hamano
9a0231b395 Merge branch 'jc/maint-filter-branch-epoch-date'
In 1.7.9 era, we taught "git rebase" about the raw timestamp format
but we did not teach the same trick to "filter-branch", which rolled
a similar logic on its own.  Because of this, "filter-branch" failed
to rewrite commits with ancient timestamps.

* jc/maint-filter-branch-epoch-date:
  t7003: add test to filter a branch with a commit at epoch
  date.c: Fix off by one error in object-header date parsing
  filter-branch: do not forget the '@' prefix to force git-timestamp
2012-07-22 12:55:05 -07:00
Junio C Hamano
cb102b0832 filter-branch: do not forget the '@' prefix to force git-timestamp
For some reason, this script reinvents, instead of refactoring the
existing one in git-sh-setup, the logic to grab ident information
from an existing commit; it was missed when the corresponding logic
in git-sh-setup was updated with 2c733fb (parse_date(): '@' prefix
forces git-timestamp, 2012-02-02).

Teach the script that it is OK to have a way ancient timestamp in
the commits that are being filtered.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09 20:42:54 -07:00
Junio C Hamano
9c14001650 Merge branch 'jk/filter-branch-require-clean-work-tree'
* jk/filter-branch-require-clean-work-tree:
  filter-branch: use require_clean_work_tree
2011-10-05 12:35:55 -07:00
Jeff King
5347a50fec filter-branch: use require_clean_work_tree
Filter-branch already requires that we have a clean work
tree before starting. However, it failed to refresh the
index before checking, which means it could be wrong in the
case of stat-dirtiness.

Instead of simply adding a call to refresh the index, let's
switch to using the require_clean_work_tree function
provided by git-sh-setup. It does exactly what we want, and
with fewer lines of code and more specific output messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-15 16:58:55 -07:00
Junio C Hamano
1461205880 Merge branch 'js/sh-style'
* js/sh-style:
  filter-branch.sh: de-dent usage string
  misc-sh: fix up whitespace in some other .sh files.
2011-08-17 17:35:50 -07:00
Michael Witten
0906f6e14e filter-branch: Export variable `workdir' for --commit-filter
According to `git help filter-branch':

       --commit-filter <command>
           ...
           You can use the _map_ convenience function in this filter,
           and other convenience functions, too...
           ...

However, it turns out that `map' hasn't been usable because it depends
on the variable `workdir', which is not propagated to the environment
of the shell that runs the commit-filter <command> because the
shell is created via a simple-command rather than a compound-command
subshell:

 @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
                 $(git write-tree) $parentstr < ../message > ../map/$commit ||
                         die "could not write rewritten commit"

One solution is simply to export `workdir'. However, it seems rather
heavy-handed to export `workdir' to the environments of all commands,
so instead this commit exports `workdir' for only the duration of the
shell command in question:

 workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
                 $(git write-tree) $parentstr < ../message > ../map/$commit ||
                         die "could not write rewritten commit"

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-08 12:09:38 -07:00
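
The difference between an unexported variable and a per-command assignment, which the `workdir' commit above relies on, can be seen in isolation with this sketch. The path is a made-up placeholder and plain "sh" stands in for @SHELL_PATH@.

```shell
workdir=/tmp/example-workdir

# Without export, a child shell does not see the variable.
sh -c 'echo "child sees: [${workdir-}]"'

# A prefix assignment places it in the environment of this one
# command only; later commands are unaffected.
workdir=$workdir sh -c 'echo "child sees: [${workdir-}]"'
```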
Junio C Hamano
9dcca58db4 filter-branch.sh: de-dent usage string
"Usage: git filter-branch " that is prefixed to the first line is 25
columns long, so the "[--index-filter ..." on the second line would not
align with "[--env-filter ..." on the first line to begin with. If the
second and subsequent lines do not aim to align with anything on the
first line, it is just fine to indent them with a single HT.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-05 15:06:21 -07:00
Jon Seymour
285c6cbf3c misc-sh: fix up whitespace in some other .sh files.
I found that the 4 files patched here came out different when this
filter was applied.

	expand -i | unexpand --first-only

This patch contains the corrected files.

Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-05 15:04:48 -07:00
Csaba Henk
7ec344d802 filter-branch: retire --remap-to-ancestor
We can be clever and know by ourselves when we need the behavior
implied by "--remap-to-ancestor". No need to encumber users by having
them exposed to it as a tunable. (Option kept for backward compatibility,
but it's now a no-op.)

Signed-off-by: Csaba Henk <csaba@gluster.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-27 16:47:01 -07:00
Junio C Hamano
618d18b5aa Merge branch 'maint'
* maint:
  filter-branch: Fix error message for --prune-empty --commit-filter
2010-02-11 23:06:32 -08:00
Jacob Helwig
5da8171370 filter-branch: Fix error message for --prune-empty --commit-filter
Running filter-branch with --prune-empty and --commit-filter reports:

  "Cannot set --prune-empty and --filter-commit at the same time".

Change it to use the correct option name: --commit-filter

Signed-off-by: Jacob Helwig <jacob.helwig@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-11 22:12:36 -08:00