git/t/t7008-filter-branch-null-sha1.sh

57 lines
1.4 KiB
Bash
Raw Normal View History

write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 04:41:12 +08:00
#!/bin/sh
test_description='filter-branch removal of trees with null sha1'
revisions API: have release_revisions() release "mailmap" Extend the the release_revisions() function so that it frees the "mailmap" in the "struct rev_info". The log family of functions now calls the clear_mailmap() function added in fa8afd18e5a (revisions API: provide and use a release_revisions(), 2021-09-19), allowing us to whitelist some tests with "TEST_PASSES_SANITIZE_LEAK=true". Unfortunately having a pointer to a mailmap in "struct rev_info" instead of an embedded member that we "own" get a bit messy, as can be seen in the change to builtin/commit.c. When we free() this data we won't be able to tell apart a pointer to a "mailmap" on the heap from one on the stack. As seen in ea57bc0d41b (log: add --use-mailmap option, 2013-01-05) the "log" family allocates it on the heap, but in the find_author_by_nickname() code added in ea16794e430 (commit: search author pattern against mailmap, 2013-08-23) we allocated it on the stack instead. Ideally we'd simply change that member to a "struct string_list mailmap" and never free() the "mailmap" itself, but that would be a much larger change to the revisions API. We have code that needs to hand an existing "mailmap" to a "struct rev_info", while we could change all of that, let's not go there now. The complexity isn't in the ownership of the "mailmap" per-se, but that various things assume a "rev_info.mailmap == NULL" means "doesn't want mailmap", if we changed that to an init'd "struct string_list we'd need to carefully refactor things to change those assumptions. Let's instead always free() it, and simply declare that if you add such a "mailmap" it must be allocated on the heap. Any modern libc will correctly panic if we free() a stack variable, so this should be safe going forward. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-14 04:01:46 +08:00
write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 04:41:12 +08:00
. ./test-lib.sh
test_expect_success 'setup: base commits' '
test_commit one &&
test_commit two &&
test_commit three
'
test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
{
git ls-tree HEAD &&
printf "160000 commit $ZERO_OID\\tbroken\\n"
} >broken-tree &&
write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 04:41:12 +08:00
echo "add broken entry" >msg &&
tree=$(git mktree <broken-tree) &&
test_tick &&
commit=$(git commit-tree $tree -p HEAD <msg) &&
git update-ref HEAD "$commit"
'
# we have to make one more commit on top removing the broken
# entry, since otherwise our index does not match HEAD (and filter-branch will
# complain). We could make the index match HEAD, but doing so would involve
# writing a null sha1 into the index.
test_expect_success 'setup: bring HEAD and index in sync' '
test_tick &&
git commit -a -m "back to normal"
'
cache-tree: reject entries with null sha1 We generally disallow null sha1s from entering the index, due to 4337b5856 (do not write null sha1s to on-disk index, 2012-07-28). However, we loosened that in 83bd7437c (write_index: optionally allow broken null sha1s, 2013-08-27) so that tools like filter-branch could be used to repair broken history. However, we should make sure that these broken entries do not get propagated into new trees. For most entries, we'd catch them with the missing-object check (since presumably the null sha1 does not exist in our object database). But gitlink entries do not need reachability, so we may blindly copy the entry into a bogus tree. This patch rejects all null sha1s (with the same "invalid entry" message that missing objects get) when building trees from the index. It does so even for non-gitlinks, and even when "write-tree" is given the --missing-ok flag. The null sha1 is a special sentinel value that is already rejected in trees by fsck; whether the object exists or not, it is an error to put it in a tree. Note that for this to work, we must also avoid reusing an existing cache-tree that contains the null sha1. This patch does so by just refusing to write out any cache tree when the index contains a null sha1. This is blunter than we need to be; we could just reject the subtree that contains the offending entry. But it's not worth the complexity. The behavior is unchanged unless you have a broken index entry, and even then we'd refuse the whole index write unless the emergency GIT_ALLOW_NULL_SHA1 is in use. And even then the end result is only a performance drop (any write-tree will have to generate the whole cache-tree from scratch). The tests bear some explanation. The existing test in t7009 doesn't catch this problem, because our index-filter runs "git rm --cached", which will try to rewrite the updated index and barf on the bogus entry. So we never even make it to write-tree. The new test there adds a noop index-filter, which does show the problem. The new tests in t1601 are slightly redundant with what filter-branch is doing under the hood in t7009. But as they're much more direct, they're easier to reason about. And should filter-branch ever change or go away, we'd want to make sure that these plumbing commands behave sanely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-22 02:46:17 +08:00
test_expect_success 'noop filter-branch complains' '
test_must_fail git filter-branch \
--force --prune-empty \
--index-filter "true"
'
write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories may have invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 04:41:12 +08:00
test_expect_success 'filter commands are still checked' '
test_must_fail git filter-branch \
--force --prune-empty \
--index-filter "git rm --cached --ignore-unmatch three.t"
'
test_expect_success 'removing the broken entry works' '
echo three >expect &&
git filter-branch \
--force --prune-empty \
--index-filter "git rm --cached --ignore-unmatch broken" &&
git log -1 --format=%s >actual &&
test_cmp expect actual
'
test_done