When bundleURI interface fetches multiple bundles, Git failed to
take full advantage of all bundles and ended up slurping duplicated
objects.
* xx/bundie-uri-fixes:
unbundle: extend object verification for fetches
fetch-pack: expose fsckObjects configuration logic
bundle-uri: verify oid before writing refs
The existing fetch.fsckObjects and transfer.fsckObjects configurations
were not fully applied to bundle-involved fetches, including direct
bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
object verification support for unbundle.
This commit extends object verification support in `bundle.c:unbundle`
by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
this option is enabled, we append the `--fsck-objects` flag to
`git-index-pack`.
The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
to enable this option for `bundle.c:unbundle`, specifically in:
- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
This addition ensures a consistent logic for object verification during
fetches. Tests have been added to confirm functionality in the scenarios
mentioned above.
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a bundle, we set up a revision walk, but never release
data associated with it. Furthermore, we create a mostly-shallow copy of
that revision walk where we only adapt its pending objects such that we
can reuse the walk. While that copy must not be released, the pending
objects array need to be.
Plug those memory leaks by releasing the revision walk and the pending
objects of the copied revision walk.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They are equivalents and the former still exists, so as long as the
only change this commit makes are to rewrite test_i18ngrep to
test_grep, there won't be any new bug, even if there still are
callers of test_i18ngrep remaining in the tree, or when merged to
other topics that add new uses of test_i18ngrep.
This patch was produced more or less with
git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' |
xargs perl -p -i -e 's/test_i18ngrep /test_grep /'
and a good way to sanity check the result yourself is to run the
above in a checkout of c4603c1c (test framework: further deprecate
test_i18ngrep, 2023-10-31) and compare the resulting working tree
contents with the result of applying this patch to the same commit.
You'll see that test_i18ngrep in a few t/lib-*.sh files corrected,
in addition to the manual reproduction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this test, we currently use the SHA1 prerequisite to specify the
algorithm we're using to test, since SHA-256 bundles are always v3,
whereas SHA-1 bundles default to v2, and as a result the default output
differs.
However, this causes a problem if we run with GIT_TEST_FAIL_PREREQS set,
since that means that we'll unexpectedly fail the SHA1 prerequisite,
resulting in incorrect expected output. Let's fix this by checking
against the built-in data called "algo", which tells us which algorithm
is in use. This should work in any situation, making our test a little
more robust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.
This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.
But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.
Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.
So let's use test_cmp instead, and also in the other nearby tests
where it's easy.
This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.
I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git bundle" learns "--stdin" option to read its refs from the
standard input. Also, it now does not lose refs whey they point
at the same object.
* jx/bundle:
bundle: arguments can be read from stdin
bundle: lost objects when removing duplicate pendings
test: add helper functions for git-bundle
In order to create an incremental bundle, we need to pass many arguments
to let git-bundle ignore some already packed commits. It will be more
convenient to pass args via stdin. But the current implementation does
not allow us to do this.
This is because args are parsed twice when creating bundle. The first
time for parsing args is in `compute_and_write_prerequisites()` by
running `git-rev-list` command to write prerequisites in bundle file,
and stdin is consumed in this step if "--stdin" option is provided for
`git-bundle`. Later nothing can be read from stdin when running
`setup_revisions()` in `create_bundle()`.
The solution is to parse args once by removing the entire function
`compute_and_write_prerequisites()` and then calling function
`setup_revisions()`. In order to write prerequisites for bundle, will
call `prepare_revision_walk()` and `traverse_commit_list()`. But after
calling `prepare_revision_walk()`, the object array `revs.pending` is
left empty, and the following steps could not work properly with the
empty object array (`revs.pending`). Therefore, make a copy of `revs`
to `revs_copy` for later use right after calling `setup_revisions()`.
The copy of `revs_copy` is not a deep copy, it shares the same objects
with `revs`. The object array of `revs` has been cleared, but objects
themselves are still kept. Flags of objects may change after calling
`prepare_revision_walk()`, we can use these changed flags without
calling the `git rev-list` command and parsing its output like the
former implementation.
Also add testcases for git bundle in t6020, which read args from stdin.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -- t5[6-9]*.sh)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`)
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specificially, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; They were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we detect the hash algorithm in use by the length of the
object ID. This is inelegant and prevents us from using a different
hash algorithm that is also 256 bits in length.
Since we cannot extend the v2 format in a backward-compatible way, let's
add a v3 format, which is identical, except for the addition of
capabilities, which are prefixed by an at sign. We add "object-format"
as the only capability and reject unknown capabilities, since we do not
have a network connection and therefore cannot negotiate with the other
side.
For compatibility, default to the v2 format for SHA-1 and require v3
for SHA-256.
In t5510, always use format v3 so we can be sure we produce consistent
results across hash algorithms. Since head -n N lists the top N lines
instead of the Nth line, let's run our output through sed to normalize
it and compare it against a fixed value, which will make sure we get
exactly what we're expecting.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we plan on only allowing `test_must_fail` to work on a
restricted subset of commands, including `git`. Reorder the commands so
that `nongit` comes before `test_must_fail`. This way, `test_must_fail`
operates on a git command.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use $OID_REGEX instead of a hard-coded regular expression.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only transport that does not allow fetch() to be called before
get_refs_list() is the bundle transport. Clean up the code by teaching
the bundle transport the ability to do this, and removing support for
transports that don't support this order of invocation.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The deal with bundles is: they really are thin packs, with very little
sugar on top. So we really need a repository (or more appropriately, an
object database) to work with, when asked to verify a bundle.
Let's error out with a useful error message if `git bundle verify` is
called without such an object database to work with.
Reported by Konstantin Ryabitsev.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a bundle to a file, the bundle code actually creates
"your.bundle.lock" using our lockfile interface. We feed that output
descriptor to a child git-pack-objects via run-command, which has the
quirk that it closes the output descriptor in the parent.
To avoid confusing the lockfile code (which still thinks the descriptor
is valid), we dup() it, and operate on the duplicate.
However, this has a confusing side effect: after the dup() but before we
call pack-objects, we have _two_ descriptors open to the lockfile. If we
call die() during that time, the lockfile code will try to clean up the
partially-written file. It knows to close() the file before unlinking,
since on some platforms (i.e., Windows) the open file would block the
deletion. But it doesn't know about the duplicate descriptor. On
Windows, triggering an error at the right part of the code will result
in the cleanup failing and the lockfile being left in the filesystem.
We can solve this by moving the dup() much closer to start_command(),
shrinking the window in which we have the second descriptor open. It's
easy to place this in such a way that no die() is possible. We could
still die due to a signal in the exact wrong moment, but we already
tolerate races there (e.g., a signal could come before we manage to put
the file on the cleanup list in the first place).
As a bonus, this shields create_bundle() itself from the duplicate-fd
trick, and we can simplify its error handling (note that the lock
rollback now happens unconditionally, but that's OK; it's a noop if we
didn't open the lock in the first place).
The included test uses an empty bundle to cause a failure at the right
spot in the code, because that's easy to trigger (the other likely
errors are write() problems like ENOSPC). Note that it would already
pass on non-Windows systems (because they are happy to unlink an
already-open file).
Based-on-a-patch-by: Gaël Lhez <gael.lhez@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Tested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to find a good spot for testing clone with submodules, I
got confused where to add a new test file. There are both tests in t560*
as well as t57* both testing the clone command. t/README claims the
second digit is to indicate the command, which is inconsistent to the
current naming structure.
Rename all t57* tests to be in t56* to follow the pattern of the digits
as laid out in t/README.
It would have been less work to rename t56* => t57* because there are less
files, but the tests in t56* look more basic and I assumed the higher the
last digits the more complicated niche details are tested, so with the patch
now it looks more in order to me.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>