global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from gaining new dependencies on it in future changes,
be it explicit or implicit.
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "builtin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:50:23 +08:00
#define USE_THE_REPOSITORY_VARIABLE
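The opt-in described in the commit message works by compiling the global's declaration only for units that define the macro. Any unit that has not opted in therefore fails to build as soon as it touches the variable. A minimal sketch of that pattern follows. It is not Git's repository.h: `struct repo`, `the_repo`, `USE_THE_REPO_VARIABLE` and the functions are hypothetical stand-ins.

```c
/* Opt in, like the #define above (hypothetical macro name). */
#define USE_THE_REPO_VARIABLE

/* ---- hypothetical header, standing in for "repository.h" ---- */
struct repo {
	int object_count;
};

/* Preferred style: the repository is passed explicitly. */
int repo_object_count(const struct repo *r);

#ifdef USE_THE_REPO_VARIABLE
/* The global is declared only for units that opted in. */
extern struct repo *the_repo;
#endif
/* ---- end of hypothetical header ---- */

/* An opted-in caller may still rely on the global. */
int count_in_default_repo(void)
{
	return repo_object_count(the_repo);
}

/* ---- library side, normally in a separate .c file ---- */
static struct repo main_repo = { 42 };
struct repo *the_repo = &main_repo;

int repo_object_count(const struct repo *r)
{
	return r->object_count;
}
```

If you remove the `#define USE_THE_REPO_VARIABLE` line, `count_in_default_repo()` no longer compiles. `repo_object_count()` keeps working because it receives the repository as a parameter.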
2023-04-11 15:41:48 +08:00
#include "git-compat-util.h"
2018-03-24 01:45:21 +08:00
#include "repository.h"
2017-06-15 02:07:36 +08:00
#include "config.h"
2023-04-23 04:17:26 +08:00
#include "date.h"
2023-03-21 14:26:03 +08:00
#include "environment.h"
2023-03-21 14:25:54 +08:00
#include "gettext.h"
2023-02-24 08:09:27 +08:00
#include "hex.h"
2014-10-01 18:28:42 +08:00
#include "lockfile.h"
2012-10-26 23:53:55 +08:00
#include "refs.h"
#include "pkt-line.h"
#include "commit.h"
#include "tag.h"
#include "pack.h"
#include "sideband.h"
#include "fetch-pack.h"
#include "remote.h"
#include "run-command.h"
2013-07-09 04:56:53 +08:00
#include "connect.h"
2023-04-11 11:00:38 +08:00
#include "trace2.h"
2012-10-26 23:53:55 +08:00
#include "version.h"
2020-03-30 22:03:46 +08:00
#include "oid-array.h"
2017-05-16 01:32:20 +08:00
#include "oidset.h"
2017-08-19 06:20:26 +08:00
#include "packfile.h"
2023-05-16 14:34:06 +08:00
#include "object-store-ll.h"
2023-05-16 14:33:59 +08:00
#include "path.h"
fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.
To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.
To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.
This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 06:08:43 +08:00
#include "connected.h"
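The ordering fix and the "only if necessary" condition from the commit message above can be modeled in a few lines. This is a simplified sketch, not fetch_pack() itself:

- `struct fetch_state`, `needs_own_connectivity_check()` and `finish_fetch()` are hypothetical.
- The shallow-file write is reduced to a flag.
- The connectivity result is passed in rather than computed by running rev-list.

```c
#include <stdbool.h>

/* Hypothetical summary of the fetch being performed. */
struct fetch_state {
	bool is_clone;        /* cloning into an empty repository */
	bool uses_deepen;     /* shallow/deepen functionality involved */
	bool shallow_updated; /* stand-in for "shallow file was written" */
	bool self_checked;    /* reported upward to the caller */
};

/* Check only when needed: not a clone, and shallow/deepen is involved. */
static bool needs_own_connectivity_check(const struct fetch_state *st)
{
	return !st->is_clone && st->uses_deepen;
}

/*
 * Check connectivity first and update the shallow file only on success,
 * so a failed check never leaves a shallow file that must be rolled back.
 * Returns 0 on success, -1 if the fetched objects are not connected.
 */
static int finish_fetch(struct fetch_state *st, bool objects_connected)
{
	if (needs_own_connectivity_check(st)) {
		if (!objects_connected)
			return -1;          /* shallow file left untouched */
		st->self_checked = true;
	}
	st->shallow_updated = true;
	return 0;
}
```

Because the check runs before the shallow file is touched, a failure needs no rollback. `self_checked` models the report that can let "git fetch" skip its own, redundant check.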
2018-06-15 06:54:28 +08:00
#include "fetch-negotiator.h"
2018-07-27 22:37:17 +08:00
#include "fsck.h"
2020-05-01 03:48:50 +08:00
#include "shallow.h"
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independently of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
#include "commit-reach.h"
#include "commit-graph.h"
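The "wait-for-done" argument only matters for the server's decision about when to send the packfile. Below is a hypothetical model of that decision, not upload-pack's code; `struct round` and `server_sends_packfile()` are invented for illustration. Without the argument, the server may send the pack as soon as it considers negotiation ready. With it, the server waits for "done", which a negotiation-only client never sends.

```c
#include <stdbool.h>

/* Hypothetical inputs to the server's decision in one round. */
struct round {
	bool wait_for_done;   /* client sent the "wait-for-done" argument */
	bool server_ready;    /* server thinks enough common commits are known */
	bool client_done;     /* client sent "done" */
};

/* Returns true if the server would send the packfile in this round. */
static bool server_sends_packfile(const struct round *r)
{
	if (r->client_done)
		return true;
	/* Without wait-for-done, the server may decide unilaterally. */
	return !r->wait_for_done && r->server_ready;
}
```

A negotiation-only client sets `wait_for_done`, never sets `client_done`, and simply stops sending requests once it is satisfied.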
fetch-pack: ignore SIGPIPE when writing to index-pack
When fetching, we send the incoming pack to index-pack (or
unpack-objects) via the sideband demuxer. If index-pack hits an error
(e.g., because an object fails fsck), then it will die immediately. This
may cause us to get SIGPIPE on the fetch, as we're still trying to write
pack contents from the sideband demuxer (which is typically a thread,
and thus takes down the whole fetch process).
You can see this in action with:

  ./t5702-protocol-v2.sh --stress --run=59

which ends with (wrapped for readability):

  test_must_fail: died by signal 13: git -c protocol.version=2 \
          -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \
          clone http://127.0.0.1:5708/smart/http_parent http_child
  not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object
This is mostly cosmetic. The actual error of interest (in this case, the
object that failed the fsck check) comes from index-pack straight to
stderr, so the user still sees it. They _might_ even see fetch-pack
complaining about index-pack failing, because the main thread is racing
with the sideband-demuxer. But they'll definitely see the signal death
in the exit code, which is what the test is complaining about.
We can make this more predictable by just ignoring SIGPIPE. The sideband
demuxer uses write_or_die(), so it will notice and stop (gracefully,
because we hook die_routine() to exit just the thread). And during this
section we're not writing anywhere else where we'd be concerned about
SIGPIPE preventing us from wasting effort writing to nowhere.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-20 04:58:55 +08:00
#include "sigchain.h"
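Ignoring SIGPIPE means that a write to a pipe whose reader has gone away fails with EPIPE instead of killing the process. That failure is what lets write_or_die() notice and stop. The sketch below demonstrates the effect with plain POSIX signal(), pipe() and write(). Git itself routes signal handling through the helpers in "sigchain.h" (included above), which this sketch does not use. The function name is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is already closed, as when index-pack
 * dies mid-transfer. With SIGPIPE ignored, write() returns -1 with errno
 * set to EPIPE instead of the process being terminated.
 * Returns that errno, or 0 if the write unexpectedly succeeded or
 * signal() failed.
 */
static int write_to_dead_reader(void)
{
	int fds[2];
	int err = 0;
	void (*old)(int);

	if (pipe(fds) < 0)
		return errno;
	close(fds[0]);                  /* the reader goes away */

	old = signal(SIGPIPE, SIG_IGN);
	if (old != SIG_ERR) {
		if (write(fds[1], "PACK", 4) < 0)
			err = errno;
		signal(SIGPIPE, old);   /* restore the previous disposition */
	}

	close(fds[1]);
	return err;
}
```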
2022-07-17 00:59:59 +08:00
#include "mergesort.h"

2012-10-26 23:53:55 +08:00
static int transfer_unpack_limit = -1;
static int fetch_unpack_limit = -1;
static int unpack_limit = 100;
static int prefer_ofs_delta = 1;
static int no_done;
2016-06-12 18:53:59 +08:00
static int deepen_since_ok;
2016-06-12 18:54:04 +08:00
static int deepen_not_ok;
2012-10-26 23:53:55 +08:00
static int fetch_fsck_objects = -1;
static int transfer_fsck_objects = -1;
static int agent_supported;
2017-12-08 23:58:40 +08:00
static int server_supports_filtering;
2020-11-12 07:29:31 +08:00
static int advertise_sid;
2020-05-01 03:48:57 +08:00
static struct shallow_lock shallow_lock;
2013-05-26 09:16:15 +08:00
static const char *alternate_shallow_file;
2021-03-28 21:15:51 +08:00
static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
2018-07-27 22:37:17 +08:00
static struct strbuf fsck_msg_types = STRBUF_INIT;
2020-06-11 04:57:23 +08:00
static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
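The three unpack-limit variables above hold a configuration fallback, with -1 meaning "not configured":

- fetch.unpackLimit takes precedence.
- transfer.unpackLimit applies when fetch.unpackLimit is unset.
- 100 is the default.

Per the documented semantics of those config variables, a received pack with fewer objects than the limit is unpacked into loose objects. Otherwise it is kept as a pack. The sketch below models that decision; the function names are hypothetical, and this is not the code that parses the configuration.

```c
#include <stdbool.h>

/* -1 means "not set in the configuration", matching the statics above. */
static int effective_unpack_limit(int fetch_limit, int transfer_limit)
{
	if (fetch_limit >= 0)
		return fetch_limit;      /* fetch.unpackLimit wins */
	if (transfer_limit >= 0)
		return transfer_limit;   /* fall back to transfer.unpackLimit */
	return 100;                  /* built-in default */
}

/* Below the limit: explode into loose objects; otherwise keep the pack. */
static bool should_unpack(unsigned long nr_objects, int limit)
{
	return nr_objects < (unsigned long)limit;
}
```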
2012-10-26 23:53:55 +08:00

2014-03-25 21:23:26 +08:00
/* Remember to update object flag allocation in object.h */
2012-10-26 23:53:55 +08:00
#define COMPLETE (1U << 0)
2018-06-15 06:54:28 +08:00
#define ALTERNATE (1U << 1)
2021-05-05 05:16:01 +08:00
#define COMMON (1U << 6)
#define REACH_SCRATCH (1U << 7)
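Each flag macro above is a single bit in an object's flags word. That is why the comment points at the shared allocation table in object.h: every user of the word must claim distinct bits. The small sketch below sets, clears and tests such bits. It uses a hypothetical `struct obj` in place of Git's struct object and copies the bit values shown above.

```c
/* Bit values copied from the definitions above. */
#define COMPLETE      (1U << 0)
#define ALTERNATE     (1U << 1)
#define COMMON        (1U << 6)
#define REACH_SCRATCH (1U << 7)

/* Hypothetical stand-in for the flags field of struct object. */
struct obj {
	unsigned flags;
};

static void mark(struct obj *o, unsigned bits)   { o->flags |= bits; }
static void unmark(struct obj *o, unsigned bits) { o->flags &= ~bits; }

/* True only if every bit in `bits` is set. */
static int has_all(const struct obj *o, unsigned bits)
{
	return (o->flags & bits) == bits;
}
```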
2012-10-26 23:53:55 +08:00

/*
 * After sending this many "have"s if we do not get any new ACK, we
 * give up traversing our history.
 */
#define MAX_IN_VAIN 256
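The comment above describes a give-up rule for the walk that sends "have" lines. The simplified model below assumes each sent "have" either did or did not produce a new ACK. It follows the comment rather than find_common()'s exact bookkeeping, which this view does not show, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IN_VAIN 256

/*
 * acked[i] says whether sending the i-th "have" produced a new ACK.
 * Returns how many "have"s are sent before giving up, or n if the
 * input runs out first.
 */
static size_t haves_sent_before_giving_up(const bool *acked, size_t n)
{
	size_t in_vain = 0;   /* "have"s sent since the last new ACK */
	size_t i;

	for (i = 0; i < n; i++) {
		if (acked[i])
			in_vain = 0;
		else if (++in_vain >= MAX_IN_VAIN)
			return i + 1;   /* give up traversing history */
	}
	return n;
}
```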
2018-06-15 06:54:26 +08:00
static int multi_ack, use_sideband;
2015-05-22 04:23:38 +08:00
/* Allow specifying sha1 if it is a ref tip. */
#define ALLOW_TIP_SHA1 01
2015-05-22 04:23:39 +08:00
/* Allow request of a sha1 if it is reachable from a ref (possibly hidden ref). */
#define ALLOW_REACHABLE_SHA1 02
2015-05-22 04:23:38 +08:00
static unsigned int allow_unadvertised_object_request;
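`allow_unadvertised_object_request` is a bitmask of the two ALLOW_* values. The sketch below shows how such a mask could be derived from the server's capability list, then consulted before asking for an object the server did not advertise. "allow-tip-sha1-in-want" and "allow-reachable-sha1-in-want" are the documented capability names. The helper names and the naive substring matching are hypothetical simplifications.

```c
#include <stdbool.h>
#include <string.h>

#define ALLOW_TIP_SHA1       01
#define ALLOW_REACHABLE_SHA1 02

/* Hypothetical: build the mask from a space-separated capability list. */
static unsigned allow_mask_from_caps(const char *caps)
{
	unsigned mask = 0;

	if (strstr(caps, "allow-tip-sha1-in-want"))
		mask |= ALLOW_TIP_SHA1;
	if (strstr(caps, "allow-reachable-sha1-in-want"))
		mask |= ALLOW_REACHABLE_SHA1;
	return mask;
}

/*
 * May we ask for an unadvertised object? Any reachable object is fine
 * under ALLOW_REACHABLE_SHA1; a (possibly hidden) ref tip is also fine
 * under ALLOW_TIP_SHA1.
 */
static bool may_request_unadvertised(unsigned mask, bool is_ref_tip)
{
	if (mask & ALLOW_REACHABLE_SHA1)
		return true;
	return is_ref_tip && (mask & ALLOW_TIP_SHA1);
}
```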
2012-10-26 23:53:55 +08:00

2016-06-12 18:53:54 +08:00
__attribute__((format (printf, 2, 3)))
static inline void print_verbose(const struct fetch_pack_args *args,
				 const char *fmt, ...)
{
	va_list params;

	if (!args->verbose)
		return;

	va_start(params, fmt);
	vfprintf(stderr, fmt, params);
	va_end(params);
	fputc('\n', stderr);
}
fetch-pack: cache results of for_each_alternate_ref
We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.
Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.
The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.
However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.
Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.
There are two extra optimizations I didn't do here:
- we actually store an array of pointer-to-object.
Technically we could just walk the obj_hash table
looking for entries with the ALTERNATE flag set (because
our use case doesn't care about the order here).
But that hash table may be mostly composed of
non-ALTERNATE entries, so we'd waste time walking over
them. So it would be a slight win in memory use, but a
loss in CPU.
- the items we pull out of the cache are actual "struct
object"s, but then we feed "obj->sha1" to our
sub-functions, which promptly call parse_object().
This second parse is cheap, because it starts with
lookup_object() and will bail immediately when it sees
we've already parsed the object. We could save the extra
hash lookup, but it would involve refactoring the
functions we call. It may or may not be worth the
trouble.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-09 04:53:03 +08:00
struct alternate_object_cache {
	struct object **items;
	size_t nr, alloc;
};
2018-10-09 02:09:23 +08:00
static void cache_one_alternate(const struct object_id *oid,
2017-02-09 04:53:03 +08:00
				void *vcache)
{
	struct alternate_object_cache *cache = vcache;
2018-06-29 09:21:51 +08:00
	struct object *obj = parse_object(the_repository, oid);
2017-02-09 04:53:03 +08:00

	if (!obj || (obj->flags & ALTERNATE))
		return;

	obj->flags |= ALTERNATE;
	ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
	cache->items[cache->nr++] = obj;
}
2018-06-15 06:54:28 +08:00
static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
				      void (*cb)(struct fetch_negotiator *,
2018-06-15 06:54:26 +08:00
						 struct object *))
2017-02-09 04:53:03 +08:00
{
	static int initialized;
	static struct alternate_object_cache cache;
	size_t i;

	if (!initialized) {
		for_each_alternate_ref(cache_one_alternate, &cache);
		initialized = 1;
	}

	for (i = 0; i < cache.nr; i++)
2018-06-15 06:54:28 +08:00
		cb(negotiator, cache.items[i]);
2012-10-26 23:53:55 +08:00
}
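The core idea of the "cache results of for_each_alternate_ref" message can be shown without Git's object machinery. A flag bit on the object itself serves as the "already seen" set, so duplicates are dropped before they are appended to a growable pointer array. In this sketch, `struct obj`, `struct obj_cache` and `cache_one()` are hypothetical, and plain realloc() doubling stands in for ALLOC_GROW. Only the ALTERNATE bit value is taken from above.

```c
#include <stdlib.h>

#define ALTERNATE (1U << 1)

struct obj {
	unsigned flags;
};

struct obj_cache {
	struct obj **items;
	size_t nr, alloc;
};

/*
 * Append obj unless it is NULL or was already cached; the ALTERNATE bit
 * on the object is the de-duplication set, so no separate hash is needed.
 * Returns 1 if appended, 0 if skipped, -1 on allocation failure.
 */
static int cache_one(struct obj_cache *cache, struct obj *obj)
{
	if (!obj || (obj->flags & ALTERNATE))
		return 0;
	if (cache->nr + 1 > cache->alloc) {
		size_t alloc = cache->alloc ? 2 * cache->alloc : 16;
		struct obj **items = realloc(cache->items, alloc * sizeof(*items));

		if (!items)
			return -1;
		cache->items = items;
		cache->alloc = alloc;
	}
	obj->flags |= ALTERNATE;
	cache->items[cache->nr++] = obj;
	return 1;
}
```

As the message explains, keeping this separate array avoids walking the whole object hash table, which may hold mostly non-ALTERNATE entries, every time the cached alternates are replayed.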
fetch-pack: die if in commit graph but not obj db
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.
However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.
Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)
This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to warn about
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.
t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-06 03:24:19 +08:00
static void die_in_commit_graph_only(const struct object_id *oid)
{
	die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n"
	      "This is probably due to repo corruption.\n"
	      "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."),
	    oid_to_hex(oid));
}
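The commit messages around deref_without_lazy_fetch() describe one lookup strategy:

1. Try the commit-graph first as a fast path.
2. Treat a commit that the commit-graph knows but the object database lacks as corruption.
3. Otherwise, read the object and peel tags until a non-tag is reached.

The self-contained model below is not Git's code. The toy object table, its `in_graph`/`in_odb` bits and all names are hypothetical stand-ins for lookup_commit_in_graph(), has_object() and the tag-peeling path. The die() call is replaced by a status code so the sketch can be tested, and tag marking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TAG, TOY_BLOB };

struct toy_obj {
	int id;
	enum toy_type type;
	int tag_target;   /* id the tag points to (tags only) */
	bool in_odb;      /* present in the object database */
	bool in_graph;    /* listed in the commit-graph file */
};

enum deref_result { DEREF_COMMIT, DEREF_NOT_COMMIT, DEREF_MISSING, DEREF_GRAPH_ONLY };

static const struct toy_obj *find(const struct toy_obj *db, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (db[i].id == id)
			return &db[i];
	return NULL;
}

static enum deref_result deref(const struct toy_obj *db, size_t n, int id,
			       bool check_obj_db, int *commit_id)
{
	const struct toy_obj *o = find(db, n, id);

	/* Fast path: the commit-graph answers without reading the object. */
	if (o && o->in_graph) {
		if (check_obj_db && !o->in_odb)
			return DEREF_GRAPH_ONLY;   /* Git dies at this point */
		*commit_id = o->id;
		return DEREF_COMMIT;
	}

	/* Slow path: read the object, peeling tags (assumes no tag cycles). */
	while (o && o->in_odb && o->type == TOY_TAG)
		o = find(db, n, o->tag_target);
	if (!o || !o->in_odb)
		return DEREF_MISSING;
	if (o->type != TOY_COMMIT)
		return DEREF_NOT_COMMIT;
	*commit_id = o->id;
	return DEREF_COMMIT;
}
```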
2024-11-06 03:24:18 +08:00
static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
2024-11-06 03:24:19 +08:00
					       int mark_tags_complete_and_check_obj_db)
2020-08-18 12:01:35 +08:00
{
2024-11-06 03:24:18 +08:00
	enum object_type type;
	struct object_info info = { .typep = &type };
fetch-pack: optimize loading of refs via commit graph
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.
In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight speed
bump of about 2% in a huge repository with about 2.3M refs:
    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
      Range (min … max):   31.280 s … 31.896 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
      Range (min … max):   30.172 s … 31.479 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
In case this fails, we fall back to the old code which peels the
objects to a commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 21:09:54 +08:00
	struct commit *commit;

	commit = lookup_commit_in_graph(the_repository, oid);
fetch-pack: die if in commit graph but not obj db
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.
However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.
Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)
This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to warn about
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.
t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
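The check described above can be sketched with hypothetical in-memory stand-ins (`struct toy_store`, `toy_check()`) for the commit-graph file and the object DB. In the real code, the object-DB probe is `has_object()` and the fatal path is `die_in_commit_graph_only()`, as the following hunk shows. The sketch differs from the real code in two ways:

- It returns the corrupt case as a status instead of dying.
- It always performs the object-DB check, whereas the real code does so only when `mark_tags_complete_and_check_obj_db` is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object store: a list of object IDs. */
struct toy_store {
	const char **ids;
	size_t nr;
};

enum toy_status { TOY_MISSING, TOY_PRESENT, TOY_CORRUPT };

static bool toy_contains(const struct toy_store *s, const char *id)
{
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return true;
	return false;
}

/*
 * Consult the (fast) commit graph first, but do not trust a hit until
 * the object DB confirms it. A graph hit with no backing object is
 * reported as TOY_CORRUPT; the real fetch-pack code dies at that point.
 */
static enum toy_status toy_check(const struct toy_store *graph,
				 const struct toy_store *odb,
				 const char *id)
{
	if (toy_contains(graph, id))
		return toy_contains(odb, id) ? TOY_PRESENT : TOY_CORRUPT;
	return toy_contains(odb, id) ? TOY_PRESENT : TOY_MISSING;
}
```

Reporting "present" for a graph-only object is what let the lazy fetch complete with nothing fetched. Surfacing TOY_CORRUPT breaks that cycle.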
2024-11-06 03:24:19 +08:00
	if (commit) {
		if (mark_tags_complete_and_check_obj_db) {
			if (!has_object(the_repository, oid, 0))
				die_in_commit_graph_only(oid);
		}
fetch-pack: optimize loading of refs via commit graph
In order to negotiate a packfile, we need to dereference refs to see
which commits we have in common with the remote. To do so, we first look
up the object's type -- if it's a tag, we peel until we hit a non-tag
object. If we hit a commit eventually, then we return that commit.
In case the object ID points to a commit directly, we can avoid the
initial lookup of the object type by opportunistically looking up the
commit via the commit-graph, if available, which gives us a slight speed
bump of about 2% in a huge repository with about 2.3M refs:
    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
      Range (min … max):   31.280 s … 31.896 s    5 runs
    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
      Range (min … max):   30.172 s … 31.479 s    5 runs
    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
In case this fails, we fall back to the old code which peels the
objects to a commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
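The summary line can be reproduced from the two means: 31.634 s / 31.129 s ≈ 1.016, the "about 2%" quoted above. The sketch below also derives the ± term. It assumes that term comes from first-order error propagation of the two standard deviations, which is an assumption about the benchmark tool, not something the message states.

`compute_speedup()` and `approx_sqrt()` are hypothetical helpers. `approx_sqrt()` is a Newton iteration used only to avoid a libm dependency.

```c
/* Newton's method square root; adequate for the small inputs used here. */
static double approx_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

struct speedup {
	double ratio; /* how many times faster `fast` is than `slow` */
	double sigma; /* propagated standard deviation of that ratio */
};

/* Assumes independent measurements and first-order error propagation. */
static struct speedup compute_speedup(double slow_mean, double slow_sd,
				      double fast_mean, double fast_sd)
{
	struct speedup s;
	double rel_slow = slow_sd / slow_mean;
	double rel_fast = fast_sd / fast_mean;

	s.ratio = slow_mean / fast_mean;
	s.sigma = s.ratio * approx_sqrt(rel_slow * rel_slow + rel_fast * rel_fast);
	return s;
}
```

With the numbers above, this gives a ratio of about 1.016 and a sigma of about 0.0196. Rounded to two decimals, that is 1.02 ± 0.02.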
2021-09-01 21:09:54 +08:00
		return commit;
2024-11-06 03:24:19 +08:00
	}
2020-08-18 12:01:35 +08:00

	while (1) {
		if (oid_object_info_extended(the_repository, oid, &info,
2024-11-06 03:24:18 +08:00
					     OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
2020-08-18 12:01:35 +08:00
			return NULL;
2024-11-06 03:24:18 +08:00
		if (type == OBJ_TAG) {
2020-08-18 12:01:35 +08:00
			struct tag *tag = (struct tag *)
				parse_object(the_repository, oid);

			if (!tag->tagged)
				return NULL;
2024-11-06 03:24:19 +08:00
			if (mark_tags_complete_and_check_obj_db)
2020-08-18 12:01:35 +08:00
				tag->object.flags |= COMPLETE;
			oid = &tag->tagged->oid;
		} else {
			break;
		}
	}
2021-08-04 21:56:11 +08:00

2024-11-06 03:24:18 +08:00
	if (type == OBJ_COMMIT) {
2021-08-04 21:56:11 +08:00
		struct commit *commit = lookup_commit(the_repository, oid);
		if (!commit || repo_parse_commit(the_repository, commit))
			return NULL;
		return commit;
	}

2020-08-18 12:01:35 +08:00
	return NULL;
}

2018-06-15 06:54:28 +08:00
static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
2018-06-15 06:54:26 +08:00
			       const struct object_id *oid)
2012-10-26 23:53:55 +08:00
{
2020-08-18 12:01:35 +08:00
	struct commit *c = deref_without_lazy_fetch(oid, 0);
2012-10-26 23:53:55 +08:00

2020-08-18 12:01:35 +08:00
	if (c)
		negotiator->add_tip(negotiator, c);
2012-10-26 23:53:55 +08:00
	return 0;
}

2022-08-26 01:09:48 +08:00
static int rev_list_insert_ref_oid(const char *refname UNUSED,
2024-08-09 23:37:50 +08:00
				   const char *referent UNUSED,
2022-08-19 18:08:32 +08:00
				   const struct object_id *oid,
2022-08-26 01:09:48 +08:00
				   int flag UNUSED,
2022-08-19 18:08:32 +08:00
				   void *cb_data)
2012-10-26 23:53:55 +08:00
{
2020-08-18 12:01:35 +08:00
	return rev_list_insert_ref(cb_data, oid);
2012-10-26 23:53:55 +08:00
}

enum ack_type {
	NAK = 0,
	ACK,
	ACK_continue,
	ACK_common,
	ACK_ready
};

2018-12-30 05:19:14 +08:00
static void consume_shallow_list(struct fetch_pack_args *args,
				 struct packet_reader *reader)
2012-10-26 23:53:55 +08:00
{
2016-06-12 18:53:56 +08:00
	if (args->stateless_rpc && args->deepen) {
2012-10-26 23:53:55 +08:00
		/* If we sent a depth we will get back "duplicate"
		 * shallow and unshallow commands every time there
		 * is a block of have lines exchanged.
		 */
2018-12-30 05:19:14 +08:00
		while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
			if (starts_with(reader->line, "shallow "))
2012-10-26 23:53:55 +08:00
				continue;
2018-12-30 05:19:14 +08:00
			if (starts_with(reader->line, "unshallow "))
2012-10-26 23:53:55 +08:00
				continue;
2016-06-12 18:53:55 +08:00
			die(_("git fetch-pack: expected shallow list"));
2012-10-26 23:53:55 +08:00
		}
2018-12-30 05:19:14 +08:00
		if (reader->status != PACKET_READ_FLUSH)
			die(_("git fetch-pack: expected a flush packet after shallow list"));
2012-10-26 23:53:55 +08:00
	}
}

2018-12-30 05:19:14 +08:00
static enum ack_type get_ack(struct packet_reader *reader,
			     struct object_id *result_oid)
2012-10-26 23:53:55 +08:00
{
pkt-line: provide a LARGE_PACKET_MAX static buffer
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
  1. The only variable-sized data in these lines is a ref
     name, and refs tend to be a lot shorter than 1000
     characters.
  2. When sending ref lines, git-core always limits itself
     to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
  1. Other git implementations may have followed
     protocol-common.txt and used a larger maximum size. We
     don't bump into it in practice because it would involve
     very long ref names.
  2. We may want to increase the 1000-byte limit one day.
     Since packets are transferred before any capabilities,
     it's difficult to do this in a backwards-compatible
     way. But if we bump the size of buffer the readers can
     handle, eventually older versions of git will be
     obsolete enough that we can justify bumping the
     writers, as well. We don't have plans to do this
     anytime soon, but there is no reason not to start the
     clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
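A minimal sketch of the shared-buffer idea follows. It assumes the standard pkt-line framing: four hex digits give the total length including the header, and "0000" is a flush packet. It also assumes Git's value of 65520 for LARGE_PACKET_MAX.

`sketch_read_line()` and its buffer are hypothetical names, not Git's `packet_read_line()`. Every caller shares one static LARGE_PACKET_MAX buffer, so the returned pointer is only valid until the next call.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors Git's LARGE_PACKET_MAX of 65520 bytes. */
#define SKETCH_LARGE_PACKET_MAX 65520

/* One buffer shared by every caller, as the commit proposes. */
static char sketch_packet_buffer[SKETCH_LARGE_PACKET_MAX];

static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Read one pkt-line from `in` (`in_len` bytes available). On success,
 * return the payload with any trailing newline removed, NUL-terminated
 * in the shared static buffer, and set *consumed to the packet size.
 * Return NULL for a flush packet (*consumed == 4) or for malformed or
 * truncated input (*consumed == 0).
 */
static const char *sketch_read_line(const char *in, size_t in_len,
				    size_t *consumed)
{
	size_t len = 0, payload;

	*consumed = 0;
	if (in_len < 4)
		return NULL;
	for (int i = 0; i < 4; i++) {
		int v = hex_nibble(in[i]);
		if (v < 0)
			return NULL;
		len = (len << 4) | (size_t)v;
	}
	if (len == 0) {
		*consumed = 4; /* "0000" flush packet */
		return NULL;
	}
	if (len < 4 || len > SKETCH_LARGE_PACKET_MAX || len > in_len)
		return NULL;

	payload = len - 4; /* at most 65516, so the NUL below fits */
	memcpy(sketch_packet_buffer, in + 4, payload);
	*consumed = len;
	if (payload && sketch_packet_buffer[payload - 1] == '\n')
		payload--;
	sketch_packet_buffer[payload] = '\0';
	return sketch_packet_buffer;
}
```

Only one LARGE_PACKET_MAX allocation exists no matter how many callers there are. That is the memory saving the message argues for.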
2013-02-21 04:02:57 +08:00
	int len;
2014-06-19 03:56:03 +08:00
	const char *arg;
2012-10-26 23:53:55 +08:00

2018-12-30 05:19:14 +08:00
	if (packet_reader_read(reader) != PACKET_READ_NORMAL)
2018-02-09 02:47:49 +08:00
		die(_("git fetch-pack: expected ACK/NAK, got a flush packet"));
2018-12-30 05:19:14 +08:00
	len = reader->pktlen;

	if (!strcmp(reader->line, "NAK"))
2012-10-26 23:53:55 +08:00
		return NAK;
2018-12-30 05:19:14 +08:00
	if (skip_prefix(reader->line, "ACK ", &arg)) {
2019-08-19 04:04:04 +08:00
		const char *p;
		if (!parse_oid_hex(arg, result_oid, &p)) {
			len -= p - reader->line;
2014-06-19 03:56:03 +08:00
			if (len < 1)
fetch-pack: fix out-of-bounds buffer offset in get_ack
When we read acks from the remote, we expect either:
  ACK <sha1>
or
  ACK <sha1> <multi-ack-flag>
We parse the "ACK <sha1>" bit from the line, and then start
looking for the flag strings at "line+45"; if we don't have
them, we assume it's of the first type. But if we do have
the first type, then line+45 is not necessarily inside our
string at all!
It turns out that this works most of the time due to the way
we parse the packets. They should come in with a newline,
and packet_read puts an extra NUL into the buffer, so we end
up with:
  ACK <sha1>\n\0
with the newline at offset 44 and the NUL at offset 45. We
then strip the newline, putting a NUL at offset 44. So
when we look at "line+45", we are looking past the end of
our string; but it's OK, because we hit the terminator from
the original string.
This breaks down, however, if the other side does not
terminate their packets with a newline. In that case, our
packet is one character shorter, and we start looking
through uninitialized memory for the flag. No known
implementation sends such a packet, so it has never come up
in practice.
This patch tightens the check by looking for a short,
flagless ACK before trying to parse the flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
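Below is a bounds-aware sketch of the parsing described above. It assumes 40-hex-digit SHA-1 IDs and a NUL-terminated line whose trailing newline has already been stripped. `sketch_get_ack()` and `enum sketch_ack` are hypothetical.

The key step is returning a plain ACK when nothing follows the ID, before any flag search can start past the end of the line.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: SHA-1 object IDs, i.e. 40 hex digits. */
#define SKETCH_HEXSZ 40

enum sketch_ack {
	SK_NAK, SK_ACK, SK_ACK_CONTINUE, SK_ACK_COMMON, SK_ACK_READY, SK_BAD
};

static int sketch_is_hex(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
	       (c >= 'A' && c <= 'F');
}

/*
 * `line` is NUL-terminated, has its newline stripped, and is `len`
 * bytes long. Never reads past line[len].
 */
static enum sketch_ack sketch_get_ack(const char *line, size_t len)
{
	const char *flags;

	if (len == 3 && !memcmp(line, "NAK", 3))
		return SK_NAK;
	if (len < 4 + SKETCH_HEXSZ || memcmp(line, "ACK ", 4))
		return SK_BAD;
	for (size_t i = 0; i < SKETCH_HEXSZ; i++)
		if (!sketch_is_hex(line[4 + i]))
			return SK_BAD;

	/* The fix: a flagless "ACK <oid>" ends right here. */
	if (len == 4 + SKETCH_HEXSZ)
		return SK_ACK;

	flags = line + 4 + SKETCH_HEXSZ; /* still inside the line */
	if (strstr(flags, "continue"))
		return SK_ACK_CONTINUE;
	if (strstr(flags, "common"))
		return SK_ACK_COMMON;
	if (strstr(flags, "ready"))
		return SK_ACK_READY;
	return SK_ACK;
}
```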
2013-02-21 04:00:28 +08:00
				return ACK;
2019-08-19 04:04:04 +08:00
			if (strstr(p, "continue"))
2012-10-26 23:53:55 +08:00
				return ACK_continue;
2019-08-19 04:04:04 +08:00
			if (strstr(p, "common"))
2012-10-26 23:53:55 +08:00
				return ACK_common;
2019-08-19 04:04:04 +08:00
			if (strstr(p, "ready"))
2012-10-26 23:53:55 +08:00
				return ACK_ready;
			return ACK;
		}
	}
2018-12-30 05:19:14 +08:00
	die(_("git fetch-pack: expected ACK/NAK, got '%s'"), reader->line);
2012-10-26 23:53:55 +08:00
}

static void send_request(struct fetch_pack_args *args,
			 int fd, struct strbuf *buf)
{
	if (args->stateless_rpc) {
		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
		packet_flush(fd);
2019-03-05 12:11:39 +08:00
	} else {
		if (write_in_full(fd, buf->buf, buf->len) < 0)
			die_errno(_("unable to write to remote"));
	}
2012-10-26 23:53:55 +08:00
}

2018-06-15 06:54:28 +08:00
static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
2018-06-15 06:54:26 +08:00
					struct object *obj)
2012-10-26 23:53:55 +08:00
{
2020-08-18 12:01:35 +08:00
	rev_list_insert_ref(negotiator, &obj->oid);
2012-10-26 23:53:55 +08:00
}

#define INITIAL_FLUSH 16
#define PIPESAFE_FLUSH 32
2016-07-19 06:21:38 +08:00
#define LARGE_FLUSH 16384
2012-10-26 23:53:55 +08:00

2018-03-16 01:31:28 +08:00
static int next_flush(int stateless_rpc, int count)
2012-10-26 23:53:55 +08:00
{
2018-03-16 01:31:28 +08:00
	if (stateless_rpc) {
2016-07-19 06:21:38 +08:00
		if (count < LARGE_FLUSH)
			count <<= 1;
		else
			count = count * 11 / 10;
	} else {
		if (count < PIPESAFE_FLUSH)
			count <<= 1;
		else
			count += PIPESAFE_FLUSH;
	}
2012-10-26 23:53:55 +08:00
	return count;
}

2018-07-03 06:39:44 +08:00
static void mark_tips(struct fetch_negotiator *negotiator,
		      const struct oid_array *negotiation_tips)
{
	int i;

	if (!negotiation_tips) {
2024-05-07 15:11:53 +08:00
		refs_for_each_rawref(get_main_ref_store(the_repository),
				     rev_list_insert_ref_oid, negotiator);
2018-07-03 06:39:44 +08:00
		return;
	}

	for (i = 0; i < negotiation_tips->nr; i++)
2020-08-18 12:01:35 +08:00
		rev_list_insert_ref(negotiator, &negotiation_tips->oid[i]);
2018-07-03 06:39:44 +08:00
	return;
}

2022-07-27 00:27:11 +08:00
static void send_filter(struct fetch_pack_args *args,
			struct strbuf *req_buf,
			int server_supports_filter)
{
	if (args->filter_options.choice) {
		const char *spec =
			expand_list_objects_filter_spec(&args->filter_options);
		if (server_supports_filter) {
			print_verbose(args, _("Server supports filter"));
			packet_buf_write(req_buf, "filter %s", spec);
			trace2_data_string("fetch", the_repository,
					   "filter/effective", spec);
		} else {
			warning("filtering not recognized by server, ignoring");
			trace2_data_string("fetch", the_repository,
					   "filter/unsupported", spec);
		}
	} else {
		trace2_data_string("fetch", the_repository,
				   "filter/none", "");
	}
}

2018-06-15 06:54:28 +08:00
static int find_common(struct fetch_negotiator *negotiator,
2018-06-15 06:54:26 +08:00
		       struct fetch_pack_args *args,
2017-05-01 10:28:54 +08:00
		       int fd[2], struct object_id *result_oid,
2012-10-26 23:53:55 +08:00
		       struct ref *refs)
{
	int fetching;
	int count = 0, flushes = 0, flush_at = INITIAL_FLUSH, retval;
fetch-pack: add tracing for negotiation rounds
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotiation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
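To see what the per-round "haves_added" value should look like, here is a sketch that replays the `next_flush()` schedule from this file. It assumes the negotiator never runs out of commits to offer. The `sk_` names are hypothetical copies, not Git APIs.

```c
/* Copies of the constants and next_flush() logic in this file. */
#define SK_INITIAL_FLUSH 16
#define SK_PIPESAFE_FLUSH 32
#define SK_LARGE_FLUSH 16384

static int sk_next_flush(int stateless_rpc, int count)
{
	if (stateless_rpc) {
		if (count < SK_LARGE_FLUSH)
			count <<= 1;
		else
			count = count * 11 / 10;
	} else {
		if (count < SK_PIPESAFE_FLUSH)
			count <<= 1;
		else
			count += SK_PIPESAFE_FLUSH;
	}
	return count;
}

/*
 * Store in out[0..rounds-1] the number of "have" lines sent in each
 * round, i.e. the per-round "haves_added" value, assuming the
 * negotiator always has another commit to offer.
 */
static void sk_haves_per_round(int stateless_rpc, int rounds, int *out)
{
	int flush_at = SK_INITIAL_FLUSH, prev = 0;

	for (int i = 0; i < rounds; i++) {
		out[i] = flush_at - prev; /* haves since the previous flush */
		prev = flush_at;
		flush_at = sk_next_flush(stateless_rpc, flush_at);
	}
}
```

The first five rounds come out as follows:

- Pipe transport: 16, 16, 32, 32, 32 haves.
- Stateless RPC: 16, 16, 32, 64, 128 haves, because the window keeps doubling up to LARGE_FLUSH.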
2022-08-03 06:04:05 +08:00
	int negotiation_round = 0, haves = 0;
2017-05-01 10:28:54 +08:00
	const struct object_id *oid;
2012-10-26 23:53:55 +08:00
	unsigned in_vain = 0;
	int got_continue = 0;
	int got_ready = 0;
	struct strbuf req_buf = STRBUF_INIT;
	size_t state_len = 0;
2018-12-30 05:19:14 +08:00
	struct packet_reader reader;
2012-10-26 23:53:55 +08:00

	if (args->stateless_rpc && multi_ack == 1)
2022-01-06 04:02:19 +08:00
		die(_("the option '%s' requires '%s'"), "--stateless-rpc", "multi_ack_detailed");
2012-10-26 23:53:55 +08:00

2018-12-30 05:19:14 +08:00
	packet_reader_init(&reader, fd[0], NULL, 0,
pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation-wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
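Here is a sketch of the client-side check. It assumes the "ERR <message>" payload form from the pack protocol documentation. `sketch_err_message()` is hypothetical.

The `check_err` flag models the opt-in described above; in this file, the reader sets `PACKET_READ_DIE_ON_ERR_PACKET`. The opt-in means pkt-line users outside the Git protocol do not misread an ordinary payload as an error.

```c
#include <stddef.h>
#include <string.h>

/*
 * If error-packet handling is enabled (`check_err`, standing in for
 * PACKET_READ_DIE_ON_ERR_PACKET) and `payload` starts with "ERR ",
 * return the message after the prefix; otherwise return NULL. As the
 * flag name indicates, Git's reader dies on such a packet instead.
 */
static const char *sketch_err_message(const char *payload, int check_err)
{
	static const char prefix[] = "ERR ";

	if (check_err && !strncmp(payload, prefix, sizeof(prefix) - 1))
		return payload + sizeof(prefix) - 1;
	return NULL;
}
```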
2018-12-30 05:19:15 +08:00
			   PACKET_READ_CHOMP_NEWLINE |
			   PACKET_READ_DIE_ON_ERR_PACKET);
2018-12-30 05:19:14 +08:00

2020-08-18 12:01:37 +08:00
	mark_tips(negotiator, args->negotiation_tips);
	for_each_cached_alternate(negotiator, insert_one_alternate_object);
2012-10-26 23:53:55 +08:00

	fetching = 0;
	for ( ; refs ; refs = refs->next) {
2017-05-01 10:28:54 +08:00
		struct object_id *remote = &refs->old_oid;
2012-10-26 23:53:55 +08:00
		const char *remote_hex;
		struct object *o;

2022-03-28 22:02:06 +08:00
		if (!args->refetch) {
			/*
			 * If that object is complete (i.e. it is an ancestor of a
			 * local ref), we tell them we have it but do not have to
			 * tell them about its ancestors, which they already know
			 * about.
			 *
			 * We use lookup_object here because we are only
			 * interested in the case we *know* the object is
			 * reachable and we have already scanned it.
			 */
			if (((o = lookup_object(the_repository, remote)) != NULL) &&
					(o->flags & COMPLETE)) {
				continue;
			}
2012-10-26 23:53:55 +08:00
		}

2017-05-01 10:28:54 +08:00
		remote_hex = oid_to_hex(remote);
2012-10-26 23:53:55 +08:00
		if (!fetching) {
			struct strbuf c = STRBUF_INIT;
			if (multi_ack == 2) strbuf_addstr(&c, " multi_ack_detailed");
			if (multi_ack == 1) strbuf_addstr(&c, " multi_ack");
			if (no_done) strbuf_addstr(&c, " no-done");
			if (use_sideband == 2) strbuf_addstr(&c, " side-band-64k");
			if (use_sideband == 1) strbuf_addstr(&c, " side-band");
fetch, upload-pack: --deepen=N extends shallow boundary by N commits
In git-fetch, --depth argument is always relative with the latest
remote refs. This makes it a bit difficult to cover this use case,
where the user wants to make the shallow history, say 3 levels
deeper. It would work if remote refs have not moved yet, but nobody
can guarantee that, especially when that use case is performed a
couple months after the last clone or "git fetch --depth". Also,
modifying shallow boundary using --depth does not work well with
clones created by --since or --not.
This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where remote refs
are.
Have/Want negotiation is still respected. So if remote refs move, the
server will send two chunks: one between "have" and "want" and another
to extend shallow history. In theory, the client could send no "want"s
in order to get the second chunk only. But the protocol does not allow
that. Either you send no want lines, which means ls-remote; or you
have to send at least one want line that carries deepen-relative to the
server.
The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.
(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with remote
side (and is more complicated to implement as a result).
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
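The following sketch shows how a "deepen N" request line is framed on the wire, assuming standard pkt-line length prefixes. `sketch_pkt_fmt()` is a hypothetical helper, not Git's `packet_buf_write()`.

With `--deepen`, the client also advertises the `deepen-relative` capability on its first want line; see the `args->deepen_relative` check in `find_common()`. The server then counts N from the client's current shallow boundary rather than from the remote tips.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame `payload` as one pkt-line into `out`: four lowercase hex digits
 * giving the total length (header included), then the payload bytes.
 * Returns the packet length, or -1 if it does not fit.
 */
static int sketch_pkt_fmt(char *out, size_t out_sz, const char *payload)
{
	size_t n = strlen(payload);

	if (n + 4 > 0xffff || n + 4 + 1 > out_sz)
		return -1;
	snprintf(out, out_sz, "%04x%s", (unsigned)(n + 4), payload);
	return (int)(n + 4);
}
```

For example, "deepen 3" is 8 bytes of payload. Adding the 4-byte header gives 12, so the packet is "000cdeepen 3".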
2016-06-12 18:54:09 +08:00
			if (args->deepen_relative) strbuf_addstr(&c, " deepen-relative");
2012-10-26 23:53:55 +08:00
			if (args->use_thin_pack) strbuf_addstr(&c, " thin-pack");
			if (args->no_progress) strbuf_addstr(&c, " no-progress");
			if (args->include_tag) strbuf_addstr(&c, " include-tag");
			if (prefer_ofs_delta) strbuf_addstr(&c, " ofs-delta");
2016-06-12 18:53:59 +08:00
			if (deepen_since_ok) strbuf_addstr(&c, " deepen-since");
2016-06-12 18:54:04 +08:00
			if (deepen_not_ok) strbuf_addstr(&c, " deepen-not");
2012-10-26 23:53:55 +08:00
			if (agent_supported) strbuf_addf(&c, " agent=%s",
							 git_user_agent_sanitized());
2020-11-12 07:29:31 +08:00
			if (advertise_sid)
				strbuf_addf(&c, " session-id=%s", trace2_session_id());
2017-12-08 23:58:40 +08:00
			if (args->filter_options.choice)
				strbuf_addstr(&c, " filter");
2012-10-26 23:53:55 +08:00
			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
			strbuf_release(&c);
		} else
			packet_buf_write(&req_buf, "want %s\n", remote_hex);
		fetching++;
	}

	if (!fetching) {
		strbuf_release(&req_buf);
		packet_flush(fd[1]);
		return 1;
	}

2018-05-18 06:51:46 +08:00
	if (is_repository_shallow(the_repository))
2013-12-05 21:02:34 +08:00
		write_shallow_commits(&req_buf, 1, NULL);
2012-10-26 23:53:55 +08:00
	if (args->depth > 0)
		packet_buf_write(&req_buf, "deepen %d", args->depth);
2016-06-12 18:53:59 +08:00
	if (args->deepen_since) {
2017-04-27 03:29:31 +08:00
		timestamp_t max_age = approxidate(args->deepen_since);
2017-04-21 18:45:48 +08:00
		packet_buf_write(&req_buf, "deepen-since %"PRItime, max_age);
2016-06-12 18:53:59 +08:00
	}
2016-06-12 18:54:04 +08:00
	if (args->deepen_not) {
		int i;
		for (i = 0; i < args->deepen_not->nr; i++) {
			struct string_list_item *s = args->deepen_not->items + i;
			packet_buf_write(&req_buf, "deepen-not %s", s->string);
		}
	}
2022-07-27 00:27:11 +08:00
	send_filter(args, &req_buf, server_supports_filtering);
2012-10-26 23:53:55 +08:00
	packet_buf_flush(&req_buf);
	state_len = req_buf.len;

2016-06-12 18:53:56 +08:00
	if (args->deepen) {
use skip_prefix to avoid magic numbers
It's a common idiom to match a prefix and then skip past it
with a magic number, like:
	if (starts_with(foo, "bar"))
		foo += 3;
This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes. We can use skip_prefix to avoid the magic
numbers here.
Note that some of these conversions could be much shorter.
For example:
	if (starts_with(arg, "--foo=")) {
		bar = arg + 6;
		continue;
	}
could become:
	if (skip_prefix(arg, "--foo=", &bar))
		continue;
However, I have left it as:
	if (skip_prefix(arg, "--foo=", &v)) {
		bar = v;
		continue;
	}
to visually match nearby cases which need to actually
process the string. Like:
	if (skip_prefix(arg, "--foo=", &v)) {
		bar = atoi(v);
		continue;
	}
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
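Here is a minimal sketch of a `skip_prefix()`-style helper and the `--foo=` pattern above. It is intended to behave like Git's helper but is written independently. `sketch_skip_prefix()` and `sketch_parse_depth()` (with its `--depth=` option) are hypothetical.

```c
#include <stdlib.h>

/*
 * If `str` starts with `prefix`, store a pointer just past the prefix
 * in *out and return 1; otherwise return 0 and leave *out untouched.
 * No length has to be counted by hand, so there is no magic number.
 */
static int sketch_skip_prefix(const char *str, const char *prefix,
			      const char **out)
{
	do {
		if (!*prefix) {
			*out = str;
			return 1;
		}
	} while (*str++ == *prefix++);
	return 0;
}

/* The "skip_prefix, then process the value" pattern from the message. */
static int sketch_parse_depth(const char *arg, int *depth)
{
	const char *v;

	if (sketch_skip_prefix(arg, "--depth=", &v)) {
		*depth = atoi(v);
		return 1;
	}
	return 0;
}
```

The loop compares one character at a time and stops at the first mismatch. It therefore never reads past the end of a `str` that is shorter than `prefix`.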
2014-06-19 03:47:50 +08:00
		const char *arg;
2017-05-01 10:28:54 +08:00
		struct object_id oid;
2012-10-26 23:53:55 +08:00

		send_request(args, fd[1], &req_buf);
2018-12-30 05:19:14 +08:00
		while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
			if (skip_prefix(reader.line, "shallow ", &arg)) {
2017-05-01 10:28:54 +08:00
				if (get_oid_hex(arg, &oid))
2018-12-30 05:19:14 +08:00
					die(_("invalid shallow line: %s"), reader.line);
2018-05-18 06:51:44 +08:00
				register_shallow(the_repository, &oid);
2012-10-26 23:53:55 +08:00
				continue;
			}
2018-12-30 05:19:14 +08:00
			if (skip_prefix(reader.line, "unshallow ", &arg)) {
2017-05-01 10:28:54 +08:00
				if (get_oid_hex(arg, &oid))
2018-12-30 05:19:14 +08:00
					die(_("invalid unshallow line: %s"), reader.line);
2019-06-20 15:41:14 +08:00
				if (!lookup_object(the_repository, &oid))
2018-12-30 05:19:14 +08:00
					die(_("object not found: %s"), reader.line);
2012-10-26 23:53:55 +08:00
				/* make sure that it is parsed as shallow */
2018-06-29 09:21:51 +08:00
				if (!parse_object(the_repository, &oid))
2018-12-30 05:19:14 +08:00
					die(_("error in object: %s"), reader.line);
2017-05-07 06:10:06 +08:00
				if (unregister_shallow(&oid))
2018-12-30 05:19:14 +08:00
					die(_("no shallow found: %s"), reader.line);
2012-10-26 23:53:55 +08:00
				continue;
			}
2018-12-30 05:19:14 +08:00
			die(_("expected shallow/unshallow, got %s"), reader.line);
2012-10-26 23:53:55 +08:00
		}
	} else if (!args->stateless_rpc)
		send_request(args, fd[1], &req_buf);

	if (!args->stateless_rpc) {
		/* If we aren't using the stateless-rpc interface
		 * we don't need to retain the headers.
		 */
		strbuf_setlen(&req_buf, 0);
		state_len = 0;
	}

2019-10-03 07:49:28 +08:00
	trace2_region_enter("fetch-pack", "negotiation_v0_v1", the_repository);
2012-10-26 23:53:55 +08:00
	flushes = 0;
	retval = -1;
2018-06-15 06:54:28 +08:00
	while ((oid = negotiator->next(negotiator))) {
2017-05-01 10:28:54 +08:00
		packet_buf_write(&req_buf, "have %s\n", oid_to_hex(oid));
		print_verbose(args, "have %s", oid_to_hex(oid));
2012-10-26 23:53:55 +08:00
		in_vain++;
2022-08-03 06:04:05 +08:00
		haves++;
2012-10-26 23:53:55 +08:00
		if (flush_at <= ++count) {
			int ack;
2022-08-03 06:04:05 +08:00
negotiation_round++;
trace2_region_enter_printf("negotiation_v0_v1", "round",
the_repository, "%d",
negotiation_round);
trace2_data_intmax("negotiation_v0_v1", the_repository,
"haves_added", haves);
trace2_data_intmax("negotiation_v0_v1", the_repository,
"in_vain", in_vain);
haves = 0;
2012-10-26 23:53:55 +08:00
packet_buf_flush(&req_buf);
send_request(args, fd[1], &req_buf);
strbuf_setlen(&req_buf, state_len);
flushes++;
2018-03-16 01:31:28 +08:00
flush_at = next_flush(args->stateless_rpc, count);
2012-10-26 23:53:55 +08:00
/*
* We keep one window "ahead" of the other side, and
* will wait for an ACK only on the next one
*/
if (!args->stateless_rpc && count == INITIAL_FLUSH)
continue;
2018-12-30 05:19:14 +08:00
consume_shallow_list(args, &reader);
2012-10-26 23:53:55 +08:00
do {
2018-12-30 05:19:14 +08:00
ack = get_ack(&reader, result_oid);
2016-06-12 18:53:54 +08:00
if (ack)
2016-06-12 18:53:55 +08:00
print_verbose(args, _("got %s %d %s"), "ack",
2017-05-01 10:28:54 +08:00
ack, oid_to_hex(result_oid));
2012-10-26 23:53:55 +08:00
switch (ack) {
case ACK:
2022-08-03 06:04:05 +08:00
trace2_region_leave_printf("negotiation_v0_v1", "round",
the_repository, "%d",
negotiation_round);
2012-10-26 23:53:55 +08:00
flushes = 0;
multi_ack = 0;
retval = 0;
goto done;
case ACK_common:
case ACK_ready:
case ACK_continue: {
struct commit *commit =
2018-06-29 09:21:59 +08:00
lookup_commit(the_repository,
result_oid);
2018-06-15 06:54:27 +08:00
int was_common;
2018-08-03 06:30:42 +08:00
2012-10-26 23:53:55 +08:00
if (!commit)
2017-05-01 10:28:54 +08:00
die(_("invalid commit %s"), oid_to_hex(result_oid));
2018-06-15 06:54:28 +08:00
was_common = negotiator->ack(negotiator, commit);
2012-10-26 23:53:55 +08:00
if (args->stateless_rpc
&& ack == ACK_common
2018-06-15 06:54:27 +08:00
&& !was_common) {
2012-10-26 23:53:55 +08:00
/* We need to replay the have for this object
* on the next RPC request so the peer knows
* it is in common with us.
*/
2017-05-01 10:28:54 +08:00
const char *hex = oid_to_hex(result_oid);
2012-10-26 23:53:55 +08:00
packet_buf_write(&req_buf, "have %s\n", hex);
state_len = req_buf.len;
2022-08-03 06:04:05 +08:00
haves++;
2016-09-24 01:41:35 +08:00
/*
* Reset in_vain because an ack
* for this commit has not been
* seen.
*/
in_vain = 0;
} else if (!args->stateless_rpc
|| ack != ACK_common)
in_vain = 0;
2012-10-26 23:53:55 +08:00
retval = 0;
got_continue = 1;
2018-06-15 06:54:24 +08:00
if (ack == ACK_ready)
2012-10-26 23:53:55 +08:00
got_ready = 1;
break;
}
}
} while (ack);
flushes--;
2022-08-03 06:04:05 +08:00
trace2_region_leave_printf("negotiation_v0_v1", "round",
the_repository, "%d",
negotiation_round);
2012-10-26 23:53:55 +08:00
if (got_continue && MAX_IN_VAIN < in_vain) {
2016-06-12 18:53:55 +08:00
print_verbose(args, _("giving up"));
2012-10-26 23:53:55 +08:00
break; /* give up */
}
2018-06-15 06:54:24 +08:00
if (got_ready)
break;
2012-10-26 23:53:55 +08:00
}
}
done:
2019-10-03 07:49:28 +08:00
trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
2022-08-03 06:04:05 +08:00
trace2_data_intmax("negotiation_v0_v1", the_repository, "total_rounds",
negotiation_round);
2012-10-26 23:53:55 +08:00
if (!got_ready || !no_done) {
packet_buf_write(&req_buf, "done\n");
send_request(args, fd[1], &req_buf);
}
2016-06-12 18:53:55 +08:00
print_verbose(args, _("done"));
2012-10-26 23:53:55 +08:00
if (retval != 0) {
multi_ack = 0;
flushes++;
}
strbuf_release(&req_buf);
2014-02-06 23:10:39 +08:00
if (!got_ready || !no_done)
2018-12-30 05:19:14 +08:00
consume_shallow_list(args, &reader);
2012-10-26 23:53:55 +08:00
while (flushes || multi_ack) {
2018-12-30 05:19:14 +08:00
int ack = get_ack(&reader, result_oid);
2012-10-26 23:53:55 +08:00
if (ack) {
2016-06-12 18:53:55 +08:00
print_verbose(args, _("got %s (%d) %s"), "ack",
2017-05-01 10:28:54 +08:00
ack, oid_to_hex(result_oid));
2012-10-26 23:53:55 +08:00
if (ack == ACK)
return 0;
multi_ack = 1;
continue;
}
flushes--;
}
/* it is no error to fetch into a completely empty repo */
return count ? retval : 0;
}
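The per-round bookkeeping described in the "fetch-pack: add tracing for negotiation rounds" message can be illustrated with a small standalone sketch. It is not Git code. region_enter(), region_leave() and data_intmax() are hypothetical printf stand-ins for the trace2 calls, and next_window() assumes a simple doubling schedule, because the real next_flush() is not part of this excerpt.

```c
#include <stdio.h>

/*
 * Hypothetical stand-ins for trace2_region_enter_printf(),
 * trace2_region_leave_printf() and trace2_data_intmax(). They only
 * print, so this sketch builds without Git.
 */
static void region_enter(int round)
{
	printf("enter negotiation round %d\n", round);
}

static void region_leave(int round)
{
	printf("leave negotiation round %d\n", round);
}

static void data_intmax(const char *key, long value)
{
	printf("data %s=%ld\n", key, value);
}

/* Assumed flush schedule, for illustration only: double the window. */
static int next_window(int count)
{
	return count * 2;
}

/*
 * Pretend to send `total_haves` "have" lines, flushing whenever `count`
 * reaches `flush_at`. Each flush is one round: it gets its own region,
 * logs how many haves were added since the previous round, and resets
 * that per-round counter. Returns the number of rounds.
 */
static int negotiate(int total_haves, int initial_flush,
		     long *haves_per_round, int max_rounds)
{
	int count = 0, flush_at = initial_flush, round = 0;
	long haves = 0;

	for (int i = 0; i < total_haves; i++) {
		haves++;
		if (flush_at <= ++count) {
			round++;
			region_enter(round);
			data_intmax("haves_added", haves);
			if (round <= max_rounds)
				haves_per_round[round - 1] = haves;
			haves = 0;
			/* ACKs for this round would be read here. */
			region_leave(round);
			flush_at = next_window(count);
		}
	}
	data_intmax("total_rounds", round);
	return round;
}
```

Under these assumptions, negotiate(100, 16, per_round, 8) reports three rounds of 16, 16 and 32 haves. The remaining 36 haves never reach a flush boundary. In the loop above, haves written after the last flush appear to stay in req_buf and go out with the final "done" request.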
static struct commit_list *complete;
2017-05-01 10:28:54 +08:00
static int mark_complete(const struct object_id *oid)
2012-10-26 23:53:55 +08:00
{
2020-08-18 12:01:35 +08:00
struct commit *commit = deref_without_lazy_fetch(oid, 1);
if (commit && !(commit->object.flags & COMPLETE)) {
commit->object.flags |= COMPLETE;
commit_list_insert(commit, &complete);
2012-10-26 23:53:55 +08:00
}
return 0;
}
2022-08-26 01:09:48 +08:00
static int mark_complete_oid(const char *refname UNUSED,
2024-08-09 23:37:50 +08:00
const char *referent UNUSED,
2022-08-19 18:08:32 +08:00
const struct object_id *oid,
2022-08-26 01:09:48 +08:00
int flag UNUSED,
void *cb_data UNUSED)
2015-05-26 02:39:16 +08:00
{
2017-05-01 10:28:54 +08:00
return mark_complete(oid);
2015-05-26 02:39:16 +08:00
}
2012-10-26 23:53:55 +08:00
static void mark_recent_complete_commits(struct fetch_pack_args *args,
2017-04-27 03:29:31 +08:00
timestamp_t cutoff)
2012-10-26 23:53:55 +08:00
{
while (complete && cutoff <= complete->item->date) {
2016-06-12 18:53:55 +08:00
print_verbose(args, _("Marking %s as complete"),
2016-06-12 18:53:54 +08:00
oid_to_hex(&complete->item->object.oid));
2012-10-26 23:53:55 +08:00
pop_most_recent_commit(&complete, COMPLETE);
}
}
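mark_recent_complete_commits() walks the date-sorted `complete` list newest first. The cutoff loop in mark_complete_and_common_ref(), further down, keeps the largest commit date among the remote tips it already has. The sketch below models both steps on a toy commit graph. struct toy_commit, struct toy_queue and the helper names are hypothetical. The parent handling mirrors what pop_most_recent_commit() is assumed to do (mark each unmarked parent and insert it by date). That function is not shown in this excerpt.

```c
#include <stddef.h>

#define TOY_COMPLETE 1u
#define TOY_MAX_PARENTS 2
#define TOY_QUEUE_CAP 64

/* Hypothetical, simplified commit; not Git's struct commit. */
struct toy_commit {
	long date;
	unsigned flags;
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Queue kept sorted newest-first, standing in for the `complete` list. */
struct toy_queue {
	struct toy_commit *items[TOY_QUEUE_CAP];
	size_t nr;
};

static int queue_insert_by_date(struct toy_queue *q, struct toy_commit *c)
{
	size_t pos = 0;

	if (q->nr == TOY_QUEUE_CAP)
		return -1;
	while (pos < q->nr && q->items[pos]->date >= c->date)
		pos++;
	for (size_t i = q->nr; i > pos; i--)
		q->items[i] = q->items[i - 1];
	q->items[pos] = c;
	q->nr++;
	return 0;
}

static struct toy_commit *queue_pop(struct toy_queue *q)
{
	struct toy_commit *head = q->items[0];

	for (size_t i = 1; i < q->nr; i++)
		q->items[i - 1] = q->items[i];
	q->nr--;
	return head;
}

/* Same rule as the cutoff loop: keep the largest date seen. */
static long find_cutoff(const long *dates, size_t n)
{
	long cutoff = 0;

	for (size_t i = 0; i < n; i++)
		if (!cutoff || cutoff < dates[i])
			cutoff = dates[i];
	return cutoff;
}

/*
 * Pop the newest queued commit while it is not older than `cutoff`,
 * marking each unmarked parent and queueing it by date.
 * Returns the number of commits popped.
 */
static size_t mark_recent(struct toy_queue *q, long cutoff)
{
	size_t popped = 0;

	while (q->nr && cutoff <= q->items[0]->date) {
		struct toy_commit *c = queue_pop(q);

		popped++;
		for (size_t i = 0; i < c->nr_parents; i++) {
			struct toy_commit *p = c->parents[i];

			if (!(p->flags & TOY_COMPLETE) &&
			    queue_insert_by_date(q, p) == 0)
				p->flags |= TOY_COMPLETE;
		}
	}
	return popped;
}
```

Consider a chain A(100) <- B(200) <- C(300) with C already queued. find_cutoff() over tip dates {150, 200, 120} yields 200, and mark_recent() then pops C and B. A is still marked, because it was queued as B's parent. It stays in the queue because it is older than the cutoff.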
2017-05-16 01:32:20 +08:00
static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
{
for (; refs; refs = refs->next)
oidset_insert(oids, &refs->old_oid);
}
2018-10-04 23:09:06 +08:00
static int is_unmatched_ref(const struct ref *ref)
{
struct object_id oid;
const char *p;
return ref->match_status == REF_NOT_MATCHED &&
!parse_oid_hex(ref->name, &oid, &p) &&
*p == '\0' &&
oideq(&oid, &ref->old_oid);
}
2012-10-26 23:53:55 +08:00
static void filter_refs(struct fetch_pack_args *args,
2013-01-30 06:02:15 +08:00
struct ref **refs,
struct ref **sought, int nr_sought)
2012-10-26 23:53:55 +08:00
{
struct ref *newlist = NULL;
struct ref **newtail = &newlist;
2017-05-16 01:32:20 +08:00
struct ref *unmatched = NULL;
2012-10-26 23:53:55 +08:00
struct ref *ref, *next;
2017-05-16 01:32:20 +08:00
struct oidset tip_oids = OIDSET_INIT;
2013-01-30 06:02:15 +08:00
int i;
2018-10-04 23:09:39 +08:00
int strict = !(allow_unadvertised_object_request &
(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
2012-10-26 23:53:55 +08:00
2013-01-30 06:02:15 +08:00
i = 0;
2012-10-26 23:53:55 +08:00
for (ref = *refs; ref; ref = next) {
int keep = 0;
next = ref->next;
2013-01-30 06:02:15 +08:00
2014-06-07 01:24:48 +08:00
if (starts_with(ref->name, "refs/") &&
fetch: do not consider peeled tags as advertised tips
Our filter_refs() function accidentally considers the target of a peeled
tag to be advertised by the server, even though upload-pack on the
server side does not consider it so. This can result in the client
making a bogus fetch to the server, which will end with the server
complaining "not our ref". Whereas the correct behavior is for the
client to notice that the server will not allow the request and error
out immediately.
So as bugs go, this is not very serious (the outcome is the same either
way -- the fetch fails). But it's worth making the logic here correct
and consistent with other related cases (e.g., fetching an oid that the
server did not mention at all).
The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow
fetching of literal SHA1s, 2017-05-15). After that, the strategy of
filter_refs() is basically:
- for each advertised ref, try to match it with a "sought" ref
provided by the user. Skip any malformed refs (which includes
peeled values like "refs/tags/foo^{}"), and place any unmatched
items onto the unmatched list.
- if there are unmatched sought refs, then put all of the advertised
tips into an oidset, including the unmatched ones.
- for each sought ref, see if it's in the oidset, in which case it's
legal for us to ask the server for it.
The problem is in the second step. Our list of unmatched refs includes
the peeled refs, even though upload-pack does not allow them to be
directly fetched. So the simplest fix would be to exclude them during
that step.
However, we can observe that the unmatched list isn't used for anything
else, and is freed at the end. We can just free those malformed refs
immediately. That saves us having to check each ref a second time to see
if it's malformed.
Note that this code only kicks in when "strict" is in effect, i.e., when
we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is
not in effect. With v2, all oids are allowed, and we do not bother
creating or consulting the oidset at all. To future-proof our test
against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark
it as a v0-only test.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-13 13:57:37 +08:00
check_refname_format(ref->name, 0)) {
/*
* trash or a peeled value; do not even add it to
* unmatched list
*/
free_one_ref(ref);
continue;
} else {
2013-01-30 06:02:15 +08:00
while (i < nr_sought) {
int cmp = strcmp(ref->name, sought[i]->name);
2012-10-26 23:53:55 +08:00
if (cmp < 0)
break; /* definitely do not have it */
else if (cmp == 0) {
keep = 1; /* definitely have it */
2017-02-23 00:05:57 +08:00
sought[i]->match_status = REF_MATCHED;
2012-10-26 23:53:55 +08:00
}
2013-01-30 06:02:15 +08:00
i++;
2012-10-26 23:53:55 +08:00
}
2018-06-11 13:53:57 +08:00
if (!keep && args->fetch_all &&
(!args->deepen || !starts_with(ref->name, "refs/tags/")))
keep = 1;
}
2012-10-26 23:53:55 +08:00
if (keep) {
*newtail = ref;
ref->next = NULL;
newtail = &ref->next;
} else {
2017-05-16 01:32:20 +08:00
ref->next = unmatched;
unmatched = ref;
2012-10-26 23:53:55 +08:00
}
}
2018-10-04 23:09:39 +08:00
if (strict) {
for (i = 0; i < nr_sought; i++) {
ref = sought[i];
if (!is_unmatched_ref(ref))
continue;
add_refs_to_oidset(&tip_oids, unmatched);
add_refs_to_oidset(&tip_oids, newlist);
break;
}
}
2013-01-30 06:02:15 +08:00
/* Append unmatched requests to the list */
2017-02-23 00:05:57 +08:00
for (i = 0; i < nr_sought; i++) {
ref = sought[i];
2018-10-04 23:09:06 +08:00
if (!is_unmatched_ref(ref))
2017-02-23 00:05:57 +08:00
continue;
2013-01-30 06:02:15 +08:00
2018-10-04 23:09:39 +08:00
if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
2017-02-23 00:05:57 +08:00
ref->match_status = REF_MATCHED;
filter_ref: make a copy of extra "sought" entries
If the server supports allow_tip_sha1_in_want, we add any
unmatched raw-sha1 entries in our "sought" list of refs to
the list of refs we will ask the other side for. We do so by
inserting the original "struct ref" directly into our list,
rather than making a copy. This has several problems.
The most minor problem is that one cannot ever free the
resulting list; it contains structs that are copies of the
remote refs (made earlier by fetch_pack) along with sought
refs that are referenced elsewhere.
More importantly, we set the ref->next pointer to
NULL, chopping off the remainder of any existing list that
the ref was a part of. We get the set of "sought" refs in
an array rather than a linked list, but that array is often
in turn generated from a list. The test modification in
t5516 demonstrates this. Rather than fetching just an exact
sha1, we fetch that sha1 plus another ref:
- we build a linked list of refs to fetch when do_fetch
calls get_ref_map; the exact sha1 is first, followed by
the named ref ("refs/heads/extra" in this case).
- we pass that linked list to transport_fetch_refs, which
squashes it into an array of pointers
- that array goes to fetch_pack, which calls filter_ref.
There we generate the want list from a mix of what the
remote side has advertised, and the "sought" entry for
the exact sha1. We set the sought entry's "next" pointer
to NULL.
- after we return from transport_fetch_refs, we then try
to update the refs by following the linked list. But our
list is now truncated, and we do not update
refs/heads/extra at all.
We can fix this by making a copy of the ref. There's nothing
that fetch_pack does to it that must be reflected in the
original "sought" list (and indeed, if that were the case we
would have a serious bug, because it is only exact-sha1
entries which are treated this way).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 04:37:09 +08:00
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
2017-02-23 00:05:57 +08:00
} else {
ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
2013-01-30 06:02:15 +08:00
}
}
2017-05-16 01:32:20 +08:00
oidset_clear(&tip_oids);
2019-04-13 13:54:09 +08:00
free_refs(unmatched);
2017-05-16 01:32:20 +08:00
2012-10-26 23:53:55 +08:00
*refs = newlist;
}
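The "fetch: do not consider peeled tags as advertised tips" message describes a three-step strategy for filter_refs(). That strategy can be modelled on plain strings. This is a simplified sketch with the following assumptions:
- struct toy_ref and the helpers are hypothetical.
- is_malformed() only recognises peeled "^{}" names instead of calling check_refname_format().
- A sought entry's name doubles as its literal object id.
- Only the strict case (v0 without uploadpack.allowReachableSHA1InWant) is modelled.
- Like the real merge walk, it relies on both lists being sorted by name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ref; not Git's struct ref. */
struct toy_ref {
	const char *name; /* ref name, or a literal hex oid for sought entries */
	const char *oid;  /* object the advertised ref points at */
	bool matched;
};

/* Stand-in for check_refname_format(): only peeled "^{}" names are rejected. */
static bool is_malformed(const char *name)
{
	size_t len = strlen(name);

	return len >= 3 && strcmp(name + len - 3, "^{}") == 0;
}

static bool in_tip_set(const char *oid, const char **tips, size_t nr_tips)
{
	for (size_t i = 0; i < nr_tips; i++)
		if (strcmp(tips[i], oid) == 0)
			return true;
	return false;
}

/*
 * Both arrays must be sorted by name; `tips` needs room for nr_adv
 * entries. Returns how many sought entries may be requested and sets
 * `matched` on each of them.
 */
static size_t filter_toy_refs(const struct toy_ref *adv, size_t nr_adv,
			      struct toy_ref *sought, size_t nr_sought,
			      const char **tips)
{
	size_t i = 0, nr_tips = 0, accepted = 0;

	for (size_t a = 0; a < nr_adv; a++) {
		/* Step 1: drop malformed/peeled entries before anything else. */
		if (is_malformed(adv[a].name))
			continue;
		/* Merge walk: advance through sought until it passes adv[a]. */
		while (i < nr_sought) {
			int cmp = strcmp(adv[a].name, sought[i].name);

			if (cmp < 0)
				break;
			if (cmp == 0 && !sought[i].matched) {
				sought[i].matched = true;
				accepted++;
			}
			i++;
		}
		/* Step 2: every well-formed advertised ref is a legal tip. */
		tips[nr_tips++] = adv[a].oid;
	}

	/* Step 3: an unmatched sought oid is legal only if it is a tip. */
	for (size_t s = 0; s < nr_sought; s++) {
		if (!sought[s].matched &&
		    in_tip_set(sought[s].name, tips, nr_tips)) {
			sought[s].matched = true;
			accepted++;
		}
	}
	return accepted;
}
```

Take advertised refs refs/heads/main (aaaa), refs/tags/v1 (bbbb) and refs/tags/v1^{} (cccc), and sought entries bbbb, cccc and refs/heads/main. Two requests are accepted. cccc is rejected because the peeled entry never enters the tip set. The real function also appends copy_ref() copies of accepted literal-oid requests rather than the caller's nodes, for the reasons given in the "filter_ref: make a copy" message.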
2023-02-24 14:39:35 +08:00
static void mark_alternate_complete(struct fetch_negotiator *negotiator UNUSED,
2018-06-15 06:54:26 +08:00
struct object *obj)
2012-10-26 23:53:55 +08:00
{
2017-05-01 10:28:54 +08:00
mark_complete(&obj->oid);
2012-10-26 23:53:55 +08:00
}
2018-06-07 04:47:07 +08:00
/*
* Mark recent commits available locally and reachable from a local ref as
2020-08-18 12:01:37 +08:00
* COMPLETE.
2018-06-07 04:47:07 +08:00
*
* The cutoff time for recency is determined by this heuristic: it is the
* latest commit time of the objects in refs that are commits and that we know
* the commit time of.
*/
2018-06-15 06:54:28 +08:00
static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
2018-06-15 06:54:26 +08:00
struct fetch_pack_args *args,
2018-06-07 04:47:07 +08:00
struct ref **refs)
2012-10-26 23:53:55 +08:00
{
struct ref *ref;
fetch-pack: restore save_commit_buffer after use
In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.
In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.
Therefore, restore save_commit_buffer to its original value after use.
An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 23:58:48 +08:00
int old_save_commit_buffer = save_commit_buffer;
2017-04-27 03:29:31 +08:00
timestamp_t cutoff = 0;
2012-10-26 23:53:55 +08:00
2022-03-28 22:02:06 +08:00
if (args->refetch)
return;
2012-10-26 23:53:55 +08:00
save_commit_buffer = 0;
2019-11-20 07:02:09 +08:00
trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
2012-10-26 23:53:55 +08:00
for (ref = *refs; ref; ref = ref->next) {
2022-02-10 20:28:09 +08:00
struct commit *commit;
commit = lookup_commit_in_graph(the_repository, &ref->old_oid);
if (!commit) {
struct object *o;
2012-10-26 23:53:55 +08:00
2023-03-28 21:58:50 +08:00
if (!repo_has_object_file_with_flags(the_repository, &ref->old_oid,
OBJECT_INFO_QUICK |
OBJECT_INFO_SKIP_FETCH_OBJECT))
2022-02-10 20:28:09 +08:00
continue;
o = parse_object(the_repository, &ref->old_oid);
if (!o || o->type != OBJ_COMMIT)
continue;
commit = (struct commit *)o;
}
2012-10-26 23:53:55 +08:00
2019-11-20 07:02:09 +08:00
/*
* We already have it -- which may mean that we were
2012-10-26 23:53:55 +08:00
* in sync with the other side at some time after
* that (it is OK if we guess wrong here).
*/
2022-02-10 20:28:09 +08:00
if (!cutoff || cutoff < commit->date)
cutoff = commit->date;
2012-10-26 23:53:55 +08:00
}
2019-11-20 07:02:09 +08:00
trace2_region_leave("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
2012-10-26 23:53:55 +08:00
2019-11-20 07:02:09 +08:00
/*
* This block marks all local refs as COMPLETE, and then recursively marks all
* parents of those refs as COMPLETE.
*/
trace2_region_enter("fetch-pack", "mark_complete_local_refs", NULL);
fetch-pack: avoid object flags if no_dependents
When fetch_pack() is invoked as part of another Git command (due to a
lazy fetch from a partial clone, for example), it uses object flags that
may already be used by the outer Git command.
The commit that introduced the lazy fetch feature (88e2f9ed8e
("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried
to avoid this overlap, but it did not avoid it totally. It was
successful in avoiding writing COMPLETE, but did not avoid reading
COMPLETE, and did not avoid writing and reading ALTERNATE.
Ensure that no flags are written or read by fetch_pack() in the case
where it is used to perform a lazy fetch. To do this, it is sufficient
to avoid checking completeness of wanted refs (unnecessary in the case
of lazy fetches), and to avoid negotiation-related work (in the current
implementation, no negotiation is performed anyway). After that was
done, the lack of overlap was verified by checking all direct and
indirect usages of COMPLETE and ALTERNATE - that they are read or
written only if no_dependents is false.
There are other possible solutions to this issue:
(1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using
part, and whenever no_dependents is set, only use the
non-flag-using part.
(2) Make fetch_pack() be able to be used with arbitrary repository
objects. fetch_pack() should then create its own repository object
based on the given repository object, with its own object
hashtable, so that the flags do not conflict.
(1) is possible but invasive - some functions would need to be split;
and such invasiveness would potentially be unnecessary if we ever were
to need (2) anyway. (2) would be useful if we were to support, say,
submodules that were partial clones themselves, but I don't know when or
if the Git project plans to support those.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 07:04:52 +08:00
if (!args->deepen) {
2024-05-07 15:11:53 +08:00
refs_for_each_rawref(get_main_ref_store(the_repository),
mark_complete_oid, NULL);
2018-10-04 07:04:52 +08:00
for_each_cached_alternate(NULL, mark_alternate_complete);
commit_list_sort_by_date(&complete);
if (cutoff)
mark_recent_complete_commits(args, cutoff);
}
2019-11-20 07:02:09 +08:00
trace2_region_leave("fetch-pack", "mark_complete_local_refs", NULL);
2012-10-26 23:53:55 +08:00
2018-10-04 07:04:52 +08:00
/*
* Mark all complete remote refs as common refs.
* Don't mark them common yet; the server has to be told so first.
*/
2019-11-20 07:02:09 +08:00
trace2_region_enter("fetch-pack", "mark_common_remote_refs", NULL);
2018-10-04 07:04:52 +08:00
	for (ref = *refs; ref; ref = ref->next) {
2020-08-18 12:01:35 +08:00
		struct commit *c = deref_without_lazy_fetch(&ref->old_oid, 0);
2012-10-26 23:53:55 +08:00

2020-08-18 12:01:35 +08:00
		if (!c || !(c->object.flags & COMPLETE))
2018-10-04 07:04:52 +08:00
			continue;
2012-10-26 23:53:55 +08:00

2020-08-18 12:01:35 +08:00
		negotiator->known_common(negotiator, c);
2012-10-26 23:53:55 +08:00
	}
2019-11-20 07:02:09 +08:00
	trace2_region_leave("fetch-pack", "mark_common_remote_refs", NULL);
2012-10-26 23:53:55 +08:00

2018-06-07 04:47:07 +08:00
	save_commit_buffer = old_save_commit_buffer;
}
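The function above clears the global save_commit_buffer earlier and, at the end, restores the value it had on entry instead of forcing it back to a fixed value. This matters because a later consumer in the same process may rely on cached commit buffers. The sketch below shows that save/restore pattern in plain C. keep_buffers, walk_objects() and walk_without_buffers() are hypothetical names and are not part of Git.

```c
/* Hypothetical stand-in for a process-wide setting such as save_commit_buffer. */
static int keep_buffers = 1;

/* Hypothetical worker; it reports the value of the setting it observed. */
static int walk_objects(void)
{
	return keep_buffers;
}

/*
 * Disable the global only for the duration of the walk, then put back
 * whatever the caller had, rather than assuming it was 1.
 */
static int walk_without_buffers(void)
{
	int old_keep_buffers = keep_buffers;
	int seen;

	keep_buffers = 0;
	seen = walk_objects();
	keep_buffers = old_keep_buffers;
	return seen;
}
```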

/*
 * Returns 1 if every object pointed to by the given remote refs is available
 * locally and reachable from a local ref, and 0 otherwise.
 */
static int everything_local(struct fetch_pack_args *args,
			    struct ref **refs)
{
	struct ref *ref;
	int retval;
2012-10-26 23:53:55 +08:00

	for (retval = 1, ref = *refs; ref ; ref = ref->next) {
2017-05-01 10:28:54 +08:00
		const struct object_id *remote = &ref->old_oid;
2012-10-26 23:53:55 +08:00
		struct object *o;

2019-06-20 15:41:14 +08:00
		o = lookup_object(the_repository, remote);
2012-10-26 23:53:55 +08:00
		if (!o || !(o->flags & COMPLETE)) {
			retval = 0;
2017-05-01 10:28:54 +08:00
			print_verbose(args, "want %s (%s)", oid_to_hex(remote),
2016-06-12 18:53:54 +08:00
				      ref->name);
2012-10-26 23:53:55 +08:00
			continue;
		}
2017-05-01 10:28:54 +08:00
		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
2016-06-12 18:53:54 +08:00
			      ref->name);
2012-10-26 23:53:55 +08:00
	}
fetch-pack: restore save_commit_buffer after use
In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.
In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.
Therefore, restore save_commit_buffer to its original value after use.
An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 23:58:48 +08:00

2012-10-26 23:53:55 +08:00
	return retval;
}
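everything_local() above starts by assuming that every wanted object is present. It clears that assumption as soon as one ref's object is missing or lacks the COMPLETE flag, but it keeps visiting the remaining refs so that each one can be reported. The sketch below reproduces only that control flow over a hypothetical linked list. struct fake_ref, all_refs_complete() and the bit value chosen for COMPLETE are assumptions. In Git, COMPLETE is a bit in struct object's flags.

```c
#include <stddef.h>

/* Hypothetical bit value; Git defines its own COMPLETE flag. */
#define COMPLETE (1u << 0)

/* Hypothetical, simplified stand-in for a remote ref and its local object. */
struct fake_ref {
	const char *name;
	unsigned int local_flags; /* 0 models "object missing locally" */
	struct fake_ref *next;
};

/*
 * Same shape as everything_local(): clear retval on the first incomplete
 * ref, but keep walking so every missing ref is counted (Git prints it).
 */
static int all_refs_complete(struct fake_ref *refs, int *missing)
{
	int retval = 1;
	struct fake_ref *ref;

	*missing = 0;
	for (ref = refs; ref; ref = ref->next) {
		if (!(ref->local_flags & COMPLETE)) {
			retval = 0;
			(*missing)++;
		}
	}
	return retval;
}
```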

2022-08-26 01:09:48 +08:00
static int sideband_demux(int in UNUSED, int out, void *data)
2012-10-26 23:53:55 +08:00
{
	int *xd = data;
fetch-pack: ignore SIGPIPE in sideband demuxer
If the other side feeds us a bogus pack, index-pack (or
unpack-objects) may die early, before consuming all of its
input. As a result, the sideband demuxer may get SIGPIPE
(racily, depending on whether our data made it into the pipe
buffer or not). If this happens and we are compiled with
pthread support, it will take down the main thread, too.
This isn't the end of the world, as the main process will
just die() anyway when it sees index-pack failed. But it
does mean we don't get a chance to say "fatal: index-pack
failed" or similar. And it also means that we racily fail
t5504, as we sometimes die() and sometimes are killed by
SIGPIPE.
So let's ignore SIGPIPE while demuxing the sideband. We are
already careful to check the return value of write(), so we
won't waste time writing to a broken pipe. The caller will
notice the error return from the async thread, though in
practice we don't even get that far, as we die() as soon as
we see that index-pack failed.
The non-sideband case is already fine; we let index-pack
read straight from the socket, so there is no SIGPIPE at
all. Technically the non-threaded async case is also OK
without this (the forked async process gets SIGPIPE), but
it's not worth distinguishing from the threaded case here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-24 15:44:58 +08:00
	int ret;
2012-10-26 23:53:55 +08:00

2016-02-24 15:44:58 +08:00
	ret = recv_sideband("fetch-pack", xd[0], out);
2012-10-26 23:53:55 +08:00
	close(out);
	return ret;
}
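The commit message above explains how SIGPIPE in the demuxer can take down the whole process. In get_pack() below, the demuxer is started with demux.isolate_sigpipe set. The following POSIX-only sketch uses plain sigaction() rather than Git's own signal helpers. It demonstrates the underlying behavior: once SIGPIPE is ignored, writing to a pipe whose reader has exited fails with EPIPE instead of killing the process. write_to_dead_reader() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Models the demuxer writing after index-pack has died. Returns the
 * errno from the failed write(), 0 if the write succeeded, or -1 if the
 * pipe could not be created.
 */
static int write_to_dead_reader(void)
{
	int fds[2];
	int err = 0;
	struct sigaction ign, old;

	if (pipe(fds) < 0)
		return -1;
	close(fds[0]); /* the reader (index-pack) has gone away */

	ign.sa_handler = SIG_IGN;
	sigemptyset(&ign.sa_mask);
	ign.sa_flags = 0;
	sigaction(SIGPIPE, &ign, &old);

	if (write(fds[1], "x", 1) < 0)
		err = errno; /* EPIPE instead of death by signal 13 */

	sigaction(SIGPIPE, &old, NULL); /* restore the previous disposition */
	close(fds[1]);
	return err;
}
```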

2021-01-12 16:21:58 +08:00
static void create_promisor_file(const char *keep_name,
				 struct ref **sought, int nr_sought)
2019-10-15 08:12:31 +08:00
{
	struct strbuf promisor_name = STRBUF_INIT;
	int suffix_stripped;

	strbuf_addstr(&promisor_name, keep_name);
	suffix_stripped = strbuf_strip_suffix(&promisor_name, ".keep");
	if (!suffix_stripped)
		BUG("name of pack lockfile should end with .keep (was '%s')",
		    keep_name);
	strbuf_addstr(&promisor_name, ".promisor");

2021-01-12 16:21:59 +08:00
	write_promisor_file(promisor_name.buf, sought, nr_sought);
2019-10-15 08:12:31 +08:00

	strbuf_release(&promisor_name);
}
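create_promisor_file() derives the .promisor file name from the pack lockfile name by replacing a required ".keep" suffix. The sketch below applies the same naming rule with fixed-size buffers instead of Git's strbuf. keep_to_promisor_name() is hypothetical. It returns -1 in the cases where Git would call BUG() instead.

```c
#include <stdio.h>
#include <string.h>

/*
 * "pack-<hash>.keep" -> "pack-<hash>.promisor". Returns 0 on success,
 * -1 if the ".keep" suffix is missing or the output buffer is too small.
 */
static int keep_to_promisor_name(const char *keep_name, char *out, size_t outsz)
{
	static const char keep_suffix[] = ".keep";
	size_t len = strlen(keep_name);
	size_t suffix_len = sizeof(keep_suffix) - 1;
	int n;

	if (len < suffix_len ||
	    strcmp(keep_name + len - suffix_len, keep_suffix))
		return -1;

	/* Copy everything before ".keep", then append the new suffix. */
	n = snprintf(out, outsz, "%.*s.promisor",
		     (int)(len - suffix_len), keep_name);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}
```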

2021-02-23 03:20:09 +08:00
static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
{
	int len = the_hash_algo->hexsz + 1; /* hash + NL */

	do {
		char hex_hash[GIT_MAX_HEXSZ + 1];
		int read_len = read_in_full(fd, hex_hash, len);
		struct object_id oid;
		const char *end;

		if (!read_len)
			return;
		if (read_len != len)
			die("invalid length read %d", read_len);
		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
			die("invalid hash");
		oidset_insert(gitmodules_oids, &oid);
	} while (1);
}
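parse_gitmodules_oids() reads index-pack's output as fixed-length records. Each record is one hex object name of the_hash_algo->hexsz digits followed by a newline. The function stops cleanly at EOF and dies on a short record or a bad hash. The sketch below applies the same framing to an in-memory buffer. count_hex_records() and HEXSZ are hypothetical, HEXSZ assumes SHA-1 (40 hex digits), and isxdigit() is a simpler stand-in for parse_oid_hex().

```c
#include <ctype.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1; SHA-256 names have 64 hex digits */

/*
 * Returns the number of "<HEXSZ hex digits>\n" records in buf, or -1 on a
 * short trailing record or a malformed one.
 */
static int count_hex_records(const char *buf, size_t len)
{
	const size_t rec = HEXSZ + 1; /* hash + NL */
	size_t off;
	int count = 0;

	for (off = 0; off < len; off += rec) {
		size_t i;

		if (len - off < rec)
			return -1; /* compare: "invalid length read" */
		for (i = 0; i < HEXSZ; i++)
			if (!isxdigit((unsigned char)buf[off + i]))
				return -1; /* compare: "invalid hash" */
		if (buf[off + HEXSZ] != '\n')
			return -1;
		count++;
	}
	return count;
}
```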

2022-05-17 04:11:01 +08:00
static void add_index_pack_keep_option(struct strvec *args)
{
	char hostname[HOST_NAME_MAX + 1];

	if (xgethostname(hostname, sizeof(hostname)))
		xsnprintf(hostname, sizeof(hostname), "localhost");
	strvec_pushf(args, "--keep=fetch-pack %"PRIuMAX " on %s",
		     (uintmax_t)getpid(), hostname);
}
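add_index_pack_keep_option() passes index-pack a --keep message of the form "fetch-pack <pid> on <hostname>". If the host name cannot be determined, it falls back to "localhost". The POSIX sketch below builds that string with gethostname() and snprintf() instead of xgethostname() and strvec_pushf(). format_keep_option() is hypothetical, and the 256-byte host buffer is an assumption standing in for HOST_NAME_MAX + 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns the snprintf() result: the full length of the --keep argument. */
static int format_keep_option(char *out, size_t outsz)
{
	char hostname[256];

	if (gethostname(hostname, sizeof(hostname)))
		snprintf(hostname, sizeof(hostname), "localhost");
	/* gethostname() need not NUL-terminate a truncated name */
	hostname[sizeof(hostname) - 1] = '\0';

	return snprintf(out, outsz, "--keep=fetch-pack %" PRIuMAX " on %s",
			(uintmax_t)getpid(), hostname);
}
```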

2020-08-18 03:48:19 +08:00
/*
2021-02-23 03:20:08 +08:00
 * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
 * The strings to pass as the --index-pack-arg arguments to http-fetch will be
 * stored there. (It must be freed by the caller.)
2020-08-18 03:48:19 +08:00
 */
2012-10-26 23:53:55 +08:00
static int get_pack(struct fetch_pack_args *args,
fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 04:57:22 +08:00
		    int xd[2], struct string_list *pack_lockfiles,
2021-02-23 03:20:08 +08:00
		    struct strvec *index_pack_args,
2021-02-23 03:20:09 +08:00
		    struct ref **sought, int nr_sought,
		    struct oidset *gitmodules_oids)
2012-10-26 23:53:55 +08:00
{
	struct async demux;
	int do_keep = args->keep_pack;
2015-09-25 05:07:54 +08:00
	const char *cmd_name;
	struct pack_header header;
	int pass_header = 0;
2014-08-20 03:09:35 +08:00
	struct child_process cmd = CHILD_PROCESS_INIT;
2021-02-23 03:20:09 +08:00
	int fsck_objects = 0;
2013-05-26 09:16:17 +08:00
	int ret;
2012-10-26 23:53:55 +08:00

	memset(&demux, 0, sizeof(demux));
	if (use_sideband) {
		/* xd[] is talking with upload-pack; subprocess reads from
		 * xd[0], spits out band#2 to stderr, and feeds us band#1
		 * through demux->out.
		 */
		demux.proc = sideband_demux;
		demux.data = xd;
		demux.out = -1;
2016-04-20 06:50:29 +08:00
		demux.isolate_sigpipe = 1;
2012-10-26 23:53:55 +08:00
		if (start_async(&demux))
2016-06-12 18:53:55 +08:00
			die(_("fetch-pack: unable to fork off sideband demultiplexer"));
2012-10-26 23:53:55 +08:00
	}
	else
		demux.out = xd[0];

fetch-pack: do not mix --pack_header and packfile uri
When fetching (as opposed to cloning) from a repository with packfile
URIs enabled, an error like this may occur:
fatal: pack has bad object at offset 12: unknown object type 5
fatal: finish_http_pack_request gave result -1
fatal: fetch-pack: expected keep then TAB at start of http-fetch output
This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
use index-pack arg", 2021-02-22), when the index-pack args used when
processing the inline packfile of a fetch response and when processing
packfile URIs were unified.
This bug happens because fetch, by default, partially reads (and
consumes) the header of the inline packfile to determine if it should
store the downloaded objects as a packfile or loose objects, and thus
passes --pack_header=<...> to index-pack to inform it that some bytes
are missing. However, when it subsequently fetches the additional
packfiles linked by URIs, it reuses the same index-pack arguments, thus
wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
missing.
This does not happen when cloning because "git clone" always passes
do_keep, which instructs the fetch mechanism to always retain the
packfile, eliminating the need to read the header.
There are a few ways to fix this, including filtering out pack_header
arguments when downloading the additional packfiles, but I decided to
stick to always using index-pack throughout when packfile URIs are
present - thus, Git no longer needs to read the bytes, and no longer
needs --pack_header here.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 09:16:20 +08:00
	if (!args->keep_pack && unpack_limit && !index_pack_args) {
2012-10-26 23:53:55 +08:00

		if (read_pack_header(demux.out, &header))
2016-06-12 18:53:55 +08:00
			die(_("protocol error: bad pack header"));
2015-09-25 05:07:54 +08:00
		pass_header = 1;
2012-10-26 23:53:55 +08:00
		if (ntohl(header.hdr_entries) < unpack_limit)
			do_keep = 0;
		else
			do_keep = 1;
	}
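This branch runs when keep_pack is unset, unpack_limit is nonzero and no packfile URIs are in play (index_pack_args is NULL). In that case get_pack() reads the 12-byte pack header itself and compares the object count against unpack_limit. That comparison decides between index-pack and unpack-objects. The consumed header is later forwarded to the child process as --pack_header=<version>,<entries>. The sketch below shows that decision on raw header bytes. decide_keep() and struct sketch_pack_header are hypothetical. The signature check and the version check (versions 2 and 3) are an assumption about what a pack reader validates.

```c
#include <arpa/inet.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* 12-byte pack header: "PACK", version, object count (network order). */
struct sketch_pack_header {
	uint32_t hdr_signature;
	uint32_t hdr_version;
	uint32_t hdr_entries;
};

/*
 * Returns 0 to unpack into loose objects, 1 to keep the pack, -1 on a bad
 * header; writes the --pack_header argument for the child into arg.
 */
static int decide_keep(const unsigned char raw[12], uint32_t unpack_limit,
		       char *arg, size_t argsz)
{
	struct sketch_pack_header h;
	uint32_t version, entries;

	if (memcmp(raw, "PACK", 4))
		return -1;
	memcpy(&h, raw, sizeof(h));
	version = ntohl(h.hdr_version);
	entries = ntohl(h.hdr_entries);
	if (version != 2 && version != 3)
		return -1;

	snprintf(arg, argsz, "--pack_header=%" PRIu32 ",%" PRIu32,
		 version, entries);
	return entries < unpack_limit ? 0 : 1;
}
```

The threshold values in a quick check are arbitrary: a 5-object pack is unpacked with a limit of 100 and kept with a limit of 5.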

2013-05-26 09:16:15 +08:00
	if (alternate_shallow_file) {
2020-07-29 04:24:53 +08:00
		strvec_push(&cmd.args, "--shallow-file");
		strvec_push(&cmd.args, alternate_shallow_file);
2013-05-26 09:16:15 +08:00
	}

2024-06-19 12:07:32 +08:00
	fsck_objects = fetch_pack_fsck_objects();
2021-02-23 03:20:09 +08:00

	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
		if (pack_lockfiles || fsck_objects)
2012-10-26 23:53:55 +08:00
			cmd.out = -1;
2015-09-25 05:07:54 +08:00
		cmd_name = "index-pack";
2020-07-29 04:24:53 +08:00
		strvec_push(&cmd.args, cmd_name);
		strvec_push(&cmd.args, "--stdin");
2012-10-26 23:53:55 +08:00
		if (!args->quiet && !args->no_progress)
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "-v");
2012-10-26 23:53:55 +08:00
		if (args->use_thin_pack)
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "--fix-thin");
2022-05-17 04:11:01 +08:00
		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit))
			add_index_pack_keep_option(&cmd.args);
2021-02-23 03:20:08 +08:00
		if (!index_pack_args && args->check_self_contained_and_connected)
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "--check-self-contained-and-connected");
2020-06-11 04:57:23 +08:00
		else
			/*
			 * We cannot perform any connectivity checks because
			 * not all packs have been downloaded; let the caller
			 * have this responsibility.
			 */
			args->check_self_contained_and_connected = 0;
fetch-pack: in partial clone, pass --promisor
When fetching a pack from a promisor remote, the corresponding .promisor
file needs to be created. "fetch-pack" originally did this by passing
"--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write
fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do
this itself instead, because it needed to store ref information in the
.promisor file.
This causes a problem with superprojects when transfer.fsckobjects is
set, because in the current implementation, it is "index-pack" that
calls fsck_finish() to check the objects; before 5374a290aa,
fsck_finish() would see that .gitmodules is a promisor object and
tolerate it being missing, but after, there is no .promisor file (at the
time of the invocation of fsck_finish() by "index-pack") to tell it that
.gitmodules is a promisor object, so it returns an error.
Therefore, teach "fetch-pack" to pass "--promisor" to index pack once
again. "fetch-pack" will subsequently overwrite this file with the ref
information.
An alternative is to instead move object checking to "fetch-pack", and
let "index-pack" only index the files. However, since "index-pack" has
to inflate objects in order to index them, it seems reasonable to also
let it check the objects (which also require inflated files).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 01:51:16 +08:00

		if (args->from_promisor)
			/*
2021-01-12 16:21:58 +08:00
			 * create_promisor_file() may be called afterwards but
2020-08-21 01:51:16 +08:00
			 * we still need index-pack to know that this is a
			 * promisor pack. For example, if transfer.fsckobjects
			 * is true, index-pack needs to know that .gitmodules
			 * is a promisor object (so that it won't complain if
			 * it is missing).
			 */
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "--promisor");
2012-10-26 23:53:55 +08:00
	}
	else {
2015-09-25 05:07:54 +08:00
		cmd_name = "unpack-objects";
2020-07-29 04:24:53 +08:00
		strvec_push(&cmd.args, cmd_name);
2012-10-26 23:53:55 +08:00
		if (args->quiet || args->no_progress)
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "-q");
2013-05-26 09:16:17 +08:00
		args->check_self_contained_and_connected = 0;
2012-10-26 23:53:55 +08:00
	}
2015-09-25 05:07:54 +08:00

	if (pass_header)
2020-07-29 04:24:53 +08:00
		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:
argv_array_pushl(&args, "one argument",
"another argument", "and more",
NULL);
was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:
strvec_pushl(&args, "one argument",
"another argument", "and more",
NULL);
Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:
git jump grep 'strvec_.*,$'
and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-29 04:26:31 +08:00
			     ntohl(header.hdr_version),
2015-09-25 05:07:54 +08:00
			     ntohl(header.hdr_entries));
2021-02-23 03:20:09 +08:00
	if (fsck_objects) {
2021-02-23 03:20:08 +08:00
		if (args->from_promisor || index_pack_args)
2018-03-15 02:42:41 +08:00
			/*
			 * We cannot use --strict in index-pack because it
			 * checks both broken objects and links, but we only
			 * want to check for broken objects.
			 */
2020-07-29 04:24:53 +08:00
			strvec_push(&cmd.args, "--fsck-objects");
2018-03-15 02:42:41 +08:00
		else
2020-07-29 04:24:53 +08:00
			strvec_pushf(&cmd.args, "--strict%s",
2020-07-29 04:26:31 +08:00
				     fsck_msg_types.buf);
2018-03-15 02:42:41 +08:00
	}
2012-10-26 23:53:55 +08:00

2021-02-23 03:20:08 +08:00
	if (index_pack_args) {
		int i;

		for (i = 0; i < cmd.args.nr; i++)
			strvec_push(index_pack_args, cmd.args.v[i]);
	}

fetch-pack: ignore SIGPIPE when writing to index-pack
When fetching, we send the incoming pack to index-pack (or
unpack-objects) via the sideband demuxer. If index-pack hits an error
(e.g., because an object fails fsck), then it will die immediately. This
may cause us to get SIGPIPE on the fetch, as we're still trying to write
pack contents from the sideband demuxer (which is typically a thread,
and thus takes down the whole fetch process).
You can see this in action with:
./t5702-protocol-v2.sh --stress --run=59
which ends with (wrapped for readability):
test_must_fail: died by signal 13: git -c protocol.version=2 \
-c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \
clone http://127.0.0.1:5708/smart/http_parent http_child
not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object
This is mostly cosmetic. The actual error of interest (in this case, the
object that failed the fsck check) comes from index-pack straight to
stderr, so the user still sees it. They _might_ even see fetch-pack
complaining about index-pack failing, because the main thread is racing
with the sideband-demuxer. But they'll definitely see the signal death
in the exit code, which is what the test is complaining about.
We can make this more predictable by just ignoring SIGPIPE. The sideband
demuxer uses write_or_die(), so it will notice and stop (gracefully,
because we hook die_routine() to exit just the thread). And during this
section we're not writing anywhere else where we'd be concerned about
SIGPIPE preventing us from wasting effort writing to nowhere.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-20 04:58:55 +08:00
	sigchain_push(SIGPIPE, SIG_IGN);

2012-10-26 23:53:55 +08:00
	cmd.in = demux.out;
	cmd.git_cmd = 1;
	if (start_command(&cmd))
2016-06-12 18:53:55 +08:00
		die(_("fetch-pack: unable to fork off %s"), cmd_name);
2021-02-23 03:20:09 +08:00
	if (do_keep && (pack_lockfiles || fsck_objects)) {
		int is_well_formed;
		char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);

		if (!is_well_formed)
			die(_("fetch-pack: invalid index-pack output"));
fetch-pack: fix segfault when fscking without --lock-pack
The fetch-pack internals have multiple options related to creating
".keep" lock-files for the received pack:
- if args.lock_pack is set, then we tell index-pack to create a .keep
file. In the fetch-pack plumbing command, this is triggered by
passing "-k" twice.
- if the caller passes in a pack_lockfiles string list, then we use it
to record the path of the keep-file created by index-pack. We get
that name by reading the stdout of index-pack. In the fetch-pack
command, this is triggered by passing the (undocumented) --lock-pack
option; without it, we pass in a NULL string list.
So it's possible to ask index-pack to create the lock-file (using "-k
-k") but not ask to record it (by avoiding "--lock-pack"). This worked
fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22), but now it causes a segfault.
Before that commit, if pack_lockfiles was NULL, we wouldn't bother
reading the output from index-pack at all. But since that commit,
index-pack may produce extra output if we asked it to fsck. So even if
nobody cares about the lockfile path, we still need to read it to skip
to the output we do care about.
We correctly check that we didn't get a NULL lockfile path (which can
happen if we did not ask it to create a .keep file at all), but we
missed the case where the lockfile path is not NULL (due to "-k -k") but
the pack_lockfiles string_list is NULL (because nobody passed
"--lock-pack"), and segfault trying to add to the NULL string-list.
We can fix this by skipping the append to the string list when either
the value or the list is NULL. In that case we must also free the
lockfile path to avoid leaking it when it's non-NULL.
Nobody noticed the bug for so long because the transport code used by
"git fetch" always passes in a pack_lockfiles pointer, and remote-curl
(the main user of the fetch-pack plumbing command) always passes
--lock-pack.
Reported-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 21:02:56 +08:00
		if (pack_lockfiles && pack_lockfile)
2020-12-01 03:27:15 +08:00
			string_list_append_nodup(pack_lockfiles, pack_lockfile);
2024-06-19 21:02:56 +08:00
		else
			free(pack_lockfile);
2021-02-23 03:20:09 +08:00
		parse_gitmodules_oids(cmd.out, gitmodules_oids);
2012-10-26 23:53:55 +08:00
		close(cmd.out);
	}
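Under the fix described above, the lockfile path returned by index_pack_lockfile() is an owned string with exactly one fate. It is appended to pack_lockfiles without copying when both the list and the path exist, and freed otherwise. The sketch below shows that ownership rule with a hypothetical growable array. struct name_list and append_nodup_or_free() are illustrative stand-ins for Git's string_list and string_list_append_nodup().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for a string list that owns its items. */
struct name_list {
	char **items;
	size_t nr, alloc;
};

/*
 * Takes ownership of name (which may be NULL). Appends without copying
 * only when both list and name exist; otherwise frees name so it cannot
 * leak. Returns 1 if appended, 0 otherwise.
 */
static int append_nodup_or_free(struct name_list *list, char *name)
{
	if (list && name) {
		if (list->nr == list->alloc) {
			size_t alloc = list->alloc ? 2 * list->alloc : 4;
			char **items = realloc(list->items, alloc * sizeof(*items));

			if (!items) {
				free(name);
				return 0;
			}
			list->items = items;
			list->alloc = alloc;
		}
		list->items[list->nr++] = name;
		return 1;
	}
	free(name); /* free(NULL) is a no-op */
	return 0;
}
```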

2013-10-22 21:36:02 +08:00
	if (!use_sideband)
		/* Closed by start_command() */
		xd[0] = -1;

2013-05-26 09:16:17 +08:00
	ret = finish_command(&cmd);
	if (!ret || (args->check_self_contained_and_connected && ret == 1))
		args->self_contained_and_connected =
			args->check_self_contained_and_connected &&
			ret == 0;
	else
2016-06-12 18:53:55 +08:00
		die(_("%s failed"), cmd_name);
2012-10-26 23:53:55 +08:00
	if (use_sideband && finish_async(&demux))
2016-06-12 18:53:55 +08:00
		die(_("error in sideband demultiplexer"));
|
2019-10-15 08:12:31 +08:00
|
|
|
|
fetch-pack: ignore SIGPIPE when writing to index-pack
When fetching, we send the incoming pack to index-pack (or
unpack-objects) via the sideband demuxer. If index-pack hits an error
(e.g., because an object fails fsck), then it will die immediately. This
may cause us to get SIGPIPE on the fetch, as we're still trying to write
pack contents from the sideband demuxer (which is typically a thread,
and thus takes down the whole fetch process).
You can see this in action with:
./t5702-protocol-v2.sh --stress --run=59
which ends with (wrapped for readability):
test_must_fail: died by signal 13: git -c protocol.version=2 \
-c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \
clone http://127.0.0.1:5708/smart/http_parent http_child
not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object
This is mostly cosmetic. The actual error of interest (in this case, the
object that failed the fsck check) comes from index-pack straight to
stderr, so the user still sees it. They _might_ even see fetch-pack
complaining about index-pack failing, because the main thread is racing
with the sideband-demuxer. But they'll definitely see the signal death
in the exit code, which is what the test is complaining about.
We can make this more predictable by just ignoring SIGPIPE. The sideband
demuxer uses write_or_die(), so it will notice and stop (gracefully,
because we hook die_routine() to exit just the thread). And during this
section we're not writing anywhere else where we'd be concerned about
SIGPIPE preventing us from wasting effort writing to nowhere.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-20 04:58:55 +08:00
	sigchain_pop(SIGPIPE);

2019-10-15 08:12:31 +08:00
	/*
	 * Now that index-pack has succeeded, write the promisor file using the
	 * obtained .keep filename if necessary
	 */
fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 04:57:22 +08:00
	if (do_keep && pack_lockfiles && pack_lockfiles->nr && args->from_promisor)
2021-01-12 16:21:58 +08:00
		create_promisor_file(pack_lockfiles->items[0].string, sought, nr_sought);
2019-10-15 08:12:31 +08:00

2012-10-26 23:53:55 +08:00
	return 0;
}
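The lockfile fix near the top of this excerpt stores the path reported by index-pack only when the caller actually passed a list, and frees it otherwise. Below is a minimal sketch of that ownership rule. It assumes two hypothetical stand-ins: a fixed-capacity `path_list` in place of git's `string_list`, and a local `copy_string()` helper in place of `index_pack_lockfile()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's string_list; it owns its strings. */
struct path_list {
	char *items[8];
	size_t nr;
};

/* Hypothetical helper: a heap copy of s, like the path index-pack reports. */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/*
 * Takes ownership of `path`, which may be NULL. The path is appended only
 * when both it and `list` are non-NULL (and this sketch's list has room).
 * In every other case it is freed, so nothing leaks and a NULL list is
 * never dereferenced. Returns 1 if the path was stored.
 */
static int keep_or_free_lockfile(struct path_list *list, char *path)
{
	if (path && list && list->nr < sizeof(list->items) / sizeof(list->items[0])) {
		list->items[list->nr++] = path;
		return 1;
	}
	free(path); /* free(NULL) is a no-op */
	return 0;
}
```

Calling `keep_or_free_lockfile(NULL, path)` mirrors `fetch-pack -k -k` without `--lock-pack`: the path is still read, and then discarded instead of being appended to a NULL list.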
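The SIGPIPE change depends on standard POSIX behaviour. While SIGPIPE is ignored, writing to a pipe whose read end has been closed fails with `EPIPE` instead of terminating the process. Below is a minimal sketch that uses plain `signal()`. Git uses its own `sigchain` helpers instead; only `sigchain_pop(SIGPIPE)` is visible in this excerpt. The function `write_to_dead_reader()` is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose reader has already gone away, as when index-pack
 * dies early. Returns the errno seen by write(), or 0 if the write worked.
 */
static int write_to_dead_reader(void)
{
	int fds[2];
	int err = 0;
	void (*old_handler)(int);

	if (pipe(fds) < 0)
		return errno;
	close(fds[0]); /* the reader (index-pack) has exited */

	old_handler = signal(SIGPIPE, SIG_IGN);
	if (old_handler == SIG_ERR) {
		err = errno;
		close(fds[1]);
		return err;
	}
	if (write(fds[1], "x", 1) < 0)
		err = errno; /* EPIPE rather than death by signal 13 */
	signal(SIGPIPE, old_handler); /* restore, comparable to sigchain_pop() */

	close(fds[1]);
	return err;
}
```

The writer then sees an ordinary write error it can act on. That is what lets the sideband demuxer stop on its own rather than taking down the whole fetch.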
2022-07-17 00:59:59 +08:00
static int ref_compare_name(const struct ref *a, const struct ref *b)
{
	return strcmp(a->name, b->name);
}

DEFINE_LIST_SORT(static, sort_ref_list, struct ref, next);

2013-01-30 06:02:15 +08:00
static int cmp_ref_by_name(const void *a_, const void *b_)
{
	const struct ref *a = *((const struct ref **)a_);
	const struct ref *b = *((const struct ref **)b_);
	return strcmp(a->name, b->name);
}
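`QSORT()` hands `cmp_ref_by_name()` pointers to the array elements. Because `sought` is an array of `struct ref *`, each argument is cast to `const struct ref **` and dereferenced once. Below is a standalone sketch with a hypothetical one-field `struct ref` and a plain `qsort()` call; git's `QSORT` macro ends up in `qsort()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified struct ref: only the name matters here. */
struct ref {
	const char *name;
};

/* qsort() passes pointers to the elements, i.e. `struct ref **` here. */
static int cmp_ref_by_name(const void *a_, const void *b_)
{
	const struct ref *a = *((const struct ref **)a_);
	const struct ref *b = *((const struct ref **)b_);
	return strcmp(a->name, b->name);
}

/* Sort an array of ref pointers by name, as do_fetch_pack() does. */
static void sort_sought(struct ref **sought, size_t nr_sought)
{
	qsort(sought, nr_sought, sizeof(*sought), cmp_ref_by_name);
}
```

Only the pointer array is permuted; the `struct ref` objects themselves never move.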
2012-10-26 23:53:55 +08:00
static struct ref *do_fetch_pack(struct fetch_pack_args *args,
				 int fd[2],
				 const struct ref *orig_ref,
2013-01-30 06:02:15 +08:00
				 struct ref **sought, int nr_sought,
2013-12-05 21:02:39 +08:00
				 struct shallow_info *si,
2020-06-11 04:57:22 +08:00
				 struct string_list *pack_lockfiles)
2012-10-26 23:53:55 +08:00
{
2019-08-14 02:37:48 +08:00
	struct repository *r = the_repository;
2012-10-26 23:53:55 +08:00
	struct ref *ref = copy_ref_list(orig_ref);
2017-05-01 10:28:54 +08:00
	struct object_id oid;
2012-10-26 23:53:55 +08:00
	const char *agent_feature;
2023-04-15 05:25:20 +08:00
	size_t agent_len;
promisor-remote: remove fetch_if_missing=0
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08)
strove to remove the need for fetch_if_missing=0 from the fetching
mechanism, so it is plausible to attempt removing fetch_if_missing=0
from the lazy-fetching mechanism in promisor-remote as well.
But doing so reveals a bug - when the server does not send an object
pointed to by a tag object, an infinite loop occurs: Git attempts to
fetch the missing object, which causes a dereferencing of all refs (for
negotiation), which causes a lazy fetch of that missing object, and so
on. This bug is caused by unnecessary use of the fetch negotiator
during lazy fetching - it is not used after initialization, but it is
still initialized (which causes the dereferencing of all refs).
Thus, when the negotiator is not used during fetching, refrain from
initializing it. Then, remove fetch_if_missing from promisor-remote.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 08:34:20 +08:00
	struct fetch_negotiator negotiator_alloc;
	struct fetch_negotiator *negotiator;

2020-08-18 12:01:37 +08:00
	negotiator = &negotiator_alloc;
2022-03-28 22:02:06 +08:00
	if (args->refetch) {
		fetch_negotiator_init_noop(negotiator);
	} else {
		fetch_negotiator_init(r, negotiator);
	}
2012-10-26 23:53:55 +08:00

	sort_ref_list(&ref, ref_compare_name);
2016-09-29 23:27:31 +08:00
	QSORT(sought, nr_sought, cmp_ref_by_name);
2012-10-26 23:53:55 +08:00

2019-06-20 19:59:51 +08:00
	if ((agent_feature = server_feature_value("agent", &agent_len))) {
		agent_supported = 1;
		if (agent_len)
			print_verbose(args, _("Server version is %.*s"),
2023-04-15 05:25:20 +08:00
				      (int)agent_len, agent_feature);
2019-06-20 19:59:51 +08:00
	}

2020-11-12 07:29:31 +08:00
	if (!server_supports("session-id"))
		advertise_sid = 0;

2019-06-20 19:59:50 +08:00
	if (server_supports("shallow"))
		print_verbose(args, _("Server supports %s"), "shallow");
2019-08-14 02:37:48 +08:00
	else if (args->depth > 0 || is_repository_shallow(r))
2016-06-12 18:53:55 +08:00
		die(_("Server does not support shallow clients"));
2016-06-12 18:54:04 +08:00
	if (args->depth > 0 || args->deepen_since || args->deepen_not)
2016-06-12 18:53:56 +08:00
		args->deepen = 1;
2012-10-26 23:53:55 +08:00
	if (server_supports("multi_ack_detailed")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "multi_ack_detailed");
2012-10-26 23:53:55 +08:00
		multi_ack = 2;
		if (server_supports("no-done")) {
2019-06-20 19:59:49 +08:00
			print_verbose(args, _("Server supports %s"), "no-done");
2012-10-26 23:53:55 +08:00
			if (args->stateless_rpc)
				no_done = 1;
		}
	}
	else if (server_supports("multi_ack")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "multi_ack");
2012-10-26 23:53:55 +08:00
		multi_ack = 1;
	}
	if (server_supports("side-band-64k")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "side-band-64k");
2012-10-26 23:53:55 +08:00
		use_sideband = 2;
	}
	else if (server_supports("side-band")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "side-band");
2012-10-26 23:53:55 +08:00
		use_sideband = 1;
	}
2013-01-30 06:02:15 +08:00
	if (server_supports("allow-tip-sha1-in-want")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "allow-tip-sha1-in-want");
2015-05-22 04:23:38 +08:00
		allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
2013-01-30 06:02:15 +08:00
	}
2015-05-22 04:23:39 +08:00
	if (server_supports("allow-reachable-sha1-in-want")) {
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "allow-reachable-sha1-in-want");
2015-05-22 04:23:39 +08:00
		allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
	}
2019-06-20 19:59:50 +08:00
	if (server_supports("thin-pack"))
		print_verbose(args, _("Server supports %s"), "thin-pack");
	else
2012-10-26 23:53:55 +08:00
		args->use_thin_pack = 0;
2019-06-20 19:59:50 +08:00
	if (server_supports("no-progress"))
		print_verbose(args, _("Server supports %s"), "no-progress");
	else
2012-10-26 23:53:55 +08:00
		args->no_progress = 0;
2019-06-20 19:59:50 +08:00
	if (server_supports("include-tag"))
		print_verbose(args, _("Server supports %s"), "include-tag");
	else
2012-10-26 23:53:55 +08:00
		args->include_tag = 0;
2016-06-12 18:53:54 +08:00
	if (server_supports("ofs-delta"))
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "ofs-delta");
2016-06-12 18:53:54 +08:00
	else
2012-10-26 23:53:55 +08:00
		prefer_ofs_delta = 0;

2017-12-08 23:58:40 +08:00
	if (server_supports("filter")) {
		server_supports_filtering = 1;
2019-06-20 19:59:49 +08:00
		print_verbose(args, _("Server supports %s"), "filter");
2017-12-08 23:58:40 +08:00
	} else if (args->filter_options.choice) {
		warning("filtering not recognized by server, ignoring");
	}

2019-06-20 19:59:50 +08:00
	if (server_supports("deepen-since")) {
		print_verbose(args, _("Server supports %s"), "deepen-since");
2016-06-12 18:53:59 +08:00
		deepen_since_ok = 1;
2019-06-20 19:59:50 +08:00
	} else if (args->deepen_since)
2016-06-12 18:53:59 +08:00
		die(_("Server does not support --shallow-since"));
2019-06-20 19:59:50 +08:00
	if (server_supports("deepen-not")) {
		print_verbose(args, _("Server supports %s"), "deepen-not");
2016-06-12 18:54:04 +08:00
		deepen_not_ok = 1;
2019-06-20 19:59:50 +08:00
	} else if (args->deepen_not)
2016-06-12 18:54:04 +08:00
		die(_("Server does not support --shallow-exclude"));
2019-06-20 19:59:50 +08:00
	if (server_supports("deepen-relative"))
		print_verbose(args, _("Server supports %s"), "deepen-relative");
	else if (args->deepen_relative)
fetch, upload-pack: --deepen=N extends shallow boundary by N commits
In git-fetch, the --depth argument is always relative to the latest
remote refs. This makes it a bit difficult to cover this use case,
where the user wants to make the shallow history, say 3 levels
deeper. It would work if remote refs have not moved yet, but nobody
can guarantee that, especially when that use case is performed a
couple months after the last clone or "git fetch --depth". Also,
modifying shallow boundary using --depth does not work well with
clones created by --since or --not.
This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where remote refs
are.
Have/Want negotiation is still respected. So if remote refs move, the
server will send two chunks: one between "have" and "want" and another
to extend shallow history. In theory, the client could send no "want"s
in order to get the second chunk only. But the protocol does not allow
that. Either you send no want lines, which means ls-remote; or you
have to send at least one want line that carries deepen-relative to the
server.
The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.
(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with the remote
side (and is more complicated to implement as a result).
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 18:54:09 +08:00
		die(_("Server does not support --deepen"));
2020-05-26 03:58:59 +08:00
	if (!server_supports_hash(the_hash_algo->name, NULL))
		die(_("Server does not support this repository's object format"));
2012-10-26 23:53:55 +08:00

2020-08-18 12:01:37 +08:00
	mark_complete_and_common_ref(negotiator, args, &ref);
	filter_refs(args, &ref, sought, nr_sought);
2022-03-28 22:02:06 +08:00
	if (!args->refetch && everything_local(args, &ref)) {
2020-08-18 12:01:37 +08:00
		packet_flush(fd[1]);
		goto all_done;
2012-10-26 23:53:55 +08:00
	}
2019-11-13 08:34:20 +08:00
	if (find_common(negotiator, args, fd, &oid, ref) < 0)
2012-10-26 23:53:55 +08:00
		if (!args->keep_pack)
			/* When cloning, it is not unusual to have
			 * no common commit.
			 */
2016-06-12 18:53:55 +08:00
			warning(_("no common commits"));
2012-10-26 23:53:55 +08:00

	if (args->stateless_rpc)
		packet_flush(fd[1]);
2016-06-12 18:53:56 +08:00
	if (args->deepen)
2013-12-05 21:02:34 +08:00
		setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
					NULL);
2021-04-01 18:46:59 +08:00
	else if (si->nr_ours || si->nr_theirs) {
		if (args->reject_shallow_remote)
			die(_("source repository is shallow, reject to clone."));
2013-12-05 21:02:39 +08:00
		alternate_shallow_file = setup_temporary_shallow(si->shallow);
2021-04-01 18:46:59 +08:00
	} else
2013-08-26 10:17:26 +08:00
		alternate_shallow_file = NULL;
2021-02-23 03:20:09 +08:00
	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
2021-03-28 21:15:51 +08:00
		     &fsck_options.gitmodules_found))
2016-06-12 18:53:55 +08:00
		die(_("git fetch-pack: fetch failed."));
2021-03-28 21:15:51 +08:00
	if (fsck_finish(&fsck_options))
		die("fsck failed");
2012-10-26 23:53:55 +08:00

all_done:
2019-11-13 08:34:20 +08:00
	if (negotiator)
		negotiator->release(negotiator);
2012-10-26 23:53:55 +08:00
	return ref;
}
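As the promisor-remote commit above explains, the negotiator is only worth initializing when it will actually be used. `do_fetch_pack()` picks a no-op implementation when `args->refetch` is set, and releases whichever one it chose on the way out. Below is a minimal sketch of that function-pointer pattern. `struct negotiator`, both `next` implementations and `run_negotiation()` are hypothetical simplifications, not the `fetch_negotiator` API.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down negotiator "vtable". */
struct negotiator {
	const char *(*next)(struct negotiator *);
	void (*release)(struct negotiator *);
	int released;
	size_t pos; /* used only by the walking implementation */
};

/* No-op variant: offers no "have" lines and touches no refs. */
static const char *noop_next(struct negotiator *n)
{
	(void)n;
	return NULL;
}

/* Walking variant over a hypothetical, fixed local history. */
static const char *walk_next(struct negotiator *n)
{
	static const char *const history[] = { "c3", "c2", "c1" };

	if (n->pos < sizeof(history) / sizeof(history[0]))
		return history[n->pos++];
	return NULL;
}

static void mark_released(struct negotiator *n)
{
	n->released = 1;
}

/* Initialize, drain the haves, release; returns how many haves were offered. */
static size_t run_negotiation(struct negotiator *n, int refetch)
{
	size_t count = 0;

	n->next = refetch ? noop_next : walk_next;
	n->release = mark_released;
	n->released = 0;
	n->pos = 0;

	while (n->next(n))
		count++;
	n->release(n);
	return count;
}
```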
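In the capability checks of `do_fetch_pack()`, the richer variant wins when both are advertised. `multi_ack_detailed` sets `multi_ack = 2` ahead of plain `multi_ack` (1), and `side-band-64k` sets `use_sideband = 2` ahead of `side-band` (1). Below is a sketch of that preference over a hypothetical NULL-terminated list of advertised names. Git's `server_supports()` parses the real advertisement instead.

```c
#include <string.h>

/* Hypothetical helper: is `name` present in a NULL-terminated list? */
static int has_cap(const char *const *caps, const char *name)
{
	for (; *caps; caps++)
		if (!strcmp(*caps, name))
			return 1;
	return 0;
}

/* 2 = richer variant, 1 = basic variant, 0 = neither advertised. */
static int pick_level(const char *const *caps,
		      const char *rich, const char *basic)
{
	if (has_cap(caps, rich))
		return 2;
	if (has_cap(caps, basic))
		return 1;
	return 0;
}
```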
2018-03-16 01:31:29 +08:00
static void add_shallow_requests(struct strbuf *req_buf,
				 const struct fetch_pack_args *args)
{
2018-07-19 03:20:27 +08:00
	if (is_repository_shallow(the_repository))
2018-03-16 01:31:29 +08:00
		write_shallow_commits(req_buf, 1, NULL);
	if (args->depth > 0)
		packet_buf_write(req_buf, "deepen %d", args->depth);
	if (args->deepen_since) {
		timestamp_t max_age = approxidate(args->deepen_since);
		packet_buf_write(req_buf, "deepen-since %"PRItime, max_age);
	}
	if (args->deepen_not) {
		int i;
		for (i = 0; i < args->deepen_not->nr; i++) {
			struct string_list_item *s = args->deepen_not->items + i;
			packet_buf_write(req_buf, "deepen-not %s", s->string);
		}
	}
2018-12-19 05:24:35 +08:00
	if (args->deepen_relative)
		packet_buf_write(req_buf, "deepen-relative\n");
2018-03-16 01:31:29 +08:00
}
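Each `packet_buf_write()` call above appends one pkt-line: four lowercase hex digits giving the total length (including those four header bytes), followed by the payload as given. A newline therefore appears only if the caller includes one; compare `"deepen %d"` with `"deepen-relative\n"`. A flush packet is the bare header `0000`, and protocol v2 uses `0001` as a delimiter. Below is a minimal sketch of that framing into a fixed buffer. `pkt_append()` and `pkt_special()` are hypothetical helpers, not git's API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one pkt-line to buf (NUL-terminated, capacity cap). The length
 * prefix counts its own 4 bytes plus the payload and must fit in four hex
 * digits. Returns 0 on success, -1 if it does not fit.
 */
static int pkt_append(char *buf, size_t cap, const char *payload)
{
	size_t used = strlen(buf);
	size_t n = strlen(payload) + 4;

	if (n > 0xffff || used + n + 1 > cap)
		return -1;
	snprintf(buf + used, cap - used, "%04zx%s", n, payload);
	return 0;
}

/* Append a payload-free special packet such as "0000" (flush) or "0001". */
static int pkt_special(char *buf, size_t cap, const char *code)
{
	size_t used = strlen(buf);

	if (strlen(code) != 4 || used + 5 > cap)
		return -1;
	memcpy(buf + used, code, 5); /* 4 characters plus the NUL */
	return 0;
}
```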
2020-08-18 12:01:37 +08:00
static void add_wants(const struct ref *wants, struct strbuf *req_buf)
2018-03-16 01:31:28 +08:00
{
2018-06-28 06:30:23 +08:00
	int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 0);

2018-03-16 01:31:28 +08:00
	for ( ; wants ; wants = wants->next) {
		const struct object_id *remote = &wants->old_oid;
		struct object *o;

		/*
		 * If that object is complete (i.e. it is an ancestor of a
		 * local ref), we tell them we have it but do not have to
		 * tell them about its ancestors, which they already know
		 * about.
		 *
		 * We use lookup_object here because we are only
		 * interested in the case we *know* the object is
		 * reachable and we have already scanned it.
		 */
2020-08-18 12:01:37 +08:00
		if (((o = lookup_object(the_repository, remote)) != NULL) &&
2018-03-16 01:31:28 +08:00
		    (o->flags & COMPLETE)) {
			continue;
		}

2018-06-28 06:30:23 +08:00
		if (!use_ref_in_want || wants->exact_oid)
			packet_buf_write(req_buf, "want %s\n", oid_to_hex(remote));
		else
			packet_buf_write(req_buf, "want-ref %s\n", wants->name);
2018-03-16 01:31:28 +08:00
	}
}
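`add_wants()` skips objects already marked `COMPLETE` locally. When the server supports `ref-in-want`, it requests refs by name unless an exact object ID was asked for. Below is a sketch of that per-entry decision. `struct want_entry`, its `complete` flag (standing in for the `lookup_object()` check), and the shortened object IDs in the test are all hypothetical. Pkt-line framing is omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one entry in the `wants` list. */
struct want_entry {
	const char *name;    /* e.g. "refs/heads/main" */
	const char *oid_hex; /* object ID in hex */
	int exact_oid;       /* requested by object ID rather than by name */
	int complete;        /* already reachable from a local ref */
};

/*
 * Format the request line for one want into out (no pkt-line header).
 * Returns 0 if nothing needs to be requested for this entry.
 */
static int format_want(const struct want_entry *w, int use_ref_in_want,
		       char *out, size_t cap)
{
	if (w->complete)
		return 0;
	if (!use_ref_in_want || w->exact_oid)
		snprintf(out, cap, "want %s\n", w->oid_hex);
	else
		snprintf(out, cap, "want-ref %s\n", w->name);
	return 1;
}
```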
static void add_common(struct strbuf *req_buf, struct oidset *common)
{
	struct oidset_iter iter;
	const struct object_id *oid;
	oidset_iter_init(common, &iter);

	while ((oid = oidset_iter_next(&iter))) {
		packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
	}
}

2018-06-15 06:54:28 +08:00
static int add_haves(struct fetch_negotiator *negotiator,
		     struct strbuf *req_buf,
2021-04-09 09:10:00 +08:00
		     int *haves_to_send)
2018-03-16 01:31:28 +08:00
{
	int haves_added = 0;
	const struct object_id *oid;

2018-06-15 06:54:28 +08:00
	while ((oid = negotiator->next(negotiator))) {
2018-03-16 01:31:28 +08:00
		packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
		if (++haves_added >= *haves_to_send)
			break;
	}

	/* Increase haves to send on next round */
	*haves_to_send = next_flush(1, *haves_to_send);

2021-04-09 09:10:00 +08:00
	return haves_added;
2018-03-16 01:31:28 +08:00
}
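`add_haves()` sends at most `*haves_to_send` haves per round and then enlarges that budget through `next_flush()`, so early rounds stay short while long negotiations take bigger steps. Below is a sketch that plans the per-round counts for a given number of haves. `plan_have_rounds()` is hypothetical. It simply doubles the batch each round, which is an assumption rather than `next_flush()`'s actual rule; that function is not shown in this excerpt.

```c
#include <stddef.h>

/*
 * Simulate add_haves()-style batching: each round sends at most `batch`
 * haves, then the batch grows (here it doubles, an assumption). Stores the
 * per-round counts in rounds[] and returns the number of rounds used.
 */
static size_t plan_have_rounds(size_t total_haves, int initial_batch,
			       int *rounds, size_t max_rounds)
{
	size_t remaining = total_haves, nr = 0;
	int batch = initial_batch;

	while (remaining > 0 && nr < max_rounds) {
		int sent = remaining < (size_t)batch ? (int)remaining : batch;

		rounds[nr++] = sent;
		remaining -= (size_t)sent;
		batch *= 2; /* assumption: simple doubling */
	}
	return nr;
}
```

With 100 haves and an initial batch of 16, this plans rounds of 16, 32 and 52.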
2021-04-09 09:10:01 +08:00
static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
						 const struct string_list *server_options)
2018-03-16 01:31:28 +08:00
{
2020-05-26 03:59:07 +08:00
	const char *hash_name;
2018-03-16 01:31:28 +08:00

2022-12-13 18:52:58 +08:00
	ensure_server_supports_v2("fetch");
	packet_buf_write(req_buf, "command=fetch");
	if (server_supports_v2("agent"))
2021-04-09 09:10:01 +08:00
		packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
2022-12-13 18:52:58 +08:00
	if (advertise_sid && server_supports_v2("session-id"))
2021-04-09 09:10:01 +08:00
		packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
2022-12-13 18:52:58 +08:00
	if (server_options && server_options->nr) {
2018-04-24 06:46:24 +08:00
		int i;
2022-12-13 18:52:58 +08:00
		ensure_server_supports_v2("server-option");
2021-04-09 09:10:01 +08:00
		for (i = 0; i < server_options->nr; i++)
			packet_buf_write(req_buf, "server-option=%s",
					 server_options->items[i].string);
2018-04-24 06:46:24 +08:00
	}
2018-03-16 01:31:28 +08:00

2020-05-26 03:59:07 +08:00
	if (server_feature_v2("object-format", &hash_name)) {
		int hash_algo = hash_algo_by_name(hash_name);
		if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
			die(_("mismatched algorithms: client %s; server %s"),
			    the_hash_algo->name, hash_name);
2021-04-09 09:10:01 +08:00
		packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
2020-05-26 03:59:07 +08:00
	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
		die(_("the server does not support algorithm '%s'"),
		    the_hash_algo->name);
	}
2021-04-09 09:10:01 +08:00
	packet_buf_delim(req_buf);
}
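`write_fetch_command_and_capabilities()` only sends a request if both sides agree on the object format. A server that advertises `object-format` must name the client's algorithm, and a server that advertises nothing is accepted only by a SHA-1 client. Below is a sketch of that decision on plain algorithm names. `check_object_format()` and its return codes are hypothetical. Git compares algorithm IDs rather than names, and it calls `die()` where this sketch returns -1.

```c
#include <string.h>

/*
 * client_algo: the local repository's algorithm name ("sha1" or "sha256").
 * server_algo: the server's object-format value, or NULL if not advertised.
 * Returns 1 if an "object-format=<client_algo>" line should be sent,
 * 0 if no such line is needed, and -1 on a mismatch (git dies there).
 */
static int check_object_format(const char *client_algo, const char *server_algo)
{
	if (server_algo)
		return strcmp(client_algo, server_algo) ? -1 : 1;
	/* No advertisement: only a SHA-1 client can proceed. */
	return strcmp(client_algo, "sha1") ? -1 : 0;
}
```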
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
			      struct fetch_pack_args *args,
			      const struct ref *wants, struct oidset *common,
			      int *haves_to_send, int *in_vain,
			      int sideband_all, int seen_ack)
{
	int haves_added;
	int done_sent = 0;
	struct strbuf req_buf = STRBUF_INIT;

	write_fetch_command_and_capabilities(&req_buf, args->server_options);
2020-05-26 03:59:07 +08:00

2018-03-16 01:31:28 +08:00
	if (args->use_thin_pack)
		packet_buf_write(&req_buf, "thin-pack");
	if (args->no_progress)
		packet_buf_write(&req_buf, "no-progress");
	if (args->include_tag)
		packet_buf_write(&req_buf, "include-tag");
	if (prefer_ofs_delta)
		packet_buf_write(&req_buf, "ofs-delta");
2019-01-17 03:28:14 +08:00
	if (sideband_all)
		packet_buf_write(&req_buf, "sideband-all");
2018-03-16 01:31:28 +08:00

2018-03-16 01:31:29 +08:00
	/* Add shallow-info and deepen request */
	if (server_supports_feature("fetch", "shallow", 0))
		add_shallow_requests(&req_buf, args);
2018-07-19 03:20:27 +08:00
	else if (is_repository_shallow(the_repository) || args->deepen)
2018-03-16 01:31:29 +08:00
		die(_("Server does not support shallow requests"));

2018-05-04 07:46:56 +08:00
	/* Add filter */
2022-07-27 00:27:11 +08:00
	send_filter(args, &req_buf,
		    server_supports_feature("fetch", "filter", 0));
2018-05-04 07:46:56 +08:00

2020-06-11 04:57:23 +08:00
	if (server_supports_feature("fetch", "packfile-uris", 0)) {
		int i;
		struct strbuf to_send = STRBUF_INIT;

		for (i = 0; i < uri_protocols.nr; i++) {
			const char *s = uri_protocols.items[i].string;

			if (!strcmp(s, "https") || !strcmp(s, "http")) {
				if (to_send.len)
					strbuf_addch(&to_send, ',');
				strbuf_addstr(&to_send, s);
			}
		}
		if (to_send.len) {
			packet_buf_write(&req_buf, "packfile-uris %s",
					 to_send.buf);
			strbuf_release(&to_send);
		}
	}

2018-03-16 01:31:28 +08:00
	/* add wants */
2020-08-18 12:01:37 +08:00
	add_wants(wants, &req_buf);
2018-03-16 01:31:28 +08:00

2020-08-18 12:01:37 +08:00
	/* Add all of the common commits we've found in previous rounds */
	add_common(&req_buf, common);
2018-03-16 01:31:28 +08:00

2021-04-09 09:10:00 +08:00
	haves_added = add_haves(negotiator, &req_buf, haves_to_send);
	*in_vain += haves_added;
fetch-pack: add tracing for negotiation rounds
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotiation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03 06:04:05 +08:00
	trace2_data_intmax("negotiation_v2", the_repository, "haves_added", haves_added);
	trace2_data_intmax("negotiation_v2", the_repository, "in_vain", *in_vain);
2021-04-09 09:10:00 +08:00
	if (!haves_added || (seen_ack && *in_vain >= MAX_IN_VAIN)) {
		/* Send Done */
		packet_buf_write(&req_buf, "done\n");
		done_sent = 1;
	}
2018-03-16 01:31:28 +08:00

	/* Send request */
	packet_buf_flush(&req_buf);
2019-03-05 12:11:39 +08:00
	if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
		die_errno(_("unable to write request to remote"));
2018-03-16 01:31:28 +08:00

	strbuf_release(&req_buf);
2021-04-09 09:10:00 +08:00
	return done_sent;
2018-03-16 01:31:28 +08:00
}
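`send_fetch_request()` ends the round with `done` in two cases: the negotiator produced no new haves, or at least one ACK has been seen and `*in_vain` has reached `MAX_IN_VAIN`. Below is a sketch of that predicate on its own. The limit is passed in because `MAX_IN_VAIN` is defined outside this excerpt, and the value 256 in the test is only an example.

```c
/*
 * Decide whether this round's request should end with "done".
 * haves_added: haves written in this round
 * seen_ack:    whether any ACK has been received so far
 * in_vain:     running count of unproductive haves, as kept by the caller
 * max_in_vain: give-up threshold (MAX_IN_VAIN in fetch-pack.c)
 */
static int should_send_done(int haves_added, int seen_ack, int in_vain,
			    int max_in_vain)
{
	return !haves_added || (seen_ack && in_vain >= max_in_vain);
}
```

The `seen_ack` guard means a client that has not yet found any common commit keeps offering haves, however many went unanswered.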
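When the server supports `packfile-uris`, `send_fetch_request()` advertises only those configured protocols that are `http` or `https`, joined with commas. Below is a sketch of that filter writing into a fixed buffer. `join_uri_protocols()` is hypothetical and replaces git's `strbuf` and `uri_protocols` list.

```c
#include <string.h>

/*
 * Join the "http"/"https" entries of protos[0..n) with commas into out.
 * Other protocols are skipped. Returns the length written, or -1 if out
 * is too small.
 */
static int join_uri_protocols(const char *const *protos, size_t n,
			      char *out, size_t cap)
{
	size_t len = 0, i;

	if (cap == 0)
		return -1;
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		const char *s = protos[i];
		size_t need;

		if (strcmp(s, "https") && strcmp(s, "http"))
			continue;
		need = strlen(s) + (len ? 1 : 0);
		if (len + need + 1 > cap)
			return -1;
		if (len)
			out[len++] = ',';
		strcpy(out + len, s);
		len += strlen(s);
	}
	return (int)len;
}
```

If nothing qualifies, the result is an empty string. That corresponds to the `to_send.len == 0` case, where no `packfile-uris` line is sent at all.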
/*
 * Processes a section header in a server's response and checks if it matches
 * `section`. If the value of `peek` is 1, the header line will be peeked (and
 * not consumed); if 0, the line will be consumed and the function will die if
 * the section header doesn't match what was expected.
 */
static int process_section_header(struct packet_reader *reader,
				  const char *section, int peek)
{
fetch-pack: make unexpected peek result non-fatal
When a Git server responds to a fetch request, it may send optional
sections before the packfile section. To handle this, the Git client
calls packet_reader_peek() (see process_section_header()) in order to
see what's next without consuming the line.
However, as implemented, Git errors out whenever what's peeked is not an
ordinary line. This is not only unexpected (here, we only need to know
whether the upcoming line is the section header we want) but causes
errors to include the name of a section header that is irrelevant to the
cause of the error. For example, at $DAYJOB, we have seen "fatal: error
reading section header 'shallow-info'" error messages when none of the
repositories involved are shallow.
Therefore, fix this so that the peek returns 1 if the upcoming line is
the wanted section header and nothing else. Because of this change,
reader->line may now be NULL later in the function, so update the error
message printing code accordingly.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 19:02:20 +08:00
	int ret = 0;
2018-03-16 01:31:28 +08:00

2022-05-16 19:02:20 +08:00
	if (packet_reader_peek(reader) == PACKET_READ_NORMAL &&
	    !strcmp(reader->line, section))
		ret = 1;
2018-03-16 01:31:28 +08:00

	if (!peek) {
fetch-pack: make unexpected peek result non-fatal
2022-05-16 19:02:20 +08:00
		if (!ret) {
			if (reader->line)
				die(_("expected '%s', received '%s'"),
				    section, reader->line);
			else
				die(_("expected '%s'"), section);
		}
2018-03-16 01:31:28 +08:00
		packet_reader_read(reader);
	}

	return ret;
}
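The peek above counts as a match only when the upcoming packet is an ordinary line equal to the wanted header. A flush, a delimiter, or a different header all yield 0. The non-peek path then names the received line only if one exists.

A minimal sketch of that decision follows. It assumes a simplified packet model: `enum sketch_status`, `struct sketch_packet`, and both functions are hypothetical stand-ins, not Git's `struct packet_reader` API. The error text is formatted into a buffer instead of passed to die().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Git's packet read status values. */
enum sketch_status {
	SKETCH_READ_EOF,
	SKETCH_READ_NORMAL,
	SKETCH_READ_DELIM,
	SKETCH_READ_FLUSH,
};

/* Hypothetical model of the packet returned by a peek. */
struct sketch_packet {
	enum sketch_status status;
	const char *line; /* NULL unless status is SKETCH_READ_NORMAL */
};

/* 1 only if the next packet is an ordinary line equal to section. */
static int sketch_section_matches(const struct sketch_packet *pkt,
				  const char *section)
{
	return pkt->status == SKETCH_READ_NORMAL && pkt->line &&
	       !strcmp(pkt->line, section);
}

/* Builds the message for a missing section, naming the line if any. */
static void sketch_section_error(char *buf, size_t size,
				 const struct sketch_packet *pkt,
				 const char *section)
{
	if (pkt->line)
		snprintf(buf, size, "expected '%s', received '%s'",
			 section, pkt->line);
	else
		snprintf(buf, size, "expected '%s'", section);
}
```

With this shape, a flush packet seen while looking for "shallow-info" no longer produces an error that mentions "shallow-info". It simply does not match, and a caller that required the section reports "expected '...'" without a received line.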
2021-04-09 09:09:59 +08:00
static int process_ack(struct fetch_negotiator *negotiator,
		       struct packet_reader *reader,
		       struct object_id *common_oid,
		       int *received_ready)
2018-03-16 01:31:28 +08:00
{
	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
		const char *arg;

		if (!strcmp(reader->line, "NAK"))
			continue;

		if (skip_prefix(reader->line, "ACK ", &arg)) {
2021-04-09 09:09:59 +08:00
			if (!get_oid_hex(arg, common_oid)) {
2018-03-16 01:31:28 +08:00
				struct commit *commit;
2021-04-09 09:09:59 +08:00
				commit = lookup_commit(the_repository, common_oid);
promisor-remote: remove fetch_if_missing=0
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08)
strove to remove the need for fetch_if_missing=0 from the fetching
mechanism, so it is plausible to attempt removing fetch_if_missing=0
from the lazy-fetching mechanism in promisor-remote as well.
But doing so reveals a bug - when the server does not send an object
pointed to by a tag object, an infinite loop occurs: Git attempts to
fetch the missing object, which causes a deferencing of all refs (for
negotiation), which causes a lazy fetch of that missing object, and so
on. This bug is because of unnecessary use of the fetch negotiator
during lazy fetching - it is not used after initialization, but it is
still initialized (which causes the dereferencing of all refs).
Thus, when the negotiator is not used during fetching, refrain from
initializing it. Then, remove fetch_if_missing from promisor-remote.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 08:34:20 +08:00
				if (negotiator)
					negotiator->ack(negotiator, commit);
2018-03-16 01:31:28 +08:00
			}
2021-04-09 09:09:59 +08:00
			return 1;
2018-03-16 01:31:28 +08:00
		}

		if (!strcmp(reader->line, "ready")) {
2021-04-09 09:09:59 +08:00
			*received_ready = 1;
2018-03-16 01:31:28 +08:00
			continue;
		}

2018-07-24 01:56:35 +08:00
		die(_("unexpected acknowledgment line: '%s'"), reader->line);
2018-03-16 01:31:28 +08:00
	}

	if (reader->status != PACKET_READ_FLUSH &&
	    reader->status != PACKET_READ_DELIM)
2018-07-24 01:56:35 +08:00
		die(_("error processing acks: %d"), reader->status);
2018-03-16 01:31:28 +08:00
fetch-pack: be more precise in parsing v2 response
Each section in a protocol v2 response is followed by either a DELIM
packet (indicating more sections to follow) or a FLUSH packet
(indicating none to follow). But when parsing the "acknowledgments"
section, do_fetch_pack_v2() is liberal in accepting both, but determines
whether to continue reading or not based solely on the contents of the
"acknowledgments" section, not on whether DELIM or FLUSH was read.
There is no issue with a protocol-compliant server, but can result in
confusing error messages when communicating with a server that
serves unexpected additional sections. Consider a server that sends
"new-section" after "acknowledgments":
- client writes request
- client reads the "acknowledgments" section which contains no "ready",
then DELIM
- since there was no "ready", client needs to continue negotiation, and
writes request
- client reads "new-section", and reports to the end user "expected
'acknowledgments', received 'new-section'"
For the person debugging the involved Git implementation(s), the error
message is confusing in that "new-section" was not received in response
to the latest request, but to the first one.
One solution is to always continue reading after DELIM, but in this
case, we can do better. We know from the protocol that "ready" means at
least the packfile section is coming (hence, DELIM) and that no "ready"
means that no sections are to follow (hence, FLUSH). So teach
process_acks() to enforce this.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-20 06:54:04 +08:00
	/*
	 * If an "acknowledgments" section is sent, a packfile is sent if and
	 * only if "ready" was sent in this section. The other sections
	 * ("shallow-info" and "wanted-refs") are sent only if a packfile is
	 * sent. Therefore, a DELIM is expected if "ready" is sent, and a FLUSH
	 * otherwise.
	 */
2021-04-09 09:09:59 +08:00
	if (*received_ready && reader->status != PACKET_READ_DELIM)
2021-12-22 15:58:06 +08:00
		/*
		 * TRANSLATORS: The parameter will be 'ready', a protocol
		 * keyword.
		 */
		die(_("expected packfile to be sent after '%s'"), "ready");
2021-04-09 09:09:59 +08:00
	if (!*received_ready && reader->status != PACKET_READ_FLUSH)
2021-12-22 15:58:06 +08:00
		/*
		 * TRANSLATORS: The parameter will be 'ready', a protocol
		 * keyword.
		 */
		die(_("expected no other sections to be sent after no '%s'"), "ready");
fetch-pack: be more precise in parsing v2 response
2018-10-20 06:54:04 +08:00
2021-04-09 09:09:59 +08:00
	return 0;
2018-03-16 01:31:28 +08:00
}
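process_ack() recognizes three kinds of acknowledgment lines: "NAK", "ACK <oid>", and "ready". Once the section ends, it enforces the rule from the comment above: a DELIM must follow if "ready" was seen, and a FLUSH otherwise.

A minimal sketch of both checks follows. It assumes 40-character hex object IDs (SHA-1), and all names are hypothetical. It returns values and error strings instead of calling die(), so the logic can be exercised on its own.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1 object IDs */

enum sketch_ack_kind {
	SKETCH_ACK_NAK,     /* ignored; keep reading */
	SKETCH_ACK_COMMON,  /* "ACK " followed by a well-formed hex oid */
	SKETCH_ACK_BAD_OID, /* "ACK " followed by something else */
	SKETCH_ACK_READY,   /* the server will send a packfile */
	SKETCH_ACK_UNKNOWN, /* process_ack() would die() here */
};

enum sketch_terminator { SKETCH_TERM_FLUSH, SKETCH_TERM_DELIM };

/* 1 if s starts with SKETCH_HEXSZ hex digits (stops early at NUL). */
static int sketch_is_hex_oid(const char *s)
{
	size_t i;

	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

static enum sketch_ack_kind sketch_classify_ack(const char *line)
{
	if (!strcmp(line, "NAK"))
		return SKETCH_ACK_NAK;
	if (!strncmp(line, "ACK ", 4))
		return sketch_is_hex_oid(line + 4) ? SKETCH_ACK_COMMON
						   : SKETCH_ACK_BAD_OID;
	if (!strcmp(line, "ready"))
		return SKETCH_ACK_READY;
	return SKETCH_ACK_UNKNOWN;
}

/* NULL if the section terminator agrees with whether "ready" was seen. */
static const char *sketch_check_terminator(int received_ready,
					   enum sketch_terminator term)
{
	if (received_ready && term != SKETCH_TERM_DELIM)
		return "expected packfile to be sent after 'ready'";
	if (!received_ready && term != SKETCH_TERM_FLUSH)
		return "expected no other sections to be sent after no 'ready'";
	return NULL;
}
```

In process_ack() itself, any "ACK " line makes the function return 1, even when the object ID fails to parse. Only a well-formed ID is looked up and passed to the negotiator.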
2018-03-16 01:31:29 +08:00
static void receive_shallow_info(struct fetch_pack_args *args,
fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.
Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.
This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-27 03:31:21 +08:00
				 struct packet_reader *reader,
				 struct oid_array *shallows,
				 struct shallow_info *si)
2018-03-16 01:31:29 +08:00
{
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
	int unshallow_received = 0;
fetch-pack: do not take shallow lock unnecessarily
When fetching using protocol v2, the remote may send a "shallow-info"
section if the client is shallow. If so, Git as the client currently
takes the shallow file lock, even if the "shallow-info" section is
empty.
This is not a problem except that Git does not support taking the
shallow file lock after modifying the shallow file, because
is_repository_shallow() stores information that is never cleared. And
this take-after-modify occurs when Git does a tag-following fetch from a
shallow repository on a transport that does not support tag following
(since in this case, 2 fetches are performed).
To solve this issue, take the shallow file lock (and perform all other
shallow processing) only if the "shallow-info" section is non-empty;
otherwise, behave as if it were empty.
A full solution (probably, ensuring that any action of committing
shallow file locks also includes clearing the information stored by
is_repository_shallow()) would solve the issue without need for this
patch, but this patch is independently useful (as an optimization to
prevent writing a file in an unnecessary case), hence why I wrote it. I
have included a NEEDSWORK outlining the full solution.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-11 03:36:45 +08:00
2018-03-16 01:31:29 +08:00
	process_section_header(reader, "shallow-info", 0);
	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
		const char *arg;
		struct object_id oid;

		if (skip_prefix(reader->line, "shallow ", &arg)) {
			if (get_oid_hex(arg, &oid))
				die(_("invalid shallow line: %s"), reader->line);
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
			oid_array_append(shallows, &oid);
2018-03-16 01:31:29 +08:00
			continue;
		}
		if (skip_prefix(reader->line, "unshallow ", &arg)) {
			if (get_oid_hex(arg, &oid))
				die(_("invalid unshallow line: %s"), reader->line);
2019-06-20 15:41:14 +08:00
			if (!lookup_object(the_repository, &oid))
2018-03-16 01:31:29 +08:00
				die(_("object not found: %s"), reader->line);
			/* make sure that it is parsed as shallow */
2018-06-29 09:21:51 +08:00
			if (!parse_object(the_repository, &oid))
2018-03-16 01:31:29 +08:00
				die(_("error in object: %s"), reader->line);
			if (unregister_shallow(&oid))
				die(_("no shallow found: %s"), reader->line);
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
			unshallow_received = 1;
2018-03-16 01:31:29 +08:00
			continue;
		}
		die(_("expected shallow/unshallow, got %s"), reader->line);
	}

	if (reader->status != PACKET_READ_FLUSH &&
	    reader->status != PACKET_READ_DELIM)
2018-07-24 01:56:35 +08:00
		die(_("error processing shallow info: %d"), reader->status);
2018-03-16 01:31:29 +08:00
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
	if (args->deepen || unshallow_received) {
		/*
		 * Treat these as shallow lines caused by our depth settings.
		 * In v0, these lines cannot cause refs to be rejected; do the
		 * same.
		 */
		int i;

		for (i = 0; i < shallows->nr; i++)
			register_shallow(the_repository, &shallows->oid[i]);
fetch-pack: do not take shallow lock unnecessarily
2019-01-11 03:36:45 +08:00
		setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
					NULL);
		args->deepen = 1;
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
	} else if (shallows->nr) {
		/*
		 * Treat these as shallow lines caused by the remote being
		 * shallow. In v0, remote refs that reach these objects are
		 * rejected (unless --update-shallow is set); do the same.
		 */
		prepare_shallow_info(si, shallows);
2021-04-01 18:46:59 +08:00
		if (si->nr_ours || si->nr_theirs) {
			if (args->reject_shallow_remote)
				die(_("source repository is shallow, reject to clone."));
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
			alternate_shallow_file =
				setup_temporary_shallow(si->shallow);
2021-04-01 18:46:59 +08:00
		} else
fetch-pack: respect --no-update-shallow in v2
2019-03-27 03:31:21 +08:00
			alternate_shallow_file = NULL;
2019-02-07 07:59:37 +08:00
	} else {
		alternate_shallow_file = NULL;
fetch-pack: do not take shallow lock unnecessarily
2019-01-11 03:36:45 +08:00
	}
2018-03-16 01:31:29 +08:00
}
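The branch at the end of receive_shallow_info() implements the --no-update-shallow heuristic described in the commit message. It resolves to three cases:

- If a depth setting was given, or an "unshallow" line arrived, every "shallow" line is treated as depth-induced and the shallow lock is taken.
- Otherwise, non-empty "shallow" lines are treated as coming from a shallow remote.
- If neither applies, no shallow processing happens at all.

A minimal sketch of just those decisions follows. The enums and functions are hypothetical, `nr_ours`/`nr_theirs` stand in for the `si->nr_ours`/`si->nr_theirs` counts filled in by prepare_shallow_info(), and all lock and file handling is elided.

```c
#include <stddef.h>

/* How the "shallow" lines of one response end up being handled. */
enum sketch_shallow_mode {
	SKETCH_SHALLOW_NONE,        /* no depth, no lines: nothing to do */
	SKETCH_SHALLOW_FROM_DEPTH,  /* register all lines, take the lock */
	SKETCH_SHALLOW_FROM_REMOTE, /* treat as coming from a shallow remote */
};

/* What happens in the "shallow remote" case. */
enum sketch_remote_action {
	SKETCH_REMOTE_NONE,      /* alternate_shallow_file stays NULL */
	SKETCH_REMOTE_TEMPORARY, /* write a temporary shallow file */
	SKETCH_REMOTE_REJECT,    /* args->reject_shallow_remote: die() */
};

static enum sketch_shallow_mode sketch_pick_shallow_mode(int deepen,
							 int unshallow_received,
							 size_t nr_shallows)
{
	if (deepen || unshallow_received)
		return SKETCH_SHALLOW_FROM_DEPTH;
	if (nr_shallows)
		return SKETCH_SHALLOW_FROM_REMOTE;
	return SKETCH_SHALLOW_NONE;
}

static enum sketch_remote_action sketch_pick_remote_action(int reject_shallow_remote,
							   size_t nr_ours,
							   size_t nr_theirs)
{
	if (!nr_ours && !nr_theirs)
		return SKETCH_REMOTE_NONE;
	return reject_shallow_remote ? SKETCH_REMOTE_REJECT
				     : SKETCH_REMOTE_TEMPORARY;
}
```

The SKETCH_SHALLOW_NONE case corresponds to the "do not take shallow lock unnecessarily" change. With no depth setting and an empty section, the shallow file is left untouched.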
2019-03-28 05:11:10 +08:00
static int cmp_name_ref(const void *name, const void *ref)
{
	return strcmp(name, (*(struct ref **)ref)->name);
}
fetch-pack: unify ref in and out param
When a user fetches:
- at least one up-to-date ref and at least one non-up-to-date ref,
- using HTTP with protocol v0 (or something else that uses the fetch
command of a remote helper)
some refs might not be updated after the fetch.
This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow
info in output parameter", 2018-06-28) which allowed transports to
report the refs that they have fetched in a new out-parameter
"fetched_refs". If they do so, transport_fetch_refs() makes this
information available to its caller.
Users of "fetched_refs" rely on the following 3 properties:
(1) it is the complete list of refs that was passed to
transport_fetch_refs(),
(2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if
relevant), and
(3) it has updated OIDs if ref-in-want was used (introduced after
989b8c4452).
In an effort to satisfy (1), whenever transport_fetch_refs()
filters the refs sent to the transport, it re-adds the filtered refs to
whatever the transport supplies before returning it to the user.
However, the implementation in 989b8c4452 unconditionally re-adds the
filtered refs without checking if the transport refrained from reporting
anything in "fetched_refs" (which it is allowed to do), resulting in an
incomplete list, no longer satisfying (1).
An earlier effort to resolve this [1] solved the issue by readding the
filtered refs only if the transport did not refrain from reporting in
"fetched_refs", but after further discussion, it seems that the better
solution is to revert the API change that introduced "fetched_refs".
This API change was first suggested as part of a ref-in-want
implementation that allowed for ref patterns and, thus, there could be
drastic differences between the input refs and the refs actually fetched
[2]; we eventually decided to only allow exact ref names, but this API
change remained even though its necessity was decreased.
Therefore, revert this API change by reverting commit 989b8c4452, and
make receive_wanted_refs() update the OIDs in the sought array (like how
update_shallow() updates shallow information in the sought array)
instead. A test is also included to show that the user-visible bug
discussed at the beginning of this commit message no longer exists.
[1] https://public-inbox.org/git/20180801171806.GA122458@google.com/
[2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 04:13:20 +08:00
static void receive_wanted_refs(struct packet_reader *reader,
				struct ref **sought, int nr_sought)
2018-06-28 06:30:23 +08:00
{
	process_section_header(reader, "wanted-refs", 0);
	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
		struct object_id oid;
		const char *end;
2019-03-28 05:11:10 +08:00
		struct ref **found;
2018-06-28 06:30:23 +08:00

		if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
2018-07-24 01:56:35 +08:00
			die(_("expected wanted-ref, got '%s'"), reader->line);
2018-06-28 06:30:23 +08:00

2019-03-28 05:11:10 +08:00
		found = bsearch(end, sought, nr_sought, sizeof(*sought),
				cmp_name_ref);
		if (!found)
2018-07-24 01:56:35 +08:00
			die(_("unexpected wanted-ref: '%s'"), reader->line);
2019-03-28 05:11:10 +08:00
		oidcpy(&(*found)->old_oid, &oid);
2018-06-28 06:30:23 +08:00
	}

	if (reader->status != PACKET_READ_DELIM)
2018-07-24 01:56:35 +08:00
		die(_("error processing wanted refs: %d"), reader->status);
2018-06-28 06:30:23 +08:00
}
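Each "wanted-refs" line has the form "<oid> <refname>". receive_wanted_refs() looks up the ref name in `sought` with bsearch() and cmp_name_ref(), then writes the object ID into that ref's old_oid. This is how the updated OIDs reach the caller without a separate "fetched_refs" list.

bsearch() only works if the array is sorted by the same key. The sketch below therefore sorts with qsort() first; whether and where Git sorts `sought` before this point is not shown in this excerpt. It is a minimal model under stated assumptions: `struct sketch_ref` is a hypothetical, trimmed-down ref, and object IDs are assumed to be 40-character hex strings.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1 object IDs */

/* Hypothetical, trimmed-down stand-in for Git's struct ref. */
struct sketch_ref {
	const char *name;
	char old_oid[SKETCH_HEXSZ + 1];
};

/* Same shape as cmp_name_ref(): key is a name, element is a ref pointer. */
static int sketch_cmp_name_ref(const void *name, const void *ref)
{
	return strcmp(name, (*(struct sketch_ref *const *)ref)->name);
}

/* qsort() comparator ordering an array of ref pointers by name. */
static int sketch_cmp_ref_by_name(const void *a, const void *b)
{
	const struct sketch_ref *ra = *(struct sketch_ref *const *)a;
	const struct sketch_ref *rb = *(struct sketch_ref *const *)b;

	return strcmp(ra->name, rb->name);
}

/*
 * Handles one "<hex-oid> <refname>" line: returns 0 and records the oid
 * on the matching ref, or -1 if the line is malformed or names a ref that
 * was not sought. sought must already be sorted by name.
 */
static int sketch_apply_wanted_ref(const char *line,
				   struct sketch_ref **sought, size_t nr_sought)
{
	struct sketch_ref **found;
	size_t i;

	for (i = 0; i < SKETCH_HEXSZ; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (line[SKETCH_HEXSZ] != ' ')
		return -1;

	found = bsearch(line + SKETCH_HEXSZ + 1, sought, nr_sought,
			sizeof(*sought), sketch_cmp_name_ref);
	if (!found)
		return -1;
	memcpy((*found)->old_oid, line, SKETCH_HEXSZ);
	(*found)->old_oid[SKETCH_HEXSZ] = '\0';
	return 0;
}
```

Updating the sought entries in place mirrors the approach described in the "unify ref in and out param" message. The caller's own array carries the result, so no refs can be dropped from a separate output list.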
2020-06-11 04:57:23 +08:00
static void receive_packfile_uris(struct packet_reader *reader,
				  struct string_list *uris)
{
	process_section_header(reader, "packfile-uris", 0);
	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
		if (reader->pktlen < the_hash_algo->hexsz ||
		    reader->line[the_hash_algo->hexsz] != ' ')
2024-09-05 16:51:49 +08:00
			die("expected '<hash> <uri>', got: %s", reader->line);
2020-06-11 04:57:23 +08:00

		string_list_append(uris, reader->line);
	}
	if (reader->status != PACKET_READ_DELIM)
		die("expected DELIM");
}
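receive_packfile_uris() only checks that each line begins with a hash-length prefix followed by a space, and then stores the whole line.

The sketch below performs the same check and also splits the line into its two parts. The function name is hypothetical, and the caller is assumed to supply the line length and the hex length of the hash (40 for SHA-1). Like the code above, it does not verify that the prefix is valid hex. It tests `len > hexsz` explicitly before looking at `line[hexsz]`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Splits "<hash> <uri>" where the hash is hexsz characters long.
 * Returns 0 on success, or -1 if the line is malformed or hash_size is
 * too small to hold the hash plus a terminating NUL.
 */
static int sketch_split_packfile_uri(const char *line, size_t len,
				     size_t hexsz, char *hash,
				     size_t hash_size, const char **uri)
{
	if (len <= hexsz || line[hexsz] != ' ' || hash_size <= hexsz)
		return -1;
	memcpy(hash, line, hexsz);
	hash[hexsz] = '\0';
	*uri = line + hexsz + 1;
	return 0;
}
```

For example, with `hexsz` set to 6, the line "abcdef https://example.com/pack" yields the hash "abcdef" and the URI "https://example.com/pack".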
2018-03-16 01:31:28 +08:00
enum fetch_state {
	FETCH_CHECK_LOCAL = 0,
	FETCH_SEND_REQUEST,
	FETCH_PROCESS_ACKS,
	FETCH_GET_PACK,
	FETCH_DONE,
};
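The states above drive the v2 negotiation loop in do_fetch_pack_v2(), whose body lies outside this excerpt. The transitions below are therefore an assumption, loosely based on the commit messages in this file: without "ready", the client sends another request; with "ready", a packfile follows.

The inputs `everything_local` and `sent_done` are hypothetical, and the enum is a renamed local copy so that the sketch compiles on its own.

```c
/* Renamed copy of enum fetch_state so the sketch is self-contained. */
enum sketch_fetch_state {
	SKETCH_CHECK_LOCAL = 0,
	SKETCH_SEND_REQUEST,
	SKETCH_PROCESS_ACKS,
	SKETCH_GET_PACK,
	SKETCH_DONE,
};

/* Hypothetical facts a driver would learn while running each step. */
struct sketch_fetch_inputs {
	int everything_local; /* all wanted objects already present */
	int sent_done;        /* the request just sent ended negotiation */
	int received_ready;   /* the acknowledgments section had "ready" */
};

static enum sketch_fetch_state sketch_next_state(enum sketch_fetch_state s,
						 const struct sketch_fetch_inputs *in)
{
	switch (s) {
	case SKETCH_CHECK_LOCAL:
		return in->everything_local ? SKETCH_DONE : SKETCH_SEND_REQUEST;
	case SKETCH_SEND_REQUEST:
		return in->sent_done ? SKETCH_GET_PACK : SKETCH_PROCESS_ACKS;
	case SKETCH_PROCESS_ACKS:
		/* No "ready" means another round of negotiation. */
		return in->received_ready ? SKETCH_GET_PACK : SKETCH_SEND_REQUEST;
	case SKETCH_GET_PACK:
	case SKETCH_DONE:
	default:
		return SKETCH_DONE;
	}
}
```

Modeling the loop as a pure transition function keeps each step's decision testable apart from the I/O it depends on.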
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
static void do_check_stateless_delimiter(int stateless_rpc,
2020-05-19 18:54:00 +08:00
					 struct packet_reader *reader)
{
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	check_stateless_delimiter(stateless_rpc, reader,
2020-05-19 18:54:00 +08:00
				  _("git fetch-pack: expected response end packet"));
}

2018-03-16 01:31:28 +08:00
static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
				    int fd[2],
				    const struct ref *orig_ref,
				    struct ref **sought, int nr_sought,
fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.
Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.
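The heuristic comes down to two small decisions: how to attribute a v2 "shallow" line, and whether a ref that reaches it is rejected. The enum and function names are hypothetical. This is a sketch of the rule as described above, not the fetch-pack code.

```c
#include <assert.h>
#include <stdbool.h>

enum shallow_origin {
	SHALLOW_FROM_REMOTE_REPO,   /* the remote repository is itself shallow */
	SHALLOW_FROM_CLIENT_DEPTH   /* caused by client-specified depth settings */
};

/* v2 "shallow" lines carry no origin, so guess from our own request. */
static enum shallow_origin classify_v2_shallow_line(bool client_requested_depth)
{
	return client_requested_depth ? SHALLOW_FROM_CLIENT_DEPTH
				      : SHALLOW_FROM_REMOTE_REPO;
}

/*
 * A ref reaching a shallow commit is rejected only when that commit is
 * attributed to the remote being shallow and --update-shallow was not given.
 */
static bool reject_ref_reaching_shallow(enum shallow_origin origin,
					bool update_shallow)
{
	assert(origin == SHALLOW_FROM_REMOTE_REPO ||
	       origin == SHALLOW_FROM_CLIENT_DEPTH);
	return origin == SHALLOW_FROM_REMOTE_REPO && !update_shallow;
}
```

The case this gets wrong is the one named above. When a client fetches with depth settings from a repository that is itself shallow, every line is attributed to SHALLOW_FROM_CLIENT_DEPTH.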
This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-27 03:31:21 +08:00
				    struct oid_array *shallows,
				    struct shallow_info *si,
fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
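The remote-helper limitation in the first note can be sketched as follows. At most one "lock <path>" line is reported, and every further lockfile only produces a warning. The function name and the warning text are hypothetical. The "lock" line itself corresponds to the one the remote helper protocol allows for the "fetch" command.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical reporting step: write one "lock <path>" line for the first
 * lockfile and a warning for each additional one.  Returns the number of
 * warnings written.
 */
static int report_pack_lockfiles(FILE *out, FILE *err,
				 const char *const *lockfiles, size_t nr)
{
	int warnings = 0;

	assert(out != NULL && err != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (i == 0) {
			fprintf(out, "lock %s\n", lockfiles[i]);
		} else {
			fprintf(err, "warning: not reporting extra lockfile '%s'\n",
				lockfiles[i]);
			warnings++;
		}
	}
	return warnings;
}
```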
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 04:57:22 +08:00
				    struct string_list *pack_lockfiles)
2018-03-16 01:31:28 +08:00
{
2019-08-14 02:37:48 +08:00
	struct repository *r = the_repository;
2018-03-16 01:31:28 +08:00
	struct ref *ref = copy_ref_list(orig_ref);
	enum fetch_state state = FETCH_CHECK_LOCAL;
	struct oidset common = OIDSET_INIT;
	struct packet_reader reader;
2019-10-03 07:49:28 +08:00
	int in_vain = 0, negotiation_started = 0;
fetch-pack: add tracing for negotiation rounds
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotiation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
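The per-round bookkeeping can be modelled without trace2. In the v2 code path of this file, the real calls are trace2_region_enter_printf(), trace2_region_leave_printf() and trace2_data_intmax(). The struct and helpers below are hypothetical stand-ins that write plain lines to a FILE * instead. This makes two things easy to see: each round pairs an enter with a leave, and the round count is reported at the end.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-negotiation counters mirroring what gets logged. */
struct negotiation_stats {
	int round;        /* current round number, 1-based once started */
	int total_haves;  /* sum of "haves" sent across all rounds */
};

static void round_begin(struct negotiation_stats *st, FILE *log)
{
	st->round++;
	fprintf(log, "enter round %d\n", st->round);
}

static void round_end(struct negotiation_stats *st, FILE *log,
		      int haves_sent, int in_vain)
{
	assert(st->round > 0);  /* must pair with round_begin() */
	st->total_haves += haves_sent;
	fprintf(log, "leave round %d haves=%d in_vain=%d\n",
		st->round, haves_sent, in_vain);
}

/* Emitted once negotiation completes, like the "total_rounds" datum. */
static void negotiation_done(const struct negotiation_stats *st, FILE *log)
{
	fprintf(log, "total_rounds=%d total_haves=%d\n",
		st->round, st->total_haves);
}
```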
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03 06:04:05 +08:00
	int negotiation_round = 0;
2018-03-16 01:31:28 +08:00
	int haves_to_send = INITIAL_FLUSH;
promisor-remote: remove fetch_if_missing=0
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08)
strove to remove the need for fetch_if_missing=0 from the fetching
mechanism, so it is plausible to attempt removing fetch_if_missing=0
from the lazy-fetching mechanism in promisor-remote as well.
But doing so reveals a bug - when the server does not send an object
pointed to by a tag object, an infinite loop occurs: Git attempts to
fetch the missing object, which causes a dereferencing of all refs (for
negotiation), which causes a lazy fetch of that missing object, and so
on. This bug is because of unnecessary use of the fetch negotiator
during lazy fetching - it is not used after initialization, but it is
still initialized (which causes the dereferencing of all refs).
Thus, when the negotiator is not used during fetching, refrain from
initializing it. Then, remove fetch_if_missing from promisor-remote.
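The fix amounts to initializing the negotiator only on paths that actually negotiate. The sketch below is a hypothetical model. Here negotiator_init() stands in for the expensive setup (dereferencing every ref) and only counts that work. This lets us check that a caller which never negotiates never pays for the setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a negotiator whose setup is expensive. */
struct lazy_negotiator {
	bool initialized;
	int refs_dereferenced;  /* counts the costly work done by init */
};

static void negotiator_init(struct lazy_negotiator *n, int nr_refs)
{
	/* Stands in for dereferencing every ref, which may trigger lazy fetches. */
	n->refs_dereferenced += nr_refs;
	n->initialized = true;
}

/* Initialize on first use, and only when the caller will negotiate. */
static struct lazy_negotiator *get_negotiator(struct lazy_negotiator *n,
					      bool will_negotiate, int nr_refs)
{
	assert(n != NULL);
	if (!will_negotiate)
		return NULL;
	if (!n->initialized)
		negotiator_init(n, nr_refs);
	return n;
}
```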
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 08:34:20 +08:00
	struct fetch_negotiator negotiator_alloc;
	struct fetch_negotiator *negotiator;
2020-04-28 08:01:09 +08:00
	int seen_ack = 0;
2021-04-09 09:09:59 +08:00
	struct object_id common_oid;
	int received_ready = 0;
2020-06-11 04:57:23 +08:00
	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
	int i;
2021-02-23 03:20:08 +08:00
	struct strvec index_pack_args = STRVEC_INIT;
2019-11-13 08:34:20 +08:00

2020-08-18 12:01:37 +08:00
	negotiator = &negotiator_alloc;
2022-03-28 22:02:06 +08:00
	if (args->refetch)
		fetch_negotiator_init_noop(negotiator);
	else
		fetch_negotiator_init(r, negotiator);
2019-11-13 08:34:20 +08:00

2018-03-16 01:31:28 +08:00
	packet_reader_init(&reader, fd[0], NULL, 0,
pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary time. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation-wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
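The sketch below classifies a single pkt-line to illustrate the framing involved. It covers the 4-hex-digit length that counts its own four bytes, the special lengths 0000 (flush), 0001 (delim) and 0002 (response end), and an "ERR " payload. Recognition of "ERR " is gated by a flag. This mirrors the decision to enable error-packet handling only for pack protocol parsing, which the reader setup in this file requests with PACKET_READ_DIE_ON_ERR_PACKET. The names are hypothetical. Unlike Git, which dies on an error packet, this sketch returns PKT_ERR to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum pkt_kind { PKT_FLUSH, PKT_DELIM, PKT_RESPONSE_END, PKT_DATA, PKT_ERR, PKT_INVALID };

static int hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/*
 * Classify the pkt-line at the start of buf.  When handle_err is false,
 * an "ERR " payload is returned as ordinary data.  *payload and
 * *payload_len describe the bytes after the header (or after "ERR ").
 */
static enum pkt_kind parse_pkt(const char *buf, size_t avail, bool handle_err,
			       const char **payload, size_t *payload_len)
{
	int len = 0;

	assert(buf != NULL && payload != NULL && payload_len != NULL);
	if (avail < 4)
		return PKT_INVALID;
	for (int i = 0; i < 4; i++) {
		int v = hexval(buf[i]);
		if (v < 0)
			return PKT_INVALID;
		len = len * 16 + v;
	}
	*payload = buf + 4;
	*payload_len = 0;
	if (len == 0) return PKT_FLUSH;
	if (len == 1) return PKT_DELIM;
	if (len == 2) return PKT_RESPONSE_END;
	if (len < 4 || (size_t)len > avail)
		return PKT_INVALID;
	*payload_len = (size_t)len - 4;
	if (handle_err && *payload_len >= 4 && !memcmp(*payload, "ERR ", 4)) {
		*payload += 4;
		*payload_len -= 4;
		return PKT_ERR;
	}
	return PKT_DATA;
}
```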
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-30 05:19:15 +08:00
			   PACKET_READ_CHOMP_NEWLINE |
			   PACKET_READ_DIE_ON_ERR_PACKET);
2019-01-17 03:28:15 +08:00
	if (git_env_bool("GIT_TEST_SIDEBAND_ALL", 1) &&
	    server_supports_feature("fetch", "sideband-all", 0)) {
2019-01-17 03:28:14 +08:00
		reader.use_sideband = 1;
		reader.me = "fetch-pack";
	}
2018-03-16 01:31:28 +08:00

	while (state != FETCH_DONE) {
		switch (state) {
		case FETCH_CHECK_LOCAL:
			sort_ref_list(&ref, ref_compare_name);
			QSORT(sought, nr_sought, cmp_ref_by_name);

			/* v2 supports these by default */
			allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
			use_sideband = 2;
2018-03-16 01:31:29 +08:00
			if (args->depth > 0 || args->deepen_since || args->deepen_not)
				args->deepen = 1;
2018-03-16 01:31:28 +08:00

			/* Filter 'ref' by 'sought' and those that aren't local */
2020-08-18 12:01:37 +08:00
			mark_complete_and_common_ref(negotiator, args, &ref);
			filter_refs(args, &ref, sought, nr_sought);
2022-03-28 22:02:06 +08:00
			if (!args->refetch && everything_local(args, &ref))
2020-08-18 12:01:37 +08:00
				state = FETCH_DONE;
			else
2018-03-16 01:31:28 +08:00
				state = FETCH_SEND_REQUEST;
2020-08-18 12:01:37 +08:00

			mark_tips(negotiator, args->negotiation_tips);
			for_each_cached_alternate(negotiator,
						  insert_one_alternate_object);
2018-03-16 01:31:28 +08:00
			break;
		case FETCH_SEND_REQUEST:
2019-10-03 07:49:28 +08:00
			if (!negotiation_started) {
				negotiation_started = 1;
				trace2_region_enter("fetch-pack",
						    "negotiation_v2",
						    the_repository);
			}
2022-08-03 06:04:05 +08:00
			negotiation_round++;
			trace2_region_enter_printf("negotiation_v2", "round",
						   the_repository, "%d",
						   negotiation_round);
2019-11-13 08:34:20 +08:00
			if (send_fetch_request(negotiator, fd[1], args, ref,
2018-06-15 06:54:28 +08:00
					       &common,
2019-01-17 03:28:14 +08:00
					       &haves_to_send, &in_vain,
2020-04-28 08:01:09 +08:00
					       reader.use_sideband,
2022-08-03 06:04:05 +08:00
					       seen_ack)) {
				trace2_region_leave_printf("negotiation_v2", "round",
							   the_repository, "%d",
							   negotiation_round);
2018-03-16 01:31:28 +08:00
				state = FETCH_GET_PACK;
2022-08-03 06:04:05 +08:00
			}
2018-03-16 01:31:28 +08:00
			else
				state = FETCH_PROCESS_ACKS;
			break;
		case FETCH_PROCESS_ACKS:
			/* Process ACKs/NAKs */
2021-04-09 09:09:59 +08:00
			process_section_header(&reader, "acknowledgments", 0);
			while (process_ack(negotiator, &reader, &common_oid,
					   &received_ready)) {
				in_vain = 0;
				seen_ack = 1;
				oidset_insert(&common, &common_oid);
			}
2022-08-03 06:04:05 +08:00
			trace2_region_leave_printf("negotiation_v2", "round",
						   the_repository, "%d",
						   negotiation_round);
2021-04-09 09:09:59 +08:00
			if (received_ready) {
2020-05-19 18:54:00 +08:00
				/*
				 * Don't check for response delimiter; get_pack() will
				 * read the rest of this response.
				 */
2018-03-16 01:31:28 +08:00
				state = FETCH_GET_PACK;
2021-04-09 09:09:59 +08:00
			} else {
2021-05-05 05:16:01 +08:00
				do_check_stateless_delimiter(args->stateless_rpc, &reader);
2018-03-16 01:31:28 +08:00
				state = FETCH_SEND_REQUEST;
			}
			break;
		case FETCH_GET_PACK:
2019-10-03 07:49:28 +08:00
			trace2_region_leave("fetch-pack",
					    "negotiation_v2",
					    the_repository);
2022-08-03 06:04:05 +08:00
			trace2_data_intmax("negotiation_v2", the_repository,
					   "total_rounds", negotiation_round);
2018-03-16 01:31:29 +08:00
			/* Check for shallow-info section */
			if (process_section_header(&reader, "shallow-info", 1))
2019-03-27 03:31:21 +08:00
				receive_shallow_info(args, &reader, shallows, si);
2018-03-16 01:31:29 +08:00

2018-06-28 06:30:23 +08:00
			if (process_section_header(&reader, "wanted-refs", 1))
fetch-pack: unify ref in and out param
When a user fetches:
- at least one up-to-date ref and at least one non-up-to-date ref,
- using HTTP with protocol v0 (or something else that uses the fetch
command of a remote helper)
some refs might not be updated after the fetch.
This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow
info in output parameter", 2018-06-28) which allowed transports to
report the refs that they have fetched in a new out-parameter
"fetched_refs". If they do so, transport_fetch_refs() makes this
information available to its caller.
Users of "fetched_refs" rely on the following 3 properties:
(1) it is the complete list of refs that was passed to
transport_fetch_refs(),
(2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if
relevant), and
(3) it has updated OIDs if ref-in-want was used (introduced after
989b8c4452).
In an effort to satisfy (1), whenever transport_fetch_refs()
filters the refs sent to the transport, it re-adds the filtered refs to
whatever the transport supplies before returning it to the user.
However, the implementation in 989b8c4452 unconditionally re-adds the
filtered refs without checking if the transport refrained from reporting
anything in "fetched_refs" (which it is allowed to do), resulting in an
incomplete list, no longer satisfying (1).
An earlier effort to resolve this [1] solved the issue by re-adding the
filtered refs only if the transport did not refrain from reporting in
"fetched_refs", but after further discussion, it seems that the better
solution is to revert the API change that introduced "fetched_refs".
This API change was first suggested as part of a ref-in-want
implementation that allowed for ref patterns and, thus, there could be
drastic differences between the input refs and the refs actually fetched
[2]; we eventually decided to only allow exact ref names, but this API
change remained even though its necessity was decreased.
Therefore, revert this API change by reverting commit 989b8c4452, and
make receive_wanted_refs() update the OIDs in the sought array (like how
update_shallow() updates shallow information in the sought array)
instead. A test is also included to show that the user-visible bug
discussed at the beginning of this commit message no longer exists.
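Here is a simplified model of that in-place update. Each "wanted-refs" line has the form "<oid> <refname>", and the OID is copied into the matching entry of the sought array. The struct, the fixed 40-character SHA-1 hex length and the -1 error convention are assumptions of this sketch; it is not the real receive_wanted_refs().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEXSZ 40  /* SHA-1 hex length assumed; SHA-256 would need 64 */

/* Hypothetical, simplified stand-in for struct ref. */
struct sought_ref {
	const char *name;
	char old_oid[HEXSZ + 1];
};

/*
 * Apply one "<hex-oid> <refname>" line to the matching sought entry in
 * place.  Returns 0 on success, -1 if the line is malformed or names a
 * ref that was not asked for.
 */
static int apply_wanted_ref_line(const char *line,
				 struct sought_ref *sought, size_t nr_sought)
{
	const char *name;

	assert(line != NULL);
	if (strlen(line) < HEXSZ + 2 || line[HEXSZ] != ' ')
		return -1;
	name = line + HEXSZ + 1;
	for (size_t i = 0; i < nr_sought; i++) {
		if (!strcmp(sought[i].name, name)) {
			memcpy(sought[i].old_oid, line, HEXSZ);
			sought[i].old_oid[HEXSZ] = '\0';
			return 0;
		}
	}
	return -1;
}
```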
[1] https://public-inbox.org/git/20180801171806.GA122458@google.com/
[2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 04:13:20 +08:00
				receive_wanted_refs(&reader, sought, nr_sought);
2018-06-28 06:30:23 +08:00

2020-06-11 04:57:23 +08:00
			/* get the pack(s) */
2021-11-11 07:51:28 +08:00
			if (git_env_bool("GIT_TRACE_REDACT", 1))
				reader.options |= PACKET_READ_REDACT_URI_PATH;
2020-06-11 04:57:23 +08:00
			if (process_section_header(&reader, "packfile-uris", 1))
				receive_packfile_uris(&reader, &packfile_uris);
2021-11-11 07:51:28 +08:00
			/* We don't expect more URIs. Reset to avoid expensive URI check. */
			reader.options &= ~PACKET_READ_REDACT_URI_PATH;

2018-03-16 01:31:28 +08:00
			process_section_header(&reader, "packfile", 0);
fetch-pack: signal v2 server that we are done making requests
When fetching with the v0 protocol over ssh (or a local upload-pack with
pipes), the server closes the connection as soon as it is finished
sending the pack. So even though the client may still be operating on
the data via index-pack (e.g., resolving deltas, checking connectivity,
etc), the server has released all resources.
With the v2 protocol, however, the server considers the ssh session only
as a transport, with individual requests coming over it. After sending
the pack, it goes back to its main loop, waiting for another request to
come from the client. As a result, the ssh session hangs around until
the client process ends, which may be much later (because resolving
deltas, etc, may consume a lot of CPU).
This is bad for two reasons:
- it's consuming resources on the server to leave open a connection
that won't see any more use
- if something bad happens to the ssh connection in the meantime (say,
it gets killed by the network because it's idle, as happened in a
real-world report), then ssh will exit non-zero, and we'll propagate
the error up the stack.
The server is correct here not to hang up after serving the pack. The v2
protocol's design is meant to allow multiple requests like this, and
hanging up would be the wrong thing for a hypothetical client which was
planning to make more requests (though in practice, the git.git client
never would, and I doubt any other implementations would either).
The right thing is instead for the client to signal to the server that
it's not interested in making more requests. We can do that by closing
the pipe descriptor we use to write to ssh. This will propagate to the
server upload-pack as an EOF when it tries to read the next request (and
then it will close its half, and the whole connection will go away).
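The server side can be modelled as a loop that keeps reading requests until read() returns 0. The function name and the newline-terminated "requests" are hypothetical simplifications. The point is that closing the client's write end is all a loop like this needs in order to see EOF and finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical server loop: count newline-terminated requests until the
 * client closes its write end, which read() reports as EOF (return 0).
 * Returns the number of requests seen, or -1 on a read error.
 */
static int serve_until_eof(int in_fd)
{
	char buf[256];
	int requests = 0;
	ssize_t n;

	assert(in_fd >= 0);
	while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++)
			if (buf[i] == '\n')
				requests++;
	}
	return n < 0 ? -1 : requests;
}
```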
It's important to do this "half duplex" shutdown, because we have to do
it _before_ we actually receive the pack. This is an artifact of the way
fetch-pack and index-pack (or unpack-objects) interact. We hand the
connection off to index-pack (really, a sideband demuxer which feeds
it), and then wait until it returns. And it doesn't do that until it has
resolved all of the deltas in the pack, even though it was done reading
from the server long before.
So just closing the connection fully after index-pack returns would be
too late; we'd have held it open much longer than was necessary. And
teaching index-pack to close the connection is awkward. It's not even
seeing the whole conversation (the sideband demuxer is, but it doesn't
actually know what's in the packets, or when the end comes).
Note that this close() is happening deep within the transport code. It's
possible that a caller would want to perform other operations over the
same ssh transport after receiving the pack. But as of the current code,
none of the callers do, and there haven't been discussions of any plans
to change this. If we need to support that later, we can probably do so
by passing down a flag for "you're the last request on the transport;
it's OK to close" instead of the code just assuming that's true.
The description above all discusses v2 ssh, so it's worth thinking about
how this interacts with other protocols:
  - in v0 protocols, we could do the same half-duplex shutdown (it just
    goes into the v0 do_fetch_pack() instead). This does work, but since
    it doesn't have the same persistence problem in the first place,
    there's little reason to change it at this point.
  - local fetches against git-upload-pack on the same machine will
    behave the same as ssh (they are talking over two pipes, and see EOF
    on their input pipe).
  - fetches against git-daemon will run this same code, and close one of
    the descriptors. In practice, this won't do anything, since in that
    case our two descriptors are dups of each other, and not part of a
    half-duplex pair. The right thing would probably be to call
    shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't
    face the same error-code problem (since it's just a TCP connection),
    so it's really only an optimization problem. And git:// is not that
    widely used these days, and has less impact on server resources than
    an ssh termination.
  - v2 http doesn't suffer from this problem in the first place, as our
    pipes terminate at a local git-remote-https, which is passing data
    along as individual requests via curl. Probably curl is keeping the
    TCP/TLS connection open for more requests, and we might be able to
    tell it manually "hey, we are done making requests now". But I think
    that's much less important. It again doesn't suffer from the
    error-code problem, and HTTP keepalive is pretty well understood
    (importantly, the timeouts can be set low, because clients like curl
    know how to reconnect for subsequent requests if necessary). So it's
    probably not worth figuring out how to tell curl that we're done
    (though if we do, this patch is probably the first step anyway;
    fetch-pack closes the pipe back to remote-https, which would be the
    signal that it should tell curl we're done).
The code is pretty straightforward. We close the pipe at the right
moment, and set it to -1 to mark it as invalid. I modified the later
cleanup code to avoid calling close(-1). That's not strictly necessary,
since close(-1) is a noop, but hopefully makes things a bit more obvious
to a reader.
I suspect that trying to call more transport functions after the close()
(e.g., calling transport_fetch_refs() again) would fail, as it's not
smart enough to realize we need to re-open the ssh connection. But
that's already true when v0 is in use. And no current callers want to do
that (and again, the solution is probably a flag in the transport code
to keep things open, which can be added later).
There's no test here, as the situation it covers is inherently racy (the
question is when upload-pack exits, compared to when index-pack finishes
resolving deltas and exits). The rather gross shell snippet below does
recreate the problematic situation; when run on a sufficiently-large
repository (git.git works fine), it kills an "idle" upload-pack while
the client is resolving deltas, leading to a failed clone.
  (
    git clone --no-local --progress . foo.git 2>&1
    echo >&2 "clone exit code=$?"
  ) |
  tr '\r' '\n' |
  while read line
  do
    case "$done,$line" in
    ,Resolving*)
      echo "hit resolving deltas; killing upload-pack"
      killall -9 git-upload-pack
      done=t
      ;;
    esac
  done
Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 00:11:05 +08:00

			/*
			 * this is the final request we'll make of the server;
			 * do a half-duplex shutdown to indicate that they can
			 * hang up as soon as the pack is sent.
			 */
			close(fd[1]);
			fd[1] = -1;

2020-06-11 04:57:23 +08:00
			if (get_pack(args, fd, pack_lockfiles,
2021-02-23 03:20:08 +08:00
				     packfile_uris.nr ? &index_pack_args : NULL,
2021-03-28 21:15:51 +08:00
				     sought, nr_sought, &fsck_options.gitmodules_found))
2018-03-16 01:31:28 +08:00
				die(_("git fetch-pack: fetch failed."));
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
			do_check_stateless_delimiter(args->stateless_rpc, &reader);
2018-03-16 01:31:28 +08:00

			state = FETCH_DONE;
			break;
		case FETCH_DONE:
			continue;
		}
	}

2020-06-11 04:57:23 +08:00
	for (i = 0; i < packfile_uris.nr; i++) {
2021-02-23 03:20:08 +08:00
		int j;
2020-06-11 04:57:23 +08:00
		struct child_process cmd = CHILD_PROCESS_INIT;
		char packname[GIT_MAX_HEXSZ + 1];
		const char *uri = packfile_uris.items[i].string +
			the_hash_algo->hexsz + 1;

2020-07-29 04:24:53 +08:00
		strvec_push(&cmd.args, "http-fetch");
		strvec_pushf(&cmd.args, "--packfile=%.*s",
strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:
  argv_array_pushl(&args, "one argument",
                   "another argument", "and more",
                   NULL);
was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:
  strvec_pushl(&args, "one argument",
                   "another argument", "and more",
                   NULL);
Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:
  git jump grep 'strvec_.*,$'
and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-29 04:26:31 +08:00
			     (int) the_hash_algo->hexsz,
			     packfile_uris.items[i].string);
2021-02-23 03:20:08 +08:00
		for (j = 0; j < index_pack_args.nr; j++)
			strvec_pushf(&cmd.args, "--index-pack-arg=%s",
				     index_pack_args.v[j]);
2020-07-29 04:24:53 +08:00
		strvec_push(&cmd.args, uri);
2020-06-11 04:57:23 +08:00
		cmd.git_cmd = 1;
		cmd.no_stdin = 1;
		cmd.out = -1;
		if (start_command(&cmd))
			die("fetch-pack: unable to spawn http-fetch");

		if (read_in_full(cmd.out, packname, 5) < 0 ||
		    memcmp(packname, "keep\t", 5))
			die("fetch-pack: expected keep then TAB at start of http-fetch output");

		if (read_in_full(cmd.out, packname,
				 the_hash_algo->hexsz + 1) < 0 ||
		    packname[the_hash_algo->hexsz] != '\n')
			die("fetch-pack: expected hash then LF at end of http-fetch output");

		packname[the_hash_algo->hexsz] = '\0';

2021-03-28 21:15:51 +08:00
		parse_gitmodules_oids(cmd.out, &fsck_options.gitmodules_found);
2021-02-23 03:20:09 +08:00

2020-06-11 04:57:23 +08:00
		close(cmd.out);

		if (finish_command(&cmd))
			die("fetch-pack: unable to finish http-fetch");

		if (memcmp(packfile_uris.items[i].string, packname,
			   the_hash_algo->hexsz))
			die("fetch-pack: pack downloaded from %s does not match expected hash %.*s",
			    uri, (int) the_hash_algo->hexsz,
			    packfile_uris.items[i].string);

		string_list_append_nodup(pack_lockfiles,
					 xstrfmt("%s/pack/pack-%s.keep",
2024-09-12 19:29:30 +08:00
						 repo_get_object_directory(the_repository),
2020-06-11 04:57:23 +08:00
						 packname));
	}
	string_list_clear(&packfile_uris, 0);
2021-02-23 03:20:08 +08:00
	strvec_clear(&index_pack_args);
2020-06-11 04:57:23 +08:00

2021-03-28 21:15:51 +08:00
	if (fsck_finish(&fsck_options))
		die("fsck failed");
2020-06-11 04:57:23 +08:00

promisor-remote: remove fetch_if_missing=0
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08)
strove to remove the need for fetch_if_missing=0 from the fetching
mechanism, so it is plausible to attempt removing fetch_if_missing=0
from the lazy-fetching mechanism in promisor-remote as well.
But doing so reveals a bug - when the server does not send an object
pointed to by a tag object, an infinite loop occurs: Git attempts to
fetch the missing object, which causes a dereferencing of all refs (for
negotiation), which causes a lazy fetch of that missing object, and so
on. This bug is caused by unnecessary use of the fetch negotiator
during lazy fetching - it is not used after initialization, but it is
still initialized (which causes the dereferencing of all refs).
Thus, when the negotiator is not used during fetching, refrain from
initializing it. Then, remove fetch_if_missing from promisor-remote.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 08:34:20 +08:00
	if (negotiator)
		negotiator->release(negotiator);
2020-06-11 04:57:23 +08:00

2018-03-16 01:31:28 +08:00
	oidset_clear(&common);
	return ref;
}

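The packfile-URI loop in the function above expects `http-fetch` to print `keep`, a TAB, the pack hash and an LF. It then compares that hash with the one the server advertised. The sketch below re-implements only that framing check on an in-memory buffer so it can be tested in isolation. `parse_keep_line()` and `hash_matches()` are hypothetical helpers. The fixed 40-character `HEXSZ` assumes SHA-1; the real code uses `the_hash_algo->hexsz` and reads from a pipe with `read_in_full()`.

```c
#include <assert.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 hex length (SHA-256 would be 64) */

/*
 * Parse "keep\t<hex>\n" at the start of buf (length len). On success,
 * copy the hex hash into out (NUL-terminated) and return the number of
 * bytes consumed; return -1 if the framing is wrong.
 */
static int parse_keep_line(const char *buf, size_t len, char out[HEXSZ + 1])
{
	if (len < 5 || memcmp(buf, "keep\t", 5))
		return -1; /* expected keep then TAB */
	buf += 5;
	len -= 5;
	if (len < HEXSZ + 1 || buf[HEXSZ] != '\n')
		return -1; /* expected hash then LF */
	memcpy(out, buf, HEXSZ);
	out[HEXSZ] = '\0'; /* the LF becomes the string terminator */
	return 5 + HEXSZ + 1;
}

/* Compare against an advertised "<hash> <uri>" entry (at least HEXSZ bytes). */
static int hash_matches(const char *advertised, const char *got)
{
	return !memcmp(advertised, got, HEXSZ);
}
```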
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
  - Modifies the signature to accept "const struct config_context *ctx"
  - Passes "ctx" to any inner config_fn_t, if needed
  - Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
  - trace2/tr2_cfg.c:tr2_cfg_set_fl()
    This is indirectly called by git_config_set() so that the trace2
    machinery can notice the new config values and update its settings
    using the tr2 config parsing function, i.e. tr2_cfg_cb().
  - builtin/checkout.c:checkout_main()
    This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
    This might be worth refactoring away in the future, since
    git_xmerge_config() can call git_default_config(), which can do much
    more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:22 +08:00
static int fetch_pack_config_cb(const char *var, const char *value,
				const struct config_context *ctx, void *cb)
2018-07-27 22:37:17 +08:00
{
2023-12-07 15:11:35 +08:00
	const char *msg_id;

2018-07-27 22:37:17 +08:00
	if (strcmp(var, "fetch.fsck.skiplist") == 0) {
2024-05-27 19:46:15 +08:00
		char *path;
2018-07-27 22:37:17 +08:00

		if (git_config_pathname(&path, var, value))
			return 1;
		strbuf_addf(&fsck_msg_types, "%cskiplist=%s",
			    fsck_msg_types.len ? ',' : '=', path);
2024-05-27 19:46:15 +08:00
		free(path);
2018-07-27 22:37:17 +08:00
		return 0;
	}

2023-12-07 15:11:35 +08:00
	if (skip_prefix(var, "fetch.fsck.", &msg_id)) {
		if (!value)
			return config_error_nonbool(var);
		if (is_valid_msg_type(msg_id, value))
2018-07-27 22:37:17 +08:00
			strbuf_addf(&fsck_msg_types, "%c%s=%s",
2023-12-07 15:11:35 +08:00
				    fsck_msg_types.len ? ',' : '=', msg_id, value);
2018-07-27 22:37:17 +08:00
		else
2023-12-07 15:11:35 +08:00
			warning("Skipping unknown msg id '%s'", msg_id);
2018-07-27 22:37:17 +08:00
		return 0;
	}

config: add ctx arg to config_fn_t
2023-06-29 03:26:22 +08:00
	return git_default_config(var, value, ctx, cb);
2018-07-27 22:37:17 +08:00
}

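fetch_pack_config_cb() accumulates `fetch.fsck.*` settings into a single option string. The first entry is introduced with '=' and every later one with ','. The sketch below isolates that joining rule using a fixed-size buffer instead of Git's strbuf. `struct msgbuf` and `add_msg_type()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for Git's strbuf. */
struct msgbuf {
	char buf[256];
	size_t len;
};

/*
 * Append "<key>=<value>", prefixed by '=' for the first entry and ','
 * for later ones, mirroring the "%c%s=%s" format in the callback.
 * Returns 0 on success, -1 if the entry does not fit.
 */
static int add_msg_type(struct msgbuf *sb, const char *key, const char *value)
{
	size_t room = sizeof(sb->buf) - sb->len;
	int n = snprintf(sb->buf + sb->len, room, "%c%s=%s",
			 sb->len ? ',' : '=', key, value);

	if (n < 0 || (size_t)n >= room) {
		sb->buf[sb->len] = '\0'; /* drop the truncated entry */
		return -1;
	}
	sb->len += (size_t)n;
	return 0;
}
```

For example, a skiplist followed by `fetch.fsck.missingemail=ignore` yields `=skiplist=/tmp/skip,missingemail=ignore`.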
2014-08-08 00:21:20 +08:00
static void fetch_pack_config(void)
2012-10-26 23:53:55 +08:00
{
2014-08-08 00:21:20 +08:00
	git_config_get_int("fetch.unpacklimit", &fetch_unpack_limit);
	git_config_get_int("transfer.unpacklimit", &transfer_unpack_limit);
	git_config_get_bool("repack.usedeltabaseoffset", &prefer_ofs_delta);
	git_config_get_bool("fetch.fsckobjects", &fetch_fsck_objects);
	git_config_get_bool("transfer.fsckobjects", &transfer_fsck_objects);
2020-11-12 07:29:31 +08:00
	git_config_get_bool("transfer.advertisesid", &advertise_sid);
2020-06-11 04:57:23 +08:00
	if (!uri_protocols.nr) {
		char *str;

		if (!git_config_get_string("fetch.uriprotocols", &str) && str) {
			string_list_split(&uri_protocols, str, ',', -1);
			free(str);
		}
	}
2012-10-26 23:53:55 +08:00

2018-07-27 22:37:17 +08:00
	git_config(fetch_pack_config_cb, NULL);
2012-10-26 23:53:55 +08:00
}

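fetch_pack_config() reads `fetch.uriprotocols` as a comma-separated list and hands it to string_list_split() with no split limit. The following is a minimal plain-C sketch of such a split, writing into a caller-supplied array. `split_protocols()`, `MAX_PROTOCOLS` and `MAX_PROTO_LEN` are hypothetical. Empty fields are kept in this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROTOCOLS 8  /* hypothetical cap on the number of entries */
#define MAX_PROTO_LEN 32 /* hypothetical cap on one entry's length */

/*
 * Split str at every ',' into out. Returns the number of fields, or -1
 * if there are too many fields or one field is too long.
 */
static int split_protocols(const char *str,
			   char out[MAX_PROTOCOLS][MAX_PROTO_LEN])
{
	int n = 0;

	for (;;) {
		const char *end = strchr(str, ',');
		size_t len = end ? (size_t)(end - str) : strlen(str);

		if (n == MAX_PROTOCOLS || len >= MAX_PROTO_LEN)
			return -1;
		memcpy(out[n], str, len);
		out[n][len] = '\0';
		n++;
		if (!end)
			return n;
		str = end + 1; /* continue after the delimiter */
	}
}
```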
static void fetch_pack_setup(void)
{
	static int did_setup;
	if (did_setup)
		return;
2014-08-08 00:21:20 +08:00
	fetch_pack_config();
2023-08-23 09:30:21 +08:00
	if (0 <= fetch_unpack_limit)
2012-10-26 23:53:55 +08:00
		unpack_limit = fetch_unpack_limit;
2023-08-23 09:30:21 +08:00
	else if (0 <= transfer_unpack_limit)
		unpack_limit = transfer_unpack_limit;
2012-10-26 23:53:55 +08:00
	did_setup = 1;
}

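fetch_pack_setup() gives `fetch.unpackLimit` precedence over `transfer.unpackLimit`. A negative value means "not configured", and if neither is set the existing default is kept. The sketch below isolates that resolution rule. The function name is hypothetical, and the default of 100 in the tests is only an illustrative value.

```c
#include <assert.h>

/*
 * Resolve the effective unpack limit. A negative value means the
 * corresponding variable was not set; the more specific
 * fetch.unpackLimit wins over transfer.unpackLimit.
 */
static int resolve_unpack_limit(int fetch_limit, int transfer_limit,
				int default_limit)
{
	if (0 <= fetch_limit)
		return fetch_limit;
	if (0 <= transfer_limit)
		return transfer_limit;
	return default_limit;
}
```

Note that 0 counts as "configured", so an explicit `fetch.unpackLimit=0` still overrides `transfer.unpackLimit`.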
2013-01-30 06:02:15 +08:00
static int remove_duplicates_in_refs(struct ref **ref, int nr)
{
	struct string_list names = STRING_LIST_INIT_NODUP;
	int src, dst;

	for (src = dst = 0; src < nr; src++) {
		struct string_list_item *item;
		item = string_list_insert(&names, ref[src]->name);
		if (item->util)
			continue; /* already have it */
		item->util = ref[src];
		if (src != dst)
			ref[dst] = ref[src];
		dst++;
	}
	for (src = dst; src < nr; src++)
		ref[src] = NULL;
	string_list_clear(&names, 0);
	return dst;
}

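remove_duplicates_in_refs() works in place. It keeps the first entry for each name, NULLs out the freed tail, and returns the new count. The sketch below applies the same compaction to plain strings. It uses a linear scan over the kept prefix in place of Git's string_list lookup, which is fine for illustration but quadratic on large inputs. `dedup_names()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove later duplicates from names[0..nr) in place, preserving the
 * order of first occurrences. Slots past the new end are set to NULL.
 * Returns the number of unique entries.
 */
static int dedup_names(const char **names, int nr)
{
	int src, dst = 0;

	for (src = 0; src < nr; src++) {
		int seen = 0;
		int k;

		/* Linear scan of the kept prefix stands in for string_list. */
		for (k = 0; k < dst; k++) {
			if (!strcmp(names[k], names[src])) {
				seen = 1;
				break;
			}
		}
		if (seen)
			continue; /* already have it */
		names[dst++] = names[src];
	}
	for (src = dst; src < nr; src++)
		names[src] = NULL;
	return dst;
}
```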
2013-12-05 21:02:39 +08:00
static void update_shallow(struct fetch_pack_args *args,
fetch-pack: unify ref in and out param
When a user fetches:
  - at least one up-to-date ref and at least one non-up-to-date ref,
  - using HTTP with protocol v0 (or something else that uses the fetch
    command of a remote helper)
some refs might not be updated after the fetch.
This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow
info in output parameter", 2018-06-28) which allowed transports to
report the refs that they have fetched in a new out-parameter
"fetched_refs". If they do so, transport_fetch_refs() makes this
information available to its caller.
Users of "fetched_refs" rely on the following 3 properties:
  (1) it is the complete list of refs that was passed to
      transport_fetch_refs(),
  (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if
      relevant), and
  (3) it has updated OIDs if ref-in-want was used (introduced after
      989b8c4452).
In an effort to satisfy (1), whenever transport_fetch_refs()
filters the refs sent to the transport, it re-adds the filtered refs to
whatever the transport supplies before returning it to the user.
However, the implementation in 989b8c4452 unconditionally re-adds the
filtered refs without checking if the transport refrained from reporting
anything in "fetched_refs" (which it is allowed to do), resulting in an
incomplete list, no longer satisfying (1).
An earlier effort to resolve this [1] solved the issue by re-adding the
filtered refs only if the transport did not refrain from reporting in
"fetched_refs", but after further discussion, it seems that the better
solution is to revert the API change that introduced "fetched_refs".
This API change was first suggested as part of a ref-in-want
implementation that allowed for ref patterns and, thus, there could be
drastic differences between the input refs and the refs actually fetched
[2]; we eventually decided to only allow exact ref names, but this API
change remained even though its necessity was decreased.
Therefore, revert this API change by reverting commit 989b8c4452, and
make receive_wanted_refs() update the OIDs in the sought array (like how
update_shallow() updates shallow information in the sought array)
instead. A test is also included to show that the user-visible bug
discussed at the beginning of this commit message no longer exists.
[1] https://public-inbox.org/git/20180801171806.GA122458@google.com/
[2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 04:13:20 +08:00
			   struct ref **sought, int nr_sought,
2013-12-05 21:02:39 +08:00
			   struct shallow_info *si)
2013-12-05 21:02:37 +08:00
{
2017-03-31 09:40:00 +08:00
	struct oid_array ref = OID_ARRAY_INIT;
2013-12-05 21:02:40 +08:00
	int *status;
2013-12-05 21:02:39 +08:00
	int i;

2016-06-12 18:53:56 +08:00
	if (args->deepen && alternate_shallow_file) {
2013-12-05 21:02:37 +08:00
		if (*alternate_shallow_file == '\0') { /* --unshallow */
2018-05-18 06:51:51 +08:00
			unlink_or_warn(git_path_shallow(the_repository));
shallow.c: use '{commit,rollback}_shallow_file'
In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily,
2019-01-10), the author noted that 'is_repository_shallow' produces
visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'.
This is a problem for e.g., fetching with '--update-shallow' in a
shallow repository with 'fetch.writeCommitGraph' enabled, since the
update to '.git/shallow' will cause Git to think that the repository
isn't shallow when it is, thereby circumventing the commit-graph
compatibility check.
This causes problems in shallow repositories with at least shallow refs
that have at least one ancestor (since the client won't have those
objects, and therefore can't take the reachability closure over commits
when writing a commit-graph).
Address this by introducing thin wrappers over 'commit_lock_file' and
'rollback_lock_file' for use specifically when the lock is held over
'.git/shallow'. These wrappers (appropriately called
'commit_shallow_file' and 'rollback_shallow_file') call into their
respective functions in 'lockfile.h', but additionally reset validity
checks used by the shallow machinery.
Replace each instance of 'commit_lock_file' and 'rollback_lock_file'
with 'commit_shallow_file' and 'rollback_shallow_file' when the lock
being held is over the '.git/shallow' file.
As a result, 'prune_shallow' can now only be called once (since
'check_shallow_file_for_update' will die after calling
'reset_repository_shallow'). But, this is OK since we only call
'prune_shallow' at most once per process.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 08:25:45 +08:00
			rollback_shallow_file(the_repository, &shallow_lock);
2013-12-05 21:02:37 +08:00
		} else
shallow.c: use '{commit,rollback}_shallow_file'
2020-04-23 08:25:45 +08:00
			commit_shallow_file(the_repository, &shallow_lock);
2019-02-04 08:06:50 +08:00
		alternate_shallow_file = NULL;
2013-12-05 21:02:37 +08:00
		return;
	}
2013-12-05 21:02:39 +08:00

	if (!si->shallow || !si->shallow->nr)
		return;

	if (args->cloning) {
		/*
		 * remote is shallow, but this is a clone, there are
		 * no objects in repo to worry about. Accept any
		 * shallow points that exist in the pack (iow in repo
		 * after get_pack() and reprepare_packed_git())
		 */
2017-03-31 09:40:00 +08:00
		struct oid_array extra = OID_ARRAY_INIT;
2017-03-27 00:01:37 +08:00
		struct object_id *oid = si->shallow->oid;
2013-12-05 21:02:39 +08:00
		for (i = 0; i < si->shallow->nr; i++)
2023-03-28 21:58:50 +08:00
			if (repo_has_object_file(the_repository, &oid[i]))
2017-03-31 09:40:00 +08:00
				oid_array_append(&extra, &oid[i]);
2013-12-05 21:02:39 +08:00
		if (extra.nr) {
			setup_alternate_shallow(&shallow_lock,
						&alternate_shallow_file,
						&extra);
shallow.c: use '{commit,rollback}_shallow_file'
2020-04-23 08:25:45 +08:00
			commit_shallow_file(the_repository, &shallow_lock);
2019-02-04 08:06:50 +08:00
			alternate_shallow_file = NULL;
2013-12-05 21:02:39 +08:00
		}
2017-03-31 09:40:00 +08:00
		oid_array_clear(&extra);
2013-12-05 21:02:39 +08:00
		return;
	}
2013-12-05 21:02:40 +08:00

	if (!si->nr_ours && !si->nr_theirs)
		return;

	remove_nonexistent_theirs_shallow(si);
	if (!si->nr_ours && !si->nr_theirs)
		return;
fetch-pack: unify ref in and out param
2018-08-02 04:13:20 +08:00
for (i = 0; i < nr_sought; i++)
oid_array_append(&ref, &sought[i]->old_oid);
2013-12-05 21:02:40 +08:00
si->ref = &ref;
2013-12-05 21:02:42 +08:00
if (args->update_shallow) {
/*
* remote is also shallow, .git/shallow may be updated
* so all refs can be accepted. Make sure we only add
* shallow roots that are actually reachable from new
* refs.
*/
2017-03-31 09:40:00 +08:00
struct oid_array extra = OID_ARRAY_INIT;
2017-03-27 00:01:37 +08:00
struct object_id *oid = si->shallow->oid;
2013-12-05 21:02:42 +08:00
assign_shallow_commits_to_refs(si, NULL, NULL);
if (!si->nr_ours && !si->nr_theirs) {
2017-03-31 09:40:00 +08:00
oid_array_clear(&ref);
2013-12-05 21:02:42 +08:00
return;
}
for (i = 0; i < si->nr_ours; i++)
2017-03-31 09:40:00 +08:00
oid_array_append(&extra, &oid[si->ours[i]]);
2013-12-05 21:02:42 +08:00
for (i = 0; i < si->nr_theirs; i++)
2017-03-31 09:40:00 +08:00
oid_array_append(&extra, &oid[si->theirs[i]]);
2013-12-05 21:02:42 +08:00
setup_alternate_shallow(&shallow_lock,
&alternate_shallow_file,
&extra);
shallow.c: use '{commit,rollback}_shallow_file'
In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily,
2019-01-10), the author noted that 'is_repository_shallow' produces
visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'.
This is a problem for e.g., fetching with '--update-shallow' in a
shallow repository with 'fetch.writeCommitGraph' enabled, since the
update to '.git/shallow' will cause Git to think that the repository
isn't shallow when it is, thereby circumventing the commit-graph
compatibility check.
This causes problems in shallow repositories with at least shallow refs
that have at least one ancestor (since the client won't have those
objects, and therefore can't take the reachability closure over commits
when writing a commit-graph).
Address this by introducing thin wrappers over 'commit_lock_file' and
'rollback_lock_file' for use specifically when the lock is held over
'.git/shallow'. These wrappers (appropriately called
'commit_shallow_file' and 'rollback_shallow_file') call into their
respective functions in 'lockfile.h', but additionally reset validity
checks used by the shallow machinery.
Replace each instance of 'commit_lock_file' and 'rollback_lock_file'
with 'commit_shallow_file' and 'rollback_shallow_file' when the lock
being held is over the '.git/shallow' file.
As a result, 'prune_shallow' can now only be called once (since
'check_shallow_file_for_update' will die after calling
'reset_repository_shallow'). But, this is OK since we only call
'prune_shallow' at most once per process.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 08:25:45 +08:00
commit_shallow_file(the_repository, &shallow_lock);
2017-03-31 09:40:00 +08:00
oid_array_clear(&extra);
oid_array_clear(&ref);
2019-02-04 08:06:50 +08:00
alternate_shallow_file = NULL;
2013-12-05 21:02:42 +08:00
return;
}
2013-12-05 21:02:40 +08:00
/*
* remote is also shallow, check what ref is safe to update
* without updating .git/shallow
*/
2021-03-14 00:17:22 +08:00
CALLOC_ARRAY(status, nr_sought);
2013-12-05 21:02:40 +08:00
assign_shallow_commits_to_refs(si, NULL, status);
if (si->nr_ours || si->nr_theirs) {
2018-08-02 04:13:20 +08:00
for (i = 0; i < nr_sought; i++)
2013-12-05 21:02:40 +08:00
if (status[i])
2018-08-02 04:13:20 +08:00
sought[i]->status = REF_STATUS_REJECT_SHALLOW;
2013-12-05 21:02:40 +08:00
}
free(status);
2017-03-31 09:40:00 +08:00
oid_array_clear(&ref);
2013-12-05 21:02:37 +08:00
}
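The final loop of update_shallow() records its per-ref verdict directly on the sought refs; as the commit message quoted earlier explains, the sought array (rather than a separate "fetched_refs" out-parameter) is how shallow information reaches the caller. A minimal sketch of that pattern, assuming simplified hypothetical types (demo_ref, DEMO_STATUS_*) in place of Git's struct ref and REF_STATUS_REJECT_SHALLOW:

```c
/* Hypothetical stand-ins for Git's REF_STATUS_* values and struct ref. */
enum demo_status { DEMO_STATUS_NONE = 0, DEMO_STATUS_REJECT_SHALLOW };

struct demo_ref {
	const char *name;
	enum demo_status status;
};

/*
 * Mark every sought ref whose status entry is non-zero as rejected,
 * following the shape of the last loop in update_shallow(). In Git the
 * entries are filled in by assign_shallow_commits_to_refs().
 */
void demo_mark_rejected_shallow(struct demo_ref **sought, int nr_sought,
				const int *status)
{
	for (int i = 0; i < nr_sought; i++)
		if (status[i])
			sought[i]->status = DEMO_STATUS_REJECT_SHALLOW;
}
```

The status array starts zero-filled (update_shallow() uses CALLOC_ARRAY for this), so refs that the shallow analysis leaves untouched keep whatever status they already had.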
2021-09-01 21:09:50 +08:00
static const struct object_id *iterate_ref_map(void *cb_data)
fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.
To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.
To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.
This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 06:08:43 +08:00
{
struct ref **rm = cb_data;
struct ref *ref = *rm;
if (!ref)
2021-09-01 21:09:50 +08:00
return NULL;
2018-07-03 06:08:43 +08:00
*rm = ref->next;
2021-09-01 21:09:50 +08:00
return &ref->old_oid;
2018-07-03 06:08:43 +08:00
}
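iterate_ref_map() is a pull-style iterator: check_connected() calls it repeatedly with the same cb_data, and each call yields the next ref's old_oid until it returns NULL. The sketch below reproduces that shape with hypothetical simplified types (demo_oid, demo_ref); demo_count_oids() is an invented stand-in for check_connected(), which in Git does far more than count.

```c
#include <stddef.h>

/* Hypothetical simplified stand-ins for struct object_id and struct ref. */
struct demo_oid {
	unsigned char hash[20];
};

struct demo_ref {
	struct demo_ref *next;
	struct demo_oid old_oid;
};

/* Callback type: returns the next object ID, or NULL when exhausted. */
typedef const struct demo_oid *(*demo_oid_iterate_fn)(void *cb_data);

/*
 * Same shape as iterate_ref_map(): cb_data points at a cursor into a
 * linked list of refs; each call returns the current ref's old_oid and
 * advances the cursor to the next ref.
 */
const struct demo_oid *demo_iterate_ref_map(void *cb_data)
{
	struct demo_ref **rm = cb_data;
	struct demo_ref *ref = *rm;

	if (!ref)
		return NULL;
	*rm = ref->next;
	return &ref->old_oid;
}

/*
 * Hypothetical consumer standing in for check_connected(): it pulls
 * object IDs until the callback returns NULL and counts them.
 */
size_t demo_count_oids(demo_oid_iterate_fn fn, void *cb_data)
{
	size_t n = 0;

	while (fn(cb_data))
		n++;
	return n;
}
```

Because the callback advances the cursor it is handed, fetch_pack() passes &iterator, a copy of ref_cpy, so the head of the returned ref list stays intact for the caller.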
2024-06-19 12:07:32 +08:00
int fetch_pack_fsck_objects(void)
{
fetch_pack_setup();
if (fetch_fsck_objects >= 0)
return fetch_fsck_objects;
if (transfer_fsck_objects >= 0)
return transfer_fsck_objects;
return 0;
}
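fetch_pack_fsck_objects() resolves a tri-state setting: a value of 0 or more means the option was set explicitly, and a negative value means it was left unset. The fetch-specific value (fetch.fsckObjects) wins, then the transfer-wide one (transfer.fsckObjects), and otherwise checking is off. A standalone sketch of that precedence, assuming -1 as the "unset" sentinel (the code above only shows that negative values are skipped); demo_resolve_fsck() is a hypothetical helper:

```c
/*
 * Resolve whether received objects should be checked, following the
 * precedence in fetch_pack_fsck_objects(). Negative arguments mean
 * "not configured"; 0 or 1 are explicit choices.
 */
int demo_resolve_fsck(int fetch_fsck, int transfer_fsck)
{
	if (fetch_fsck >= 0)
		return fetch_fsck;	/* fetch-specific setting wins */
	if (transfer_fsck >= 0)
		return transfer_fsck;	/* fall back to the transfer-wide setting */
	return 0;			/* default: no fsck on fetch */
}
```

Note that an explicit 0 for the fetch-specific setting disables checking even when the transfer-wide setting is 1, which is the point of keeping "unset" distinct from "false".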
2012-10-26 23:53:55 +08:00
struct ref *fetch_pack(struct fetch_pack_args *args,
2019-03-20 16:16:14 +08:00
int fd[],
2012-10-26 23:53:55 +08:00
const struct ref *ref,
2013-01-30 06:02:15 +08:00
struct ref **sought, int nr_sought,
2017-03-31 09:40:00 +08:00
struct oid_array *shallow,
fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 04:57:22 +08:00
struct string_list *pack_lockfiles,
2018-03-16 01:31:28 +08:00
enum protocol_version version)
2012-10-26 23:53:55 +08:00
{
struct ref *ref_cpy;
2013-12-05 21:02:39 +08:00
struct shallow_info si;
fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.
Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.
This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-27 03:31:21 +08:00
struct oid_array shallows_scratch = OID_ARRAY_INIT;
2012-10-26 23:53:55 +08:00
fetch_pack_setup();
2013-01-30 06:02:15 +08:00
if (nr_sought)
nr_sought = remove_duplicates_in_refs(sought, nr_sought);
2012-10-26 23:53:55 +08:00
2018-09-28 03:24:05 +08:00
if (version != protocol_v2 && !ref) {
2012-10-26 23:53:55 +08:00
packet_flush(fd[1]);
2016-06-12 18:53:55 +08:00
die(_("no matching remote head"));
2012-10-26 23:53:55 +08:00
}
2019-03-27 03:31:20 +08:00
if (version == protocol_v2) {
if (shallow->nr)
BUG("Protocol V2 does not provide shallows at this point in the fetch");
memset(&si, 0, sizeof(si));
2018-03-16 01:31:28 +08:00
ref_cpy = do_fetch_pack_v2(args, fd, ref, sought, nr_sought,
2019-03-27 03:31:21 +08:00
&shallows_scratch, &si,
2020-06-11 04:57:22 +08:00
pack_lockfiles);
2019-03-27 03:31:20 +08:00
} else {
prepare_shallow_info(&si, shallow);
2018-03-16 01:31:28 +08:00
ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought,
2020-06-11 04:57:22 +08:00
&si, pack_lockfiles);
2019-03-27 03:31:20 +08:00
}
2018-03-24 01:45:21 +08:00
reprepare_packed_git(the_repository);
2018-07-03 06:08:43 +08:00
if (!args->cloning && args->deepen) {
struct check_connected_options opt = CHECK_CONNECTED_INIT;
struct ref *iterator = ref_cpy;
opt.shallow_file = alternate_shallow_file;
if (args->deepen)
opt.is_deepening_fetch = 1;
if (check_connected(iterate_ref_map, &iterator, &opt)) {
error(_("remote did not send all necessary objects"));
free_refs(ref_cpy);
ref_cpy = NULL;
2020-04-23 08:25:45 +08:00
rollback_shallow_file(the_repository, &shallow_lock);
2018-07-03 06:08:43 +08:00
goto cleanup;
}
args->connectivity_checked = 1;
}
2018-08-02 04:13:20 +08:00
	update_shallow(args, sought, nr_sought, &si);
fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.
To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.
To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.
This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 06:08:43 +08:00
cleanup:
2013-12-05 21:02:39 +08:00
	clear_shallow_info(&si);
fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not make this distinction, and the client treats all "shallow" lines
like lines caused by client-specified depth settings.
Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.
This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-27 03:31:21 +08:00
	oid_array_clear(&shallows_scratch);
2012-10-26 23:53:55 +08:00
	return ref_cpy;
}
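The "fetch-pack: unify ref in and out param" message above has receive_wanted_refs() write updated OIDs straight into the caller's sought array instead of returning a separate "fetched_refs" list. A minimal sketch of that in/out-parameter style follows. struct sketch_ref and update_sought_oids() are hypothetical simplifications, not git's struct ref or its parser, and the sketch assumes 40-hex SHA-1 object IDs and wanted-ref lines of the form "<oid> <refname>".

```c
#include <string.h>

/*
 * Hypothetical, simplified stand-in for git's struct ref: just a name
 * and a 40-hex object ID. This is NOT git's real definition.
 */
struct sketch_ref {
	const char *name;
	char oid[41];
};

/*
 * Apply wanted-ref lines of the assumed form "<40-hex-oid> <refname>"
 * to the caller's sought[] array in place. Returns how many entries were
 * updated, or -1 on a malformed line or a ref that was never sought
 * (lines before the bad one have already been applied).
 */
static int update_sought_oids(struct sketch_ref *sought, int nr_sought,
			      const char **lines, int nr_lines)
{
	int updated = 0;

	for (int i = 0; i < nr_lines; i++) {
		const char *sp = strchr(lines[i], ' ');
		int found = 0;

		if (!sp || sp - lines[i] != 40)
			return -1;	/* malformed line */
		for (int j = 0; j < nr_sought && !found; j++) {
			if (strcmp(sought[j].name, sp + 1))
				continue;
			memcpy(sought[j].oid, lines[i], 40);	/* overwrite in place */
			sought[j].oid[40] = '\0';
			updated++;
			found = 1;
		}
		if (!found)
			return -1;	/* server named a ref we did not ask for */
	}
	return updated;
}
```

Because the caller's array is updated in place, it stays the complete list of requested refs, which is property (1) in the message above.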
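The "fetch-pack: write shallow, then check connectivity" message above describes writing the new shallow information, checking connectivity against it, and only then committing it, with a rollback if the check fails. The sketch below models that ordering with a plain "<path>.lock" file. update_with_check() and pending_nonempty() are hypothetical; the "check" is a toy stand-in for the real rev-list based connectivity check, and rename() replacing an existing file assumes POSIX semantics.

```c
#include <stdio.h>

/*
 * Hypothetical check callback: it is given the path of the pending
 * (not yet committed) file, much as the connectivity check is pointed
 * at the new shallow file before that file replaces the old one.
 */
typedef int (*pending_check_fn)(const char *pending_path, void *ctx);

/* Toy check for illustration only: pending content must be non-empty. */
static int pending_nonempty(const char *pending_path, void *ctx)
{
	FILE *fp = fopen(pending_path, "r");
	int ok;

	(void)ctx;
	if (!fp)
		return -1;
	ok = fgetc(fp) != EOF;
	fclose(fp);
	return ok ? 0 : -1;
}

/*
 * Write new content to "<path>.lock", run the check against it, then
 * either commit (rename over <path>) or roll back (delete the lock),
 * leaving any previous <path> untouched. Returns 0 on commit, -1 otherwise.
 */
static int update_with_check(const char *path, const char *content,
			     pending_check_fn check, void *ctx)
{
	char lock[4096];
	FILE *fp;

	if (snprintf(lock, sizeof(lock), "%s.lock", path) >= (int)sizeof(lock))
		return -1;
	fp = fopen(lock, "w");
	if (!fp)
		return -1;
	if (fputs(content, fp) == EOF) {
		fclose(fp);
		remove(lock);
		return -1;
	}
	if (fclose(fp) != 0 || check(lock, ctx) != 0) {
		remove(lock);		/* rollback: old file is untouched */
		return -1;
	}
	if (rename(lock, path) != 0) {	/* commit */
		remove(lock);
		return -1;
	}
	return 0;
}
```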
2017-02-23 00:01:22 +08:00
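The "fetch-pack: respect --no-update-shallow in v2" message above describes a heuristic for classifying "shallow" lines when a v2 server does not say why it sent them. A minimal sketch of that decision table follows. classify_v2_shallow() and reject_ref() are hypothetical names, and a single boolean stands in for "any depth setting was given".

```c
#include <stdbool.h>

/* Why a "shallow" line is assumed to have been sent (hypothetical names). */
enum shallow_origin {
	SHALLOW_FROM_CLIENT_DEPTH,	/* caused by our own depth request */
	SHALLOW_FROM_REMOTE_REPO	/* the remote repository is shallow */
};

/*
 * v2 heuristic: the server does not say why it sent a "shallow" line,
 * so guess from whether the client asked for any depth setting at all.
 */
static enum shallow_origin classify_v2_shallow(bool client_gave_depth)
{
	return client_gave_depth ? SHALLOW_FROM_CLIENT_DEPTH
				 : SHALLOW_FROM_REMOTE_REPO;
}

/*
 * A ref that reaches a remote-caused shallow line is rejected unless
 * --update-shallow was given; client-caused lines never cause rejection.
 */
static bool reject_ref(enum shallow_origin origin, bool update_shallow)
{
	return origin == SHALLOW_FROM_REMOTE_REPO && !update_shallow;
}
```

The known gap is the case the message calls out: fetching from a shallow repository while also passing a depth setting classifies the remote's own shallow lines as client-caused, so they escape --no-update-shallow.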
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independently of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
static int add_to_object_array(const struct object_id *oid, void *data)
{
	struct object_array *a = data;

	add_object_array(lookup_object(the_repository, oid), "", a);
	return 0;
}
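add_to_object_array() is written as a callback: oid_array_for_each() in negotiate_using_fetch() calls it once per negotiation tip and threads the destination array through the opaque void *data argument. A self-contained sketch of the same pattern over plain ints follows. int_array_for_each() and struct collector are hypothetical, and the stop-on-nonzero-return rule is assumed here to mirror oid_array_for_each().

```c
#include <stddef.h>

typedef int (*each_item_fn)(const int *item, void *data);

/* Call fn on every item; stop early and return fn's value if nonzero. */
static int int_array_for_each(const int *items, size_t nr,
			      each_item_fn fn, void *data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(&items[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Destination state passed through the opaque "data" pointer. */
struct collector {
	int out[16];
	size_t nr;
};

/* Analogue of add_to_object_array(): append, return 0 to continue. */
static int add_to_collector(const int *item, void *data)
{
	struct collector *c = data;

	if (c->nr >= sizeof(c->out) / sizeof(c->out[0]))
		return -1;	/* full: stop iterating */
	c->out[c->nr++] = *item;
	return 0;
}
```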

static void clear_common_flag(struct oidset *s)
{
	struct oidset_iter iter;
	const struct object_id *oid;
	oidset_iter_init(s, &iter);

	while ((oid = oidset_iter_next(&iter))) {
		struct object *obj = lookup_object(the_repository, oid);
		obj->flags &= ~COMMON;
	}
}
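clear_common_flag() removes only the COMMON bit from each object named in the set and leaves every other flag bit alone; negotiate_using_fetch() calls it on acked_commits once negotiation ends, presumably so that later users of the same in-memory objects do not see stale marks. The sketch below shows the masking idiom on a hypothetical struct sketch_object; the value of SKETCH_COMMON is an arbitrary choice, not necessarily git's COMMON.

```c
#include <stddef.h>

/* Arbitrary bit for the sketch; not necessarily git's COMMON value. */
#define SKETCH_COMMON (1u << 2)

struct sketch_object {
	unsigned flags;
};

/* Clear one flag bit on every object in the set, preserving other bits. */
static void clear_flag_on(struct sketch_object **objs, size_t nr,
			  unsigned flag)
{
	for (size_t i = 0; i < nr; i++)
		objs[i]->flags &= ~flag;
}
```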

void negotiate_using_fetch(const struct oid_array *negotiation_tips,
			   const struct string_list *server_options,
			   int stateless_rpc,
			   int fd[],
			   struct oidset *acked_commits)
{
	struct fetch_negotiator negotiator;
	struct packet_reader reader;
	struct object_array nt_object_array = OBJECT_ARRAY_INIT;
	struct strbuf req_buf = STRBUF_INIT;
	int haves_to_send = INITIAL_FLUSH;
	int in_vain = 0;
	int seen_ack = 0;
	int last_iteration = 0;
fetch-pack: add tracing for negotiation rounds
Currently, negotiation for V0/V1/V2 fetch has trace2 regions covering
the entire negotiation process. However, we'd like additional data, such
as timing for each round of negotiation or the number of "haves" in each
round. Additionally, "independent negotiation" (AKA push negotiation)
has no tracing at all. Having this data would allow us to compare the
performance of the various negotiation implementations, and to debug
unexpectedly slow fetch & push sessions.
Add per-round trace2 regions for all negotiation implementations (V0+V1,
V2, and independent negotiation), as well as an overall region for
independent negotiation. Add trace2 data logging for the number of haves
and "in vain" objects for each round, and for the total number of rounds
once negotiation completes. Finally, add a few checks into various
tests to verify that the number of rounds is logged as expected.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03 06:04:05 +08:00
	int negotiation_round = 0;
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	timestamp_t min_generation = GENERATION_NUMBER_INFINITY;

	fetch_negotiator_init(the_repository, &negotiator);
	mark_tips(&negotiator, negotiation_tips);

	packet_reader_init(&reader, fd[0], NULL, 0,
			   PACKET_READ_CHOMP_NEWLINE |
			   PACKET_READ_DIE_ON_ERR_PACKET);

	oid_array_for_each((struct oid_array *) negotiation_tips,
			   add_to_object_array,
			   &nt_object_array);
fetch-pack: add tracing for negotiation rounds
2022-08-03 06:04:05 +08:00
	trace2_region_enter("fetch-pack", "negotiate_using_fetch", the_repository);
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	while (!last_iteration) {
		int haves_added;
		struct object_id common_oid;
		int received_ready = 0;
fetch-pack: add tracing for negotiation rounds
2022-08-03 06:04:05 +08:00
		negotiation_round++;

		trace2_region_enter_printf("negotiate_using_fetch", "round",
					   the_repository, "%d",
					   negotiation_round);
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
		strbuf_reset(&req_buf);
		write_fetch_command_and_capabilities(&req_buf, server_options);

		packet_buf_write(&req_buf, "wait-for-done");

		haves_added = add_haves(&negotiator, &req_buf, &haves_to_send);
		in_vain += haves_added;
		if (!haves_added || (seen_ack && in_vain >= MAX_IN_VAIN))
			last_iteration = 1;
fetch-pack: add tracing for negotiation rounds
2022-08-03 06:04:05 +08:00
		trace2_data_intmax("negotiate_using_fetch", the_repository,
				   "haves_added", haves_added);
		trace2_data_intmax("negotiate_using_fetch", the_repository,
				   "in_vain", in_vain);
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
		/* Send request */
		packet_buf_flush(&req_buf);
		if (write_in_full(fd[1], req_buf.buf, req_buf.len) < 0)
			die_errno(_("unable to write request to remote"));

		/* Process ACKs/NAKs */
		process_section_header(&reader, "acknowledgments", 0);
		while (process_ack(&negotiator, &reader, &common_oid,
				   &received_ready)) {
			struct commit *commit = lookup_commit(the_repository,
							      &common_oid);
			if (commit) {
				timestamp_t generation;

				parse_commit_or_die(commit);
				commit->object.flags |= COMMON;
				generation = commit_graph_generation(commit);
				if (generation < min_generation)
					min_generation = generation;
			}
			in_vain = 0;
			seen_ack = 1;
			oidset_insert(acked_commits, &common_oid);
		}
		if (received_ready)
			die(_("unexpected 'ready' from remote"));
		else
			do_check_stateless_delimiter(stateless_rpc, &reader);
		if (can_all_from_reach_with_flag(&nt_object_array, COMMON,
						 REACH_SCRATCH, 0,
						 min_generation))
			last_iteration = 1;
fetch-pack: add tracing for negotiation rounds
2022-08-03 06:04:05 +08:00
		trace2_region_leave_printf("negotiate_using_fetch", "round",
					   the_repository, "%d",
					   negotiation_round);
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	}
2024-01-04 06:40:54 +08:00
	trace2_region_leave("fetch-pack", "negotiate_using_fetch", the_repository);
fetch-pack: add tracing for negotiation rounds
2022-08-03 06:04:05 +08:00
	trace2_data_intmax("negotiate_using_fetch", the_repository,
			   "total_rounds", negotiation_round);
2024-09-05 18:08:40 +08:00
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	clear_common_flag(acked_commits);
2024-09-05 18:08:40 +08:00
	object_array_clear(&nt_object_array);
	negotiator.release(&negotiator);
fetch: teach independent negotiation (no packfile)
2021-05-05 05:16:01 +08:00
	strbuf_release(&req_buf);
}
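Each round of negotiate_using_fetch() builds a protocol v2 fetch request in req_buf: the command and capabilities, a "wait-for-done" argument so the server waits for "done" instead of sending a packfile, the "have" lines, and a closing flush packet. The sketch below assembles a comparable byte string by hand, under stated simplifications: the capability list is omitted (git also sends capabilities such as agent), struct pkt_buf and its helpers are hypothetical, every text payload is LF-terminated (pkt-line receivers are expected to treat lines the same with or without the trailing LF), and the framing is the standard pkt-line one: four hex digits giving the total length including the header, "0001" as delimiter and "0000" as flush.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size request buffer; not git's strbuf. */
struct pkt_buf {
	char data[512];
	size_t len;
	int overflow;
};

/* Append one pkt-line: 4 hex digits of total length (header included). */
static void pkt_write(struct pkt_buf *b, const char *payload)
{
	size_t n = strlen(payload) + 4;

	if (b->overflow || n > 65520 || b->len + n >= sizeof(b->data)) {
		b->overflow = 1;
		return;
	}
	snprintf(b->data + b->len, sizeof(b->data) - b->len, "%04zx%s",
		 n, payload);
	b->len += n;
}

/* Append a special packet: "0001" (delimiter) or "0000" (flush). */
static void pkt_special(struct pkt_buf *b, const char *marker)
{
	if (b->overflow || b->len + 4 >= sizeof(b->data)) {
		b->overflow = 1;
		return;
	}
	memcpy(b->data + b->len, marker, 4);
	b->len += 4;
	b->data[b->len] = '\0';
}

/* One simplified negotiation round: no capabilities, haves as hex IDs. */
static void build_round(struct pkt_buf *b, const char **haves, size_t nr)
{
	char line[128];

	pkt_write(b, "command=fetch\n");
	pkt_special(b, "0001");			/* capabilities end, args begin */
	pkt_write(b, "wait-for-done\n");	/* server waits for "done" */
	for (size_t i = 0; i < nr; i++) {
		snprintf(line, sizeof(line), "have %s\n", haves[i]);
		pkt_write(b, line);
	}
	pkt_special(b, "0000");			/* flush: end of request */
}
```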
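The loop in negotiate_using_fetch() stops after a round that added no new "have" lines, or once at least one ACK has been seen and MAX_IN_VAIN further haves have gone unacknowledged; the round that sets last_iteration is still sent and its ACKs still processed. The sketch below replays only that rule over a scripted sequence of rounds. struct round_input and count_rounds() are hypothetical, the reachability-based early exit is deliberately left out, and the limit of 256 is an assumption of the sketch rather than a value quoted from this file.

```c
/* Assumed limit for the sketch; fetch-pack.c defines its own MAX_IN_VAIN. */
#define SKETCH_MAX_IN_VAIN 256

/* Scripted outcome of one round (hypothetical). */
struct round_input {
	int haves_added;	/* "have" lines sent this round */
	int acks;		/* ACKs received this round */
};

/* Replay the stopping rule and return the number of rounds run. */
static int count_rounds(const struct round_input *in, int nr)
{
	int in_vain = 0, seen_ack = 0, last_iteration = 0, rounds = 0;

	for (int i = 0; i < nr && !last_iteration; i++) {
		rounds++;
		in_vain += in[i].haves_added;
		if (!in[i].haves_added ||
		    (seen_ack && in_vain >= SKETCH_MAX_IN_VAIN))
			last_iteration = 1;
		/* ACKs of this round are processed after the check */
		if (in[i].acks) {
			in_vain = 0;
			seen_ack = 1;
		}
	}
	return rounds;
}
```

Note that without any ACK the in-vain limit never applies, so only running out of haves ends negotiation in that case.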
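In the ACK loop of negotiate_using_fetch(), each acknowledged commit is marked COMMON and min_generation is lowered to the smallest generation seen; can_all_from_reach_with_flag() then asks whether every negotiation tip reaches some COMMON commit, which ends negotiation early. The generation cutoff is safe because a commit below min_generation, and every ancestor of it, has a smaller generation than any COMMON commit, so none of them can be COMMON. A minimal sketch of that idea on a hand-built DAG follows; struct sketch_commit, record_ack() and all_tips_reach_common() are hypothetical and use a plain recursive walk, not git's implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PARENTS 2
#define SKETCH_GENERATION_INFINITY UINT_MAX	/* no ACK seen yet */

/* Hypothetical commit node; generation = 1 + max(parent generations). */
struct sketch_commit {
	unsigned generation;
	bool common;		/* acknowledged by the server */
	bool visited;		/* scratch mark for one walk */
	struct sketch_commit *parents[SKETCH_MAX_PARENTS];
};

/* ACK bookkeeping: mark COMMON and lower the generation cutoff. */
static void record_ack(struct sketch_commit *c, unsigned *min_generation)
{
	c->common = true;
	if (c->generation < *min_generation)
		*min_generation = c->generation;
}

/* Depth-first walk that never descends below min_generation. */
static bool reaches_common(struct sketch_commit *c, unsigned min_generation)
{
	if (!c || c->visited || c->generation < min_generation)
		return false;
	c->visited = true;
	if (c->common)
		return true;
	for (size_t i = 0; i < SKETCH_MAX_PARENTS; i++)
		if (reaches_common(c->parents[i], min_generation))
			return true;
	return false;
}

/* True if every tip reaches a COMMON commit; marks are reset per tip. */
static bool all_tips_reach_common(struct sketch_commit **all, size_t nr_all,
				  struct sketch_commit **tips, size_t nr_tips,
				  unsigned min_generation)
{
	for (size_t t = 0; t < nr_tips; t++) {
		for (size_t i = 0; i < nr_all; i++)
			all[i]->visited = false;
		if (!reaches_common(tips[t], min_generation))
			return false;
	}
	return true;
}
```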
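negotiate_using_fetch() opens one outer trace2 region for the whole negotiation and one "round" region per iteration, so each leave is meant to name the same category and label as the enter it closes. The sketch below is a hypothetical region stack, not trace2's API, that counts mismatched leaves; a check like this makes pairing mistakes easy to catch in a unit test.

```c
#include <string.h>

#define SKETCH_MAX_DEPTH 16

/* Hypothetical region tracker; this is not trace2's API. */
struct region_stack {
	const char *category[SKETCH_MAX_DEPTH];
	const char *label[SKETCH_MAX_DEPTH];
	int depth;
	int mismatches;	/* leaves that did not match the innermost enter */
};

static void region_enter(struct region_stack *s, const char *category,
			 const char *label)
{
	if (s->depth < SKETCH_MAX_DEPTH) {
		s->category[s->depth] = category;
		s->label[s->depth] = label;
	}
	s->depth++;
}

static void region_leave(struct region_stack *s, const char *category,
			 const char *label)
{
	if (s->depth == 0) {
		s->mismatches++;	/* leave without a matching enter */
		return;
	}
	s->depth--;
	if (s->depth < SKETCH_MAX_DEPTH &&
	    (strcmp(s->category[s->depth], category) ||
	     strcmp(s->label[s->depth], label)))
		s->mismatches++;
}
```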
2017-02-23 00:01:22 +08:00
int report_unmatched_refs(struct ref **sought, int nr_sought)
{
	int i, ret = 0;

	for (i = 0; i < nr_sought; i++) {
2017-02-23 00:05:57 +08:00
		if (!sought[i])
2017-02-23 00:01:22 +08:00
			continue;
2017-02-23 00:05:57 +08:00
		switch (sought[i]->match_status) {
		case REF_MATCHED:
			continue;
		case REF_NOT_MATCHED:
			error(_("no such remote ref %s"), sought[i]->name);
			break;
		case REF_UNADVERTISED_NOT_ALLOWED:
			error(_("Server does not allow request for unadvertised object %s"),
			      sought[i]->name);
			break;
		}
2017-02-23 00:01:22 +08:00
		ret = 1;
	}
	return ret;
}