global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. A code unit that does not
define this macro thereby demonstrates that it no longer uses the
variable, and it is kept from acquiring new dependencies on it in future
changes, be it explicit or implicit.
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "builtin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
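The opt-in mechanism can be sketched as follows. This is a minimal, hypothetical model rather than Git's actual headers: `struct repository` is reduced to one field, `repo_describe()` is an invented helper showing the preferred style of passing the repository explicitly, and the guarded declaration at the bottom stands in for what a header would do. A code unit that does not define `USE_THE_REPOSITORY_VARIABLE` before that point cannot name `the_repository` at all, so for direct uses the compiler, rather than the reviewer, enforces the claim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for Git's struct repository. */
struct repository {
	const char *gitdir;
};

/*
 * Preferred style: the repository is passed down explicitly, so this
 * function has no hidden dependency on a global.
 */
static const char *repo_describe(const struct repository *r)
{
	assert(r != NULL);
	return r->gitdir;
}

/* This code unit opts in; the define must precede the "header" part below. */
#define USE_THE_REPOSITORY_VARIABLE

/* ---- simulated header: the global only exists for opted-in units ---- */
#ifdef USE_THE_REPOSITORY_VARIABLE
static struct repository the_repo_storage = { ".git" };
static struct repository *the_repository = &the_repo_storage;
#endif
```

Deleting the `#define` line turns any use of `the_repository` in this unit into a compile error, while calls such as `repo_describe(&some_repo)` keep working.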
2024-06-14 14:50:23 +08:00
#define USE_THE_REPOSITORY_VARIABLE

2023-04-11 15:41:48 +08:00
#include "git-compat-util.h"
2017-06-15 02:07:36 +08:00
#include "config.h"
2023-03-21 14:26:03 +08:00
#include "environment.h"
2023-03-21 14:25:54 +08:00
#include "gettext.h"
2023-02-24 08:09:27 +08:00
#include "hex.h"
2005-07-05 04:26:53 +08:00
#include "refs.h"
#include "pkt-line.h"
2006-09-10 18:20:24 +08:00
#include "sideband.h"
2018-06-29 09:21:51 +08:00
#include "repository.h"
2023-05-16 14:34:06 +08:00
#include "object-store-ll.h"
2023-04-11 11:00:42 +08:00
#include "oid-array.h"
2005-10-14 09:57:40 +08:00
#include "object.h"
2005-10-28 10:48:32 +08:00
#include "commit.h"
2006-10-31 03:08:43 +08:00
#include "diff.h"
#include "revision.h"
2017-12-08 23:58:39 +08:00
#include "list-objects-filter-options.h"
2007-10-20 03:47:59 +08:00
#include "run-command.h"
2013-07-09 04:56:53 +08:00
#include "connect.h"
2011-08-06 04:54:06 +08:00
#include "sigchain.h"
2012-08-04 00:19:16 +08:00
#include "version.h"
upload/receive-pack: allow hiding ref hierarchies
A repository may have refs that are used only for its internal
bookkeeping and that should not be exposed to others who come over
the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref under the hierarchies listed in the values of these
variables is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and reject the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
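The hierarchy match can be sketched as below, under the simplifying assumption that every configured value is a plain ref prefix. A ref is treated as hidden when it equals a listed value or continues it with a `/`, so a value like `refs/hidden` covers `refs/hidden/22` without also catching `refs/hiddenfoo`. The function name is hypothetical, and Git's real matcher accepts additional syntax (for example negated `!` entries) that this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if refname equals, or lies beneath, one of the hidden
 * hierarchies; 0 otherwise. A prefix only matches at a '/' boundary.
 */
static int ref_is_hidden_sketch(const char *refname,
				const char *const *hidden, size_t nr)
{
	assert(refname != NULL);
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(hidden[i]);

		if (strncmp(refname, hidden[i], len) != 0)
			continue;
		/* Exact match, or the next character starts a sub-hierarchy. */
		if (refname[len] == '\0' || refname[len] == '/')
			return 1;
	}
	return 0;
}
```

Refs for which this returns 1 would simply be skipped when building the advertisement, which is what makes them invisible to "ls-remote", "fetch" and "push".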
2013-01-19 08:08:30 +08:00
#include "string-list.h"
2020-07-29 04:23:39 +08:00
#include "strvec.h"
2023-04-11 11:00:38 +08:00
#include "trace2.h"
2017-10-17 01:55:26 +08:00
#include "protocol.h"
2018-03-15 02:31:41 +08:00
#include "upload-pack.h"
2018-08-21 02:24:34 +08:00
#include "commit-graph.h"
2018-07-21 00:33:13 +08:00
#include "commit-reach.h"
2020-05-01 03:48:50 +08:00
#include "shallow.h"
2023-03-21 14:26:07 +08:00
#include "write-or-die.h"
2023-10-18 05:12:47 +08:00
#include "json-writer.h"
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two" will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off complaining loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
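The duplicate check can be illustrated with a small fixed-capacity hash set of ref names. This is a hypothetical stand-in for Git's `strmap` (whose real API is not reproduced here); what it demonstrates is the property the change relies on: inserting a name that is already present is detected immediately, so repeating a "want-ref" line cannot make the structure grow and can instead be reported as a protocol error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WANT_SET_CAP 64 /* hypothetical fixed capacity; must be a power of two */

struct want_set {
	const char *names[WANT_SET_CAP]; /* open addressing; NULL = empty slot */
	size_t nr;
};

/* FNV-1a string hash. */
static uint32_t hash_name(const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/*
 * Record a "want-ref" name. Returns 0 if it was added, -1 if it was
 * already present (the protocol error case), and -2 if the set is full.
 * The caller keeps ownership of the string.
 */
static int want_set_add(struct want_set *set, const char *name)
{
	size_t i;

	assert(name != NULL);
	i = hash_name(name) & (WANT_SET_CAP - 1);
	for (size_t probes = 0; probes < WANT_SET_CAP; probes++) {
		if (!set->names[i]) {
			set->names[i] = name;
			set->nr++;
			return 0;
		}
		if (!strcmp(set->names[i], name))
			return -1;
		i = (i + 1) & (WANT_SET_CAP - 1); /* linear probing */
	}
	return -2;
}
```

A real map would grow its table instead of failing when full; the fixed capacity only keeps the sketch short.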
2024-02-29 06:38:40 +08:00
#include "strmap.h"
2005-07-05 04:26:53 +08:00

2014-03-25 21:23:26 +08:00
/* Remember to update object flag allocation in object.h */
2006-07-06 12:28:20 +08:00
#define THEY_HAVE (1u << 11)
#define OUR_REF (1u << 12)
#define WANTED (1u << 13)
#define COMMON_KNOWN (1u << 14)

2006-10-31 03:09:53 +08:00
#define SHALLOW (1u << 16)
#define NOT_SHALLOW (1u << 17)
#define CLIENT_SHALLOW (1u << 18)
2013-01-29 13:49:57 +08:00
#define HIDDEN_REF (1u << 19)
2006-10-31 03:09:53 +08:00

upload-pack: clear flags before each v2 request
Suppose a server has the following commit graph:
  A   B
   \ /
    O
We create a client by cloning A from the server with depth 1, and add
many commits to it (so that future fetches span multiple requests due to
lengthy negotiation). If it then fetches B using protocol v2 and the
fetch spans multiple requests, the resulting packfile does not contain
O even though the client did report that A is shallow.
This is because upload_pack_v2() can be called multiple times while
processing the same session. During the 2nd and all subsequent
invocations, some object flags remain from the previous invocations. In
particular, CLIENT_SHALLOW remains, preventing process_shallow() from
adding client-reported shallows to the "shallows" array, and hence
pack-objects not knowing about these client-reported shallows.
Therefore, teach upload_pack_v2() to clear object flags at the start of
each invocation. This has some other results:
- THEY_HAVE gates addition of objects to have_obj in process_haves().
Previously in upload_pack_v2(), have_obj needed to be static because
once an object is added to have_obj, it is never readded and thus we
needed to retain the contents of have_obj between invocations. Now
that flags are cleared, this is no longer necessary. This patch does
not change the behavior of ok_to_give_up() (THEY_HAVE is still set on
each "have") and got_oid() (used only in non-v2); THEY_HAVE is not
used in any other function.
- WANTED gates addition of objects to want_obj in parse_want() and
parse_want_ref(). It is also used in receive_needs(), but that is
only used in non-v2. For the same reasons as THEY_HAVE, want_obj no
longer needs to be static in upload_pack_v2().
- CLIENT_SHALLOW is changed as discussed above.
Clearing of the other 5 flags does not affect functionality in v2. (Note
that in non-v2, upload_pack() is only called once per process, so each
invocation starts with blank flags anyway.)
- OUR_REF is only used in non-v2.
- COMMON_KNOWN is only used as a scratch flag in ok_to_give_up().
- SHALLOW is passed to invocations in deepen() and
deepen_by_rev_list(), but upload-pack doesn't use it.
- NOT_SHALLOW is used by send_shallow() and send_unshallow(), but
invocations of those functions are always preceded by code that sets
NOT_SHALLOW on the appropriate objects.
- HIDDEN_REF is only used in non-v2.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
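Clearing reduces to masking every bit in `ALL_FLAGS` out of each object's flag word while leaving bits owned by other code alone. The sketch below copies the bit values from this file; `struct obj_sketch` and `clear_upload_pack_flags()` are hypothetical, and the loop over a plain array simplifies walking all objects that Git has parsed.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values as defined in upload-pack.c. */
#define THEY_HAVE      (1u << 11)
#define OUR_REF        (1u << 12)
#define WANTED         (1u << 13)
#define COMMON_KNOWN   (1u << 14)
#define SHALLOW        (1u << 16)
#define NOT_SHALLOW    (1u << 17)
#define CLIENT_SHALLOW (1u << 18)
#define HIDDEN_REF     (1u << 19)
#define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
		   NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)

/* Hypothetical minimal object: only the flag word matters here. */
struct obj_sketch {
	unsigned flags;
};

/* Reset upload-pack's per-request bits before serving a new v2 request. */
static void clear_upload_pack_flags(struct obj_sketch *objs, size_t nr)
{
	assert(objs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		objs[i].flags &= ~ALL_FLAGS;
}
```

With CLIENT_SHALLOW gone after the reset, a later request's shallow lines are processed afresh instead of being skipped as already seen.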
2018-10-19 04:43:29 +08:00
#define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
		NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)

2020-06-11 20:05:12 +08:00
/* Enum for allowed unadvertised object request (UOR) */
enum allow_uor {
	/* Allow specifying sha1 if it is a ref tip. */
	ALLOW_TIP_SHA1 = 0x01,
	/* Allow request of a sha1 if it is reachable from a ref (possibly hidden ref). */
	ALLOW_REACHABLE_SHA1 = 0x02,
	/* Allow request of any sha1. Implies ALLOW_TIP_SHA1 and ALLOW_REACHABLE_SHA1. */
	ALLOW_ANY_SHA1 = 0x07
};
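The "implies" in the `ALLOW_ANY_SHA1` comment follows from the bit values: `0x07` contains both `0x01` and `0x02`, so a bitwise test for either narrower permission also succeeds under the broadest setting. A small illustration, with `uor_allows()` as a hypothetical helper rather than a function from this file:

```c
#include <assert.h>

enum allow_uor {
	ALLOW_TIP_SHA1 = 0x01,
	ALLOW_REACHABLE_SHA1 = 0x02,
	ALLOW_ANY_SHA1 = 0x07
};

/* Hypothetical helper: does the configured policy cover the needed one? */
static int uor_allows(enum allow_uor configured, enum allow_uor needed)
{
	assert(needed != 0); /* an empty requirement would trivially pass */
	/* Every bit required by `needed` must be present in `configured`. */
	return (configured & needed) == needed;
}
```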
2005-10-20 05:27:01 +08:00

2020-06-05 01:54:39 +08:00
/*
 * Please annotate, and if possible group together, fields used only
 * for protocol v0 or only for protocol v2.
 */
2020-05-15 18:04:44 +08:00
struct upload_pack_data {
2020-06-05 01:54:39 +08:00
	struct string_list symref; /* v0 only */
2020-05-15 18:04:44 +08:00
	struct object_array want_obj;
	struct object_array have_obj;
2024-02-29 06:38:40 +08:00
	struct strmap wanted_refs; /* v2 only */
2023-07-11 05:12:33 +08:00
	struct strvec hidden_refs;
2020-05-15 18:04:44 +08:00

	struct object_array shallows;
2024-02-29 06:37:44 +08:00
	struct oidset deepen_not;
2020-06-11 20:05:10 +08:00
	struct object_array extra_edge_obj;
2020-05-15 18:04:44 +08:00
	int depth;
	timestamp_t deepen_since;
	int deepen_rev_list;
	int deepen_relative;
2020-06-05 01:54:46 +08:00
	int keepalive;
2020-06-11 20:05:09 +08:00
	int shallow_nr;
2020-06-11 20:05:17 +08:00
	timestamp_t oldest_have;
2020-05-15 18:04:44 +08:00

2020-06-05 01:54:40 +08:00
	unsigned int timeout; /* v0 only */
2020-06-05 01:54:44 +08:00
	enum {
		NO_MULTI_ACK = 0,
		MULTI_ACK = 1,
		MULTI_ACK_DETAILED = 2
	} multi_ack; /* v0 only */
2020-06-05 01:54:40 +08:00

2020-06-05 01:54:41 +08:00
	/* 0 for no sideband, otherwise DEFAULT_PACKET_MAX or LARGE_PACKET_MAX */
	int use_sideband;

2020-06-11 04:57:23 +08:00
	struct string_list uri_protocols;
2020-06-11 20:05:12 +08:00
	enum allow_uor allow_uor;
2020-06-11 20:05:11 +08:00

2020-05-15 18:04:44 +08:00
	struct list_objects_filter_options filter_options;
upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which cannot be, and which represent a
perceived performance degradation (as well as an increased load factor
on the server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
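The lookup order described above can be sketched as a tri-state setting per filter kind that falls back to a global default. The names `filter_policy_sketch`, `FILTER_UNSET` and `filter_allowed()` are hypothetical, and the sketch does not parse any configuration; it only encodes the precedence rule: an explicit 'uploadpackfilter.<kind>.allow' wins, otherwise 'uploadpackfilter.allow' applies, and that in turn defaults to true.

```c
#include <assert.h>

/* Hypothetical subset of filter kinds; Git supports more. */
enum filter_kind { KIND_BLOB_NONE, KIND_BLOB_LIMIT, KIND_TREE, KIND_NR };

/* Tri-state: a setting may be absent from the configuration. */
enum tristate { FILTER_UNSET = -1, FILTER_DENY = 0, FILTER_ALLOW = 1 };

struct filter_policy_sketch {
	enum tristate global;            /* uploadpackfilter.allow */
	enum tristate per_kind[KIND_NR]; /* uploadpackfilter.<kind>.allow */
};

static int filter_allowed(const struct filter_policy_sketch *p,
			  enum filter_kind kind)
{
	assert(kind < KIND_NR);
	/* An explicit per-kind value takes precedence. */
	if (p->per_kind[kind] != FILTER_UNSET)
		return p->per_kind[kind] == FILTER_ALLOW;
	/* Otherwise fall back to the global value, which defaults to "allow". */
	if (p->global != FILTER_UNSET)
		return p->global == FILTER_ALLOW;
	return 1;
}
```

For example, with 'uploadpackfilter.allow' set to false and 'uploadpackfilter.blobNone.allow' set to true, only the blob:none filter passes this check.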
2020-08-04 02:00:10 +08:00
	struct string_list allowed_filters;
2020-05-15 18:04:44 +08:00

	struct packet_writer writer;

2024-05-27 19:46:39 +08:00
	char *pack_objects_hook;
2020-06-05 01:54:50 +08:00

2020-06-05 01:54:39 +08:00
	unsigned stateless_rpc : 1; /* v0 only */
2020-06-05 01:54:40 +08:00
	unsigned no_done : 1; /* v0 only */
	unsigned daemon_mode : 1; /* v0 only */
2020-06-05 01:54:42 +08:00
	unsigned filter_capability_requested : 1; /* v0 only */
2020-05-15 18:04:44 +08:00

	unsigned use_thin_pack : 1;
	unsigned use_ofs_delta : 1;
	unsigned no_progress : 1;
	unsigned use_include_tag : 1;
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independently of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application: push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
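The effect of "wait-for-done" on the server's decision can be sketched as a small pure function. This is a deliberately simplified, hypothetical model, not Git's state machine: `ready` stands for the server having concluded on its own that negotiation may end, and `done` for the client having sent "done".

```c
#include <assert.h>

enum next_step { SEND_ACKS_ONLY, SEND_PACKFILE };

/*
 * Decide what to do at the end of a negotiation round. Without
 * "wait-for-done" the server may unilaterally move on to the packfile
 * once it is ready; with it, only an explicit "done" does that.
 */
static enum next_step after_round(int done, int wait_for_done, int ready)
{
	if (done)
		return SEND_PACKFILE;
	if (wait_for_done)
		return SEND_ACKS_ONLY;
	return ready ? SEND_PACKFILE : SEND_ACKS_ONLY;
}
```

Because a client using this mode never sends "done", the server in this model never reaches the packfile step, which is what allows negotiation to be run on its own.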
2021-05-05 05:16:01 +08:00
	unsigned wait_for_done : 1;
2020-06-05 01:54:47 +08:00
	unsigned allow_filter : 1;
2020-08-04 02:00:10 +08:00
	unsigned allow_filter_fallback : 1;
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level
filtering.
In the past, '--filter=tree:<n>' was one such filter that did not
have bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While it would be sufficient to simply write
$ git config uploadpackfilter.tree.allow true
(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
$ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:
$ git config uploadpackfilter.tree.allow true
$ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
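The combined effect of the two settings can be sketched as follows, assuming 'uploadpackfilter.tree.allow' has already been resolved to a boolean and that an unset 'maxDepth' means "no limit", modelled here (hypothetically) as `ULONG_MAX`. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: an unset uploadpackfilter.tree.maxDepth imposes no limit. */
#define TREE_DEPTH_UNLIMITED ULONG_MAX

/*
 * Is "--filter=tree:<depth>" acceptable? `tree_allowed` is the resolved
 * uploadpackfilter.tree.allow value; `max_depth` is
 * uploadpackfilter.tree.maxDepth, or TREE_DEPTH_UNLIMITED when unset.
 */
static int tree_filter_allowed(int tree_allowed, unsigned long max_depth,
			       unsigned long depth)
{
	if (!tree_allowed)
		return 0;
	return depth <= max_depth;
}
```

With 'tree.allow' true and 'tree.maxDepth' 0, only a depth of 0 passes, matching the configuration shown in the message.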
2020-08-04 02:00:17 +08:00
	unsigned long tree_filter_max_depth;
2020-06-05 01:54:39 +08:00

	unsigned done : 1; /* v2 only */
2020-06-05 01:54:48 +08:00
	unsigned allow_ref_in_want : 1; /* v2 only */
2020-06-05 01:54:49 +08:00
	unsigned allow_sideband_all : 1; /* v2 only */
upload-pack: drop separate v2 "haves" array
When upload-pack sees a "have" line in the v0 protocol, it immediately
calls got_oid() with its argument and potentially produces an ACK
response. In the v2 protocol, we simply record the argument in an
oid_array, and only later process all of the "have" objects by calling
the equivalent of got_oid() on the contents of the array.
This makes some sense, as v2 is a pure request/response protocol, as
opposed to v0's asynchronous negotiation phase. But there's a downside:
a client can send us an infinite number of garbage "have" lines, which
we'll happily slurp into the array, consuming memory. Whereas in v0,
they are limited by the number of objects in the repository (because
got_oid() only records objects we have ourselves, and we avoid
duplicates by setting a flag on the object struct).
We can make v2 behave more like v0 by also calling got_oid() directly
when v2 parses a "have" line. Calling it early like this is OK because
got_oid() itself does not interact with the client; it only confirms
that we have the object and sets a few flags. Note that unlike v0, v2
does not ever (before or after this patch) check the return code of
got_oid(), which lets the caller know whether we have the object. But
again, that makes sense; v0 is using it to asynchronously tell the
client to stop sending. In v2's synchronous protocol, we just discard
those entries (and decide how to ACK at the end of each round).
There is one slight tweak we need, though. In v2's state machine, we
reach the SEND_ACKS state if the other side sent us any "have" lines,
whether they were useful or not. Right now we do that by checking
whether the "have" array had any entries, but if we record only the
useful ones, that doesn't work. Instead, we can add a simple boolean
that tells us whether we saw any have line (even if it was useless).
This lets us drop the "haves" array entirely, as we're now placing
objects directly into the "have_obj" object array (which is where
got_oid() put them in the long run anyway). And as a bonus, we can drop
the secondary "common" array used in process_haves_and_send_acks(). It
was essentially a copy of "haves" minus the objects we do not have. But
now that we are using "have_obj" directly, we know everything in it is
useful. So in addition to protecting ourselves against malicious input,
we should slightly lower our memory usage for normal inputs.
Note that there is one user-visible effect. The trace2 output records
the number of "haves". Previously this was the total number of "have"
lines we saw, but now is the number of useful ones. We could retain the
original meaning by keeping a separate counter, but it doesn't seem
worth the effort; this trace info is for debugging and metrics, and
arguably the count of common oids is at least as useful as the total
count.
Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
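A simplified model of the new "have" handling: each line is resolved against the objects we already have; unknown ids are dropped, known ones are recorded at most once (guarded by the THEY_HAVE flag), and a boolean remembers that at least one "have" line arrived. The types, the `MAX_HAVES` bound and `process_have_sketch()` are hypothetical; only the THEY_HAVE bit value is taken from this file.

```c
#include <assert.h>
#include <stddef.h>

#define THEY_HAVE (1u << 11) /* same bit as in upload-pack.c */
#define MAX_HAVES 16         /* hypothetical bound for the sketch */

struct obj_sketch {
	unsigned flags;
};

struct haves_sketch {
	struct obj_sketch *have_obj[MAX_HAVES]; /* only objects we have */
	size_t nr;
	unsigned seen_haves : 1; /* any "have" line at all, useful or not */
};

/*
 * Handle one "have" line. `o` is the object it names, or NULL when the
 * id is not in our repository. Unknown or repeated ids add no entries.
 */
static void process_have_sketch(struct haves_sketch *h, struct obj_sketch *o)
{
	h->seen_haves = 1;
	if (!o || (o->flags & THEY_HAVE))
		return; /* unknown or duplicate: nothing to record */
	assert(h->nr < MAX_HAVES);
	o->flags |= THEY_HAVE;
	h->have_obj[h->nr++] = o;
}
```

The decision to move to sending ACKs would then look at `seen_haves`, not at `nr`, since a round made up entirely of useless "have" lines still needs an acknowledgement.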
2024-02-29 06:37:13 +08:00
	unsigned seen_haves : 1; /* v2 only */
2024-02-29 06:50:50 +08:00
	unsigned allow_packfile_uris : 1; /* v2 only */
2020-11-12 07:29:27 +08:00
	unsigned advertise_sid : 1;
upload-pack: advertise capabilities when cloning empty repos
When cloning an empty repository, protocol versions 0 and 1 currently
offer nothing but the header and flush packets for the /info/refs
endpoint. This means that no capabilities are provided, so the client
side doesn't know what capabilities are present.
However, this does pose a problem when working with SHA-256
repositories, since we use the capabilities to know the remote side's
object format (hash algorithm). As of 8b214c2e9d ("clone: propagate
object-format when cloning from void", 2023-04-05), this has been fixed
for protocol v2, since there we always read the hash algorithm from the
remote.
Fortunately, the push version of the protocol already offers a clue
for how to solve this. When the /info/refs endpoint is accessed for a
push and the remote is empty, we include a dummy "capabilities^{}" ref
pointing to the all-zeros object ID. The protocol documentation already
indicates this should _always_ be sent, even for fetches and clones, so
let's just do that, which means we'll properly announce the hash
algorithm as part of the capabilities. This just works with the
existing code because we share the same ref code for fetches and clones,
and libgit2, JGit, and dulwich do as well.
There is one minor issue to fix, though. If we called send_ref with
namespaces, we would return NULL with the capabilities entry, which
would cause a crash. Instead, let's refactor out a function to print
just the ref itself without stripping the namespace and use it for our
special capabilities entry.
Add several sets of tests for HTTP as well as for local clones. The
behavior can be slightly different for HTTP versus a local or SSH clone
because of the stateless-rpc functionality, so it's worth testing both.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
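The dummy entry can be sketched as a single pkt-line: a 4-digit hexadecimal length (which counts the 4 length bytes themselves), the all-zeros object ID, the name `capabilities^{}`, a NUL byte, and then the capability list. The function name is hypothetical, the trailing LF is an assumption modelled on ordinary ref advertisement lines, and `object-format=sha1` is used as the example capability because it is the one this change is concerned with.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the pkt-line advertising capabilities for an empty repository:
 *   <zero-oid> SP "capabilities^{}" NUL <capabilities> LF
 * prefixed by a 4-hex-digit length that includes the prefix itself.
 * Returns the total number of bytes written, or -1 if `buf` is too small.
 */
static int format_capabilities_pkt(char *buf, size_t size,
				   const char *zero_oid, const char *caps)
{
	char payload[512];
	char len[5];
	int n;

	/* 40 hex digits for SHA-1, 64 for SHA-256. */
	assert(strlen(zero_oid) == 40 || strlen(zero_oid) == 64);
	n = snprintf(payload, sizeof(payload), "%s capabilities^{}%c%s\n",
		     zero_oid, '\0', caps);
	if (n < 0 || (size_t)n >= sizeof(payload) || (size_t)n + 4 > size)
		return -1;
	snprintf(len, sizeof(len), "%04x", (unsigned)(n + 4));
	memcpy(buf, len, 4);
	/* The payload contains a NUL byte, so copy it by length, not as a string. */
	memcpy(buf + 4, payload, (size_t)n);
	return n + 4;
}
```

With a SHA-1 zero ID and `object-format=sha1`, the payload is 76 bytes, so the line starts with the length `0050` (80 in hexadecimal).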
2023-05-18 03:24:43 +08:00
	unsigned sent_capabilities : 1;
2020-05-15 18:04:44 +08:00
};

static void upload_pack_data_init(struct upload_pack_data *data)
{
2020-05-15 18:04:49 +08:00
	struct string_list symref = STRING_LIST_INIT_DUP;
2024-02-29 06:38:40 +08:00
	struct strmap wanted_refs = STRMAP_INIT;
2023-07-11 05:12:33 +08:00
	struct strvec hidden_refs = STRVEC_INIT;
2020-05-15 18:04:44 +08:00
	struct object_array want_obj = OBJECT_ARRAY_INIT;
	struct object_array have_obj = OBJECT_ARRAY_INIT;
	struct object_array shallows = OBJECT_ARRAY_INIT;
2024-02-29 06:37:44 +08:00
	struct oidset deepen_not = OIDSET_INIT;
2020-06-11 04:57:23 +08:00
	struct string_list uri_protocols = STRING_LIST_INIT_DUP;
2020-06-11 20:05:10 +08:00
	struct object_array extra_edge_obj = OBJECT_ARRAY_INIT;
2020-08-04 02:00:10 +08:00
	struct string_list allowed_filters = STRING_LIST_INIT_DUP;
2020-05-15 18:04:44 +08:00

	memset(data, 0, sizeof(*data));
2020-05-15 18:04:49 +08:00
	data->symref = symref;
2020-05-15 18:04:44 +08:00
	data->wanted_refs = wanted_refs;
2022-11-17 13:46:43 +08:00
	data->hidden_refs = hidden_refs;
2020-05-15 18:04:44 +08:00
	data->want_obj = want_obj;
	data->have_obj = have_obj;
	data->shallows = shallows;
	data->deepen_not = deepen_not;
2020-06-11 04:57:23 +08:00
	data->uri_protocols = uri_protocols;
2020-06-11 20:05:10 +08:00
	data->extra_edge_obj = extra_edge_obj;
upload-pack.c: allow banning certain object filter(s)
2020-08-04 02:00:10 +08:00
	data->allowed_filters = allowed_filters;
	data->allow_filter_fallback = 1;
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level
filtering.
In the past, the '--filter=tree:<n>' was one such filter that does not
have bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While it would be sufficient to simply write
$ git config uploadpackfilter.tree.allow true
(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
$ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:
$ git config uploadpackfilter.tree.allow true
$ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 02:00:17 +08:00
	data->tree_filter_max_depth = ULONG_MAX;
2020-05-15 18:04:44 +08:00
	packet_writer_init(&data->writer, 1);
list-objects-filter: add and use initializers
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
strings, 2022-09-08), we noted that the filter_spec string_list was
inconsistent in how it handled memory ownership of strings stored in the
list. The fix there was a bit of a band-aid to set the "strdup_strings"
variable right before adding anything.
That works OK, and it lets the users of the API continue to
zero-initialize the struct. But it makes the code a bit hard to follow
and accident-prone, as any other spots appending the filter_spec need to
think about whether to set the strdup_strings value, too (there's one
such spot in partial_clone_get_default_filter_spec(), which is probably
a possible memory leak).
So let's do that full cleanup now. We'll introduce a
LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as
appropriate (though it is for the "_options" struct, this matches the
corresponding list_objects_filter_release() function).
This is harder than it seems! Many other structs, like
git_transport_data, embed the filter struct. So they need to initialize
it themselves even if the rest of the enclosing struct is OK with
zero-initialization. I found all of the relevant spots by grepping
manually for declarations of list_objects_filter_options. And then doing
so recursively for structs which embed it, and ones which embed those,
and so on.
I'm pretty sure I got everything, but there's no change that would alert
the compiler if any topics in flight added new declarations. To catch
this case, we now double-check in the parsing function that things were
initialized as expected and BUG() if appropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-11 13:03:07 +08:00
	list_objects_filter_init(&data->filter_options);
2020-06-05 01:54:46 +08:00

	data->keepalive = 5;
2020-11-12 07:29:27 +08:00
	data->advertise_sid = 0;
2020-05-15 18:04:44 +08:00
}
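The allow/deny rules from the two commit messages above can be summarized as a lookup. An explicit `uploadpackfilter.<kind>.allow` wins. Otherwise `uploadpackfilter.allow` applies, matching the `allow_filter_fallback = 1` default set above. `tree:<n>` is additionally capped by `uploadpackfilter.tree.maxDepth`, which defaults to `ULONG_MAX` as in `tree_filter_max_depth`. What follows is a minimal, self-contained sketch under those assumptions. `struct filter_rule`, `struct filter_policy` and `filter_is_allowed()` are hypothetical names and do not reflect how upload-pack actually stores or parses this configuration. The kind names follow the commit message's examples.

```c
#include <assert.h>
#include <limits.h>  /* ULONG_MAX: the documented default for maxDepth */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-kind rule, as if read from uploadpackfilter.<kind>.allow. */
struct filter_rule {
	const char *kind;
	bool allow;
};

/* Hypothetical policy mirroring the documented configuration semantics. */
struct filter_policy {
	const struct filter_rule *rules;
	size_t nr_rules;
	bool allow_fallback;          /* uploadpackfilter.allow (default true) */
	unsigned long tree_max_depth; /* uploadpackfilter.tree.maxDepth */
};

static bool filter_is_allowed(const struct filter_policy *p,
			      const char *kind, unsigned long tree_depth)
{
	bool allowed = p->allow_fallback; /* used when no per-kind rule exists */
	size_t i;

	assert(kind != NULL);
	for (i = 0; i < p->nr_rules; i++) {
		if (!strcmp(p->rules[i].kind, kind)) {
			allowed = p->rules[i].allow; /* explicit setting wins */
			break;
		}
	}
	/* The depth cap only applies to an otherwise-allowed tree:<n> filter. */
	if (allowed && !strcmp(kind, "tree") && tree_depth > p->tree_max_depth)
		allowed = false;
	return allowed;
}
```

With `tree` allowed and `tree_max_depth` set to 0, `tree:0` passes and `tree:1` is rejected. This mirrors the `maxDepth 0` configuration shown in the commit message.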

static void upload_pack_data_clear(struct upload_pack_data *data)
{
2020-05-15 18:04:49 +08:00
	string_list_clear(&data->symref, 1);
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two" will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off complaining loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:38:40 +08:00
	strmap_clear(&data->wanted_refs, 1);
2023-07-11 05:12:33 +08:00
	strvec_clear(&data->hidden_refs);
2020-05-15 18:04:44 +08:00
	object_array_clear(&data->want_obj);
	object_array_clear(&data->have_obj);
	object_array_clear(&data->shallows);
2024-02-29 06:37:44 +08:00
	oidset_clear(&data->deepen_not);
2020-06-11 20:05:10 +08:00
	object_array_clear(&data->extra_edge_obj);
2020-05-15 18:04:44 +08:00
	list_objects_filter_release(&data->filter_options);
upload-pack.c: don't free allowed_filters util pointers
To keep track of which object filters are allowed or not, 'git
upload-pack' stores the name of each filter in a string_list, and sets
its ->util pointer to be either 0 or 1, indicating whether it is banned
or allowed.
Later on, we attempt to clear that list, but we incorrectly ask for the
util pointers to be free()'d, too. This behavior (introduced back in
6dd3456a8c (upload-pack.c: allow banning certain object filter(s),
2020-08-03)) leads to an invalid free, and causes us to crash.
In order to trigger this, one needs to (a) fetch from a server that has
at least one object filter allowed, and (b) issue a fetch that requests
only a subset of the allowed filters (i.e., we cannot ask for a banned
filter, since that causes us to die() before we hit the bogus
string_list_clear()).
In that case, whatever banned filters exist will cause a noop free()
(since those ->util pointers are set to 0), but the first allowed filter
we try to free will crash us.
We never noticed this in the tests because we didn't have an example of
setting 'uploadPackFilter' configuration variables and then following up
with a valid fetch. The first new 'git clone' prevents further
regression here. For good measure on top, add a test which checks the
same behavior at a tree depth greater than 0.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-04 02:55:18 +08:00
	string_list_clear(&data->allowed_filters, 0);
2020-06-05 01:54:50 +08:00

	free((char *)data->pack_objects_hook);
2020-05-15 18:04:44 +08:00
}
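The crash fixed above comes from storing a boolean in a `void *` slot and then letting the list `free()` that slot. The standalone model below shows the safe pattern: the flag is encoded as a pointer value, and only the strings are released. It uses hypothetical `flag_item`/`flag_list` types instead of Git's `string_list`. In Git, the equivalent of `flag_list_clear()` is calling `string_list_clear()` with a zero `free_util` argument, as the `allowed_filters` line above does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string_list item: a name plus an opaque util slot. */
struct flag_item {
	char *name; /* heap-allocated and owned by the list */
	void *util; /* NOT a heap pointer: (void *)0 = banned, (void *)1 = allowed */
};

struct flag_list {
	struct flag_item items[8];
	size_t nr;
};

static void flag_list_add(struct flag_list *l, const char *name, int allowed)
{
	struct flag_item *it;
	size_t len = strlen(name) + 1;

	assert(l->nr < sizeof(l->items) / sizeof(l->items[0]));
	it = &l->items[l->nr++];
	it->name = malloc(len);
	if (!it->name)
		abort();
	memcpy(it->name, name, len);
	it->util = (void *)(intptr_t)allowed; /* encode the flag in the pointer */
}

/* Returns 1 (allowed), 0 (banned), or -1 if the name is unknown. */
static int flag_list_get(const struct flag_list *l, const char *name)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		if (!strcmp(l->items[i].name, name))
			return (int)(intptr_t)l->items[i].util;
	return -1;
}

/*
 * Release only the names. Also calling free(l->items[i].util) would pass
 * (void *)1 to free(), which is the invalid free the commit describes.
 */
static void flag_list_clear(struct flag_list *l)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		free(l->items[i].name);
	l->nr = 0;
}
```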

2020-06-05 01:54:40 +08:00
static void reset_timeout(unsigned int timeout)
2005-10-20 05:27:01 +08:00
{
	alarm(timeout);
}
2005-07-05 06:29:17 +08:00

2020-06-05 01:54:41 +08:00
static void send_client_data(int fd, const char *data, ssize_t sz,
			     int use_sideband)
2006-06-21 15:30:21 +08:00
{
2016-06-14 22:49:16 +08:00
	if (use_sideband) {
		send_sideband(1, fd, data, sz, use_sideband);
2016-06-14 22:49:17 +08:00
		return;
2016-06-14 22:49:16 +08:00
	}
2006-09-10 18:20:24 +08:00
	if (fd == 3)
		/* emergency quit */
		fd = 2;
	if (fd == 2) {
2007-01-08 23:58:23 +08:00
		/* XXX: are we happy to lose stuff here? */
2006-09-10 18:20:24 +08:00
		xwrite(fd, data, sz);
2016-06-14 22:49:17 +08:00
		return;
2006-06-21 15:30:21 +08:00
	}
2013-02-21 04:01:56 +08:00
	write_or_die(fd, data, sz);
2006-06-21 15:30:21 +08:00
}
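When sideband is in use, `send_client_data()` delegates to `send_sideband()`. This sketch assumes that `use_sideband` carries the negotiated maximum packet size. The sketch is a simplified model of the framing, based on the pkt-line rules in Git's protocol documentation. Each packet starts with a 4-digit lowercase hex length that counts its own 4 bytes. Next comes one band byte: 1 for pack data, 2 for progress, 3 for fatal errors. At most `max_pkt - 5` bytes of payload follow. `frame_sideband()` is a hypothetical helper that fills a memory buffer instead of writing to a file descriptor. It is not Git's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of sideband framing: split `sz` bytes of `data` into
 * pkt-lines of at most `max_pkt` bytes. Each pkt-line is a 4-hex-digit
 * length (counting the 4 header bytes), one band byte, then the payload.
 * Returns the number of bytes written to `out`, or 0 if `out` is too small.
 */
static size_t frame_sideband(char *out, size_t out_size, int band,
			     const char *data, size_t sz, size_t max_pkt)
{
	size_t used = 0;

	assert(max_pkt > 5 && max_pkt <= 0xffff);
	while (sz) {
		size_t n = sz < max_pkt - 5 ? sz : max_pkt - 5;
		char hdr[5];

		if (out_size - used < n + 5)
			return 0;
		snprintf(hdr, sizeof(hdr), "%04zx", n + 5); /* length incl. header */
		memcpy(out + used, hdr, 4);
		out[used + 4] = (char)band;
		memcpy(out + used + 5, data, n);
		used += n + 5;
		data += n;
		sz -= n;
	}
	return used;
}
```

With a 10-byte packet limit, the 11-byte payload `hello world` becomes three packets carrying 5, 5 and 1 data bytes.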

2014-03-11 20:59:46 +08:00
static int write_one_shallow(const struct commit_graft *graft, void *cb_data)
{
	FILE *fp = cb_data;
	if (graft->nr_parent == -1)
2015-03-14 07:39:34 +08:00
		fprintf(fp, "--shallow %s\n", oid_to_hex(&graft->oid));
2014-03-11 20:59:46 +08:00
	return 0;
}
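`write_one_shallow()` contributes one `--shallow <oid>` line per graft with `nr_parent == -1`. These lines go into the text that `create_pack_file()`, further down, writes to the stdin of `pack-objects --revs`. The stream contains shallow lines first, then the wanted object IDs, a `--not` line, the `have` and extra-edge IDs, and a final empty line. The standalone sketch below rebuilds that stream under a few assumptions. Plain hex strings stand in for `struct object_id`, and a fixed buffer replaces the `FILE *` pipe. `build_revs_input()` and `append_line()` are hypothetical helpers. The extra-edge IDs are omitted for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "<prefix><text>\n" to out; returns -1 if it would not fit. */
static int append_line(char *out, size_t size, size_t *used,
		       const char *prefix, const char *text)
{
	int n;

	assert(*used <= size);
	n = snprintf(out + *used, size - *used, "%s%s\n", prefix, text);
	if (n < 0 || (size_t)n >= size - *used)
		return -1; /* output would be truncated */
	*used += (size_t)n;
	return 0;
}

/*
 * Hypothetical sketch of the stdin stream for "pack-objects --revs":
 * "--shallow <oid>" lines, wanted IDs, "--not", have IDs, empty line.
 * Returns the number of bytes written (NUL-terminated), or -1 on overflow.
 */
static int build_revs_input(char *out, size_t size,
			    const char *const *shallow, size_t nr_shallow,
			    const char *const *wants, size_t nr_wants,
			    const char *const *haves, size_t nr_haves)
{
	size_t used = 0, i;

	for (i = 0; i < nr_shallow; i++)
		if (append_line(out, size, &used, "--shallow ", shallow[i]))
			return -1;
	for (i = 0; i < nr_wants; i++)
		if (append_line(out, size, &used, "", wants[i]))
			return -1;
	if (append_line(out, size, &used, "--not", ""))
		return -1;
	for (i = 0; i < nr_haves; i++)
		if (append_line(out, size, &used, "", haves[i]))
			return -1;
	if (append_line(out, size, &used, "", "")) /* terminating empty line */
		return -1;
	return (int)used;
}
```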

2020-06-11 04:57:21 +08:00
struct output_state {
2021-12-15 03:46:26 +08:00
	/*
	 * We do writes no bigger than LARGE_PACKET_DATA_MAX - 1, because with
	 * sideband-64k the band designator takes up 1 byte of space. Because
	 * relay_pack_data keeps the last byte to itself, we make the buffer 1
	 * byte bigger than the intended maximum write size.
	 */
	char buffer[(LARGE_PACKET_DATA_MAX - 1) + 1];
2020-06-11 04:57:21 +08:00
	int used;
2020-06-11 04:57:23 +08:00
	unsigned packfile_uris_started : 1;
	unsigned packfile_started : 1;
2020-06-11 04:57:21 +08:00
};

static int relay_pack_data(int pack_objects_out, struct output_state *os,
2020-06-11 04:57:23 +08:00
			   int use_sideband, int write_packfile_line)
2020-06-11 04:57:21 +08:00
{
	/*
	 * We keep the last byte to ourselves
	 * in case we detect broken rev-list, so that we
	 * can leave the stream corrupted. This is
	 * unfortunate -- unpack-objects would happily
	 * accept a valid packdata with trailing garbage,
	 * so appending garbage after we pass all the
	 * pack data is not good enough to signal
	 * breakage to downstream.
	 */
	ssize_t readsz;

	readsz = xread(pack_objects_out, os->buffer + os->used,
		       sizeof(os->buffer) - os->used);
	if (readsz < 0) {
		return readsz;
	}
	os->used += readsz;

2020-06-11 04:57:23 +08:00
	while (!os->packfile_started) {
		char *p;
		if (os->used >= 4 && !memcmp(os->buffer, "PACK", 4)) {
			os->packfile_started = 1;
			if (write_packfile_line) {
				if (os->packfile_uris_started)
					packet_delim(1);
				packet_write_fmt(1, "\1packfile\n");
			}
			break;
		}
		if ((p = memchr(os->buffer, '\n', os->used))) {
			if (!os->packfile_uris_started) {
				os->packfile_uris_started = 1;
				if (!write_packfile_line)
					BUG("packfile_uris requires sideband-all");
				packet_write_fmt(1, "\1packfile-uris\n");
			}
			*p = '\0';
			packet_write_fmt(1, "\1%s\n", os->buffer);

			os->used -= p - os->buffer + 1;
			memmove(os->buffer, p + 1, os->used);
		} else {
			/*
			 * Incomplete line.
			 */
			return readsz;
		}
	}

2020-06-11 04:57:21 +08:00
	if (os->used > 1) {
		send_client_data(1, os->buffer, os->used - 1, use_sideband);
		os->buffer[0] = os->buffer[os->used - 1];
		os->used = 1;
	} else {
		send_client_data(1, os->buffer, os->used, use_sideband);
		os->used = 0;
	}

	return readsz;
}
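The hold-back scheme in `relay_pack_data()` can be modeled without pipes. Each round appends newly read bytes, forwards all but the last buffered byte, and keeps that byte for the next round. An empty read (EOF) flushes the held byte. This is a hypothetical sketch: `struct holdback`, `struct sink` and `relay_chunk()` stand in for `struct output_state`, `send_client_data()` and the real function. The buffer is deliberately tiny rather than `LARGE_PACKET_DATA_MAX` bytes, and the `packfile-uris` line handling before the `PACK` signature is omitted.

```c
#include <assert.h>
#include <string.h>

#define HOLDBACK_BUF 16 /* hypothetical; upload-pack sizes this from LARGE_PACKET_DATA_MAX */

struct holdback {
	char buffer[HOLDBACK_BUF];
	size_t used;
};

/* Hypothetical sink that collects forwarded bytes (stands in for the client). */
struct sink {
	char data[256];
	size_t len;
};

static void sink_write(struct sink *s, const char *p, size_t n)
{
	assert(s->len + n <= sizeof(s->data));
	memcpy(s->data + s->len, p, n);
	s->len += n;
}

/*
 * Append as much of `in` as fits, then forward all but the last buffered
 * byte. An empty `in` models EOF and flushes the held byte. Returns the
 * number of input bytes consumed.
 */
static size_t relay_chunk(struct holdback *hb, struct sink *out,
			  const char *in, size_t in_len)
{
	size_t room = sizeof(hb->buffer) - hb->used;
	size_t n = in_len < room ? in_len : room;

	memcpy(hb->buffer + hb->used, in, n);
	hb->used += n;
	if (hb->used > 1) {
		sink_write(out, hb->buffer, hb->used - 1);
		hb->buffer[0] = hb->buffer[hb->used - 1]; /* keep the last byte */
		hb->used = 1;
	} else {
		sink_write(out, hb->buffer, hb->used);
		hb->used = 0;
	}
	return n;
}
```

Because the final byte is only released at EOF, the sender can still corrupt the stream deliberately if it detects a failure late. As the code comment above explains, appending garbage after a complete pack would not achieve this.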

2020-06-11 04:57:23 +08:00
static void create_pack_file(struct upload_pack_data *pack_data,
			     const struct string_list *uri_protocols)
2005-07-05 06:29:17 +08:00
{
2014-08-20 03:09:35 +08:00
	struct child_process pack_objects = CHILD_PROCESS_INIT;
2021-12-15 03:46:26 +08:00
	struct output_state *output_state = xcalloc(1, sizeof(struct output_state));
2020-06-11 04:57:21 +08:00
	char progress[128];
2006-06-21 15:30:21 +08:00
	char abort_msg[] = "aborting due to possible repository "
		"corruption on the remote side.";
2009-12-11 04:17:11 +08:00
	ssize_t sz;
2016-02-25 20:13:26 +08:00
	int i;
2013-08-16 17:52:05 +08:00
	FILE *pipe_fd;
2005-07-05 07:35:13 +08:00

2020-06-05 01:54:50 +08:00
	if (!pack_data->pack_objects_hook)
upload-pack: provide a hook for running pack-objects
When upload-pack serves a client request, it turns to
pack-objects to do the heavy lifting of creating a
packfile. There's no easy way to intercept the call to
pack-objects, but there are a few good reasons to want to do
so:
1. If you're debugging a client or server issue with
fetching, you may want to store a copy of the generated
packfile.
2. If you're gathering data from real-world fetches for
performance analysis or debugging, storing a copy of
the arguments and stdin lets you replay the pack
generation at your leisure.
3. You may want to insert a caching layer around
pack-objects; it is the most CPU- and memory-intensive
part of serving a fetch, and its output is a pure
function[1] of its input, making it an ideal place to
consolidate identical requests.
This patch adds a simple "hook" interface to intercept calls
to pack-objects. The new test demonstrates how it can be
used for debugging (using it for caching is a
straightforward extension; the tricky part is writing the
actual caching layer).
This hook is unlike the normal hook scripts found in the
"hooks/" directory of a repository. Because we promise that
upload-pack is safe to run in an untrusted repository, we
cannot execute arbitrary code or commands found in the
repository (neither in hooks/, nor in the config). So
instead, this hook is triggered from a config variable that
is explicitly ignored in the per-repo config.
The config variable holds the actual shell command to run as
the hook. Another approach would be to simply treat it as a
boolean: "should I respect the upload-pack hooks in this
repo?", and then run the script from "hooks/" as we usually
do. However, that isn't as flexible; there's no way to run a
hook approved by the site administrator (e.g., in
"/etc/gitconfig") on a repository whose contents are not
trusted. The approach taken by this patch is more
fine-grained, if a little less conventional for git hooks
(it does behave similarly to other configured commands like
diff.external, etc).
[1] Pack-objects isn't _actually_ a pure function. Its
output depends on the exact packing of the object
database, and if multi-threading is used for delta
compression, can even differ racily. But for the
purposes of caching, that's OK; of the many possible
outputs for a given input, it is sufficient only that we
output one of them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-19 06:45:37 +08:00
		pack_objects.git_cmd = 1;
	else {
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, pack_data->pack_objects_hook);
		strvec_push(&pack_objects.args, "git");
upload-pack: provide a hook for running pack-objects
2016-05-19 06:45:37 +08:00
		pack_objects.use_shell = 1;
	}

2020-06-11 20:05:09 +08:00
	if (pack_data->shallow_nr) {
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--shallow-file");
		strvec_push(&pack_objects.args, "");
2009-06-10 07:50:18 +08:00
	}
2020-07-29 04:25:12 +08:00
	strvec_push(&pack_objects.args, "pack-objects");
	strvec_push(&pack_objects.args, "--revs");
2020-06-05 01:54:38 +08:00
	if (pack_data->use_thin_pack)
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--thin");
2005-07-05 07:35:13 +08:00

2020-07-29 04:25:12 +08:00
	strvec_push(&pack_objects.args, "--stdout");
2020-06-11 20:05:09 +08:00
	if (pack_data->shallow_nr)
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--shallow");
2020-06-05 01:54:38 +08:00
	if (!pack_data->no_progress)
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--progress");
2020-06-05 01:54:38 +08:00
	if (pack_data->use_ofs_delta)
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--delta-base-offset");
2020-06-05 01:54:38 +08:00
	if (pack_data->use_include_tag)
2020-07-29 04:25:12 +08:00
		strvec_push(&pack_objects.args, "--include-tag");
2020-05-15 18:04:53 +08:00
	if (pack_data->filter_options.choice) {
2019-06-28 06:54:10 +08:00
		const char *spec =
2020-05-15 18:04:53 +08:00
			expand_list_objects_filter_spec(&pack_data->filter_options);
2021-01-29 00:04:53 +08:00
		strvec_pushf(&pack_objects.args, "--filter=%s", spec);
2017-12-08 23:58:39 +08:00
	}
2020-06-11 04:57:23 +08:00
	if (uri_protocols) {
		for (i = 0; i < uri_protocols->nr; i++)
2020-07-29 04:25:12 +08:00
			strvec_pushf(&pack_objects.args, "--uri-protocol=%s",
2020-06-11 04:57:23 +08:00
					 uri_protocols->items[i].string);
	}
2007-10-20 03:47:59 +08:00

upload-pack: start pack-objects before async rev-list
In a pthread-enabled version of upload-pack, there's a race condition
that can cause a deadlock on the fflush(NULL) we call from run-command.
What happens is this:
1. Upload-pack is informed we are doing a shallow clone.
2. We call start_async() to spawn a thread that will generate rev-list
results to feed to pack-objects. It gets a file descriptor to a
pipe which will eventually hook to pack-objects.
3. The rev-list thread uses fdopen to create a new output stream
around the fd we gave it, called pack_pipe.
4. The thread writes results to pack_pipe. Outside of our control,
libc is doing locking on the stream. We keep writing until the OS
pipe buffer is full, and then we block in write(), still holding
the lock.
5. The main thread now uses start_command to spawn pack-objects.
Before forking, it calls fflush(NULL) to flush every stdio output
buffer. It blocks trying to get the lock on pack_pipe.
And we have a deadlock. The thread will block until somebody starts
reading from the pipe. But nobody will read from the pipe until we
finish flushing to the pipe.
To fix this, we swap the start order: we start the
pack-objects reader first, and then the rev-list writer
after. Thus the problematic fflush(NULL) happens before we
even open the new file descriptor (and even if it didn't,
flushing should no longer block, as the reader at the end of
the pipe is now active).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-07 05:33:33 +08:00
	pack_objects.in = -1;
2007-10-20 03:47:59 +08:00
	pack_objects.out = -1;
	pack_objects.err = -1;
upload-pack: kill pack-objects helper on signal or exit
We spawn an external pack-objects process to actually send objects to
the remote side. If we are killed by a signal during this process, then
pack-objects may continue to run. As soon as it starts producing output
for the pack, it will see a failure writing to upload-pack and exit
itself. But before then, it may do significant work traversing the
object graph, compressing deltas, etc, which will all be pointless. So
let's make sure to kill as soon as we know that the caller will not read
the result.
There's no test here, since it's inherently racy, but here's an easy
reproduction on a large-ish repo like linux.git:
- make sure you don't have pack bitmaps (since they make the enumerating
phase go quickly). For linux.git it takes ~30s or so to walk the
whole graph on my machine.
- run "git clone --no-local -q . dst"; the "-q" is important because
if pack-objects is writing progress to upload-pack (to get
multiplexed over the sideband to the client), then it will notice
pretty quickly the failure to write to stderr
- kill the client-side clone process in another terminal (don't use
^C, as that will send SIGINT to all of the processes)
- run "ps au | grep git" or similar to observe upload-pack dying
within 5 seconds (it will send a keepalive that will notice the
client has gone away)
- but you'll still see pack-objects consuming 100% CPU (and 1GB+ of
RAM) during the traversal and delta compression phases. It will exit
as soon as it starts to write the pack (when it will notice that
upload-pack went away).
With this patch, pack-objects exits as soon as upload-pack does.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-01 20:15:13 +08:00
	pack_objects.clean_on_exit = 1;
2007-10-20 03:48:03 +08:00

upload-pack: Use finish_{command,async}() instead of waitpid().
upload-pack spawns two processes, rev-list and pack-objects, and carefully
monitors their status so that it can report failure to the remote end.
This change removes the complicated procedures on the grounds of the
following observations:
- If everything is OK, rev-list closes its output pipe end, upon which
pack-objects (which reads from the pipe) sees EOF and terminates itself,
closing its output (and error) pipes. upload-pack reads from both until
it sees EOF in both. It collects the exit codes of the child processes
(which indicate success) and terminates successfully.
- If rev-list sees an error, it closes its output and terminates with
failure. pack-objects sees EOF in its input and terminates successfully.
Again upload-pack reads its inputs until EOF. When it now collects
the exit codes of its child processes, it notices the failure of rev-list
and signals failure to the remote end.
- If pack-objects sees an error, it terminates with failure. Since this
breaks the pipe to rev-list, rev-list is killed with SIGPIPE.
upload-pack reads its input until EOF, then collects the exit codes of
the child processes, notices their failures, and signals failure to the
remote end.
- If upload-pack itself dies unexpectedly, pack-objects is killed with
SIGPIPE, and subsequently also rev-list.
The upshot of this is that precise monitoring of child processes is not
required because both terminate if either one of them dies unexpectedly.
This allows us to use finish_command() and finish_async() instead of
an explicit waitpid(2) call.
The change is smaller than it looks because most of it only reduces the
indentation of a large part of the inner loop.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 03:46:48 +08:00
	if (start_command(&pack_objects))
2008-09-01 00:39:19 +08:00
		die("git upload-pack: unable to fork git-pack-objects");
2006-06-21 09:26:34 +08:00

2013-08-16 17:52:05 +08:00
	pipe_fd = xfdopen(pack_objects.in, "w");

2020-06-11 20:05:09 +08:00
	if (pack_data->shallow_nr)
2014-03-11 20:59:46 +08:00
		for_each_commit_graft(write_one_shallow, pipe_fd);

2020-05-15 18:04:53 +08:00
	for (i = 0; i < pack_data->want_obj.nr; i++)
2013-08-16 17:52:05 +08:00
		fprintf(pipe_fd, "%s\n",
2020-05-15 18:04:53 +08:00
			oid_to_hex(&pack_data->want_obj.objects[i].item->oid));
2013-08-16 17:52:05 +08:00
	fprintf(pipe_fd, "--not\n");
2020-05-15 18:04:53 +08:00
	for (i = 0; i < pack_data->have_obj.nr; i++)
2013-08-16 17:52:05 +08:00
		fprintf(pipe_fd, "%s\n",
2020-05-15 18:04:53 +08:00
			oid_to_hex(&pack_data->have_obj.objects[i].item->oid));
2020-06-11 20:05:10 +08:00
	for (i = 0; i < pack_data->extra_edge_obj.nr; i++)
2013-08-16 17:52:05 +08:00
		fprintf(pipe_fd, "%s\n",
2020-06-11 20:05:10 +08:00
			oid_to_hex(&pack_data->extra_edge_obj.objects[i].item->oid));
2013-08-16 17:52:05 +08:00
	fprintf(pipe_fd, "\n");
	fflush(pipe_fd);
	fclose(pipe_fd);
2009-06-10 07:50:18 +08:00

2007-10-20 03:47:59 +08:00
	/* We read from pack_objects.err to capture stderr output for
	 * progress bar, and pack_objects.out to capture the pack data.
2006-06-21 09:26:34 +08:00
	 */

	while (1) {
		struct pollfd pfd[2];
2020-06-05 01:54:46 +08:00
		int pe, pu, pollsize, polltimeout;
2013-09-08 17:01:31 +08:00
		int ret;
2006-06-21 09:26:34 +08:00

2020-06-05 01:54:40 +08:00
		reset_timeout(pack_data->timeout);
2006-07-19 01:14:51 +08:00

2006-06-21 09:26:34 +08:00
		pollsize = 0;
2006-06-21 13:48:23 +08:00
		pe = pu = -1;
2006-06-21 09:26:34 +08:00

2007-10-20 03:47:59 +08:00
		if (0 <= pack_objects.out) {
			pfd[pollsize].fd = pack_objects.out;
2006-06-21 09:26:34 +08:00
			pfd[pollsize].events = POLLIN;
			pu = pollsize;
			pollsize++;
		}
2007-10-20 03:47:59 +08:00
		if (0 <= pack_objects.err) {
			pfd[pollsize].fd = pack_objects.err;
2006-06-21 13:48:23 +08:00
			pfd[pollsize].events = POLLIN;
			pe = pollsize;
			pollsize++;
		}
2006-06-21 09:26:34 +08:00

upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
if (!pollsize)
break;

2020-06-05 01:54:46 +08:00
polltimeout = pack_data->keepalive < 0
? -1
: 1000 * pack_data->keepalive;

ret = poll(pfd, pollsize, polltimeout);
2014-08-22 23:19:11 +08:00

2013-09-08 17:01:31 +08:00
if (ret < 0) {
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
if (errno != EINTR) {
2016-05-08 17:47:59 +08:00
error_errno("poll failed, resuming");
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
sleep(1);
2006-06-21 09:26:34 +08:00
}
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
continue;
}
2009-11-12 06:24:42 +08:00
if (0 <= pe && (pfd[pe].revents & (POLLIN|POLLHUP))) {
/* Status ready; we ship that in the side-band
* or dump to the standard error.
*/
sz = xread(pack_objects.err, progress,
sizeof(progress));
if (0 < sz)
2020-06-05 01:54:41 +08:00
send_client_data(2, progress, sz,
pack_data->use_sideband);
2009-11-12 06:24:42 +08:00
else if (sz == 0) {
close(pack_objects.err);
pack_objects.err = -1;
}
else
goto fail;
/* give priority to status messages */
continue;
}
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
if (0 <= pu && (pfd[pu].revents & (POLLIN|POLLHUP))) {
2020-06-11 04:57:21 +08:00
int result = relay_pack_data(pack_objects.out,
2021-12-15 03:46:26 +08:00
output_state,
2020-06-11 04:57:23 +08:00
pack_data->use_sideband,
!!uri_protocols);
2020-06-11 04:57:21 +08:00

if (result == 0) {
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
close(pack_objects.out);
pack_objects.out = -1;
2020-06-11 04:57:21 +08:00
} else if (result < 0) {
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
goto fail;
2006-06-21 09:26:34 +08:00
}
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
}
2013-09-08 17:01:31 +08:00

/*
* We hit the keepalive timeout without saying anything; send
* an empty message on the data sideband just to let the other
* side know we're still working on it, but don't have any data
* yet.
*
* If we don't have a sideband channel, there's no room in the
* protocol to say anything, so those clients are just out of
* luck.
*/
2020-06-05 01:54:41 +08:00
if (!ret && pack_data->use_sideband) {
2013-09-08 17:01:31 +08:00
static const char buf[] = "0005\1";
write_or_die(1, buf, 5);
}
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
}
2006-06-21 09:26:34 +08:00

upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
if (finish_command(&pack_objects)) {
2008-09-01 00:39:19 +08:00
error("git upload-pack: git-pack-objects died with error.");
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
goto fail;
}
2006-06-21 09:26:34 +08:00

upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
/* flush the data */
2021-12-15 03:46:26 +08:00
if (output_state->used > 0) {
send_client_data(1, output_state->buffer, output_state->used,
2020-06-05 01:54:41 +08:00
pack_data->use_sideband);
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
fprintf(stderr, "flushed.\n");
2006-06-21 09:26:34 +08:00
}
2021-12-15 03:46:26 +08:00
free(output_state);
2020-06-05 01:54:41 +08:00
if (pack_data->use_sideband)
upload-pack: Use finish_{command,async}() instead of waitpid().
2007-11-05 03:46:48 +08:00
packet_flush(1);
return;

2006-06-21 09:26:34 +08:00
fail:
2022-07-28 07:13:42 +08:00
free(output_state);
2024-02-26 02:34:52 +08:00
send_client_data(3, abort_msg, strlen(abort_msg),
2020-06-05 01:54:41 +08:00
pack_data->use_sideband);
2008-09-01 00:39:19 +08:00
die("git upload-pack: %s", abort_msg);
2005-07-05 06:29:17 +08:00
}

2020-06-11 20:05:18 +08:00
static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid)
2005-07-05 04:26:53 +08:00
{
2006-07-06 12:28:20 +08:00
int we_knew_they_have = 0;
upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places

In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of
"want" objects, 2022-09-06), we optimized the parse_object() calls for
v2 "want" lines from the client so that they avoided parsing blobs, and
so that they used the commit-graph rather than parsing commit objects
from scratch.

We should extend that to two other spots:

  1. We parse "have" objects in the got_oid() function. These won't
     generally be non-commits (unlike "want" lines from a partial
     clone). But we still benefit from the use of the commit-graph.

  2. For v0, the "want" lines are parsed in receive_needs(). These are
     also less likely to be non-commits because by default they have to
     be ref tips. There are config options you might set to allow
     non-tip objects, but you'd mostly do so to support partial clones,
     and clients recent enough to support partial clone will generally
     speak v2 anyway.

So I don't expect this change to improve performance much for day-to-day
operations. But both are possible denial-of-service vectors, where an
attacker can waste our time by sending over a large number of objects to
parse (of course we may waste even more time serving a pack to them, but
we try as much as possible to optimize that in pack-objects; we should
do what we can here in upload-pack, too).

With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows
similar results to what we saw in 0bc2557951 (which ran with the v2
protocol by default). Here are the numbers for linux.git:

  Test                          HEAD^               HEAD
  -----------------------------------------------------------------------------
  5600.3: checkout of result    50.91(87.95+2.93)   41.75(79.00+3.18) -18.0%

Or for a more extreme (and malicious) case, we can claim to "have" every
blob in git.git over the v0 protocol:

  $ {
      echo "0032want $(git rev-parse HEAD)"
      printf 0000
      git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
    } >input

  $ time ./git.old upload-pack . <input >/dev/null
  real    0m52.951s
  user    0m51.633s
  sys     0m1.304s

  $ time ./git.new upload-pack . <input >/dev/null
  real    0m0.261s
  user    0m0.156s
  sys     0m0.105s

(Note that these don't actually compute a pack because of the hacky
protocol usage, so those numbers are representing the raw blob-parsing
effort done by upload-pack).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:39:03 +08:00
struct object *o = parse_object_with_flags(the_repository, oid,
upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.
But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.
Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.
But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:
pktline() {
  local msg="$*"
  printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}

want_trees() {
  pktline command=fetch
  printf 0001
  git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
  while read oid type; do
    test "$type" = "tree" || continue
    pktline want $oid
  done
  pktline done
  printf 0000
}

want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null
shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.
So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.
I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.
I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.
And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.
And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).
These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:39:07 +08:00
						   PARSE_OBJECT_SKIP_HASH_CHECK |
						   PARSE_OBJECT_DISCARD_TREE);
2006-07-06 09:00:02 +08:00

	if (!o)
2017-05-07 06:10:28 +08:00
		die("oops (%s)", oid_to_hex(oid));
2006-08-13 13:16:51 +08:00
	if (o->type == OBJ_COMMIT) {
2006-07-06 09:00:02 +08:00
		struct commit_list *parents;
2006-07-06 12:28:20 +08:00
		struct commit *commit = (struct commit *)o;
2006-07-06 09:00:02 +08:00
		if (o->flags & THEY_HAVE)
2006-07-06 12:28:20 +08:00
			we_knew_they_have = 1;
		else
			o->flags |= THEY_HAVE;
2020-06-11 20:05:17 +08:00
		if (!data->oldest_have || (commit->date < data->oldest_have))
			data->oldest_have = commit->date;
2006-07-06 12:28:20 +08:00
		for (parents = commit->parents;
2006-07-06 09:00:02 +08:00
		     parents;
		     parents = parents->next)
			parents->item->object.flags |= THEY_HAVE;
2005-07-05 06:29:17 +08:00
	}
2006-07-06 12:28:20 +08:00
	if (!we_knew_they_have) {
2020-06-11 20:05:16 +08:00
		add_object_array(o, NULL, &data->have_obj);
2006-07-06 12:28:20 +08:00
		return 1;
	}
	return 0;
}
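The `PARSE_OBJECT_DISCARD_TREE` commit message above describes when `parse_object()` can avoid reading a tree at all: only when the caller has both waived the hash check and promised not to use the tree contents. That decision can be modelled as a small predicate. This is a sketch, not git's code: `must_read_data()`, `enum obj_kind` and the `P_*` flag values are hypothetical stand-ins for the real `parse_object_with_flags()` logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; git's real types and flags live elsewhere. */
enum obj_kind { K_COMMIT, K_TREE, K_BLOB, K_TAG };

#define P_SKIP_HASH_CHECK (1u << 0) /* caller does not need the hash verified */
#define P_DISCARD_TREE    (1u << 1) /* caller will not look at tree contents */

/*
 * Returns true when parsing an object of the given kind would have to
 * read its data from the object store, either to verify the hash or
 * to parse the contents.
 */
static bool must_read_data(enum obj_kind kind, unsigned flags)
{
	bool skip_hash = (flags & P_SKIP_HASH_CHECK) != 0;
	bool discard_tree = (flags & P_DISCARD_TREE) != 0;

	assert(kind <= K_TAG);
	switch (kind) {
	case K_BLOB:
		/* nothing to parse; data is only needed for the hash check */
		return !skip_hash;
	case K_TREE:
		/* lazy-load only when both promises are made */
		return !(skip_hash && discard_tree);
	default:
		/* commits and tags are parsed for their links */
		return true;
	}
}
```

Blobs are included because the same commit message notes that they already get this treatment once hashing is skipped; the new flag extends it to trees.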
2020-06-11 20:05:18 +08:00
static int got_oid(struct upload_pack_data *data,
		   const char *hex, struct object_id *oid)
{
	if (get_oid_hex(hex, oid))
		die("git upload-pack: expected SHA1 object, got '%s'", hex);
2023-03-28 21:58:50 +08:00
	if (!repo_has_object_file_with_flags(the_repository, oid,
					     OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT))
2020-06-11 20:05:18 +08:00
		return -1;
	return do_got_oid(data, oid);
}
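`got_oid()` relies on `get_oid_hex()` to turn the text after `have ` into an object ID, and dies if that fails. A minimal sketch of such a parser for the 40-digit SHA-1 case follows; `parse_sha1_hex()` and `hexval_sketch()` are hypothetical names, and git's real helpers also handle SHA-256 (64 hex digits) via `the_hash_algo`.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_RAWSZ 20

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval_sketch(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parses the first 40 hex digits of hex into out. Returns 0 on success,
 * -1 on a non-hex character or a string that ends early; it never reads
 * past the terminating NUL.
 */
static int parse_sha1_hex(const char *hex, unsigned char out[SHA1_RAWSZ])
{
	size_t i;

	assert(hex && out);
	for (i = 0; i < SHA1_RAWSZ; i++) {
		int hi = hexval_sketch(hex[2 * i]);
		int lo = hi < 0 ? -1 : hexval_sketch(hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```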
2020-06-11 20:05:15 +08:00
static int ok_to_give_up(struct upload_pack_data *data)
2006-07-06 12:28:20 +08:00
{
2021-01-17 02:11:13 +08:00
	timestamp_t min_generation = GENERATION_NUMBER_ZERO;
2006-07-06 12:28:20 +08:00

2020-06-11 20:05:15 +08:00
	if (!data->have_obj.nr)
2006-07-06 12:28:20 +08:00
		return 0;

2020-06-11 20:05:15 +08:00
	return can_all_from_reach_with_flag(&data->want_obj, THEY_HAVE,
2020-06-11 20:05:17 +08:00
					    COMMON_KNOWN, data->oldest_have,
2018-07-21 00:33:28 +08:00
					    min_generation);
2005-07-05 04:26:53 +08:00
}
2020-05-15 18:04:46 +08:00
static int get_common_commits(struct upload_pack_data *data,
			      struct packet_reader *reader)
2005-07-05 04:26:53 +08:00
{
2017-05-07 06:10:28 +08:00
	struct object_id oid;
	char last_hex[GIT_MAX_HEXSZ + 1];
2011-03-15 07:48:39 +08:00
	int got_common = 0;
	int got_other = 0;
2011-03-30 03:29:10 +08:00
	int sent_ready = 0;
2005-07-05 04:26:53 +08:00

2009-09-01 13:35:10 +08:00
	for (;;) {
2016-06-12 18:53:49 +08:00
		const char *arg;

2020-06-05 01:54:40 +08:00
		reset_timeout(data->timeout);
2005-07-05 04:26:53 +08:00

2018-12-30 05:19:14 +08:00
		if (packet_reader_read(reader) != PACKET_READ_NORMAL) {
2020-06-05 01:54:44 +08:00
			if (data->multi_ack == MULTI_ACK_DETAILED
2020-05-15 18:04:46 +08:00
			    && got_common
			    && !got_other
2020-06-11 20:05:15 +08:00
			    && ok_to_give_up(data)) {
2011-03-30 03:29:10 +08:00
				sent_ready = 1;
2016-10-17 07:20:29 +08:00
				packet_write_fmt(1, "ACK %s ready\n", last_hex);
2011-03-30 03:29:10 +08:00
			}
2020-06-05 01:54:43 +08:00
			if (data->have_obj.nr == 0 || data->multi_ack)
2016-10-17 07:20:29 +08:00
				packet_write_fmt(1, "NAK\n");
2011-03-30 03:29:10 +08:00

2020-06-05 01:54:40 +08:00
			if (data->no_done && sent_ready) {
2016-10-17 07:20:29 +08:00
				packet_write_fmt(1, "ACK %s\n", last_hex);
2011-03-30 03:29:10 +08:00
				return 0;
			}
2020-05-15 18:04:52 +08:00
			if (data->stateless_rpc)
2009-10-31 08:47:33 +08:00
				exit(0);
2011-03-15 07:48:39 +08:00
			got_common = 0;
			got_other = 0;
2005-07-05 04:26:53 +08:00
			continue;
		}
2018-12-30 05:19:14 +08:00
		if (skip_prefix(reader->line, "have ", &arg)) {
2020-06-11 20:05:16 +08:00
			switch (got_oid(data, arg, &oid)) {
2006-07-06 12:28:20 +08:00
			case -1: /* they have what we do not */
2011-03-15 07:48:39 +08:00
				got_other = 1;
2020-06-05 01:54:43 +08:00
				if (data->multi_ack
2020-06-11 20:05:15 +08:00
				    && ok_to_give_up(data)) {
2017-05-07 06:10:28 +08:00
					const char *hex = oid_to_hex(&oid);
2020-06-05 01:54:44 +08:00
					if (data->multi_ack == MULTI_ACK_DETAILED) {
2011-03-30 03:29:10 +08:00
						sent_ready = 1;
2016-10-17 07:20:29 +08:00
						packet_write_fmt(1, "ACK %s ready\n", hex);
2011-03-30 03:29:10 +08:00
					} else
2016-10-17 07:20:29 +08:00
						packet_write_fmt(1, "ACK %s continue\n", hex);
2009-10-31 08:47:25 +08:00
				}
2006-07-06 12:28:20 +08:00
				break;
			default:
2011-03-15 07:48:39 +08:00
				got_common = 1;
2018-05-02 08:25:51 +08:00
				oid_to_hex_r(last_hex, &oid);
2020-06-05 01:54:44 +08:00
				if (data->multi_ack == MULTI_ACK_DETAILED)
2016-10-17 07:20:29 +08:00
					packet_write_fmt(1, "ACK %s common\n", last_hex);
2020-06-05 01:54:43 +08:00
				else if (data->multi_ack)
2016-10-17 07:20:29 +08:00
					packet_write_fmt(1, "ACK %s continue\n", last_hex);
2020-05-15 18:04:46 +08:00
				else if (data->have_obj.nr == 1)
2016-10-17 07:20:29 +08:00
					packet_write_fmt(1, "ACK %s\n", last_hex);
2006-07-06 12:28:20 +08:00
				break;
2005-10-26 05:55:24 +08:00
			}
2005-07-05 04:26:53 +08:00
			continue;
		}
2018-12-30 05:19:14 +08:00
		if (!strcmp(reader->line, "done")) {
2020-05-15 18:04:46 +08:00
			if (data->have_obj.nr > 0) {
2020-06-05 01:54:43 +08:00
				if (data->multi_ack)
2016-10-17 07:20:29 +08:00
					packet_write_fmt(1, "ACK %s\n", last_hex);
2005-10-28 10:49:16 +08:00
				return 0;
			}
2016-10-17 07:20:29 +08:00
			packet_write_fmt(1, "NAK\n");
2005-07-05 04:26:53 +08:00
			return -1;
		}
2018-12-30 05:19:14 +08:00
		die("git upload-pack: expected SHA1 list, got '%s'", reader->line);
2005-07-05 04:26:53 +08:00
	}
}
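For each `have` line, `get_common_commits()` picks one of several `ACK` forms depending on the negotiated `multi_ack` mode. Pulling that decision into a pure function makes the branches easier to check. This is a hedged restatement of the `switch` above rather than code from git: `classify_have()`, `format_have_reply()` and both enums are hypothetical, and `have_it` stands for `got_oid()` not returning -1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the modes compared against data->multi_ack. */
enum multi_ack_mode { NO_MULTI_ACK, MULTI_ACK, MULTI_ACK_DETAILED };

enum have_reply {
	REPLY_NONE,          /* nothing is written for this line */
	REPLY_ACK,           /* "ACK <oid>" */
	REPLY_ACK_CONTINUE,  /* "ACK <oid> continue" */
	REPLY_ACK_COMMON,    /* "ACK <oid> common" */
	REPLY_ACK_READY      /* "ACK <oid> ready" */
};

static const char *const reply_fmt[] = {
	[REPLY_NONE]         = "",
	[REPLY_ACK]          = "ACK %s\n",
	[REPLY_ACK_CONTINUE] = "ACK %s continue\n",
	[REPLY_ACK_COMMON]   = "ACK %s common\n",
	[REPLY_ACK_READY]    = "ACK %s ready\n",
};

/*
 * have_it:     got_oid() did not return -1 (we have the object)
 * can_give_up: ok_to_give_up() would return true
 * nr_haves:    data->have_obj.nr after this line was processed
 */
static enum have_reply classify_have(enum multi_ack_mode mode, int have_it,
				     int can_give_up, int nr_haves)
{
	if (!have_it) {
		if (mode == NO_MULTI_ACK || !can_give_up)
			return REPLY_NONE;
		return mode == MULTI_ACK_DETAILED ? REPLY_ACK_READY
						  : REPLY_ACK_CONTINUE;
	}
	if (mode == MULTI_ACK_DETAILED)
		return REPLY_ACK_COMMON;
	if (mode == MULTI_ACK)
		return REPLY_ACK_CONTINUE;
	return nr_haves == 1 ? REPLY_ACK : REPLY_NONE;
}

/* Renders the reply text (before pkt-line framing) into buf. */
static void format_have_reply(enum have_reply r, const char *hex,
			      char *buf, size_t len)
{
	assert(buf && len > 0 && r <= REPLY_ACK_READY);
	snprintf(buf, len, reply_fmt[r], hex);
}
```

Without any `multi_ack` capability, only the first common object is acknowledged, which is why `nr_haves == 1` is the one case that earns a bare `ACK`.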
upload-pack.c: avoid enumerating hidden refs where possible
In a similar fashion as a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.
Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:
- `uploadpack.allowTipSHA1InWant`, or
- `uploadpack.allowReachableSHA1InWant`
are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.
When either of these bits are set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.
When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
When neither of those bits are set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):
$ printf 0000 >in
$ hyperfine --warmup=1 \
    'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
    'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
    'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
  Time (mean ± σ):     406.9 ms ±   1.1 ms    [User: 357.3 ms, System: 49.5 ms]
  Range (min … max):   405.7 ms … 409.2 ms    10 runs

Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
  Time (mean ± σ):     406.5 ms ±   1.3 ms    [User: 356.5 ms, System: 49.9 ms]
  Range (min … max):   404.6 ms … 408.8 ms    10 runs

Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
  Time (mean ± σ):       4.7 ms ±   0.2 ms    [User: 0.7 ms, System: 3.9 ms]
  Range (min … max):     4.3 ms …   6.1 ms    472 runs

Summary
  'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
   86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
   86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'
As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 05:12:45 +08:00
static int allow_hidden_refs(enum allow_uor allow_uor)
{
	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
		return 1;
	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
}
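The commit message above explains when hidden refs must still be visited; enumerating the `allow_uor` combinations makes the rule concrete. The sketch copies the body of `allow_hidden_refs()` and the mask used by `is_our_ref()` further down. The enum and flag values are assumptions meant to mirror upload-pack.c, so check them against the source before relying on them.

```c
#include <assert.h>

/*
 * Assumed values, intended to mirror upload-pack.c, where
 * ALLOW_ANY_SHA1 includes both of the other bits.
 */
enum allow_uor {
	ALLOW_TIP_SHA1       = 0x01,
	ALLOW_REACHABLE_SHA1 = 0x02,
	ALLOW_ANY_SHA1       = 0x07
};
#define OUR_REF    (1u << 12) /* assumed bit positions */
#define HIDDEN_REF (1u << 19)

/* Same logic as allow_hidden_refs() above. */
static int allow_hidden_refs(enum allow_uor allow_uor)
{
	/* only the three bits above are modelled here */
	assert(((unsigned)allow_uor & ~0x07u) == 0);
	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
		return 1;
	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
}

/* The mask test from is_our_ref(), applied to a bare flags word. */
static int flags_count_as_ours(unsigned flags, enum allow_uor allow_uor)
{
	unsigned mask = (allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF;

	return (flags & mask) != 0;
}
```

With no bits set, or with `ALLOW_ANY_SHA1`, the mask is just `OUR_REF`, so an object reached only through a hidden ref does not count and hidden refs can be skipped; with `ALLOW_TIP_SHA1` or `ALLOW_REACHABLE_SHA1` alone, `HIDDEN_REF` counts too, which is why those settings force a walk over every ref.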
static void for_each_namespaced_ref_1(each_ref_fn fn,
				      struct upload_pack_data *data)
{
	const char **excludes = NULL;
	/*
	 * If `data->allow_uor` allows fetching hidden refs, we need to
	 * mark all references (including hidden ones), to check in
	 * `is_our_ref()` below.
	 *
	 * Otherwise, we only care about whether each reference's object
	 * has the OUR_REF bit set or not, so do not need to visit
	 * hidden references.
	 */
	if (allow_hidden_refs(data->allow_uor))
		excludes = hidden_refs_to_excludes(&data->hidden_refs);

2024-05-07 15:11:53 +08:00
	refs_for_each_namespaced_ref(get_main_ref_store(the_repository),
				     excludes, fn, data);
2023-07-11 05:12:45 +08:00
}
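Whether a ref is hidden comes down to matching its name against the `transfer.hideRefs` prefixes (the benchmarks above hide `refs/pull`). A simplified, hypothetical matcher is sketched below. Git's own matching also understands `!`-negated and `^`-prefixed patterns and normalizes trailing slashes in the configured values; this version deliberately leaves all of that out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of a transfer.hideRefs match: a ref is hidden when a
 * configured prefix matches it exactly or up to a '/' boundary, so
 * "refs/pull" hides "refs/pull/1/head" but not "refs/pullrequests/1".
 * Prefixes are assumed to carry no trailing slash.
 */
static int ref_is_hidden_sketch(const char *refname,
				const char *const *prefixes, size_t nr)
{
	size_t i;

	assert(refname && (prefixes || nr == 0));
	for (i = 0; i < nr; i++) {
		size_t len = strlen(prefixes[i]);

		/* strncmp fails first if refname is shorter than the prefix */
		if (!strncmp(refname, prefixes[i], len) &&
		    (refname[len] == '\0' || refname[len] == '/'))
			return 1;
	}
	return 0;
}
```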
2020-06-11 20:05:12 +08:00
static int is_our_ref(struct object *o, enum allow_uor allow_uor)
2013-01-29 13:49:57 +08:00
{
2023-07-11 05:12:45 +08:00
	return o->flags & ((allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF);
2013-01-29 13:49:57 +08:00
}
2016-06-12 18:54:07 +08:00
/*
 * on successful case, it's up to the caller to close cmd->out
 */
static int do_reachable_revlist(struct child_process *cmd,
2016-06-12 18:54:08 +08:00
				struct object_array *src,
2020-06-11 20:05:11 +08:00
				struct object_array *reachable,
2020-06-11 20:05:12 +08:00
				enum allow_uor allow_uor)
2011-08-06 04:54:06 +08:00
{
	struct object *o;
2020-08-13 00:52:55 +08:00
	FILE *cmd_in = NULL;
2011-08-06 04:54:06 +08:00
	int i;

2021-11-26 06:52:20 +08:00
	strvec_pushl(&cmd->args, "rev-list", "--stdin", NULL);
2016-06-12 18:54:07 +08:00
	cmd->git_cmd = 1;
	cmd->no_stderr = 1;
	cmd->in = -1;
	cmd->out = -1;
2011-08-06 04:54:06 +08:00

	/*
2016-06-12 18:53:51 +08:00
	 * If the next rev-list --stdin encounters an unknown commit,
	 * it terminates, which will cause SIGPIPE in the write loop
2011-08-06 04:54:06 +08:00
	 * below.
	 */
	sigchain_push(SIGPIPE, SIG_IGN);

2016-06-12 18:54:07 +08:00
	if (start_command(cmd))
2016-06-12 18:53:51 +08:00
		goto error;

2020-08-13 00:52:55 +08:00
	cmd_in = xfdopen(cmd->in, "w");

2011-08-06 04:54:06 +08:00
	for (i = get_max_object_index(); 0 < i; ) {
		o = get_indexed_object(--i);
2011-08-24 13:47:17 +08:00
		if (!o)
			continue;
2016-06-12 18:54:08 +08:00
		if (reachable && o->type == OBJ_COMMIT)
			o->flags &= ~TMP_MARK;
2020-06-11 20:05:12 +08:00
		if (!is_our_ref(o, allow_uor))
2011-08-06 04:54:06 +08:00
			continue;
2020-08-13 00:52:55 +08:00
		if (fprintf(cmd_in, "^%s\n", oid_to_hex(&o->oid)) < 0)
2011-08-06 04:54:06 +08:00
			goto error;
	}
2016-06-12 18:53:52 +08:00
	for (i = 0; i < src->nr; i++) {
		o = src->objects[i].item;
2020-06-11 20:05:12 +08:00
		if (is_our_ref(o, allow_uor)) {
2016-06-12 18:54:08 +08:00
			if (reachable)
				add_object_array(o, NULL, reachable);
2011-08-06 04:54:06 +08:00
			continue;
2016-06-12 18:54:08 +08:00
		}
		if (reachable && o->type == OBJ_COMMIT)
			o->flags |= TMP_MARK;
2020-08-13 00:52:55 +08:00
		if (fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid)) < 0)
2011-08-06 04:54:06 +08:00
			goto error;
	}
2020-08-13 00:52:55 +08:00
	if (ferror(cmd_in) || fflush(cmd_in))
		goto error;
	fclose(cmd_in);
2016-06-12 18:54:07 +08:00
	cmd->in = -1;
	sigchain_pop(SIGPIPE);
2011-08-06 04:54:06 +08:00

2016-06-12 18:54:07 +08:00
	return 0;

error:
2011-08-06 04:54:06 +08:00
	sigchain_pop(SIGPIPE);

2020-08-13 00:52:55 +08:00
	if (cmd_in)
		fclose(cmd_in);
2016-06-12 18:54:07 +08:00
	if (cmd->out >= 0)
		close(cmd->out);
	return -1;
}
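`do_reachable_revlist()` poses its question to `git rev-list --stdin` by feeding it `^<oid>` lines for every object that passes `is_our_ref()` and bare `<oid>` lines for the requested objects that do not; anything rev-list then prints is reachable from a request but from none of our refs. The sketch below reproduces only the input side. `struct obj_sketch` and `write_revlist_input()` are hypothetical, the `ours` field stands in for `is_our_ref()`, and the output goes to any `FILE *` rather than a child's stdin, without the `TMP_MARK` bookkeeping.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, stripped-down object: a hex name and an "is ours" bit. */
struct obj_sketch {
	const char *hex;
	int ours; /* stands in for is_our_ref() */
};

/*
 * Writes rev-list --stdin input: "^<oid>" for every object that is ours,
 * then "<oid>" for every requested object that is not. Returns 0, or -1
 * if writing or flushing fails.
 */
static int write_revlist_input(FILE *out,
			       const struct obj_sketch *all, size_t nr_all,
			       const struct obj_sketch *wants, size_t nr_wants)
{
	size_t i;

	assert(out);
	for (i = 0; i < nr_all; i++)
		if (all[i].ours && fprintf(out, "^%s\n", all[i].hex) < 0)
			return -1;
	for (i = 0; i < nr_wants; i++)
		if (!wants[i].ours && fprintf(out, "%s\n", wants[i].hex) < 0)
			return -1;
	return (ferror(out) || fflush(out)) ? -1 : 0;
}
```

Requested objects that are already ours are left out of the positive list because they are trivially reachable; the real function records them directly in `reachable` instead.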
2020-06-11 20:05:11 +08:00
static int get_reachable_list(struct upload_pack_data *data,
2016-06-12 18:54:08 +08:00
			      struct object_array *reachable)
{
	struct child_process cmd = CHILD_PROCESS_INIT;
	int i;
	struct object *o;
2018-05-02 08:25:51 +08:00
	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
	const unsigned hexsz = the_hash_algo->hexsz;
2024-09-05 18:08:48 +08:00
	int ret;
2016-06-12 18:54:08 +08:00

2020-06-11 20:05:11 +08:00
	if (do_reachable_revlist(&cmd, &data->shallows, reachable,
2024-09-05 18:08:48 +08:00
				 data->allow_uor) < 0) {
		ret = -1;
		goto out;
	}
2016-06-12 18:54:08 +08:00

2018-05-02 08:25:51 +08:00
	while ((i = read_in_full(cmd.out, namebuf, hexsz + 1)) == hexsz + 1) {
2019-06-20 15:40:54 +08:00
		struct object_id oid;
2018-05-02 08:25:51 +08:00
		const char *p;
2016-06-12 18:54:08 +08:00

2019-06-20 15:40:54 +08:00
		if (parse_oid_hex(namebuf, &oid, &p) || *p != '\n')
2016-06-12 18:54:08 +08:00
			break;

2019-06-20 15:41:14 +08:00
		o = lookup_object(the_repository, &oid);
2016-06-12 18:54:08 +08:00
		if (o && o->type == OBJ_COMMIT) {
			o->flags &= ~TMP_MARK;
		}
	}
	for (i = get_max_object_index(); 0 < i; i--) {
		o = get_indexed_object(i - 1);
		if (o && o->type == OBJ_COMMIT &&
		    (o->flags & TMP_MARK)) {
			add_object_array(o, NULL, reachable);
			o->flags &= ~TMP_MARK;
		}
	}
	close(cmd.out);

2024-09-05 18:08:48 +08:00
	if (finish_command(&cmd)) {
		ret = -1;
		goto out;
	}
2016-06-12 18:54:08 +08:00

2024-09-05 18:08:48 +08:00
	ret = 0;

out:
	child_process_clear(&cmd);
	return ret;
2016-06-12 18:54:08 +08:00
}
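`get_reachable_list()` reads rev-list output in fixed-size records of `hexsz + 1` bytes (the hex object name plus a newline) and stops at the first record that is short or does not parse. The helper below counts such records in a buffer to show the framing. `count_oid_records()` is hypothetical and, unlike `parse_oid_hex()`, accepts only lowercase hex, which is what rev-list prints.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Counts the well-formed "<hex>\n" records at the start of buf, stopping
 * at the first malformed or truncated one. hexsz is 40 for SHA-1 and 64
 * for SHA-256.
 */
static size_t count_oid_records(const char *buf, size_t len, size_t hexsz)
{
	size_t n = 0, off = 0;

	assert(hexsz == 40 || hexsz == 64);
	while (len - off >= hexsz + 1) {
		size_t i;

		for (i = 0; i < hexsz; i++) {
			char c = buf[off + i];
			if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
				return n;
		}
		if (buf[off + hexsz] != '\n')
			return n;
		n++;
		off += hexsz + 1;
	}
	return n;
}
```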
2020-06-11 20:05:12 +08:00
static int has_unreachable(struct object_array *src, enum allow_uor allow_uor)
2016-06-12 18:54:07 +08:00
{
	struct child_process cmd = CHILD_PROCESS_INIT;
	char buf[1];
	int i;

2020-06-11 20:05:12 +08:00
	if (do_reachable_revlist(&cmd, src, NULL, allow_uor) < 0)
2024-09-05 18:08:48 +08:00
		goto error;
2011-08-06 04:54:06 +08:00

	/*
	 * The commits out of the rev-list are not ancestors of
	 * our ref.
	 */
2016-06-12 18:54:07 +08:00
	i = read_in_full(cmd.out, buf, 1);
2011-08-06 04:54:06 +08:00
	if (i)
		goto error;
	close(cmd.out);
2016-06-12 18:53:51 +08:00
	cmd.out = -1;
2011-08-06 04:54:06 +08:00

	/*
	 * rev-list may have died by encountering a bad commit
	 * in the history, in which case we do want to bail out
	 * even when it showed no commit.
	 */
	if (finish_command(&cmd))
		goto error;

	/* All the non-tip ones are ancestors of what we advertised */
2016-06-12 18:53:52 +08:00
	return 0;
2011-08-06 04:54:06 +08:00

error:
2016-06-12 18:53:51 +08:00
	if (cmd.out >= 0)
		close(cmd.out);
2024-09-05 18:08:48 +08:00
	child_process_clear(&cmd);
2016-06-12 18:53:52 +08:00
	return 1;
}
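`has_unreachable()` never needs rev-list's full output: a single byte proves that some requested commit is not reachable from what we advertised. A POSIX sketch of that check on a plain file descriptor follows; `produced_output()` is a hypothetical helper, not a git function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if at least one byte can be read from fd before EOF,
 * 0 on a clean EOF, and -1 on a read error. Retries if interrupted
 * by a signal.
 */
static int produced_output(int fd)
{
	char c;
	ssize_t r;

	assert(fd >= 0);
	do {
		r = read(fd, &c, 1);
	} while (r < 0 && errno == EINTR);
	if (r < 0)
		return -1;
	return r > 0;
}
```

`has_unreachable()` itself is stricter than this helper: any non-zero result from `read_in_full()`, including an error, sends it to the error path, and it still calls `finish_command()` so that a rev-list that died on a bad commit is not mistaken for empty output.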
2016-06-12 18:53:51 +08:00

2020-05-15 18:04:51 +08:00
static void check_non_tip(struct upload_pack_data *data)
2016-06-12 18:53:52 +08:00
{
	int i;

	/*
	 * In the normal in-process case without
	 * uploadpack.allowReachableSHA1InWant,
	 * non-tip requests can never happen.
	 */
2020-06-11 20:05:12 +08:00
	if (!data->stateless_rpc && !(data->allow_uor & ALLOW_REACHABLE_SHA1))
2016-06-12 18:53:52 +08:00
		goto error;
2020-06-11 20:05:12 +08:00
	if (!has_unreachable(&data->want_obj, data->allow_uor))
2016-06-12 18:53:52 +08:00
		/* All the non-tip ones are ancestors of what we advertised */
		return;
2011-08-06 04:54:06 +08:00

error:
	/* Pick one of them (we know there at least is one) */
2020-05-15 18:04:51 +08:00
	for (i = 0; i < data->want_obj.nr; i++) {
		struct object *o = data->want_obj.objects[i].item;
2020-06-11 20:05:12 +08:00
		if (!is_our_ref(o, data->allow_uor)) {
2023-08-10 22:40:50 +08:00
			error("git upload-pack: not our ref %s",
			      oid_to_hex(&o->oid));
2020-05-15 18:04:51 +08:00
			packet_writer_error(&data->writer,
upload-pack: send ERR packet for non-tip objects
Commit bdb31eada7 (upload-pack: report "not our ref" to client,
2017-02-23) catches the case where a client asks for an object we don't
have, and issues a message that the client can show to the user (in
addition to dying and writing to stderr).
There's a similar case (with the same message) when the client asks for
an object which we _do_ have, but which isn't a ref tip (or isn't
reachable, when uploadpack.allowReachableSHA1InWant is true). Let's give
that one the same treatment, for the same reason (namely that it's more
informative to the client than just hanging up, since they won't see our
stderr over some protocols).
There are two tests here. We cover it most directly in t5530 by invoking
upload-pack, which matches the existing "not our ref" test.
But a more end-to-end check is that "git fetch" actually shows the
message to the client. We're already checking in t5516 that this case
fails, so we can just check stderr there, too. Note that even after we
started ignoring SIGPIPE in 8bf4becf0c, this could in theory still be
racy as described in that commit (because we die() on write failures
before pumping the connection for any ERR packets).
In practice this should be OK for this case. The server will not
actually check reachability until it has received our whole group of
"want" lines. And since we have no objects in the repository, we won't
send any "have" lines, meaning we're always waiting to read the server
response.
Note also that this case cannot happen in the v2 protocol, since it
allows any available object to be requested. However, we don't have to
take any steps to protect against the upcoming GIT_TEST_PROTOCOL_VERSION
in our tests:
- the tests in t5516 would already need to be skipped under v2, and
that is covered by ab0c5f5096 (tests: always test fetch of
unreachable with v0, 2019-02-25)
- the tests in t5530 invoke upload-pack directly, which will continue
to default to v0. Eventually we may have a test setting which uses
v2 even for bare upload-pack calls, but we can't override it here
until we know what the setting looks like.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-13 13:53:34 +08:00
					    "upload-pack: not our ref %s",
					    oid_to_hex(&o->oid));
2023-08-16 14:06:59 +08:00
			exit(128);
2019-04-13 13:53:34 +08:00
		}
2011-08-06 04:54:06 +08:00
	}
}
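The ERR packet added by the commit above, like the `0032want ...` lines built by hand in the earlier benchmarks, uses pkt-line framing: four hex digits giving the total length, counting those four bytes, followed by the payload. For example, `want <40 hex digits>\n` is 4 + 5 + 40 + 1 = 50 = 0x32 bytes. A sketch of that framing follows; `frame_pkt_line()` is hypothetical, and the 65520-byte limit is assumed to match git's `LARGE_PACKET_MAX`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_MAX 65520 /* assumed to match git's LARGE_PACKET_MAX */

/*
 * Frames payload as a pkt-line into out: a four-digit lowercase hex
 * length (including the four length bytes) followed by the payload.
 * Returns the framed length, or -1 if it is too large for a packet or
 * does not fit in out together with a terminating NUL.
 */
static int frame_pkt_line(const char *payload, char *out, size_t outlen)
{
	size_t len;

	assert(payload && out);
	len = strlen(payload) + 4;
	if (len > PKT_MAX || len + 1 > outlen)
		return -1;
	snprintf(out, outlen, "%04x%s", (unsigned)len, payload);
	return (int)len;
}
```

The special packets `0000` (flush) and `0001` (delimiter, protocol v2) that the shell snippets print directly carry no payload, so they are written verbatim rather than through a helper like this one.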
2020-06-11 20:05:09 +08:00
static void send_shallow(struct upload_pack_data *data,
2019-01-16 03:40:27 +08:00
			 struct commit_list *result)
2016-06-12 18:53:46 +08:00
{
	while (result) {
		struct object *object = &result->item->object;
		if (!(object->flags & (CLIENT_SHALLOW|NOT_SHALLOW))) {
2020-06-11 20:05:09 +08:00
			packet_writer_write(&data->writer, "shallow %s",
2019-01-16 03:40:27 +08:00
					    oid_to_hex(&object->oid));
2018-05-18 06:51:44 +08:00
			register_shallow(the_repository, &object->oid);
2020-06-11 20:05:09 +08:00
			data->shallow_nr++;
2016-06-12 18:53:46 +08:00
		}
		result = result->next;
	}
}
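`send_shallow()` announces a `shallow <oid>` line only for boundary commits that carry neither `CLIENT_SHALLOW` nor `NOT_SHALLOW`, and counts how many it sent. The filter can be sketched on its own; `struct commit_stub`, `emit_shallow_lines()` and the flag bit positions are assumptions for illustration, and the sketch leaves out `register_shallow()` and pkt-line framing.

```c
#include <assert.h>
#include <stdio.h>

#define NOT_SHALLOW    (1u << 17) /* assumed bit positions */
#define CLIENT_SHALLOW (1u << 18)

/* Hypothetical commit record: hex name plus object flags. */
struct commit_stub {
	const char *hex;
	unsigned flags;
};

/*
 * Writes "shallow <oid>" for each commit flagged neither CLIENT_SHALLOW
 * nor NOT_SHALLOW. Returns the number of lines written (the analogue of
 * data->shallow_nr), or -1 on a write error.
 */
static int emit_shallow_lines(FILE *out, const struct commit_stub *c, size_t nr)
{
	size_t i;
	int sent = 0;

	assert(out);
	for (i = 0; i < nr; i++) {
		if (c[i].flags & (CLIENT_SHALLOW | NOT_SHALLOW))
			continue;
		if (fprintf(out, "shallow %s\n", c[i].hex) < 0)
			return -1;
		sent++;
	}
	return sent;
}
```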
2020-06-11 20:05:08 +08:00
static void send_unshallow(struct upload_pack_data *data)
2016-06-12 18:53:45 +08:00
{
	int i;
2016-06-12 18:53:48 +08:00

2020-06-11 20:05:08 +08:00
	for (i = 0; i < data->shallows.nr; i++) {
		struct object *object = data->shallows.objects[i].item;
2016-06-12 18:53:45 +08:00
		if (object->flags & NOT_SHALLOW) {
			struct commit_list *parents;
2020-06-11 20:05:08 +08:00
			packet_writer_write(&data->writer, "unshallow %s",
2019-01-16 03:40:27 +08:00
					    oid_to_hex(&object->oid));
2016-06-12 18:53:45 +08:00
			object->flags &= ~CLIENT_SHALLOW;
2016-06-12 18:53:48 +08:00
			/*
			 * We want to _register_ "object" as shallow, but we
			 * also need to traverse object's parents to deepen a
			 * shallow clone. Unregister it for now so we can
			 * parse and add the parents to the want list, then
			 * re-register it.
			 */
2017-05-07 06:10:06 +08:00
			unregister_shallow(&object->oid);
2016-06-12 18:53:45 +08:00
			object->parsed = 0;
			parse_commit_or_die((struct commit *)object);
			parents = ((struct commit *)object)->parents;
			while (parents) {
				add_object_array(&parents->item->object,
2020-06-11 20:05:08 +08:00
						 NULL, &data->want_obj);
2016-06-12 18:53:45 +08:00
				parents = parents->next;
			}
2020-06-11 20:05:10 +08:00
			add_object_array(object, NULL, &data->extra_edge_obj);
2016-06-12 18:53:45 +08:00
		}
		/* make sure commit traversal conforms to client */
2018-05-18 06:51:44 +08:00
		register_shallow(the_repository, &object->oid);
2016-06-12 18:53:45 +08:00
	}
2016-06-12 18:53:48 +08:00
}
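`send_shallow()` and `send_unshallow()` report each boundary commit as a `shallow <oid>` or `unshallow <oid>` pkt-line through `packet_writer_write()`. The sketch below illustrates that framing. It assumes the usual pkt-line layout: a four-digit hex length that counts the four header bytes as well as the payload. The helper `format_pkt_line()` is hypothetical, not a Git function. It leaves out side-band prefixes and any trailing newline.

```c
#include <stdio.h>
#include <string.h>

/* Assumed pkt-line maximum (header included), meant to match LARGE_PACKET_MAX. */
#define SKETCH_PKT_MAX 65520

/*
 * Hypothetical helper: frame "<keyword> <hex_oid>" as a pkt-line.
 * The four hex digits of the header count themselves plus the payload.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if the line would not fit.
 */
static int format_pkt_line(char *out, size_t outsz,
			   const char *keyword, const char *hex_oid)
{
	size_t payload = strlen(keyword) + 1 + strlen(hex_oid);
	size_t total = 4 + payload;

	if (total > SKETCH_PKT_MAX || total + 1 > outsz)
		return -1;
	/* "%04x" renders the length header, e.g. 52 -> "0034" */
	snprintf(out, outsz, "%04x%s %s", (unsigned)total, keyword, hex_oid);
	return (int)total;
}
```

For a 40-hex object name, `shallow <oid>` has a 48-byte payload. The header therefore reads `0034` (52 in hex).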

2024-08-09 23:37:50 +08:00
static int check_ref(const char *refname_full, const char *referent UNUSED, const struct object_id *oid,
2018-12-19 05:24:35 +08:00
		     int flag, void *cb_data);
2020-06-11 20:05:06 +08:00
static void deepen(struct upload_pack_data *data, int depth)
2016-06-12 18:53:48 +08:00
{
2018-05-18 06:51:46 +08:00
	if (depth == INFINITE_DEPTH && !is_repository_shallow(the_repository)) {
2016-06-12 18:53:48 +08:00
		int i;

2020-06-11 20:05:06 +08:00
		for (i = 0; i < data->shallows.nr; i++) {
			struct object *object = data->shallows.objects[i].item;
2016-06-12 18:53:48 +08:00
			object->flags |= NOT_SHALLOW;
		}
2020-06-11 20:05:06 +08:00
	} else if (data->deepen_relative) {
fetch, upload-pack: --deepen=N extends shallow boundary by N commits
In git-fetch, the --depth argument is always relative to the latest
remote refs. This makes it difficult to cover the use case where the
user wants to make the shallow history, say, 3 levels deeper. It would
work if the remote refs have not moved yet, but nobody can guarantee
that, especially when the deepening happens a couple of months after
the last clone or "git fetch --depth". Also, modifying the shallow
boundary using --depth does not work well with clones created by
--since or --not.
This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where the remote
refs are.
Have/Want negotiation is still respected. So if the remote refs move,
the server will send two chunks: one between "have" and "want" and
another to extend the shallow history. In theory, the client could send
no "want"s in order to get the second chunk only, but the protocol does
not allow that. Either you send no want lines, which means ls-remote,
or you have to send at least one want line that carries the
deepen-relative capability to the server.
The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.
(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with the
remote side (and is more complicated to implement as a result).
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
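A toy model can help contrast --deepen=<N> with --depth. It assumes a single linear history in which `parent[i]` holds the index of commit `i`'s parent and `-1` marks the root. The array layout and the name `deepen_boundary()` are illustrative only. In this model, deepening means walking <N> more parent links from the client's current shallow boundary, regardless of where the remote refs now point. In upload-pack.c itself, `deepen()` instead passes the client's reachable shallow commits to `get_shallow_commits()` with `depth + 1`.

```c
/*
 * Toy model only: a single linear history where parent[i] is the
 * index of commit i's parent, or -1 for the root commit.
 * Returns the commit that becomes the new shallow boundary when the
 * current boundary `shallow` is extended by `n` more parents.  If the
 * history ends first, the walk stops at the root.
 */
static int deepen_boundary(const int *parent, int shallow, int n)
{
	int cur = shallow;

	while (n > 0 && parent[cur] >= 0) {
		cur = parent[cur];
		n--;
	}
	return cur;
}
```

With `parent = {1, 2, 3, 4, -1}` and the boundary at commit 1, deepening by 2 moves the boundary to commit 3. This holds whether or not new commits were pushed on top of the refs in the meantime.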
2016-06-12 18:54:09 +08:00
		struct object_array reachable_shallows = OBJECT_ARRAY_INIT;
		struct commit_list *result;

2018-12-19 05:24:35 +08:00
		/*
		 * Checking for reachable shallows requires that our refs be
		 * marked with OUR_REF.
		 */
2024-05-07 15:11:53 +08:00
		refs_head_ref_namespaced(get_main_ref_store(the_repository),
					 check_ref, data);
upload-pack.c: avoid enumerating hidden refs where possible
In a similar fashion as a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.
Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:
- `uploadpack.allowTipSHA1InWant`, or
- `uploadpack.allowReachableSHA1InWant`
is set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.
When either of these bits is set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.
When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` is set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
When neither of those bits is set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):
    $ printf 0000 >in
    $ hyperfine --warmup=1 \
      'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
    Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ): 406.9 ms ± 1.1 ms [User: 357.3 ms, System: 49.5 ms]
      Range (min … max): 405.7 ms … 409.2 ms 10 runs
    Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
      Time (mean ± σ): 406.5 ms ± 1.3 ms [User: 356.5 ms, System: 49.9 ms]
      Range (min … max): 404.6 ms … 408.8 ms 10 runs
    Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
      Time (mean ± σ): 4.7 ms ± 0.2 ms [User: 0.7 ms, System: 3.9 ms]
      Range (min … max): 4.3 ms … 6.1 ms 472 runs
    Summary
      'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
        86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
        86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'
As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
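The rule above reduces to a single bit test. In the sketch below, the two flags are local stand-ins whose values are assumptions, not copies of the constants in upload-pack.c. `must_enumerate_hidden_refs()` is a hypothetical name. It returns true exactly when either bit is set, i.e., when hidden references must still be visited so that their objects can be marked.

```c
#include <stdbool.h>

/* Local stand-ins; the real constants live in upload-pack.c. */
#define SKETCH_ALLOW_TIP_SHA1       01u
#define SKETCH_ALLOW_REACHABLE_SHA1 02u

/*
 * Hidden refs have to be enumerated only when is_our_ref() may look at
 * the HIDDEN_REF bit, i.e. when tip or reachable wants are allowed.
 */
static bool must_enumerate_hidden_refs(unsigned int allow_uor)
{
	return (allow_uor & (SKETCH_ALLOW_TIP_SHA1 |
			     SKETCH_ALLOW_REACHABLE_SHA1)) != 0;
}
```

When the function returns false, only the `OUR_REF` bit matters, so hidden references can be left out of the enumeration.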
2023-07-11 05:12:45 +08:00
		for_each_namespaced_ref_1(check_ref, data);
2018-12-19 05:24:35 +08:00

2020-06-11 20:05:11 +08:00
		get_reachable_list(data, &reachable_shallows);
2016-06-12 18:54:09 +08:00
		result = get_shallow_commits(&reachable_shallows,
					     depth + 1,
					     SHALLOW, NOT_SHALLOW);
2020-06-11 20:05:09 +08:00
		send_shallow(data, result);
2016-06-12 18:54:09 +08:00
		free_commit_list(result);
		object_array_clear(&reachable_shallows);
2016-06-12 18:53:48 +08:00
	} else {
		struct commit_list *result;

2020-06-11 20:05:06 +08:00
		result = get_shallow_commits(&data->want_obj, depth,
2016-06-12 18:53:48 +08:00
					     SHALLOW, NOT_SHALLOW);
2020-06-11 20:05:09 +08:00
		send_shallow(data, result);
2016-06-12 18:53:48 +08:00
		free_commit_list(result);
	}

2020-06-11 20:05:08 +08:00
	send_unshallow(data);
2016-06-12 18:53:45 +08:00
}

2020-06-11 20:05:07 +08:00
static void deepen_by_rev_list(struct upload_pack_data *data,
			       int ac,
			       const char **av)
2016-06-12 18:53:58 +08:00
{
	struct commit_list *result;

upload-pack: disable commit graph more gently for shallow traversal
When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.
That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!
So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?
There are a couple of general approaches:
1. The obvious answer is to avoid loading the tree from the graph when
   we see that it's NULL. But then what do we return for the tree oid?
   If we return NULL, our caller in do_traverse() will rightly
   complain that we have no tree. We'd have to fall back to loading the
   actual commit object and re-parsing it. That requires teaching
   parse_commit_buffer() to understand re-parsing (i.e., not starting
   from a clean slate and not leaking any allocated bits like parent
   list pointers).
2. When we close the commit graph, walk through the set of in-memory
   objects and clear any graph_pos pointers. But this means we also
   have to "unparse" any such commits so that we know they still need
   to open the commit object to fill in their trees. So it's no less
   complicated than (1), and is more expensive (since we clear objects
   we might not later need).
3. Stop freeing the commit-graph struct. Continue to let it be used
   for lazy-loads of tree oids, but let upload-pack specify that it
   shouldn't be used for further commit parsing.
4. Push the whole shallow rev-list out to its own sub-process, with
   the commit-graph disabled from the start, giving it a clean memory
   space to work from.
I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).
The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
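Option (3) can be pictured with a deliberately simplified model. The types below are hypothetical stand-ins, not Git's `struct repository` or commit-graph structures. Disabling the graph only flips a flag that gates new commit parsing, and the graph data stays allocated. A commit that already recorded a graph position can therefore still look up its tree oid lazily instead of touching freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_GRAPH_SIZE 4

/* Hypothetical stand-ins, not Git's structures. */
struct toy_graph {
	const char *tree_oids[TOY_GRAPH_SIZE]; /* tree oid per graph position */
};

struct toy_repo {
	struct toy_graph *graph; /* kept allocated after "disabling" */
	bool graph_disabled;     /* gates new parsing only */
};

static void toy_disable_commit_graph(struct toy_repo *r)
{
	r->graph_disabled = true; /* r->graph is intentionally not freed */
}

/* New commits may be parsed from the graph only while it is enabled. */
static bool toy_can_parse_from_graph(const struct toy_repo *r)
{
	return r->graph != NULL && !r->graph_disabled;
}

/*
 * A commit parsed earlier from graph position `pos` can still lazily
 * load its tree oid, because the graph data is still there.
 */
static const char *toy_lazy_tree_oid(const struct toy_repo *r, int pos)
{
	if (r->graph == NULL || pos < 0 || pos >= TOY_GRAPH_SIZE)
		return NULL;
	return r->graph->tree_oids[pos];
}
```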
2019-09-12 22:44:45 +08:00
	disable_commit_graph(the_repository);
2016-06-12 18:53:58 +08:00
	result = get_shallow_commits_by_rev_list(ac, av, SHALLOW, NOT_SHALLOW);
2020-06-11 20:05:09 +08:00
	send_shallow(data, result);
2016-06-12 18:53:58 +08:00
	free_commit_list(result);
2020-06-11 20:05:08 +08:00
	send_unshallow(data);
2018-03-16 01:31:28 +08:00
}

/* Returns 1 if a shallow list is sent or 0 otherwise */
2020-06-11 20:05:05 +08:00
static int send_shallow_list(struct upload_pack_data *data)
2018-03-16 01:31:28 +08:00
{
	int ret = 0;

2020-06-11 20:05:05 +08:00
	if (data->depth > 0 && data->deepen_rev_list)
2018-03-16 01:31:28 +08:00
		die("git upload-pack: deepen and deepen-since (or deepen-not) cannot be used together");
2020-06-11 20:05:05 +08:00
	if (data->depth > 0) {
2020-06-11 20:05:06 +08:00
		deepen(data, data->depth);
2018-03-16 01:31:28 +08:00
		ret = 1;
2020-06-11 20:05:05 +08:00
	} else if (data->deepen_rev_list) {
2020-07-29 04:25:12 +08:00
		struct strvec av = STRVEC_INIT;
2018-03-16 01:31:28 +08:00
		int i;

2020-07-29 04:25:12 +08:00
		strvec_push(&av, "rev-list");
2020-06-11 20:05:05 +08:00
		if (data->deepen_since)
2020-07-29 04:25:12 +08:00
			strvec_pushf(&av, "--max-age=%"PRItime, data->deepen_since);
2024-02-29 06:37:44 +08:00
		if (oidset_size(&data->deepen_not)) {
			const struct object_id *oid;
			struct oidset_iter iter;
2020-07-29 04:25:12 +08:00
			strvec_push(&av, "--not");
2024-02-29 06:37:44 +08:00
			oidset_iter_init(&data->deepen_not, &iter);
			while ((oid = oidset_iter_next(&iter)))
2024-02-29 06:37:20 +08:00
				strvec_push(&av, oid_to_hex(oid));
2020-07-29 04:25:12 +08:00
			strvec_push(&av, "--not");
2018-03-16 01:31:28 +08:00
		}
2020-06-11 20:05:05 +08:00
		for (i = 0; i < data->want_obj.nr; i++) {
			struct object *o = data->want_obj.objects[i].item;
2020-07-29 04:25:12 +08:00
			strvec_push(&av, oid_to_hex(&o->oid));
2018-03-16 01:31:28 +08:00
		}
2020-07-29 08:37:20 +08:00
		deepen_by_rev_list(data, av.nr, av.v);
2020-07-29 04:25:12 +08:00
		strvec_clear(&av);
2018-03-16 01:31:28 +08:00
		ret = 1;
	} else {
2020-06-11 20:05:05 +08:00
		if (data->shallows.nr > 0) {
2018-03-16 01:31:28 +08:00
			int i;
2020-06-11 20:05:05 +08:00
			for (i = 0; i < data->shallows.nr; i++)
2018-07-19 03:20:27 +08:00
				register_shallow(the_repository,
2020-06-11 20:05:05 +08:00
						 &data->shallows.objects[i].item->oid);
2018-03-16 01:31:28 +08:00
		}
	}

2020-06-11 20:05:09 +08:00
	data->shallow_nr += data->shallows.nr;
2018-03-16 01:31:28 +08:00
	return ret;
2016-06-12 18:53:58 +08:00
}
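The deepen-since/deepen-not branch of `send_shallow_list()` assembles a rev-list command line of the form `rev-list [--max-age=<t>] [--not <excluded>... --not] <wants>...`. The sketch below reproduces that ordering with plain string arrays instead of `struct strvec` and `struct oidset`. Several details are assumptions made for illustration: the name `build_rev_list_args()`, the fixed `SKETCH_MAX_ARGS` limit, and the use of `long long` in place of `timestamp_t`/`PRItime`.

```c
#include <stdio.h>

#define SKETCH_MAX_ARGS 64

/*
 * Hypothetical sketch of the argument layout used for deepen-since /
 * deepen-not:  rev-list [--max-age=<t>] [--not <excl>... --not] <wants>...
 * `av` must have room for SKETCH_MAX_ARGS entries, and `age_buf` must
 * outlive `av`, since av[1] may point into it.
 * Returns the number of arguments, or -1 if SKETCH_MAX_ARGS is too small.
 */
static int build_rev_list_args(const char **av,
			       char *age_buf, size_t age_sz,
			       long long deepen_since,
			       const char **deepen_not, int nr_not,
			       const char **wants, int nr_wants)
{
	int ac = 0, i;

	/* worst case: rev-list, --max-age, two --not, plus both lists */
	if (4 + nr_not + nr_wants > SKETCH_MAX_ARGS)
		return -1;

	av[ac++] = "rev-list";
	if (deepen_since) {
		snprintf(age_buf, age_sz, "--max-age=%lld", deepen_since);
		av[ac++] = age_buf;
	}
	if (nr_not > 0) {
		av[ac++] = "--not";	/* following revisions are excluded */
		for (i = 0; i < nr_not; i++)
			av[ac++] = deepen_not[i];
		av[ac++] = "--not";	/* switch negation back off */
	}
	for (i = 0; i < nr_wants; i++)
		av[ac++] = wants[i];
	return ac;
}
```

The second `--not` turns negation back off. The wanted objects that follow are therefore positive starting points of the walk.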

2018-03-15 02:31:42 +08:00
static int process_shallow(const char *line, struct object_array *shallows)
{
	const char *arg;
	if (skip_prefix(line, "shallow ", &arg)) {
		struct object_id oid;
		struct object *object;
		if (get_oid_hex(arg, &oid))
			die("invalid shallow line: %s", line);
2018-06-29 09:21:51 +08:00
		object = parse_object(the_repository, &oid);
2018-03-15 02:31:42 +08:00
		if (!object)
			return 1;
		if (object->type != OBJ_COMMIT)
			die("invalid shallow object %s", oid_to_hex(&oid));
		if (!(object->flags & CLIENT_SHALLOW)) {
			object->flags |= CLIENT_SHALLOW;
			add_object_array(object, NULL, shallows);
		}
		return 1;
	}

	return 0;
}

static int process_deepen(const char *line, int *depth)
{
	const char *arg;
	if (skip_prefix(line, "deepen ", &arg)) {
		char *end = NULL;
		*depth = (int)strtol(arg, &end, 0);
		if (!end || *end || *depth <= 0)
			die("Invalid deepen: %s", line);
		return 1;
	}

	return 0;
}
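`process_deepen()` accepts any integer that `strtol()` can parse with base 0, and it rejects trailing garbage and non-positive values. A non-dying variant of the same checks might look like the sketch below. `parse_deepen_line()` is hypothetical. It returns -1 where upload-pack would call `die()`, and it also rejects values that do not fit in an `int` instead of casting them.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical non-dying variant of the "deepen <n>" checks.
 * Returns 1 and stores the depth for a valid line, 0 if the line is
 * not a deepen line at all, and -1 where upload-pack would die().
 */
static int parse_deepen_line(const char *line, int *depth)
{
	static const char prefix[] = "deepen ";
	const char *arg;
	char *end = NULL;
	long v;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	arg = line + sizeof(prefix) - 1;
	v = strtol(arg, &end, 0); /* base 0: decimal, 0x hex, leading-0 octal */
	if (!end || *end || v <= 0 || v > INT_MAX)
		return -1;
	*depth = (int)v;
	return 1;
}
```

Because base 0 is used, as in the original, `deepen 0x10` is read as a depth of 16. A `deepen-since` line does not match the `deepen ` prefix at all.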

static int process_deepen_since(const char *line, timestamp_t *deepen_since, int *deepen_rev_list)
{
	const char *arg;
	if (skip_prefix(line, "deepen-since ", &arg)) {
		char *end = NULL;
		*deepen_since = parse_timestamp(arg, &end, 0);
		if (!end || *end || !deepen_since ||
		    /* revisions.c's max_age -1 is special */
		    *deepen_since == -1)
			die("Invalid deepen-since: %s", line);
		*deepen_rev_list = 1;
		return 1;
	}
	return 0;
}

2024-02-29 06:37:44 +08:00
static int process_deepen_not(const char *line, struct oidset *deepen_not, int *deepen_rev_list)
2018-03-15 02:31:42 +08:00
{
	const char *arg;
	if (skip_prefix(line, "deepen-not ", &arg)) {
		char *ref = NULL;
		struct object_id oid;
2019-04-06 19:34:27 +08:00
		if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
2018-03-15 02:31:42 +08:00
			die("git upload-pack: ambiguous deepen-not: %s", line);
2024-02-29 06:37:44 +08:00
		oidset_insert(deepen_not, &oid);
2018-03-15 02:31:42 +08:00
		free(ref);
		*deepen_rev_list = 1;
		return 1;
	}
	return 0;
2016-06-12 18:53:58 +08:00
}

upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which cannot be optimized this way and
therefore represent a perceived performance degradation (as well as an
increased load factor on the server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows that kind of object filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
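The lookup order described above can be sketched as follows: the per-kind setting comes first, then 'uploadpackfilter.allow', which defaults to true. The tri-state rule table and the function name are hypothetical. In upload-pack.c, `check_one_filter()` makes the same decision using `string_list_lookup()` on `data->allowed_filters` and falling back to `data->allow_filter_fallback`. The filter kind strings in the example are illustrative.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state for one 'uploadpackfilter.<kind>.allow' entry. */
enum sketch_allow { SKETCH_UNSET = -1, SKETCH_DENY = 0, SKETCH_ALLOW = 1 };

struct sketch_filter_rule {
	const char *kind;		/* illustrative names, e.g. "tree" */
	enum sketch_allow allow;
};

/*
 * Use the per-kind setting if it is configured; otherwise fall back to
 * the value of 'uploadpackfilter.allow' passed in as `fallback`.
 */
static bool sketch_filter_allowed(const struct sketch_filter_rule *rules,
				  size_t nr, const char *kind,
				  bool fallback)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(rules[i].kind, kind) &&
		    rules[i].allow != SKETCH_UNSET)
			return rules[i].allow == SKETCH_ALLOW;
	return fallback;
}
```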
2020-08-04 02:00:10 +08:00
NORETURN __attribute__((format(printf,2,3)))
static void send_err_and_die(struct upload_pack_data *data,
			     const char *fmt, ...)
{
	struct strbuf buf = STRBUF_INIT;
	va_list ap;

	va_start(ap, fmt);
	strbuf_vaddf(&buf, fmt, ap);
	va_end(ap);

	packet_writer_error(&data->writer, "%s", buf.buf);
	die("%s", buf.buf);
}

static void check_one_filter(struct upload_pack_data *data,
			     struct list_objects_filter_options *opts)
{
	const char *key = list_object_filter_config_name(opts->choice);
	struct string_list_item *item = string_list_lookup(&data->allowed_filters,
							   key);
	int allowed;

	if (item)
		allowed = (intptr_t)item->util;
	else
		allowed = data->allow_filter_fallback;

	if (!allowed)
		send_err_and_die(data, "filter '%s' not supported", key);
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level filtering.
In the past, '--filter=tree:<n>' was one such filter that did not have
bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While we could simply write
    $ git config uploadpackfilter.tree.allow true
(which would allow all values of 'n'), we would like to be able to
allow this filter only for certain values of 'n', i.e., those no
greater than some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
    $ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow only the value '0' can
write:
    $ git config uploadpackfilter.tree.allow true
    $ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
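Taken together, the two keys act as two successive checks. The first is the kind-level allow/deny decision. The second compares 'n' against the configured maximum. The minimal sketch below assumes that an unset 'uploadpackfilter.tree.maxDepth' behaves like `ULONG_MAX`, i.e., imposes no limit. That default and the function name are assumptions, not taken from upload-pack.c.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: an unset uploadpackfilter.tree.maxDepth imposes no limit. */
#define SKETCH_TREE_MAX_DEPTH_UNSET ULONG_MAX

/*
 * Hypothetical combined check for a 'tree:<n>' request: the kind must be
 * allowed at all, and n must not exceed the configured maximum depth.
 */
static bool sketch_tree_filter_ok(bool tree_allowed, unsigned long n,
				  unsigned long max_depth)
{
	return tree_allowed && n <= max_depth;
}
```

With `tree.allow` set to true and `tree.maxDepth` set to 0, as in the example above, only `tree:0` passes.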
2020-08-04 02:00:17 +08:00

	if (opts->choice == LOFC_TREE_DEPTH &&
	    opts->tree_exclude_depth > data->tree_filter_max_depth)
		send_err_and_die(data,
				 "tree filter allows max depth %lu, but got %lu",
				 data->tree_filter_max_depth,
				 opts->tree_exclude_depth);
2020-08-04 02:00:10 +08:00
}

static void check_filter_recurse(struct upload_pack_data *data,
				 struct list_objects_filter_options *opts)
{
	size_t i;

	check_one_filter(data, opts);
	if (opts->choice != LOFC_COMBINE)
		return;

	for (i = 0; i < opts->sub_nr; i++)
		check_filter_recurse(data, &opts->sub[i]);
}
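`check_filter_recurse()` checks the filter itself and, for a combined filter, every sub-filter in turn. A single banned kind anywhere in the combination is therefore enough to reject the request. The toy tree below mirrors that recursion. `struct sketch_filter`, the predicate type, and `sketch_deny_tree()` (a sample policy that bans only `tree`) are hypothetical. The sketch returns false instead of dying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy filter tree: a "combine" node lists its sub-filters. */
struct sketch_filter {
	const char *kind;			/* e.g. "combine", "blob:none" */
	const struct sketch_filter *sub;	/* children of a "combine" node */
	size_t sub_nr;
};

typedef bool (*sketch_allowed_fn)(const char *kind);

/* Sample policy: ban only the "tree" kind. */
static bool sketch_deny_tree(const char *kind)
{
	return strcmp(kind, "tree") != 0;
}

/*
 * Mirror of the recursion in check_filter_recurse(): check this node,
 * then, for a "combine" node, each sub-filter. Returns false on the
 * first banned kind instead of dying.
 */
static bool sketch_check_filter(const struct sketch_filter *f,
				sketch_allowed_fn allowed)
{
	size_t i;

	if (!allowed(f->kind))
		return false;
	if (strcmp(f->kind, "combine") != 0)
		return true;
	for (i = 0; i < f->sub_nr; i++)
		if (!sketch_check_filter(&f->sub[i], allowed))
			return false;
	return true;
}
```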

static void die_if_using_banned_filter(struct upload_pack_data *data)
{
	check_filter_recurse(data, &data->filter_options);
}

2020-05-15 18:04:47 +08:00
static void receive_needs(struct upload_pack_data *data,
			  struct packet_reader *reader)
2005-07-05 06:29:17 +08:00
{
2011-08-06 04:54:06 +08:00
	int has_non_tip = 0;
2005-07-05 06:29:17 +08:00

2020-06-11 20:05:09 +08:00
	data->shallow_nr = 0;
2005-07-05 06:29:17 +08:00
	for (;;) {
2005-10-25 09:59:18 +08:00
		struct object *o;
2012-01-09 05:06:19 +08:00
		const char *features;
2017-05-07 06:10:28 +08:00
		struct object_id oid_buf;
2016-06-12 18:53:49 +08:00
		const char *arg;
2023-04-15 05:25:20 +08:00
		size_t feature_len;
2016-06-12 18:53:49 +08:00

2020-06-05 01:54:40 +08:00
		reset_timeout(data->timeout);
2018-12-30 05:19:14 +08:00
		if (packet_reader_read(reader) != PACKET_READ_NORMAL)
2006-10-31 03:09:06 +08:00
			break;
2005-10-06 05:49:54 +08:00

2020-05-15 18:04:54 +08:00
		if (process_shallow(reader->line, &data->shallows))
2006-10-31 03:09:06 +08:00
			continue;
2020-05-15 18:04:54 +08:00
		if (process_deepen(reader->line, &data->depth))
allow cloning a repository "shallowly"
By specifying a depth, you can now clone a repository such that
all fetched ancestor-chains' length is at most "depth". For example,
if the upstream repository has only 2 branches ("A" and "B"), which
are linear, and you specify depth 3, you will get A, A~1, A~2, A~3,
B, B~1, B~2, and B~3. The ends are automatically made shallow
commits.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-31 03:09:29 +08:00
			continue;
2020-05-15 18:04:54 +08:00
		if (process_deepen_since(reader->line, &data->deepen_since, &data->deepen_rev_list))
2016-06-12 18:53:58 +08:00
			continue;
2020-05-15 18:04:54 +08:00
		if (process_deepen_not(reader->line, &data->deepen_not, &data->deepen_rev_list))
2016-06-12 18:54:03 +08:00
			continue;
2018-03-15 02:31:42 +08:00

2018-12-30 05:19:14 +08:00
		if (skip_prefix(reader->line, "filter ", &arg)) {
2020-06-05 01:54:42 +08:00
			if (!data->filter_capability_requested)
2017-12-08 23:58:39 +08:00
				die("git upload-pack: filtering capability not negotiated");
2020-05-15 18:04:47 +08:00
			list_objects_filter_die_if_populated(&data->filter_options);
			parse_list_objects_filter(&data->filter_options, arg);
2020-08-04 02:00:10 +08:00
			die_if_using_banned_filter(data);
2017-12-08 23:58:39 +08:00
			continue;
		}
2018-05-08 14:59:15 +08:00

2018-12-30 05:19:14 +08:00
		if (!skip_prefix(reader->line, "want ", &arg) ||
2018-05-02 08:25:51 +08:00
		    parse_oid_hex(arg, &oid_buf, &features))
2008-09-01 00:39:19 +08:00
			die("git upload-pack: protocol error, "
2018-12-30 05:19:14 +08:00
			    "expected to get object ID, not '%s'", reader->line);
2012-01-09 05:06:19 +08:00

fetch, upload-pack: --deepen=N extends shallow boundary by N commits
In git-fetch, --depth argument is always relative with the latest
remote refs. This makes it a bit difficult to cover this use case,
where the user wants to make the shallow history, say 3 levels
deeper. It would work if remote refs have not moved yet, but nobody
can guarantee that, especially when that use case is performed a
couple months after the last clone or "git fetch --depth". Also,
modifying shallow boundary using --depth does not work well with
clones created by --since or --not.
This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where remote refs
are.
Have/Want negotiation is still respected. So if remote refs move, the
server will send two chunks: one between "have" and "want" and another
to extend shallow history. In theory, the client could send no "want"s
in order to get the second chunk only. But the protocol does not allow
that. Either you send no want lines, which means ls-remote; or you
have to send at least one want line that carries deepen-relative to the
server.
The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.
(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with the remote
side (and is more complicated to implement as a result).
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 18:54:09 +08:00
|
|
|
if (parse_feature_request(features, "deepen-relative"))
|
2020-05-15 18:04:54 +08:00
|
|
|
data->deepen_relative = 1;
|
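The feature list that follows the object ID on a "want" line is a space-separated set of capability names, and deepen-relative is one of them. The helper below is a simplified, hypothetical matcher. has_feature() is not git's parse_feature_request(), whose exact rules may differ. It shows why whole-token matching matters: "multi_ack" must not match inside "multi_ack_detailed", and "side-band" must not match inside "side-band-64k".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does the space-separated list `features`
 * contain `name`, either bare or as "name=value"? Only whole tokens
 * count, so "multi_ack" does not match inside "multi_ack_detailed".
 * Assumes `name` is non-empty and contains no spaces.
 */
static bool has_feature(const char *features, const char *name)
{
	size_t len = strlen(name);
	const char *p = features;

	assert(len > 0);
	while ((p = strstr(p, name)) != NULL) {
		bool at_token_start = (p == features || p[-1] == ' ');
		char next = p[len];
		bool at_token_end = (next == '\0' || next == ' ' || next == '=');

		if (at_token_start && at_token_end)
			return true;
		p++; /* partial match: keep scanning from the next character */
	}
	return false;
}
```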
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "multi_ack_detailed"))
|
2020-06-05 01:54:44 +08:00
|
|
|
data->multi_ack = MULTI_ACK_DETAILED;
|
2012-01-09 05:06:19 +08:00
|
|
|
else if (parse_feature_request(features, "multi_ack"))
|
2020-06-05 01:54:44 +08:00
|
|
|
data->multi_ack = MULTI_ACK;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "no-done"))
|
2020-06-05 01:54:40 +08:00
|
|
|
data->no_done = 1;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "thin-pack"))
|
2020-06-05 01:54:38 +08:00
|
|
|
data->use_thin_pack = 1;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "ofs-delta"))
|
2020-06-05 01:54:38 +08:00
|
|
|
data->use_ofs_delta = 1;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "side-band-64k"))
|
2020-06-05 01:54:41 +08:00
|
|
|
data->use_sideband = LARGE_PACKET_MAX;
|
2012-01-09 05:06:19 +08:00
|
|
|
else if (parse_feature_request(features, "side-band"))
|
2020-06-05 01:54:41 +08:00
|
|
|
data->use_sideband = DEFAULT_PACKET_MAX;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "no-progress"))
|
2020-06-05 01:54:38 +08:00
|
|
|
data->no_progress = 1;
|
2012-01-09 05:06:19 +08:00
|
|
|
if (parse_feature_request(features, "include-tag"))
|
2020-06-05 01:54:38 +08:00
|
|
|
data->use_include_tag = 1;
|
2020-06-05 01:54:47 +08:00
|
|
|
if (data->allow_filter &&
|
|
|
|
parse_feature_request(features, "filter"))
|
2020-06-05 01:54:42 +08:00
|
|
|
data->filter_capability_requested = 1;
|
2005-10-25 09:59:18 +08:00
|
|
|
|
2020-11-12 07:29:32 +08:00
|
|
|
arg = parse_feature_value(features, "session-id", &feature_len, NULL);
|
|
|
|
if (arg) {
|
|
|
|
char *client_sid = xstrndup(arg, feature_len);
|
|
|
|
trace2_data_string("transfer", NULL, "client-sid", client_sid);
|
|
|
|
free(client_sid);
|
|
|
|
}
|
|
|
|
|
upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of
"want" objects, 2022-09-06), we optimized the parse_object() calls for
v2 "want" lines from the client so that they avoided parsing blobs, and
so that they used the commit-graph rather than parsing commit objects
from scratch.
We should extend that to two other spots:
1. We parse "have" objects in the got_oid() function. These won't
generally be non-commits (unlike "want" lines from a partial
clone). But we still benefit from the use of the commit-graph.
2. For v0, the "want" lines are parsed in receive_needs(). These are
also less likely to be non-commits because by default they have to
be ref tips. There are config options you might set to allow
non-tip objects, but you'd mostly do so to support partial clones,
and clients recent enough to support partial clone will generally
speak v2 anyway.
So I don't expect this change to improve performance much for day-to-day
operations. But both are possible denial-of-service vectors, where an
attacker can waste our time by sending over a large number of objects to
parse (of course we may waste even more time serving a pack to them, but
we try as much as possible to optimize that in pack-objects; we should
do what we can here in upload-pack, too).
With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows
similar results to what we saw in 0bc2557951 (which ran with the v2
protocol by default). Here are the numbers for linux.git:
Test                          HEAD^               HEAD
-----------------------------------------------------------------------------
5600.3: checkout of result    50.91(87.95+2.93)   41.75(79.00+3.18) -18.0%
Or for a more extreme (and malicious) case, we can claim to "have" every
blob in git.git over the v0 protocol:
$ {
    echo "0032want $(git rev-parse HEAD)"
    printf 0000
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
  } >input
$ time ./git.old upload-pack . <input >/dev/null
real 0m52.951s
user 0m51.633s
sys 0m1.304s
$ time ./git.new upload-pack . <input >/dev/null
real 0m0.261s
user 0m0.156s
sys 0m0.105s
(Note that these don't actually compute a pack because of the hacky
protocol usage, so those numbers represent the raw blob-parsing
effort done by upload-pack).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:39:03 +08:00
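The "0032want ..." and "0032have ..." lines in the example above use pkt-line framing. Each line starts with four lowercase hex digits giving the total line length, counting the four header bytes themselves, followed by the payload. "0000" is the flush packet. The sketch below uses a hypothetical format_pkt_line() helper rather than git's own packet_* functions, and assumes a 40-hex SHA-1 object ID. It confirms that such a line is 0x32 (50) bytes long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the value of git's LARGE_PACKET_MAX (header included). */
#define PKT_LINE_MAX 65520

/*
 * Hypothetical helper: writes "<4 hex digits><payload>" into `out`,
 * where the hex length counts the 4 header bytes plus the payload.
 * Returns the line length, or -1 if it is too long or does not fit.
 */
static int format_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t total = 4 + strlen(payload);

	assert(out != NULL && payload != NULL);
	if (total > PKT_LINE_MAX || total + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", total, payload);
	return (int)total;
}
```

For a "want" line, the payload is "want " (5 bytes), 40 hex digits, and a newline: 46 bytes. With the 4-byte header that makes 50, hence "0032".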
|
|
|
o = parse_object_with_flags(the_repository, &oid_buf,
|
upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.
But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.
Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.
But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:
pktline() {
  local msg="$*"
  printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}

want_trees() {
  pktline command=fetch
  printf 0001
  git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
  while read oid type; do
    test "$type" = "tree" || continue
    pktline want $oid
  done
  pktline done
  printf 0000
}
want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null
shows a peak heap usage of ~3.7GB, which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.
So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_flags
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually freeing in each spot. And two, it enables an extra
optimization that I'll discuss below.
I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.
I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.
And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.
And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).
These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:39:07 +08:00
|
|
|
PARSE_OBJECT_SKIP_HASH_CHECK |
|
|
|
|
PARSE_OBJECT_DISCARD_TREE);
|
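The two flags passed here interact in a way that fits in a small decision table. The sketch below is a simplified model, not parse_object_with_flags() itself. The enum and macro names are hypothetical stand-ins. It captures only the rule described in the commit messages above: blobs can skip loading when the hash check is skipped, and trees can do so only when the caller also asks for the tree buffer to be discarded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the object types and flags named above. */
enum obj_type { TYPE_COMMIT, TYPE_TREE, TYPE_BLOB, TYPE_TAG };
#define SKIP_HASH_CHECK (1u << 0)
#define DISCARD_TREE    (1u << 1)

/*
 * In this simplified model, returns true if the object's contents must
 * be read from the object store. Blobs can be skipped once the hash
 * check is skipped; trees also need DISCARD_TREE, since otherwise the
 * caller may expect tree->buffer to be filled in. Commits and tags are
 * always treated as needing a parse here.
 */
static bool must_load_contents(enum obj_type type, unsigned flags)
{
	bool skip_hash = (flags & SKIP_HASH_CHECK) != 0;

	assert(type <= TYPE_TAG);
	switch (type) {
	case TYPE_BLOB:
		return !skip_hash;
	case TYPE_TREE:
		return !(skip_hash && (flags & DISCARD_TREE));
	default:
		return true;
	}
}
```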
2017-02-24 02:43:03 +08:00
|
|
|
if (!o) {
|
2020-05-15 18:04:48 +08:00
|
|
|
packet_writer_error(&data->writer,
|
2019-01-16 03:40:27 +08:00
|
|
|
"upload-pack: not our ref %s",
|
|
|
|
oid_to_hex(&oid_buf));
|
2010-08-01 04:11:46 +08:00
|
|
|
die("git upload-pack: not our ref %s",
|
2017-05-07 06:10:28 +08:00
|
|
|
oid_to_hex(&oid_buf));
|
2017-02-24 02:43:03 +08:00
|
|
|
}
|
2005-10-25 09:59:18 +08:00
|
|
|
if (!(o->flags & WANTED)) {
|
|
|
|
o->flags |= WANTED;
|
2020-06-11 20:05:12 +08:00
|
|
|
if (!((data->allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1
|
|
|
|
|| is_our_ref(o, data->allow_uor)))
|
2011-08-06 04:54:06 +08:00
|
|
|
has_non_tip = 1;
|
2020-05-15 18:04:47 +08:00
|
|
|
add_object_array(o, NULL, &data->want_obj);
|
2005-10-25 09:59:18 +08:00
|
|
|
}
|
2005-07-05 06:29:17 +08:00
|
|
|
}
|
2009-06-17 02:41:16 +08:00
|
|
|
|
2011-08-06 04:54:06 +08:00
|
|
|
/*
|
|
|
|
* We have sent all our refs already, and the other end
|
|
|
|
* should have chosen out of them. When we are operating
|
|
|
|
* in the stateless RPC mode, however, their choice may
|
|
|
|
* have been based on the set of older refs advertised
|
|
|
|
* by another process that handled the initial request.
|
|
|
|
*/
|
|
|
|
if (has_non_tip)
|
2020-05-15 18:04:51 +08:00
|
|
|
check_non_tip(data);
|
2011-08-06 04:54:06 +08:00
|
|
|
|
2020-06-05 01:54:41 +08:00
|
|
|
if (!data->use_sideband && data->daemon_mode)
|
2020-06-05 01:54:38 +08:00
|
|
|
data->no_progress = 1;
|
2009-06-17 02:41:16 +08:00
|
|
|
|
2020-05-15 18:04:54 +08:00
|
|
|
if (data->depth == 0 && !data->deepen_rev_list && data->shallows.nr == 0)
|
2006-10-31 03:09:53 +08:00
|
|
|
return;
|
2016-06-12 18:53:58 +08:00
|
|
|
|
2020-06-11 20:05:05 +08:00
|
|
|
if (send_shallow_list(data))
|
2018-03-16 01:31:28 +08:00
|
|
|
packet_flush(1);
|
2005-07-05 06:29:17 +08:00
|
|
|
}
|
|
|
|
|
upload/receive-pack: allow hiding ref hierarchies
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed in the values of these
variables is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and reject the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
|
|
|
/* return non-zero if the ref is hidden, otherwise 0 */
|
2015-11-03 15:58:16 +08:00
|
|
|
static int mark_our_ref(const char *refname, const char *refname_full,
|
2023-07-11 05:12:33 +08:00
|
|
|
const struct object_id *oid, const struct strvec *hidden_refs)
|
2013-01-19 07:48:49 +08:00
|
|
|
{
|
2021-04-13 15:16:36 +08:00
|
|
|
struct object *o = lookup_unknown_object(the_repository, oid);
|
2013-01-19 08:08:30 +08:00
|
|
|
|
2022-11-17 13:46:43 +08:00
|
|
|
if (ref_is_hidden(refname, refname_full, hidden_refs)) {
|
2013-01-29 13:49:57 +08:00
|
|
|
o->flags |= HIDDEN_REF;
|
2013-01-19 08:08:30 +08:00
|
|
|
return 1;
|
2013-01-29 13:49:57 +08:00
|
|
|
}
|
2013-01-29 12:45:43 +08:00
|
|
|
o->flags |= OUR_REF;
|
2013-01-19 07:48:49 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
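The hierarchy rule described in the "allow hiding ref hierarchies" message can be illustrated with a prefix check. The check only matches at a path-component boundary, so hiding refs/pull hides refs/pull/1 but not refs/pulls/1. This is a hypothetical, simplified stand-in for ref_is_hidden(). The real function also understands negated entries (a leading '!') and entries matched against the full ref name before namespace stripping (a leading '^'). Both are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified sketch: a ref is hidden if it equals one of the configured
 * hierarchies or lives underneath it (prefix followed by '/').
 */
static bool ref_under_hidden(const char *refname,
			     const char *const *hidden, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(hidden[i]);

		assert(len > 0);
		if (strncmp(refname, hidden[i], len) != 0)
			continue;
		/* Only a whole path component counts as a match. */
		if (refname[len] == '\0' || refname[len] == '/')
			return true;
	}
	return false;
}
```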
2024-08-09 23:37:50 +08:00
|
|
|
static int check_ref(const char *refname_full, const char *referent UNUSED, const struct object_id *oid,
|
2022-11-17 13:46:43 +08:00
|
|
|
int flag UNUSED, void *cb_data)
|
upload-pack: fix transfer.hiderefs over smart-http
When upload-pack advertises the refs (either for a normal,
non-stateless request, or for the initial contact in a
stateless one), we call for_each_ref with the send_ref
function as its callback. send_ref, in turn, calls
mark_our_ref, which checks whether the ref is hidden, and
sets OUR_REF or HIDDEN_REF on the object as appropriate. If
it is hidden, mark_our_ref also returns "1" to signal
send_ref that the ref should not be advertised.
If we are not advertising refs (i.e., the follow-up
invocation by an http client to send its "want" lines), we
use mark_our_ref directly as a callback to for_each_ref. Its
marking does the right thing, but when it then returns "1"
to for_each_ref, the latter interprets this as an error and
stops iterating. As a result, we skip marking all of the
refs that come lexicographically after it. Any "want" lines
from the client asking for those objects will fail, as they
were not properly marked with OUR_REF.
To solve this, we introduce a wrapper callback around
mark_our_ref which always returns 0 (even if the ref is
hidden, we want to keep iterating). We also tweak the
signature of mark_our_ref to exclude unnecessary parameters
that were present only to conform to the callback interface.
This should make it less likely for somebody to accidentally
use it as a callback in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 12:42:12 +08:00
|
|
|
{
|
2015-11-03 15:58:16 +08:00
|
|
|
const char *refname = strip_namespace(refname_full);
|
2022-11-17 13:46:43 +08:00
|
|
|
struct upload_pack_data *data = cb_data;
|
2015-11-03 15:58:16 +08:00
|
|
|
|
2022-11-17 13:46:43 +08:00
|
|
|
mark_our_ref(refname, refname_full, oid, &data->hidden_refs);
|
2015-03-13 12:42:12 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
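The bug fixed by "upload-pack: fix transfer.hiderefs over smart-http" comes from the iteration contract: a non-zero return from the callback stops the walk. The toy model below uses hypothetical names (for_each_name(), mark(), mark_all()) rather than git's refs API. It assumes that refs under refs/hidden/ are the hidden ones. Using mark() directly stops at the first hidden ref, leaving later refs unmarked. The mark_all() wrapper, like check_ref() above, always returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type in the style of a ref iterator. */
typedef int (*name_cb)(const char *refname, void *data);

/* Calls fn on each name; stops and returns the first non-zero result. */
static int for_each_name(const char *const *names, size_t nr,
			 name_cb fn, void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(names[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Counts the ref as marked; returns 1 if it is "hidden" (refs/hidden/). */
static int mark(const char *refname, void *data)
{
	int *marked = data;

	(*marked)++;
	return strncmp(refname, "refs/hidden/", strlen("refs/hidden/")) == 0;
}

/* Wrapper that marks the ref but always asks to keep iterating. */
static int mark_all(const char *refname, void *data)
{
	mark(refname, data);
	return 0;
}
```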
2013-09-18 07:17:33 +08:00
|
|
|
static void format_symref_info(struct strbuf *buf, struct string_list *symref)
|
|
|
|
{
|
|
|
|
struct string_list_item *item;
|
|
|
|
|
|
|
|
if (!symref->nr)
|
|
|
|
return;
|
|
|
|
for_each_string_list_item(item, symref)
|
|
|
|
strbuf_addf(buf, " symref=%s:%s", item->string, (char *)item->util);
|
|
|
|
}
|
|
|
|
|
2020-11-12 07:29:27 +08:00
|
|
|
static void format_session_id(struct strbuf *buf, struct upload_pack_data *d) {
|
|
|
|
if (d->advertise_sid)
|
|
|
|
strbuf_addf(buf, " session-id=%s", trace2_session_id());
|
|
|
|
}
|
|
|
|
|
upload-pack: advertise capabilities when cloning empty repos
When cloning an empty repository, protocol versions 0 and 1 currently
offer nothing but the header and flush packets for the /info/refs
endpoint. This means that no capabilities are provided, so the client
side doesn't know what capabilities are present.
However, this does pose a problem when working with SHA-256
repositories, since we use the capabilities to know the remote side's
object format (hash algorithm). As of 8b214c2e9d ("clone: propagate
object-format when cloning from void", 2023-04-05), this has been fixed
for protocol v2, since there we always read the hash algorithm from the
remote.
Fortunately, the push version of the protocol already offers a clue
for how to solve this. When the /info/refs endpoint is accessed for a
push and the remote is empty, we include a dummy "capabilities^{}" ref
pointing to the all-zeros object ID. The protocol documentation already
indicates this should _always_ be sent, even for fetches and clones, so
let's just do that, which means we'll properly announce the hash
algorithm as part of the capabilities. This just works with the
existing code because we share the same ref code for fetches and clones,
and libgit2, JGit, and dulwich do as well.
There is one minor issue to fix, though. If we called send_ref with
namespaces, we would return NULL with the capabilities entry, which
would cause a crash. Instead, let's refactor out a function to print
just the ref itself without stripping the namespace and use it for our
special capabilities entry.
Add several sets of tests for HTTP as well as for local clones. The
behavior can be slightly different for HTTP versus a local or SSH clone
because of the stateless-rpc functionality, so it's worth testing both.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-18 03:24:43 +08:00
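On the wire, the first ref in a v0 advertisement carries the capability list after a NUL byte. That is what the "%s %s%c%s..." format in write_v0_ref() below produces. The sketch below builds that payload for the empty-repository case, using the "capabilities^{}" placeholder and an all-zero object ID. first_ref_payload() is a hypothetical helper, and the capability string in the example is abbreviated. The example assumes a 40-hex SHA-1 ID; a SHA-256 repository would use 64 hex digits. The result would still need pkt-line framing before being sent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds "<oid> <refname>\0<caps>\n" in `out`.
 * The payload contains an embedded NUL, so callers must use the
 * returned length rather than strlen(). Returns -1 if it does not fit.
 */
static int first_ref_payload(char *out, size_t outsz, const char *oid_hex,
			     const char *refname, const char *caps)
{
	int n;

	assert(out != NULL && oid_hex != NULL && refname != NULL && caps != NULL);
	n = snprintf(out, outsz, "%s %s%c%s\n", oid_hex, refname, '\0', caps);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return n;
}
```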
|
|
|
static void write_v0_ref(struct upload_pack_data *data,
|
|
|
|
const char *refname, const char *refname_nons,
|
|
|
|
const struct object_id *oid)
|
2005-07-05 04:26:53 +08:00
|
|
|
{
|
2006-10-31 03:09:06 +08:00
|
|
|
static const char *capabilities = "multi_ack thin-pack side-band"
|
2016-06-12 18:54:09 +08:00
|
|
|
" side-band-64k ofs-delta shallow deepen-since deepen-not"
|
|
|
|
" deepen-relative no-progress include-tag multi_ack_detailed";
|
2015-05-26 02:39:13 +08:00
|
|
|
struct object_id peeled;
|
2006-02-18 08:14:52 +08:00
|
|
|
|
2022-11-17 13:46:43 +08:00
|
|
|
if (mark_our_ref(refname_nons, refname, oid, &data->hidden_refs))
|
2023-05-18 03:24:43 +08:00
|
|
|
return;
|
2013-01-19 07:48:49 +08:00
|
|
|
|
2013-09-18 07:17:33 +08:00
|
|
|
if (capabilities) {
|
|
|
|
struct strbuf symref_info = STRBUF_INIT;
|
2020-11-12 07:29:27 +08:00
|
|
|
struct strbuf session_id = STRBUF_INIT;
|
2013-09-18 07:17:33 +08:00
|
|
|
|
2020-05-15 18:04:50 +08:00
|
|
|
format_symref_info(&symref_info, &data->symref);
|
2020-11-12 07:29:27 +08:00
|
|
|
format_session_id(&session_id, data);
|
2021-09-01 20:54:42 +08:00
|
|
|
packet_fwrite_fmt(stdout, "%s %s%c%s%s%s%s%s%s%s object-format=%s agent=%s\n",
|
2015-05-26 02:39:12 +08:00
|
|
|
oid_to_hex(oid), refname_nons,
|
2011-03-30 01:24:59 +08:00
|
|
|
0, capabilities,
|
2020-06-11 20:05:12 +08:00
|
|
|
(data->allow_uor & ALLOW_TIP_SHA1) ?
|
2015-05-22 04:23:38 +08:00
|
|
|
" allow-tip-sha1-in-want" : "",
|
2020-06-11 20:05:12 +08:00
|
|
|
(data->allow_uor & ALLOW_REACHABLE_SHA1) ?
|
2015-05-22 04:23:39 +08:00
|
|
|
" allow-reachable-sha1-in-want" : "",
|
serve.[ch]: remove "serve_options", split up --advertise-refs code
The "advertise capabilities" mode of serve.c added in
ed10cb952d3 (serve: introduce git-serve, 2018-03-15) is only used by
the http-backend.c to call {upload,receive}-pack with the
--advertise-refs parameter. See 42526b478e3 (Add stateless RPC options
to upload-pack, receive-pack, 2009-10-30).
Let's just make cmd_upload_pack() take the two (v2) or three (v1)
parameters the v2/v1 servicing functions need directly, and pass
those in via the function signature. The logic of whether daemon mode
is implied by the timeout belongs in the v1 function (only used
there).
Once we split up the "advertise v2 refs" from "serve v2 request", it
becomes clear that v2 never cared about those in combination. The only
time it mattered was for v1 to emit its ref advertisement; in that
case we wanted to emit the smart-http-only "no-done" capability.
Since we only do that in the --advertise-refs codepath, let's just have
it set "no_done" itself in v1's upload_pack() just before send_ref().
At that point --advertise-refs and --stateless-rpc in combination are
redundant (the only user is get_info_refs() in http-backend.c), so we
can just pass in --advertise-refs only.
Since we need to touch all the serve() and advertise_capabilities()
codepaths, let's rename them to less clever and more obvious names. This
has been suggested numerous times, most recently in [1], which proposed
protocol_v2_serve_loop(). Let's go with that.
1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05 09:25:42 +08:00
|
|
|
data->no_done ? " no-done" : "",
|
2013-09-18 07:17:33 +08:00
|
|
|
symref_info.buf,
|
2020-06-05 01:54:47 +08:00
|
|
|
data->allow_filter ? " filter" : "",
|
2020-11-12 07:29:27 +08:00
|
|
|
session_id.buf,
|
2020-05-26 03:58:51 +08:00
|
|
|
the_hash_algo->name,
|
2012-08-04 00:19:16 +08:00
|
|
|
git_user_agent_sanitized());
|
2013-09-18 07:17:33 +08:00
|
|
|
strbuf_release(&symref_info);
|
2020-11-12 07:29:27 +08:00
|
|
|
strbuf_release(&session_id);
|
2023-05-18 03:24:43 +08:00
|
|
|
data->sent_capabilities = 1;
|
2013-09-18 07:17:33 +08:00
|
|
|
} else {
|
2021-09-01 20:54:42 +08:00
|
|
|
packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons);
|
2013-09-18 07:17:33 +08:00
|
|
|
}
|
2005-10-28 11:56:41 +08:00
|
|
|
capabilities = NULL;
|
2024-05-17 16:19:04 +08:00
|
|
|
if (!peel_iterated_oid(the_repository, oid, &peeled))
|
2021-09-01 20:54:42 +08:00
|
|
|
packet_fwrite_fmt(stdout, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
|
2023-05-18 03:24:43 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2024-08-09 23:37:50 +08:00
|
|
|
static int send_ref(const char *refname, const char *referent UNUSED, const struct object_id *oid,
|
2023-05-18 03:24:43 +08:00
|
|
|
int flag UNUSED, void *cb_data)
|
|
|
|
{
|
|
|
|
write_v0_ref(cb_data, refname, strip_namespace(refname), oid);
|
2005-07-05 04:26:53 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2024-08-09 23:37:50 +08:00
|
|
|
static int find_symref(const char *refname, const char *referent UNUSED,
|
2022-08-26 01:09:48 +08:00
|
|
|
const struct object_id *oid UNUSED,
|
2015-05-26 02:39:10 +08:00
|
|
|
int flag, void *cb_data)
|
2013-09-18 07:17:33 +08:00
|
|
|
{
|
|
|
|
const char *symref_target;
|
|
|
|
struct string_list_item *item;
|
|
|
|
|
|
|
|
if ((flag & REF_ISSYMREF) == 0)
|
|
|
|
return 0;
|
2024-05-07 15:11:53 +08:00
|
|
|
symref_target = refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
|
|
|
|
refname, 0, NULL, &flag);
|
2013-09-18 07:17:33 +08:00
|
|
|
if (!symref_target || (flag & REF_ISSYMREF) == 0)
|
|
|
|
die("'%s' is a symref but it is not?", refname);
|
upload-pack: strip namespace from symref data
Since 7171d8c15f (upload-pack: send symbolic ref information as
capability, 2013-09-17), we've sent cloning and fetching clients special
information about which branch HEAD is pointing to, so that they don't
have to guess based on matching up commit ids.
However, this feature has never worked properly with the GIT_NAMESPACE
feature. Because upload-pack uses head_ref_namespaced(find_symref), we
do find and report on refs/namespaces/foo/HEAD instead of the actual
HEAD of the repo. This makes sense, since the branch pointed to by the
top-level HEAD may not be advertised at all. But we do two things wrong:
1. We report the full name refs/namespaces/foo/HEAD, instead of just
HEAD. Meaning no client is going to bother doing anything with that
symref, since we're not otherwise advertising it.
2. We report the symref destination using its full name (e.g.,
refs/namespaces/foo/refs/heads/master). That's similarly useless to
the client, who only saw "refs/heads/master" in the advertisement.
We should be stripping the namespace prefix off of both places (which
this patch fixes).
Likely nobody noticed because we tend to do the right thing anyway. Bug
(1) means that we said nothing about HEAD (just refs/namespaces/foo/HEAD).
And so the client half of the code, from a45b5f0552 (connect: annotate
refs with their symref information in get_remote_head(), 2013-09-17),
does not annotate HEAD, and we use the fallback in guess_remote_head(),
matching refs by object id. Which is usually right. It only falls down
in ambiguous cases, like the one laid out in the included test.
This also means that we don't have to worry about breaking anybody who
was putting pre-stripped names into their namespace symrefs when we fix
bug (2). Because of bug (1), nobody would have been using the symref we
advertised in the first place (not to mention that those symrefs would
have appeared broken for any non-namespaced access).
Note that we have separate fixes here for the v0 and v2 protocols. The
symref advertisement moved in v2 to be a part of the ls-refs command.
This actually gets part (1) right, since the symref annotation
piggy-backs on the existing ref advertisement, which is properly
stripped. But it still needs a fix for part (2). The included tests
cover both protocols.
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-23 14:11:21 +08:00
|
|
|
item = string_list_append(cb_data, strip_namespace(refname));
|
|
|
|
item->util = xstrdup(strip_namespace(symref_target));
|
2013-09-18 07:17:33 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which cannot be optimized that way and
represent a perceived performance degradation (as well as an increased load factor on the
server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 02:00:10 +08:00
|
|
|
static int parse_object_filter_config(const char *var, const char *value,
|
config: pass kvi to die_bad_number()
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL value for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:27 +08:00
|
|
|
const struct key_value_info *kvi,
|
|
|
|
struct upload_pack_data *data)
|
2020-08-04 02:00:10 +08:00
|
|
|
{
|
|
|
|
struct strbuf buf = STRBUF_INIT;
|
|
|
|
const char *sub, *key;
|
|
|
|
size_t sub_len;
|
|
|
|
|
|
|
|
if (parse_config_key(var, "uploadpackfilter", &sub, &sub_len, &key))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!sub) {
|
|
|
|
if (!strcmp(key, "allow"))
|
|
|
|
data->allow_filter_fallback = git_config_bool(var, value);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
strbuf_add(&buf, sub, sub_len);
|
|
|
|
|
|
|
|
if (!strcmp(key, "allow"))
|
|
|
|
string_list_insert(&data->allowed_filters, buf.buf)->util =
|
|
|
|
(void *)(intptr_t)git_config_bool(var, value);
|
upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, e.g., those that do not have bitmap-level
filtering.
In the past, the '--filter=tree:<n>' was one such filter that does not
have bitmap-level filtering support, and so was likely to be banned by
administrators.
However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.
While it would be sufficient to simply write
$ git config uploadpackfilter.tree.allow true
(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.
In order to do this, introduce a new configuration key, as follows:
$ git config uploadpackfilter.tree.maxDepth <m>
where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:
$ git config uploadpackfilter.tree.allow true
$ git config uploadpackfilter.tree.maxDepth 0
which allows '--filter=tree:0', but no other values.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 02:00:17 +08:00
|
|
|
else if (!strcmp(buf.buf, "tree") && !strcmp(key, "maxdepth")) {
|
|
|
|
if (!value) {
|
|
|
|
strbuf_release(&buf);
|
|
|
|
return config_error_nonbool(var);
|
|
|
|
}
|
|
|
|
string_list_insert(&data->allowed_filters, buf.buf)->util =
|
|
|
|
(void *)(intptr_t)1;
|
2023-06-29 03:26:27 +08:00
|
|
|
data->tree_filter_max_depth = git_config_ulong(var, value,
|
|
|
|
kvi);
|
2020-08-04 02:00:17 +08:00
|
|
|
}
|
2020-08-04 02:00:10 +08:00
|
|
|
|
|
|
|
strbuf_release(&buf);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:22 +08:00
|
|
|
static int upload_pack_config(const char *var, const char *value,
|
2023-06-29 03:26:27 +08:00
|
|
|
const struct config_context *ctx,
|
2023-06-29 03:26:22 +08:00
|
|
|
void *cb_data)
|
upload/receive-pack: allow hiding ref hierarchies
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed in the values of these
variables is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and reject the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
|
|
|
{
|
2020-06-05 01:54:46 +08:00
|
|
|
struct upload_pack_data *data = cb_data;
|
|
|
|
|
2015-05-22 04:23:38 +08:00
|
|
|
if (!strcmp("uploadpack.allowtipsha1inwant", var)) {
|
|
|
|
if (git_config_bool(var, value))
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor |= ALLOW_TIP_SHA1;
|
2015-05-22 04:23:38 +08:00
|
|
|
else
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor &= ~ALLOW_TIP_SHA1;
|
2015-05-22 04:23:39 +08:00
|
|
|
} else if (!strcmp("uploadpack.allowreachablesha1inwant", var)) {
|
|
|
|
if (git_config_bool(var, value))
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor |= ALLOW_REACHABLE_SHA1;
|
2015-05-22 04:23:39 +08:00
|
|
|
else
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor &= ~ALLOW_REACHABLE_SHA1;
|
2016-11-12 01:23:48 +08:00
|
|
|
} else if (!strcmp("uploadpack.allowanysha1inwant", var)) {
|
|
|
|
if (git_config_bool(var, value))
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor |= ALLOW_ANY_SHA1;
|
2016-11-12 01:23:48 +08:00
|
|
|
else
|
2020-06-11 20:05:12 +08:00
|
|
|
data->allow_uor &= ~ALLOW_ANY_SHA1;
|
2015-05-22 04:23:38 +08:00
|
|
|
} else if (!strcmp("uploadpack.keepalive", var)) {
|
config: pass kvi to die_bad_number()
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL value for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:27 +08:00
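A minimal sketch of the idea behind plumbing source information into numeric parsing: the parser receives a description of where the value came from and uses it only on the error path, falling back to a generic message when no source is available (as with user-provided input in builtin/config.c). `struct kv_origin` and `parse_config_int()` are hypothetical names, not git's API; the message wording is illustrative, and only plain decimal integers are accepted.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "struct key_value_info": where a value came from. */
struct kv_origin {
	const char *filename; /* NULL when the value did not come from a file */
	int linenr;
};

/*
 * Parse "value" as a decimal int. On failure, write a message into "err"
 * that names the config source when "origin" carries one, and fall back
 * to a generic message otherwise. Returns 0 on success, -1 on error.
 */
static int parse_config_int(const char *name, const char *value,
			    const struct kv_origin *origin,
			    int *out, char *err, size_t errlen)
{
	char *end;
	long v;

	assert(name && out && err);
	errno = 0;
	v = value ? strtol(value, &end, 10) : 0;
	if (!value || !*value || *end || errno == ERANGE ||
	    v < INT_MIN || v > INT_MAX) {
		if (origin && origin->filename)
			snprintf(err, errlen,
				 "bad numeric config value '%s' for '%s' in file %s:%d",
				 value ? value : "", name,
				 origin->filename, origin->linenr);
		else
			snprintf(err, errlen,
				 "bad numeric config value '%s' for '%s'",
				 value ? value : "", name);
		return -1;
	}
	*out = (int)v;
	return 0;
}
```

Passing `NULL` for `origin` corresponds to the "no config source" case and keeps the message generic.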
		data->keepalive = git_config_int(var, value, ctx->kvi);
2020-06-05 01:54:46 +08:00
		if (!data->keepalive)
			data->keepalive = -1;
2017-12-08 23:58:39 +08:00
	} else if (!strcmp("uploadpack.allowfilter", var)) {
2020-06-05 01:54:47 +08:00
		data->allow_filter = git_config_bool(var, value);
2018-06-28 06:30:17 +08:00
	} else if (!strcmp("uploadpack.allowrefinwant", var)) {
2020-06-05 01:54:48 +08:00
		data->allow_ref_in_want = git_config_bool(var, value);
2019-01-17 03:28:14 +08:00
	} else if (!strcmp("uploadpack.allowsidebandall", var)) {
2020-06-05 01:54:49 +08:00
		data->allow_sideband_all = git_config_bool(var, value);
2024-02-29 06:50:50 +08:00
	} else if (!strcmp("uploadpack.blobpackfileuri", var)) {
		if (value)
			data->allow_packfile_uris = 1;
Honor core.precomposeUnicode in more places
On Mac's HFS where git sets core.precomposeUnicode to true automatically
by git init/clone, when a user creates a simple unicode refname (in NFC
format) such as españa:
$ git branch españa
different commands would display the branch name differently. For
example, git branch, git log --decorate, and git fast-export all used
65 73 70 61 c3 b1 61 (or "espa\xc3\xb1a")
(NFC form) while show-ref would use
65 73 70 61 6e cc 83 61 (or "espan\xcc\x83a")
(NFD form). A stress test for git filter-repo was tripped up by this
inconsistency, though digging in I found that the problems could
compound; for example, if the user ran
$ git pack-refs --all
and then tried to check out the branch, they would be met with:
$ git checkout españa
error: pathspec 'españa' did not match any file(s) known to git
$ git checkout españa --
fatal: invalid reference: españa
$ git branch
españa
* master
Note that the user could run the `git branch` command first and copy and
paste the `españa` portion of the output and still see the same two
errors. Also, if the user added --no-prune to the pack-refs command,
then they would see three branches: master, españa, and españa (those
last two are NFC vs. NFD forms, even if they render the same).
Further, if the user had the `españa` branch checked out before
running `git pack-refs --all`, the user would be greeted with (note
that I'm trimming trailing output with an ellipsis):
$ git rev-parse HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path...
$ git status
On branch españa
No commits yet...
Or worse, if the user didn't check this stuff first, running `git
commit` will create a new commit with all changes of all of history
being squashed into it.
In addition to pack-refs, one could also get into this state with
upload-pack or anything that calls either pack-refs or upload-pack (e.g.
gc or clone).
Add code in a few places (pack-refs, show-ref, upload-pack) to check and
honor the setting of core.precomposeUnicode to avoid these bugs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25 22:58:54 +08:00
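The two byte sequences quoted above can be compared directly. This sketch shows that a byte-wise `strcmp()`, the kind of comparison a lookup by ref name relies on, treats the NFC and NFD spellings of `españa` as different names, and that precomposing the NFD form makes them match. `precompose_n_tilde()` is a hypothetical toy that handles only this one character; real code needs a full Unicode normalization routine.

```c
#include <assert.h>
#include <string.h>

/* "españa" in the two forms from the commit message. The literals are split
 * so that the trailing 'a' is not swallowed by the preceding \x escape. */
static const char nfc_name[] = "espa\xc3\xb1" "a";  /* U+00F1 */
static const char nfd_name[] = "espan\xcc\x83" "a"; /* 'n' + U+0303 */

/*
 * Toy precomposition: rewrite the decomposed sequence "n" + U+0303
 * (bytes 6e cc 83) into the precomposed U+00F1 (bytes c3 b1).
 * Handles only this one character; "dst" must hold strlen(src) + 1 bytes.
 */
static void precompose_n_tilde(const char *src, char *dst)
{
	while (*src) {
		if (src[0] == 'n' && (unsigned char)src[1] == 0xcc &&
		    (unsigned char)src[2] == 0x83) {
			*dst++ = (char)0xc3;
			*dst++ = (char)0xb1;
			src += 3;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
}
```

The NFC form is 7 bytes and the NFD form is 8, so no byte-wise comparison can consider them equal without normalizing one side first.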
	} else if (!strcmp("core.precomposeunicode", var)) {
		precomposed_unicode = git_config_bool(var, value);
2020-11-12 07:29:27 +08:00
	} else if (!strcmp("transfer.advertisesid", var)) {
		data->advertise_sid = git_config_bool(var, value);
2013-09-08 17:01:31 +08:00
	}
upload-pack: fix broken if/else chain in config callback
The upload_pack_config() callback uses an if/else chain
like:
if (!strcmp(var, "a"))
...
else if (!strcmp(var, "b"))
...
etc
This works as long as the conditions are mutually exclusive,
but one of them is not. 20b20a22f8 (upload-pack: provide a
hook for running pack-objects, 2016-05-18) added:
else if (current_config_scope() != CONFIG_SCOPE_REPO) {
... check some more options ...
}
That was fine in that commit, because it came at the end of
the chain. But later, 10ac85c785 (upload-pack: add object
filtering for partial clone, 2017-12-08) did this:
else if (current_config_scope() != CONFIG_SCOPE_REPO) {
... check some more options ...
} else if (!strcmp("uploadpack.allowfilter", var))
...
We'd always check the scope condition first, meaning we'd
_only_ respect allowfilter when it's in the repo config. You
can see this with:
git -c uploadpack.allowfilter=true upload-pack . | head -1
which will not advertise the filter capability (but will
after this patch). We never noticed because:
- our tests always set it in the repo config
- in protocol v2, we use a different code path that
actually calls repo_config_get_bool() separately, so
that _does_ work. Real-world people experimenting with
this may be using v2.
The more recent uploadpack.allowrefinwant option is in the
same boat.
There are a few possible fixes:
1. Bump the scope conditional back to the bottom of the
chain. But that just means somebody else is likely to
make the same mistake later.
2. Make the conditional more like the others. I.e.:
else if (current_config_scope() != CONFIG_SCOPE_REPO &&
!strcmp(var, "uploadpack.notallowedinrepo"))
This works, but the idea of the original structure was
that we may grow multiple sensitive options like this.
3. Pull it out of the chain entirely. The chain mostly
serves to avoid extra strcmp() calls after we've found
a match. But it's not worth caring about those. In the
worst case, when there isn't a match, we're already
hitting every strcmp (and this happens regularly for
stuff like "core.bare", etc).
This patch does (3).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-24 15:27:52 +08:00
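The ordering problem described above can be reproduced with two tiny callbacks. In `buggy_cb()` the scope test sits in the middle of the else-if chain, so a command-line `uploadpack.allowfilter` never reaches its own branch; `fixed_cb()` applies option (3) and moves the scope-sensitive check out of the chain. The `enum scope` values, `struct settings`, and both callbacks are hypothetical, and boolean parsing is reduced to comparing against "true".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical config scopes and settings for illustration only. */
enum scope { SCOPE_SYSTEM, SCOPE_GLOBAL, SCOPE_REPO, SCOPE_COMMAND };

struct settings {
	int allow_filter;
	const char *hook; /* may only come from outside the repository */
};

/* Buggy: the scope test is not tied to any key, so it swallows every
 * non-repo variable, including "uploadpack.allowfilter". */
static void buggy_cb(const char *var, const char *value, enum scope scope,
		     struct settings *s)
{
	if (scope != SCOPE_REPO) {
		if (!strcmp(var, "uploadpack.packobjectshook"))
			s->hook = value;
	} else if (!strcmp(var, "uploadpack.allowfilter")) {
		s->allow_filter = !strcmp(value, "true");
	}
}

/* Fixed: ordinary options are checked unconditionally, and the
 * scope-sensitive options live in a separate, independent block. */
static void fixed_cb(const char *var, const char *value, enum scope scope,
		     struct settings *s)
{
	if (!strcmp(var, "uploadpack.allowfilter"))
		s->allow_filter = !strcmp(value, "true");

	if (scope != SCOPE_REPO) {
		if (!strcmp(var, "uploadpack.packobjectshook"))
			s->hook = value;
	}
}
```

With this layout, adding another sensitive option only touches the separate block, so the mistake cannot recur by appending to the chain.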
config: pass kvi to die_bad_number()
2023-06-29 03:26:27 +08:00
	if (parse_object_filter_config(var, value, ctx->kvi, data) < 0)
2020-12-03 16:09:42 +08:00
		return -1;
upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which cannot be optimized that way and
represent a perceived performance degradation (as well as an increased
load factor on the server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 02:00:10 +08:00
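The fallback rule described above (a per-kind setting wins, otherwise `uploadpackfilter.allow` applies, which itself defaults to true) amounts to a tri-state lookup. In this sketch, `enum filter_kind`, `struct filter_policy`, and `filter_allowed()` are hypothetical names, and -1 stands for "not configured".

```c
#include <assert.h>

/* Hypothetical subset of filter kinds named in the commit message. */
enum filter_kind { FILTER_BLOB_NONE, FILTER_BLOB_LIMIT, FILTER_TREE, FILTER_NR };

struct filter_policy {
	int allow_all;           /* uploadpackfilter.allow */
	int per_kind[FILTER_NR]; /* uploadpackfilter.<kind>.allow; -1 = unset */
};

static void filter_policy_init(struct filter_policy *p)
{
	p->allow_all = 1; /* default true, for backwards compatibility */
	for (int i = 0; i < FILTER_NR; i++)
		p->per_kind[i] = -1;
}

/* An explicit per-kind setting wins; otherwise use the global default. */
static int filter_allowed(const struct filter_policy *p, enum filter_kind kind)
{
	assert((int)kind >= 0 && (int)kind < FILTER_NR);
	return p->per_kind[kind] >= 0 ? p->per_kind[kind] : p->allow_all;
}
```

For example, an administrator who only wants bitmap-friendly filters would set the global default to false and re-enable individual kinds.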
2022-11-17 13:46:43 +08:00
	return parse_hide_refs_config(var, value, "uploadpack", &data->hidden_refs);
upload/receive-pack: allow hiding ref hierarchies
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed in the values of these
variables is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and reject the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
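A simplified sketch of hierarchy-based hiding: a ref is hidden when a configured value equals it or names one of its parent hierarchies, with matching anchored at a `/` boundary. `ref_is_hidden_sketch()` is a hypothetical helper; it assumes the configured values carry no trailing slash and models none of the refinements the real matching code may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if "refname" equals one of "prefixes" or lies below one of
 * them. The match must end at '\0' or '/', so "refs/hidden" hides
 * "refs/hidden/22" but not "refs/hiddenish".
 */
static int ref_is_hidden_sketch(const char *refname,
				const char *const *prefixes, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(prefixes[i]);

		if (!strncmp(refname, prefixes[i], len) &&
		    (refname[len] == '\0' || refname[len] == '/'))
			return 1;
	}
	return 0;
}
```

Anchoring at a path separator is what makes the setting act on hierarchies rather than on arbitrary string prefixes.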
}
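Each `uploadpack.allow*sha1inwant` branch above sets or clears a single bit in `data->allow_uor` without disturbing the others. A minimal sketch of that pattern follows; `set_flag()` and the `*_SKETCH` constants are hypothetical, and their values are chosen for illustration rather than copied from upload-pack.c.

```c
#include <assert.h>

/* Hypothetical flag values, one bit each. */
enum {
	ALLOW_TIP_SHA1_SKETCH = 1 << 0,
	ALLOW_REACHABLE_SHA1_SKETCH = 1 << 1,
	ALLOW_ANY_SHA1_SKETCH = 1 << 2
};

/* Set or clear the bits in "bit", leaving all other bits untouched. */
static unsigned set_flag(unsigned flags, unsigned bit, int enable)
{
	if (enable)
		flags |= bit;
	else
		flags &= ~bit;
	return flags;
}
```

Because `flags &= ~bit` clears every bit that is set in `bit`, passing a mask that combines several bits disables all of them at once.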
config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29 03:26:22 +08:00
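The callback shape described above can be sketched in miniature: source metadata travels in a context struct, an outer callback forwards the same `ctx` to an inner one, and a caller outside the config machinery builds a default context, in the spirit of the `KVI_INIT` macro. All type and function names here (`struct config_ctx_sketch`, `outer_cb()`, and so on) are hypothetical and are not git's declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct kv_info_sketch {
	const char *origin; /* e.g. a filename; NULL for a synthesized value */
	int linenr;
};

/* Wrapping the info in a context lets members be added later without
 * changing the callback signature. */
struct config_ctx_sketch {
	struct kv_info_sketch kvi;
};

typedef int (*config_cb_sketch)(const char *var, const char *value,
				const struct config_ctx_sketch *ctx, void *data);

struct seen {
	int count;
	const char *last_origin;
};

/* Inner callback: records where the most recent value came from. */
static int record_cb(const char *var, const char *value,
		     const struct config_ctx_sketch *ctx, void *data)
{
	struct seen *s = data;

	(void)var;
	(void)value;
	s->count++;
	s->last_origin = ctx->kvi.origin;
	return 0;
}

/* Outer callback: handles one key itself and forwards everything else,
 * passing the same ctx down to the inner callback. */
static int outer_cb(const char *var, const char *value,
		    const struct config_ctx_sketch *ctx, void *data)
{
	if (!strcmp(var, "example.handled"))
		return 0;
	return record_cb(var, value, ctx, data);
}

/* Caller outside any config file: supply a default context. */
static int call_with_default_ctx(config_cb_sketch fn, const char *var,
				 const char *value, void *data)
{
	struct config_ctx_sketch ctx = { .kvi = { .origin = NULL, .linenr = -1 } };

	return fn(var, value, &ctx, data);
}
```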
static int upload_pack_protected_config(const char *var, const char *value,
					const struct config_context *ctx UNUSED,
					void *cb_data)
config: learn `git_protected_config()`
`uploadpack.packObjectsHook` is the only 'protected configuration only'
variable today, but we've noted that `safe.directory` and the upcoming
`safe.bareRepository` should also be 'protected configuration only'. So,
for consistency, we'd like to have a single implementation for protected
configuration.
The primary constraints are:
1. Reading from protected configuration should be fast. Nearly all "git"
commands inside a bare repository will read both `safe.directory` and
`safe.bareRepository`, so we cannot afford to be slow.
2. Protected configuration must be readable when the gitdir is not
known. `safe.directory` and `safe.bareRepository` both affect
repository discovery and the gitdir is not known at that point [1].
The chosen implementation in this commit is to read protected
configuration and cache the values in a global configset. This is
similar to the caching behavior we get with the_repository->config.
Introduce git_protected_config(), which reads protected configuration
and caches them in the global configset protected_config. Then, refactor
`uploadpack.packObjectsHook` to use git_protected_config().
The protected configuration functions are named similarly to their
non-protected counterparts, e.g. git_protected_config_check_init() vs
git_config_check_init().
In light of constraint 1, this implementation can still be improved.
git_protected_config() iterates through every variable in
protected_config, which is wasteful, but it makes the conversion simple
because it matches existing patterns. We will likely implement constant
time lookup functions for protected configuration in a future series
(such functions already exist for non-protected configuration, i.e.
repo_config_get_*()).
An alternative that avoids introducing another configset is to continue
to read all config using git_config(), but only accept values that have
the correct config scope [2]. This technically fulfills constraint 2,
because git_config() simply ignores the local and worktree config when
the gitdir is not known. However, this would read incomplete config into
the_repository->config, which would need to be reset when the gitdir is
known and git_config() needs to read the local and worktree config.
Resetting the_repository->config might be reasonable while we only have
these 'protected configuration only' variables, but it's not clear
whether this extends well to future variables.
[1] In this case, we do have a candidate gitdir though, so with a little
refactoring, it might be possible to provide a gitdir.
[2] This is how `uploadpack.packObjectsHook` was implemented prior to
this commit.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15 05:27:59 +08:00
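The chosen implementation (read protected configuration once, cache it in a global set, and iterate over it with a callback) might look roughly like this. `protected_config_sketch()`, `load_protected_sources()`, and the fixed-size cache are hypothetical simplifications; the hard-coded entries stand in for reading system, global, and command-line config, and `load_count` exists only to show that the sources are read once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry {
	const char *key;
	const char *value;
};

typedef int (*entry_cb)(const char *key, const char *value, void *data);

static struct entry protected_cache[8];
static size_t protected_nr;
static int protected_loaded;
static int load_count; /* how often the sources were read */

/* Stand-in for reading every config source except the repository's own. */
static void load_protected_sources(void)
{
	protected_cache[protected_nr++] =
		(struct entry){ "uploadpack.packobjectshook", "/usr/bin/hook" };
	protected_cache[protected_nr++] =
		(struct entry){ "safe.directory", "/srv/repo" };
	load_count++;
}

/* Load once, then serve every caller from the cache; no gitdir is needed.
 * Iteration stops early if the callback returns a negative value. */
static void protected_config_sketch(entry_cb fn, void *data)
{
	if (!protected_loaded) {
		load_protected_sources();
		protected_loaded = 1;
	}
	for (size_t i = 0; i < protected_nr; i++)
		if (fn(protected_cache[i].key, protected_cache[i].value, data) < 0)
			break;
}

static int find_hook(const char *key, const char *value, void *data)
{
	if (!strcmp(key, "uploadpack.packobjectshook"))
		*(const char **)data = value;
	return 0;
}
```

Iterating over every cached entry on each call mirrors the "wasteful but simple" trade-off the message mentions.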
{
	struct upload_pack_data *data = cb_data;

	if (!strcmp("uploadpack.packobjectshook", var))
		return git_config_string(&data->pack_objects_hook, var, value);
	return 0;
}

2024-02-29 06:46:47 +08:00
static void get_upload_pack_config(struct repository *r,
				   struct upload_pack_data *data)
config: learn `git_protected_config()`
2022-07-15 05:27:59 +08:00
{
2024-02-29 06:46:47 +08:00
	repo_config(r, upload_pack_config, data);
config: learn `git_protected_config()`
2022-07-15 05:27:59 +08:00
	git_protected_config(upload_pack_protected_config, data);
2024-02-29 06:47:18 +08:00

	data->allow_sideband_all |= git_env_bool("GIT_TEST_SIDEBAND_ALL", 0);
config: learn `git_protected_config()`
2022-07-15 05:27:59 +08:00
}
serve.[ch]: remove "serve_options", split up --advertise-refs code
The "advertise capabilities" mode of serve.c added in
ed10cb952d3 (serve: introduce git-serve, 2018-03-15) is only used by
the http-backend.c to call {upload,receive}-pack with the
--advertise-refs parameter. See 42526b478e3 (Add stateless RPC options
to upload-pack, receive-pack, 2009-10-30).
Let's just make cmd_upload_pack() take the two (v2) or three (v1)
parameters the v2/v1 servicing functions need directly, and pass
those in via the function signature. The logic of whether daemon mode
is implied by the timeout belongs in the v1 function (only used
there).
Once we split up the "advertise v2 refs" from "serve v2 request" it
becomes clear that v2 never cared about those in combination. The only
time it mattered was for v1 to emit its ref advertisement, in that
case we wanted to emit the smart-http-only "no-done" capability.
Since we only do that in the --advertise-refs codepath let's just have
it set "do_done" itself in v1's upload_pack() just before send_ref(),
at that point --advertise-refs and --stateless-rpc in combination are
redundant (the only user is get_info_refs() in http-backend.c), so we
can just pass in --advertise-refs only.
Since we need to touch all the serve() and advertise_capabilities()
codepaths, let's rename them to less clever and more obvious names; it's
been suggested numerous times, the latest of which is [1]'s suggestion
for protocol_v2_serve_loop(). Let's go with that.
1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05 09:25:42 +08:00
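The flag handling at the top of the v1 `upload_pack()` shown below can be isolated into a pure function, which makes the combinations discussed above easy to check: `--advertise-refs` alone advertises and sets `no_done`, a stateless RPC request without it skips the advertisement, and a non-zero timeout implies daemon mode. `struct v1_state` and `v1_setup()` are hypothetical; the real function keeps this state in `struct upload_pack_data` and goes on to send the refs.

```c
#include <assert.h>

/* Hypothetical subset of the state set up at the start of a v1 session. */
struct v1_state {
	int stateless_rpc;
	int timeout;
	int daemon_mode;
	int no_done;
	int send_advertisement;
};

static struct v1_state v1_setup(int advertise_refs, int stateless_rpc,
				int timeout)
{
	struct v1_state st = { 0 };

	st.stateless_rpc = stateless_rpc;
	st.timeout = timeout;
	if (st.timeout)
		st.daemon_mode = 1; /* a timeout implies daemon mode */
	if (advertise_refs || !st.stateless_rpc) {
		st.send_advertisement = 1;
		if (advertise_refs)
			st.no_done = 1; /* smart-HTTP-only "no-done" capability */
	}
	return st;
}
```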
void upload_pack(const int advertise_refs, const int stateless_rpc,
		 const int timeout)
2005-07-05 04:26:53 +08:00
{
2018-12-30 05:19:14 +08:00
	struct packet_reader reader;
2020-05-15 18:04:45 +08:00
	struct upload_pack_data data;
2005-10-20 05:27:01 +08:00

2020-05-15 18:04:45 +08:00
	upload_pack_data_init(&data);
2024-02-29 06:46:47 +08:00
	get_upload_pack_config(the_repository, &data);
2020-06-05 01:54:45 +08:00
serve.[ch]: remove "serve_options", split up --advertise-refs code
2021-08-05 09:25:42 +08:00
	data.stateless_rpc = stateless_rpc;
	data.timeout = timeout;
	if (data.timeout)
		data.daemon_mode = 1;
2020-05-15 18:04:52 +08:00

2024-05-07 15:11:53 +08:00
	refs_head_ref_namespaced(get_main_ref_store(the_repository),
				 find_symref, &data.symref);
2007-06-07 15:04:01 +08:00
serve.[ch]: remove "serve_options", split up --advertise-refs code
2021-08-05 09:25:42 +08:00
	if (advertise_refs || !data.stateless_rpc) {
2020-06-05 01:54:40 +08:00
		reset_timeout(data.timeout);
2021-08-05 09:25:42 +08:00
		if (advertise_refs)
			data.no_done = 1;
2024-05-07 15:11:53 +08:00
		refs_head_ref_namespaced(get_main_ref_store(the_repository),
					 send_ref, &data);
upload-pack.c: avoid enumerating hidden refs where possible
In a similar fashion as a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.
Note, however, that there are certain cases where we cannot avoid
enumerating even hidden references, in particular when either of:
- `uploadpack.allowTipSHA1InWant`, or
- `uploadpack.allowReachableSHA1InWant`
are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.
When either of these bits are set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.
When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
When neither of those bits are set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):
$ printf 0000 >in
$ hyperfine --warmup=1 \
'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
Time (mean ± σ): 406.9 ms ± 1.1 ms [User: 357.3 ms, System: 49.5 ms]
Range (min … max): 405.7 ms … 409.2 ms 10 runs
Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
Time (mean ± σ): 406.5 ms ± 1.3 ms [User: 356.5 ms, System: 49.9 ms]
Range (min … max): 404.6 ms … 408.8 ms 10 runs
Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
Time (mean ± σ): 4.7 ms ± 0.2 ms [User: 0.7 ms, System: 3.9 ms]
Range (min … max): 4.3 ms … 6.1 ms 472 runs
Summary
'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'
As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
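The decision about whether hidden refs must still be enumerated reduces to a pair of bit tests. The sketch below models it with illustrative flag values (assumed, not copied from `upload-pack.c`). `must_visit_hidden_refs()` is a hypothetical helper, and `is_our_ref()` here is a simplified stand-in that only mirrors the behavior described above.

```c
#include <stdbool.h>

/* Illustrative bit values; assumed for this sketch. */
enum allow_uor {
	ALLOW_TIP_SHA1       = 1 << 0, /* uploadpack.allowTipSHA1InWant */
	ALLOW_REACHABLE_SHA1 = 1 << 1, /* uploadpack.allowReachableSHA1InWant */
};

enum obj_flags {
	OUR_REF    = 1 << 0, /* tip of an advertised (non-hidden) ref */
	HIDDEN_REF = 1 << 1, /* tip of a hidden ref */
};

/* Hidden refs must be visited (to set HIDDEN_REF) only when is_our_ref()
 * will consult that bit, i.e. when either "allow" bit is set. */
static bool must_visit_hidden_refs(unsigned allow)
{
	return (allow & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1)) != 0;
}

/* Accept advertised tips always; accept hidden tips only when allowed. */
static bool is_our_ref(unsigned obj_flags, unsigned allow)
{
	unsigned mask = OUR_REF;

	if (must_visit_hidden_refs(allow))
		mask |= HIDDEN_REF;
	return (obj_flags & mask) != 0;
}
```

When `allow` is zero, `HIDDEN_REF` never affects the answer. Skipping hidden refs during enumeration therefore cannot change which objects are accepted.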
2023-07-11 05:12:45 +08:00
		for_each_namespaced_ref_1(send_ref, &data);
upload-pack: advertise capabilities when cloning empty repos
When cloning an empty repository, protocol versions 0 and 1 currently
offer nothing but the header and flush packets for the /info/refs
endpoint. This means that no capabilities are provided, so the client
side doesn't know what capabilities are present.
However, this does pose a problem when working with SHA-256
repositories, since we use the capabilities to know the remote side's
object format (hash algorithm). As of 8b214c2e9d ("clone: propagate
object-format when cloning from void", 2023-04-05), this has been fixed
for protocol v2, since there we always read the hash algorithm from the
remote.
Fortunately, the push version of the protocol already offers a clue
for how to solve this. When the /info/refs endpoint is accessed for a
push and the remote is empty, we include a dummy "capabilities^{}" ref
pointing to the all-zeros object ID. The protocol documentation already
indicates this should _always_ be sent, even for fetches and clones, so
let's just do that, which means we'll properly announce the hash
algorithm as part of the capabilities. This just works with the
existing code because we share the same ref code for fetches and clones,
and libgit2, JGit, and dulwich do as well.
There is one minor issue to fix, though. If we called send_ref with
namespaces, we would return NULL with the capabilities entry, which
would cause a crash. Instead, let's refactor out a function to print
just the ref itself without stripping the namespace and use it for our
special capabilities entry.
Add several sets of tests for HTTP as well as for local clones. The
behavior can be slightly different for HTTP versus a local or SSH clone
because of the stateless-rpc functionality, so it's worth testing both.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
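To make the dummy entry concrete, here is a sketch of how a single v0 advertisement line could be framed as a pkt-line. The line consists of the object ID, a space, the ref name, a NUL byte, the capability list, and a newline. It is prefixed by a four-hex-digit length that counts the header itself. The helper `format_v0_ref_pkt()` and the capability string in the test are hypothetical; this is not git's `write_v0_ref()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: frame one v0 ref-advertisement line as a pkt-line.
 * Layout: "<hex-oid> <refname>\0<capabilities>\n", preceded by a 4-digit
 * hex length that counts the 4 header bytes too. Returns bytes written,
 * or -1 if the result does not fit. The output is not NUL-terminated. */
static int format_v0_ref_pkt(char *out, size_t outsz, const char *hex_oid,
			     const char *refname, const char *caps)
{
	char payload[1024];
	char hdr[5];
	/* "%c" with '\0' embeds the NUL separating refname from capabilities */
	int len = snprintf(payload, sizeof(payload), "%s %s%c%s\n",
			   hex_oid, refname, '\0', caps);

	if (len < 0 || (size_t)len >= sizeof(payload) || (size_t)len + 4 > outsz)
		return -1;
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned)len + 4);
	memcpy(out, hdr, 4);
	memcpy(out + 4, payload, (size_t)len);
	return len + 4;
}
```

For a SHA-256 repository the capability list would carry `object-format=sha256` and the object ID would be 64 zeros. The framing is otherwise the same, which is how an empty repository can still announce its hash algorithm.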
2023-05-18 03:24:43 +08:00
		if (!data.sent_capabilities) {
			const char *refname = "capabilities^{}";
			write_v0_ref(&data, refname, refname, null_oid());
		}
2021-09-01 20:54:42 +08:00
		/*
		 * fflush stdout before calling advertise_shallow_grafts because send_ref
		 * uses stdio.
		 */
		fflush_or_die(stdout);
2018-03-15 02:31:41 +08:00
		advertise_shallow_grafts(1);
		packet_flush(1);
	} else {
2024-05-07 15:11:53 +08:00
		refs_head_ref_namespaced(get_main_ref_store(the_repository),
					 check_ref, &data);
2023-07-11 05:12:45 +08:00
		for_each_namespaced_ref_1(check_ref, &data);
2017-10-17 01:55:26 +08:00
	}
2018-12-30 05:19:14 +08:00
2021-08-05 09:25:42 +08:00
	if (!advertise_refs) {
2020-05-15 18:04:45 +08:00
		packet_reader_init(&reader, 0, NULL, 0,
				   PACKET_READ_CHOMP_NEWLINE |
				   PACKET_READ_DIE_ON_ERR_PACKET);
2020-05-15 18:04:47 +08:00
		receive_needs(&data, &reader);
2020-10-31 10:39:02 +08:00
		/*
		 * An EOF at this exact point in negotiation should be
		 * acceptable from stateless clients as they will consume the
		 * shallow list before doing subsequent rpc with haves/etc.
		 */
		if (data.stateless_rpc)
			reader.options |= PACKET_READ_GENTLE_ON_EOF;

		if (data.want_obj.nr &&
		    packet_reader_peek(&reader) != PACKET_READ_EOF) {
			reader.options &= ~PACKET_READ_GENTLE_ON_EOF;
2020-05-15 18:04:46 +08:00
			get_common_commits(&data, &reader);
2020-06-16 07:00:20 +08:00
			create_pack_file(&data, NULL);
2020-05-15 18:04:45 +08:00
		}
2018-03-15 02:31:41 +08:00
	}
upload-pack: clear filter_options for each v2 fetch command
Because of the request/response model of protocol v2, the
upload_pack_v2() function is sometimes called twice in the same
process, while 'struct list_objects_filter_options filter_options'
was declared as static at the beginning of 'upload-pack.c'.
This made the check in list_objects_filter_die_if_populated(), which
is called by process_args(), fail the second time upload_pack_v2() is
called, as filter_options had already been populated the first time.
To fix that, filter_options is not static any more. It's now owned
directly by upload_pack(). It's now also part of 'struct
upload_pack_data', so that it's owned indirectly by upload_pack_v2().
In the long term, the goal is to also have upload_pack() use
'struct upload_pack_data', so adding filter_options to this struct
makes more sense than to have it owned directly by upload_pack_v2().
This fixes the first of the 2 bugs documented by d0badf8797
(partial-clone: demonstrate bugs in partial fetch, 2020-02-21).
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
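The bug and its fix can be reduced to a small model. A "die if already populated" check trips on the second request when its state is `static`, but passes when each request owns and clears its own state. All names below are hypothetical stand-ins for `list_objects_filter_options` and `upload_pack_data`. The sketch returns `false` where the real code would die.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct list_objects_filter_options. */
struct filter_opts {
	char *spec; /* NULL until a "filter" argument is parsed */
};

/* Hypothetical stand-in for struct upload_pack_data: per-request state. */
struct request_data {
	struct filter_opts filter;
};

static void request_data_init(struct request_data *d)
{
	d->filter.spec = NULL;
}

static void request_data_clear(struct request_data *d)
{
	free(d->filter.spec);
	d->filter.spec = NULL;
}

/* Models the die-if-populated check: refuse to parse a second filter. */
static bool parse_filter_arg(struct filter_opts *f, const char *spec)
{
	size_t n;

	if (f->spec)
		return false; /* already populated */
	n = strlen(spec) + 1;
	f->spec = malloc(n);
	if (!f->spec)
		return false;
	memcpy(f->spec, spec, n);
	return true;
}

/* One v2 "fetch" command; its filter state lives on this call's stack. */
static bool handle_fetch(const char *filter_spec)
{
	struct request_data data;
	bool ok;

	request_data_init(&data);
	ok = parse_filter_arg(&data.filter, filter_spec);
	request_data_clear(&data);
	return ok;
}
```

Because `handle_fetch()` starts from freshly initialized state each time, a second fetch in the same process behaves like the first. With a single `static struct filter_opts`, the second call would hit the "already populated" branch.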
2020-05-08 16:01:15 +08:00
2020-05-15 18:04:45 +08:00
	upload_pack_data_clear(&data);
2005-07-05 04:26:53 +08:00
}
2008-02-12 19:28:01 +08:00
2019-01-16 03:40:27 +08:00
static int parse_want(struct packet_writer *writer, const char *line,
		      struct object_array *want_obj)
2018-03-16 01:31:27 +08:00
{
	const char *arg;
	if (skip_prefix(line, "want ", &arg)) {
		struct object_id oid;
		struct object *o;

		if (get_oid_hex(arg, &oid))
			die("git upload-pack: protocol error, "
			    "expected to get oid, not '%s'", line);
2022-09-07 07:06:25 +08:00
		o = parse_object_with_flags(the_repository, &oid,
upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.
But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.
Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.
But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:
pktline() {
local msg="$*"
printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}
want_trees() {
pktline command=fetch
printf 0001
git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
while read oid type; do
test "$type" = "tree" || continue
pktline want $oid
done
pktline done
printf 0000
}
want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null
shows a peak heap usage of ~3.7GB, which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.
So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_flags
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually freeing in each spot. And two, it enables an extra
optimization that I'll discuss below.
I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.
I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.
And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.
And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).
These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
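The `pktline()` shell helper in the reproduction above frames each message as a pkt-line. It writes four lowercase hex digits giving the total length (the 4 length bytes, the payload, and the trailing newline), followed by the payload. A C equivalent, which could be handy for building similar test input, might look like the following. This `pktline()` is a hypothetical helper. The `0000` flush and `0001` delimiter packets that the script also emits are written literally and need no length computation.

```c
#include <stdio.h>
#include <string.h>

/* Format msg as a pkt-line into out; returns bytes written, or -1 if the
 * line cannot be encoded or does not fit. Length = 4 (header) +
 * strlen(msg) + 1 (newline), exactly as the shell helper computes it. */
static int pktline(char *out, size_t outsz, const char *msg)
{
	size_t total = 4 + strlen(msg) + 1;

	/* four hex digits cannot exceed 0xffff; +1 leaves room for the NUL */
	if (total > 0xffff || total + 1 > outsz)
		return -1;
	return snprintf(out, outsz, "%04x%s\n", (unsigned)total, msg);
}
```

For example, `command=fetch` is 13 bytes, so its pkt-line is `0012command=fetch\n` (18 bytes in total).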
2024-02-29 06:39:07 +08:00
					    PARSE_OBJECT_SKIP_HASH_CHECK |
					    PARSE_OBJECT_DISCARD_TREE);
2022-03-01 17:33:37 +08:00
2018-03-16 01:31:27 +08:00
		if (!o) {
2019-01-16 03:40:27 +08:00
			packet_writer_error(writer,
					    "upload-pack: not our ref %s",
					    oid_to_hex(&oid));
2018-03-16 01:31:27 +08:00
			die("git upload-pack: not our ref %s",
			    oid_to_hex(&oid));
		}

		if (!(o->flags & WANTED)) {
			o->flags |= WANTED;
2018-10-19 04:43:28 +08:00
			add_object_array(o, NULL, want_obj);
2018-03-16 01:31:27 +08:00
		}

		return 1;
	}

	return 0;
}
2019-01-16 03:40:27 +08:00
static int parse_want_ref(struct packet_writer *writer, const char *line,
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two", will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off to complain loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
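The de-duplication can be illustrated with a tiny string set. Inserting a ref name that is already present is reported to the caller, which can then reject it as a protocol error. This is a hypothetical fixed-capacity, open-addressing table using FNV-1a hashing, not git's `strmap`. As with any hash-based container, iterating it would not follow insertion order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SET_CAP 64 /* fixed capacity for the sketch; must be a power of two */

struct str_set {
	const char *slots[SET_CAP]; /* borrowed pointers; NULL means empty */
	size_t nr;
};

static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Returns 1 if name was already present (a duplicate), 0 if newly added,
 * or -1 if the table is full. Linear probing resolves collisions. */
static int str_set_put(struct str_set *set, const char *name)
{
	size_t i = fnv1a(name) & (SET_CAP - 1);

	for (size_t probes = 0; probes < SET_CAP; probes++) {
		const char *cur = set->slots[i];

		if (!cur) {
			set->slots[i] = name;
			set->nr++;
			return 0;
		}
		if (!strcmp(cur, name))
			return 1;
		i = (i + 1) & (SET_CAP - 1);
	}
	return -1;
}
```

In the `parse_want_ref()` code that follows, a non-NULL return from `strmap_put()` (the previous value stored for that key) plays the same role as the `1` returned here.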
2024-02-29 06:38:40 +08:00
			  struct strmap *wanted_refs,
2023-07-11 05:12:33 +08:00
			  struct strvec *hidden_refs,
2018-10-19 04:43:28 +08:00
			  struct object_array *want_obj)
2018-06-28 06:30:17 +08:00
{
2021-08-13 14:23:50 +08:00
	const char *refname_nons;
	if (skip_prefix(line, "want-ref ", &refname_nons)) {
2018-06-28 06:30:17 +08:00
		struct object_id oid;
2022-03-01 17:33:37 +08:00
		struct object *o = NULL;
2021-08-13 14:23:50 +08:00
		struct strbuf refname = STRBUF_INIT;
2018-06-28 06:30:17 +08:00
2021-08-13 14:23:50 +08:00
		strbuf_addf(&refname, "%s%s", get_git_namespace(), refname_nons);
2022-11-17 13:46:43 +08:00
		if (ref_is_hidden(refname_nons, refname.buf, hidden_refs) ||
2024-05-07 15:11:53 +08:00
		    refs_read_ref(get_main_ref_store(the_repository), refname.buf, &oid)) {
2021-08-13 14:23:50 +08:00
			packet_writer_error(writer, "unknown ref %s", refname_nons);
			die("unknown ref %s", refname_nons);
2018-06-28 06:30:17 +08:00
		}
2021-08-13 14:23:50 +08:00
		strbuf_release(&refname);
2018-06-28 06:30:17 +08:00
2024-02-29 06:38:40 +08:00
		if (strmap_put(wanted_refs, refname_nons, oiddup(&oid))) {
			packet_writer_error(writer, "duplicate want-ref %s",
					    refname_nons);
			die("duplicate want-ref %s", refname_nons);
		}
2018-06-28 06:30:17 +08:00
2022-03-01 17:33:37 +08:00
		if (!starts_with(refname_nons, "refs/tags/")) {
			struct commit *commit = lookup_commit_in_graph(the_repository, &oid);
			if (commit)
				o = &commit->object;
		}

		if (!o)
			o = parse_object_or_die(&oid, refname_nons);
2018-06-28 06:30:17 +08:00
		if (!(o->flags & WANTED)) {
			o->flags |= WANTED;
2018-10-19 04:43:28 +08:00
			add_object_array(o, NULL, want_obj);
2018-06-28 06:30:17 +08:00
		}

		return 1;
	}

	return 0;
}
upload-pack: drop separate v2 "haves" array
When upload-pack sees a "have" line in the v0 protocol, it immediately
calls got_oid() with its argument and potentially produces an ACK
response. In the v2 protocol, we simply record the argument in an
oid_array, and only later process all of the "have" objects by calling
the equivalent of got_oid() on the contents of the array.
This makes some sense, as v2 is a pure request/response protocol, as
opposed to v0's asynchronous negotiation phase. But there's a downside:
a client can send us an infinite number of garbage "have" lines, which
we'll happily slurp into the array, consuming memory. Whereas in v0,
they are limited by the number of objects in the repository (because
got_oid() only records objects we have ourselves, and we avoid
duplicates by setting a flag on the object struct).
We can make v2 behave more like v0 by also calling got_oid() directly
when v2 parses a "have" line. Calling it early like this is OK because
got_oid() itself does not interact with the client; it only confirms
that we have the object and sets a few flags. Note that unlike v0, v2
does not ever (before or after this patch) check the return code of
got_oid(), which lets the caller know whether we have the object. But
again, that makes sense; v0 is using it to asynchronously tell the
client to stop sending. In v2's synchronous protocol, we just discard
those entries (and decide how to ACK at the end of each round).
There is one slight tweak we need, though. In v2's state machine, we
reach the SEND_ACKS state if the other side sent us any "have" lines,
whether they were useful or not. Right now we do that by checking
whether the "have" array had any entries, but if we record only the
useful ones, that doesn't work. Instead, we can add a simple boolean
that tells us whether we saw any have line (even if it was useless).
This lets us drop the "haves" array entirely, as we're now placing
objects directly into the "have_obj" object array (which is where
got_oid() put them in the long run anyway). And as a bonus, we can drop
the secondary "common" array used in process_haves_and_send_acks(). It
was essentially a copy of "haves" minus the objects we do not have. But
now that we are using "have_obj" directly, we know everything in it is
useful. So in addition to protecting ourselves against malicious input,
we should slightly lower our memory usage for normal inputs.
Note that there is one user-visible effect. The trace2 output records
the number of "haves". Previously this was the total number of "have"
lines we saw, but now is the number of useful ones. We could retain the
original meaning by keeping a separate counter, but it doesn't seem
worth the effort; this trace info is for debugging and metrics, and
arguably the count of common oids is at least as useful as the total
count.
Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
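The bounded bookkeeping described above can be modeled as follows. Every "have" line sets a `seen_haves` boolean, but only objects the server actually has, and has not already marked, are appended to the common list, so garbage input cannot grow it. The object store, the `THEY_HAVE` flag value, and the function names are hypothetical simplifications of `got_oid()` and the `have_obj` array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define THEY_HAVE 1u   /* illustrative flag: already recorded as common */
#define MAX_COMMON 16  /* sketch-only capacity; assumes store_nr <= 16 */

struct obj {
	const char *hex; /* object name */
	unsigned flags;
};

struct negotiation {
	struct obj *store;              /* objects this server has */
	size_t store_nr;
	struct obj *common[MAX_COMMON]; /* models the "have_obj" array */
	size_t common_nr;
	bool seen_haves;                /* any "have" line at all, useful or not */
};

static struct obj *lookup(struct negotiation *n, const char *hex)
{
	for (size_t i = 0; i < n->store_nr; i++)
		if (!strcmp(n->store[i].hex, hex))
			return &n->store[i];
	return NULL;
}

/* Handle one "have" line immediately instead of buffering it. */
static void process_have(struct negotiation *n, const char *hex)
{
	struct obj *o = lookup(n, hex);

	n->seen_haves = true; /* still lets the state machine reach SEND_ACKS */
	if (!o || (o->flags & THEY_HAVE))
		return; /* unknown or repeated object: nothing is stored */
	if (n->common_nr >= MAX_COMMON)
		return; /* defensive; unreachable while store_nr <= MAX_COMMON */
	o->flags |= THEY_HAVE;
	n->common[n->common_nr++] = o;
}
```

Memory use is now bounded by the number of objects in the store, not by the number of lines the client sends. `common_nr` corresponds to the "useful haves" count that trace2 reports after this change.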
2024-02-29 06:37:13 +08:00
static int parse_have(const char *line, struct upload_pack_data *data)
2018-03-16 01:31:27 +08:00
{
	const char *arg;
	if (skip_prefix(line, "have ", &arg)) {
		struct object_id oid;
2024-02-29 06:37:13 +08:00
		got_oid(data, arg, &oid);
		data->seen_haves = 1;
2018-03-16 01:31:27 +08:00
return 1;
2017-10-17 01:55:26 +08:00
}

2005-07-05 04:26:53 +08:00
return 0;
}
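A minimal sketch of the bounded "have" handling described in the "drop separate v2 "haves" array" message above. It is standalone C, not upload-pack's real code: the toy object store, `struct toy_negotiation` and `toy_parse_have()` are hypothetical stand-ins for `struct upload_pack_data`, got_oid() and its per-object flag. What it illustrates is that storage grows with the number of distinct objects the server has, never with the number of lines a client sends, while `seen_haves` still records that the round contained "have" lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one object in the server's repository. */
struct toy_object {
	const char *hex; /* object id, hex-encoded */
	int common;      /* plays the role of the flag got_oid() sets */
};

#define TOY_MAX_HAVES 16 /* must be >= the number of known objects */

struct toy_negotiation {
	struct toy_object *objects; /* objects the server actually has */
	size_t nr_objects;
	struct toy_object *have[TOY_MAX_HAVES]; /* like "have_obj" */
	size_t nr_have;
	int seen_haves; /* any "have" line at all, useful or not */
};

/* Linear lookup; NULL means the server does not have the object. */
static struct toy_object *toy_lookup(struct toy_negotiation *n,
				     const char *hex)
{
	for (size_t i = 0; i < n->nr_objects; i++)
		if (!strcmp(n->objects[i].hex, hex))
			return &n->objects[i];
	return NULL;
}

/*
 * Process one request line. Returns 1 if it was a "have" line
 * (handled immediately, nothing buffered), 0 otherwise.
 */
static int toy_parse_have(struct toy_negotiation *n, const char *line)
{
	static const char prefix[] = "have ";
	struct toy_object *o;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return 0;
	n->seen_haves = 1;
	o = toy_lookup(n, line + sizeof(prefix) - 1);
	if (o && !o->common) {
		/* The flag de-duplicates, so each known object is stored once. */
		o->common = 1;
		assert(n->nr_have < TOY_MAX_HAVES);
		n->have[n->nr_have++] = o;
	}
	/* Unknown or repeated ids are simply dropped. */
	return 1;
}
```

A client that sends a thousand "have" lines naming an unknown object leaves `nr_have` unchanged, but `seen_haves` is still set, which is the bit the v2 state machine needs in order to reach SEND_ACKS.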
2018-03-16 01:31:27 +08:00

2023-10-18 05:12:47 +08:00
static void trace2_fetch_info(struct upload_pack_data *data)
{
struct json_writer jw = JSON_WRITER_INIT;

jw_object_begin(&jw, 0);
2024-02-29 06:37:13 +08:00
jw_object_intmax(&jw, "haves", data->have_obj.nr);
2023-10-18 05:12:47 +08:00
jw_object_intmax(&jw, "wants", data->want_obj.nr);
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two" will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off to complain loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:38:40 +08:00
jw_object_intmax(&jw, "want-refs", strmap_get_size(&data->wanted_refs));
2023-10-18 05:12:47 +08:00
jw_object_intmax(&jw, "depth", data->depth);
jw_object_intmax(&jw, "shallows", data->shallows.nr);
jw_object_bool(&jw, "deepen-since", data->deepen_since);
2024-02-29 06:37:44 +08:00
jw_object_intmax(&jw, "deepen-not", oidset_size(&data->deepen_not));
2023-10-18 05:12:47 +08:00
jw_object_bool(&jw, "deepen-relative", data->deepen_relative);
if (data->filter_options.choice)
jw_object_string(&jw, "filter", list_object_filter_config_name(data->filter_options.choice));
else
jw_object_null(&jw, "filter");
jw_end(&jw);

trace2_data_json("upload-pack", the_repository, "fetch-info", &jw);

jw_release(&jw);
}

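The "use a strmap for want-ref lines" message above relies on a hash map to notice a repeated "want-ref" name cheaply and reject it as a protocol error. The sketch below shows the same idea with a fixed-size open-addressing set and FNV-1a hashing. It is not git's strmap (which grows as needed and can own copies of its keys); `struct toy_ref_set` and `toy_add_want_ref()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SET_SIZE 64 /* power of two; a real strmap resizes instead */

struct toy_ref_set {
	const char *slots[TOY_SET_SIZE]; /* borrowed pointers, not copies */
	size_t nr;
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t toy_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Record one "want-ref" name.
 * Returns 0 if the name is new, -1 if it is a duplicate (the server would
 * treat this as a protocol error), -2 if the toy table is full.
 */
static int toy_add_want_ref(struct toy_ref_set *set, const char *refname)
{
	size_t i = toy_hash(refname) & (TOY_SET_SIZE - 1);

	/* At least one slot is always left empty, so this probe terminates. */
	while (set->slots[i]) {
		if (!strcmp(set->slots[i], refname))
			return -1;
		i = (i + 1) & (TOY_SET_SIZE - 1);
	}
	if (set->nr == TOY_SET_SIZE - 1)
		return -2;
	set->slots[i] = refname;
	set->nr++;
	return 0;
}
```

As with strmap, walking `slots` visits names in hash order rather than arrival order, which is the ordering change the commit message accepts for the "wanted-refs" response.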
2018-03-16 01:31:27 +08:00
static void process_args(struct packet_reader *request,
2020-05-15 18:04:43 +08:00
struct upload_pack_data *data)
2018-03-16 01:31:27 +08:00
{
2020-03-27 16:03:38 +08:00
while (packet_reader_read(request) == PACKET_READ_NORMAL) {
2018-03-16 01:31:27 +08:00
const char *arg = request->line;
2018-05-04 07:46:56 +08:00
const char *p;
2018-03-16 01:31:27 +08:00

/* process want */
2020-05-15 18:04:43 +08:00
if (parse_want(&data->writer, arg, &data->want_obj))
2018-03-16 01:31:27 +08:00
continue;
2020-06-05 01:54:48 +08:00
if (data->allow_ref_in_want &&
2019-01-16 03:40:27 +08:00
parse_want_ref(&data->writer, arg, &data->wanted_refs,
2022-11-17 13:46:43 +08:00
&data->hidden_refs, &data->want_obj))
2018-06-28 06:30:17 +08:00
continue;
2018-03-16 01:31:27 +08:00
/* process have line */
2024-02-29 06:37:13 +08:00
if (parse_have(arg, data))
2018-03-16 01:31:27 +08:00
continue;

/* process args like thin-pack */
if (!strcmp(arg, "thin-pack")) {
2020-06-05 01:54:38 +08:00
data->use_thin_pack = 1;
2018-03-16 01:31:27 +08:00
continue;
}
if (!strcmp(arg, "ofs-delta")) {
2020-06-05 01:54:38 +08:00
data->use_ofs_delta = 1;
2018-03-16 01:31:27 +08:00
continue;
}
if (!strcmp(arg, "no-progress")) {
2020-06-05 01:54:38 +08:00
data->no_progress = 1;
2018-03-16 01:31:27 +08:00
continue;
}
if (!strcmp(arg, "include-tag")) {
2020-06-05 01:54:38 +08:00
data->use_include_tag = 1;
2018-03-16 01:31:27 +08:00
continue;
}
if (!strcmp(arg, "done")) {
data->done = 1;
continue;
}
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
if (!strcmp(arg, "wait-for-done")) {
data->wait_for_done = 1;
continue;
}
2018-03-16 01:31:27 +08:00

2018-03-16 01:31:28 +08:00
/* Shallow related arguments */
if (process_shallow(arg, &data->shallows))
continue;
if (process_deepen(arg, &data->depth))
continue;
if (process_deepen_since(arg, &data->deepen_since,
&data->deepen_rev_list))
continue;
if (process_deepen_not(arg, &data->deepen_not,
&data->deepen_rev_list))
continue;
if (!strcmp(arg, "deepen-relative")) {
data->deepen_relative = 1;
continue;
}

2020-06-05 01:54:47 +08:00
if (data->allow_filter && skip_prefix(arg, "filter ", &p)) {
upload-pack: clear filter_options for each v2 fetch command
Because of the request/response model of protocol v2, the
upload_pack_v2() function is sometimes called twice in the same
process, while 'struct list_objects_filter_options filter_options'
was declared as static at the beginning of 'upload-pack.c'.
This made the check in list_objects_filter_die_if_populated(), which
is called by process_args(), fail the second time upload_pack_v2() is
called, as filter_options had already been populated the first time.
To fix that, filter_options is not static any more. It's now owned
directly by upload_pack(). It's now also part of 'struct
upload_pack_data', so that it's owned indirectly by upload_pack_v2().
In the long term, the goal is to also have upload_pack() use
'struct upload_pack_data', so adding filter_options to this struct
makes more sense than to have it owned directly by upload_pack_v2().
This fixes the first of the 2 bugs documented by d0badf8797
(partial-clone: demonstrate bugs in partial fetch, 2020-02-21).
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-08 16:01:15 +08:00
list_objects_filter_die_if_populated(&data->filter_options);
parse_list_objects_filter(&data->filter_options, p);
upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.
However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which aren't and represent a perceived
performance degradation (as well as an increased load factor on the
server).
Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:
- 'uploadpackfilter.allow'
- 'uploadpackfilter.<kind>.allow'
where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.
Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.
If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 02:00:10 +08:00
die_if_using_banned_filter(data);
2018-05-04 07:46:56 +08:00
continue;
}

2024-02-29 06:47:18 +08:00
if (data->allow_sideband_all &&
2019-01-17 03:28:15 +08:00
!strcmp(arg, "sideband-all")) {
2019-01-17 03:28:14 +08:00
data->writer.use_sideband = 1;
continue;
}

2024-02-29 06:50:50 +08:00
if (data->allow_packfile_uris &&
skip_prefix(arg, "packfile-uris ", &p)) {
upload-pack: accept only a single packfile-uri line
When we see a packfile-uri line from the client, we use
string_list_split() to split it on commas and store the result in a
string_list. A single packfile-uri line is therefore limited to storing
~64kb, the size of a pkt-line.
But we'll happily accept multiple such lines, and each line appends to
the string list, growing without bound.
In theory this could be useful, making:
0017packfile-uris http
0018packfile-uris https
equivalent to:
001dpackfile-uris http,https
But the protocol documentation doesn't indicate that this should work
(and indeed, refers to this in the singular as "the following argument
can be included in the client's request"). And the client-side
implementation in fetch-pack has always sent a single line (JGit appears
to understand the line on the server side but has no client-side
implementation, and libgit2 understands neither).
If we were worried about compatibility, we could instead just put a
limit on the maximum number of values we'd accept. The current client
implementation limits itself to only two values: "http" and "https", so
something like "256" would be more than enough. But accepting only a
single line seems more in line with the protocol documentation, and
matches other parts of the protocol (e.g., we will not accept a second
"filter" line).
We'll also make this more explicit in the protocol documentation; as
above, I think this was always the intent, but there's no harm in making
it clear.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:38:46 +08:00
if (data->uri_protocols.nr)
send_err_and_die(data,
"multiple packfile-uris lines forbidden");
2020-06-11 04:57:23 +08:00
string_list_split(&data->uri_protocols, p, ',', -1);
continue;
}

2018-03-16 01:31:27 +08:00
/* ignore unknown lines maybe? */
2018-05-02 08:31:29 +08:00
die("unexpected line: '%s'", arg);
2018-03-16 01:31:27 +08:00
}
2020-03-27 16:03:38 +08:00

2020-06-11 04:57:23 +08:00
if (data->uri_protocols.nr && !data->writer.use_sideband)
string_list_clear(&data->uri_protocols, 0);

2020-03-27 16:03:38 +08:00
if (request->status != PACKET_READ_FLUSH)
die(_("expected flush after fetch arguments"));
2023-10-18 05:12:47 +08:00

if (trace2_is_enabled())
trace2_fetch_info(data);
2018-03-16 01:31:27 +08:00
}

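Per the "accept only a single packfile-uri line" message, a second "packfile-uris" argument is now fatal, and the one permitted line is split on commas (process_args() above does this with string_list_split() and ','). A self-contained sketch of that rule, using a fixed-size array in place of a string_list; `struct toy_uri_protocols` and `toy_parse_packfile_uris()` are hypothetical, and `value` is assumed to be the text after the "packfile-uris " prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PROTOCOLS 8
#define TOY_MAX_LEN 16

/* Hypothetical fixed-size stand-in for the uri_protocols string_list. */
struct toy_uri_protocols {
	char items[TOY_MAX_PROTOCOLS][TOY_MAX_LEN];
	size_t nr;
};

/*
 * Handle the value of one "packfile-uris <list>" argument.
 * Returns 0 on success, -1 if an earlier line already filled the list
 * (upload-pack dies with "multiple packfile-uris lines forbidden"),
 * or -2 if the toy storage is too small.
 */
static int toy_parse_packfile_uris(struct toy_uri_protocols *out,
				   const char *value)
{
	const char *start = value;

	if (out->nr)
		return -1;
	for (;;) {
		const char *end = strchr(start, ',');
		size_t len = end ? (size_t)(end - start) : strlen(start);

		if (out->nr == TOY_MAX_PROTOCOLS || len >= TOY_MAX_LEN)
			return -2;
		memcpy(out->items[out->nr], start, len);
		out->items[out->nr][len] = '\0';
		out->nr++;
		if (!end)
			return 0; /* last comma-separated item */
		start = end + 1;
	}
}
```

Under this rule a single "packfile-uris http,https" line yields two entries, while "packfile-uris http" followed by "packfile-uris https" fails on the second line.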
2024-02-29 06:37:13 +08:00
static int send_acks(struct upload_pack_data *data, struct object_array *acks)
2018-03-16 01:31:27 +08:00
{
int i;

2020-06-11 20:05:14 +08:00
packet_writer_write(&data->writer, "acknowledgments\n");
2018-03-16 01:31:27 +08:00

/* Send Acks */
if (!acks->nr)
2020-06-11 20:05:14 +08:00
packet_writer_write(&data->writer, "NAK\n");
2018-03-16 01:31:27 +08:00

for (i = 0; i < acks->nr; i++) {
2020-06-11 20:05:14 +08:00
packet_writer_write(&data->writer, "ACK %s\n",
2024-02-29 06:37:13 +08:00
oid_to_hex(&acks->objects[i].item->oid));
2018-03-16 01:31:27 +08:00
}

2021-05-05 05:16:01 +08:00
if (!data->wait_for_done && ok_to_give_up(data)) {
2018-03-16 01:31:27 +08:00
/* Send Ready */
2020-06-11 20:05:14 +08:00
packet_writer_write(&data->writer, "ready\n");
2018-03-16 01:31:27 +08:00
return 1;
}

return 0;
}

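The "wait-for-done" argument from the "teach independent negotiation" message changes a single decision: whether a round may end with "ready" (and then a packfile) before the client says "done". Condensing that decision, as made by send_acks() and its caller process_haves_and_send_acks(), into a pure function gives the sketch below. `enum toy_round_result` and `toy_finish_round()` are hypothetical, and `ok_to_give_up` stands for the result of the server's ok_to_give_up() check, whose internals are not modeled.

```c
#include <assert.h>

enum toy_round_result {
	TOY_ANOTHER_ROUND, /* acknowledgments end with a flush; client may send more */
	TOY_SEND_PACK      /* server goes on to send the packfile */
};

/*
 * done:          client sent "done"
 * wait_for_done: client sent "wait-for-done"
 * ok_to_give_up: result of the server-side ok_to_give_up() check
 */
static enum toy_round_result toy_finish_round(int done, int wait_for_done,
					      int ok_to_give_up)
{
	if (done)
		return TOY_SEND_PACK; /* no acknowledgments section at all */
	if (!wait_for_done && ok_to_give_up)
		return TOY_SEND_PACK; /* "ready", then a delimiter */
	return TOY_ANOTHER_ROUND;
}
```

With wait-for-done set, only the client saying "done" leads to a packfile, so a client that wants just the negotiation result can stop sending requests once it is satisfied.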
2020-05-15 18:04:43 +08:00
static int process_haves_and_send_acks(struct upload_pack_data *data)
2018-03-16 01:31:27 +08:00
{
int ret = 0;

if (data->done) {
ret = 1;
2024-02-29 06:37:13 +08:00
} else if (send_acks(data, &data->have_obj)) {
2019-01-16 03:40:27 +08:00
packet_writer_delim(&data->writer);
2018-03-16 01:31:27 +08:00
ret = 1;
} else {
/* Add Flush */
2019-01-16 03:40:27 +08:00
|
|
|
packet_writer_flush(&data->writer);
|
2018-03-16 01:31:27 +08:00
|
|
|
ret = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
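The commit message above describes v2 calling got_oid() as soon as each "have" line is parsed. It keeps only objects the server really has, and it sets a `seen_haves` boolean even for useless lines. The following is a minimal sketch of that bookkeeping under stated assumptions. A toy in-memory object store stands in for the repository, and `toy_object`, `TOY_THEY_HAVE`, `toy_haves` and `toy_process_have()` are hypothetical names, not Git APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy object: a hex object name plus a flag word. */
struct toy_object {
	const char *hex;
	unsigned flags;
};

#define TOY_THEY_HAVE (1u << 0) /* stands in for upload-pack's THEY_HAVE */

/* Hypothetical negotiation state mirroring have_obj + seen_haves. */
struct toy_haves {
	struct toy_object *have_obj[16]; /* only objects we actually have */
	size_t nr;
	int seen_haves; /* set on any "have" line, useful or not */
};

/* Linear lookup in the toy store; NULL means "we do not have it". */
static struct toy_object *toy_lookup(struct toy_object *store, size_t n,
				     const char *hex)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(store[i].hex, hex))
			return &store[i];
	return NULL;
}

/*
 * Process one "have <hex>" argument immediately, in the spirit of got_oid():
 * unknown objects are dropped, and the flag keeps repeats out of have_obj,
 * so have_obj never holds more entries than the store has objects
 * (this toy assumes a store of at most 16 objects).
 */
static void toy_process_have(struct toy_haves *h, struct toy_object *store,
			     size_t n, const char *hex)
{
	struct toy_object *o;

	h->seen_haves = 1;
	o = toy_lookup(store, n, hex);
	if (!o || (o->flags & TOY_THEY_HAVE))
		return;
	o->flags |= TOY_THEY_HAVE;
	assert(h->nr < sizeof(h->have_obj) / sizeof(h->have_obj[0]));
	h->have_obj[h->nr++] = o;
}
```

Because the flag lives on the stored object, repeated or unknown "have" lines only flip `seen_haves`; they never grow `have_obj`, which is the memory bound the commit is after.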
2018-06-28 06:30:17 +08:00
static void send_wanted_ref_info(struct upload_pack_data *data)
{
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two" will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off to complain loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:38:40 +08:00
struct hashmap_iter iter;
const struct strmap_entry *e;
2018-06-28 06:30:17 +08:00
2024-02-29 06:38:40 +08:00
if (strmap_empty(&data->wanted_refs))
2018-06-28 06:30:17 +08:00
return;
2019-01-16 03:40:27 +08:00
packet_writer_write(&data->writer, "wanted-refs\n");
2018-06-28 06:30:17 +08:00
2024-02-29 06:38:40 +08:00
strmap_for_each_entry(&data->wanted_refs, &iter, e) {
2019-01-16 03:40:27 +08:00
packet_writer_write(&data->writer, "%s %s\n",
2024-02-29 06:38:40 +08:00
oid_to_hex(e->value),
e->key);
2018-06-28 06:30:17 +08:00
}
2019-01-16 03:40:27 +08:00
packet_writer_delim(&data->writer);
2018-06-28 06:30:17 +08:00
}
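The strmap commit message relies on the map noticing a second insert of the same ref name. A minimal sketch of that duplicate check follows, assuming a fixed-capacity open-addressing set with FNV-1a hashing and linear probing. `refset` and `refset_insert()` are hypothetical and are not Git's strmap API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REFSET_CAP 64 /* hypothetical fixed capacity (power of two) */

/* Hypothetical stand-in for a strmap: stores borrowed ref-name pointers. */
struct refset {
	const char *slot[REFSET_CAP];
	size_t nr;
};

/* 32-bit FNV-1a string hash. */
static uint32_t refset_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Returns 1 if name was added, 0 if it was already present (the case the
 * server now treats as a protocol error), -1 if the set is full.
 */
static int refset_insert(struct refset *set, const char *name)
{
	size_t i;

	assert(set && name);
	i = refset_hash(name) & (REFSET_CAP - 1);
	for (size_t probes = 0; probes < REFSET_CAP; probes++) {
		if (!set->slot[i]) {
			set->slot[i] = name;
			set->nr++;
			return 1;
		}
		if (!strcmp(set->slot[i], name))
			return 0;
		i = (i + 1) & (REFSET_CAP - 1); /* linear probing */
	}
	return -1;
}
```

A caller parsing "want-ref" lines would report a 0 return as the protocol error described above instead of recording the name a second time.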
2020-05-15 18:04:43 +08:00
static void send_shallow_info(struct upload_pack_data *data)
2018-03-16 01:31:28 +08:00
{
/* No shallow info needs to be sent */
if (!data->depth && !data->deepen_rev_list && !data->shallows.nr &&
2018-07-19 03:20:27 +08:00
!is_repository_shallow(the_repository))
2018-03-16 01:31:28 +08:00
return;
2019-01-16 03:40:27 +08:00
packet_writer_write(&data->writer, "shallow-info\n");
2018-03-16 01:31:28 +08:00
2020-06-11 20:05:05 +08:00
if (!send_shallow_list(data) &&
2018-07-19 03:20:27 +08:00
is_repository_shallow(the_repository))
2020-06-11 20:05:06 +08:00
deepen(data, INFINITE_DEPTH);
2018-03-16 01:31:28 +08:00
packet_delim(1);
}
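send_shallow_info() and send_wanted_ref_info() each open a section with a header line and close it with a delimiter packet. The sketch below shows the pkt-line framing behind those calls as described in Git's protocol documentation: a four-digit hex length that counts its own four bytes, "0000" for a flush packet and "0001" for a delimiter packet. `pkt_data()`, `pkt_special()`, `pkt_flush()` and `pkt_delim()` are hypothetical helpers, not the pkt-line.h API, and the 65520-byte limit is assumed to match LARGE_PACKET_MAX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers illustrating pkt-line framing. Each returns the
 * number of wire bytes written into buf, or -1 if buf (or the protocol
 * limit) is too small. buf is NUL-terminated only for convenience.
 */
static int pkt_data(char *buf, size_t size, const char *payload)
{
	size_t len = strlen(payload) + 4; /* the length prefix counts itself */

	if (len > 65520 || len + 1 > size) /* 65520 == LARGE_PACKET_MAX */
		return -1;
	snprintf(buf, size, "%04x%s", (unsigned)len, payload);
	return (int)len;
}

/* Special packets are a bare four-character code with no payload. */
static int pkt_special(char *buf, size_t size, const char *code)
{
	assert(strlen(code) == 4);
	if (size < 5)
		return -1;
	memcpy(buf, code, 5); /* four code characters plus the NUL */
	return 4;
}

static int pkt_flush(char *buf, size_t size)
{
	return pkt_special(buf, size, "0000");
}

static int pkt_delim(char *buf, size_t size)
{
	return pkt_special(buf, size, "0001");
}
```

For example, "shallow-info\n" is 13 bytes, so it goes on the wire as `0011shallow-info\n` (13 + 4 = 17 = 0x11).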
2018-03-16 01:31:27 +08:00
enum fetch_state {
FETCH_PROCESS_ARGS = 0,
FETCH_SEND_ACKS,
FETCH_SEND_PACK,
FETCH_DONE,
};
2024-02-29 06:46:47 +08:00
int upload_pack_v2(struct repository *r, struct packet_reader *request)
2018-03-16 01:31:27 +08:00
{
enum fetch_state state = FETCH_PROCESS_ARGS;
struct upload_pack_data data;
upload-pack: clear flags before each v2 request
Suppose a server has the following commit graph:
A B
\ /
O
We create a client by cloning A from the server with depth 1, and add
many commits to it (so that future fetches span multiple requests due to
lengthy negotiation). If it then fetches B using protocol v2, with the fetch
spanning multiple requests, the resulting packfile does not contain O
even though the client did report that A is shallow.
This is because upload_pack_v2() can be called multiple times while
processing the same session. During the 2nd and all subsequent
invocations, some object flags remain from the previous invocations. In
particular, CLIENT_SHALLOW remains, preventing process_shallow() from
adding client-reported shallows to the "shallows" array, and hence
pack-objects not knowing about these client-reported shallows.
Therefore, teach upload_pack_v2() to clear object flags at the start of
each invocation. This has some other results:
- THEY_HAVE gates addition of objects to have_obj in process_haves().
Previously in upload_pack_v2(), have_obj needed to be static because
once an object is added to have_obj, it is never readded and thus we
needed to retain the contents of have_obj between invocations. Now
that flags are cleared, this is no longer necessary. This patch does
not change the behavior of ok_to_give_up() (THEY_HAVE is still set on
each "have") and got_oid() (used only in non-v2); THEY_HAVE is not
used in any other function.
- WANTED gates addition of objects to want_obj in parse_want() and
parse_want_ref(). It is also used in receive_needs(), but that is
only used in non-v2. For the same reasons as THEY_HAVE, want_obj no
longer needs to be static in upload_pack_v2().
- CLIENT_SHALLOW is changed as discussed above.
Clearing of the other 5 flags does not affect functionality in v2. (Note
that in non-v2, upload_pack() is only called once per process, so each
invocation starts with blank flags anyway.)
- OUR_REF is only used in non-v2.
- COMMON_KNOWN is only used as a scratch flag in ok_to_give_up().
- SHALLOW is passed to invocations in deepen() and
deepen_by_rev_list(), but upload-pack doesn't use it.
- NOT_SHALLOW is used by send_shallow() and send_unshallow(), but
invocations of those functions are always preceded by code that sets
NOT_SHALLOW on the appropriate objects.
- HIDDEN_REF is only used in non-v2.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19 04:43:29 +08:00
clear_object_flags(ALL_FLAGS);
2018-03-16 01:31:27 +08:00
upload_pack_data_init(&data);
2020-06-05 01:54:41 +08:00
data.use_sideband = LARGE_PACKET_MAX;
2024-02-29 06:46:47 +08:00
get_upload_pack_config(r, &data);
2020-06-05 01:54:45 +08:00
2018-03-16 01:31:27 +08:00
while (state != FETCH_DONE) {
switch (state) {
case FETCH_PROCESS_ARGS:
2020-05-15 18:04:43 +08:00
process_args(request, &data);
2018-03-16 01:31:27 +08:00
fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independently of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.
This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.
In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.
There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 05:16:01 +08:00
if (!data.want_obj.nr && !data.wait_for_done) {
2018-03-16 01:31:27 +08:00
/*
2021-05-05 05:16:01 +08:00
* Request didn't contain any 'want' lines (and
* the request does not contain
* "wait-for-done", in which it is reasonable
* to just send 'have's without 'want's); guess
* they didn't want anything.
2018-03-16 01:31:27 +08:00
*/
state = FETCH_DONE;
2024-02-29 06:37:13 +08:00
} else if (data.seen_haves) {
2018-03-16 01:31:27 +08:00
/*
* Request had 'have' lines, so let's ACK them.
*/
state = FETCH_SEND_ACKS;
} else {
/*
* Request had 'want's but no 'have's so we can
2024-09-20 02:34:40 +08:00
* immediately go to construct and send a pack.
2018-03-16 01:31:27 +08:00
*/
state = FETCH_SEND_PACK;
}
break;
case FETCH_SEND_ACKS:
2020-05-15 18:04:43 +08:00
if (process_haves_and_send_acks(&data))
2018-03-16 01:31:27 +08:00
state = FETCH_SEND_PACK;
else
state = FETCH_DONE;
break;
case FETCH_SEND_PACK:
2018-06-28 06:30:17 +08:00
send_wanted_ref_info(&data);
2020-05-15 18:04:43 +08:00
send_shallow_info(&data);
2018-03-16 01:31:28 +08:00
2020-06-11 04:57:23 +08:00
if (data.uri_protocols.nr) {
create_pack_file(&data, &data.uri_protocols);
} else {
packet_writer_write(&data.writer, "packfile\n");
create_pack_file(&data, NULL);
}
2018-03-16 01:31:27 +08:00
state = FETCH_DONE;
break;
case FETCH_DONE:
continue;
}
}
upload_pack_data_clear(&data);
return 0;
}
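The transitions in upload_pack_v2() can be read as a pure function of what process_args() and process_haves_and_send_acks() reported. This sketch restates them, assuming a hypothetical `toy_request` summary struct and a `toy_`-prefixed copy of the enum; it models only the state decisions, not the I/O performed in each state.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors enum fetch_state above; the toy_ prefix marks it as a sketch. */
enum toy_fetch_state {
	TOY_PROCESS_ARGS = 0,
	TOY_SEND_ACKS,
	TOY_SEND_PACK,
	TOY_DONE,
};

/* Hypothetical summary of one parsed v2 "fetch" request. */
struct toy_request {
	size_t want_nr;    /* number of accepted "want"/"want-ref" lines */
	int wait_for_done; /* client sent "wait-for-done" */
	int seen_haves;    /* client sent at least one "have" */
	int ready;         /* nonzero if process_haves_and_send_acks() returned 1 */
};

static enum toy_fetch_state toy_next_state(enum toy_fetch_state state,
					   const struct toy_request *req)
{
	switch (state) {
	case TOY_PROCESS_ARGS:
		if (!req->want_nr && !req->wait_for_done)
			return TOY_DONE;      /* guess they wanted nothing */
		if (req->seen_haves)
			return TOY_SEND_ACKS; /* ACK the haves first */
		return TOY_SEND_PACK;         /* wants but no haves */
	case TOY_SEND_ACKS:
		return req->ready ? TOY_SEND_PACK : TOY_DONE;
	case TOY_SEND_PACK:
	case TOY_DONE:
	default:
		return TOY_DONE;
	}
}

/* Count the states visited until TOY_DONE, including TOY_DONE itself. */
static int toy_run(const struct toy_request *req)
{
	enum toy_fetch_state s = TOY_PROCESS_ARGS;
	int steps = 1;

	while (s != TOY_DONE) {
		s = toy_next_state(s, req);
		steps++;
	}
	assert(steps <= 4);
	return steps;
}
```

Keeping the decision separate from the side effects makes each path (nothing wanted, wants without haves, negotiation that ends with or without a pack) easy to check in isolation.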
2018-03-16 01:31:28 +08:00
int upload_pack_advertise(struct repository *r,
struct strbuf *value)
{
upload-pack: use existing config mechanism for advertisement
When serving a v2 capabilities request, we call upload_pack_advertise()
to tell us the set of features we can advertise to the client. That
involves looking at various config options, all of which need to be kept
in sync with the rules we use in upload_pack_config to set flags like
allow_filter, allow_sideband_all, and so on. If these two pieces of code
get out of sync then we may refuse to respect a capability we
advertised, or vice versa accept one that we should not.
Instead, let's call the same config helper that we'll use for processing
the actual client request, and then just pick the values out of the
resulting struct. This is only a little bit shorter than the current
code, but we don't repeat any policy logic (e.g., we don't have to worry
about the magic sideband-all environment variable here anymore).
And this reveals a gap in the existing code: there is no struct flag for
the packfile-uris capability (we accept it even if it is not advertised,
which we should not). We'll leave the advertisement code for now and
deal with it in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 06:48:18 +08:00
struct upload_pack_data data;
upload_pack_data_init(&data);
get_upload_pack_config(r, &data);
2018-06-28 06:30:17 +08:00
2018-05-04 07:46:56 +08:00
if (value) {
2021-05-05 05:16:01 +08:00
strbuf_addstr(value, "shallow wait-for-done");
2018-06-28 06:30:17 +08:00
2024-02-29 06:48:18 +08:00
if (data.allow_filter)
2018-05-04 07:46:56 +08:00
strbuf_addstr(value, " filter");
2018-06-28 06:30:17 +08:00
2024-02-29 06:48:18 +08:00
if (data.allow_ref_in_want)
2018-06-28 06:30:17 +08:00
strbuf_addstr(value, " ref-in-want");
2019-01-17 03:28:14 +08:00
2024-02-29 06:48:18 +08:00
if (data.allow_sideband_all)
2019-01-17 03:28:14 +08:00
strbuf_addstr(value, " sideband-all");
2020-06-11 04:57:23 +08:00
2024-02-29 06:50:50 +08:00
if (data.allow_packfile_uris)
2020-06-11 04:57:23 +08:00
strbuf_addstr(value, " packfile-uris");
2018-05-04 07:46:56 +08:00
}
2018-06-28 06:30:17 +08:00
2024-02-29 06:48:18 +08:00
upload_pack_data_clear(&data);
2018-03-16 01:31:28 +08:00
return 1;
}
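upload_pack_advertise() now derives the advertised feature list from the same `allow_*` fields that request processing uses. A minimal sketch of that pattern follows, assuming plain char buffers instead of strbuf and a hypothetical `toy_caps` struct holding only the four flags checked above; the feature words and their order follow the code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the allow_* flags filled by the config reader. */
struct toy_caps {
	int allow_filter;
	int allow_ref_in_want;
	int allow_sideband_all;
	int allow_packfile_uris;
};

/* Append word to the NUL-terminated string in buf. */
static void toy_append(char *buf, size_t size, const char *word)
{
	size_t used = strlen(buf);
	size_t len = strlen(word);

	assert(used + len < size); /* room for word and the terminating NUL */
	memcpy(buf + used, word, len + 1);
}

/* Build the feature list in the same order as upload_pack_advertise(). */
static void toy_fetch_caps(const struct toy_caps *c, char *buf, size_t size)
{
	assert(size > 0);
	buf[0] = '\0';
	toy_append(buf, size, "shallow wait-for-done");
	if (c->allow_filter)
		toy_append(buf, size, " filter");
	if (c->allow_ref_in_want)
		toy_append(buf, size, " ref-in-want");
	if (c->allow_sideband_all)
		toy_append(buf, size, " sideband-all");
	if (c->allow_packfile_uris)
		toy_append(buf, size, " packfile-uris");
}
```

Since the advertisement and the request checks read one struct, a capability cannot be advertised under one rule and enforced under another, which is the drift the commit message warns about.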