Commit Graph

301 Commits

Author SHA1 Message Date
Jonathan Tan
6462d5eb9a fetch: remove fetch_if_missing=0
In fetch_pack() (and all functions it calls), pass
OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be
a tree or blob that we do not want to be lazy-fetched even if it is
absent. Thus, the only lazy-fetches occurring for trees and blobs are
when resolving deltas.

Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove
this, and also add a test ensuring that such objects are not
lazy-fetched. (We might be able to remove fetch_if_missing=0 from other
places too, but I have limited myself to builtin/fetch.c in this commit
because I have not written tests for the other commands yet.)

Note that commits and tags may still be lazy-fetched. I limited myself
to objects that could be trees or blobs here because Git does not
support creating such commit- and tag-excluding clones yet, and even if
such a clone were manually created, Git does not have good support for
fetching a single commit (when fetching a commit, it and all its
ancestors would be sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-08 15:26:44 +09:00
Junio C Hamano
3b9ec27919 Merge branch 'js/trace2-fetch-push'
Dev support.

* js/trace2-fetch-push:
  transport: push codepath can take arbitrary repository
  push: add trace2 instrumentation
  fetch: add trace2 instrumentation
2019-10-15 13:48:03 +09:00
Junio C Hamano
676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
Josh Steadmon
5fc31180d8 fetch: add trace2 instrumentation
Add trace2 regions to fetch-pack.c and builtins/fetch.c to better track
time spent in the various phases of a fetch:

* listing refs
* negotiation for protocol versions v0-v2
* fetching refs
* consuming refs

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-03 10:13:17 +09:00
Junio C Hamano
627b826834 Merge branch 'md/list-objects-filter-combo'
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.

* md/list-objects-filter-combo:
  list-objects-filter-options: make parser void
  list-objects-filter-options: clean up use of ALLOC_GROW
  list-objects-filter-options: allow mult. --filter
  strbuf: give URL-encoding API a char predicate fn
  list-objects-filter-options: make filter_spec a string_list
  list-objects-filter-options: move error check up
  list-objects-filter: implement composite filters
  list-objects-filter-options: always supply *errbuf
  list-objects-filter: put omits set in filter struct
  list-objects-filter: encapsulate filter components
2019-09-18 11:50:09 -07:00
brian m. carlson
f6af19a9ad fetch-pack: use parse_oid_hex
Instead of hard-coding constants, use parse_oid_hex to compute a pointer
and use it in further parsing operations.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:57 -07:00
Derrick Stolee
aaf633c2ad repo-settings: create feature.experimental setting
The 'feature.experimental' setting includes config options that are
not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to
use the repo_settings struct if not already using them:

* 'pack.useSparse=true'

* 'fetch.negotiationAlgorithm=skipping'

In the case of fetch.negotiationAlgorithm, the existing logic
would load the config option only when about to use the setting,
so had a die() statement on an unknown string value. This is
removed as now the config is parsed under prepare_repo_settings().
In general, this die() is probably misplaced and not valuable.
A test was removed that checked this die() statement executed.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13 13:33:55 -07:00
Junio C Hamano
b49d337bfb Merge branch 'nd/fetch-capability-tweak'
Protocol capabilities that go over wire should never be translated,
but it was incorrectly marked for translation, which has been
corrected.  The output of protocol capabilities for debugging has
been tweaked a bit.

* nd/fetch-capability-tweak:
  fetch-pack: print server version at the top in -v -v
  fetch-pack: print all relevant supported capabilities with -v -v
  fetch-pack: move capability names out of i18n strings
2019-07-09 15:25:43 -07:00
Matthew DeVore
cf9ceb5a12 list-objects-filter-options: make filter_spec a string_list
Make the filter_spec string a string_list rather than a raw C string.
The list of strings must be concatted together to make a complete
filter_spec. A future patch will use this capability to build "combine:"
filter specs gradually.

A strbuf would seem to be a more natural choice for this object, but it
unfortunately requires initialization besides just zero'ing out the
memory.  This results in all container structs, and all containers of
those structs, etc., to also require initialization. Initializing them
all would be more cumbersome that simply using a string_list, which
behaves properly when its contents are zero'd.

For the purposes of code simplification, change behavior in how filter
specs are conveyed over the protocol: do not normalize the tree:<depth>
filter specs since there should be no server in existence that supports
tree:# but not tree:#k etc.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Nguyễn Thái Ngọc Duy
0e04297101 fetch-pack: print server version at the top in -v -v
Before the previous patch, the server version is printed after all the
"Server supports" lines. The previous one puts the version in the middle
of "Server supports" group.

Instead of moving it to the bottom, I move it to the top. Version may
stand out more at the top as we will have even more debug out after
capabilities.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:01:00 -07:00
Nguyễn Thái Ngọc Duy
5a88583b0b fetch-pack: print all relevant supported capabilities with -v -v
When we check if some capability is supported, we do print something in
verbose mode. Some capabilities are not printed though (and it made me
think it's not supported; I was more used to GIT_TRACE_PACKET) so let's
print them all.

It's a bit more code. And one could argue for printing all supported
capabilities the server sends us. But I think it's still valuable this
way because we see the capabilities that the client cares about.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:00:59 -07:00
Nguyễn Thái Ngọc Duy
0778b2931c fetch-pack: move capability names out of i18n strings
This reduces the work on translators since they only have one string to
translate (and I think it's still enough context to translate). It also
makes sure no capability name is translated by accident.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:00:59 -07:00
Jeff King
d0229abd93 object: convert lookup_object() to use object_id
There are no callers left of lookup_object() that aren't just passing us
the "hash" member of a "struct object_id". Let's take the whole struct,
which gets us closer to removing all raw sha1 variables.  It also
matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical
one-liners.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:18:09 -07:00
Junio C Hamano
0e5387cd02 Merge branch 'jt/clone-server-option'
A brown-paper-bag bugfix to a change already in 'master'.

* jt/clone-server-option:
  fetch-pack: send server options after command
2019-05-30 10:50:46 -07:00
Jonathan Tan
5b204b7df3 fetch-pack: send server options after command
Currently, if any server options are specified during a protocol v2
fetch, server options will be sent before "command=fetch". Write server
options to the request buffer in send_fetch_request() so that the
components of the request are sent in the correct order.

The protocol documentation states that the command must come first. The
Git server implementation in serve.c (see process_request() in that
file) tolerates any order of command and capability, which is perhaps
why we haven't noticed this. This was noticed when testing against a
JGit server implementation, which follows the documentation in this
regard.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28 11:01:07 -07:00
Junio C Hamano
57a6b93236 Merge branch 'jk/fetch-reachability-error-fix'
Code clean-up and a fix for "git fetch" by an explicit object name
(as opposed to fetching refs by name).

* jk/fetch-reachability-error-fix:
  fetch: do not consider peeled tags as advertised tips
  remote.c: make singular free_ref() public
  fetch: use free_refs()
  pkt-line: prepare buffer before handling ERR packets
  upload-pack: send ERR packet for non-tip objects
  t5530: check protocol response for "not our ref"
  t5516: drop ok=sigpipe from unreachable-want tests
2019-04-25 16:41:23 +09:00
Junio C Hamano
732ce7aaca Merge branch 'jt/fetch-no-update-shallow-in-proto-v2'
Fix for protocol v2 support in "git fetch-pack" of shallow clones.

* jt/fetch-no-update-shallow-in-proto-v2:
  fetch-pack: respect --no-update-shallow in v2
  fetch-pack: call prepare_shallow_info only if v0
2019-04-25 16:41:16 +09:00
Junio C Hamano
abd7ccdd4d Merge branch 'jt/fetch-pack-wanted-refs-optim'
Performance fix around "git fetch" that grabs many refs.

* jt/fetch-pack-wanted-refs-optim:
  fetch-pack: binary search when storing wanted-refs
2019-04-25 16:41:16 +09:00
Jeff King
34066f0661 fetch: do not consider peeled tags as advertised tips
Our filter_refs() function accidentally considers the target of a peeled
tag to be advertised by the server, even though upload-pack on the
server side does not consider it so. This can result in the client
making a bogus fetch to the server, which will end with the server
complaining "not our ref". Whereas the correct behavior is for the
client to notice that the server will not allow the request and error
out immediately.

So as bugs go, this is not very serious (the outcome is the same either
way -- the fetch fails). But it's worth making the logic here correct
and consistent with other related cases (e.g., fetching an oid that the
server did not mention at all).

The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow
fetching of literal SHA1s, 2017-05-15). After that, the strategy of
filter_refs() is basically:

  - for each advertised ref, try to match it with a "sought" ref
    provided by the user. Skip any malformed refs (which includes
    peeled values like "refs/tags/foo^{}"), and place any unmatched
    items onto the unmatched list.

  - if there are unmatched sought refs, then put all of the advertised
    tips into an oidset, including the unmatched ones.

  - for each sought ref, see if it's in the oidset, in which case it's
    legal for us to ask the server for it

The problem is in the second step. Our list of unmatched refs includes
the peeled refs, even though upload-pack does not allow them to be
directly fetched. So the simplest fix would be to exclude them during
that step.

However, we can observe that the unmatched list isn't used for anything
else, and is freed at the end. We can just free those malformed refs
immediately. That saves us having to check each ref a second time to see
if it's malformed.

Note that this code only kicks in when "strict" is in effect. I.e., if
we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is
not in effect. With v2, all oids are allowed, and we do not bother
creating or consulting the oidset at all. To future-proof our test
against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark
it as a v0-only test.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15 14:00:52 +09:00
Jeff King
259eddde6a fetch: use free_refs()
There's no need for us to write this loop manually when a helper
function can already do it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15 14:00:51 +09:00
Jonathan Tan
b764300912 fetch-pack: binary search when storing wanted-refs
In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is
not subsequently reordered (within the function). Therefore,
receive_wanted_refs() can assume that "sought" is sorted, and can thus
use a binary search when storing wanted-refs retrieved from the server.

Replace the existing linear search with a binary search. This improves
performance significantly when mirror cloning a repository with more
than 1 million refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:51:05 +09:00
Jonathan Tan
1339078f5e fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.

Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.

This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:35:56 +09:00
Jonathan Tan
1e7d440b0a fetch-pack: call prepare_shallow_info only if v0
In fetch_pack(), be clearer that there is no shallow information before
the fetch when v2 is used - memset the struct shallow_info to 0 instead
of calling prepare_shallow_info().

This patch is in preparation for a future patch in which a v2 fetch
might call prepare_shallow_info() after shallow info has been retrieved
during the fetch, so I needed to ensure that prepare_shallow_info() is
not called before the fetch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:35:54 +09:00
Jeff King
0f804b0bac fetch_pack(): drop unused parameters
We don't need the caller of fetch_pack() to pass in "dest", which is the
remote URL. Since ba227857d2 (Reduce the number of connects when
fetching, 2008-02-04), the caller is responsible for calling
git_connect() itself, and our "dest" parameter is unused.

That commit also started passing us the resulting "conn" child_process
from git_connect(). But likewise, we do not need do anything with it.
The descriptors in "fd" are enough for us, and the caller is responsible
for cleaning up "conn".

We can just drop both parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20 18:34:09 +09:00
Junio C Hamano
27cdbdd134 Merge branch 'jk/no-sigpipe-during-network-transport'
On platforms where "git fetch" is killed with SIGPIPE (e.g. OSX),
the upload-pack that runs on the other end that hangs up after
detecting an error could cause "git fetch" to die with a signal,
which led to a flakey test.  "git fetch" now ignores SIGPIPE during
the network portion of its operation (this is not a problem as we
check the return status from our write(2)s).

* jk/no-sigpipe-during-network-transport:
  fetch: ignore SIGPIPE during network operation
  fetch: avoid calling write_or_die()
2019-03-20 15:16:06 +09:00
Jeff King
37c80012f1 fetch: avoid calling write_or_die()
The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.

Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.

Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-05 15:02:01 +09:00
Junio C Hamano
8f0f46539a Merge branch 'bc/fetch-pack-clear-alternate-shallow'
"git fetch" over protocol v2 that needs to make a second connection
to backfill tags did not clear a variable that holds shallow
repository information correctly, leading to an access of freed
piece of memory.

* bc/fetch-pack-clear-alternate-shallow:
  fetch-pack: clear alternate shallow in one more place
  fetch-pack: clear alternate shallow when complete
2019-02-06 22:05:30 -08:00
brian m. carlson
380ebab209 fetch-pack: clear alternate shallow in one more place
The previous one did not clear the variable in one codepath,
but we should aim to be complete.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
[jc: made a reroll into incremental, as the previous one already is
 in the next branch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06 18:50:49 -08:00
Junio C Hamano
5f8b86db94 Merge branch 'jt/fetch-v2-sideband'
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.

* jt/fetch-v2-sideband:
  tests: define GIT_TEST_SIDEBAND_ALL
  {fetch,upload}-pack: sideband v2 fetch response
  sideband: reverse its dependency on pkt-line
  pkt-line: introduce struct packet_writer
  pack-protocol.txt: accept error packets in any context
  Use packet_reader instead of packet_read_line
2019-02-05 14:26:11 -08:00
Junio C Hamano
073312b4c7 Merge branch 'js/filter-options-should-use-plain-int'
Update the protocol message specification to allow only the limited
use of scaled quantities.  This is ensure potential compatibility
issues will not go out of hand.

* js/filter-options-should-use-plain-int:
  filter-options: expand scaled numbers
  tree:<depth>: skip some trees even when collecting omits
  list-objects-filter: teach tree:# how to handle >0
2019-02-05 14:26:10 -08:00
brian m. carlson
23311f3542 fetch-pack: clear alternate shallow when complete
When we write an alternate shallow file in update_shallow, we write it
into the lock file. The string stored in alternate_shallow_file is
copied from the lock file path, but it is freed the moment that the lock
file is closed, since we call strbuf_release to free that path.

This used to work, since we did not invoke git index-pack more than
once, but now that we do, we reuse the freed memory. Ensure we reset the
value to NULL to avoid using freed memory. git index-pack will read the
repository's shallow file, which will have been updated with the correct
information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04 13:33:32 -08:00
Jonathan Tan
07c3c2aa16 tests: define GIT_TEST_SIDEBAND_ALL
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used
from tests. When set to true, this overrides uploadpack.allowsidebandall
to true, allowing the entire test suite to be run as if this
configuration is in place for all repositories.

As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset
or set to 1.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Jonathan Tan
0bbc0bc574 {fetch,upload}-pack: sideband v2 fetch response
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.

Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.

If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".

This will be tested in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Josh Steadmon
87c2d9d310 filter-options: expand scaled numbers
When communicating with a remote server or a subprocess, use
expanded numbers rather than numbers with scaling suffix in the
object filter spec (e.g.  "limit:blob=1k" becomes
"limit:blob=1024").

Update the protocol docs to note that clients should always perform this
expansion, to allow for more compatibility between server
implementations.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:42:31 -08:00
Junio C Hamano
17069c7fae Merge branch 'ms/packet-err-check' into jt/fetch-v2-sideband
* ms/packet-err-check:
  pack-protocol.txt: accept error packets in any context
  Use packet_reader instead of packet_read_line
2019-01-14 11:16:04 -08:00
Jonathan Tan
5056cf4a62 upload-pack: teach deepen-relative in protocol v2
Commit 685fbd3291 ("fetch-pack: perform a fetch using v2", 2018-03-15)
attempted to teach Git deepen-relative in protocol v2 (among other
things), but it didn't work:

 (1) fetch-pack.c needs to emit "deepen-relative".
 (2) upload-pack.c needs to ensure that the correct deepen_relative
     variable is passed to deepen() (there are two - the static variable
     and the one in struct upload_pack_data).
 (3) Before deepen() computes the list of reachable shallows, it first
     needs to mark all "our" refs as OUR_REF. v2 currently does not do
     this, because unlike v0, it is not needed otherwise.

Fix all this and include a test demonstrating that it works now. For
(2), the static variable deepen_relative is also eliminated, removing a
source of confusion.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 14:53:49 -08:00
Jonathan Tan
bd0b42aed3 fetch-pack: do not take shallow lock unnecessarily
When fetching using protocol v2, the remote may send a "shallow-info"
section if the client is shallow. If so, Git as the client currently
takes the shallow file lock, even if the "shallow-info" section is
empty.

This is not a problem except that Git does not support taking the
shallow file lock after modifying the shallow file, because
is_repository_shallow() stores information that is never cleared. And
this take-after-modify occurs when Git does a tag-following fetch from a
shallow repository on a transport that does not support tag following
(since in this case, 2 fetches are performed).

To solve this issue, take the shallow file lock (and perform all other
shallow processing) only if the "shallow-info" section is non-empty;
otherwise, behave as if it were empty.

A full solution (probably, ensuring that any action of committing
shallow file locks also includes clearing the information stored by
is_repository_shallow()) would solve the issue without need for this
patch, but this patch is independently useful (as an optimization to
prevent writing a file in an unnecessary case), hence why I wrote it. I
have included a NEEDSWORK outlining the full solution.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 14:53:35 -08:00
Junio C Hamano
3b2f8a02fa Merge branch 'jk/loose-object-cache'
Code clean-up with optimization for the codepath that checks
(non-)existence of loose objects.

* jk/loose-object-cache:
  odb_load_loose_cache: fix strbuf leak
  fetch-pack: drop custom loose object cache
  sha1-file: use loose object cache for quick existence check
  object-store: provide helpers for loose_objects_cache
  sha1-file: use an object_directory for the main object dir
  handle alternates paths the same as the main object dir
  sha1_file_name(): overwrite buffer instead of appending
  rename "alternate_object_database" to "object_directory"
  submodule--helper: prefer strip_suffix() to ends_with()
  fsck: do not reuse child_process structs
2019-01-04 13:33:32 -08:00
Masaya Suzuki
2d103c31c2 pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:30 -08:00
Masaya Suzuki
01f9ec64c8 Use packet_reader instead of packet_read_line
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:27 -08:00
Jeff King
97b2fa08b6 fetch-pack: drop custom loose object cache
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
loose objects we don't have.

Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
custom solution.

Note that this might perform slightly differently, as the original code
stopped calling readdir() when we saw more loose objects than there were
refs. So:

  1. The old code might have spent work on readdir() to fill the cache,
     but then decided there were too many loose objects, wasting that
     effort.

  2. The new code might spend a lot of time on readdir() if you have a
     lot of loose objects, even though there are very few objects to
     ask about.

In practice it probably won't matter either way; see the previous commit
for some discussion of the tradeoff.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13 14:22:03 +09:00
Jonathan Tan
5400b2a2d9 fetch-pack: be more precise in parsing v2 response
Each section in a protocol v2 response is followed by either a DELIM
packet (indicating more sections to follow) or a FLUSH packet
(indicating none to follow). But when parsing the "acknowledgments"
section, do_fetch_pack_v2() is liberal in accepting both, but determines
whether to continue reading or not based solely on the contents of the
"acknowledgments" section, not on whether DELIM or FLUSH was read.

There is no issue with a protocol-compliant server, but can result in
confusing error messages when communicating with a server that
serves unexpected additional sections. Consider a server that sends
"new-section" after "acknowledgments":

 - client writes request
 - client reads the "acknowledgments" section which contains no "ready",
   then DELIM
 - since there was no "ready", client needs to continue negotiation, and
   writes request
 - client reads "new-section", and reports to the end user "expected
   'acknowledgments', received 'new-section'"

For the person debugging the involved Git implementation(s), the error
message is confusing in that "new-section" was not received in response
to the latest request, but to the first one.

One solution is to always continue reading after DELIM, but in this
case, we can do better. We know from the protocol that "ready" means at
least the packfile section is coming (hence, DELIM) and that no "ready"
means that no sections are to follow (hence, FLUSH). So teach
process_acks() to enforce this.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-01 14:30:22 +09:00
Junio C Hamano
465e73fff3 Merge branch 'tb/filter-alternate-refs'
When pushing into a repository that borrows its objects from an
alternate object store, "git receive-pack" that responds to the
push request on the other side lists the tips of refs in the
alternate to reduce the amount of objects transferred.  This
sometimes is detrimental when the number of refs in the alternate
is absurdly large, in which case the bandwidth saved in potentially
fewer objects transferred is wasted in excessively large ref
advertisement.  The alternate refs that are advertised are now
configurable with a pair of configuration variables.

* tb/filter-alternate-refs:
  transport.c: introduce core.alternateRefsPrefixes
  transport.c: introduce core.alternateRefsCommand
  transport.c: extract 'fill_alternate_refs_command'
  transport: drop refnames from for_each_alternate_ref
2018-10-19 13:34:08 +09:00
Junio C Hamano
0527fbab68 Merge branch 'jt/avoid-ls-refs'
Over some transports, fetching objects with an exact commit object
name can be done without first seeing the ref advertisements.  The
code has been optimized to exploit this.

* jt/avoid-ls-refs:
  fetch: do not list refs if fetching only hashes
  transport: list refs before fetch if necessary
  transport: do not list refs if possible
  transport: allow skipping of ref listing
2018-10-19 13:34:07 +09:00
Junio C Hamano
fa54cccf1f Merge branch 'jt/non-blob-lazy-fetch'
A partial clone that is configured to lazily fetch missing objects
will on-demand issue a "git fetch" request to the originating
repository to fill not-yet-obtained objects.  The request has been
optimized for requesting a tree object (and not the leaf blob
objects contained in it) by telling the originating repository that
no blobs are needed.

* jt/non-blob-lazy-fetch:
  fetch-pack: exclude blobs when lazy-fetching trees
  fetch-pack: avoid object flags if no_dependents
2018-10-19 13:34:07 +09:00
Jeff King
bdf4276c91 transport: drop refnames from for_each_alternate_ref
None of the current callers use the refname parameter we pass to their
callbacks. In theory somebody _could_ do so, but it's actually quite
weird if you think about it: it's a ref in somebody else's repository.
So the name has no meaning locally, and in fact there may be duplicates
if there are multiple alternates.

The users of this interface really only care about seeing some ref tips,
since that promises that the alternate has the full commit graph
reachable from there. So let's keep the information we pass back to the
bare minimum.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09 14:30:02 +09:00
Jonathan Tan
0177565148 transport: do not list refs if possible
When all refs to be fetched are exact OIDs, it is possible to perform a
fetch without requiring the remote to list refs if protocol v2 is used.
Teach Git to do this.

This currently has an effect only for lazy fetches done from partial
clones. The change necessary to likewise optimize "git fetch <remote>
<sha-1>" will be done in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 09:53:15 +09:00
René Scharfe
22a1646511 fetch-pack: load tip_oids eagerly iff needed
tip_oids_contain() lazily loads refs into an oidset at its first call.
It abuses the internal (sub)member .map.tablesize of that oidset to
check if it has done that already.

Determine if the oidset needs to be populated upfront and then do that
instead.  This duplicates a loop, but simplifies the existing one by
separating concerns between the two.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 11:12:13 -07:00
René Scharfe
bf73282c0b fetch-pack: factor out is_unmatched_ref()
Move the code to determine if a request is unmatched to its own little
helper.  This allows us to reuse it in a subsequent patch.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 11:12:13 -07:00
Jonathan Tan
4c7f9567ea fetch-pack: exclude blobs when lazy-fetching trees
A partial clone with missing trees can be obtained using "git clone
--filter=tree:none <repo>". In such a repository, when a tree needs to
be lazily fetched, any tree or blob it directly or indirectly references
is fetched as well, regardless of whether the original command required
those objects, or if the local repository already had some of them.

This is because the fetch protocol, which the lazy fetch uses, does not
allow clients to request that only the wanted objects be sent, which
would be the ideal solution. This patch implements a partial solution:
specify the "blob:none" filter, somewhat reducing the fetch payload.

This change has no effect when lazily fetching blobs (due to how filters
work). And if lazily fetching a commit (such repositories are difficult
to construct and is not a use case we support very well, but it is
possible), referenced commits and trees are still fetched - only the
blobs are not fetched.

The necessary code change is done in fetch_pack() instead of somewhere
closer to where the "filter" instruction is written to the wire so that
only one part of the code needs to be changed in order for users of all
protocol versions to benefit from this optimization.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 06:03:49 -07:00