git/connected.h
Jonathan Tan cf1e7c0770 fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.

To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.

To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.

This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:57:44 -07:00

65 lines
1.7 KiB
C

#ifndef CONNECTED_H
#define CONNECTED_H
struct transport;
/*
* Take callback data, and return next object name in the buffer.
* When called after returning the name for the last object, return -1
* to signal EOF, otherwise return 0.
*/
typedef int (*oid_iterate_fn)(void *, struct object_id *oid);
/*
* Named-arguments struct for check_connected. All arguments are
* optional, and can be left to defaults as set by CHECK_CONNECTED_INIT.
*/
struct check_connected_options {
/* Avoid printing any errors to stderr. */
int quiet;
/* --shallow-file to pass to rev-list sub-process */
const char *shallow_file;
/* Transport whose objects we are checking, if available. */
struct transport *transport;
/*
* If non-zero, send error messages to this descriptor rather
* than stderr. The descriptor is closed before check_connected
* returns.
*/
int err_fd;
/* If non-zero, show progress as we traverse the objects. */
int progress;
/*
* Insert these variables into the environment of the child process.
*/
const char **env;
/*
* If non-zero, check the ancestry chain completely, not stopping at
* any existing ref. This is necessary when deepening existing refs
* during a fetch.
*/
unsigned is_deepening_fetch : 1;
};
#define CHECK_CONNECTED_INIT { 0 }
/*
* Make sure that our object store has all the commits necessary to
* connect the ancestry chain to some of our existing refs, and all
* the trees and blobs that these commits use.
*
* Return 0 if Ok, non zero otherwise (i.e. some missing objects)
*
* If "opt" is NULL, behaves as if CHECK_CONNECTED_INIT was passed.
*/
int check_connected(oid_iterate_fn fn, void *cb_data,
struct check_connected_options *opt);
#endif /* CONNECTED_H */