2005-06-06 12:59:54 +08:00
#include "cache.h"
2006-04-02 20:44:09 +08:00
#include "blob.h"
2018-05-16 07:42:15 +08:00
#include "object-store.h"
2009-01-10 20:07:50 +08:00
#include "dir.h"
2011-05-13 05:31:08 +08:00
#include "streaming.h"
2017-03-15 05:46:40 +08:00
#include "submodule.h"
2017-08-20 23:47:20 +08:00
#include "progress.h"
2017-09-23 00:35:40 +08:00
#include "fsmonitor.h"
2021-03-23 22:19:32 +08:00
#include "entry.h"
unpack-trees: add basic support for parallel checkout

This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.

Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems) are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works as
follows:

- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
  detects it by looking for EEXIST and EISDIR errors after an
  open(O_CREAT | O_EXCL) failure.

- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
  at the has_dirs_only_path() check, which is done for the leading path
  of each item in the parallel checkout queue.

Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).

After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear in the index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).

Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 08:14:53 +08:00
#include "parallel-checkout.h"
2005-06-06 12:59:54 +08:00

2009-02-10 04:54:08 +08:00
static void create_directories(const char *path, int path_len,
			       const struct checkout *state)
2005-06-06 12:59:54 +08:00
{
2016-02-23 06:44:28 +08:00
	char *buf = xmallocz(path_len);
2009-02-10 04:54:08 +08:00
	int len = 0;

	while (len < path_len) {
		do {
			buf[len] = path[len];
			len++;
		} while (len < path_len && path[len] != '/');
		if (len >= path_len)
			break;
2005-06-06 12:59:54 +08:00
		buf[len] = 0;
Do not expect unlink(2) to fail on a directory.

When "git checkout-index" checks out path A/B/C, it makes sure A
and A/B are truly directories; if there is a regular file or
symlink at A, we prefer to remove it.

We used to do this by catching an error return from mkdir(2),
and on EEXIST did unlink(2), and when it succeeded, tried
another mkdir(2).

Thomas Glanzmann found out the above does not work on Solaris
for a root user, as unlink(2) was so old fashioned there that it
allowed to unlink a directory.

As pointed out, this still doesn't guarantee that git won't call
"unlink()" on a directory (race conditions etc), but that's
fundamentally true (there is no "funlink()" like there is
"fstat()"), and besides, that is in no way git-specific (ie it's
true of any application that gets run as root).

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18 13:58:28 +08:00

2009-01-18 23:14:52 +08:00
		/*
		 * For 'checkout-index --prefix=<dir>', <dir> is
		 * allowed to be a symlink to an existing directory,
		 * and we set 'state->base_dir_len' below, such that
		 * we test the path components of the prefix with the
		 * stat() function instead of the lstat() function.
		 */
2009-02-10 04:54:06 +08:00
		if (has_dirs_only_path(buf, len, state->base_dir_len))
2007-07-18 13:58:28 +08:00
			continue; /* ok, it is already a directory. */

		/*
2009-01-18 23:14:52 +08:00
		 * If this mkdir() would fail, it could be that there
		 * is already a symlink or something else exists
		 * there, therefore we then try to unlink it and try
		 * one more time to create the directory.
2007-07-18 13:58:28 +08:00
		 */
2005-07-06 16:21:46 +08:00
		if (mkdir(buf, 0777)) {
2007-07-18 13:58:28 +08:00
			if (errno == EEXIST && state->force &&
2009-04-30 05:22:56 +08:00
			    !unlink_or_warn(buf) && !mkdir(buf, 0777))
2007-07-18 13:58:28 +08:00
				continue;
2009-06-27 23:58:47 +08:00
			die_errno("cannot create directory at '%s'", buf);
2005-06-06 12:59:54 +08:00
		}
	}
	free(buf);
}
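For illustration, here is a standalone sketch of the same component-by-component walk using only POSIX calls. It is not git's code: the name `make_leading_dirs` and its `force` argument are hypothetical, a per-component `lstat()` stands in for `has_dirs_only_path()` (which also caches results and honors `state->base_dir_len`), and failures are returned as -1 instead of going through `die_errno()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create every leading directory of `path`
 * (never the last component). If a leading component exists but is
 * not a real directory and `force` is set, remove it and retry
 * mkdir(), mirroring the mkdir/unlink/mkdir retry above.
 * Returns 0 on success, -1 on failure (errno is set).
 */
static int make_leading_dirs(const char *path, int force)
{
	size_t path_len = strlen(path);
	char *buf = malloc(path_len + 1);
	size_t len = 0;

	if (!buf)
		return -1;
	while (len < path_len) {
		struct stat st;

		/* copy up to (not including) the next '/' */
		do {
			buf[len] = path[len];
			len++;
		} while (len < path_len && path[len] != '/');
		if (len >= path_len)
			break; /* last component is the file itself */
		buf[len] = '\0';

		/* lstat(): a symlink here does not count as a directory */
		if (!lstat(buf, &st) && S_ISDIR(st.st_mode))
			continue;
		if (mkdir(buf, 0777)) {
			if (errno == EEXIST && force &&
			    !unlink(buf) && !mkdir(buf, 0777))
				continue;
			free(buf);
			return -1;
		}
	}
	free(buf);
	return 0;
}
```

Because `lstat()` is used, a regular file or symlink in a leading position is treated as "not a directory" and, only when `force` is set, replaced, which matches the intent described in the unlink(2) commit message above.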
2014-03-13 17:19:08 +08:00
static void remove_subtree(struct strbuf *path)
2005-06-06 12:59:54 +08:00
{
2014-03-13 17:19:08 +08:00
	DIR *dir = opendir(path->buf);
2005-06-06 12:59:54 +08:00
	struct dirent *de;
2014-03-13 17:19:08 +08:00
	int origlen = path->len;
2007-06-07 15:04:01 +08:00

2005-06-06 12:59:54 +08:00
	if (!dir)
2014-03-13 17:19:08 +08:00
		die_errno("cannot opendir '%s'", path->buf);
2021-05-13 01:28:22 +08:00
	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
2005-06-06 12:59:54 +08:00
		struct stat st;
2014-03-13 17:19:08 +08:00

		strbuf_addch(path, '/');
		strbuf_addstr(path, de->d_name);
		if (lstat(path->buf, &st))
			die_errno("cannot lstat '%s'", path->buf);
2005-06-06 12:59:54 +08:00
		if (S_ISDIR(st.st_mode))
2014-03-13 17:19:08 +08:00
			remove_subtree(path);
		else if (unlink(path->buf))
			die_errno("cannot unlink '%s'", path->buf);
		strbuf_setlen(path, origlen);
2005-06-06 12:59:54 +08:00
	}
	closedir(dir);
2014-03-13 17:19:08 +08:00
	if (rmdir(path->buf))
		die_errno("cannot rmdir '%s'", path->buf);
2005-06-06 12:59:54 +08:00
}
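A minimal sketch of the same recursive removal, assuming a fixed-capacity `char` buffer in place of `struct strbuf` and returning -1 instead of calling `die_errno()`; `remove_tree` is a hypothetical name. Truncating the buffer back to `origlen` after each entry plays the role of `strbuf_setlen()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for remove_subtree(): `path` is a mutable
 * buffer of capacity `cap` holding a directory name. Each entry name
 * is appended in place, removed, and the buffer is cut back to the
 * directory name. Returns 0 on success, -1 on the first error.
 */
static int remove_tree(char *path, size_t cap)
{
	size_t origlen = strlen(path);
	DIR *dir = opendir(path);
	struct dirent *de;
	int ret = 0;

	if (!dir)
		return -1;
	while (!ret && (de = readdir(dir)) != NULL) {
		struct stat st;

		/* skip "." and ".." like readdir_skip_dot_and_dotdot() */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (origlen + 1 + strlen(de->d_name) + 1 > cap) {
			ret = -1; /* would not fit in the buffer */
			break;
		}
		path[origlen] = '/';
		strcpy(path + origlen + 1, de->d_name);
		/* lstat(): a symlink to a directory is unlinked, not followed */
		if (lstat(path, &st))
			ret = -1;
		else if (S_ISDIR(st.st_mode))
			ret = remove_tree(path, cap);
		else if (unlink(path))
			ret = -1;
		path[origlen] = '\0'; /* restore the directory name */
	}
	closedir(dir);
	if (!ret && rmdir(path))
		ret = -1;
	return ret;
}
```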
2005-07-15 00:58:45 +08:00
static int create_file(const char *path, unsigned int mode)
2005-06-06 12:59:54 +08:00
{
	mode = (mode & 0100) ? 0777 : 0666;
2006-01-05 16:58:06 +08:00
	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
2005-06-06 12:59:54 +08:00
}
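`create_file()` collapses the index mode to 0777 or 0666 (the umask trims it further) and opens with `O_CREAT | O_EXCL`, so an existing file is never silently reused. The parallel-checkout commit message above relies on this: an `EEXIST` or `EISDIR` failure from such an `open()` signals a collision. A small sketch of both points; `checkout_file_mode` and `create_exclusive` are hypothetical helpers, not git functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same normalization as create_file(): only the owner-exec bit matters. */
static unsigned int checkout_file_mode(unsigned int ce_mode)
{
	return (ce_mode & 0100) ? 0777 : 0666;
}

/*
 * Hypothetical helper: create `path` exclusively. Returns an open fd,
 * or -1 with *collided set when the failure looks like a path
 * collision (something already exists there, or a directory is in
 * the way).
 */
static int create_exclusive(const char *path, unsigned int ce_mode,
			    int *collided)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL,
		      checkout_file_mode(ce_mode));

	*collided = (fd < 0 && (errno == EEXIST || errno == EISDIR));
	return fd;
}
```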
2021-11-02 23:46:08 +08:00
void *read_blob_entry(const struct cache_entry *ce, size_t *size)
2007-04-14 00:26:04 +08:00
{
	enum object_type type;
2021-11-02 23:46:08 +08:00
	unsigned long ul;
	void *blob_data = read_object_file(&ce->oid, &type, &ul);
2007-04-14 00:26:04 +08:00

2021-11-02 23:46:08 +08:00
	*size = ul;
2018-02-15 02:59:41 +08:00
	if (blob_data) {
2007-04-14 00:26:04 +08:00
		if (type == OBJ_BLOB)
2018-02-15 02:59:41 +08:00
			return blob_data;
		free(blob_data);
2007-04-14 00:26:04 +08:00
	}
	return NULL;
}
Convert "struct cache_entry *" to "const ..." wherever possible

I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

- diff-lib.c: setting CE_UPTODATE

- name-hash.c: setting CE_HASHED

- preload-index.c, read-cache.c, unpack-trees.c and
  builtin/update-index: obvious

- entry.c: write_entry() may refresh the checked out entry via
  fill_stat_cache_info(). This causes "non-const struct cache_entry
  *" in builtin/apply.c, builtin/checkout-index.c and
  builtin/checkout.c

- builtin/ls-files.c: --with-tree changes stagemask and may set
  CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Others just use ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valuable than the summary above. But if
anyone wants to re-identify the above sites, applying this patch and then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 23:29:00 +08:00
static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempfile)
2011-05-13 12:36:42 +08:00
{
	int symlink = (ce->ce_mode & S_IFMT) != S_IFREG;
	if (to_tempfile) {
2015-09-25 05:06:53 +08:00
		xsnprintf(path, TEMPORARY_FILENAME_LENGTH, "%s",
			  symlink ? ".merge_link_XXXXXX" : ".merge_file_XXXXXX");
2011-05-13 12:36:42 +08:00
		return mkstemp(path);
	} else {
		return create_file(path, !symlink ? ce->ce_mode : 0666);
	}
}
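The `to_tempfile` branch builds a `mkstemp()` template and lets the C library choose a unique name. A sketch of that step, assuming a 25-byte buffer for `TEMPORARY_FILENAME_LENGTH` (the real constant lives in git's headers) and reporting truncation as -1 where `xsnprintf()` would die; `open_merge_tempfile` and `TEMP_NAME_LEN` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>  /* mkstemp() */
#include <string.h>
#include <unistd.h>  /* close(), unlink() for cleanup */

/* Assumed buffer size, standing in for TEMPORARY_FILENAME_LENGTH. */
#define TEMP_NAME_LEN 25

/*
 * Sketch of the to_tempfile branch of open_output_fd(): write the
 * template into `path`, then mkstemp() replaces the XXXXXX suffix,
 * creates the file (mode 0600) in the current directory and returns
 * an open fd, or -1 on error.
 */
static int open_merge_tempfile(char *path, size_t size, int is_symlink)
{
	int n = snprintf(path, size, "%s",
			 is_symlink ? ".merge_link_XXXXXX" : ".merge_file_XXXXXX");

	if (n < 0 || (size_t)n >= size)
		return -1; /* truncated: the template would be invalid */
	return mkstemp(path);
}
```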
2021-03-23 22:19:33 +08:00
int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st)
2011-05-13 12:36:42 +08:00
{
	/* use fstat() only when path == ce->name */
	if (fstat_is_reliable() &&
	    state->refresh_cache && !state->base_dir_len) {
2020-07-09 10:10:39 +08:00
		return !fstat(fd, st);
2011-05-13 12:36:42 +08:00
	}
	return 0;
}

2013-07-09 23:29:00 +08:00
static int streaming_write_entry(const struct cache_entry *ce, char *path,
2011-05-21 05:33:31 +08:00
				 struct stream_filter *filter,
2011-05-13 05:31:08 +08:00
				 const struct checkout *state, int to_tempfile,
				 int *fstat_done, struct stat *statbuf)
{
2013-03-26 05:49:36 +08:00
	int result = 0;
2012-03-07 18:54:15 +08:00
	int fd;
2011-05-13 05:31:08 +08:00

	fd = open_output_fd(path, ce, to_tempfile);
2013-03-26 05:49:36 +08:00
	if (fd < 0)
		return -1;

2016-09-06 04:07:59 +08:00
	result |= stream_blob_to_fd(fd, &ce->oid, filter, 1);
2021-03-23 22:19:33 +08:00
	*fstat_done = fstat_checkout_output(fd, state, statbuf);
2013-03-26 05:49:36 +08:00
	result |= close(fd);

	if (result)
2011-05-13 05:31:08 +08:00
		unlink(path);
	return result;
}
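The control flow of `streaming_write_entry()` can be sketched with an in-memory buffer standing in for `stream_blob_to_fd()`. Unlike `fstat_checkout_output()`, which calls `fstat()` only when `fstat_is_reliable()`, `state->refresh_cache` is set and there is no `base_dir_len` prefix, this sketch always calls it; `write_all` and `write_entry_sketch` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for stream_blob_to_fd(): write a whole buffer. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0)
			return -1; /* EINTR retry omitted for brevity */
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Same shape as streaming_write_entry(): create the file exclusively,
 * write the content, take stat data while the fd is still open,
 * close, and unlink the partial file if any step failed.
 */
static int write_entry_sketch(const char *path, const char *data,
			      int *fstat_done, struct stat *st)
{
	int result = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd < 0)
		return -1;
	result |= write_all(fd, data, strlen(data));
	*fstat_done = !fstat(fd, st);
	result |= close(fd);
	if (result)
		unlink(path);
	return result;
}
```

OR-ing the return values means that a failing `close()` (for example, a deferred write error) also causes the partial file to be removed.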
2017-07-01 04:41:28 +08:00
void enable_delayed_checkout(struct checkout *state)
{
	if (!state->delayed_checkout) {
		state->delayed_checkout = xmalloc(sizeof(*state->delayed_checkout));
		state->delayed_checkout->state = CE_CAN_DELAY;
2021-07-01 18:51:29 +08:00
		string_list_init_nodup(&state->delayed_checkout->filters);
		string_list_init_nodup(&state->delayed_checkout->paths);
2017-07-01 04:41:28 +08:00
	}
}

static int remove_available_paths(struct string_list_item *item, void *cb_data)
{
	struct string_list *available_paths = cb_data;
	struct string_list_item *available;

	available = string_list_lookup(available_paths, item->string);
	if (available)
2022-07-14 19:49:12 +08:00
		available->util = item->util;
2017-07-01 04:41:28 +08:00
	return !available;
}
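`remove_available_paths()` is a `filter_string_list()` callback: a nonzero return keeps the item, so every path the filter reported as available is dropped from `dco->paths`, and its `util` pointer is handed over to the matching entry in `available_paths`. A simplified sketch with plain arrays; `struct item`, `lookup` and `drop_available` are hypothetical, and the linear lookup replaces the binary search of `string_list_lookup()`.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for git's string_list_item. */
struct item {
	const char *string;
	void *util;
};

/* Linear search; string_list_lookup() bisects a sorted list instead. */
static struct item *lookup(struct item *items, size_t nr, const char *s)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(items[i].string, s))
			return &items[i];
	return NULL;
}

/*
 * Sketch of filter_string_list(&dco->paths, 0, remove_available_paths,
 * &available_paths): keep the delayed paths that are NOT available
 * (compacting in place), and hand each available path's util pointer
 * over to its entry in `available`. Returns the new delayed count.
 */
static size_t drop_available(struct item *delayed, size_t nr,
			     struct item *available, size_t available_nr)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		struct item *a = lookup(available, available_nr,
					delayed[i].string);

		if (a)
			a->util = delayed[i].util;
		else
			delayed[kept++] = delayed[i];
	}
	return kept;
}
```

An available path that was never delayed keeps a NULL `util`, which is exactly what the `!path->util` check in `finish_delayed_checkout()` reports as a buggy filter.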
2022-07-14 19:49:12 +08:00
int finish_delayed_checkout(struct checkout *state, int show_progress)
2017-07-01 04:41:28 +08:00
{
	int errs = 0;
entry: show finer-grained counter in "Filtering content" progress line

The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop. It works basically like this:

  start_delayed_progress(p, nr_of_paths_to_filter)
  for_each_filter {
      display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
      for_each_path_handled_by_the_current_filter {
          checkout_entry()
      }
  }
  stop_progress(p)

There are two issues with this approach:

  - The work done by the last filter (or the only filter if there is
    only one) is never counted, so if the last filter still has some
    paths to process, then the counter shown in the "done" progress
    line will not match the expected total.

    The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
    mode[1] helps spot this issue. Under it the 'missing file in
    delayed checkout' and 'invalid file in delayed checkout' tests in
    't0021-conversion.sh' fail, because both use only one
    filter. (The test 'delayed checkout in process filter' uses two
    filters but the first one does all the work, so that test already
    happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)

  - The progress counter is updated only once per filter, not once per
    processed path, so if a filter has a lot of paths to process, then
    the counter might stay unchanged for a long while and then make a
    big jump (though the user still gets a sense of progress, because
    we call display_throughput() after each processed path to show the
    amount of processed data).

Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.

After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.

It'll fail because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all, see [2] for how to
spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not
straightforward to fix it with the current progress.c library (see [3]
for an attempt), so let's leave it for now.

Let's also initialize the *progress to "NULL" while we're at it. Since
7a132c628e5 (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) we have had progress conditional on
"show_progress", usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e5.

1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 09:10:12 +08:00
|
|
|
unsigned processed_paths = 0;
|
2017-08-20 23:47:20 +08:00
|
|
|
off_t filtered_bytes = 0;
|
2017-07-01 04:41:28 +08:00
|
|
|
struct string_list_item *filter, *path;
|
entry: show finer-grained counter in "Filtering content" progress line
The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop. It works basically like this:
start_delayed_progress(p, nr_of_paths_to_filter)
for_each_filter {
display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
for_each_path_handled_by_the_current_filter {
checkout_entry()
}
}
stop_progress(p)
There are two issues with this approach:
- The work done by the last filter (or the only filter if there is
only one) is never counted, so if the last filter still has some
paths to process, then the counter shown in the "done" progress
line will not match the expected total.
The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
mode[1] helps spot this issue. Under it the 'missing file in
delayed checkout' and 'invalid file in delayed checkout' tests in
't0021-conversion.sh' fail, because both use only one
filter. (The test 'delayed checkout in process filter' uses two
filters but the first one does all the work, so that test already
happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)
- The progress counter is updated only once per filter, not once per
processed path, so if a filter has a lot of paths to process, then
the counter might stay unchanged for a long while and then make a
big jump (though the user still gets a sense of progress, because
we call display_throughput() after each processed path to show the
amount of processed data).
Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.
After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.
It'll fail because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all, see [2] for how to
spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not
straightforward to fix it with the current progress.c library (see [3]
for an attempt), so let's leave it for now.
Let's also initialize the *progress to "NULL" while we're at it. Since
7a132c628e5 (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) we have had progress conditional on
"show_progress", usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e5.
1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 09:10:12 +08:00
|
|
|
struct progress *progress = NULL;
|
2017-07-01 04:41:28 +08:00
|
|
|
struct delayed_checkout *dco = state->delayed_checkout;
|
|
|
|
|
|
|
|
if (!state->delayed_checkout)
|
|
|
|
return errs;
|
|
|
|
|
|
|
|
dco->state = CE_RETRY;
|
entry: show finer-grained counter in "Filtering content" progress line
The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop. It works basically like this:
start_delayed_progress(p, nr_of_paths_to_filter)
for_each_filter {
display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
for_each_path_handled_by_the_current_filter {
checkout_entry()
}
}
stop_progress(p)
There are two issues with this approach:
- The work done by the last filter (or the only filter if there is
only one) is never counted, so if the last filter still has some
paths to process, then the counter shown in the "done" progress
line will not match the expected total.
The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
mode[1] helps spot this issue. Under it the 'missing file in
delayed checkout' and 'invalid file in delayed checkout' tests in
't0021-conversion.sh' fail, because both use only one
filter. (The test 'delayed checkout in process filter' uses two
filters but the first one does all the work, so that test already
happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)
- The progress counter is updated only once per filter, not once per
processed path, so if a filter has a lot of paths to process, then
the counter might stay unchanged for a long while and then make a
big jump (though the user still gets a sense of progress, because
we call display_throughput() after each processed path to show the
amount of processed data).
Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.
After this change the 'invalid file in delayed checkout' in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.
It'll fail because its purposefully buggy filter doesn't process any
paths, so we won't execute that inner loop at all, see [2] for how to
spot that issue without GIT_TEST_CHECK_PROGRESS=1. It's not
straightforward to fix it with the current progress.c library (see [3]
for an attempt), so let's leave it for now.
Let's also initialize the *progress to "NULL" while we're at it. Since
7a132c628e5 (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) we have had progress conditional on
"show_progress", usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e5.
1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 09:10:12 +08:00
|
|
|
if (show_progress)
|
|
|
|
progress = start_delayed_progress(_("Filtering content"), dco->paths.nr);
|
2017-07-01 04:41:28 +08:00
|
|
|
while (dco->filters.nr > 0) {
|
|
|
|
for_each_string_list_item(filter, &dco->filters) {
|
|
|
|
struct string_list available_paths = STRING_LIST_INIT_NODUP;
|
|
|
|
|
|
|
|
if (!async_query_available_blobs(filter->string, &available_paths)) {
|
|
|
|
/* Filter reported an error */
|
|
|
|
errs = 1;
|
|
|
|
filter->string = "";
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (available_paths.nr <= 0) {
|
|
|
|
/*
|
|
|
|
* Filter responded with no entries. That means
|
|
|
|
* the filter is done and we can remove the
|
|
|
|
* filter from the list (see
|
|
|
|
* "string_list_remove_empty_items" call below).
|
|
|
|
*/
|
|
|
|
filter->string = "";
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In dco->paths we store a list of all delayed paths.
|
|
|
|
* The filter just send us a list of available paths.
|
|
|
|
* Remove them from the list.
|
|
|
|
*/
|
|
|
|
filter_string_list(&dco->paths, 0,
|
|
|
|
&remove_available_paths, &available_paths);
|
|
|
|
|
|
|
|
for_each_string_list_item(path, &available_paths) {
|
|
|
|
struct cache_entry* ce;
|
|
|
|
|
|
|
|
if (!path->util) {
|
|
|
|
error("external filter '%s' signaled that '%s' "
|
|
|
|
"is now available although it has not been "
|
|
|
|
"delayed earlier",
|
|
|
|
filter->string, path->string);
|
|
|
|
errs |= 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Do not ask the filter for available blobs,
|
|
|
|
* again, as the filter is likely buggy.
|
|
|
|
*/
|
|
|
|
filter->string = "";
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
ce = index_file_exists(state->istate, path->string,
|
|
|
|
strlen(path->string), 0);
|
2017-08-20 23:47:20 +08:00
|
|
|
if (ce) {
|
entry: show finer-grained counter in "Filtering content" progress line
The "Filtering content" progress in entry.c:finish_delayed_checkout()
is unusual because of how it calculates the progress count and because
it shows the progress of a nested loop. It works basically like this:
start_delayed_progress(p, nr_of_paths_to_filter)
for_each_filter {
display_progress(p, nr_of_paths_to_filter - nr_of_paths_still_left_to_filter)
for_each_path_handled_by_the_current_filter {
checkout_entry()
}
}
stop_progress(p)
There are two issues with this approach:
- The work done by the last filter (or the only filter if there is
only one) is never counted, so if the last filter still has some
paths to process, then the counter shown in the "done" progress
line will not match the expected total.
The partially-RFC series to add a GIT_TEST_CHECK_PROGRESS=1
mode[1] helps spot this issue. Under it the 'missing file in
delayed checkout' and 'invalid file in delayed checkout' tests in
't0021-conversion.sh' fail, because both use only one
filter. (The test 'delayed checkout in process filter' uses two
filters but the first one does all the work, so that test already
happens to succeed even with GIT_TEST_CHECK_PROGRESS=1.)
- The progress counter is updated only once per filter, not once per
processed path, so if a filter has a lot of paths to process, then
the counter might stay unchanged for a long while and then make a
big jump (though the user still gets a sense of progress, because
we call display_throughput() after each processed path to show the
amount of processed data).
Move the display_progress() call to the inner loop, right next to that
checkout_entry() call that does the hard work for each path, and use a
dedicated counter variable that is incremented upon processing each
path.
After this change the 'invalid file in delayed checkout' test in
't0021-conversion.sh' would succeed with the GIT_TEST_CHECK_PROGRESS=1
assertion discussed above, but the 'missing file in delayed checkout'
test would still fail.
It will fail because its purposefully buggy filter doesn't process any
paths, so the inner loop is never executed; see [2] for how to spot
that issue without GIT_TEST_CHECK_PROGRESS=1. It's not straightforward
to fix it with the current progress.c library (see [3] for an attempt),
so let's leave it for now.
Let's also initialize the "*progress" to "NULL" while we're at it.
Since 7a132c628e5 (checkout: make delayed checkout respect --quiet and
--no-progress, 2021-08-26) progress has been conditional on
"show_progress". Usually we use the idiom of a "NULL" initialization
of the "*progress", rather than the more verbose ternary added in
7a132c628e5.
1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/
2. http://lore.kernel.org/git/20210802214827.GE23408@szeder.dev
3. https://lore.kernel.org/git/20210620200303.2328957-7-szeder.dev@gmail.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
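To make the counting difference concrete, here is a small, self-contained C model of the two schemes described in the commit message above. It is a sketch under simplifying assumptions: each filter is represented only by how many delayed paths it hands back, each filter is visited once (the real loop can revisit a filter until it has nothing left), and the helper names last_shown_per_filter() and last_shown_per_path() are hypothetical, not part of Git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the delayed-checkout progress loop: paths_per_filter[i]
 * is how many delayed paths filter i hands back, and each filter is visited
 * once. Both functions return the value the progress meter would show last.
 */

/* Old scheme: one display_progress() per filter, before its paths are done. */
static size_t last_shown_per_filter(const size_t *paths_per_filter,
				    size_t nr_filters)
{
	size_t total = 0, left, shown = 0, i;

	for (i = 0; i < nr_filters; i++)
		total += paths_per_filter[i];
	left = total;

	for (i = 0; i < nr_filters; i++) {
		shown = total - left;		/* display_progress(p, total - left) */
		left -= paths_per_filter[i];	/* checkout_entry() for each path */
	}
	assert(left == 0);
	return shown;
}

/* New scheme: a dedicated counter, bumped right after each checkout_entry(). */
static size_t last_shown_per_path(const size_t *paths_per_filter,
				  size_t nr_filters)
{
	size_t processed_paths = 0, shown = 0, i, j;

	for (i = 0; i < nr_filters; i++)
		for (j = 0; j < paths_per_filter[i]; j++)
			shown = ++processed_paths; /* display_progress(p, ++processed_paths) */
	return shown;
}
```

With a single filter that returns three paths, the per-filter model leaves 0 on the final "done" line while the per-path model reaches 3; with filters returning 2 and 5 paths the values are 2 and 7. A filter that returns no paths at all (the 'missing file in delayed checkout' case) never enters the inner loop, so even the per-path counter cannot reach the expected total, which is the remaining failure the message describes.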
2021-09-09 09:10:12 +08:00
display_progress(progress, ++processed_paths);
2022-07-14 19:49:12 +08:00
errs |= checkout_entry(ce, state, NULL, path->util);
2017-08-20 23:47:20 +08:00
filtered_bytes += ce->ce_stat_data.sd_size;
display_throughput(progress, filtered_bytes);
} else
errs = 1;
2017-07-01 04:41:28 +08:00
}
}
string_list_remove_empty_items(&dco->filters, 0);
}
2017-08-20 23:47:20 +08:00
stop_progress(&progress);
2017-07-01 04:41:28 +08:00
string_list_clear(&dco->filters, 0);

/* At this point we should not have any delayed paths anymore. */
errs |= dco->paths.nr;
for_each_string_list_item(path, &dco->paths) {
error("'%s' was not filtered properly", path->string);
}
string_list_clear(&dco->paths, 0);

free(dco);
state->delayed_checkout = NULL;

return errs;
}

2021-03-23 22:19:34 +08:00
void update_ce_after_write(const struct checkout *state, struct cache_entry *ce,
struct stat *st)
{
if (state->refresh_cache) {
assert(state->istate);
fill_stat_cache_info(state->istate, ce, st);
ce->ce_flags |= CE_UPDATE_IN_BASE;
mark_fsmonitor_invalid(state->istate, ce);
state->istate->cache_changed |= CE_ENTRY_CHANGED;
}
}

2021-03-23 22:19:35 +08:00
/* Note: ca is used (and required) iff the entry refers to a regular file. */
static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca,
2022-07-14 19:49:12 +08:00
const struct checkout *state, int to_tempfile,
int *nr_checkouts)
2005-06-06 12:59:54 +08:00
{
2009-02-10 04:54:50 +08:00
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
2017-10-10 01:48:52 +08:00
struct delayed_checkout *dco = state->delayed_checkout;
2009-02-10 04:54:51 +08:00
int fd, ret, fstat_done = 0;
2018-02-15 02:59:41 +08:00
char *new_blob;
2009-02-10 04:54:50 +08:00
struct strbuf buf = STRBUF_INIT;
2021-11-02 23:46:08 +08:00
size_t size;
2017-09-14 01:16:28 +08:00
ssize_t wrote;
size_t newsize = 0;
2009-02-10 04:54:51 +08:00
struct stat st;
2017-03-15 05:46:40 +08:00
const struct submodule *sub;
2020-03-17 02:05:03 +08:00
struct checkout_metadata meta;
2022-07-14 19:49:12 +08:00
static int scratch_nr_checkouts;
2020-03-17 02:05:03 +08:00

clone_checkout_metadata(&meta, &state->meta, &ce->oid);
2009-02-10 04:54:50 +08:00

2011-05-21 05:33:31 +08:00
if (ce_mode_s_ifmt == S_IFREG) {
2021-03-23 22:19:35 +08:00
struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid);
2011-05-21 05:33:31 +08:00
if (filter &&
!streaming_write_entry(ce, path, filter,
state, to_tempfile,
&fstat_done, &st))
goto finish;
}
2011-05-13 05:31:08 +08:00

2009-02-10 04:54:50 +08:00
switch (ce_mode_s_ifmt) {
case S_IFLNK:
2018-02-15 02:59:41 +08:00
new_blob = read_blob_entry(ce, &size);
if (!new_blob)
2010-11-28 12:36:38 +08:00
return error("unable to read sha1 file of %s (%s)",
2021-02-16 22:06:51 +08:00
ce->name, oid_to_hex(&ce->oid));
2007-08-14 16:41:02 +08:00

2017-10-10 01:50:05 +08:00
/*
* We can't make a real symlink; write out a regular file entry
* with the symlink destination as its contents.
*/
if (!has_symlinks || to_tempfile)
goto write_file_entry;

2018-02-15 02:59:41 +08:00
ret = symlink(new_blob, path);
free(new_blob);
2017-10-10 01:50:05 +08:00
if (ret)
return error_errno("unable to create symlink %s", path);
break;

case S_IFREG:
2017-10-10 01:48:52 +08:00
/*
* We do not send the blob in case of a retry, so do not
* bother reading it at all.
*/
2017-10-10 01:50:05 +08:00
if (dco && dco->state == CE_RETRY) {
2018-02-15 02:59:41 +08:00
new_blob = NULL;
2017-10-10 01:48:52 +08:00
size = 0;
} else {
2018-02-15 02:59:41 +08:00
new_blob = read_blob_entry(ce, &size);
if (!new_blob)
2017-10-10 01:48:52 +08:00
return error("unable to read sha1 file of %s (%s)",
2021-02-16 22:06:51 +08:00
ce->name, oid_to_hex(&ce->oid));
2009-02-10 04:54:50 +08:00
}

2007-08-14 16:41:02 +08:00
/*
* Convert from git internal format to working tree format
*/
2017-10-10 01:50:05 +08:00
if (dco && dco->state != CE_NO_DELAY) {
2021-03-23 22:19:35 +08:00
ret = async_convert_to_working_tree_ca(ca, ce->name,
new_blob, size,
&buf, &meta, dco);
2022-07-14 19:49:12 +08:00
if (ret) {
struct string_list_item *item =
string_list_lookup(&dco->paths, ce->name);
if (item) {
item->util = nr_checkouts ? nr_checkouts
: &scratch_nr_checkouts;
free(new_blob);
goto delayed;
}
2017-07-01 04:41:28 +08:00
}
2021-03-23 22:19:35 +08:00
} else {
ret = convert_to_working_tree_ca(ca, ce->name, new_blob,
size, &buf, &meta);
}
2017-10-10 01:50:05 +08:00

if (ret) {
2018-02-15 02:59:41 +08:00
free(new_blob);
new_blob = strbuf_detach(&buf, &newsize);
2017-10-10 01:50:05 +08:00
size = newsize;
2007-08-14 16:41:02 +08:00
}
2017-10-10 01:50:05 +08:00
/*
* No "else" here as errors from convert are OK at this
* point. If the error would have been fatal (e.g.
* filter is required), then we would have died already.
*/
2007-08-14 16:41:02 +08:00

2017-10-10 01:50:05 +08:00
write_file_entry:
2011-05-13 12:36:42 +08:00
fd = open_output_fd(path, ce, to_tempfile);
2005-06-06 12:59:54 +08:00
if (fd < 0) {
2018-02-15 02:59:41 +08:00
free(new_blob);
2016-05-08 17:47:44 +08:00
return error_errno("unable to create file %s", path);
2005-06-06 12:59:54 +08:00
}
Lazy man's auto-CRLF
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
  [core]
      AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
  git clone -n git test-crlf
  cd test-crlf
  git config core.autocrlf true
  git checkout
  git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonably. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
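The behaviour listed in the message above (blobs stored without CRLF, files checked out with CRLF, binary-looking content left alone) can be sketched as two small C helpers. This is a hedged illustration, not the convert.c from the patch: looks_binary(), lf_to_crlf() and crlf_to_lf() are hypothetical names, and the "any NUL byte means binary" test is a stand-in for the patch's own is_binary() heuristics.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in heuristic: any NUL byte means "binary, leave alone". */
static int looks_binary(const char *buf, size_t len)
{
	return memchr(buf, '\0', len) != NULL;
}

/*
 * Checkout direction (index -> working tree): turn each bare LF into CRLF.
 * Returns a malloc'd, NUL-terminated buffer and stores its length in *out_len.
 */
static char *lf_to_crlf(const char *buf, size_t len, size_t *out_len)
{
	int binary = looks_binary(buf, len);
	char *out = malloc(2 * len + 1); /* worst case: every byte is '\n' */
	size_t i, n = 0;

	assert(out_len);
	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		/* do not double an LF that is already preceded by CR */
		if (!binary && buf[i] == '\n' && (i == 0 || buf[i - 1] != '\r'))
			out[n++] = '\r';
		out[n++] = buf[i];
	}
	out[n] = '\0';
	*out_len = n;
	return out;
}

/* Index direction (working tree -> blob): drop the CR of each CRLF, in place. */
static size_t crlf_to_lf(char *buf, size_t len)
{
	size_t i, n = 0;

	if (looks_binary(buf, len))
		return len;
	for (i = 0; i < len; i++) {
		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
			continue;
		buf[n++] = buf[i];
	}
	return n;
}
```

The round trip also shows the Documentation/docbook-xsl.css effect described above: content that was committed *with* CRLF comes back from crlf_to_lf() without it, so hashing the working-tree file for the index yields a different blob than the one in the repository.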
2007-02-14 03:07:23 +08:00

2018-02-15 02:59:41 +08:00
wrote = write_in_full(fd, new_blob, size);
2011-05-13 12:36:42 +08:00
if (!to_tempfile)
2021-03-23 22:19:33 +08:00
fstat_done = fstat_checkout_output(fd, state, &st);
2005-06-06 12:59:54 +08:00
close(fd);
2018-02-15 02:59:41 +08:00
free(new_blob);
2017-09-14 01:16:28 +08:00
if (wrote < 0)
2010-11-28 12:36:38 +08:00
return error("unable to write file %s", path);
2005-06-06 12:59:54 +08:00
break;
2017-10-10 01:50:05 +08:00

2007-05-22 04:08:28 +08:00
case S_IFGITLINK:
2007-04-14 00:26:04 +08:00
if (to_tempfile)
2021-02-16 22:06:51 +08:00
return error("cannot create temporary submodule %s", ce->name);
2007-04-14 00:26:04 +08:00
if (mkdir(path, 0777) < 0)
2013-07-18 20:26:55 +08:00
return error("cannot create submodule directory %s", path);
2017-03-15 05:46:40 +08:00
sub = submodule_from_ce(ce);
if (sub)
return submodule_move_head(ce->name,
2017-04-19 05:37:22 +08:00
NULL, oid_to_hex(&ce->oid),
state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
2007-04-14 00:26:04 +08:00
break;
2017-10-10 01:50:05 +08:00

2005-06-06 12:59:54 +08:00
default:
2021-02-16 22:06:51 +08:00
return error("unknown file mode for %s in index", ce->name);
2005-06-06 12:59:54 +08:00
}

2011-05-13 05:31:08 +08:00
finish:
2005-06-06 14:15:40 +08:00
if (state->refresh_cache) {
2021-03-23 22:19:34 +08:00
if (!fstat_done && lstat(ce->name, &st) < 0)
return error_errno("unable to stat just-written file %s",
ce->name);
update_ce_after_write(state, ce , &st);
2005-06-06 12:59:54 +08:00
}
2022-07-14 19:49:12 +08:00
if (nr_checkouts)
(*nr_checkouts)++;
2017-10-05 18:44:06 +08:00
delayed:
2005-06-06 12:59:54 +08:00
return 0;
}

2009-07-30 11:22:25 +08:00
/*
* This is like 'lstat()', except it refuses to follow symlinks
2009-08-17 14:53:12 +08:00
* in the path, after skipping "skiplen".
2009-07-30 11:22:25 +08:00
*/
2010-01-12 14:27:31 +08:00
static int check_path(const char *path, int len, struct stat *st, int skiplen)
2009-07-30 11:22:25 +08:00
{
2009-08-17 14:53:12 +08:00
const char *slash = path + len;

while (path < slash && *slash != '/')
slash--;
if (!has_dirs_only_path(path, slash - path, skiplen)) {
2009-07-30 11:22:25 +08:00
errno = ENOENT;
return -1;
}
return lstat(path, st);
}

2018-08-18 02:00:39 +08:00
static void mark_colliding_entries(const struct checkout *state,
struct cache_entry *ce, struct stat *st)
{
int i, trust_ino = check_stat;

clone: fix colliding file detection on APFS
Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to user when cloning a repo
with case-sensitive file names on a case-insensitive file system. The
"find duplicate file" check was done by comparing inode numbers (and
only falling back to fspathcmp() when inodes are known to be unreliable,
because fspathcmp() can't cover all case folding cases).
The inode check is very simple, and wrong. It compares a 32-bit
number (sd_ino) with a potentially 64-bit number (st_ino). When an
inode number is larger than 2^32 (which seems to be the case for APFS),
it is truncated when stored in sd_ino, so comparing it with the
untruncated st_ino of the very same file fails.
As a result, instead of showing a pair of files that have the same
name, we show just one file (marked before the beginning of the
loop). We fail to find the original one.
The fix could be just a simple type cast (*)
    dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino
but this is no longer a reliable test: there are 4G possible inodes
that can match sd_ino because we only match the lower 32 bits instead
of the full 64 bits.
There are two ways to go. Either we ignore the inode and go with
fspathcmp() on Apple platforms. This means we can't do an accurate
inode check on HFS anymore, or even on APFS when inode numbers are
still below 2^32.
Or we try to reduce the odds of matching a wrong file by checking
more attributes, counting mostly on st_size because st_xtime is likely
the same. This patch goes in this direction, hoping that the chance of
a false positive is too small to be seen in practice.
While at it, enable the test on Cygwin (verified working by Ramsay
Jones).
(*) this is also already done inside match_stat_data()
Reported-by: Carlo Arenas <carenas@gmail.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
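A small C model of the comparison problem, assuming a 32-bit cached field (playing the role of sd_ino and sd_size) and 64-bit values coming from stat(). The struct and function names are hypothetical; Git's real check goes through match_stat_data(), which compares more fields than the two modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached stat data: both fields truncated to 32 bits on store. */
struct cached_stat {
	uint32_t ino;	/* like sd_ino */
	uint32_t size;	/* like sd_size */
};

/* Buggy check: the 32-bit value is widened and compared with all 64 bits. */
static int same_file_naive(const struct cached_stat *c, uint64_t st_ino)
{
	return c->ino == st_ino;
}

/* Casting first compares only the low 32 bits, so 2^32 inodes can match. */
static int same_file_cast(const struct cached_stat *c, uint64_t st_ino)
{
	return c->ino == (uint32_t)st_ino;
}

/* Direction taken by the patch: also require other attributes (mainly size). */
static int same_file_more_attrs(const struct cached_stat *c,
				uint64_t st_ino, uint64_t st_size)
{
	return c->ino == (uint32_t)st_ino && c->size == (uint32_t)st_size;
}
```

Widening the stored 32-bit value makes a file fail to match itself once its inode exceeds 2^32; casting restores the self-match but lets any inode with the same low 32 bits match too; requiring the size to agree as well filters out most of those false positives.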
2018-11-21 00:28:53 +08:00
#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
2018-08-18 02:00:39 +08:00
trust_ino = 0;
#endif

ce->ce_flags |= CE_MATCHED;

2021-04-01 09:49:55 +08:00
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(state->istate);
2018-08-18 02:00:39 +08:00
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];

2021-04-19 08:14:53 +08:00
if (dup == ce) {
/*
* Parallel checkout doesn't create the files in index
* order. So the other side of the collision may appear
* after the given cache_entry in the array.
*/
if (parallel_checkout_status() == PC_RUNNING)
continue;
else
break;
}
2018-08-18 02:00:39 +08:00

if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;

2018-11-21 00:28:53 +08:00
if ((trust_ino && !match_stat_data(&dup->ce_stat_data, st)) ||
2018-08-18 02:00:39 +08:00
(!trust_ino && !fspathcmp(ce->name, dup->name))) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}
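The parallel-checkout commit quoted in this blame explains how colliding paths are noticed when entries are no longer written in index order: because checkout_entry() has already made sure nothing exists at an entry's path, an open(O_CREAT | O_EXCL) that fails with EEXIST or EISDIR means another entry got there first. A minimal POSIX sketch of that check follows; create_or_detect_collision() and its return codes are hypothetical and not Git's parallel-checkout API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file whose path was verified to be free
 * before the entry was queued. Returns an open fd on success, -2 when the
 * path is now taken (a collision), and -1 on any other error.
 */
static int create_or_detect_collision(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd >= 0)
		return fd;	/* we created it: no collision */
	if (errno == EEXIST || errno == EISDIR)
		return -2;	/* something got there first: defer this entry */
	return -1;		/* unrelated failure, e.g. EACCES */
}
```

A caller would record a -2 result and leave that entry for the sequential pass over collided entries that runs once all parallel-eligible entries have been processed.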

2021-03-23 22:19:36 +08:00
int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
const struct checkout *state, char *topath,
int *nr_checkouts)
2005-06-06 12:59:54 +08:00
{
2014-03-13 17:19:07 +08:00
static struct strbuf path = STRBUF_INIT;
2006-03-05 16:24:15 +08:00
struct stat st;
2021-03-23 22:19:36 +08:00
struct conv_attrs ca_buf;
2005-06-06 12:59:54 +08:00

2018-12-20 21:48:15 +08:00
if (ce->ce_flags & CE_WT_REMOVE) {
if (topath)
/*
* No content and thus no path to create, so we have
* no pathname to return.
*/
BUG("Can't remove entry to a path");
unlink_entry(ce);
return 0;
}

2021-03-23 22:19:35 +08:00
if (topath) {
2021-03-23 22:19:36 +08:00
if (S_ISREG(ce->ce_mode) && !ca) {
2021-03-23 22:19:35 +08:00
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}
2022-07-14 19:49:12 +08:00
return write_entry(ce, topath, ca, state, 1, nr_checkouts);
2021-03-23 22:19:35 +08:00
}
2006-03-05 16:24:15 +08:00

2014-03-13 17:19:07 +08:00
strbuf_reset(&path);
strbuf_add(&path, state->base_dir, state->base_dir_len);
strbuf_add(&path, ce->name, ce_namelen(ce));
2005-06-06 12:59:54 +08:00

2014-03-13 17:19:07 +08:00
if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
2017-03-15 05:46:40 +08:00
const struct submodule *sub;
2018-08-14 00:14:32 +08:00
unsigned changed = ie_match_stat(state->istate, ce, &st,
CE_MATCH_IGNORE_VALID | CE_MATCH_IGNORE_SKIP_WORKTREE);
2017-03-15 05:46:40 +08:00
/*
* Needs to be checked before !changed returns early,
* as the possibly empty directory was not changed
*/
sub = submodule_from_ce(ce);
if (sub) {
int err;
if (!is_submodule_populated_gently(ce->name, &err)) {
struct stat sb;
if (lstat(ce->name, &sb))
die(_("could not stat file '%s'"), ce->name);
if (!(st.st_mode & S_IFDIR))
unlink_or_warn(ce->name);

return submodule_move_head(ce->name,
2017-04-19 05:37:22 +08:00
NULL, oid_to_hex(&ce->oid), 0);
2017-03-15 05:46:40 +08:00
} else
return submodule_move_head(ce->name,
"HEAD", oid_to_hex(&ce->oid),
2017-04-19 05:37:22 +08:00
state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
2017-03-15 05:46:40 +08:00
}

2005-06-06 12:59:54 +08:00
if (!changed)
return 0;
if (!state->force) {
if (!state->quiet)
2014-03-13 17:19:07 +08:00
fprintf(stderr,
"%s already exists, no checkout\n",
path.buf);
2005-10-04 03:44:48 +08:00
return -1;
2005-06-06 12:59:54 +08:00
}

2018-08-18 02:00:39 +08:00
if (state->clone)
mark_colliding_entries(state, ce, &st);

2005-06-06 12:59:54 +08:00
/*
* We unlink the old file, to get the new one with the
* right permissions (including umask, which is nasty
* to emulate by hand - much easier to let the system
* just do the right thing)
*/
2005-07-15 00:58:45 +08:00
if (S_ISDIR(st.st_mode)) {
2007-04-14 00:26:04 +08:00
/* If it is a gitlink, leave it alone! */
2008-01-15 08:03:17 +08:00
if (S_ISGITLINK(ce->ce_mode))
2007-04-14 00:26:04 +08:00
return 0;
2014-03-13 17:19:08 +08:00
remove_subtree(&path);
2014-03-13 17:19:07 +08:00
} else if (unlink(path.buf))
2016-05-08 17:47:44 +08:00
return error_errno("unable to unlink old '%s'", path.buf);
2006-03-05 16:24:15 +08:00
} else if (state->not_new)
2005-06-06 12:59:54 +08:00
return 0;
2014-03-13 17:19:07 +08:00

create_directories(path.buf, path.len, state);
2021-03-23 22:19:35 +08:00

2021-03-23 22:19:36 +08:00
if (S_ISREG(ce->ce_mode) && !ca) {
2021-03-23 22:19:35 +08:00
convert_attrs(state->istate, &ca_buf, ce->name);
ca = &ca_buf;
}

2022-07-14 19:49:12 +08:00
if (!enqueue_checkout(ce, ca, nr_checkouts))
2021-04-19 08:14:53 +08:00
return 0;

2022-07-14 19:49:12 +08:00
return write_entry(ce, path.buf, ca, state, 0, nr_checkouts);
2005-06-06 12:59:54 +08:00
}
2018-12-20 21:48:14 +08:00

void unlink_entry(const struct cache_entry *ce)
{
const struct submodule *sub = submodule_from_ce(ce);
if (sub) {
/* state.force is set at the caller. */
submodule_move_head(ce->name, "HEAD", NULL,
SUBMODULE_MOVE_HEAD_FORCE);
}
checkout: don't follow symlinks when removing entries
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.
Note that, since 1d718a5108, check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().
For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.
As for failing to lstat() one of the leading components, the basename
might still exist but we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expects it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
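The behaviour restored by this change, not following a symlink (or a regular file) in the leading directories of a path that is about to be removed, can be sketched as below. leading_dirs_are_real() is a hypothetical helper, not Git's check_leading_path(): it lstat()s each leading prefix and reports whether all of them are real directories. Unlike the real fix, it lumps the lstat() failure case together with the symlink and regular-file cases instead of singling it out for a warning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical sketch: return 1 when every leading component of 'path' is a
 * real directory, 0 if one of them is a symlink, a regular file, or cannot be
 * lstat()ed. The last component itself is not examined.
 */
static int leading_dirs_are_real(const char *path)
{
	char buf[4096];
	size_t len = strlen(path), i;
	struct stat st;

	if (len >= sizeof(buf))
		return 0;
	memcpy(buf, path, len + 1);
	for (i = 1; i < len; i++) {
		if (buf[i] != '/')
			continue;
		buf[i] = '\0';			/* examine the prefix up to this slash */
		if (lstat(buf, &st) || !S_ISDIR(st.st_mode))
			return 0;		/* symlink, file, or lstat() failure */
		buf[i] = '/';
	}
	return 1;
}
```

A caller in the position of unlink_entry() would return early whenever such a check fails, so a symlinked leading directory is never traversed.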
2021-03-19 02:43:47 +08:00
if (check_leading_path(ce->name, ce_namelen(ce), 1) >= 0)
2018-12-20 21:48:14 +08:00
return;
if (remove_or_warn(ce->ce_mode, ce->name))
return;
schedule_dir_for_removal(ce->name, ce_namelen(ce));
}