Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In case of a non-forced worktree update, the submodule movement is tested
in a dry run first, such that it doesn't matter if the actual update is
done via the force flag. However for correctness, we want to give the
flag as specified by the user. All callers have been inspected and updated
if needed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a submodule is introduced with a new revision
we need to create the submodule in the worktree as well.
As 'submodule_move_head' handles edge cases, all we have
to do is call it from within the function that creates
new files in the working tree for workingtree operations.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since all of its callers have been updated, modify stream_blob_to_fd to
take a struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.
There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.
In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This particular conversion is non-obvious, because nobody
has passed our function the length of the destination
buffer. However, the interface to checkout_entry specifies
that the buffer must be at least TEMPORARY_FILENAME_LENGTH
bytes long, so we can check that (meaning the existing code
was not buggy, but merely worrisome to somebody reading it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/remove-subtree-long-pathname-fix:
entry.c: fix possible buffer overflow in remove_subtree()
checkout_entry(): use the strbuf throughout the function
remove_subtree() manipulated path in a fixed-size buffer even though
the length of the input, let alone the length of entries within the
directory, were not known in advance. Change the function to take a
strbuf argument and use that object as its scratch space.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to break out the "buf" and "len" members into
separate temporary variables. Rename path_buf to path and use
path.buf and path.len directly. This makes it easier to reason about
the data flow in the function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The said function has this signature:
extern int checkout_entry(struct cache_entry *ce,
const struct checkout *state,
char *topath);
At first glance, it might appear that the caller of checkout_entry()
can specify to which path the contents are written out by the last
parameter, and it is tempting to add "const" in front of its type.
In reality, however, topath[] is to point at a buffer to store the
temporary path generated by the callchain originating from this
function, and the temporary path is always short, much shorter than
the buffer prepared by its only caller in builtin/checkout-index.c.
Document the code a bit to clarify so that future callers know how
to use the function better.
Noticed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code does not do boundary check so any paths longer than
PATH_MAX can cause buffer overflow. Replace it with strbuf to handle
paths of arbitrary length.
The OS may reject if the path is too long though. But in that case we
report the cause (e.g. name too long) and usually move on to checking
out the next entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are only four (with some generous rounding) instances in the
current source code where we speak of "subproject" instead of
"submodule". They are as follows:
* one error message in git-apply and two in entry.c
* the patch format for submodule changes
The latter was introduced in 0478675 (Expose subprojects as special
files to "git diff" machinery, 2007-04-15), apparently before the
terminology was settled. We can of course not change the patch
format.
Let's at least change the error messages to consistently call them
"submodule".
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Have the streaming interface and other codepaths more carefully
examine for corrupt objects.
* jk/check-corrupt-objects-carefully:
clone: leave repo in place after checkout errors
clone: run check_everything_connected
clone: die on errors from unpack_trees
add tests for cloning corrupted repositories
streaming_write_entry: propagate streaming errors
add test for streaming corrupt blobs
avoid infinite loop in read_istream_loose
read_istream_filtered: propagate read error from upstream
check_sha1_signature: check return value from read_istream
stream_blob_to_fd: detect errors reading from stream
Codepath to stream blob object contents directly from the object
store to filesystem did not use the correct path to find conversion
filters when writing to temporary files.
* jk/checkout-attribute-lookup:
t2003: work around path mangling issue on Windows
entry: fix filter lookup
t2003: modernize style
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.
We can fix this by using bitwise-OR to accumulate errors in
our result variable.
While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When looking up the stream filter, write_entry() should be passing the
path of the file in the repository, not the path to which the content is
going to be written. This allows the file to be correctly looked up
against the .gitattributes files in the working tree.
This change makes the streaming case match the non-streaming case which
passes ce->name to convert_to_working_tree later in the same function.
The two tests added here test the different paths through write_entry
since the CRLF filter is a streaming filter but the user-defined smudge
filter is not streamed.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The static function in entry.c takes a cache entry and streams its blob
contents to a file in the working tree. Refactor the logic to a new API
function stream_blob_to_fd() that takes an object name and an open file
descriptor, so that it can be reused by other callers.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces an API to plug custom filters to an input stream.
The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream(). After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.
This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One typical use of a large binary file is to hold a sparse on-disk hash
table with a lot of holes. Help preserving the holes with lseek().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.
The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbound size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.
Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and let the streaming_write_entry() read the filtered
result. But that is outside the scope of this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the write-out codepath, a block of code determines what file in the
working tree to write to, and opens an output file descriptor to it.
After writing the contents out to the file, another block of code runs
fstat() on the file descriptor when appropriate.
Separate these blocks out to open_output_fd() and fstat_output()
helper functions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back then when entry.c was part of checkout-index (or checkout-cache
at that time [1]). It makes sense to print the command name in error
messages. Nowadays entry.c is in libgit and can be used by any
commands, printing "git checkout-index: blah" does no more than
confusion. The error messages without it still give enough information.
[1] 12dccc1 (Make fiel checkout function available to the git library - 2005-06-05)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/symbol-static:
date.c: mark file-local function static
Replace parse_blob() with an explanatory comment
symlinks.c: remove unused functions
object.c: remove unused functions
strbuf.c: remove unused function
sha1_file.c: remove unused function
mailmap.c: remove unused function
utf8.c: mark file-local function static
submodule.c: mark file-local function static
quote.c: mark file-local function static
remote-curl.c: mark file-local function static
read-cache.c: mark file-local functions static
parse-options.c: mark file-local function static
entry.c: mark file-local function static
http.c: mark file-local functions static
pretty.c: mark file-local function static
builtin-rev-list.c: mark file-local function static
bisect.c: mark file-local function static
* nd/sparse: (25 commits)
t7002: test for not using external grep on skip-worktree paths
t7002: set test prerequisite "external-grep" if supported
grep: do not do external grep on skip-worktree entries
commit: correctly respect skip-worktree bit
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
tests: rename duplicate t1009
sparse checkout: inhibit empty worktree
Add tests for sparse checkout
read-tree: add --no-sparse-checkout to disable sparse checkout support
unpack-trees(): ignore worktree check outside checkout area
unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
unpack-trees.c: generalize verify_* functions
unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
Introduce "sparse checkout"
dir.c: export excluded_1() and add_excludes_from_file_1()
excluded_1(): support exclude files in index
unpack-trees(): carry skip-worktree bit over in merged_entry()
Read .gitignore from index if it is skip-worktree
Avoid writing to buffer in add_excludes_from_file_1()
...
Conflicts:
.gitignore
Documentation/config.txt
Documentation/git-update-index.txt
Makefile
entry.c
t/t7002-grep.sh
Previously CE_MATCH_IGNORE_VALID flag is used by both valid and
skip-worktree bits. While the two bits have similar behaviour, sharing
this flag means "git update-index --really-refresh" will ignore
skip-worktree while it should not. Instead another flag is
introduced to ignore skip-worktree bit, CE_MATCH_IGNORE_VALID only
applies to valid bit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merlyn noticed that Documentation/install-doc-quick.sh no longer correctly
removes old installed documents when the target directory has a leading
path that is a symlink. It turns out that "checkout-index --prefix" was
broken by recent b6986d8 (git-checkout: be careful about untracked
symlinks, 2009-07-29).
I suspect has_symlink_leading_path() could learn the third parameter
(prefix that is allowed to be symlinked directories) to allow us to retire
a similar function has_dirs_only_path().
Another avenue of fixing this I considered was to get rid of base_dir and
base_dir_len from "struct checkout", and instead make "git checkout-index"
when run with --prefix mkdir the leading path and chdir in there. It
might be the best longer term solution to this issue, as the base_dir
feature is used only by that rather obscure codepath as far as I know.
But at least this patch should fix this breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the case where an untracked symlink that points at a directory
with tracked paths confuses the checkout logic, demostrated in t6035.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem. Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:
Function Passes on error from
-------- --------------------
odb_pack_keep open
read_ancestry fopen
read_in_full xread
strbuf_read xread
strbuf_read_file open or strbuf_read_file
strbuf_readlink readlink
write_in_full xwrite
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e4c72923 (write_entry(): use fstat() instead of lstat() when file
is open, 2009-02-09) introduced an optimization of write_entry().
Unfortunately, we cannot take advantage of this optimization on Windows
because there is no guarantee that the time stamps are updated before the
file is closed:
"The only guarantee about a file timestamp is that the file time is
correctly reflected when the handle that makes the change is closed."
(http://msdn.microsoft.com/en-us/library/ms724290(VS.85).aspx)
The failure of this optimization on Windows can be observed most easily by
running a 'git checkout' that has to update several large files. In this
case, 'git checkout' will report modified files, but infact only the
timestamps were incorrectly recorded in the index, as can be verified by a
subsequent 'git diff', which shows no change.
Dmitry Potapov reports the same fix needs on Cygwin; this commit contains
his updates for that.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside write_entry() we do an lstat(path, &st) call on a
file which have just been opened inside the exact same function. It
should be better to call fstat(fd, &st) on the file while it is open,
and it should be at least as fast as the lstat() method.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The switch-cases for S_IFREG and S_IFLNK was so similar that it will
be better to do some cleanup and use the common parts of it.
And the entry.c file should now be clean for 'gcc -Wextra' warnings.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the call to memcpy() and strchr() for each path component
tested, and instead add each path component as we go forward inside
the while-loop.
Impact: small optimisation
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.
Also, add a note about this fact into the coding guidelines.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/lstat-cache:
lstat_cache(): introduce clear_lstat_cache() function
lstat_cache(): introduce invalidate_lstat_cache() function
lstat_cache(): introduce has_dirs_only_path() function
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): more cache effective symlink/directory detection
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time. For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.
To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions. Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.
To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new inline function is_dot_or_dotdot is used to check if the
directory name is either "." or "..". It returns a non-zero value if
the given string is "." or "..". It's applicable to a lot of Git
source code.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a mechanical conversion of all '*.c' files with:
s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;
The result was manually inspected and no false positive was found.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently when checking out an entry "path", we try to unlink(2) it first
(because there could be stale file), and if there is a directory there,
try to deal with it (typically we run recursive rmdir). We ignore the
error return from this unlink because there may not even be any file
there.
However if you are root on Solaris, you can unlink(2) a directory
successfully and corrupt your filesystem.
This moves the code around and check the directory first, and then
unlink(2). Also we check the error code from it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.
In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.
This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jc/maint-add-sync-stat:
t2200: test more cases of "add -u"
git-add: make the entry stat-clean after re-adding the same contents
ce_match_stat, run_diff_files: use symbolic constants for readability
Conflicts:
builtin-add.c
ce_match_stat() can be told:
(1) to ignore CE_VALID bit (used under "assume unchanged" mode)
and perform the stat comparison anyway;
(2) not to perform the contents comparison for racily clean
entries and report mismatch of cached stat information;
using its "option" parameter. Give them symbolic constants.
Similarly, run_diff_files() can be told not to report anything
on removed paths. Also give it a symbolic constant for that.
Signed-off-by: Junio C Hamano <gitster@pobox.com>