Commit Graph

30 Commits

Author SHA1 Message Date
Elijah Newren
00fcce285d symlinks: do not include startup_info->original_cwd in dir removal
symlinks has a pair of schedule_dir_for_removal() and
remove_scheduled_dirs() functions that ensure that directories made
empty by removing other files also themselves get removed.  However, we
want to exclude startup_info->original_cwd and leave it around.  This
avoids the user getting confused by subsequent git commands (and non-git
commands) that would otherwise report confusing messages about being
unable to read the current working directory.

Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:33:13 -08:00
Matheus Tavares
fab78a0c3d checkout: don't follow symlinks when removing entries
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.

Note that since 1d718a5108 check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().

For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.

As for failing to lstat() one of the leading components, the basename
might still exist only we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expect it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18 12:58:10 -07:00
Matheus Tavares
462b4e8dfd symlinks: update comment on threaded_check_leading_path()
Since 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
the comment on top of threaded_check_leading_path() is outdated and no
longer reflects the behavior of this function. Let's updated it to avoid
confusions. While we are here, also remove some duplicated comments to
avoid similar maintenance problems.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18 12:58:08 -07:00
Johannes Schindelin
b1726b1a38 Sync with 2.20.5
* maint-2.20:
  Git 2.20.5
  Git 2.19.6
  Git 2.18.5
  Git 2.17.6
  unpack_trees(): start with a fresh lstat cache
  run-command: invalidate lstat cache after a command finished
  checkout: fix bug that makes checkout follow symlinks in leading path
2021-02-12 15:49:35 +01:00
Matheus Tavares
684dd4c2b4 checkout: fix bug that makes checkout follow symlinks in leading path
Before checking out a file, we have to confirm that all of its leading
components are real existing directories. And to reduce the number of
lstat() calls in this process, we cache the last leading path known to
contain only directories. However, when a path collision occurs (e.g.
when checking out case-sensitive files in case-insensitive file
systems), a cached path might have its file type changed on disk,
leaving the cache on an invalid state. Normally, this doesn't bring
any bad consequences as we usually check out files in index order, and
therefore, by the time the cached path becomes outdated, we no longer
need it anyway (because all files in that directory would have already
been written).

But, there are some users of the checkout machinery that do not always
follow the index order. In particular: checkout-index writes the paths
in the same order that they appear on the CLI (or stdin); and the
delayed checkout feature -- used when a long-running filter process
replies with "status=delayed" -- postpones the checkout of some entries,
thus modifying the checkout order.

When we have to check out an out-of-order entry and the lstat() cache is
invalid (due to a previous path collision), checkout_entry() may end up
using the invalid data and thrusting that the leading components are
real directories when, in reality, they are not. In the best case
scenario, where the directory was replaced by a regular file, the user
will get an error: "fatal: unable to create file 'foo/bar': Not a
directory". But if the directory was replaced by a symlink, checkout
could actually end up following the symlink and writing the file at a
wrong place, even outside the repository. Since delayed checkout is
affected by this bug, it could be used by an attacker to write
arbitrary files during the clone of a maliciously crafted repository.

Some candidate solutions considered were to disable the lstat() cache
during unordered checkouts or sort the entries before passing them to
the checkout machinery. But both ideas include some performance penalty
and they don't future-proof the code against new unordered use cases.

Instead, we now manually reset the lstat cache whenever we successfully
remove a directory. Note: We are not even checking whether the directory
was the same as the lstat cache points to because we might face a
scenario where the paths refer to the same location but differ due to
case folding, precomposed UTF-8 issues, or the presence of `..`
components in the path. Two regression tests, with case-collisions and
utf8-collisions, are also added for both checkout-index and delayed
checkout.

Note: to make the previously mentioned clone attack unfeasible, it would
be sufficient to reset the lstat cache only after the remove_subtree()
call inside checkout_entry(). This is the place where we would remove a
directory whose path collides with the path of another entry that we are
currently trying to check out (possibly a symlink). However, in the
interest of a thorough fix that does not leave Git open to
similar-but-not-identical attack vectors, we decided to intercept
all `rmdir()` calls in one fell swoop.

This addresses CVE-2021-21300.

Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
2021-02-12 15:47:02 +01:00
Nguyễn Thái Ngọc Duy
ec36c42a63 Indent code with TABs
We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.

Imported code and compat/ are left alone on purpose. The former should
remain as close as upstream as possible. The latter pretty much has
separate maintainers, it's up to them to decide.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09 12:37:32 +09:00
Karsten Blees
e7c7305300 symlinks: remove PATH_MAX limitation
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.

Remove the limitation by using strbuf instead.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 11:22:42 -07:00
Junio C Hamano
72f3196a2d symlinks.c: mark private file-scope symbols as static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15 22:58:21 -07:00
Jared Hance
15438d5a56 Add threaded versions of functions in symlinks.c.
check_leading_path() and has_dirs_only_path() both always use the default
cache, which could be a caveat for adding parallelism (which is a concern
and even a GSoC proposal).

Reimplement these two in terms of new threaded_check_leading_path() and
threaded_has_dirs_only_path() that take their own copy of the cache.

Signed-off-by: Jared Hance <jaredhance@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-02 23:56:28 -08:00
Clemens Buchacher
1d718a5108 do not overwrite untracked symlinks
Git traditionally overwrites untracked symlinks silently. This will
generally not cause massive data loss, but it is inconsistent with
the behavior for regular files, which are not silently overwritten.

With this change, git refuses to overwrite untracked symlinks by
default. If the user really wants to overwrite the untracked
symlink, he has git-clean and git-checkout -f at his disposal.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-21 22:51:07 -08:00
Clemens Buchacher
f66caaf9c8 do not overwrite files in leading path
If the work tree contains an untracked file x, and
unpack-trees wants to checkout a path x/*, the
file x is removed unconditionally.

Instead, apply the same checks that are normally
used for untracked files, and abort if the file
cannot be removed.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
2010-10-13 14:34:09 -07:00
Clemens Buchacher
4856ff2a19 lstat_cache: optionally return match_len
Return match_len so that the caller can know which leading path
component matched.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
2010-10-13 14:34:08 -07:00
Junio C Hamano
64161a6b23 symlinks.c: remove unused functions
invalidate_lstat_cache() and clear_lstat_cache() are not used anywhere.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-17 22:49:36 -08:00
Kjetil Barvik
77716755cb lstat_cache: guard against full match of length of 'name' parameter
longest_path_match() in symlinks.c does exactly what it's name says,
but in some cases that match can be too long, since the
has_*_leading_path() functions assumes that the match will newer be as
long as the name string given to the function.

fix this by adding an extra if test which checks if the match length
is equal to the 'len' parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-29 20:20:12 -07:00
Linus Torvalds
b9fd284657 Export thread-safe version of 'has_symlink_leading_path()'
The threaded index preloading will want it, so that it can avoid
locking by simply using a per-thread symlink/directory cache.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-09 20:05:19 -07:00
Linus Torvalds
867f72bf43 Prepare symlink caching for thread-safety
This doesn't actually change the external interfaces, so they are still
thread-unsafe, but it makes the code internally pass a pointer to a
local 'struct cache_def' around, so that the core code can be made
thread-safe.

The threaded index preloading will want to verify that the paths leading
up to a pathname are all real directories.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-09 20:05:19 -07:00
Kjetil Barvik
cb319c3631 symlinks.c: small style cleanup
Add {}-braces around an else-part, where the if-part already has
{}-braces.

And, also remove some unnecessary "return;"-statements at the end of
"void foo()"-functions.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-07 16:23:04 -07:00
Kjetil Barvik
381b920b8a Revert "lstat_cache(): print a warning if doing ping-pong between cache types"
This reverts commit 7734f04873.

I guess that the reverted commit, 7734f048, has been in test long
enough, and should now be reverted.  I have not received any info
regarding any debug output of the reverted commit, so lets hope that
the lstat_cache() function do not cause any ping-pong.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-17 12:10:58 -07:00
Kjetil Barvik
7734f04873 lstat_cache(): print a warning if doing ping-pong between cache types
This is a debug patch which is only to be used while the lstat_cache()
is in the test stage, and should be removed/reverted before the final
relase.

I think it should be useful to catch these warnings, as I it could be
an indication of that the cache would not be very effective if it is
doing ping-pong by switching between different cache types too many
times.

Also, if someone is experimenting with the lstat_cache(), this patch
will maybe be useful while debugging.

If someone is able to trigger the warning, then send a mail to the GIT
mailing list, containing the first 15 lines of the warning, and a
short description of the GIT commands to trigger the warnings.

I hope someone is willing to use this patch for a while, to be able to
catch possible ping-pong's.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
7847892716 unlink_entry(): introduce schedule_dir_for_removal()
Currently inside unlink_entry() if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time.  So if one directory containing 200 files is moved to an
other location we get 199 failed calls to rmdir() and 1 successful
call.

To fix this and avoid some unnecessary calls to rmdir(), we schedule
each directory for removal and wait much longer before we do the real
call to rmdir().

Since the unlink_entry() function is called with alphabetically sorted
names, this new function end up being very effective to avoid
unnecessary calls to rmdir().  In some cases over 95% of all calls to
rmdir() is removed with this patch.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
571998921d lstat_cache(): swap func(length, string) into func(string, length)
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.

Also, add a note about this fact into the coding guidelines.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
148bc06b91 lstat_cache(): generalise longest_match_lstat_cache()
Rename the function to longst_path_match() and generalise it such that
it can also be used by other functions.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
60b458b7d3 lstat_cache(): small cleanup and optimisation
Simplify the if-else test in longest_match_lstat_cache() such that we
only have one simple if test.  Instead of testing for 'i == cache.len'
or 'i == len', we transform this to a common test for 'i == max_len'.

And to further optimise we use 'i >= max_len' instead of 'i ==
max_len', the reason is that it is now the exact opposite of one part
inside the while-loop termination expression 'i < max_len && name[i]
== cache.path[i]', and then the compiler can probably reuse a test
instruction from it.

We also throw away the arguments to reset_lstat_cache(), such that all
the safeguard logic inside lstat_cache() is handled at one place.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
bda6eb0da9 lstat_cache(): introduce clear_lstat_cache() function
If you want to completely clear the contents of the lstat_cache(), then
call this new function.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:58:34 -08:00
Kjetil Barvik
aeabab5c71 lstat_cache(): introduce invalidate_lstat_cache() function
In some cases it could maybe be necessary to say to the cache that
"Hey, I deleted/changed the type of this pathname and if you currently
have it inside your cache, you should deleted it".

This patch introduce a function which support this.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:58:31 -08:00
Kjetil Barvik
bad4a54fa6 lstat_cache(): introduce has_dirs_only_path() function
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time.  For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.

To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions.  Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.

To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.

Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:54:54 -08:00
Kjetil Barvik
09c9306658 lstat_cache(): introduce has_symlink_or_noent_leading_path() function
In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also be told to keep track
of non-existing directories.

So we update the lstat_cache() function to handle this new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.

We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!

Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:54:49 -08:00
Kjetil Barvik
92604b4663 lstat_cache(): more cache effective symlink/directory detection
Make the cache functionality more effective.  Previously when A/B/C/D
was in the cache and A/B/C/E/file.c was called for, there was no match
at all from the cache.  Now we use the fact that the paths "A", "A/B"
and "A/B/C" are already tested, and we only need to do an lstat() call
on "A/B/C/E".

We only cache/store the last path regardless of its type.  Since the
cache functionality is always used with alphabetically sorted names
(at least it seems so for me), there is no need to store both the last
symlink-leading path and the last real-directory path.  Note that if
the cache is not called with (mostly) alphabetically sorted names,
neither the old, nor this new one, would be very effective.

Previously, when symlink A/B/C/S was cached/stored in the symlink-
leading path, and A/B/C/file.c was called for, it was not easy to use
the fact that we already knew that the paths "A", "A/B" and "A/B/C"
are real directories.

Avoid copying the first path components of the name 2 zillion times
when we test new path components.  Since we always cache/store the
last path, we can copy each component as we test those directly into
the cache.  Previously we ended up doing a memcpy() for the full
path/name right before each lstat() call, and when updating the cache
for each time we have tested a new path component.

We also use less memory, that is, PATH_MAX bytes less memory on the
stack and PATH_MAX bytes less memory on the heap.

Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 13:54:45 -08:00
Linus Torvalds
c40641b77b Optimize symlink/directory detection
This is the base for making symlink detection in the middle fo a pathname
saner and (much) more efficient.

Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.

The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.

This caches the last detected full directory and symlink entries, and
speeds up especially deep directory structures a lot by avoiding to
lstat() all the directories leading up to each entry in the index.

[ This can - and should - probably be extended upon so that we eventually
  never do a bare 'lstat()' on any path entries at *all* when checking the
  index, but always check the full path carefully. Right now we do not
  generally check the whole path for all our normal quick index
  revalidation.

  We should also make sure that we're careful about all the invalidation,
  ie when we remove a link and replace it by a directory we should
  invalidate the symlink cache if it matches (and vice versa for the
  directory cache).

  But regardless, the basic function needs to be sane to do that. The old
  'has_symlink_leading_path()' was not capable enough - or indeed the code
  readable enough - to really do that sanely. So I'm pushing this as not
  just an optimization, but as a base for further work. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-10 18:16:31 -07:00
Junio C Hamano
f859c846e9 Add has_symlink_leading_path() function.
When we are applying a patch that creates a blob at a path, or
when we are switching from a branch that does not have a blob at
the path to another branch that has one, we need to make sure
that there is nothing at the path in the working tree, as such a
file is a local modification made by the user that would be lost
by the operation.

Normally, lstat() on the path and making sure ENOENT is returned
is good enough for that purpose.  However there is a twist.  We
may be creating a regular file arch/x86_64/boot/Makefile, while
removing an existing symbolic link at arch/x86_64/boot that
points at existing ../i386/boot directory that has Makefile in
it.  We always first check without touching filesystem and then
perform the actual operation, so when we verify the new file,
arch/x86_64/boot/Makefile, does not exist, we haven't removed
the symbolic link arc/x86_64/boot symbolic link yet.  lstat() on
the file sees through the symbolic link and reports the file is
there, which is not what we want.

The function has_symlink_leading_path() function takes a path,
and sees if any of the leading directory component is a symbolic
link.

When files in a new directory are created, we tend to process
them together because both index and tree are sorted.  The
function takes advantage of this and allows the caller to cache
and reuse which symbolic link on the filesystem caused the
function to return true.

The calling sequence would be:

	char last_symlink[PATH_MAX];

        *last_symlink = '\0';
        for each index entry {
		if (!lose)
			continue;
		if (lstat(it))
			if (errno == ENOENT)
				; /* happy */
			else
				error;
		else if (has_symlink_leading_path(it, last_symlink))
			; /* happy */
		else
			error; /* would lose local changes */
		unlink_entry(it, last_symlink);
	}

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-11 22:11:07 -07:00