Commit Graph

37 Commits

Author SHA1 Message Date
Patrick Steinhardt
9f119599a6 cache-tree: refactor verification to return error codes
The function `cache_tree_verify()` will `BUG()` whenever it finds that
the cache-tree extension of the index is corrupt. The function is thus
inherently untestable because the resulting call to `abort()` will be
detected by our testing framework and labelled an error. And rightfully
so: it shouldn't ever be possible to hit bugs, as they should indicate a
programming error rather than corruption of on-disk state.

Refactor the function to instead return error codes. This also ensures
that the function can be used e.g. by git-fsck(1) without the whole
process dying. Furthermore, this refactoring plugs some memory leaks
when returning early by creating a common exit path.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-07 15:08:11 -07:00
Elijah Newren
a64215b6cd object.h: stop depending on cache.h; make cache.h depend on object.h
Things should be able to depend on object.h without pulling in all of
cache.h.  Move an enum to allow this.

Note that a couple files previously depended on things brought in
through cache.h indirectly (revision.h -> commit.h -> object.h ->
cache.h).  As such, this change requires making existing dependencies
more explicit in half a dozen files.  The inclusion of strbuf.h in
some headers if of particular note: these headers directly embedded a
strbuf in some new structs, meaning they should have been including
strbuf.h all along but were indirectly getting the necessary
definitions.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
Ævar Arnfjörð Bjarmason
fcb864bce7 cache-tree API: remove redundant update_main_cache_tree()
Remove the redundant update_main_cache_tree() function, and make its
users use cache_tree_update() instead.

The behavior of populating the "the_index.cache_tree" if it wasn't
present already was needed when this function was introduced in [1],
but it hasn't been needed since [2]; The "cache_tree_update()" will
now lazy-allocate, so there's no need for the wrapper.

1. 996277c520 (Refactor cache_tree_update idiom from commit,
   2011-12-06)
2. fb0882648e (cache-tree: clean up cache_tree_update(), 2021-01-23)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-10 11:38:14 -08:00
Ævar Arnfjörð Bjarmason
99370863e2 cocci & cache-tree.h: migrate "write_cache_as_tree" to "*_index_*"
Add a trivial rule for "write_cache_as_tree" to
"index-compatibility.cocci", and apply it. This was left out of the
rules added in 0e6550a2c6 (cocci: add a
index-compatibility.pending.cocci, 2022-11-19) because this
compatibility wrapper lived in "cache-tree.h", not "cache.h"

But it's like the other "USE_THE_INDEX_COMPATIBILITY_MACROS", so let's
migrate it too.

The replacement of "USE_THE_INDEX_COMPATIBILITY_MACROS" here with
"USE_THE_INDEX_VARIABLE" is a manual change on top, now that these
files only use "&the_index", and don't need any compatibility
macros (or functions).

The wrapping of some argument lists is likewise manual, as coccinelle
would otherwise give us overly long argument lists.

The reason for putting the "O" in the cocci rule on the "-" and "+"
lines is because I couldn't get correct whitespacing otherwise,
i.e. I'd end up with "oid,&the_index", not "oid, &the_index".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-10 11:37:49 -08:00
Derrick Stolee
86aa250aa8 cache-tree: remove cache_tree_find_path()
This reverts 080ab56a46 (cache-tree: implement cache_tree_find_path(),
2022-05-23). The cache_tree_find_path() method was never actually called
in the topic that added it. I cannot find any reference to it in any of
my forks, so this appears to not be needed at the moment.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-16 11:59:56 -07:00
Derrick Stolee
080ab56a46 cache-tree: implement cache_tree_find_path()
Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. The
method could be implemented simply using tail-recursion, but this while
loop does the same thing.

This new method is not currently used, but will be in an upcoming
change.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
c80dd3967f cache-tree: extract subtree_pos()
This method will be helpful to use outside of cache-tree.c in a later
feature. The implementation is subtle due to subtree_name_cmp() sorting
by length and then lexicographically.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23 17:14:07 -08:00
Elijah Newren
724dd767b2 cache-tree: share code between functions writing an index as a tree
write_tree_from_memory() appeared to be a merge-recursive special that
basically duplicated write_index_as_tree().  The two have a different
signature, but the bigger difference was just that write_index_as_tree()
would always unconditionally read the index off of disk instead of
working on the current in-memory index.  So:

  * split out common code into write_index_as_tree_internal()

  * rename write_tree_from_memory() to write_inmemory_index_as_tree(),
    make it call write_index_as_tree_internal(), and move it to
    cache-tree.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 10:08:03 -07:00
Nguyễn Thái Ngọc Duy
f8adbec9fe cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
By default, index compat macros are off from now on, because they
could hide the_index dependency.

Only those in builtin can use it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24 11:55:06 -08:00
Nguyễn Thái Ngọc Duy
c207e9e1f6 cache-tree.c: remove the_repository references
This case is more interesting than other boring "remove the_repo"
commits because while we need access to the object database, we cannot
simply use r->index because unpack-trees.c can operate on a temporary
index, not $GIT_DIR/index. Ideally we should be able to pass an object
database to lookup_tree() but that ship has sailed.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:06 +09:00
Junio C Hamano
7e794d0a3f Merge branch 'nd/unpack-trees-with-cache-tree'
The unpack_trees() API used in checking out a branch and merging
walks one or more trees along with the index.  When the cache-tree
in the index tells us that we are walking a tree whose flattened
contents is known (i.e. matches a span in the index), as linearly
scanning a span in the index is much more efficient than having to
open tree objects recursively and listing their entries, the walk
can be optimized, which is done in this topic.

* nd/unpack-trees-with-cache-tree:
  Document update for nd/unpack-trees-with-cache-tree
  cache-tree: verify valid cache-tree in the test suite
  unpack-trees: add missing cache invalidation
  unpack-trees: reuse (still valid) cache-tree from src_index
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: add performance tracing
  trace.h: support nested performance tracing
2018-09-17 13:53:53 -07:00
Nguyễn Thái Ngọc Duy
4592e6080f cache-tree: verify valid cache-tree in the test suite
This makes sure that cache-tree is consistent with the index. The main
purpose is to catch potential problems by saving the index in
unpack_trees() but the line in write_index() would also help spot
missing invalidation in other code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18 09:47:46 -07:00
Nguyễn Thái Ngọc Duy
07096c9696 cache-tree: wrap the_index based wrappers with #ifdef
This puts update_main_cache_tree() and write_cache_as_tree() in the
same group of "index compat" functions that assume the_index
implicitly, which should only be used within builtin/ or t/helper.

sequencer.c is also updated to not use these functions. As of now, no
files outside builtin/ use these functions anymore.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Nguyễn Thái Ngọc Duy
9ab34f9e05 cache-tree.h: drop extern from function declaration
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-03 10:42:54 -07:00
brian m. carlson
fc5cb99f67 cache-tree: convert write_*_as_tree to object_id
Convert write_index_as_tree and write_cache_as_tree to use struct
object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:47 -07:00
brian m. carlson
e0a9280404 Convert struct cache_tree to use struct object_id
Convert the sha1 member of struct cache_tree to struct object_id by
changing the definition and applying the following semantic patch, plus
the standard object_id transforms:

@@
struct cache_tree E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_tree *E1;
@@
- E1->sha1
+ E1->oid.hash

Fix up one reference to active_cache_tree which was not automatically
caught by Coccinelle.  These changes are prerequisites for converting
parse_object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-02 10:46:41 +09:00
Paul Tan
d23a5117f8 cache-tree: introduce write_index_as_tree()
A caller may wish to write a temporary index as a tree. However,
write_cache_as_tree() assumes that the index was read from, and will
write to, the default index file path. Introduce write_index_as_tree()
which removes this limitation by allowing the caller to specify its own
index_state and index file path.

Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-04 22:02:11 -07:00
Junio C Hamano
3fd13cbcd5 Merge branch 'dt/cache-tree-repair'
Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.

* dt/cache-tree-repair:
  cache-tree: do not try to use an invalidated subtree info to build a tree
  cache-tree: Write updated cache-tree after commit
  cache-tree: subdirectory tests
  test-dump-cache-tree: invalid trees are not errors
  cache-tree: create/update cache-tree on checkout
2014-09-11 10:33:32 -07:00
David Turner
aecf567cbf cache-tree: create/update cache-tree on checkout
When git checkout checks out a branch, create or update the
cache-tree so that subsequent operations are faster.

update_main_cache_tree learned a new flag, WRITE_TREE_REPAIR.  When
WRITE_TREE_REPAIR is set, portions of the cache-tree which do not
correspond to existing tree objects are invalidated (and portions which
do are marked as valid).  No new tree objects are created.

Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 12:30:34 -07:00
Nguyễn Thái Ngọc Duy
e6c286e8b2 cache-tree: mark istate->cache_changed on prime_cache_tree()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
d0cfc3e866 cache-tree: mark istate->cache_changed on cache tree update
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Nguyễn Thái Ngọc Duy
3cf773e426 cache-tree: fix writing cache-tree when CE_REMOVE is present
entry_count is used in update_one() for two purposes:

1. to skip through the number of processed entries in in-memory index
2. to record the number of entries this cache-tree covers on disk

Unfortunately when CE_REMOVE is present these numbers are not the same
because CE_REMOVE entries are automatically removed before writing to
disk but entry_count is not adjusted and still counts CE_REMOVE
entries.

Separate the two use cases into two different variables. #1 is taken
care by the new field count in struct cache_tree_sub and entry_count
is prepared for #2.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-15 23:04:22 -08:00
Nguyễn Thái Ngọc Duy
e859c69b26 cache-tree: update API to take abitrary flags
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-07 16:35:43 -08:00
Thomas Rast
996277c520 Refactor cache_tree_update idiom from commit
We'll need to safely create or update the cache-tree data of the_index
from other places.  While at it, give it an argument that lets us
silence the messages produced by unmerged entries (which prevent it
from working).

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-06 14:57:36 -08:00
Junio C Hamano
b65982b608 Optimize "diff-index --cached" using cache-tree
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Junio C Hamano
d11b8d3425 write-tree --ignore-cache-tree
This allows you to discard the cache-tree information before writing the
tree out of the index (i.e. it always recomputes the tree object names for
all the subtrees).

This is only useful as a debug option, so I did not bother documenting it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 11:07:07 -07:00
Junio C Hamano
b9d37a5420 Move prime_cache_tree() to cache-tree.c
The interface to build cache-tree belongs there.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-20 04:16:41 -07:00
Nanako Shiraishi
7ba04d9f37 cache-tree.c: make cache_tree_find() static
This function is not used by any other file.

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 08:50:27 -07:00
Junio C Hamano
45525bd022 Make error messages from cherry-pick/revert more sensible
The original "rewrite in C" did somewhat a sloppy job while
stealing code from git-write-tree.

The caller pretends as if the write_tree() function would return
an error code and being able to issue a sensible error message
itself, but write_tree() function just calls die() and never
returns an error.  Worse yet, the function claims that it was
running git-write-tree (which is no longer true after
cherry-pick stole it).

Tested-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05 00:39:19 -08:00
Pierre Habouzit
1dffb8fa80 Small cache_tree_write refactor.
This function cannot fail, make it void. Also make write_one act on a
const char* instead of a char*.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-26 02:27:06 -07:00
Junio C Hamano
6bd20358a9 write-tree: --prefix=<path>
The "bind" commit can express an aggregation of multiple
projects into a single commit.

In such an organization, there would be one project, root of
whose tree object is at the same level of the root of the
aggregated projects, and other projects have their toplevel in
separate subdirectories.  Let's call that root level project the
"primary project", and call other ones just "subprojects".

You would first read-tree the primary project, and then graft
the subprojects under their appropriate location using read-tree
--prefix=<subdir>/ repeatedly.

To write out a tree object from such an index for a subproject,
write-tree --prefix=<subdir>/ is used.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 22:29:16 -07:00
Junio C Hamano
2956dd3bd7 cache_tree_update: give an option to update cache-tree only.
When the extra "dryrun" parameter is true, cache_tree_update()
recomputes the invalid entry but does not actually creates
new tree object.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 16:21:54 -07:00
Junio C Hamano
7927a55d5b read-tree: teach 1-way merege and plain read to prime cache-tree.
This teaches read-tree to fully populate valid cache-tree when
reading a tree from scratch, or reading a single tree into an
existing index, reusing only the cached stat information (i.e.
one-way merge).  We have already taught update-index about cache-tree,
so "git checkout" followed by updates to a few path followed by
a "git commit" would become very efficient.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 01:33:07 -07:00
Junio C Hamano
bad68ec924 index: make the index file format extensible.
... and move the cache-tree data into it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-24 21:24:13 -07:00
Junio C Hamano
749864627c Add cache-tree.
The cache_tree data structure is to cache tree object names that
would result from the current index file.

The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.

We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-23 20:18:16 -07:00