Commit Graph

13 Commits

Author SHA1 Message Date
Derrick Stolee
4f6d26b167 list-objects: consume sparse tree walk
When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would make the local topic branch be interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at interesting commits ends in a root
commit or unintersting commit. We then need to discover which
non-commit objects are reachable from  uninteresting commits. This
commit walk is not changing during this series.

The mark_edges_uninteresting() method in list-objects.c iterates on
the commit list and does the following:

* If the commit is UNINTERSTING, then mark its root tree and every
  object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
  UNINTERSTING parent (and all objects that tree can reach) as
  UNINTERSTING.

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise were not included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERSTING.

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly different way. As we iterate
over the commits, we add all of the root trees to an oidset. Then,
call mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_unintersting_sparse() will walk the same trees as
the old logic, but this will be replaced in a later change.

Add a '--sparse' flag in 'git pack-objects' to call this new logic.
Add a new test script t/t5322-pack-objects-sparse.sh that tests this
option. The tests currently demonstrate that the resulting object
list is the same as the old algorithm. This includes a case where
both algorithms pack an object that is not needed by a remote due to
limits on the explored set of trees. When the sparse algorithm is
changed in a later commit, we will add a test that demonstrates a
change of behavior in some cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:39 -08:00
Elijah Newren
ef3ca95475 Add missing includes and forward declarations
I looped over the toplevel header files, creating a temporary two-line C
program for each consisting of
  #include "git-compat-util.h"
  #include $HEADER
This patch is the result of manually fixing errors in compiling those
tiny programs.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15 11:52:09 -07:00
Jeff Hostetler
25ec7bcac0 list-objects: filter objects in traverse_commit_list
Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted from the
traversal.

Update traverse_commit_list() to be a wrapper for the above
with a null filter to minimize the number of callers that
needed to be changed.

Object filtering will be used in a future commit by rev-list
and pack-objects for partial clone and fetch to omit unwanted
objects from the result.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.  It
should be possible to extend such support in the future (at
least to simple filters that do not require object pathnames),
but that is beyond the scope of this patch series.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:57 +09:00
Jeff King
de1e67d070 list-objects: pass full pathname to callbacks
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:17 -08:00
Jeff King
bd64516aca list-objects: drop name_path entirely
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:15 -08:00
Nguyễn Thái Ngọc Duy
e76a5fb459 list-objects: reduce one argument in mark_edges_uninteresting
mark_edges_uninteresting() is always called with this form

  mark_edges_uninteresting(revs->commits, revs, ...);

Remove the first argument and let mark_edges_uninteresting figure that
out by itself. It helps answer the question "are this commit list and
revs related in any way?" when looking at mark_edges_uninteresting
implementation.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 11:54:18 -07:00
Junio C Hamano
4947367267 list-objects: pass callback data to show_objects()
The traverse_commit_list() API takes two callback functions, one to show
commit objects, and the other to show other kinds of objects. Even though
the former has a callback data parameter, so that the callback does not
have to rely on global state, the latter does not.

Give the show_objects() callback the same callback data parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-01 15:46:12 -07:00
Junio C Hamano
9824a388e5 Merge branch 'lt/pack-object-memuse'
* lt/pack-object-memuse:
  show_object(): push path_name() call further down
  process_{tree,blob}: show objects without buffering

Conflicts:
	builtin-pack-objects.c
	builtin-rev-list.c
	list-objects.c
	list-objects.h
	upload-pack.c
2009-04-18 14:46:17 -07:00
Linus Torvalds
cf2ab916af show_object(): push path_name() call further down
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow

 - more clarity into who "owns" the name (ie now when we free the name in
   the show_object callback, it's because we generated it ourselves by
   calling path_name())

 - not calling path_name() at all, either because we don't care about the
   name in the first place, or because we are actually happy walking the
   linked list of "struct name_path *" and the last component.

Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.

Why? We use that name for two things:
 - add_preferred_base_object(), which actually _wants_ to traverse the
   path, and now does it by looking for '/' characters!
 - for 'name_hash()', which only cares about the last 16 characters of a
   name, so again, generating the full name seems to be just unnecessary
   work.

Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.

This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12 17:28:31 -07:00
Linus Torvalds
8d2dfc49b1 process_{tree,blob}: show objects without buffering
Here's a less trivial thing, and slightly more dubious one.

I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.

Now, there are possible downsides to this:

 - the "buffer using object_array" _can_ in theory result in at least
   better I-cache usage (two tight loops rather than one more spread out
   one). I don't think this is a real issue, but in theory..

 - this _does_ change the order of the objects printed. Instead of doing a
   "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
   over the commits (which puts all the root trees _first_ in the object
   list, this patch just adds them to the list of pending objects, and
   then we'll traverse them in that order (and thus show each root tree
   object together with the objects we discover under it)

   I _think_ the new ordering actually makes more sense, but the object
   ordering is actually a subtle thing when it comes to packing
   efficiency, so any change in order is going to have implications for
   packing. Good or bad, I dunno.

 - There may be some reason why we did it that odd way with the object
   array, that I have simply forgotten.

Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.

This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...

Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.

Or maybe there _used_ to be a reason, and no longer is.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12 17:28:31 -07:00
Christian Couder
11c211fa06 list-objects: add "void *data" parameter to show functions
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".

To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.

So we have to change all the callers and all the functions passed
to "traverse_commit_list".

Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-07 22:12:38 -07:00
Junio C Hamano
8d1d8f83b5 pack-objects: further work on internal rev-list logic.
This teaches the internal rev-list logic to understand options
that are needed for pack handling: --all, --unpacked, and --thin.

It also moves two functions from builtin-rev-list to list-objects
so that the two programs can share more code.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 02:46:02 -07:00
Junio C Hamano
c64ed70d25 Separate object listing routines out of rev-list
Create a separate file, list-objects.c, and move object listing
routines from rev-list to it.  The next round will use it in
pack-objects directly.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07 02:46:01 -07:00