git/revision.c

2211 lines
56 KiB
C
Raw Normal View History

#include "cache.h"
#include "tag.h"
#include "blob.h"
#include "tree.h"
#include "commit.h"
#include "diff.h"
#include "refs.h"
#include "revision.h"
#include "graph.h"
#include "grep.h"
#include "reflog-walk.h"
#include "patch-ids.h"
#include "decorate.h"
#include "log-tree.h"
#include "string-list.h"
volatile show_early_output_fn_t show_early_output;
show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 09:15:26 +08:00
char *path_name(const struct name_path *path, const char *name)
{
show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 09:15:26 +08:00
const struct name_path *p;
char *n, *m;
int nlen = strlen(name);
int len = nlen + 1;
for (p = path; p; p = p->up) {
if (p->elem_len)
len += p->elem_len + 1;
}
n = xmalloc(len);
m = n + len - (nlen + 1);
strcpy(m, name);
for (p = path; p; p = p->up) {
if (p->elem_len) {
m -= p->elem_len + 1;
memcpy(m, p->elem, p->elem_len);
m[p->elem_len] = '/';
}
}
return n;
}
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 08:42:35 +08:00
void add_object(struct object *obj,
struct object_array *p,
struct name_path *path,
const char *name)
{
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 08:42:35 +08:00
add_object_array(obj, path_name(path, name), p);
}
static void mark_blob_uninteresting(struct blob *blob)
{
if (!blob)
return;
if (blob->object.flags & UNINTERESTING)
return;
blob->object.flags |= UNINTERESTING;
}
void mark_tree_uninteresting(struct tree *tree)
{
struct tree_desc desc;
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
struct name_entry entry;
struct object *obj = &tree->object;
if (!tree)
return;
if (obj->flags & UNINTERESTING)
return;
obj->flags |= UNINTERESTING;
if (!has_sha1_file(obj->sha1))
return;
if (parse_tree(tree) < 0)
die("bad tree %s", sha1_to_hex(obj->sha1));
init_tree_desc(&desc, tree->buffer, tree->size);
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
while (tree_entry(&desc, &entry)) {
switch (object_type(entry.mode)) {
case OBJ_TREE:
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
mark_tree_uninteresting(lookup_tree(entry.sha1));
break;
case OBJ_BLOB:
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
mark_blob_uninteresting(lookup_blob(entry.sha1));
break;
default:
/* Subproject commit - not in this repository */
break;
}
}
/*
* We don't care about the tree any more
* after it has been marked uninteresting.
*/
free(tree->buffer);
tree->buffer = NULL;
}
void mark_parents_uninteresting(struct commit *commit)
{
struct commit_list *parents = commit->parents;
while (parents) {
struct commit *commit = parents->item;
if (!(commit->object.flags & UNINTERESTING)) {
commit->object.flags |= UNINTERESTING;
/*
* Normally we haven't parsed the parent
* yet, so we won't have a parent of a parent
* here. However, it may turn out that we've
* reached this commit some other way (where it
* wasn't uninteresting), in which case we need
* to mark its parents recursively too..
*/
if (commit->parents)
mark_parents_uninteresting(commit);
}
/*
* A missing commit is ok iff its parent is marked
* uninteresting.
*
* We just mark such a thing parsed, so that when
* it is popped next time around, we won't be trying
* to parse it and get an error.
*/
if (!has_sha1_file(commit->object.sha1))
commit->object.parsed = 1;
parents = parents->next;
}
}
static void add_pending_object_with_mode(struct rev_info *revs, struct object *obj, const char *name, unsigned mode)
{
if (revs->no_walk && (obj->flags & UNINTERESTING))
revs->no_walk = 0;
if (revs->reflog_info && obj->type == OBJ_COMMIT) {
struct strbuf buf = STRBUF_INIT;
int len = interpret_branch_name(name, &buf);
int st;
if (0 < len && name[len] && buf.len)
strbuf_addstr(&buf, name + len);
st = add_reflog_for_walk(revs->reflog_info,
(struct commit *)obj,
buf.buf[0] ? buf.buf: name);
strbuf_release(&buf);
if (st)
return;
}
add_object_array_with_mode(obj, name, &revs->pending, mode);
}
void add_pending_object(struct rev_info *revs, struct object *obj, const char *name)
{
add_pending_object_with_mode(revs, obj, name, S_IFINVALID);
}
void add_head_to_pending(struct rev_info *revs)
{
unsigned char sha1[20];
struct object *obj;
if (get_sha1("HEAD", sha1))
return;
obj = parse_object(sha1);
if (!obj)
return;
add_pending_object(revs, obj, "HEAD");
}
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
static struct object *get_reference(struct rev_info *revs, const char *name, const unsigned char *sha1, unsigned int flags)
{
struct object *object;
object = parse_object(sha1);
if (!object)
die("bad object %s", name);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
object->flags |= flags;
return object;
}
static struct commit *handle_commit(struct rev_info *revs, struct object *object, const char *name)
{
unsigned long flags = object->flags;
/*
* Tag object? Look what it points to..
*/
while (object->type == OBJ_TAG) {
struct tag *tag = (struct tag *) object;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
if (revs->tag_objects && !(flags & UNINTERESTING))
add_pending_object(revs, object, tag->tag);
if (!tag->tagged)
die("bad tag");
object = parse_object(tag->tagged->sha1);
if (!object) {
if (flags & UNINTERESTING)
return NULL;
die("bad object %s", sha1_to_hex(tag->tagged->sha1));
}
}
/*
* Commit object? Just return it, we'll do all the complex
* reachability crud.
*/
if (object->type == OBJ_COMMIT) {
struct commit *commit = (struct commit *)object;
if (parse_commit(commit) < 0)
die("unable to parse commit %s", name);
if (flags & UNINTERESTING) {
commit->object.flags |= UNINTERESTING;
mark_parents_uninteresting(commit);
revs->limited = 1;
}
if (revs->show_source && !commit->util)
commit->util = (void *) name;
return commit;
}
/*
* Tree object? Either mark it uninteresting, or add it
* to the list of objects to look at later..
*/
if (object->type == OBJ_TREE) {
struct tree *tree = (struct tree *)object;
if (!revs->tree_objects)
return NULL;
if (flags & UNINTERESTING) {
mark_tree_uninteresting(tree);
return NULL;
}
add_pending_object(revs, object, "");
return NULL;
}
/*
* Blob object? You know the drill by now..
*/
if (object->type == OBJ_BLOB) {
struct blob *blob = (struct blob *)object;
if (!revs->blob_objects)
return NULL;
if (flags & UNINTERESTING) {
mark_blob_uninteresting(blob);
return NULL;
}
add_pending_object(revs, object, "");
return NULL;
}
die("%s is unknown object", name);
}
static int everybody_uninteresting(struct commit_list *orig)
{
struct commit_list *list = orig;
while (list) {
struct commit *commit = list->item;
list = list->next;
if (commit->object.flags & UNINTERESTING)
continue;
return 0;
}
return 1;
}
/*
* The goal is to get REV_TREE_NEW as the result only if the
* diff consists of all '+' (and no other changes), REV_TREE_OLD
* if the whole diff is removal of old data, and otherwise
* REV_TREE_DIFFERENT (of course if the trees are the same we
* want REV_TREE_SAME).
* That means that once we get to REV_TREE_DIFFERENT, we do not
* have to look any further.
*/
static int tree_difference = REV_TREE_SAME;
static void file_add_remove(struct diff_options *options,
int addremove, unsigned mode,
const unsigned char *sha1,
const char *fullpath, unsigned dirty_submodule)
{
int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
tree_difference |= diff;
if (tree_difference == REV_TREE_DIFFERENT)
DIFF_OPT_SET(options, HAS_CHANGES);
}
static void file_change(struct diff_options *options,
unsigned old_mode, unsigned new_mode,
const unsigned char *old_sha1,
const unsigned char *new_sha1,
const char *fullpath,
unsigned old_dirty_submodule, unsigned new_dirty_submodule)
{
tree_difference = REV_TREE_DIFFERENT;
DIFF_OPT_SET(options, HAS_CHANGES);
}
static int rev_compare_tree(struct rev_info *revs, struct commit *parent, struct commit *commit)
{
struct tree *t1 = parent->tree;
struct tree *t2 = commit->tree;
if (!t1)
return REV_TREE_NEW;
if (!t2)
return REV_TREE_OLD;
if (revs->simplify_by_decoration) {
/*
* If we are simplifying by decoration, then the commit
* is worth showing if it has a tag pointing at it.
*/
if (lookup_decoration(&name_decoration, &commit->object))
return REV_TREE_DIFFERENT;
/*
* A commit that is not pointed by a tag is uninteresting
* if we are not limited by path. This means that you will
* see the usual "commits that touch the paths" plus any
* tagged commit by specifying both --simplify-by-decoration
* and pathspec.
*/
if (!revs->prune_data)
return REV_TREE_SAME;
}
tree_difference = REV_TREE_SAME;
DIFF_OPT_CLR(&revs->pruning, HAS_CHANGES);
if (diff_tree_sha1(t1->object.sha1, t2->object.sha1, "",
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
&revs->pruning) < 0)
return REV_TREE_DIFFERENT;
return tree_difference;
}
static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
{
int retval;
void *tree;
unsigned long size;
struct tree_desc empty, real;
struct tree *t1 = commit->tree;
if (!t1)
return 0;
tree = read_object_with_reference(t1->object.sha1, tree_type, &size, NULL);
if (!tree)
return 0;
init_tree_desc(&real, tree, size);
init_tree_desc(&empty, "", 0);
tree_difference = REV_TREE_SAME;
DIFF_OPT_CLR(&revs->pruning, HAS_CHANGES);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
retval = diff_tree(&empty, &real, "", &revs->pruning);
free(tree);
return retval >= 0 && (tree_difference == REV_TREE_SAME);
}
static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
{
struct commit_list **pp, *parent;
int tree_changed = 0, tree_same = 0;
/*
* If we don't do pruning, everything is interesting
*/
if (!revs->prune)
return;
if (!commit->tree)
return;
if (!commit->parents) {
if (rev_same_tree_as_empty(revs, commit))
commit->object.flags |= TREESAME;
return;
}
/*
* Normal non-merge commit? If we don't want to make the
* history dense, we consider it always to be a change..
*/
if (!revs->dense && !commit->parents->next)
return;
pp = &commit->parents;
while ((parent = *pp) != NULL) {
struct commit *p = parent->item;
if (parse_commit(p) < 0)
die("cannot simplify commit %s (because of %s)",
sha1_to_hex(commit->object.sha1),
sha1_to_hex(p->object.sha1));
switch (rev_compare_tree(revs, p, commit)) {
case REV_TREE_SAME:
tree_same = 1;
gitweb.cgi history not shown This does: - add a "rev.simplify_history" flag which defaults to on - it turns it off for "git whatchanged" (which thus now has real semantics outside of "git log") - it adds a command line flag ("--full-history") to turn it off for others (ie you can make "git log" and "gitk" etc get the semantics if you want to. Now, just as an example of _why_ you really really really want to simplify history by default, apply this patch, install it, and try these two command lines: gitk --full-history -- git.c gitk -- git.c and compare the output. So with this, you can also now do git whatchanged -p -- gitweb.cgi git log -p --full-history -- gitweb.cgi and it will show the old history of gitweb.cgi, even though it's not relevant to the _current_ state of the name "gitweb.cgi" NOTE NOTE NOTE! It will still actually simplify away merges that didn't change anything at all into either child. That creates these bogus strange discontinuities if you look at it with "gitk" (look at the --full-history gitk output for git.c, and you'll see a few strange cases). So the whole "--parent" thing ends up somewhat bogus with --full-history because of this, but I'm not sure it's worth even worrying about. I don't think you'd ever want to really use "--full-history" with the graphical representation, I just give it as an example exactly to show _why_ doing so would be insane. I think this is trivial enough and useful enough to be worth merging into the stable branch. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-12 01:57:35 +08:00
if (!revs->simplify_history || (p->object.flags & UNINTERESTING)) {
/* Even if a merge with an uninteresting
* side branch brought the entire change
* we are interested in, we do not want
* to lose the other branches of this
* merge, so we just keep going.
*/
pp = &parent->next;
continue;
}
parent->next = NULL;
commit->parents = parent;
commit->object.flags |= TREESAME;
return;
case REV_TREE_NEW:
if (revs->remove_empty_trees &&
rev_same_tree_as_empty(revs, p)) {
/* We are adding all the specified
* paths from this parent, so the
* history beyond this parent is not
* interesting. Remove its parents
* (they are grandparents for us).
* IOW, we pretend this parent is a
* "root" commit.
*/
if (parse_commit(p) < 0)
die("cannot simplify commit %s (invalid %s)",
sha1_to_hex(commit->object.sha1),
sha1_to_hex(p->object.sha1));
p->parents = NULL;
}
/* fallthrough */
case REV_TREE_OLD:
case REV_TREE_DIFFERENT:
tree_changed = 1;
pp = &parent->next;
continue;
}
die("bad tree compare for commit %s", sha1_to_hex(commit->object.sha1));
}
if (tree_changed && !tree_same)
return;
commit->object.flags |= TREESAME;
}
static void insert_by_date_cached(struct commit *p, struct commit_list **head,
struct commit_list *cached_base, struct commit_list **cache)
{
struct commit_list *new_entry;
if (cached_base && p->date < cached_base->item->date)
new_entry = insert_by_date(p, &cached_base->next);
else
new_entry = insert_by_date(p, head);
if (cache && (!*cache || p->date < (*cache)->item->date))
*cache = new_entry;
}
static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
struct commit_list **list, struct commit_list **cache_ptr)
{
struct commit_list *parent = commit->parents;
unsigned left_flag;
struct commit_list *cached_base = cache_ptr ? *cache_ptr : NULL;
if (commit->object.flags & ADDED)
return 0;
commit->object.flags |= ADDED;
/*
* If the commit is uninteresting, don't try to
* prune parents - we want the maximal uninteresting
* set.
*
* Normally we haven't parsed the parent
* yet, so we won't have a parent of a parent
* here. However, it may turn out that we've
* reached this commit some other way (where it
* wasn't uninteresting), in which case we need
* to mark its parents recursively too..
*/
if (commit->object.flags & UNINTERESTING) {
while (parent) {
struct commit *p = parent->item;
parent = parent->next;
if (p)
p->object.flags |= UNINTERESTING;
if (parse_commit(p) < 0)
continue;
if (p->parents)
mark_parents_uninteresting(p);
if (p->object.flags & SEEN)
continue;
p->object.flags |= SEEN;
insert_by_date_cached(p, list, cached_base, cache_ptr);
}
return 0;
}
/*
* Ok, the commit wasn't uninteresting. Try to
* simplify the commit history and find the parent
* that has no differences in the path set if one exists.
*/
try_to_simplify_commit(revs, commit);
if (revs->no_walk)
return 0;
left_flag = (commit->object.flags & SYMMETRIC_LEFT);
for (parent = commit->parents; parent; parent = parent->next) {
struct commit *p = parent->item;
if (parse_commit(p) < 0)
return -1;
if (revs->show_source && !p->util)
p->util = commit->util;
p->object.flags |= left_flag;
if (!(p->object.flags & SEEN)) {
p->object.flags |= SEEN;
insert_by_date_cached(p, list, cached_base, cache_ptr);
}
if (revs->first_parent_only)
break;
}
return 0;
}
static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
{
struct commit_list *p;
int left_count = 0, right_count = 0;
int left_first;
struct patch_ids ids;
/* First count the commits on the left and on the right */
for (p = list; p; p = p->next) {
struct commit *commit = p->item;
unsigned flags = commit->object.flags;
if (flags & BOUNDARY)
;
else if (flags & SYMMETRIC_LEFT)
left_count++;
else
right_count++;
}
if (!left_count || !right_count)
return;
left_first = left_count < right_count;
init_patch_ids(&ids);
if (revs->diffopt.nr_paths) {
ids.diffopts.nr_paths = revs->diffopt.nr_paths;
ids.diffopts.paths = revs->diffopt.paths;
ids.diffopts.pathlens = revs->diffopt.pathlens;
}
/* Compute patch-ids for one side */
for (p = list; p; p = p->next) {
struct commit *commit = p->item;
unsigned flags = commit->object.flags;
if (flags & BOUNDARY)
continue;
/*
* If we have fewer left, left_first is set and we omit
* commits on the right branch in this loop. If we have
* fewer right, we skip the left ones.
*/
if (left_first != !!(flags & SYMMETRIC_LEFT))
continue;
commit->util = add_commit_patch_id(commit, &ids);
}
/* Check the other side */
for (p = list; p; p = p->next) {
struct commit *commit = p->item;
struct patch_id *id;
unsigned flags = commit->object.flags;
if (flags & BOUNDARY)
continue;
/*
* If we have fewer left, left_first is set and we omit
* commits on the left branch in this loop.
*/
if (left_first == !!(flags & SYMMETRIC_LEFT))
continue;
/*
* Have we seen the same patch id?
*/
id = has_commit_patch_id(commit, &ids);
if (!id)
continue;
id->seen = 1;
commit->object.flags |= SHOWN;
}
/* Now check the original side for seen ones */
for (p = list; p; p = p->next) {
struct commit *commit = p->item;
struct patch_id *ent;
ent = commit->util;
if (!ent)
continue;
if (ent->seen)
commit->object.flags |= SHOWN;
commit->util = NULL;
}
free_patch_ids(&ids);
}
/* How many extra uninteresting commits we want to see.. */
#define SLOP 5
static int still_interesting(struct commit_list *src, unsigned long date, int slop)
Add "--show-all" revision walker flag for debugging It's really not very easy to visualize the commit walker, because - on purpose - it obvously doesn't show the uninteresting commits! This adds a "--show-all" flag to the revision walker, which will make it show uninteresting commits too, and they'll have a '^' in front of them (it also fixes a logic error for !verbose_header for boundary commits - we should show the '-' even if left_right isn't shown). A separate patch to gitk to teach it the new '^' was sent to paulus. With the change in place, it actually is interesting even for the cases that git doesn't have any problems with, ie for the kernel you can do: gitk -d --show-all v2.6.24.. and you see just how far down it has to parse things to see it all. The use of "-d" is a good idea, since the date-ordered toposort is much better at showing why it goes deep down (ie the date of some of those commits after 2.6.24 is much older, because they were merged from trees that weren't rebased). So I think this is a useful feature even for non-debugging - just to visualize what git does internally more. When it actually breaks out due to the "everybody_uninteresting()" case, it adds the uninteresting commits (both the one it's looking at now, and the list of pending ones) to the list This way, we really list *all* the commits we've looked at. Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That second part is debatable just how it should work. Maybe we shouldn't show such entries at all (with this patch those entries do get shown, they just don't get any message shown with them). But I think this is a useful case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-10 06:02:07 +08:00
{
/*
* No source list at all? We're definitely done..
*/
if (!src)
return 0;
/*
* Does the destination list contain entries with a date
* before the source list? Definitely _not_ done.
*/
if (date < src->item->date)
return SLOP;
/*
* Does the source list still have interesting commits in
* it? Definitely not done..
*/
if (!everybody_uninteresting(src))
return SLOP;
/* Ok, we're closing in.. */
return slop-1;
Add "--show-all" revision walker flag for debugging It's really not very easy to visualize the commit walker, because - on purpose - it obvously doesn't show the uninteresting commits! This adds a "--show-all" flag to the revision walker, which will make it show uninteresting commits too, and they'll have a '^' in front of them (it also fixes a logic error for !verbose_header for boundary commits - we should show the '-' even if left_right isn't shown). A separate patch to gitk to teach it the new '^' was sent to paulus. With the change in place, it actually is interesting even for the cases that git doesn't have any problems with, ie for the kernel you can do: gitk -d --show-all v2.6.24.. and you see just how far down it has to parse things to see it all. The use of "-d" is a good idea, since the date-ordered toposort is much better at showing why it goes deep down (ie the date of some of those commits after 2.6.24 is much older, because they were merged from trees that weren't rebased). So I think this is a useful feature even for non-debugging - just to visualize what git does internally more. When it actually breaks out due to the "everybody_uninteresting()" case, it adds the uninteresting commits (both the one it's looking at now, and the list of pending ones) to the list This way, we really list *all* the commits we've looked at. Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That second part is debatable just how it should work. Maybe we shouldn't show such entries at all (with this patch those entries do get shown, they just don't get any message shown with them). But I think this is a useful case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-10 06:02:07 +08:00
}
/*
* "rev-list --ancestry-path A..B" computes commits that are ancestors
* of B but not ancestors of A but further limits the result to those
* that are descendants of A. This takes the list of bottom commits and
* the result of "A..B" without --ancestry-path, and limits the latter
* further to the ones that can reach one of the commits in "bottom".
*/
static void limit_to_ancestry(struct commit_list *bottom, struct commit_list *list)
{
struct commit_list *p;
struct commit_list *rlist = NULL;
int made_progress;
/*
* Reverse the list so that it will be likely that we would
* process parents before children.
*/
for (p = list; p; p = p->next)
commit_list_insert(p->item, &rlist);
for (p = bottom; p; p = p->next)
p->item->object.flags |= TMP_MARK;
/*
* Mark the ones that can reach bottom commits in "list",
* in a bottom-up fashion.
*/
do {
made_progress = 0;
for (p = rlist; p; p = p->next) {
struct commit *c = p->item;
struct commit_list *parents;
if (c->object.flags & (TMP_MARK | UNINTERESTING))
continue;
for (parents = c->parents;
parents;
parents = parents->next) {
if (!(parents->item->object.flags & TMP_MARK))
continue;
c->object.flags |= TMP_MARK;
made_progress = 1;
break;
}
}
} while (made_progress);
/*
* NEEDSWORK: decide if we want to remove parents that are
* not marked with TMP_MARK from commit->parents for commits
* in the resulting list. We may not want to do that, though.
*/
/*
* The ones that are not marked with TMP_MARK are uninteresting
*/
for (p = list; p; p = p->next) {
struct commit *c = p->item;
if (c->object.flags & TMP_MARK)
continue;
c->object.flags |= UNINTERESTING;
}
/* We are done with the TMP_MARK */
for (p = list; p; p = p->next)
p->item->object.flags &= ~TMP_MARK;
for (p = bottom; p; p = p->next)
p->item->object.flags &= ~TMP_MARK;
free_commit_list(rlist);
}
/*
* Before walking the history, keep the set of "negative" refs the
* caller has asked to exclude.
*
* This is used to compute "rev-list --ancestry-path A..B", as we need
* to filter the result of "A..B" further to the ones that can actually
* reach A.
*/
static struct commit_list *collect_bottom_commits(struct commit_list *list)
{
struct commit_list *elem, *bottom = NULL;
for (elem = list; elem; elem = elem->next)
if (elem->item->object.flags & UNINTERESTING)
commit_list_insert(elem->item, &bottom);
return bottom;
}
static int limit_list(struct rev_info *revs)
{
int slop = SLOP;
unsigned long date = ~0ul;
struct commit_list *list = revs->commits;
struct commit_list *newlist = NULL;
struct commit_list **p = &newlist;
struct commit_list *bottom = NULL;
if (revs->ancestry_path) {
bottom = collect_bottom_commits(list);
if (!bottom)
die("--ancestry-path given but there are no bottom commits");
}
while (list) {
struct commit_list *entry = list;
struct commit *commit = list->item;
struct object *obj = &commit->object;
show_early_output_fn_t show;
list = list->next;
free(entry);
if (revs->max_age != -1 && (commit->date < revs->max_age))
obj->flags |= UNINTERESTING;
if (add_parents_to_list(revs, commit, &list, NULL) < 0)
return -1;
if (obj->flags & UNINTERESTING) {
mark_parents_uninteresting(commit);
if (revs->show_all)
p = &commit_list_insert(commit, p)->next;
slop = still_interesting(list, date, slop);
if (slop)
Add "--show-all" revision walker flag for debugging It's really not very easy to visualize the commit walker, because - on purpose - it obvously doesn't show the uninteresting commits! This adds a "--show-all" flag to the revision walker, which will make it show uninteresting commits too, and they'll have a '^' in front of them (it also fixes a logic error for !verbose_header for boundary commits - we should show the '-' even if left_right isn't shown). A separate patch to gitk to teach it the new '^' was sent to paulus. With the change in place, it actually is interesting even for the cases that git doesn't have any problems with, ie for the kernel you can do: gitk -d --show-all v2.6.24.. and you see just how far down it has to parse things to see it all. The use of "-d" is a good idea, since the date-ordered toposort is much better at showing why it goes deep down (ie the date of some of those commits after 2.6.24 is much older, because they were merged from trees that weren't rebased). So I think this is a useful feature even for non-debugging - just to visualize what git does internally more. When it actually breaks out due to the "everybody_uninteresting()" case, it adds the uninteresting commits (both the one it's looking at now, and the list of pending ones) to the list This way, we really list *all* the commits we've looked at. Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That second part is debatable just how it should work. Maybe we shouldn't show such entries at all (with this patch those entries do get shown, they just don't get any message shown with them). But I think this is a useful case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-10 06:02:07 +08:00
continue;
/* If showing all, add the whole pending list to the end */
if (revs->show_all)
*p = list;
break;
}
if (revs->min_age != -1 && (commit->date > revs->min_age))
continue;
date = commit->date;
p = &commit_list_insert(commit, p)->next;
show = show_early_output;
if (!show)
continue;
show(revs, newlist);
show_early_output = NULL;
}
if (revs->cherry_pick)
cherry_pick_list(newlist, revs);
if (bottom) {
limit_to_ancestry(bottom, newlist);
free_commit_list(bottom);
}
revs->commits = newlist;
return 0;
}
struct all_refs_cb {
int all_flags;
int warned_bad_reflog;
struct rev_info *all_revs;
const char *name_for_errormsg;
};
static int handle_one_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data)
{
struct all_refs_cb *cb = cb_data;
struct object *object = get_reference(cb->all_revs, path, sha1,
cb->all_flags);
2007-02-23 09:56:31 +08:00
add_pending_object(cb->all_revs, object, path);
return 0;
}
static void init_all_refs_cb(struct all_refs_cb *cb, struct rev_info *revs,
unsigned flags)
{
cb->all_revs = revs;
cb->all_flags = flags;
}
static void handle_refs(struct rev_info *revs, unsigned flags,
int (*for_each)(each_ref_fn, void *))
{
struct all_refs_cb cb;
init_all_refs_cb(&cb, revs, flags);
for_each(handle_one_ref, &cb);
}
static void handle_one_reflog_commit(unsigned char *sha1, void *cb_data)
{
struct all_refs_cb *cb = cb_data;
if (!is_null_sha1(sha1)) {
struct object *o = parse_object(sha1);
if (o) {
o->flags |= cb->all_flags;
add_pending_object(cb->all_revs, o, "");
}
else if (!cb->warned_bad_reflog) {
warning("reflog of '%s' references pruned commits",
cb->name_for_errormsg);
cb->warned_bad_reflog = 1;
}
}
}
static int handle_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
const char *email, unsigned long timestamp, int tz,
const char *message, void *cb_data)
{
handle_one_reflog_commit(osha1, cb_data);
handle_one_reflog_commit(nsha1, cb_data);
return 0;
}
static int handle_one_reflog(const char *path, const unsigned char *sha1, int flag, void *cb_data)
{
struct all_refs_cb *cb = cb_data;
cb->warned_bad_reflog = 0;
cb->name_for_errormsg = path;
for_each_reflog_ent(path, handle_one_reflog_ent, cb_data);
return 0;
}
static void handle_reflog(struct rev_info *revs, unsigned flags)
{
struct all_refs_cb cb;
cb.all_revs = revs;
cb.all_flags = flags;
for_each_reflog(handle_one_reflog, &cb);
}
static int add_parents_only(struct rev_info *revs, const char *arg, int flags)
{
unsigned char sha1[20];
struct object *it;
struct commit *commit;
struct commit_list *parents;
if (*arg == '^') {
flags ^= UNINTERESTING;
arg++;
}
if (get_sha1(arg, sha1))
return 0;
while (1) {
it = get_reference(revs, arg, sha1, 0);
if (it->type != OBJ_TAG)
break;
if (!((struct tag*)it)->tagged)
return 0;
hashcpy(sha1, ((struct tag*)it)->tagged->sha1);
}
if (it->type != OBJ_COMMIT)
return 0;
commit = (struct commit *)it;
for (parents = commit->parents; parents; parents = parents->next) {
it = &parents->item->object;
it->flags |= flags;
add_pending_object(revs, it, arg);
}
return 1;
}
void init_revisions(struct rev_info *revs, const char *prefix)
{
memset(revs, 0, sizeof(*revs));
revs->abbrev = DEFAULT_ABBREV;
revs->ignore_merges = 1;
gitweb.cgi history not shown This does: - add a "rev.simplify_history" flag which defaults to on - it turns it off for "git whatchanged" (which thus now has real semantics outside of "git log") - it adds a command line flag ("--full-history") to turn it off for others (ie you can make "git log" and "gitk" etc get the semantics if you want to. Now, just as an example of _why_ you really really really want to simplify history by default, apply this patch, install it, and try these two command lines: gitk --full-history -- git.c gitk -- git.c and compare the output. So with this, you can also now do git whatchanged -p -- gitweb.cgi git log -p --full-history -- gitweb.cgi and it will show the old history of gitweb.cgi, even though it's not relevant to the _current_ state of the name "gitweb.cgi" NOTE NOTE NOTE! It will still actually simplify away merges that didn't change anything at all into either child. That creates these bogus strange discontinuities if you look at it with "gitk" (look at the --full-history gitk output for git.c, and you'll see a few strange cases). So the whole "--parent" thing ends up somewhat bogus with --full-history because of this, but I'm not sure it's worth even worrying about. I don't think you'd ever want to really use "--full-history" with the graphical representation, I just give it as an example exactly to show _why_ doing so would be insane. I think this is trivial enough and useful enough to be worth merging into the stable branch. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-12 01:57:35 +08:00
revs->simplify_history = 1;
DIFF_OPT_SET(&revs->pruning, RECURSIVE);
DIFF_OPT_SET(&revs->pruning, QUICK);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
revs->pruning.add_remove = file_add_remove;
revs->pruning.change = file_change;
revs->lifo = 1;
revs->dense = 1;
revs->prefix = prefix;
revs->max_age = -1;
revs->min_age = -1;
revs->skip_count = -1;
revs->max_count = -1;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
revs->commit_format = CMIT_FMT_DEFAULT;
revs->grep_filter.status_only = 1;
revs->grep_filter.pattern_tail = &(revs->grep_filter.pattern_list);
revs->grep_filter.header_tail = &(revs->grep_filter.header_list);
revs->grep_filter.regflags = REG_NEWLINE;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
diff_setup(&revs->diffopt);
if (prefix && !revs->diffopt.prefix) {
diff --relative: output paths as relative to the current subdirectory This adds --relative option to the diff family. When you start from a subdirectory: $ git diff --relative shows only the diff that is inside your current subdirectory, and without $prefix part. People who usually live in subdirectories may like it. There are a few things I should also mention about the change: - This works not just with diff but also works with the log family of commands, but the history pruning is not affected. In other words, if you go to a subdirectory, you can say: $ git log --relative -p but it will show the log message even for commits that do not touch the current directory. You can limit it by giving pathspec yourself: $ git log --relative -p . This originally was not a conscious design choice, but we have a way to affect diff pathspec and pruning pathspec independently. IOW "git log --full-diff -p ." tells it to prune history to commits that affect the current subdirectory but show the changes with full context. I think it makes more sense to leave pruning independent from --relative than the obvious alternative of always pruning with the current subdirectory, which would break the symmetry. - Because this works also with the log family, you could format-patch a single change, limiting the effect to your subdirectory, like so: $ cd gitk-git $ git format-patch -1 --relative 911f1eb But because that is a special purpose usage, this option will never become the default, with or without repository or user preference configuration. The risk of producing a partial patch and sending it out by mistake is too great if we did so. - This is inherently incompatible with --no-index, which is a bolted-on hack that does not have much to do with git itself. I didn't bother checking and erroring out on the combined use of the options, but probably I should. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-13 06:26:02 +08:00
revs->diffopt.prefix = prefix;
revs->diffopt.prefix_length = strlen(prefix);
}
}
static void add_pending_commit_list(struct rev_info *revs,
struct commit_list *commit_list,
unsigned int flags)
{
while (commit_list) {
struct object *object = &commit_list->item->object;
object->flags |= flags;
add_pending_object(revs, object, sha1_to_hex(object->sha1));
commit_list = commit_list->next;
}
}
static void prepare_show_merge(struct rev_info *revs)
{
struct commit_list *bases;
struct commit *head, *other;
unsigned char sha1[20];
const char **prune = NULL;
int i, prune_num = 1; /* counting terminating NULL */
if (get_sha1("HEAD", sha1) || !(head = lookup_commit(sha1)))
die("--merge without HEAD?");
if (get_sha1("MERGE_HEAD", sha1) || !(other = lookup_commit(sha1)))
die("--merge without MERGE_HEAD?");
add_pending_object(revs, &head->object, "HEAD");
add_pending_object(revs, &other->object, "MERGE_HEAD");
bases = get_merge_bases(head, other, 1);
add_pending_commit_list(revs, bases, UNINTERESTING);
free_commit_list(bases);
head->object.flags |= SYMMETRIC_LEFT;
if (!active_nr)
read_cache();
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
continue;
if (ce_path_match(ce, revs->prune_data)) {
prune_num++;
prune = xrealloc(prune, sizeof(*prune) * prune_num);
prune[prune_num-2] = ce->name;
prune[prune_num-1] = NULL;
}
while ((i+1 < active_nr) &&
ce_same_name(ce, active_cache[i+1]))
i++;
}
revs->prune_data = prune;
revs->limited = 1;
}
int handle_revision_arg(const char *arg, struct rev_info *revs,
int flags,
int cant_be_filename)
{
unsigned mode;
char *dotdot;
struct object *object;
unsigned char sha1[20];
int local_flags;
dotdot = strstr(arg, "..");
if (dotdot) {
unsigned char from_sha1[20];
const char *next = dotdot + 2;
const char *this = arg;
int symmetric = *next == '.';
unsigned int flags_exclude = flags ^ UNINTERESTING;
*dotdot = 0;
next += symmetric;
if (!*next)
next = "HEAD";
if (dotdot == arg)
this = "HEAD";
if (!get_sha1(this, from_sha1) &&
!get_sha1(next, sha1)) {
struct commit *a, *b;
struct commit_list *exclude;
a = lookup_commit_reference(from_sha1);
b = lookup_commit_reference(sha1);
if (!a || !b) {
die(symmetric ?
"Invalid symmetric difference expression %s...%s" :
"Invalid revision range %s..%s",
arg, next);
}
if (!cant_be_filename) {
*dotdot = '.';
verify_non_filename(revs->prefix, arg);
}
if (symmetric) {
exclude = get_merge_bases(a, b, 1);
add_pending_commit_list(revs, exclude,
flags_exclude);
free_commit_list(exclude);
a->object.flags |= flags | SYMMETRIC_LEFT;
} else
a->object.flags |= flags_exclude;
b->object.flags |= flags;
add_pending_object(revs, &a->object, this);
add_pending_object(revs, &b->object, next);
return 0;
}
*dotdot = '.';
}
dotdot = strstr(arg, "^@");
if (dotdot && !dotdot[2]) {
*dotdot = 0;
if (add_parents_only(revs, arg, flags))
return 0;
*dotdot = '^';
}
dotdot = strstr(arg, "^!");
if (dotdot && !dotdot[2]) {
*dotdot = 0;
if (!add_parents_only(revs, arg, flags ^ UNINTERESTING))
*dotdot = '^';
}
local_flags = 0;
if (*arg == '^') {
local_flags = UNINTERESTING;
arg++;
}
if (get_sha1_with_mode(arg, sha1, &mode))
return -1;
if (!cant_be_filename)
verify_non_filename(revs->prefix, arg);
object = get_reference(revs, arg, sha1, flags ^ local_flags);
add_pending_object_with_mode(revs, object, arg, mode);
return 0;
}
static void read_pathspec_from_stdin(struct rev_info *revs, struct strbuf *sb, const char ***prune_data)
{
const char **prune = *prune_data;
int prune_nr;
int prune_alloc;
/* count existing ones */
if (!prune)
prune_nr = 0;
else
for (prune_nr = 0; prune[prune_nr]; prune_nr++)
;
prune_alloc = prune_nr; /* not really, but we do not know */
while (strbuf_getwholeline(sb, stdin, '\n') != EOF) {
int len = sb->len;
if (len && sb->buf[len - 1] == '\n')
sb->buf[--len] = '\0';
ALLOC_GROW(prune, prune_nr+1, prune_alloc);
prune[prune_nr++] = xstrdup(sb->buf);
}
if (prune) {
ALLOC_GROW(prune, prune_nr+1, prune_alloc);
prune[prune_nr] = NULL;
}
*prune_data = prune;
}
static void read_revisions_from_stdin(struct rev_info *revs, const char ***prune)
{
struct strbuf sb;
int seen_dashdash = 0;
strbuf_init(&sb, 1000);
while (strbuf_getwholeline(&sb, stdin, '\n') != EOF) {
int len = sb.len;
if (len && sb.buf[len - 1] == '\n')
sb.buf[--len] = '\0';
if (!len)
break;
if (sb.buf[0] == '-') {
if (len == 2 && sb.buf[1] == '-') {
seen_dashdash = 1;
break;
}
die("options not supported in --stdin mode");
}
if (handle_revision_arg(sb.buf, revs, 0, 1))
die("bad revision '%s'", sb.buf);
}
if (seen_dashdash)
read_pathspec_from_stdin(revs, &sb, prune);
strbuf_release(&sb);
}
static void add_grep(struct rev_info *revs, const char *ptn, enum grep_pat_token what)
{
append_grep_pattern(&revs->grep_filter, ptn, "command line", 0, what);
}
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-05 13:15:02 +08:00
static void add_header_grep(struct rev_info *revs, enum grep_header_field field, const char *pattern)
{
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-05 13:15:02 +08:00
append_header_grep_pattern(&revs->grep_filter, field, pattern);
}
static void add_message_grep(struct rev_info *revs, const char *pattern)
{
add_grep(revs, pattern, GREP_PATTERN_BODY);
}
static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv,
int *unkc, const char **unkv)
{
const char *arg = argv[0];
/* pseudo revision arguments */
if (!strcmp(arg, "--all") || !strcmp(arg, "--branches") ||
!strcmp(arg, "--tags") || !strcmp(arg, "--remotes") ||
!strcmp(arg, "--reflog") || !strcmp(arg, "--not") ||
!strcmp(arg, "--no-walk") || !strcmp(arg, "--do-walk") ||
!strcmp(arg, "--bisect"))
{
unkv[(*unkc)++] = arg;
return 1;
}
if (!prefixcmp(arg, "--max-count=")) {
revs->max_count = atoi(arg + 12);
revs->no_walk = 0;
} else if (!prefixcmp(arg, "--skip=")) {
revs->skip_count = atoi(arg + 7);
} else if ((*arg == '-') && isdigit(arg[1])) {
/* accept -<digit>, like traditional "head" */
revs->max_count = atoi(arg + 1);
revs->no_walk = 0;
} else if (!strcmp(arg, "-n")) {
if (argc <= 1)
return error("-n requires an argument");
revs->max_count = atoi(argv[1]);
revs->no_walk = 0;
return 2;
} else if (!prefixcmp(arg, "-n")) {
revs->max_count = atoi(arg + 2);
revs->no_walk = 0;
} else if (!prefixcmp(arg, "--max-age=")) {
revs->max_age = atoi(arg + 10);
} else if (!prefixcmp(arg, "--since=")) {
revs->max_age = approxidate(arg + 8);
} else if (!prefixcmp(arg, "--after=")) {
revs->max_age = approxidate(arg + 8);
} else if (!prefixcmp(arg, "--min-age=")) {
revs->min_age = atoi(arg + 10);
} else if (!prefixcmp(arg, "--before=")) {
revs->min_age = approxidate(arg + 9);
} else if (!prefixcmp(arg, "--until=")) {
revs->min_age = approxidate(arg + 8);
} else if (!strcmp(arg, "--first-parent")) {
revs->first_parent_only = 1;
} else if (!strcmp(arg, "--ancestry-path")) {
revs->ancestry_path = 1;
revs->simplify_history = 0;
revs->limited = 1;
} else if (!strcmp(arg, "-g") || !strcmp(arg, "--walk-reflogs")) {
init_reflog_walk(&revs->reflog_info);
} else if (!strcmp(arg, "--default")) {
if (argc <= 1)
return error("bad --default argument");
revs->def = argv[1];
return 2;
} else if (!strcmp(arg, "--merge")) {
revs->show_merge = 1;
} else if (!strcmp(arg, "--topo-order")) {
revs->lifo = 1;
revs->topo_order = 1;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
} else if (!strcmp(arg, "--simplify-merges")) {
revs->simplify_merges = 1;
revs->rewrite_parents = 1;
revs->simplify_history = 0;
revs->limited = 1;
} else if (!strcmp(arg, "--simplify-by-decoration")) {
revs->simplify_merges = 1;
revs->rewrite_parents = 1;
revs->simplify_history = 0;
revs->simplify_by_decoration = 1;
revs->limited = 1;
revs->prune = 1;
load_ref_decorations(DECORATE_SHORT_REFS);
} else if (!strcmp(arg, "--date-order")) {
revs->lifo = 0;
revs->topo_order = 1;
} else if (!prefixcmp(arg, "--early-output")) {
int count = 100;
switch (arg[14]) {
case '=':
count = atoi(arg+15);
/* Fallthrough */
case 0:
revs->topo_order = 1;
revs->early_output = count;
}
} else if (!strcmp(arg, "--parents")) {
revs->rewrite_parents = 1;
revs->print_parents = 1;
} else if (!strcmp(arg, "--dense")) {
revs->dense = 1;
} else if (!strcmp(arg, "--sparse")) {
revs->dense = 0;
} else if (!strcmp(arg, "--show-all")) {
revs->show_all = 1;
} else if (!strcmp(arg, "--remove-empty")) {
revs->remove_empty_trees = 1;
} else if (!strcmp(arg, "--merges")) {
revs->merges_only = 1;
} else if (!strcmp(arg, "--no-merges")) {
revs->no_merges = 1;
} else if (!strcmp(arg, "--boundary")) {
revs->boundary = 1;
} else if (!strcmp(arg, "--left-right")) {
revs->left_right = 1;
} else if (!strcmp(arg, "--count")) {
revs->count = 1;
} else if (!strcmp(arg, "--cherry-pick")) {
revs->cherry_pick = 1;
revs->limited = 1;
} else if (!strcmp(arg, "--objects")) {
revs->tag_objects = 1;
revs->tree_objects = 1;
revs->blob_objects = 1;
} else if (!strcmp(arg, "--objects-edge")) {
revs->tag_objects = 1;
revs->tree_objects = 1;
revs->blob_objects = 1;
revs->edge_hint = 1;
} else if (!strcmp(arg, "--unpacked")) {
revs->unpacked = 1;
} else if (!prefixcmp(arg, "--unpacked=")) {
die("--unpacked=<packfile> no longer supported.");
} else if (!strcmp(arg, "-r")) {
revs->diff = 1;
DIFF_OPT_SET(&revs->diffopt, RECURSIVE);
} else if (!strcmp(arg, "-t")) {
revs->diff = 1;
DIFF_OPT_SET(&revs->diffopt, RECURSIVE);
DIFF_OPT_SET(&revs->diffopt, TREE_IN_RECURSIVE);
} else if (!strcmp(arg, "-m")) {
revs->ignore_merges = 0;
} else if (!strcmp(arg, "-c")) {
revs->diff = 1;
revs->dense_combined_merges = 0;
revs->combine_merges = 1;
} else if (!strcmp(arg, "--cc")) {
revs->diff = 1;
revs->dense_combined_merges = 1;
revs->combine_merges = 1;
} else if (!strcmp(arg, "-v")) {
revs->verbose_header = 1;
} else if (!strcmp(arg, "--pretty")) {
revs->verbose_header = 1;
revs->pretty_given = 1;
get_commit_format(arg+8, revs);
} else if (!prefixcmp(arg, "--pretty=") || !prefixcmp(arg, "--format=")) {
revs->verbose_header = 1;
revs->pretty_given = 1;
get_commit_format(arg+9, revs);
} else if (!strcmp(arg, "--show-notes")) {
revs->show_notes = 1;
revs->show_notes_given = 1;
} else if (!prefixcmp(arg, "--show-notes=")) {
struct strbuf buf = STRBUF_INIT;
revs->show_notes = 1;
revs->show_notes_given = 1;
if (!revs->notes_opt.extra_notes_refs)
revs->notes_opt.extra_notes_refs = xcalloc(1, sizeof(struct string_list));
if (!prefixcmp(arg+13, "refs/"))
/* happy */;
else if (!prefixcmp(arg+13, "notes/"))
strbuf_addstr(&buf, "refs/");
else
strbuf_addstr(&buf, "refs/notes/");
strbuf_addstr(&buf, arg+13);
string_list_append(revs->notes_opt.extra_notes_refs,
strbuf_detach(&buf, NULL));
} else if (!strcmp(arg, "--no-notes")) {
revs->show_notes = 0;
revs->show_notes_given = 1;
} else if (!strcmp(arg, "--standard-notes")) {
revs->show_notes_given = 1;
revs->notes_opt.suppress_default_notes = 0;
} else if (!strcmp(arg, "--no-standard-notes")) {
revs->notes_opt.suppress_default_notes = 1;
} else if (!strcmp(arg, "--oneline")) {
revs->verbose_header = 1;
get_commit_format("oneline", revs);
revs->pretty_given = 1;
revs->abbrev_commit = 1;
} else if (!strcmp(arg, "--graph")) {
revs->topo_order = 1;
revs->rewrite_parents = 1;
revs->graph = graph_init(revs);
} else if (!strcmp(arg, "--root")) {
revs->show_root_diff = 1;
} else if (!strcmp(arg, "--no-commit-id")) {
revs->no_commit_id = 1;
} else if (!strcmp(arg, "--always")) {
revs->always_show_header = 1;
} else if (!strcmp(arg, "--no-abbrev")) {
revs->abbrev = 0;
} else if (!strcmp(arg, "--abbrev")) {
revs->abbrev = DEFAULT_ABBREV;
} else if (!prefixcmp(arg, "--abbrev=")) {
revs->abbrev = strtoul(arg + 9, NULL, 10);
if (revs->abbrev < MINIMUM_ABBREV)
revs->abbrev = MINIMUM_ABBREV;
else if (revs->abbrev > 40)
revs->abbrev = 40;
} else if (!strcmp(arg, "--abbrev-commit")) {
revs->abbrev_commit = 1;
} else if (!strcmp(arg, "--full-diff")) {
revs->diff = 1;
revs->full_diff = 1;
} else if (!strcmp(arg, "--full-history")) {
revs->simplify_history = 0;
} else if (!strcmp(arg, "--relative-date")) {
revs->date_mode = DATE_RELATIVE;
revs->date_mode_explicit = 1;
} else if (!strncmp(arg, "--date=", 7)) {
revs->date_mode = parse_date_format(arg + 7);
revs->date_mode_explicit = 1;
} else if (!strcmp(arg, "--log-size")) {
revs->show_log_size = 1;
}
/*
* Grepping the commit log
*/
else if (!prefixcmp(arg, "--author=")) {
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-05 13:15:02 +08:00
add_header_grep(revs, GREP_HEADER_AUTHOR, arg+9);
} else if (!prefixcmp(arg, "--committer=")) {
log --author/--committer: really match only with name part When we tried to find commits done by AUTHOR, the first implementation tried to pattern match a line with "^author .*AUTHOR", which later was enhanced to strip leading caret and look for "^author AUTHOR" when the search pattern was anchored at the left end (i.e. --author="^AUTHOR"). This had a few problems: * When looking for fixed strings (e.g. "git log -F --author=x --grep=y"), the regexp internally used "^author .*x" would never match anything; * To match at the end (e.g. "git log --author='google.com>$'"), the generated regexp has to also match the trailing timestamp part the commit header lines have. Also, in order to determine if the '$' at the end means "match at the end of the line" or just a literal dollar sign (probably backslash-quoted), we would need to parse the regexp ourselves. An earlier alternative tried to make sure that a line matches "^author " (to limit by field name) and the user supplied pattern at the same time. While it solved the -F problem by introducing a special override for matching the "^author ", it did not solve the trailing timestamp nor tail match problem. It also would have matched every commit if --author=author was asked for, not because the author's email part had this string, but because every commit header line that talks about the author begins with that field name, regardleses of who wrote it. Instead of piling more hacks on top of hacks, this rethinks the grep machinery that is used to look for strings in the commit header, and makes sure that (1) field name matches literally at the beginning of the line, followed by a SP, and (2) the user supplied pattern is matched against the remainder of the line, excluding the trailing timestamp data. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-05 13:15:02 +08:00
add_header_grep(revs, GREP_HEADER_COMMITTER, arg+12);
} else if (!prefixcmp(arg, "--grep=")) {
add_message_grep(revs, arg+7);
} else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
revs->grep_filter.regflags |= REG_EXTENDED;
} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
revs->grep_filter.regflags |= REG_ICASE;
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
revs->grep_filter.fixed = 1;
} else if (!strcmp(arg, "--all-match")) {
revs->grep_filter.all_match = 1;
} else if (!prefixcmp(arg, "--encoding=")) {
arg += 11;
if (strcmp(arg, "none"))
git_log_output_encoding = xstrdup(arg);
else
git_log_output_encoding = "";
} else if (!strcmp(arg, "--reverse")) {
revs->reverse ^= 1;
} else if (!strcmp(arg, "--children")) {
revs->children.name = "children";
revs->limited = 1;
} else {
int opts = diff_opt_parse(&revs->diffopt, argv, argc);
if (!opts)
unkv[(*unkc)++] = arg;
return opts;
}
return 1;
}
void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx,
const struct option *options,
const char * const usagestr[])
{
int n = handle_revision_opt(revs, ctx->argc, ctx->argv,
&ctx->cpidx, ctx->out);
if (n <= 0) {
error("unknown option `%s'", ctx->argv[0]);
usage_with_options(usagestr, options);
}
ctx->argv += n;
ctx->argc -= n;
}
static int for_each_bad_bisect_ref(each_ref_fn fn, void *cb_data)
{
return for_each_ref_in("refs/bisect/bad", fn, cb_data);
}
static int for_each_good_bisect_ref(each_ref_fn fn, void *cb_data)
{
return for_each_ref_in("refs/bisect/good", fn, cb_data);
}
static void append_prune_data(const char ***prune_data, const char **av)
{
const char **prune = *prune_data;
int prune_nr;
int prune_alloc;
if (!prune) {
*prune_data = av;
return;
}
/* count existing ones */
for (prune_nr = 0; prune[prune_nr]; prune_nr++)
;
prune_alloc = prune_nr; /* not really, but we do not know */
while (*av) {
ALLOC_GROW(prune, prune_nr+1, prune_alloc);
prune[prune_nr++] = *av;
av++;
}
if (prune) {
ALLOC_GROW(prune, prune_nr+1, prune_alloc);
prune[prune_nr] = NULL;
}
*prune_data = prune;
}
/*
* Parse revision information, filling in the "rev_info" structure,
* and removing the used arguments from the argument list.
*
* Returns the number of arguments left that weren't recognized
* (which are also moved to the head of the argument list)
*/
int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct setup_revision_opt *opt)
{
int i, flags, left, seen_dashdash, read_from_stdin, got_rev_arg = 0;
const char **prune_data = NULL;
/* First, search for "--" */
seen_dashdash = 0;
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
if (strcmp(arg, "--"))
continue;
argv[i] = NULL;
argc = i;
if (argv[i + 1])
prune_data = argv + i + 1;
seen_dashdash = 1;
break;
}
/* Second, deal with arguments and options */
flags = 0;
read_from_stdin = 0;
for (left = i = 1; i < argc; i++) {
const char *arg = argv[i];
if (*arg == '-') {
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
int opts;
if (!strcmp(arg, "--all")) {
handle_refs(revs, flags, for_each_ref);
handle_refs(revs, flags, head_ref);
continue;
}
if (!strcmp(arg, "--branches")) {
handle_refs(revs, flags, for_each_branch_ref);
continue;
}
if (!strcmp(arg, "--bisect")) {
handle_refs(revs, flags, for_each_bad_bisect_ref);
handle_refs(revs, flags ^ UNINTERESTING, for_each_good_bisect_ref);
revs->bisect = 1;
continue;
}
if (!strcmp(arg, "--tags")) {
handle_refs(revs, flags, for_each_tag_ref);
continue;
}
if (!strcmp(arg, "--remotes")) {
handle_refs(revs, flags, for_each_remote_ref);
continue;
}
if (!prefixcmp(arg, "--glob=")) {
struct all_refs_cb cb;
init_all_refs_cb(&cb, revs, flags);
for_each_glob_ref(handle_one_ref, arg + 7, &cb);
continue;
}
if (!prefixcmp(arg, "--branches=")) {
struct all_refs_cb cb;
init_all_refs_cb(&cb, revs, flags);
for_each_glob_ref_in(handle_one_ref, arg + 11, "refs/heads/", &cb);
continue;
}
if (!prefixcmp(arg, "--tags=")) {
struct all_refs_cb cb;
init_all_refs_cb(&cb, revs, flags);
for_each_glob_ref_in(handle_one_ref, arg + 7, "refs/tags/", &cb);
continue;
}
if (!prefixcmp(arg, "--remotes=")) {
struct all_refs_cb cb;
init_all_refs_cb(&cb, revs, flags);
for_each_glob_ref_in(handle_one_ref, arg + 10, "refs/remotes/", &cb);
continue;
}
if (!strcmp(arg, "--reflog")) {
handle_reflog(revs, flags);
continue;
}
if (!strcmp(arg, "--not")) {
flags ^= UNINTERESTING;
continue;
}
if (!strcmp(arg, "--no-walk")) {
revs->no_walk = 1;
continue;
}
if (!strcmp(arg, "--do-walk")) {
revs->no_walk = 0;
continue;
}
if (!strcmp(arg, "--stdin")) {
if (revs->disable_stdin) {
argv[left++] = arg;
continue;
}
if (read_from_stdin++)
die("--stdin given twice?");
read_revisions_from_stdin(revs, &prune_data);
continue;
}
opts = handle_revision_opt(revs, argc - i, argv + i, &left, argv);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
if (opts > 0) {
i += opts - 1;
continue;
}
if (opts < 0)
exit(128);
continue;
}
if (handle_revision_arg(arg, revs, flags, seen_dashdash)) {
int j;
if (seen_dashdash || *arg == '^')
die("bad revision '%s'", arg);
/* If we didn't have a "--":
* (1) all filenames must exist;
* (2) all rev-args must not be interpretable
* as a valid filename.
* but the latter we have checked in the main loop.
*/
for (j = i; j < argc; j++)
verify_filename(revs->prefix, argv[j]);
append_prune_data(&prune_data, argv + i);
break;
}
else
got_rev_arg = 1;
}
if (prune_data)
revs->prune_data = get_pathspec(revs->prefix, prune_data);
if (revs->def == NULL)
revs->def = opt ? opt->def : NULL;
if (opt && opt->tweak)
opt->tweak(revs, opt);
if (revs->show_merge)
prepare_show_merge(revs);
if (revs->def && !revs->pending.nr && !got_rev_arg) {
unsigned char sha1[20];
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
struct object *object;
unsigned mode;
if (get_sha1_with_mode(revs->def, sha1, &mode))
die("bad default revision '%s'", revs->def);
object = get_reference(revs, revs->def, sha1, 0);
add_pending_object_with_mode(revs, object, revs->def, mode);
}
/* Did the user ask for any diff output? Run the diff! */
if (revs->diffopt.output_format & ~DIFF_FORMAT_NO_OUTPUT)
revs->diff = 1;
/* Pickaxe, diff-filter and rename following need diffs */
if (revs->diffopt.pickaxe ||
revs->diffopt.filter ||
DIFF_OPT_TST(&revs->diffopt, FOLLOW_RENAMES))
revs->diff = 1;
if (revs->topo_order)
revs->limited = 1;
if (revs->prune_data) {
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
diff_tree_setup_paths(revs->prune_data, &revs->pruning);
Finally implement "git log --follow" Ok, I've really held off doing this too damn long, because I'm lazy, and I was always hoping that somebody else would do it. But no, people keep asking for it, but nobody actually did anything, so I decided I might as well bite the bullet, and instead of telling people they could add a "--follow" flag to "git log" to do what they want to do, I decided that it looks like I just have to do it for them.. The code wasn't actually that complicated, in that the diffstat for this patch literally says "70 insertions(+), 1 deletions(-)", but I will have to admit that in order to get to this fairly simple patch, you did have to know and understand the internal git diff generation machinery pretty well, and had to really be able to follow how commit generation interacts with generating patches and generating the log. So I suspect that while I was right that it wasn't that hard, I might have been expecting too much of random people - this patch does seem to be firmly in the core "Linus or Junio" territory. To make a long story short: I'm sorry for it taking so long until I just did it. I'm not going to guarantee that this works for everybody, but you really can just look at the patch, and after the appropriate appreciative noises ("Ooh, aah") over how clever I am, you can then just notice that the code itself isn't really that complicated. All the real new code is in the new "try_to_follow_renames()" function. It really isn't rocket science: we notice that the pathname we were looking at went away, so we start a full tree diff and try to see if we can instead make that pathname be a rename or a copy from some other previous pathname. And if we can, we just continue, except we show *that* particular diff, and ever after we use the _previous_ pathname. One thing to look out for: the "rename detection" is considered to be a singular event in the _linear_ "git log" output! That's what people want to do, but I just wanted to point out that this patch is *not* carrying around a "commit,pathname" kind of pair and it's *not* going to be able to notice the file coming from multiple *different* files in earlier history. IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind of "files have single identities" kind of semantics, and git log will just pick the identity based on the normal move/copy heuristics _as_if_ the history could be linearized. Put another way: I think the model is broken, but given the broken model, I think this patch does just about as well as you can do. If you have merges with the same "file" having different filenames over the two branches, git will just end up picking _one_ of the pathnames at the point where the newer one goes away. It never looks at multiple pathnames in parallel. And if you understood all that, you probably didn't need it explained, and if you didn't understand the above blathering, it doesn't really mtter to you. What matters to you is that you can now do git log -p --follow builtin-rev-list.c and it will find the point where the old "rev-list.c" got renamed to "builtin-rev-list.c" and show it as such. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-20 05:22:46 +08:00
/* Can't prune commits with rename following: the paths change.. */
if (!DIFF_OPT_TST(&revs->diffopt, FOLLOW_RENAMES))
revs->prune = 1;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
if (!revs->full_diff)
diff_tree_setup_paths(revs->prune_data, &revs->diffopt);
}
if (revs->combine_merges)
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
revs->ignore_merges = 0;
revs->diffopt.abbrev = revs->abbrev;
if (diff_setup_done(&revs->diffopt) < 0)
die("diff_setup_done failed");
compile_grep_patterns(&revs->grep_filter);
if (revs->reverse && revs->reflog_info)
die("cannot combine --reverse with --walk-reflogs");
if (revs->rewrite_parents && revs->children.name)
die("cannot combine --parents and --children");
/*
* Limitations on the graph functionality
*/
if (revs->reverse && revs->graph)
die("cannot combine --reverse with --graph");
if (revs->reflog_info && revs->graph)
die("cannot combine --walk-reflogs with --graph");
return left;
}
static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child)
{
struct commit_list *l = xcalloc(1, sizeof(*l));
l->item = child;
l->next = add_decoration(&revs->children, &parent->object, l);
}
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
static int remove_duplicate_parents(struct commit *commit)
{
struct commit_list **pp, *p;
int surviving_parents;
/* Examine existing parents while marking ones we have seen... */
pp = &commit->parents;
while ((p = *pp) != NULL) {
struct commit *parent = p->item;
if (parent->object.flags & TMP_MARK) {
*pp = p->next;
continue;
}
parent->object.flags |= TMP_MARK;
pp = &p->next;
}
/* count them while clearing the temporary mark */
surviving_parents = 0;
for (p = commit->parents; p; p = p->next) {
p->item->object.flags &= ~TMP_MARK;
surviving_parents++;
}
return surviving_parents;
}
struct merge_simplify_state {
struct commit *simplified;
};
static struct merge_simplify_state *locate_simplify_state(struct rev_info *revs, struct commit *commit)
{
struct merge_simplify_state *st;
st = lookup_decoration(&revs->merge_simplification, &commit->object);
if (!st) {
st = xcalloc(1, sizeof(*st));
add_decoration(&revs->merge_simplification, &commit->object, st);
}
return st;
}
static struct commit_list **simplify_one(struct rev_info *revs, struct commit *commit, struct commit_list **tail)
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
{
struct commit_list *p;
struct merge_simplify_state *st, *pst;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
int cnt;
st = locate_simplify_state(revs, commit);
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
/*
* Have we handled this one?
*/
if (st->simplified)
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
return tail;
/*
* An UNINTERESTING commit simplifies to itself, so does a
* root commit. We do not rewrite parents of such commit
* anyway.
*/
if ((commit->object.flags & UNINTERESTING) || !commit->parents) {
st->simplified = commit;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
return tail;
}
/*
* Do we know what commit all of our parents should be rewritten to?
* Otherwise we are not ready to rewrite this one yet.
*/
for (cnt = 0, p = commit->parents; p; p = p->next) {
pst = locate_simplify_state(revs, p->item);
if (!pst->simplified) {
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
tail = &commit_list_insert(p->item, tail)->next;
cnt++;
}
}
if (cnt) {
tail = &commit_list_insert(commit, tail)->next;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
return tail;
}
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
/*
* Rewrite our list of parents.
*/
for (p = commit->parents; p; p = p->next) {
pst = locate_simplify_state(revs, p->item);
p->item = pst->simplified;
}
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
cnt = remove_duplicate_parents(commit);
/*
* It is possible that we are a merge and one side branch
* does not have any commit that touches the given paths;
* in such a case, the immediate parents will be rewritten
* to different commits.
*
* o----X X: the commit we are looking at;
* / / o: a commit that touches the paths;
* ---o----'
*
* Further reduce the parents by removing redundant parents.
*/
if (1 < cnt) {
struct commit_list *h = reduce_heads(commit->parents);
cnt = commit_list_count(h);
free_commit_list(commit->parents);
commit->parents = h;
}
/*
* A commit simplifies to itself if it is a root, if it is
* UNINTERESTING, if it touches the given paths, or if it is a
* merge and its parents simplifies to more than one commits
* (the first two cases are already handled at the beginning of
* this function).
*
* Otherwise, it simplifies to what its sole parent simplifies to.
*/
if (!cnt ||
(commit->object.flags & UNINTERESTING) ||
!(commit->object.flags & TREESAME) ||
(1 < cnt))
st->simplified = commit;
else {
pst = locate_simplify_state(revs, commit->parents->item);
st->simplified = pst->simplified;
}
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
return tail;
}
static void simplify_merges(struct rev_info *revs)
{
struct commit_list *list;
struct commit_list *yet_to_do, **tail;
if (!revs->topo_order)
sort_in_topological_order(&revs->commits, revs->lifo);
if (!revs->prune)
return;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
/* feed the list reversed */
yet_to_do = NULL;
for (list = revs->commits; list; list = list->next)
commit_list_insert(list->item, &yet_to_do);
while (yet_to_do) {
list = yet_to_do;
yet_to_do = NULL;
tail = &yet_to_do;
while (list) {
struct commit *commit = list->item;
struct commit_list *next = list->next;
free(list);
list = next;
tail = simplify_one(revs, commit, tail);
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
}
}
/* clean up the result, removing the simplified ones */
list = revs->commits;
revs->commits = NULL;
tail = &revs->commits;
while (list) {
struct commit *commit = list->item;
struct commit_list *next = list->next;
struct merge_simplify_state *st;
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
free(list);
list = next;
st = locate_simplify_state(revs, commit);
if (st->simplified == commit)
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
tail = &commit_list_insert(commit, tail)->next;
}
}
static void set_children(struct rev_info *revs)
{
struct commit_list *l;
for (l = revs->commits; l; l = l->next) {
struct commit *commit = l->item;
struct commit_list *p;
for (p = commit->parents; p; p = p->next)
add_child(revs, p->item, commit);
}
}
int prepare_revision_walk(struct rev_info *revs)
{
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 08:42:35 +08:00
int nr = revs->pending.nr;
struct object_array_entry *e, *list;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
e = list = revs->pending.objects;
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 08:42:35 +08:00
revs->pending.nr = 0;
revs->pending.alloc = 0;
revs->pending.objects = NULL;
while (--nr >= 0) {
struct commit *commit = handle_commit(revs, e->item, e->name);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
if (commit) {
if (!(commit->object.flags & SEEN)) {
commit->object.flags |= SEEN;
insert_by_date(commit, &revs->commits);
}
}
e++;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
}
free(list);
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 07:52:13 +08:00
if (revs->no_walk)
return 0;
if (revs->limited)
if (limit_list(revs) < 0)
return -1;
if (revs->topo_order)
sort_in_topological_order(&revs->commits, revs->lifo);
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 16:17:41 +08:00
if (revs->simplify_merges)
simplify_merges(revs);
if (revs->children.name)
set_children(revs);
return 0;
}
enum rewrite_result {
rewrite_one_ok,
rewrite_one_noparents,
rewrite_one_error
};
static enum rewrite_result rewrite_one(struct rev_info *revs, struct commit **pp)
{
struct commit_list *cache = NULL;
for (;;) {
struct commit *p = *pp;
if (!revs->limited)
if (add_parents_to_list(revs, p, &revs->commits, &cache) < 0)
return rewrite_one_error;
if (p->parents && p->parents->next)
return rewrite_one_ok;
if (p->object.flags & UNINTERESTING)
return rewrite_one_ok;
if (!(p->object.flags & TREESAME))
return rewrite_one_ok;
if (!p->parents)
return rewrite_one_noparents;
*pp = p->parents->item;
}
}
static int rewrite_parents(struct rev_info *revs, struct commit *commit)
{
struct commit_list **pp = &commit->parents;
while (*pp) {
struct commit_list *parent = *pp;
switch (rewrite_one(revs, &parent->item)) {
case rewrite_one_ok:
break;
case rewrite_one_noparents:
*pp = parent->next;
continue;
case rewrite_one_error:
return -1;
}
pp = &parent->next;
}
remove_duplicate_parents(commit);
return 0;
}
static int commit_match(struct commit *commit, struct rev_info *opt)
{
if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list)
return 1;
return grep_buffer(&opt->grep_filter,
NULL, /* we say nothing, not even filename */
commit->buffer, strlen(commit->buffer));
}
static inline int want_ancestry(struct rev_info *revs)
{
return (revs->rewrite_parents || revs->children.name);
}
enum commit_action get_commit_action(struct rev_info *revs, struct commit *commit)
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
{
if (commit->object.flags & SHOWN)
return commit_ignore;
if (revs->unpacked && has_sha1_pack(commit->object.sha1))
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
return commit_ignore;
Add "--show-all" revision walker flag for debugging It's really not very easy to visualize the commit walker, because - on purpose - it obvously doesn't show the uninteresting commits! This adds a "--show-all" flag to the revision walker, which will make it show uninteresting commits too, and they'll have a '^' in front of them (it also fixes a logic error for !verbose_header for boundary commits - we should show the '-' even if left_right isn't shown). A separate patch to gitk to teach it the new '^' was sent to paulus. With the change in place, it actually is interesting even for the cases that git doesn't have any problems with, ie for the kernel you can do: gitk -d --show-all v2.6.24.. and you see just how far down it has to parse things to see it all. The use of "-d" is a good idea, since the date-ordered toposort is much better at showing why it goes deep down (ie the date of some of those commits after 2.6.24 is much older, because they were merged from trees that weren't rebased). So I think this is a useful feature even for non-debugging - just to visualize what git does internally more. When it actually breaks out due to the "everybody_uninteresting()" case, it adds the uninteresting commits (both the one it's looking at now, and the list of pending ones) to the list This way, we really list *all* the commits we've looked at. Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That second part is debatable just how it should work. Maybe we shouldn't show such entries at all (with this patch those entries do get shown, they just don't get any message shown with them). But I think this is a useful case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-10 06:02:07 +08:00
if (revs->show_all)
return commit_show;
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
if (commit->object.flags & UNINTERESTING)
return commit_ignore;
if (revs->min_age != -1 && (commit->date > revs->min_age))
return commit_ignore;
if (revs->no_merges && commit->parents && commit->parents->next)
return commit_ignore;
if (revs->merges_only && !(commit->parents && commit->parents->next))
return commit_ignore;
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
if (!commit_match(commit, revs))
return commit_ignore;
if (revs->prune && revs->dense) {
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
/* Commit without changes? */
if (commit->object.flags & TREESAME) {
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
/* drop merges unless we want parenthood */
if (!want_ancestry(revs))
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
return commit_ignore;
/* non-merge - always ignore it */
if (!commit->parents || !commit->parents->next)
return commit_ignore;
}
}
return commit_show;
}
enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
{
enum commit_action action = get_commit_action(revs, commit);
if (action == commit_show &&
!revs->show_all &&
revs->prune && revs->dense && want_ancestry(revs)) {
if (rewrite_parents(revs, commit) < 0)
return commit_error;
}
return action;
}
static struct commit *get_revision_1(struct rev_info *revs)
{
if (!revs->commits)
return NULL;
do {
struct commit_list *entry = revs->commits;
struct commit *commit = entry->item;
revs->commits = entry->next;
free(entry);
Make path-limiting be incremental when possible. This makes git-rev-list able to do path-limiting without having to parse all of history before it starts showing the results. This makes things like "git log -- pathname" much more pleasant to use. This is actually a pretty small patch, and the biggest part of it is purely cleanups (turning the "goto next" statements into "continue"), but it's conceptually a lot bigger than it looks. What it does is that if you do a path-limited revision list, and you do _not_ ask for pseudo-parenthood information, it won't do all the path-limiting up-front, but instead do it incrementally in "get_revision()". This is an absolutely huge deal for anything like "git log -- <pathname>", but also for some things that we don't do yet - like the "find where things changed" logic I've described elsewhere, where we want to find the previous revision that changed a file. The reason I put "RFC" in the subject line is that while I've validated it various ways, like doing git-rev-list HEAD -- drivers/char/ | md5sum before-and-after on the kernel archive, it's "git-rev-list" after all. In other words, it's that really really subtle and complex central piece of software. So while I think this is important and should go in asap, I also think it should get lots of testing and eyeballs looking at the code. Btw, don't even bother testing this with the git archive. git itself is so small that parsing the whole revision history for it takes about a second even with path limiting. The thing that _really_ shows this off is doing git log drivers/ on the kernel archive, or even better, on the _historic_ kernel archive. With this change, the response is instantaneous (although seeking to the end of the result will obviously take as long as it ever did). Before this change, the command would think about the result for tens of seconds - or even minutes, in the case of the bigger old kernel archive - before starting to output the results. NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses the "--parents" flag to actually generate a pseudo-history of the resulting commits won't actually see the improvement in interactivity, since that forces git-rev-list to do the whole-history thing after all. MAYBE we can fix that too at some point, but I won't promise anything. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 09:05:25 +08:00
if (revs->reflog_info)
fake_reflog_parent(revs->reflog_info, commit);
Make path-limiting be incremental when possible. This makes git-rev-list able to do path-limiting without having to parse all of history before it starts showing the results. This makes things like "git log -- pathname" much more pleasant to use. This is actually a pretty small patch, and the biggest part of it is purely cleanups (turning the "goto next" statements into "continue"), but it's conceptually a lot bigger than it looks. What it does is that if you do a path-limited revision list, and you do _not_ ask for pseudo-parenthood information, it won't do all the path-limiting up-front, but instead do it incrementally in "get_revision()". This is an absolutely huge deal for anything like "git log -- <pathname>", but also for some things that we don't do yet - like the "find where things changed" logic I've described elsewhere, where we want to find the previous revision that changed a file. The reason I put "RFC" in the subject line is that while I've validated it various ways, like doing git-rev-list HEAD -- drivers/char/ | md5sum before-and-after on the kernel archive, it's "git-rev-list" after all. In other words, it's that really really subtle and complex central piece of software. So while I think this is important and should go in asap, I also think it should get lots of testing and eyeballs looking at the code. Btw, don't even bother testing this with the git archive. git itself is so small that parsing the whole revision history for it takes about a second even with path limiting. The thing that _really_ shows this off is doing git log drivers/ on the kernel archive, or even better, on the _historic_ kernel archive. With this change, the response is instantaneous (although seeking to the end of the result will obviously take as long as it ever did). Before this change, the command would think about the result for tens of seconds - or even minutes, in the case of the bigger old kernel archive - before starting to output the results. NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses the "--parents" flag to actually generate a pseudo-history of the resulting commits won't actually see the improvement in interactivity, since that forces git-rev-list to do the whole-history thing after all. MAYBE we can fix that too at some point, but I won't promise anything. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 09:05:25 +08:00
/*
* If we haven't done the list limiting, we need to look at
* the parents here. We also need to do the date-based limiting
* that we'd otherwise have done in limit_list().
Make path-limiting be incremental when possible. This makes git-rev-list able to do path-limiting without having to parse all of history before it starts showing the results. This makes things like "git log -- pathname" much more pleasant to use. This is actually a pretty small patch, and the biggest part of it is purely cleanups (turning the "goto next" statements into "continue"), but it's conceptually a lot bigger than it looks. What it does is that if you do a path-limited revision list, and you do _not_ ask for pseudo-parenthood information, it won't do all the path-limiting up-front, but instead do it incrementally in "get_revision()". This is an absolutely huge deal for anything like "git log -- <pathname>", but also for some things that we don't do yet - like the "find where things changed" logic I've described elsewhere, where we want to find the previous revision that changed a file. The reason I put "RFC" in the subject line is that while I've validated it various ways, like doing git-rev-list HEAD -- drivers/char/ | md5sum before-and-after on the kernel archive, it's "git-rev-list" after all. In other words, it's that really really subtle and complex central piece of software. So while I think this is important and should go in asap, I also think it should get lots of testing and eyeballs looking at the code. Btw, don't even bother testing this with the git archive. git itself is so small that parsing the whole revision history for it takes about a second even with path limiting. The thing that _really_ shows this off is doing git log drivers/ on the kernel archive, or even better, on the _historic_ kernel archive. With this change, the response is instantaneous (although seeking to the end of the result will obviously take as long as it ever did). Before this change, the command would think about the result for tens of seconds - or even minutes, in the case of the bigger old kernel archive - before starting to output the results. NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses the "--parents" flag to actually generate a pseudo-history of the resulting commits won't actually see the improvement in interactivity, since that forces git-rev-list to do the whole-history thing after all. MAYBE we can fix that too at some point, but I won't promise anything. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-31 09:05:25 +08:00
*/
if (!revs->limited) {
if (revs->max_age != -1 &&
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
(commit->date < revs->max_age))
continue;
if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0)
die("Failed to traverse parents of commit %s",
sha1_to_hex(commit->object.sha1));
}
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
switch (simplify_commit(revs, commit)) {
case commit_ignore:
continue;
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
case commit_error:
die("Failed to simplify parents of commit %s",
sha1_to_hex(commit->object.sha1));
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 04:12:05 +08:00
default:
return commit;
}
} while (revs->commits);
return NULL;
}
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
static void gc_boundary(struct object_array *array)
{
unsigned nr = array->nr;
unsigned alloc = array->alloc;
struct object_array_entry *objects = array->objects;
if (alloc <= nr) {
unsigned i, j;
for (i = j = 0; i < nr; i++) {
if (objects[i].item->flags & SHOWN)
continue;
if (i != j)
objects[j] = objects[i];
j++;
}
for (i = j; i < nr; i++)
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
objects[i].item = NULL;
array->nr = j;
}
}
static void create_boundary_commit_list(struct rev_info *revs)
{
unsigned i;
struct commit *c;
struct object_array *array = &revs->boundary_commits;
struct object_array_entry *objects = array->objects;
/*
* If revs->commits is non-NULL at this point, an error occurred in
* get_revision_1(). Ignore the error and continue printing the
* boundary commits anyway. (This is what the code has always
* done.)
*/
if (revs->commits) {
free_commit_list(revs->commits);
revs->commits = NULL;
}
/*
* Put all of the actual boundary commits from revs->boundary_commits
* into revs->commits
*/
for (i = 0; i < array->nr; i++) {
c = (struct commit *)(objects[i].item);
if (!c)
continue;
if (!(c->object.flags & CHILD_SHOWN))
continue;
if (c->object.flags & (SHOWN | BOUNDARY))
continue;
c->object.flags |= BOUNDARY;
commit_list_insert(c, &revs->commits);
}
/*
* If revs->topo_order is set, sort the boundary commits
* in topological order
*/
sort_in_topological_order(&revs->commits, revs->lifo);
}
static struct commit *get_revision_internal(struct rev_info *revs)
{
struct commit *c = NULL;
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
struct commit_list *l;
if (revs->boundary == 2) {
/*
* All of the normal commits have already been returned,
* and we are now returning boundary commits.
* create_boundary_commit_list() has populated
* revs->commits with the remaining commits to return.
*/
c = pop_commit(&revs->commits);
if (c)
c->object.flags |= SHOWN;
return c;
}
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
/*
* Now pick up what they want to give us
*/
c = get_revision_1(revs);
if (c) {
while (0 < revs->skip_count) {
revs->skip_count--;
c = get_revision_1(revs);
if (!c)
break;
}
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
}
/*
* Check the max_count.
*/
switch (revs->max_count) {
case -1:
break;
case 0:
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
c = NULL;
break;
default:
revs->max_count--;
}
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
if (c)
c->object.flags |= SHOWN;
if (!revs->boundary) {
return c;
}
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
if (!c) {
/*
* get_revision_1() runs out the commits, and
* we are done computing the boundaries.
* switch to boundary commits output mode.
*/
revs->boundary = 2;
/*
* Update revs->commits to contain the list of
* boundary commits.
*/
create_boundary_commit_list(revs);
return get_revision_internal(revs);
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
}
/*
* boundary commits are the commits that are parents of the
* ones we got from get_revision_1() but they themselves are
* not returned from get_revision_1(). Before returning
* 'c', we need to mark its parents that they could be boundaries.
*/
for (l = c->parents; l; l = l->next) {
struct object *p;
p = &(l->item->object);
if (p->flags & (CHILD_SHOWN | SHOWN))
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-06 05:10:06 +08:00
continue;
p->flags |= CHILD_SHOWN;
gc_boundary(&revs->boundary_commits);
add_object_array(p, NULL, &revs->boundary_commits);
}
return c;
}
struct commit *get_revision(struct rev_info *revs)
{
struct commit *c;
struct commit_list *reversed;
if (revs->reverse) {
reversed = NULL;
while ((c = get_revision_internal(revs))) {
commit_list_insert(c, &reversed);
}
revs->commits = reversed;
revs->reverse = 0;
revs->reverse_output_stage = 1;
}
if (revs->reverse_output_stage)
return pop_commit(&revs->commits);
c = get_revision_internal(revs);
if (c && revs->graph)
graph_update(revs->graph, c);
return c;
}