2009-10-09 18:21:57 +08:00
|
|
|
#include "cache.h"
|
|
|
|
#include "notes.h"
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
#include "blob.h"
|
2010-02-14 05:28:17 +08:00
|
|
|
#include "tree.h"
|
2009-10-09 18:21:57 +08:00
|
|
|
#include "utf8.h"
|
|
|
|
#include "strbuf.h"
|
2009-10-09 18:21:59 +08:00
|
|
|
#include "tree-walk.h"
|
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
/*
|
|
|
|
* Use a non-balancing simple 16-tree structure with struct int_node as
|
|
|
|
* internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
|
|
|
|
* 16-array of pointers to its children.
|
|
|
|
* The bottom 2 bits of each pointer is used to identify the pointer type
|
|
|
|
* - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
|
|
|
|
* - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
|
|
|
|
* - ptr & 3 == 2 - pointer to note entry - cast to struct leaf_node *
|
|
|
|
* - ptr & 3 == 3 - pointer to subtree entry - cast to struct leaf_node *
|
|
|
|
*
|
|
|
|
* The root node is a statically allocated struct int_node.
|
|
|
|
*/
|
|
|
|
struct int_node {
|
|
|
|
void *a[16];
|
2009-10-09 18:21:59 +08:00
|
|
|
};
|
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
/*
|
|
|
|
* Leaf nodes come in two variants, note entries and subtree entries,
|
|
|
|
* distinguished by the LSb of the leaf node pointer (see above).
|
2010-02-14 05:28:10 +08:00
|
|
|
* As a note entry, the key is the SHA1 of the referenced object, and the
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
* value is the SHA1 of the note object.
|
|
|
|
* As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
|
2010-02-14 05:28:10 +08:00
|
|
|
* referenced object, using the last byte of the key to store the length of
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
* the prefix. The value is the SHA1 of the tree object containing the notes
|
|
|
|
* subtree.
|
|
|
|
*/
|
|
|
|
struct leaf_node {
|
|
|
|
unsigned char key_sha1[20];
|
|
|
|
unsigned char val_sha1[20];
|
2009-10-09 18:21:59 +08:00
|
|
|
};
|
2009-10-09 18:21:57 +08:00
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
/*
|
|
|
|
* A notes tree may contain entries that are not notes, and that do not follow
|
|
|
|
* the naming conventions of notes. There are typically none/few of these, but
|
|
|
|
* we still need to keep track of them. Keep a simple linked list sorted alpha-
|
|
|
|
* betically on the non-note path. The list is populated when parsing tree
|
|
|
|
* objects in load_subtree(), and the non-notes are correctly written back into
|
|
|
|
* the tree objects produced by write_notes_tree().
|
|
|
|
*/
|
|
|
|
struct non_note {
|
|
|
|
struct non_note *next; /* grounded (last->next == NULL) */
|
|
|
|
char *path;
|
|
|
|
unsigned int mode;
|
|
|
|
unsigned char sha1[20];
|
|
|
|
};
|
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
#define PTR_TYPE_NULL 0
|
|
|
|
#define PTR_TYPE_INTERNAL 1
|
|
|
|
#define PTR_TYPE_NOTE 2
|
|
|
|
#define PTR_TYPE_SUBTREE 3
|
2009-10-09 18:21:59 +08:00
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
#define GET_PTR_TYPE(ptr) ((uintptr_t) (ptr) & 3)
|
|
|
|
#define CLR_PTR_TYPE(ptr) ((void *) ((uintptr_t) (ptr) & ~3))
|
|
|
|
#define SET_PTR_TYPE(ptr, type) ((void *) ((uintptr_t) (ptr) | (type)))
|
2009-10-09 18:21:59 +08:00
|
|
|
|
2010-02-14 05:28:14 +08:00
|
|
|
#define GET_NIBBLE(n, sha1) (((sha1[(n) >> 1]) >> ((~(n) & 0x01) << 2)) & 0x0f)
|
2009-10-09 18:21:59 +08:00
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
#define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
|
|
|
|
(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
|
2009-10-09 18:21:59 +08:00
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
struct notes_tree default_notes_tree;
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
|
|
|
|
struct int_node *node, unsigned int n);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
|
|
|
|
/*
|
2009-10-09 18:22:09 +08:00
|
|
|
* Search the tree until the appropriate location for the given key is found:
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
* 1. Start at the root node, with n = 0
|
2009-10-09 18:22:09 +08:00
|
|
|
* 2. If a[0] at the current level is a matching subtree entry, unpack that
|
|
|
|
* subtree entry and remove it; restart search at the current level.
|
|
|
|
* 3. Use the nth nibble of the key as an index into a:
|
|
|
|
* - If a[n] is an int_node, recurse from #2 into that node and increment n
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
* - If a matching subtree entry, unpack that subtree entry (and remove it);
|
|
|
|
* restart search at the current level.
|
2009-10-09 18:22:09 +08:00
|
|
|
* - Otherwise, we have found one of the following:
|
|
|
|
* - a subtree entry which does not match the key
|
|
|
|
* - a note entry which may or may not match the key
|
|
|
|
* - an unused leaf node (NULL)
|
|
|
|
* In any case, set *tree and *n, and return pointer to the tree location.
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
*/
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static void **note_tree_search(struct notes_tree *t, struct int_node **tree,
|
2009-10-09 18:22:09 +08:00
|
|
|
unsigned char *n, const unsigned char *key_sha1)
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
{
|
|
|
|
struct leaf_node *l;
|
2009-10-09 18:22:09 +08:00
|
|
|
unsigned char i;
|
|
|
|
void *p = (*tree)->a[0];
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
|
2009-10-09 18:22:09 +08:00
|
|
|
if (GET_PTR_TYPE(p) == PTR_TYPE_SUBTREE) {
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(p);
|
|
|
|
if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
|
|
|
|
/* unpack tree and resume search */
|
|
|
|
(*tree)->a[0] = NULL;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, l, *tree, *n);
|
2009-10-09 18:22:09 +08:00
|
|
|
free(l);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
return note_tree_search(t, tree, n, key_sha1);
|
2009-10-09 18:22:09 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
i = GET_NIBBLE(*n, key_sha1);
|
|
|
|
p = (*tree)->a[i];
|
2010-02-14 05:28:09 +08:00
|
|
|
switch (GET_PTR_TYPE(p)) {
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
case PTR_TYPE_INTERNAL:
|
2009-10-09 18:22:09 +08:00
|
|
|
*tree = CLR_PTR_TYPE(p);
|
|
|
|
(*n)++;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
return note_tree_search(t, tree, n, key_sha1);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(p);
|
|
|
|
if (!SUBTREE_SHA1_PREFIXCMP(key_sha1, l->key_sha1)) {
|
|
|
|
/* unpack tree and resume search */
|
2009-10-09 18:22:09 +08:00
|
|
|
(*tree)->a[i] = NULL;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, l, *tree, *n);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
free(l);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
return note_tree_search(t, tree, n, key_sha1);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
2009-10-09 18:22:09 +08:00
|
|
|
/* fall through */
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
default:
|
2009-10-09 18:22:09 +08:00
|
|
|
return &((*tree)->a[i]);
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
2009-10-09 18:22:09 +08:00
|
|
|
}
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
|
2009-10-09 18:22:09 +08:00
|
|
|
/*
|
|
|
|
* To find a leaf_node:
|
|
|
|
* Search to the tree location appropriate for the given key:
|
|
|
|
* If a note entry with matching key, return the note entry, else return NULL.
|
|
|
|
*/
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static struct leaf_node *note_tree_find(struct notes_tree *t,
|
|
|
|
struct int_node *tree, unsigned char n,
|
2009-10-09 18:22:09 +08:00
|
|
|
const unsigned char *key_sha1)
|
|
|
|
{
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
void **p = note_tree_search(t, &tree, &n, key_sha1);
|
2009-10-09 18:22:09 +08:00
|
|
|
if (GET_PTR_TYPE(*p) == PTR_TYPE_NOTE) {
|
|
|
|
struct leaf_node *l = (struct leaf_node *) CLR_PTR_TYPE(*p);
|
|
|
|
if (!hashcmp(key_sha1, l->key_sha1))
|
|
|
|
return l;
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
|
|
|
return NULL;
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
/*
|
|
|
|
* To insert a leaf_node:
|
2009-10-09 18:22:09 +08:00
|
|
|
* Search to the tree location appropriate for the given leaf_node's key:
|
|
|
|
* - If location is unused (NULL), store the tweaked pointer directly there
|
|
|
|
* - If location holds a note entry that matches the note-to-be-inserted, then
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
* combine the two notes (by calling the given combine_notes function).
|
2009-10-09 18:22:09 +08:00
|
|
|
* - If location holds a note entry that matches the subtree-to-be-inserted,
|
|
|
|
* then unpack the subtree-to-be-inserted into the location.
|
|
|
|
* - If location holds a matching subtree entry, unpack the subtree at that
|
|
|
|
* location, and restart the insert operation from that level.
|
|
|
|
* - Else, create a new int_node, holding both the node-at-location and the
|
|
|
|
* node-to-be-inserted, and store the new int_node into the location.
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
*/
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static void note_tree_insert(struct notes_tree *t, struct int_node *tree,
|
|
|
|
unsigned char n, struct leaf_node *entry, unsigned char type,
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
combine_notes_fn combine_notes)
|
2009-10-09 18:21:59 +08:00
|
|
|
{
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
struct int_node *new_node;
|
2009-10-09 18:22:09 +08:00
|
|
|
struct leaf_node *l;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
void **p = note_tree_search(t, &tree, &n, entry->key_sha1);
|
2009-10-09 18:22:09 +08:00
|
|
|
|
|
|
|
assert(GET_PTR_TYPE(entry) == 0); /* no type bits set */
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(*p);
|
2010-02-14 05:28:09 +08:00
|
|
|
switch (GET_PTR_TYPE(*p)) {
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
case PTR_TYPE_NULL:
|
2009-10-09 18:22:09 +08:00
|
|
|
assert(!*p);
|
|
|
|
*p = SET_PTR_TYPE(entry, type);
|
|
|
|
return;
|
|
|
|
case PTR_TYPE_NOTE:
|
|
|
|
switch (type) {
|
|
|
|
case PTR_TYPE_NOTE:
|
|
|
|
if (!hashcmp(l->key_sha1, entry->key_sha1)) {
|
|
|
|
/* skip concatenation if l == entry */
|
|
|
|
if (!hashcmp(l->val_sha1, entry->val_sha1))
|
|
|
|
return;
|
|
|
|
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
if (combine_notes(l->val_sha1, entry->val_sha1))
|
|
|
|
die("failed to combine notes %s and %s"
|
|
|
|
" for object %s",
|
2009-10-09 18:22:09 +08:00
|
|
|
sha1_to_hex(l->val_sha1),
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
sha1_to_hex(entry->val_sha1),
|
2009-10-09 18:22:09 +08:00
|
|
|
sha1_to_hex(l->key_sha1));
|
|
|
|
free(entry);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
if (!SUBTREE_SHA1_PREFIXCMP(l->key_sha1,
|
|
|
|
entry->key_sha1)) {
|
|
|
|
/* unpack 'entry' */
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, entry, tree, n);
|
2009-10-09 18:22:09 +08:00
|
|
|
free(entry);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
if (!SUBTREE_SHA1_PREFIXCMP(entry->key_sha1, l->key_sha1)) {
|
|
|
|
/* unpack 'l' and restart insert */
|
|
|
|
*p = NULL;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, l, tree, n);
|
2009-10-09 18:22:09 +08:00
|
|
|
free(l);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
note_tree_insert(t, tree, n, entry, type,
|
|
|
|
combine_notes);
|
2009-10-09 18:22:09 +08:00
|
|
|
return;
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
2009-10-09 18:22:09 +08:00
|
|
|
break;
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
2009-10-09 18:22:09 +08:00
|
|
|
|
|
|
|
/* non-matching leaf_node */
|
|
|
|
assert(GET_PTR_TYPE(*p) == PTR_TYPE_NOTE ||
|
|
|
|
GET_PTR_TYPE(*p) == PTR_TYPE_SUBTREE);
|
|
|
|
new_node = (struct int_node *) xcalloc(sizeof(struct int_node), 1);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
note_tree_insert(t, new_node, n + 1, l, GET_PTR_TYPE(*p),
|
|
|
|
combine_notes);
|
2009-10-09 18:22:09 +08:00
|
|
|
*p = SET_PTR_TYPE(new_node, PTR_TYPE_INTERNAL);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
note_tree_insert(t, new_node, n + 1, entry, type, combine_notes);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
2009-10-09 18:21:59 +08:00
|
|
|
|
2010-02-14 05:28:14 +08:00
|
|
|
/*
|
|
|
|
* How to consolidate an int_node:
|
|
|
|
* If there are > 1 non-NULL entries, give up and return non-zero.
|
|
|
|
* Otherwise replace the int_node at the given index in the given parent node
|
|
|
|
* with the only entry (or a NULL entry if no entries) from the given tree,
|
|
|
|
* and return 0.
|
|
|
|
*/
|
|
|
|
static int note_tree_consolidate(struct int_node *tree,
|
|
|
|
struct int_node *parent, unsigned char index)
|
|
|
|
{
|
|
|
|
unsigned int i;
|
|
|
|
void *p = NULL;
|
|
|
|
|
|
|
|
assert(tree && parent);
|
|
|
|
assert(CLR_PTR_TYPE(parent->a[index]) == tree);
|
|
|
|
|
|
|
|
for (i = 0; i < 16; i++) {
|
|
|
|
if (GET_PTR_TYPE(tree->a[i]) != PTR_TYPE_NULL) {
|
|
|
|
if (p) /* more than one entry */
|
|
|
|
return -2;
|
|
|
|
p = tree->a[i];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* replace tree with p in parent[index] */
|
|
|
|
parent->a[index] = p;
|
|
|
|
free(tree);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* To remove a leaf_node:
|
|
|
|
* Search to the tree location appropriate for the given leaf_node's key:
|
|
|
|
* - If location does not hold a matching entry, abort and do nothing.
|
|
|
|
* - Replace the matching leaf_node with a NULL entry (and free the leaf_node).
|
|
|
|
* - Consolidate int_nodes repeatedly, while walking up the tree towards root.
|
|
|
|
*/
|
2010-02-14 05:28:18 +08:00
|
|
|
static void note_tree_remove(struct notes_tree *t, struct int_node *tree,
|
|
|
|
unsigned char n, struct leaf_node *entry)
|
2010-02-14 05:28:14 +08:00
|
|
|
{
|
|
|
|
struct leaf_node *l;
|
|
|
|
struct int_node *parent_stack[20];
|
|
|
|
unsigned char i, j;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
void **p = note_tree_search(t, &tree, &n, entry->key_sha1);
|
2010-02-14 05:28:14 +08:00
|
|
|
|
|
|
|
assert(GET_PTR_TYPE(entry) == 0); /* no type bits set */
|
|
|
|
if (GET_PTR_TYPE(*p) != PTR_TYPE_NOTE)
|
|
|
|
return; /* type mismatch, nothing to remove */
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(*p);
|
|
|
|
if (hashcmp(l->key_sha1, entry->key_sha1))
|
|
|
|
return; /* key mismatch, nothing to remove */
|
|
|
|
|
|
|
|
/* we have found a matching entry */
|
|
|
|
free(l);
|
|
|
|
*p = SET_PTR_TYPE(NULL, PTR_TYPE_NULL);
|
|
|
|
|
|
|
|
/* consolidate this tree level, and parent levels, if possible */
|
|
|
|
if (!n)
|
|
|
|
return; /* cannot consolidate top level */
|
|
|
|
/* first, build stack of ancestors between root and current node */
|
2010-02-14 05:28:18 +08:00
|
|
|
parent_stack[0] = t->root;
|
2010-02-14 05:28:14 +08:00
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
j = GET_NIBBLE(i, entry->key_sha1);
|
|
|
|
parent_stack[i + 1] = CLR_PTR_TYPE(parent_stack[i]->a[j]);
|
|
|
|
}
|
|
|
|
assert(i == n && parent_stack[i] == tree);
|
|
|
|
/* next, unwind stack until note_tree_consolidate() is done */
|
|
|
|
while (i > 0 &&
|
|
|
|
!note_tree_consolidate(parent_stack[i], parent_stack[i - 1],
|
|
|
|
GET_NIBBLE(i - 1, entry->key_sha1)))
|
|
|
|
i--;
|
|
|
|
}
|
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
/* Free the entire notes data contained in the given tree */
|
|
|
|
static void note_tree_free(struct int_node *tree)
|
|
|
|
{
|
|
|
|
unsigned int i;
|
|
|
|
for (i = 0; i < 16; i++) {
|
|
|
|
void *p = tree->a[i];
|
2010-02-14 05:28:09 +08:00
|
|
|
switch (GET_PTR_TYPE(p)) {
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
case PTR_TYPE_INTERNAL:
|
|
|
|
note_tree_free(CLR_PTR_TYPE(p));
|
|
|
|
/* fall through */
|
|
|
|
case PTR_TYPE_NOTE:
|
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
free(CLR_PTR_TYPE(p));
|
|
|
|
}
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
2009-10-09 18:21:59 +08:00
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
/*
|
|
|
|
* Convert a partial SHA1 hex string to the corresponding partial SHA1 value.
|
|
|
|
* - hex - Partial SHA1 segment in ASCII hex format
|
|
|
|
* - hex_len - Length of above segment. Must be multiple of 2 between 0 and 40
|
|
|
|
* - sha1 - Partial SHA1 value is written here
|
|
|
|
* - sha1_len - Max #bytes to store in sha1, Must be >= hex_len / 2, and < 20
|
2010-02-14 05:28:09 +08:00
|
|
|
* Returns -1 on error (invalid arguments or invalid SHA1 (not in hex format)).
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
* Otherwise, returns number of bytes written to sha1 (i.e. hex_len / 2).
|
|
|
|
* Pads sha1 with NULs up to sha1_len (not included in returned length).
|
|
|
|
*/
|
|
|
|
static int get_sha1_hex_segment(const char *hex, unsigned int hex_len,
|
|
|
|
unsigned char *sha1, unsigned int sha1_len)
|
|
|
|
{
|
|
|
|
unsigned int i, len = hex_len >> 1;
|
|
|
|
if (hex_len % 2 != 0 || len > sha1_len)
|
|
|
|
return -1;
|
|
|
|
for (i = 0; i < len; i++) {
|
|
|
|
unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
|
|
|
|
if (val & ~0xff)
|
|
|
|
return -1;
|
|
|
|
*sha1++ = val;
|
|
|
|
hex += 2;
|
|
|
|
}
|
|
|
|
for (; i < sha1_len; i++)
|
|
|
|
*sha1++ = 0;
|
|
|
|
return len;
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static int non_note_cmp(const struct non_note *a, const struct non_note *b)
|
|
|
|
{
|
|
|
|
return strcmp(a->path, b->path);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void add_non_note(struct notes_tree *t, const char *path,
|
|
|
|
unsigned int mode, const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
struct non_note *p = t->prev_non_note, *n;
|
|
|
|
n = (struct non_note *) xmalloc(sizeof(struct non_note));
|
|
|
|
n->next = NULL;
|
|
|
|
n->path = xstrdup(path);
|
|
|
|
n->mode = mode;
|
|
|
|
hashcpy(n->sha1, sha1);
|
|
|
|
t->prev_non_note = n;
|
|
|
|
|
|
|
|
if (!t->first_non_note) {
|
|
|
|
t->first_non_note = n;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (non_note_cmp(p, n) < 0)
|
|
|
|
; /* do nothing */
|
|
|
|
else if (non_note_cmp(t->first_non_note, n) <= 0)
|
|
|
|
p = t->first_non_note;
|
|
|
|
else {
|
|
|
|
/* n sorts before t->first_non_note */
|
|
|
|
n->next = t->first_non_note;
|
|
|
|
t->first_non_note = n;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* n sorts equal or after p */
|
|
|
|
while (p->next && non_note_cmp(p->next, n) <= 0)
|
|
|
|
p = p->next;
|
|
|
|
|
|
|
|
if (non_note_cmp(p, n) == 0) { /* n ~= p; overwrite p with n */
|
|
|
|
assert(strcmp(p->path, n->path) == 0);
|
|
|
|
p->mode = n->mode;
|
|
|
|
hashcpy(p->sha1, n->sha1);
|
|
|
|
free(n);
|
|
|
|
t->prev_non_note = p;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* n sorts between p and p->next */
|
|
|
|
n->next = p->next;
|
|
|
|
p->next = n;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
|
|
|
|
struct int_node *node, unsigned int n)
|
2009-10-09 18:21:59 +08:00
|
|
|
{
|
2010-02-14 05:28:10 +08:00
|
|
|
unsigned char object_sha1[20];
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
unsigned int prefix_len;
|
|
|
|
void *buf;
|
2009-10-09 18:21:59 +08:00
|
|
|
struct tree_desc desc;
|
|
|
|
struct name_entry entry;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
int len, path_len;
|
|
|
|
unsigned char type;
|
|
|
|
struct leaf_node *l;
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
|
|
|
|
buf = fill_tree_descriptor(&desc, subtree->val_sha1);
|
|
|
|
if (!buf)
|
|
|
|
die("Could not read %s for notes-index",
|
|
|
|
sha1_to_hex(subtree->val_sha1));
|
|
|
|
|
|
|
|
prefix_len = subtree->key_sha1[19];
|
|
|
|
assert(prefix_len * 2 >= n);
|
2010-02-14 05:28:10 +08:00
|
|
|
memcpy(object_sha1, subtree->key_sha1, prefix_len);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
while (tree_entry(&desc, &entry)) {
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
path_len = strlen(entry.path);
|
|
|
|
len = get_sha1_hex_segment(entry.path, path_len,
|
2010-02-14 05:28:10 +08:00
|
|
|
object_sha1 + prefix_len, 20 - prefix_len);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
if (len < 0)
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
goto handle_non_note; /* entry.path is not a SHA1 */
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
len += prefix_len;
|
|
|
|
|
|
|
|
/*
|
2010-02-14 05:28:10 +08:00
|
|
|
* If object SHA1 is complete (len == 20), assume note object
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
* If object SHA1 is incomplete (len < 20), and current
|
|
|
|
* component consists of 2 hex chars, assume note subtree
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
*/
|
|
|
|
if (len <= 20) {
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
type = PTR_TYPE_NOTE;
|
|
|
|
l = (struct leaf_node *)
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
xcalloc(sizeof(struct leaf_node), 1);
|
2010-02-14 05:28:10 +08:00
|
|
|
hashcpy(l->key_sha1, object_sha1);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
hashcpy(l->val_sha1, entry.sha1);
|
|
|
|
if (len < 20) {
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
if (!S_ISDIR(entry.mode) || path_len != 2)
|
|
|
|
goto handle_non_note; /* not subtree */
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
l->key_sha1[19] = (unsigned char) len;
|
|
|
|
type = PTR_TYPE_SUBTREE;
|
|
|
|
}
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
note_tree_insert(t, node, n, l, type,
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
combine_notes_concatenate);
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
handle_non_note:
|
|
|
|
/*
|
|
|
|
* Determine full path for this non-note entry:
|
|
|
|
* The filename is already found in entry.path, but the
|
|
|
|
* directory part of the path must be deduced from the subtree
|
|
|
|
* containing this entry. We assume here that the overall notes
|
|
|
|
* tree follows a strict byte-based progressive fanout
|
|
|
|
* structure (i.e. using 2/38, 2/2/36, etc. fanouts, and not
|
|
|
|
* e.g. 4/36 fanout). This means that if a non-note is found at
|
|
|
|
* path "dead/beef", the following code will register it as
|
|
|
|
* being found on "de/ad/beef".
|
|
|
|
* On the other hand, if you use such non-obvious non-note
|
|
|
|
* paths in the middle of a notes tree, you deserve what's
|
|
|
|
* coming to you ;). Note that for non-notes that are not
|
|
|
|
* SHA1-like at the top level, there will be no problems.
|
|
|
|
*
|
|
|
|
* To conclude, it is strongly advised to make sure non-notes
|
|
|
|
* have at least one non-hex character in the top-level path
|
|
|
|
* component.
|
|
|
|
*/
|
|
|
|
{
|
|
|
|
char non_note_path[PATH_MAX];
|
|
|
|
char *p = non_note_path;
|
|
|
|
const char *q = sha1_to_hex(subtree->key_sha1);
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < prefix_len; i++) {
|
|
|
|
*p++ = *q++;
|
|
|
|
*p++ = *q++;
|
|
|
|
*p++ = '/';
|
|
|
|
}
|
|
|
|
strcpy(p, entry.path);
|
|
|
|
add_non_note(t, non_note_path, entry.mode, entry.sha1);
|
|
|
|
}
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
}
|
|
|
|
free(buf);
|
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:16 +08:00
|
|
|
/*
|
|
|
|
* Determine optimal on-disk fanout for this part of the notes tree
|
|
|
|
*
|
|
|
|
* Given a (sub)tree and the level in the internal tree structure, determine
|
|
|
|
* whether or not the given existing fanout should be expanded for this
|
|
|
|
* (sub)tree.
|
|
|
|
*
|
|
|
|
* Values of the 'fanout' variable:
|
|
|
|
* - 0: No fanout (all notes are stored directly in the root notes tree)
|
|
|
|
* - 1: 2/38 fanout
|
|
|
|
* - 2: 2/2/36 fanout
|
|
|
|
* - 3: 2/2/2/34 fanout
|
|
|
|
* etc.
|
|
|
|
*/
|
|
|
|
static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
|
|
|
|
unsigned char fanout)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* The following is a simple heuristic that works well in practice:
|
|
|
|
* For each even-numbered 16-tree level (remember that each on-disk
|
|
|
|
* fanout level corresponds to _two_ 16-tree levels), peek at all 16
|
|
|
|
* entries at that tree level. If all of them are either int_nodes or
|
|
|
|
* subtree entries, then there are likely plenty of notes below this
|
|
|
|
* level, so we return an incremented fanout.
|
|
|
|
*/
|
|
|
|
unsigned int i;
|
|
|
|
if ((n % 2) || (n > 2 * fanout))
|
|
|
|
return fanout;
|
|
|
|
for (i = 0; i < 16; i++) {
|
|
|
|
switch (GET_PTR_TYPE(tree->a[i])) {
|
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
case PTR_TYPE_INTERNAL:
|
|
|
|
continue;
|
|
|
|
default:
|
|
|
|
return fanout;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return fanout + 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void construct_path_with_fanout(const unsigned char *sha1,
|
|
|
|
unsigned char fanout, char *path)
|
|
|
|
{
|
|
|
|
unsigned int i = 0, j = 0;
|
|
|
|
const char *hex_sha1 = sha1_to_hex(sha1);
|
|
|
|
assert(fanout < 20);
|
|
|
|
while (fanout) {
|
|
|
|
path[i++] = hex_sha1[j++];
|
|
|
|
path[i++] = hex_sha1[j++];
|
|
|
|
path[i++] = '/';
|
|
|
|
fanout--;
|
|
|
|
}
|
|
|
|
strcpy(path + i, hex_sha1 + j);
|
|
|
|
}
|
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static int for_each_note_helper(struct notes_tree *t, struct int_node *tree,
|
|
|
|
unsigned char n, unsigned char fanout, int flags,
|
|
|
|
each_note_fn fn, void *cb_data)
|
2010-02-14 05:28:16 +08:00
|
|
|
{
|
|
|
|
unsigned int i;
|
|
|
|
void *p;
|
|
|
|
int ret = 0;
|
|
|
|
struct leaf_node *l;
|
|
|
|
static char path[40 + 19 + 1]; /* hex SHA1 + 19 * '/' + NUL */
|
|
|
|
|
|
|
|
fanout = determine_fanout(tree, n, fanout);
|
|
|
|
for (i = 0; i < 16; i++) {
|
|
|
|
redo:
|
|
|
|
p = tree->a[i];
|
|
|
|
switch (GET_PTR_TYPE(p)) {
|
|
|
|
case PTR_TYPE_INTERNAL:
|
|
|
|
/* recurse into int_node */
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
ret = for_each_note_helper(t, CLR_PTR_TYPE(p), n + 1,
|
2010-02-14 05:28:16 +08:00
|
|
|
fanout, flags, fn, cb_data);
|
|
|
|
break;
|
|
|
|
case PTR_TYPE_SUBTREE:
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(p);
|
|
|
|
/*
|
|
|
|
* Subtree entries in the note tree represent parts of
|
|
|
|
* the note tree that have not yet been explored. There
|
|
|
|
* is a direct relationship between subtree entries at
|
|
|
|
* level 'n' in the tree, and the 'fanout' variable:
|
|
|
|
* Subtree entries at level 'n <= 2 * fanout' should be
|
|
|
|
* preserved, since they correspond exactly to a fanout
|
|
|
|
* directory in the on-disk structure. However, subtree
|
|
|
|
* entries at level 'n > 2 * fanout' should NOT be
|
|
|
|
* preserved, but rather consolidated into the above
|
|
|
|
* notes tree level. We achieve this by unconditionally
|
|
|
|
* unpacking subtree entries that exist below the
|
|
|
|
* threshold level at 'n = 2 * fanout'.
|
|
|
|
*/
|
|
|
|
if (n <= 2 * fanout &&
|
|
|
|
flags & FOR_EACH_NOTE_YIELD_SUBTREES) {
|
|
|
|
/* invoke callback with subtree */
|
|
|
|
unsigned int path_len =
|
|
|
|
l->key_sha1[19] * 2 + fanout;
|
|
|
|
assert(path_len < 40 + 19);
|
|
|
|
construct_path_with_fanout(l->key_sha1, fanout,
|
|
|
|
path);
|
|
|
|
/* Create trailing slash, if needed */
|
|
|
|
if (path[path_len - 1] != '/')
|
|
|
|
path[path_len++] = '/';
|
|
|
|
path[path_len] = '\0';
|
|
|
|
ret = fn(l->key_sha1, l->val_sha1, path,
|
|
|
|
cb_data);
|
|
|
|
}
|
|
|
|
if (n > fanout * 2 ||
|
|
|
|
!(flags & FOR_EACH_NOTE_DONT_UNPACK_SUBTREES)) {
|
|
|
|
/* unpack subtree and resume traversal */
|
|
|
|
tree->a[i] = NULL;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, l, tree, n);
|
2010-02-14 05:28:16 +08:00
|
|
|
free(l);
|
|
|
|
goto redo;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case PTR_TYPE_NOTE:
|
|
|
|
l = (struct leaf_node *) CLR_PTR_TYPE(p);
|
|
|
|
construct_path_with_fanout(l->key_sha1, fanout, path);
|
|
|
|
ret = fn(l->key_sha1, l->val_sha1, path, cb_data);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:17 +08:00
|
|
|
struct tree_write_stack {
|
|
|
|
struct tree_write_stack *next;
|
|
|
|
struct strbuf buf;
|
|
|
|
char path[2]; /* path to subtree in next, if any */
|
|
|
|
};
|
|
|
|
|
|
|
|
static inline int matches_tree_write_stack(struct tree_write_stack *tws,
|
|
|
|
const char *full_path)
|
|
|
|
{
|
|
|
|
return full_path[0] == tws->path[0] &&
|
|
|
|
full_path[1] == tws->path[1] &&
|
|
|
|
full_path[2] == '/';
|
|
|
|
}
|
|
|
|
|
|
|
|
static void write_tree_entry(struct strbuf *buf, unsigned int mode,
|
|
|
|
const char *path, unsigned int path_len, const
|
|
|
|
unsigned char *sha1)
|
|
|
|
{
|
|
|
|
strbuf_addf(buf, "%06o %.*s%c", mode, path_len, path, '\0');
|
|
|
|
strbuf_add(buf, sha1, 20);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void tree_write_stack_init_subtree(struct tree_write_stack *tws,
|
|
|
|
const char *path)
|
|
|
|
{
|
|
|
|
struct tree_write_stack *n;
|
|
|
|
assert(!tws->next);
|
|
|
|
assert(tws->path[0] == '\0' && tws->path[1] == '\0');
|
|
|
|
n = (struct tree_write_stack *)
|
|
|
|
xmalloc(sizeof(struct tree_write_stack));
|
|
|
|
n->next = NULL;
|
|
|
|
strbuf_init(&n->buf, 256 * (32 + 40)); /* assume 256 entries per tree */
|
|
|
|
n->path[0] = n->path[1] = '\0';
|
|
|
|
tws->next = n;
|
|
|
|
tws->path[0] = path[0];
|
|
|
|
tws->path[1] = path[1];
|
|
|
|
}
|
|
|
|
|
|
|
|
static int tree_write_stack_finish_subtree(struct tree_write_stack *tws)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
struct tree_write_stack *n = tws->next;
|
|
|
|
unsigned char s[20];
|
|
|
|
if (n) {
|
|
|
|
ret = tree_write_stack_finish_subtree(n);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
ret = write_sha1_file(n->buf.buf, n->buf.len, tree_type, s);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
strbuf_release(&n->buf);
|
|
|
|
free(n);
|
|
|
|
tws->next = NULL;
|
|
|
|
write_tree_entry(&tws->buf, 040000, tws->path, 2, s);
|
|
|
|
tws->path[0] = tws->path[1] = '\0';
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int write_each_note_helper(struct tree_write_stack *tws,
|
|
|
|
const char *path, unsigned int mode,
|
|
|
|
const unsigned char *sha1)
|
|
|
|
{
|
|
|
|
size_t path_len = strlen(path);
|
|
|
|
unsigned int n = 0;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* Determine common part of tree write stack */
|
|
|
|
while (tws && 3 * n < path_len &&
|
|
|
|
matches_tree_write_stack(tws, path + 3 * n)) {
|
|
|
|
n++;
|
|
|
|
tws = tws->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* tws point to last matching tree_write_stack entry */
|
|
|
|
ret = tree_write_stack_finish_subtree(tws);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/* Start subtrees needed to satisfy path */
|
|
|
|
while (3 * n + 2 < path_len && path[3 * n + 2] == '/') {
|
|
|
|
tree_write_stack_init_subtree(tws, path + 3 * n);
|
|
|
|
n++;
|
|
|
|
tws = tws->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* There should be no more directory components in the given path */
|
|
|
|
assert(memchr(path + 3 * n, '/', path_len - (3 * n)) == NULL);
|
|
|
|
|
|
|
|
/* Finally add given entry to the current tree object */
|
|
|
|
write_tree_entry(&tws->buf, mode, path + 3 * n, path_len - (3 * n),
|
|
|
|
sha1);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct write_each_note_data {
|
|
|
|
struct tree_write_stack *root;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
struct non_note *next_non_note;
|
2010-02-14 05:28:17 +08:00
|
|
|
};
|
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
static int write_each_non_note_until(const char *note_path,
|
|
|
|
struct write_each_note_data *d)
|
|
|
|
{
|
|
|
|
struct non_note *n = d->next_non_note;
|
|
|
|
int cmp, ret;
|
|
|
|
while (n && (!note_path || (cmp = strcmp(n->path, note_path)) <= 0)) {
|
|
|
|
if (note_path && cmp == 0)
|
|
|
|
; /* do nothing, prefer note to non-note */
|
|
|
|
else {
|
|
|
|
ret = write_each_note_helper(d->root, n->path, n->mode,
|
|
|
|
n->sha1);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
n = n->next;
|
|
|
|
}
|
|
|
|
d->next_non_note = n;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:17 +08:00
|
|
|
static int write_each_note(const unsigned char *object_sha1,
|
|
|
|
const unsigned char *note_sha1, char *note_path,
|
|
|
|
void *cb_data)
|
|
|
|
{
|
|
|
|
struct write_each_note_data *d =
|
|
|
|
(struct write_each_note_data *) cb_data;
|
|
|
|
size_t note_path_len = strlen(note_path);
|
|
|
|
unsigned int mode = 0100644;
|
|
|
|
|
|
|
|
if (note_path[note_path_len - 1] == '/') {
|
|
|
|
/* subtree entry */
|
|
|
|
note_path_len--;
|
|
|
|
note_path[note_path_len] = '\0';
|
|
|
|
mode = 040000;
|
|
|
|
}
|
|
|
|
assert(note_path_len <= 40 + 19);
|
|
|
|
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
/* Weave non-note entries into note entries */
|
|
|
|
return write_each_non_note_until(note_path, d) ||
|
|
|
|
write_each_note_helper(d->root, note_path, mode, note_sha1);
|
2010-02-14 05:28:17 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:27 +08:00
|
|
|
struct note_delete_list {
|
|
|
|
struct note_delete_list *next;
|
|
|
|
const unsigned char *sha1;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int prune_notes_helper(const unsigned char *object_sha1,
|
|
|
|
const unsigned char *note_sha1, char *note_path,
|
|
|
|
void *cb_data)
|
|
|
|
{
|
|
|
|
struct note_delete_list **l = (struct note_delete_list **) cb_data;
|
|
|
|
struct note_delete_list *n;
|
|
|
|
|
|
|
|
if (has_sha1_file(object_sha1))
|
|
|
|
return 0; /* nothing to do for this note */
|
|
|
|
|
|
|
|
/* failed to find object => prune this note */
|
|
|
|
n = (struct note_delete_list *) xmalloc(sizeof(*n));
|
|
|
|
n->next = *l;
|
|
|
|
n->sha1 = object_sha1;
|
|
|
|
*l = n;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
int combine_notes_concatenate(unsigned char *cur_sha1,
|
|
|
|
const unsigned char *new_sha1)
|
|
|
|
{
|
|
|
|
char *cur_msg = NULL, *new_msg = NULL, *buf;
|
|
|
|
unsigned long cur_len, new_len, buf_len;
|
|
|
|
enum object_type cur_type, new_type;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* read in both note blob objects */
|
|
|
|
if (!is_null_sha1(new_sha1))
|
|
|
|
new_msg = read_sha1_file(new_sha1, &new_type, &new_len);
|
|
|
|
if (!new_msg || !new_len || new_type != OBJ_BLOB) {
|
|
|
|
free(new_msg);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
if (!is_null_sha1(cur_sha1))
|
|
|
|
cur_msg = read_sha1_file(cur_sha1, &cur_type, &cur_len);
|
|
|
|
if (!cur_msg || !cur_len || cur_type != OBJ_BLOB) {
|
|
|
|
free(cur_msg);
|
|
|
|
free(new_msg);
|
|
|
|
hashcpy(cur_sha1, new_sha1);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* we will separate the notes by a newline anyway */
|
|
|
|
if (cur_msg[cur_len - 1] == '\n')
|
|
|
|
cur_len--;
|
|
|
|
|
|
|
|
/* concatenate cur_msg and new_msg into buf */
|
|
|
|
buf_len = cur_len + 1 + new_len;
|
|
|
|
buf = (char *) xmalloc(buf_len);
|
|
|
|
memcpy(buf, cur_msg, cur_len);
|
|
|
|
buf[cur_len] = '\n';
|
|
|
|
memcpy(buf + cur_len + 1, new_msg, new_len);
|
|
|
|
free(cur_msg);
|
|
|
|
free(new_msg);
|
|
|
|
|
|
|
|
/* create a new blob object from buf */
|
|
|
|
ret = write_sha1_file(buf, buf_len, blob_type, cur_sha1);
|
|
|
|
free(buf);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
int combine_notes_overwrite(unsigned char *cur_sha1,
|
|
|
|
const unsigned char *new_sha1)
|
|
|
|
{
|
|
|
|
hashcpy(cur_sha1, new_sha1);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int combine_notes_ignore(unsigned char *cur_sha1,
|
|
|
|
const unsigned char *new_sha1)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void init_notes(struct notes_tree *t, const char *notes_ref,
|
|
|
|
combine_notes_fn combine_notes, int flags)
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
{
|
2010-02-14 05:28:10 +08:00
|
|
|
unsigned char sha1[20], object_sha1[20];
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
unsigned mode;
|
|
|
|
struct leaf_node root_tree;
|
2009-10-09 18:21:59 +08:00
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(!t->initialized);
|
2010-02-14 05:28:12 +08:00
|
|
|
|
|
|
|
if (!notes_ref)
|
|
|
|
notes_ref = getenv(GIT_NOTES_REF_ENVIRONMENT);
|
|
|
|
if (!notes_ref)
|
|
|
|
notes_ref = notes_ref_name; /* value of core.notesRef config */
|
|
|
|
if (!notes_ref)
|
|
|
|
notes_ref = GIT_NOTES_DEFAULT_REF;
|
|
|
|
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
if (!combine_notes)
|
|
|
|
combine_notes = combine_notes_concatenate;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
t->root = (struct int_node *) xcalloc(sizeof(struct int_node), 1);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
t->first_non_note = NULL;
|
|
|
|
t->prev_non_note = NULL;
|
2010-02-14 05:28:18 +08:00
|
|
|
t->ref = notes_ref ? xstrdup(notes_ref) : NULL;
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
t->combine_notes = combine_notes;
|
2010-02-14 05:28:18 +08:00
|
|
|
t->initialized = 1;
|
|
|
|
|
2010-02-14 05:28:12 +08:00
|
|
|
if (flags & NOTES_INIT_EMPTY || !notes_ref ||
|
|
|
|
read_ref(notes_ref, object_sha1))
|
2009-10-09 18:21:59 +08:00
|
|
|
return;
|
2010-02-14 05:28:12 +08:00
|
|
|
if (get_tree_entry(object_sha1, "", sha1, &mode))
|
|
|
|
die("Failed to read notes tree referenced by %s (%s)",
|
|
|
|
notes_ref, object_sha1);
|
2009-10-09 18:21:59 +08:00
|
|
|
|
Teach the notes lookup code to parse notes trees with various fanout schemes
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-09 18:22:07 +08:00
|
|
|
hashclr(root_tree.key_sha1);
|
|
|
|
hashcpy(root_tree.val_sha1, sha1);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
load_subtree(t, &root_tree, t->root, 0);
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
void add_note(struct notes_tree *t, const unsigned char *object_sha1,
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
const unsigned char *note_sha1, combine_notes_fn combine_notes)
|
2010-02-14 05:28:13 +08:00
|
|
|
{
|
|
|
|
struct leaf_node *l;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
if (!combine_notes)
|
|
|
|
combine_notes = t->combine_notes;
|
2010-02-14 05:28:13 +08:00
|
|
|
l = (struct leaf_node *) xmalloc(sizeof(struct leaf_node));
|
|
|
|
hashcpy(l->key_sha1, object_sha1);
|
|
|
|
hashcpy(l->val_sha1, note_sha1);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
note_tree_insert(t, t->root, 0, l, PTR_TYPE_NOTE, combine_notes);
|
2010-02-14 05:28:13 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
void remove_note(struct notes_tree *t, const unsigned char *object_sha1)
|
2010-02-14 05:28:14 +08:00
|
|
|
{
|
|
|
|
struct leaf_node l;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
2010-02-14 05:28:14 +08:00
|
|
|
hashcpy(l.key_sha1, object_sha1);
|
|
|
|
hashclr(l.val_sha1);
|
2010-02-14 05:28:18 +08:00
|
|
|
return note_tree_remove(t, t->root, 0, &l);
|
2010-02-14 05:28:14 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
const unsigned char *get_note(struct notes_tree *t,
|
|
|
|
const unsigned char *object_sha1)
|
2009-10-09 18:21:59 +08:00
|
|
|
{
|
2010-02-14 05:28:15 +08:00
|
|
|
struct leaf_node *found;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
found = note_tree_find(t, t->root, 0, object_sha1);
|
2010-02-14 05:28:15 +08:00
|
|
|
return found ? found->val_sha1 : NULL;
|
2009-10-09 18:21:59 +08:00
|
|
|
}
|
2009-10-09 18:21:57 +08:00
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
int for_each_note(struct notes_tree *t, int flags, each_note_fn fn,
|
|
|
|
void *cb_data)
|
2010-02-14 05:28:16 +08:00
|
|
|
{
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
return for_each_note_helper(t, t->root, 0, 0, flags, fn, cb_data);
|
2010-02-14 05:28:16 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
int write_notes_tree(struct notes_tree *t, unsigned char *result)
|
2010-02-14 05:28:17 +08:00
|
|
|
{
|
|
|
|
struct tree_write_stack root;
|
|
|
|
struct write_each_note_data cb_data;
|
|
|
|
int ret;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
2010-02-14 05:28:17 +08:00
|
|
|
|
|
|
|
/* Prepare for traversal of current notes tree */
|
|
|
|
root.next = NULL; /* last forward entry in list is grounded */
|
|
|
|
strbuf_init(&root.buf, 256 * (32 + 40)); /* assume 256 entries */
|
|
|
|
root.path[0] = root.path[1] = '\0';
|
|
|
|
cb_data.root = &root;
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
cb_data.next_non_note = t->first_non_note;
|
2010-02-14 05:28:17 +08:00
|
|
|
|
|
|
|
/* Write tree objects representing current notes tree */
|
2010-02-14 05:28:18 +08:00
|
|
|
ret = for_each_note(t, FOR_EACH_NOTE_DONT_UNPACK_SUBTREES |
|
2010-02-14 05:28:17 +08:00
|
|
|
FOR_EACH_NOTE_YIELD_SUBTREES,
|
|
|
|
write_each_note, &cb_data) ||
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
write_each_non_note_until(NULL, &cb_data) ||
|
2010-02-14 05:28:17 +08:00
|
|
|
tree_write_stack_finish_subtree(&root) ||
|
|
|
|
write_sha1_file(root.buf.buf, root.buf.len, tree_type, result);
|
|
|
|
strbuf_release(&root.buf);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:27 +08:00
|
|
|
void prune_notes(struct notes_tree *t)
|
|
|
|
{
|
|
|
|
struct note_delete_list *l = NULL;
|
|
|
|
|
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
assert(t->initialized);
|
|
|
|
|
|
|
|
for_each_note(t, 0, prune_notes_helper, &l);
|
|
|
|
|
|
|
|
while (l) {
|
|
|
|
remove_note(t, l->sha1);
|
|
|
|
l = l->next;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
void free_notes(struct notes_tree *t)
|
2009-10-09 18:22:06 +08:00
|
|
|
{
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
if (t->root)
|
|
|
|
note_tree_free(t->root);
|
|
|
|
free(t->root);
|
Teach notes code to properly preserve non-notes in the notes tree
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:23 +08:00
|
|
|
while (t->first_non_note) {
|
|
|
|
t->prev_non_note = t->first_non_note->next;
|
|
|
|
free(t->first_non_note->path);
|
|
|
|
free(t->first_non_note);
|
|
|
|
t->first_non_note = t->prev_non_note;
|
|
|
|
}
|
2010-02-14 05:28:18 +08:00
|
|
|
free(t->ref);
|
|
|
|
memset(t, 0, sizeof(struct notes_tree));
|
2009-10-09 18:22:06 +08:00
|
|
|
}
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
void format_note(struct notes_tree *t, const unsigned char *object_sha1,
|
|
|
|
struct strbuf *sb, const char *output_encoding, int flags)
|
2009-10-09 18:21:57 +08:00
|
|
|
{
|
|
|
|
static const char utf8[] = "utf-8";
|
2010-02-14 05:28:15 +08:00
|
|
|
const unsigned char *sha1;
|
2009-10-09 18:21:57 +08:00
|
|
|
char *msg, *msg_p;
|
|
|
|
unsigned long linelen, msglen;
|
|
|
|
enum object_type type;
|
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
if (!t)
|
|
|
|
t = &default_notes_tree;
|
|
|
|
if (!t->initialized)
|
Refactor notes concatenation into a flexible interface for combining notes
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-14 05:28:19 +08:00
|
|
|
init_notes(t, NULL, NULL, 0);
|
2009-10-09 18:21:57 +08:00
|
|
|
|
2010-02-14 05:28:18 +08:00
|
|
|
sha1 = get_note(t, object_sha1);
|
2009-10-09 18:21:59 +08:00
|
|
|
if (!sha1)
|
2009-10-09 18:21:57 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
if (!(msg = read_sha1_file(sha1, &type, &msglen)) || !msglen ||
|
|
|
|
type != OBJ_BLOB) {
|
|
|
|
free(msg);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (output_encoding && *output_encoding &&
|
|
|
|
strcmp(utf8, output_encoding)) {
|
|
|
|
char *reencoded = reencode_string(msg, output_encoding, utf8);
|
|
|
|
if (reencoded) {
|
|
|
|
free(msg);
|
|
|
|
msg = reencoded;
|
|
|
|
msglen = strlen(msg);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* we will end the annotation by a newline anyway */
|
|
|
|
if (msglen && msg[msglen - 1] == '\n')
|
|
|
|
msglen--;
|
|
|
|
|
2009-10-09 18:22:04 +08:00
|
|
|
if (flags & NOTES_SHOW_HEADER)
|
|
|
|
strbuf_addstr(sb, "\nNotes:\n");
|
2009-10-09 18:21:57 +08:00
|
|
|
|
|
|
|
for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
|
|
|
|
linelen = strchrnul(msg_p, '\n') - msg_p;
|
|
|
|
|
2009-10-09 18:22:04 +08:00
|
|
|
if (flags & NOTES_INDENT)
|
|
|
|
strbuf_addstr(sb, " ");
|
2009-10-09 18:21:57 +08:00
|
|
|
strbuf_add(sb, msg_p, linelen);
|
|
|
|
strbuf_addch(sb, '\n');
|
|
|
|
}
|
|
|
|
|
|
|
|
free(msg);
|
|
|
|
}
|