We may return objects in one of two orders: how they appear in the .idx
(sorted by object id) or how they appear in the packfile itself. To
further complicate matters, we have two ordering variables, "i" and
"pos", and it is not clear to which order they apply.
Let's clarify this by using an unambiguous name where possible, and
leaving a comment for the variable that does double-duty.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for on-disk reverse indexes, remove a spot in
'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct
packed_git'.
Even though this use of the revindex pointer is within pack-revindex.c,
this clean up is still worth doing. Since the 'revindex' pointer will be
NULL when reading from an on-disk reverse index (instead the
'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this
call-site would have to include a conditional to lookup the offset for
position 'mi' each iteration through the search.
So instead of open-coding 'pack_pos_to_offset()', call it directly from
within 'offset_to_pack_pos()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all spots outside of pack-revindex.c that reference 'struct
revindex_entry' directly have been removed, it is safe to hide the
implementation by moving it from pack-revindex.h to pack-revindex.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that all 'find_revindex_position()' callers have been removed (and
converted to the more descriptive 'offset_to_pack_pos()'), it is almost
safe to get rid of 'find_revindex_position()' entirely. Almost, except
for the fact that 'offset_to_pack_pos()' calls
'find_revindex_position()'.
Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and
then remove 'find_revindex_position()' entirely.
This is a straightforward refactoring with one minor snag.
'offset_to_pack_pos()' used to load the index before calling
'find_revindex_position()'. That means that by the time
'find_revindex_position()' starts executing, 'p->num_objects' can be
safely read. After inlining, be careful to not read 'p->num_objects'
until _after_ 'load_pack_revindex()' (which loads the index as a
side-effect) has been called.
Another small fix that is included is converting the upper- and
lower-bounds to be unsigned's instead of ints. This dates back to
92e5c77c37 (revindex: export new APIs, 2013-10-24)--ironically, the last
time we introduced new APIs here--but this unifies the types.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that no callers of 'find_pack_revindex()' remain, remove the
function's declaration and implementation entirely.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'estimate_repack_memory()' takes into account the amount of memory
required to load the reverse index in memory by multiplying the assumed
number of objects by the size of the 'revindex_entry' struct.
Prepare for hiding the definition of 'struct revindex_entry' by removing
a 'sizeof()' of that type from outside of pack-revindex.c. Instead,
guess that one off_t and one uint32_t are required per object. Strictly
speaking, this is a worse guess than asking for 'sizeof(struct
revindex_entry)' directly, since the true size of this struct is 16
bytes with padding on the end of the struct in order to align the offset
field.
But, this is an approximation anyway, and it does remove a use of the
'struct revindex_entry' from outside of pack-revindex internals.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid looking at the 'revindex' pointer directly and instead call
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove direct manipulation of the 'struct revindex_entry' type as well
as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual
clean-up is performed (replacing '->nr' with calls to
'pack_pos_to_index()' and so on).
Add an additional check to make sure that 'obj_offset()' points at a
valid object. In the case this check is violated, we cannot call
'mark_bad_packed_object()' because we don't know the OID. At the top of
the call stack is do_oid_object_info_extended() (via
packed_object_info()), which does mark the object.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert another call of 'find_pack_revindex()' to its replacement
'pack_pos_to_offset()'. Likewise:
- Avoid manipulating `struct packed_git`'s `revindex` pointer directly
by removing the pointer-as-array indexing.
- Add an additional guard to check that the offset 'obj_offset()'
points to a real object. This should be the case with well-behaved
callers to 'packed_object_info()', but isn't guarenteed.
Other blocks that fill in various other values from the 'struct
object_info' request handle bad inputs by setting the type to
'OBJ_BAD' and jumping to 'out'. Do the same when given a bad offset
here.
The previous code would have segfaulted when given a bad
'obj_offset' value, since 'find_pack_revindex()' would return
'NULL', and then the line that fills 'oi->disk_sizep' would try to
access 'NULL[1]' with a stride of 16 bytes (the width of 'struct
revindex_entry)'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perform exactly the same conversion as in the previous commit to another
caller within 'packfile.c'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct accesses to the 'struct revindex' type with a call to
'pack_pos_to_index()'.
Likewise drop the old-style 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()' (while continuing to perform the same error
checking).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another instance of looking at the revindex directly by instead
calling 'pack_pos_to_index()'. Unlike other patches, this caller only
cares about the index position of each object in the loop.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another instance of direct revindex manipulation by calling
'pack_pos_to_offset()' instead (the caller here does not care about the
index position of the object at position 'pos').
Note that we cannot just use the existing "offset" variable to store the
value we get from pack_pos_to_offset(). It is incremented by
unpack_object_header(), but we later need the original value. Since
we'll no longer have revindex->offset to read it from, we'll store that
in a separate variable ("header" since it points to the entry's header
bytes).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove another caller that holds onto a 'struct revindex_entry' by
replacing the direct indexing with calls to 'pack_pos_to_offset()' and
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid storing the revindex entry directly, since this structure will
soon be removed from the public interface. Instead, store the offset and
index position by calling 'pack_pos_to_offset()' and
'pack_pos_to_index()', respectively.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace find_revindex_position() with its counterpart in the new API,
offset_to_pack_pos().
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct accesses to the revindex with calls to
'offset_to_pack_pos()' and 'pack_pos_to_index()'.
Since this caller already had some error checking (it can jump to the
'give_up' label if it encounters an error), we can easily check whether
or not the provided offset points to an object in the given pack. This
error checking existed prior to this patch, too, since the caller checks
whether the return value from 'find_pack_revindex()' was NULL or not.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a direct access to the revindex array with
'pack_pos_to_offset()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace direct revindex accesses with calls to 'pack_pos_to_offset()'
and 'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First replace 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make
its way through until 'write_reuse_object()' from causing a bad memory
read (if 'revidx' is 'NULL')
Next, replace a direct access of '->nr' with the wrapper function
'pack_pos_to_index()'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next several patches, we will prepare for loading a reverse index
either in memory (mapping the inverse of the .idx's contents in-core),
or directly from a yet-to-be-introduced on-disk format. To prepare for
that, we'll introduce an API that avoids the caller explicitly indexing
the revindex pointer in the packed_git structure.
There are four ways to interact with the reverse index. Accordingly,
four functions will be exported from 'pack-revindex.h' by the time that
the existing API is removed. A caller may:
1. Load the pack's reverse index. This involves opening up the index,
generating an array, and then sorting it. Since opening the index
can fail, this function ('load_pack_revindex()') returns an int.
Accordingly, it takes only a single argument: the 'struct
packed_git' the caller wants to build a reverse index for.
This function is well-suited for both the current and new API.
Callers will have to continue to open the reverse index explicitly,
but this function will eventually learn how to detect and load a
reverse index from the on-disk format, if one exists. Otherwise, it
will fallback to generating one in memory from scratch.
2. Convert a pack position into an offset. This operation is now
called `pack_pos_to_offset()`. It takes a pack and a position, and
returns the corresponding off_t.
Any error simply calls BUG(), since the callers are not well-suited
to handle a failure and keep going.
3. Convert a pack position into an index position. Same as above; this
takes a pack and a position, and returns a uint32_t. This operation
is known as `pack_pos_to_index()`. The same thinking about error
conditions applies here as well.
4. Find the pack position for a given offset. This operation is now
known as `offset_to_pack_pos()`. It takes a pack, an offset, and a
pointer to a uint32_t where the position is written, if an object
exists at that offset. Otherwise, -1 is returned to indicate
failure.
Unlike some of the callers that used to access '->offset' and '->nr'
directly, the error checking around this call is somewhat more
robust. This is important since callers should always pass an offset
which points at the boundary of two objects. The API, unlike direct
access, enforces that that is the case.
This will become important in a subsequent patch where a caller
which does not but could check the return value treats the signed
`-1` from `find_revindex_position()` as an index into the 'revindex'
array.
Two design warts are carried over into the new API:
- Asking for the index position of an out-of-bounds object will result
in a BUG() (since no such object exists), but asking for the offset
of the non-existent object at the end of the pack returns the total
size of the pack.
This makes it convenient for callers who always want to take the
difference of two adjacent object's offsets (to compute the on-disk
size) but don't want to worry about boundaries at the end of the
pack.
- offset_to_pack_pos() lazily loads the reverse index, but
pack_pos_to_index() doesn't (callers of the former are well-suited
to handle errors, but callers of the latter are not).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A 3-year old test that was not testing anything useful has been
corrected.
* fc/t6030-bisect-reset-removes-auxiliary-files:
test: bisect-porcelain: fix location of files
"git worktree repair" learned to deal with the case where both the
repository and the worktree moved.
* es/worktree-repair-both-moved:
worktree: teach `repair` to fix multi-directional breakage
The ORT merge strategy learned to synthesize virtual ancestor tree
by recursively merging multiple merge bases together, just like the
recursive backend has done for years.
* en/merge-ort-recursive:
merge-ort: implement merge_incore_recursive()
merge-ort: make clear_internal_opts() aware of partial clearing
merge-ort: copy a few small helper functions from merge-recursive.c
commit: move reverse_commit_list() from merge-recursive
When a user does not tell "git pull" to use rebase or merge, the
command gives a loud message telling a user to choose between
rebase or merge but creates a merge anyway, forcing users who would
want to rebase to redo the operation. Fix an early part of this
problem by tightening the condition to give the message---there is
no reason to stop or force the user to choose between rebase or
merge if the history fast-forwards.
* fc/pull-merge-rebase:
pull: display default warning only when non-ff
pull: correct condition to trigger non-ff advice
pull: get rid of unnecessary global variable
pull: give the advice for choosing rebase/merge much later
pull: refactor fast-forward check
More "ORT" merge strategy.
* en/merge-ort-2:
merge-ort: add modify/delete handling and delayed output processing
merge-ort: add die-not-implemented stub handle_content_merge() function
merge-ort: add function grouping comments
merge-ort: add a paths_to_free field to merge_options_internal
merge-ort: add a path_conflict field to merge_options_internal
merge-ort: add a clear_internal_opts helper
merge-ort: add a few includes
The merge backend "done right" starts to emerge.
* en/merge-ort-impl:
merge-ort: free data structures in merge_finalize()
merge-ort: add implementation of record_conflicted_index_entries()
tree: enable cmp_cache_name_compare() to be used elsewhere
merge-ort: add implementation of checkout()
merge-ort: basic outline for merge_switch_to_result()
merge-ort: step 3 of tree writing -- handling subdirectories as we go
merge-ort: step 2 of tree writing -- function to create tree object
merge-ort: step 1 of tree writing -- record basenames, modes, and oids
merge-ort: have process_entries operate in a defined order
merge-ort: add a preliminary simple process_entries() implementation
merge-ort: avoid recursing into identical trees
merge-ort: record stage and auxiliary info for every path
merge-ort: compute a few more useful fields for collect_merge_info
merge-ort: avoid repeating fill_tree_descriptor() on the same tree
merge-ort: implement a very basic collect_merge_info()
merge-ort: add an err() function similar to one from merge-recursive
merge-ort: use histogram diff
merge-ort: port merge_start() from merge-recursive
merge-ort: add some high-level algorithm structure
merge-ort: setup basic internal data structures
Various improvements to the codepath that writes out pack bitmaps.
* tb/pack-bitmap: (24 commits)
pack-bitmap-write: better reuse bitmaps
pack-bitmap-write: relax unique revwalk condition
pack-bitmap-write: use existing bitmaps
pack-bitmap: factor out 'add_commit_to_bitmap()'
pack-bitmap: factor out 'bitmap_for_commit()'
pack-bitmap-write: ignore BITMAP_FLAG_REUSE
pack-bitmap-write: build fewer intermediate bitmaps
pack-bitmap.c: check reads more aggressively when loading
pack-bitmap-write: rename children to reverse_edges
t5310: add branch-based checks
commit: implement commit_list_contains()
bitmap: implement bitmap_is_subset()
pack-bitmap-write: fill bitmap with commit history
pack-bitmap-write: pass ownership of intermediate bitmaps
pack-bitmap-write: reimplement bitmap writing
ewah: add bitmap_dup() function
ewah: implement bitmap_or()
ewah: make bitmap growth less aggressive
ewah: factor out bitmap growth
rev-list: die when --test-bitmap detects a mismatch
...
The "--format=%(trailers)" mechanism gets enhanced to make it
easier to design output for machine consumption.
* ab/trailers-extra-format:
pretty format %(trailers): add a "key_value_separator"
pretty format %(trailers): add a "keyonly"
pretty-format %(trailers): fix broken standalone "valueonly"
pretty format %(trailers) doc: avoid repetition
pretty format %(trailers) test: split a long line
Hotfix for a topic of this cycle.
* ma/maintenance-crontab-fix:
t7900-maintenance: test for magic markers
gc: fix handling of crontab magic markers
git-maintenance.txt: add missing word
Test coverage fix.
* js/no-more-prepare-for-main-in-test:
tests: drop the `PREPARE_FOR_MAIN_BRANCH` prereq
t9902: use `main` as initial branch name
t6302: use `main` as initial branch name
t5703: use `main` as initial branch name
t5510: use `main` as initial branch name
t5505: finalize transitioning to using the branch name `main`
t3205: finalize transitioning to using the branch name `main`
t3203: complete the transition to using the branch name `main`
t3201: finalize transitioning to using the branch name `main`
t3200: finish transitioning to the initial branch name `main`
t1400: use `main` as initial branch name
"git pack-redandant" when there is only one packfile used to crash,
which has been corrected.
* jx/pack-redundant-on-single-pack:
pack-redundant: fix crash when one packfile in repo
test_export() has been self-recursive since its inception even though a
simple for-loop would have served just as well to append its arguments
to the `test_export_` variable separated by the pipe character "|".
Recently `test_export_` was changed instead to a space-separated list of
tokens to be exported, an operation which can be accomplished via a
single simple assignment, with no need for looping or recursion.
Therefore, simplify the implementation.
While at it, take advantage of the fact that variable names to be
exported are shell identifiers, thus won't be composed of special
characters or whitespace, thus simple a `$*` can be used rather than
magical `"$@"`.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'linkgit' Asciidoc macro is misspelled as 'linkit' in the
description of 'GIT_SEQUENCE_EDITOR' since the addition of that variable
to git(1) in 902a126eca (doc: mention GIT_SEQUENCE_EDITOR and
'sequence.editor' more, 2020-08-31). Also, it uses two colons instead of
one.
Fix that.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>