Update reading and updating packed-refs file, correcting corner case
bugs.
* mh/packed-refs-various: (33 commits)
refs: handle the main ref_cache specially
refs: change do_for_each_*() functions to take ref_cache arguments
pack_one_ref(): do some cheap tests before a more expensive one
pack_one_ref(): use write_packed_entry() to do the writing
pack_one_ref(): use function peel_entry()
refs: inline function do_not_prune()
pack_refs(): change to use do_for_each_entry()
refs: use same lock_file object for both ref-packing functions
pack_one_ref(): rename "path" parameter to "refname"
pack-refs: merge code from pack-refs.{c,h} into refs.{c,h}
pack-refs: rename handle_one_ref() to pack_one_ref()
refs: extract a function write_packed_entry()
repack_without_ref(): write peeled refs in the rewritten file
t3211: demonstrate loss of peeled refs if a packed ref is deleted
refs: change how packed refs are deleted
search_ref_dir(): return an index rather than a pointer
repack_without_ref(): silence errors for dangling packed refs
t3210: test for spurious error messages for dangling packed refs
refs: change the internal reference-iteration API
refs: extract a function peel_entry()
...
Hold the ref_cache instance for the main repository in a dedicated,
statically-allocated instance to avoid the need for a function call
and a linked-list traversal when it is needed.
Suggested by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the callers convert submodule names into ref_cache pointers.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change pack_refs() to work with a file descriptor instead of a FILE*
(making the file-locking code less awkward) and use
write_packed_entry() to do the writing.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change pack_one_ref() to call peel_entry() rather than using its own
code for peeling references. Aside from sharing code, this lets it
take advantage of the optimization introduced by 6c4a060d7d.
Please note that we *could* use any peeled values that happen to
already be stored in the ref_entries, which would avoid some object
lookups for references that were already packed. But doing so would
also propagate any peeling errors across runs of "git pack-refs" and
give no way to recover from such errors. And "git pack-refs" isn't
run often enough that the performance cost is a problem. So instead,
add a new option to peel_entry() to force the entry to be re-peeled,
and call it with that option set.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Function do_not_prune() was redundantly checking REF_ISSYMREF, which
was already tested at the top of pack_one_ref(), so remove that check.
And the rest was trivial, so inline the function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack_refs() was not using any of the extra features of for_each_ref(),
so change it to use do_for_each_entry(). This also gives it access to
the ref_entry and in particular its peeled field, which will be taken
advantage of in the next commit.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a single struct lock_file for both pack_refs() and
repack_without_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make this function conform to the naming convention established in
65385ef7d4 for the rest of the refs.c file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-refs.c doesn't contain much code, and the code it does contain is
closely related to reference handling. Moreover, there is some
duplication between pack_refs() and repack_without_ref(). Therefore,
merge pack-refs.c into refs.c and pack-refs.h into refs.h.
The code duplication will be addressed in future commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the I/O code from the "business logic" in repack_ref_fn().
Later there will be another caller for this function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a reference that existed in the packed-refs file is deleted, the
packed-refs file must be rewritten. Previously, the file was
rewritten without any peeled refs, even if the file contained peeled
refs when it was read. This was not a bug, because the packed-refs
file header didn't claim that the file contained peeled values. But
it had a performance cost, because the repository would lose the
benefit of having precomputed peeled references until pack-refs was
run again.
Teach repack_without_ref() to write peeled refs to the packed-refs
file (regardless of whether they were present in the old version of
the file).
This means that if the old version of the packed-refs file was not
fully peeled, then repack_without_ref() will have to peel references.
To avoid the expense of reading lots of loose references, we take two
shortcuts relative to pack-refs:
* If the peeled value of a reference is already known (i.e., because
it was read from the old version of the packed-refs file), then
output that peeled value again without any checks. This is the
usual code path and should avoid any noticeable overhead. (This is
different than pack-refs, which always re-peels references.)
* We don't verify that the packed ref is still current. It could be
that a packed references is overridden by a loose reference, in
which case the packed ref is no longer needed and might even refer
to an object that has been garbage collected. But we don't check;
instead, we just try to peel all references. If peeling is
successful, the peeled value is written out (even though it might
not be needed any more); if not, then the reference is silently
omitted from the output.
The extra overhead of peeling references in repack_without_ref()
should only be incurred the first time the packed-refs file is written
by a version of Git that knows about the "fully-peeled" attribute.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function remove_ref(), which removes a single entry from a
reference cache.
Use this function to reimplement repack_without_ref(). The old
version iterated over all refs, packing all of them except for the one
to be deleted, then discarded the entire packed reference cache. The
new version deletes the doomed reference from the cache *before*
iterating.
This has two advantages:
* the code for writing packed-refs becomes simpler, because it doesn't
have to exclude one of the references.
* it is no longer necessary to discard the packed refs cache after
deleting a reference: symbolic refs cannot be packed, so packed
references cannot depend on each other, so the rest of the packed
refs cache remains valid after a reference is deleted.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change search_ref_dir() to return the index of the sought entry (or -1
on error) rather than a pointer to the entry. This will make it more
natural to use the function for removing an entry from the list.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop emitting an error message when deleting a packed reference if we
find another dangling packed reference that is overridden by a loose
reference. See the previous commit for a longer explanation of the
issue.
We have to be careful to make sure that the invalid packed reference
really *is* overridden by a loose reference; otherwise what we have
found is repository corruption, which we *should* report.
Please note that this approach is vulnerable to a race condition
similar to the race conditions already known to affect packed
references [1]:
* Process 1 tries to peel packed reference X as part of deleting
another packed reference. It discovers that X does not refer to a
valid object (because the object that it referred to has been
garbage collected).
* Process 2 tries to delete reference X. It starts by deleting the
loose reference X.
* Process 1 checks whether there is a loose reference X. There is not
(it has just been deleted by process 2), so process 1 reports a
spurious error "X does not point to a valid object!"
The worst case seems relatively harmless, and the fix is identical to
the fix that will be needed for the other race conditions (namely
holding a lock on the packed-refs file during *all* reference
deletions), so we leave the cleaning up of all of them as a future
project.
[1] http://thread.gmane.org/gmane.comp.version-control.git/211956
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Establish an internal API for iterating over references, which gives
the callback functions direct access to the ref_entry structure
describing the reference. (Do not change the iteration API that is
exposed outside of the module.)
Define a new internal callback signature
int each_ref_entry_fn(struct ref_entry *entry, void *cb_data)
Change do_for_each_ref_in_dir() and do_for_each_ref_in_dirs() to
accept each_ref_entry_fn callbacks, and rename them to
do_for_each_entry_in_dir() and do_for_each_entry_in_dirs(),
respectively. Adapt their callers accordingly.
Add a new function do_for_each_entry() analogous to do_for_each_ref()
but using the new callback style.
Change do_one_ref() into an each_ref_entry_fn that does some
bookkeeping and then calls a wrapped each_ref_fn.
Reimplement do_for_each_ref() in terms of do_for_each_entry(), using
do_one_ref() as an adapter.
Please note that the responsibility for setting current_ref remains in
do_one_ref(), which means that current_ref is *not* set when iterating
over references via the new internal API. This is not a disadvantage,
because current_ref is not needed by callers of the internal API (they
receive a pointer to the current ref_entry anyway). But more
importantly, this change prevents peel_ref() from returning invalid
results in the following scenario:
When iterating via the external API, the iteration always includes
both packed and loose references, and in particular never presents a
packed ref if there is a loose ref with the same name. The internal
API, on the other hand, gives the option to iterate over only the
packed references. During such an iteration, there is no check
whether the packed ref might be hidden by a loose ref of the same
name. But until now the packed ref was recorded in current_ref during
the iteration. So if peel_ref() were called with the reference name
corresponding to current ref, it would return the peeled version of
the packed ref even though there might be a loose ref that peels to a
different value. This scenario doesn't currently occur in the code,
but fix it to prevent things from breaking in a very confusing way in
the future.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Peel the entry, and as a side effect store the peeled value in the
entry. Use this function from two places in peel_ref(); a third
caller will be added soon.
Please note that this change can lead to ref_entries for unpacked refs
being peeled. This has no practical benefit but is harmless.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old version was inconsistent: when a reference was
REF_KNOWS_PEELED but with a null peeled value, it returned non-zero
for the current reference but zero for other references. Change the
behavior for non-current references to match that of current_ref,
which is what callers expect. Document the behavior.
Current callers only call peel_ref() from within a for_each_ref-style
iteration and only for the current ref; therefore, the buggy code path
was never reached.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of just returning a success/failure bit, return an enumeration
value that explains the reason for any failure. This will come in
handy shortly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a nice, logical unit of work, and putting it in a function
removes the need to use a goto in peel_ref(). Soon it will also have
other uses.
The algorithm is unchanged.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a nice unit of work and soon will be needed from multiple
locations.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of copying the reference's SHA1 into a caller-supplied
variable, just return the ref_entry itself (or NULL if there is no
such entry). This change will allow the function to be used from
elsewhere.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no way to drop out of the while loop. This code has been
dead since 432ad41e.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the bits that can appear in the "flags" parameter passed to
an each_ref_function and/or in the ref_entry::flag field.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An internal function used to implement "git checkout @{-1}" was
hard to use correctly.
* jc/reflog-reverse-walk:
refs.c: fix fread error handling
reflog: add for_each_reflog_ent_reverse() API
for_each_recent_reflog_ent(): simplify opening of a reflog file
for_each_reflog_ent(): extract a helper to process a single entry
Not that we do not actively encourage having annotated tags outside
refs/tags/ hierarchy, but they were not advertised correctly to the
ls-remote and fetch with recent version of Git.
* jk/fully-peeled-packed-ref:
pack-refs: add fully-peeled trait
pack-refs: write peeled entry for non-tags
use parse_object_or_die instead of die("bad object")
avoid segfaults on parse_object failure
fread returns the number of items read, with no special error return.
Commit 98f85ff (reflog: add for_each_reflog_ent_reverse() API -
2013-03-08) introduced a call to fread which checks for an error with
"nread < 0" which is tautological since nread is unsigned. The correct
check in this case (which tries to read a single item) is "nread != 1".
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Older versions of pack-refs did not write peel lines for
refs outside of refs/tags. This meant that on reading the
pack-refs file, we might set the REF_KNOWS_PEELED flag for
such a ref, even though we do not know anything about its
peeled value.
The previous commit updated the writer to always peel, no
matter what the ref is. That means that packed-refs files
written by newer versions of git are fine to be read by both
old and new versions of git. However, we still have the
problem of reading packed-refs files written by older
versions of git, or by other implementations which have not
yet learned the same trick.
The simplest fix would be to always unset the
REF_KNOWS_PEELED flag for refs outside of refs/tags that do
not have a peel line (if it has a peel line, we know it is
valid, but we cannot assume a missing peel line means
anything). But that loses an important optimization, as
upload-pack should not need to load the object pointed to by
refs/heads/foo to determine that it is not a tag.
Instead, we add a "fully-peeled" trait to the packed-refs
file. If it is set, we know that we can trust a missing peel
line to mean that a ref cannot be peeled. Otherwise, we fall
back to assuming nothing.
[commit message and tests by Jeff King <peff@peff.net>]
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout -" is a short-hand for "git checkout @{-1}" and the
"@{nth}" notation for a negative number is to find nth previous
checkout in the reflog of the HEAD to determine the name of the
branch the user was on. We would want to find the nth most recent
reflog entry that matches "checkout: moving from X to Y" for this.
Unfortunately, reflog is implemented as an append-only file, and the
API to iterate over its entries, for_each_reflog_ent(), reads the
file in order, giving the entries from the oldest to newer. For the
purpose of finding nth most recent one, this API forces us to record
the last n entries in a rotating buffer and give the result out only
after we read everything. To optimize for a common case of finding
the nth most recent one for a small value of n, we also have a side
API for_each_recent_reflog_ent() that starts reading near the end of
the file, but it still has to read the entries in the "wrong" order.
The implementation of understanding @{-1} uses this interface.
This all becomes unnecessary if we add an API to let us iterate over
reflog entries in the reverse order, from the newest to older.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split the logic that takes a single line of reflog entry in a
strbuf, parses the message, and calls the callback function out of
the loop into a separate helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the server side to redact the refs/ namespace it shows to the
client.
Will merge to 'master'.
* jc/hidden-refs:
upload/receive-pack: allow hiding ref hierarchies
upload-pack: simplify request validation
upload-pack: share more code
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed on the value of these
variable is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and rejects the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify ref_entry_cmp_sslice() by using strncmp() to compare the
length-limited key and a NUL-terminated entry. While we're at it,
retain the const attribute of the input pointers.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-refs" that ran in parallel to another process that created
new refs had a nasty race.
* jk/repack-ref-racefix:
refs: do not use cached refs in repack_without_ref
When we delete a ref that is packed, we rewrite the whole
packed-refs file and simply omit the ref that no longer
exists. However, we base the rewrite on whatever happens to
be in our refs cache, not what is necessarily on disk. That
opens us up to a race condition if another process is
simultaneously packing the refs, as we will overwrite their
newly-made pack-refs file with our potentially stale data,
losing commits.
You can demonstrate the race like this:
# setup some repositories
git init --bare parent &&
(cd parent && git config core.logallrefupdates true) &&
git clone parent child &&
(cd child && git commit --allow-empty -m base)
# in one terminal, repack the refs repeatedly
cd parent &&
while true; do
git pack-refs --all
done
# in another terminal, simultaneously push updates to
# master, and create and delete an unrelated ref
cd child &&
while true; do
git push origin HEAD:newbranch &&
git commit --allow-empty -m foo
us=`git rev-parse master` &&
git push origin master &&
git push origin :newbranch &&
them=`git --git-dir=../parent rev-parse master` &&
if test "$them" != "$us"; then
echo >&2 "$them" != "$us"
exit 1
fi
done
In many cases the two processes will conflict over locking
the packed-refs file, and the deletion of newbranch will
simply fail. But eventually you will hit the race, which
happens like this:
1. We push a new commit to master. It is already packed
(from the looping pack-refs call). We write the new
value (let us call it B) to $GIT_DIR/refs/heads/master,
but the old value (call it A) remains in the
packed-refs file.
2. We push the deletion of newbranch, spawning a
receive-pack process. Receive-pack advertises all refs
to the client, causing it to iterate over each ref; it
caches the packed refs in memory, which points at the
stale value A.
3. Meanwhile, a separate pack-refs process is running. It
runs to completion, updating the packed-refs file to
point master at B, and deleting $GIT_DIR/refs/heads/master
which also pointed at B.
4. Back in the receive-pack process, we get the
instruction to delete :newbranch. We take a lock on
packed-refs (which works, as the other pack-refs
process has already finished). We then rewrite the
contents using the cached refs, which contain the stale
value A.
The resulting packed-refs file points master once again at
A. The loose ref which would override it to point at B was
deleted (rightfully) in step 3. As a result, master now
points at A. The only trace that B ever existed in the
parent is in the reflog: the final entry will show master
moving from A to B, even though the ref still points at A
(so you can detect this race after the fact, because the
next reflog entry will move from A to C).
We can fix this by invalidating the packed-refs cache after
we have taken the lock. This means that we will re-read the
packed-refs file, and since we have the lock, we will be
sure that what we read will be atomically up-to-date when we
write (it may be out of date with respect to loose refs, but
that is OK, as loose refs take precedence).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"update-ref -d --deref SYM" to delete a ref through a symbolic ref
that points to it did not remove it correctly.
* jh/update-ref-d-through-symref:
Fix failure to delete a packed ref through a symref
t1400-update-ref: Add test verifying bug with symrefs in delete_ref()
When "update-ref -d --no-deref SYM" tried to delete a symbolic ref
SYM, it incorrectly locked the underlying reference pointed by SYM,
not the symbolic ref itself.
* rs/lock-correct-ref-during-delete:
refs: lock symref that is to be deleted, not its target
When deleting a ref through a symref (e.g. using 'git update-ref -d HEAD'
to delete refs/heads/master), we would remove the loose ref, but a packed
version of the same ref would remain, the end result being that instead of
deleting refs/heads/master we would appear to reset it to its state as of
the last repack.
This patch fixes the issue, by making sure we pass the correct ref name
when invoking repack_without_ref() from within delete_ref().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When delete_ref is called on a symref then it locks its target and then
either deletes the target or the symref, depending on whether the flag
REF_NODEREF was set in the parameter delopt.
Instead, simply pass the flag to lock_ref_sha1_basic, which will then
either lock the target or the symref, and delete the locked ref.
This reimplements part of eca35a25 (Fix git branch -m for symrefs.).
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>