Commit Graph

141 Commits

Author SHA1 Message Date
Junio C Hamano
69c54c7284 Merge branch 'ma/leakplugs'
Memory leaks in various codepaths have been plugged.

* ma/leakplugs:
  pack-bitmap[-write]: use `object_array_clear()`, don't leak
  object_array: add and use `object_array_pop()`
  object_array: use `object_array_clear()`, not `free()`
  leak_pending: use `object_array_clear()`, not `free()`
  commit: fix memory leak in `reduce_heads()`
  builtin/commit: fix memory leak in `prepare_index()`
2017-09-29 11:23:43 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Martin Ågren
7199203937 object_array: add and use object_array_pop()
In a couple of places, we pop objects off an object array `foo` by
decreasing `foo.nr`. We access `foo.nr` in many places, but most if not
all other times we do so read-only, e.g., as we iterate over the array.
But when we change `foo.nr` behind the array's back, it feels a bit
nasty and looks like it might leak memory.

Leaks happen if the popped element has an allocated `name` or `path`.
At the moment, that is not the case. Still, 1) the object array might
gain more fields that want to be freed, 2) a code path where we pop
might start using names or paths, 3) one of these code paths might be
copied to somewhere where we do, and 4) using a dedicated function for
popping is conceptually cleaner.

Introduce and use `object_array_pop()` instead. Release memory in the
new function. Document that popping an object leaves the associated
elements in limbo.

The converted places were identified by grepping for "\.nr\>" and
looking for "--".

Make the new function return NULL on an empty array. This is consistent
with `pop_commit()` and allows the following:

	while ((o = object_array_pop(&foo)) != NULL) {
		// do something
	}

But as noted above, we don't need to go out of our way to avoid reading
`foo.nr`. This is probably more readable:

	while (foo.nr) {
		... o = object_array_pop(&foo);
		// do something
	}

The name of `object_array_pop()` does not quite align with
`add_object_array()`. That is unfortunate. On the other hand, it matches
`object_array_clear()`. Arguably it's `add_...` that is the odd one out,
since it reads like it's used to "add" an "object array". For that
reason, side with `object_array_clear()`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-24 10:06:04 +09:00
Jeff King
06f46f237a avoid "write_in_full(fd, buf, len) != len" pattern
The return value of write_in_full() is either "-1", or the
requested number of bytes[1]. If we make a partial write
before seeing an error, we still return -1, not a partial
value. This goes back to f6aa66cb95 (write_in_full: really
write in full or return error on disk full., 2007-01-11).

So checking anything except "was the return value negative"
is pointless. And there are a couple of reasons not to do
so:

  1. It can do a funny signed/unsigned comparison. If your
     "len" is signed (e.g., a size_t) then the compiler will
     promote the "-1" to its unsigned variant.

     This works out for "!= len" (unless you really were
     trying to write the maximum size_t bytes), but is a
     bug if you check "< len" (an example of which was fixed
     recently in config.c).

     We should avoid promoting the mental model that you
     need to check the length at all, so that new sites are
     not tempted to copy us.

  2. Checking for a negative value is shorter to type,
     especially when the length is an expression.

  3. Linus says so. In d34cf19b89 (Clean up write_in_full()
     users, 2007-01-11), right after the write_in_full()
     semantics were changed, he wrote:

       I really wish every "write_in_full()" user would just
       check against "<0" now, but this fixes the nasty and
       stupid ones.

     Appeals to authority aside, this makes it clear that
     writing it this way does not have an intentional
     benefit. It's a historical curiosity that we never
     bothered to clean up (and which was undoubtedly
     cargo-culted into new sites).

So let's convert these obviously-correct cases (this
includes write_str_in_full(), which is just a wrapper for
write_in_full()).

[1] A careful reader may notice there is one way that
    write_in_full() can return a different value. If we ask
    write() to write N bytes and get a return value that is
    _larger_ than N, we could return a larger total. But
    besides the fact that this would imply a totally broken
    version of write(), it would already invoke undefined
    behavior. Our internal remaining counter is an unsigned
    size_t, which means that subtracting too many byte will
    wrap it around to a very large number. So we'll instantly
    begin reading off the end of the buffer, trying to write
    gigabytes (or petabytes) of data.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:17:59 +09:00
Jeff King
076aa2cbda tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&t, ...);
  ...
  if (!err)
          rename_tempfile(&t, ...);
  else
          delete_tempfile(&t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
	struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:54 +09:00
Jeff King
49bd0fc222 tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:

  if (close_tempfile(tempfile))
	return error_errno("error closing %s", tempfile->filename.buf);

wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).

Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:

  if (write(...) || close_tempfile(...)) {
	delete_tempfile(...);
	return -1;
  }

already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.

Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Jeff King
45c6b1ed24 always check return value of close_tempfile
If close_tempfile() encounters an error, then it deletes the
tempfile and resets the "struct tempfile". But many code
paths ignore the return value and continue to use the
tempfile. Instead, we should generally treat this the same
as a write() error.

Note that in the postimage of some of these cases our error
message will be bogus after a failed close because we look
at tempfile->filename (either directly or via get_tempfile_path).
But after the failed close resets the tempfile object, this
is guaranteed to be the empty string. That will be addressed
in a future patch (because there are many more cases of the
same problem than just these instances).

Note also in the hunk in gpg-interface.c that it's fine to
call delete_tempfile() in the error path, even if
close_tempfile() failed and already deleted the file. The
tempfile code is smart enough to know the second deletion is
a noop.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Jeff King
c0e963b77c setup_temporary_shallow: move tempfile struct into function
The setup_temporary_shallow() function creates a temporary
file, but we never access the tempfile struct outside of the
function. This is OK, since it means we'll just clean up the
tempfile on exit.  But we can simplify the code a bit by
moving the global tempfile struct to the only function in
which it's used.

Note that it must remain "static" due to tempfile.c's
requirement that tempfile storage never goes away until
program exit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:52 +09:00
Jeff King
0899013993 setup_temporary_shallow: avoid using inactive tempfile
When there are no shallow entries to write, we skip creating
the tempfile entirely and try to return the empty string.

But we do so by calling get_tempfile_path() on the inactive
tempfile object. This will trigger an assertion that kills
the program. The bug was introduced by 6e122b449b
(setup_temporary_shallow(): use tempfile module,
2015-08-10). But nobody seems to have noticed since then
because we do not end up calling this function at all when
there are no shallow items. In other words, this code path
is completely unexercised.

Since the tempfile object is a static global, it _is_
possible that we call the function twice, writing out
shallow info the first time and then "reusing" our tempfile
object the second time. But:

  1. It seems unlikely that this was the intent, as hitting
     this code path would imply somebody clearing the
     shallow_info list between calls.

     And if somebody _did_ call the function multiple times
     without clearing the shallow_info list, we'd hit a
     different BUG for trying to reuse an already-active
     tempfile.

  2. I verified by code inspection that the function is only
     called once per program. And also replacing this code
     with a BUG() and running the test suite demonstrates
     that it is not triggered there.

So we could probably just replace this with an assertion and
confirm that it's never called. However, the original intent
does seem to be that you _could_ call it when the
shallow_info is empty. And that's easy enough to do; since
the return value doesn't need to point to a writable buffer,
we can just return a string literal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:52 +09:00
Stefan Beller
8b65a34c4a commit: convert lookup_commit_graft to struct object_id
With this patch, commit.h doesn't contain the string 'sha1' any more.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13 12:02:40 -07:00
Junio C Hamano
f056cde60e Merge branch 'rs/use-div-round-up'
Code cleanup.

* rs/use-div-round-up:
  use DIV_ROUND_UP
2017-07-12 15:18:23 -07:00
René Scharfe
42c78a216e use DIV_ROUND_UP
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10 14:24:36 -07:00
Junio C Hamano
849e671b52 Merge branch 'js/plug-leaks'
Fix memory leaks pointed out by Coverity (and people).

* js/plug-leaks: (26 commits)
  checkout: fix memory leak
  submodule_uses_worktrees(): plug memory leak
  show_worktree(): plug memory leak
  name-rev: avoid leaking memory in the `deref` case
  remote: plug memory leak in match_explicit()
  add_reflog_for_walk: avoid memory leak
  shallow: avoid memory leak
  line-log: avoid memory leak
  receive-pack: plug memory leak in update()
  fast-export: avoid leaking memory in handle_tag()
  mktree: plug memory leaks reported by Coverity
  pack-redundant: plug memory leak
  setup_discovered_git_dir(): plug memory leak
  setup_bare_git_dir(): help static analysis
  split_commit_in_progress(): simplify & fix memory leak
  checkout: fix memory leak
  cat-file: fix memory leak
  mailinfo & mailsplit: check for EOF while parsing
  status: close file descriptor after reading git-rebase-todo
  difftool: address a couple of resource/memory leaks
  ...
2017-05-29 12:34:44 +09:00
brian m. carlson
bc83266abe Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.

Introduce a temporary in parse_object buffer in order to convert this
function.  This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted.  Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.

parse_object_buffer will lose this temporary in a later patch.

This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:

@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)

@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)

@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)

@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)

@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)

@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)

@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)

@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson
1e43ed9867 Convert remaining callers of lookup_commit_reference* to object_id
There are a small number of remaining callers of lookup_commit_reference
and lookup_commit_reference_gently that still need to be converted to
struct object_id.  Convert these.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson
e92b848cb6 shallow: convert shallow registration functions to object_id
Convert register_shallow and unregister_shallow to take struct
object_id.  register_shallow is a caller of lookup_commit, which we will
convert later.  It doesn't make sense for the registration and
unregistration functions to have incompatible interfaces, so convert
them both.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
Johannes Schindelin
7c565a6b2d shallow: avoid memory leak
Reported by Coverity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 12:18:20 +09:00
brian m. carlson
910650d2f8 Rename sha1_array to oid_array
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:56 -07:00
brian m. carlson
ee3051bd23 sha1-array: convert internal storage for struct sha1_array to object_id
Make the internal storage for struct sha1_array use an array of struct
object_id internally.  Update the users of this struct which inspect its
internals.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:34 -07:00
Junio C Hamano
3c9979be9b Merge branch 'nd/shallow-fixup'
Code cleanup in shallow boundary computation.

* nd/shallow-fixup:
  shallow.c: remove useless code
  shallow.c: bit manipulation tweaks
  shallow.c: avoid theoretical pointer wrap-around
  shallow.c: make paint_alloc slightly more robust
  shallow.c: stop abusing COMMIT_SLAB_SIZE for paint_info's memory pools
  shallow.c: rename fields in paint_info to better express their purposes
2016-12-21 14:55:02 -08:00
Nguyễn Thái Ngọc Duy
649b0c316a shallow.c: remove useless code
Some context before we talk about the removed code.

This paint_down() is part of step 6 of 58babff (shallow.c: the 8 steps
to select new commits for .git/shallow - 2013-12-05). When we fetch from
a shallow repository, we need to know if one of the new/updated refs
needs new "shallow commits" in .git/shallow (because we don't have
enough history of those refs) and which one.

The question at step 6 is, what (new) shallow commits are required in
other to maintain reachability throughout the repository _without_
cutting our history short? To answer, we mark all commits reachable from
existing refs with UNINTERESTING ("rev-list --not --all"), mark shallow
commits with BOTTOM, then for each new/updated refs, walk through the
commit graph until we either hit UNINTERESTING or BOTTOM, marking the
ref on the commit as we walk.

After all the walking is done, we check the new shallow commits. If we
have not seen any new ref marked on a new shallow commit, we know all
new/updated refs are reachable using just our history and .git/shallow.
The shallow commit in question is not needed and can be thrown away.

So, the code.

The loop here (to walk through commits) is basically

1.  get one commit from the queue
2.  ignore if it's SEEN or UNINTERESTING
3.  mark it
4.  go through all the parents and..
5a. mark it if it's never marked before
5b. put it back in the queue

What we do in this patch is drop step 5a because it is not
necessary. The commit being marked at 5a is put back on the queue, and
will be marked at step 3 at the next iteration. The only case it will
not be marked is when the commit is already marked UNINTERESTING (5a
does not check this), which will be ignored at step 2.

But we don't care about refs marking on UNINTERESTING. We care about the
marking on _shallow commits_ that are not reachable from our current
history (and having UNINTERESTING on it means it's reachable). So it's
ok for an UNINTERESTING not to be ref-marked.

Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Rasmus Villemoes
1127b3ced5 shallow.c: bit manipulation tweaks
First of all, 1 << 31 is technically undefined behaviour, so let's just
use an unsigned literal.

If i is 'signed int' and gcc doesn't know that i is positive, gcc
generates code to compute the C99-mandated values of "i / 32" and "i %
32", which is a lot more complicated than simple a simple shifts/mask.

The only caller of paint_down actually passes an "unsigned int" value,
but the prototype of paint_down causes (completely well-defined)
conversion to signed int, and gcc has no way of knowing that the
converted value is non-negative. Just make the id parameter unsigned.

In update_refstatus, the change in generated code is much smaller,
presumably because gcc is smart enough to see that i starts as 0 and is
only incremented, so it is allowed (per the UD of signed overflow) to
assume that i is always non-negative. But let's just help less smart
compilers generate good code anyway.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Rasmus Villemoes
381aa8e730 shallow.c: avoid theoretical pointer wrap-around
The expression info->free+size is technically undefined behaviour in
exactly the case we want to test for. Moreover, the compiler is likely
to translate the expression to

  (unsigned long)info->free + size > (unsigned long)info->end

where there's at least a theoretical chance that the LHS could wrap
around 0, giving a false negative.

This might as well be written using pointer subtraction avoiding these
issues.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Nguyễn Thái Ngọc Duy
f2386c6b77 shallow.c: make paint_alloc slightly more robust
paint_alloc() allocates a big block of memory and splits it into
smaller, fixed size, chunks of memory whenever it's called. Each chunk
contains enough bits to present all "new refs" [1] in a fetch from a
shallow repository.

We do not check if the new "big block" is smaller than the requested
memory chunk though. If it happens, we'll happily pass back a memory
region smaller than expected. Which will lead to problems eventually.

A normal fetch may add/update a dozen new refs. Let's stay on the
"reasonably extreme" side and say we need 16k refs (or bits from
paint_alloc's perspective). Each chunk of memory would be 2k, much
smaller than the memory pool (512k).

So, normally, the under-allocation situation should never happen. A bad
guy, however, could make a fetch that adds more than 4m new/updated refs
to this code which results in a memory chunk larger than pool size.
Check this case and abort.

Noticed-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Reviewed-by: Jeff King <peff@peff.net>

[1] Details are in commit message of 58babff (shallow.c: the 8 steps to
    select new commits for .git/shallow - 2013-12-05), step 6.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Nguyễn Thái Ngọc Duy
6bc3d8c5ec shallow.c: stop abusing COMMIT_SLAB_SIZE for paint_info's memory pools
We need to allocate a "big" block of memory in paint_alloc(). The exact
size does not really matter. But the pool size has no relation with
commit-slab. Stop using that macro here.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Nguyễn Thái Ngọc Duy
0afd307ab4 shallow.c: rename fields in paint_info to better express their purposes
paint_alloc() is basically malloc(), tuned for allocating a fixed number
of bits on every call without worrying about freeing any individual
allocation since all will be freed at the end. It does it by allocating
a big block of memory every time it runs out of "free memory". "slab" is
a poor choice of name, at least poorer than "pool".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 15:44:31 -08:00
Junio C Hamano
dbaa6bdce2 Merge branch 'ls/filter-process'
The smudge/clean filter API expect an external process is spawned
to filter the contents for each path that has a filter defined.  A
new type of "process" filter API has been added to allow the first
request to run the filter for a path to spawn a single process, and
all filtering need is served by this single process for multiple
paths, reducing the process creation overhead.

* ls/filter-process:
  contrib/long-running-filter: add long running filter example
  convert: add filter.<driver>.process option
  convert: prepare filter.<driver>.process option
  convert: make apply_filter() adhere to standard Git error handling
  pkt-line: add functions to read/write flush terminated packet streams
  pkt-line: add packet_write_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: extract set_packet_header()
  pkt-line: rename packet_write() to packet_write_fmt()
  run-command: add clean_on_exit_handler
  run-command: move check_pipe() from write_or_die to run_command
  convert: modernize tests
  convert: quote filter names in error messages
2016-10-31 13:15:21 -07:00
Lars Schneider
81c634e94f pkt-line: rename packet_write() to packet_write_fmt()
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.

packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Junio C Hamano
a460ea4a3c Merge branch 'nd/shallow-deepen'
The existing "git fetch --depth=<n>" option was hard to use
correctly when making the history of an existing shallow clone
deeper.  A new option, "--deepen=<n>", has been added to make this
easier to use.  "git clone" also learned "--shallow-since=<date>"
and "--shallow-exclude=<tag>" options to make it easier to specify
"I am interested only in the recent N months worth of history" and
"Give me only the history since that version".

* nd/shallow-deepen: (27 commits)
  fetch, upload-pack: --deepen=N extends shallow boundary by N commits
  upload-pack: add get_reachable_list()
  upload-pack: split check_unreachable() in two, prep for get_reachable_list()
  t5500, t5539: tests for shallow depth excluding a ref
  clone: define shallow clone boundary with --shallow-exclude
  fetch: define shallow boundary with --shallow-exclude
  upload-pack: support define shallow boundary by excluding revisions
  refs: add expand_ref()
  t5500, t5539: tests for shallow depth since a specific date
  clone: define shallow clone boundary based on time with --shallow-since
  fetch: define shallow boundary with --shallow-since
  upload-pack: add deepen-since to cut shallow repos based on time
  shallow.c: implement a generic shallow boundary finder based on rev-list
  fetch-pack: use a separate flag for fetch in deepening mode
  fetch-pack.c: mark strings for translating
  fetch-pack: use a common function for verbose printing
  fetch-pack: use skip_prefix() instead of starts_with()
  upload-pack: move rev-list code out of check_non_tip()
  upload-pack: make check_non_tip() clean things up on error
  upload-pack: tighten number parsing at "deepen" lines
  ...
2016-10-10 14:03:50 -07:00
René Scharfe
50492f7b38 pass constants as first argument to st_mult()
The result of st_mult() is the same no matter the order of its
arguments.  It invokes the macro unsigned_mult_overflows(), which
divides the second parameter by the first one.  Pass constants
first to allow that division to be done already at compile time.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 14:01:03 -07:00
Nguyễn Thái Ngọc Duy
3d9ff4d736 shallow.c: implement a generic shallow boundary finder based on rev-list
Instead of a custom commit walker like get_shallow_commits(), this new
function uses rev-list to mark NOT_SHALLOW to all reachable commits,
except borders. The definition of reachable is to be defined by the
protocol later. This makes it more flexible to define shallow boundary.

The way we find border is paint all reachable commits NOT_SHALLOW.  Any
of them that "touches" commits without NOT_SHALLOW flag are considered
shallow (e.g. zero parents via grafting mechanism). Shallow commits and
their true parents are all marked SHALLOW. Then NOT_SHALLOW is removed
from shallow commits at the end.

There is an interesting observation. With a generic walker, we can
produce all kinds of shallow cutting. In the following graph, every
commit but "x" is reachable. "b" is a parent of "a".

           x -- a -- o
          /    /
    x -- c -- b -- o

After this function is run, "a" and "c" are both considered shallow
commits. After grafting occurs at the client side, what we see is

                a -- o
                    /
         c -- b -- o

Notice that because of grafting, "a" has zero parents, so "b" is no
longer a parent of "a".

This is unfortunate and may be solved in two ways. The first is change
the way shallow grafting works and keep "a -- b" connection if "b"
exists and always ends at shallow commits (iow, no loose ends). This is
hard to detect, or at least not cheap to do.

The second way is mark one "x" as shallow commit instead of "a" and
produce this graph at client side:

           x -- a -- o
               /    /
         c -- b -- o

More commits, but simpler grafting rules.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Jeff King
50a6c8efa2 use st_add and st_mult for allocation size computation
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King
b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
brian m. carlson
ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Junio C Hamano
0692a6c22c Merge branch 'rs/pop-commit'
Code simplification.

* rs/pop-commit:
  use pop_commit() for consuming the first entry of a struct commit_list
2015-10-30 13:07:03 -07:00
Junio C Hamano
433cc7e3fb Merge branch 'tk/sigchain-unnecessary-post-tempfile'
Remove no-longer used #include.

* tk/sigchain-unnecessary-post-tempfile:
  shallow: remove unused #include "sigchain.h"
  read-cache: remove unused #include "sigchain.h"
  diff: remove unused #include "sigchain.h"
  credential-cache--daemon: remove unused #include "sigchain.h"
2015-10-29 13:59:18 -07:00
René Scharfe
e510ab8988 use pop_commit() for consuming the first entry of a struct commit_list
Instead of open-coding the function pop_commit() just call it.  This
makes the intent clearer and reduces code size.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26 14:06:46 -07:00
Tobias Klauser
4b7c50a184 shallow: remove unused #include "sigchain.h"
After switching to use the tempfile module in commit 6e122b44
(setup_temporary_shallow(): use tempfile module), no declarations from
sigchain.h are used in read-cache.c anymore. Thus, remove the #include.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-22 11:12:38 -07:00
Junio C Hamano
db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Jeff King
f932729cc7 memoize common git-path "constant" files
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:

  1. The return value is a static buffer, and the lifetime
     is dependent on other calls to git_path, etc.

  2. There's no compile-time checking of the pathname. This
     is OK for a one-off (after all, we have to spell it
     correctly at least once), but many of these constant
     strings appear throughout the code.

This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls.  cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.

Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.

Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.

Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:37:14 -07:00
Michael Haggerty
6e122b449b setup_temporary_shallow(): use tempfile module
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Michael Haggerty
b4fb09e4da lockfile: add accessor get_lock_file_path()
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 12:57:14 -07:00
Michael Haggerty
580b04ef98 shallow: rewrite functions to take object_id arguments
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:36 -07:00
Michael Haggerty
2b2a5be394 each_ref_fn: change to take an object_id parameter
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".

To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().

This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:27 -07:00
Junio C Hamano
a916cb5fb4 Merge branch 'bc/object-id'
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.

* bc/object-id:
  apply: convert threeway_stage to object_id
  patch-id: convert to use struct object_id
  commit: convert parts to struct object_id
  diff: convert struct combine_diff_path to object_id
  bulk-checkin.c: convert to use struct object_id
  zip: use GIT_SHA1_HEXSZ for trailers
  archive.c: convert to use struct object_id
  bisect.c: convert leaf functions to use struct object_id
  define utility functions for object IDs
  define a structure for object IDs
2015-05-05 21:00:23 -07:00
brian m. carlson
7683e2e6e3 commit: convert parts to struct object_id
Convert struct commit_graft and necessary local parts of commit.c.
Also, convert several constants based on the hex length of an SHA-1 to
use GIT_SHA1_HEXSZ, and move several magic constants into variables for
readability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:43:13 -07:00
Junio C Hamano
faf723a631 Merge branch 'jk/blame-commit-label' into maint
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-02-24 22:09:54 -08:00
Junio C Hamano
c985aaf879 Merge branch 'jc/unused-symbols'
Mark file-local symbols as "static", and drop functions that nobody
uses.

* jc/unused-symbols:
  shallow.c: make check_shallow_file_for_update() static
  remote.c: make clear_cas_option() static
  urlmatch.c: make match_urls() static
  revision.c: make save_parents() and free_saved_parents() static
  line-log.c: make line_log_data_init() static
  pack-bitmap.c: make pack_bitmap_filename() static
  prompt.c: remove git_getpass() nobody uses
  http.c: make finish_active_slot() and handle_curl_result() static
2015-02-11 13:44:07 -08:00
Junio C Hamano
092c4be7f5 Merge branch 'jk/blame-commit-label'
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".

* jk/blame-commit-label:
  blame.c: fix garbled error message
  use xstrdup_or_null to replace ternary conditionals
  builtin/commit.c: use xstrdup_or_null instead of envdup
  builtin/apply.c: use xstrdup_or_null instead of null_strdup
  git-compat-util: add xstrdup_or_null helper
2015-02-11 13:39:50 -08:00
Junio C Hamano
167832c2ca shallow.c: make check_shallow_file_for_update() static
No external callers exist.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-15 11:05:48 -08:00
Jeff King
8c53f0719b use xstrdup_or_null to replace ternary conditionals
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x).
The change is fairly mechanical, with the exception of
resolve_refdup, which can eliminate a temporary variable.

There are still a few hits grepping for "?.*xstrdup", but
these are of slightly different forms and cannot be
converted (e.g., "x ? xstrdup(x->foo) : NULL").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 10:05:48 -08:00
Junio C Hamano
e4da4fbe0e Merge branch 'eb/no-pthreads'
Allow us build with NO_PTHREADS=NoThanks compilation option.

* eb/no-pthreads:
  Handle atexit list internaly for unthreaded builds
  pack-objects: set number of threads before checking and warning
  index-pack: fix compilation with NO_PTHREADS
2014-10-24 14:59:10 -07:00
Etienne Buira
0f4b6db3ba Handle atexit list internaly for unthreaded builds
Wrap atexit()s calls on unthreaded builds to handle callback list
internally.

This is needed because on unthreaded builds, asyncs inherits parent's
atexit() list, that gets run as soon as the async exit()s (and again at
the end of async's parent process). That led to remove temporary files
too early.

Also remove a by-atexit-callback guard against this kind of issue in
clone.c, as this patch makes it redundant.

Fixes test 5537 (temporary shallow file vanished before unpack-objects
could open it)

BTW remove an unused variable in shallow.c.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Andreas Schwab <schwab@linux-m68k.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:38:30 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Michael Haggerty
cf6950d3bf lockfile: change lock_file::filename into a strbuf
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value.  (That will be fixed in a moment.)

Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file.  But lock_file objects are often reused.  By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:50:01 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Karsten Blees
6aa3085702 trace: improve trace performance
The trace API currently rechecks the environment variable and reopens the
trace file on every API call. This has the ugly side effect that errors
(e.g. file cannot be opened, or the user specified a relative path) are
also reported on every call. Performance can be improved by about factor
three by remembering the environment state and keeping the file open.

Replace the 'const char *key' parameter in the API with a pointer to a
'struct trace_key' that bundles the environment variable name with
additional, trace-internal state. Change the call sites of these APIs to
use a static 'struct trace_key' instead of a string constant.

In trace.c::get_trace_fd(), save and reuse the file descriptor in 'struct
trace_key'.

Add a 'trace_disable()' API, so that packet_trace() can cleanly disable
tracing when it encounters packed data (instead of using unsetenv()).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 21:24:23 -07:00
Jeff King
7839632167 shallow: verify shallow file after taking lock
Before writing the shallow file, we stat() the existing file
to make sure it has not been updated since our operation
began. However, we do not do so under a lock, so there is a
possible race:

  1. Process A takes the lock.

  2. Process B calls check_shallow_file_for_update and finds
     no update.

  3. Process A commits the lockfile.

  4. Process B takes the lock, then overwrite's process A's
     changes.

We can fix this by doing our check while we hold the lock.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-17 15:03:32 -07:00
Jeff King
0179c945fc shallow: automatically clean up shallow tempfiles
We sometimes write tempfiles of the form "shallow_XXXXXX"
during fetch/push operations with shallow repositories.
Under normal circumstances, we clean up the result when we
are done. However, we do no take steps to clean up after
ourselves when we exit due to die() or signal death.

This patch teaches the tempfile creation code to register
handlers to clean up after ourselves. To handle this, we
change the ownership semantics of the filename returned by
setup_temporary_shallow. It now keeps a copy of the filename
itself, and returns only a const pointer to it.

We can also do away with explicit tempfile removal in the
callers. They all exit not long after finishing with the
file, so they can rely on the auto-cleanup, simplifying the
code.

Note that we keep things simple and maintain only a single
filename to be cleaned. This is sufficient for the current
caller, but we future-proof it with a die("BUG").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27 12:07:13 -08:00
Jeff King
0cc77c386c shallow: use stat_validity to check for up-to-date file
When we are about to write the shallow file, we check that
it has not changed since we last read it. Instead of
hand-rolling this, we can use stat_validity. This is built
around the index stat-check, so it is more robust than just
checking the mtime, as we do now (it uses the same check as
we do for index files).

The new code also handles the case of a shallow file
appearing unexpectedly. With the current code, two
simultaneous processes making us shallow (e.g., two "git
fetch --depth=1" running at the same time in a non-shallow
repository) can race to overwrite each other.

As a bonus, we also remove a race in determining the stat
information of what we read (we stat and then open, leaving
a race window; instead we should open and then fstat the
descriptor).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27 12:04:23 -08:00
Junio C Hamano
92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Ramsay Jones
feefdf62c1 shallow: remove unused code
Commit 58babfff ("shallow.c: the 8 steps to select new commits for
.git/shallow", 05-12-2013) added a function to implement step 5 of
the quoted eight steps, namely 'remove_nonexistent_ours_in_pack()'.
This function implements an optional optimization step in the new
shallow commit selection algorithm. However, this function has no
callers. (The commented out call sites would need to change, in
order to provide information required by the function.)

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06 09:05:40 -08:00
Nguyễn Thái Ngọc Duy
eab3296c7e prune: clean .git/shallow after pruning objects
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:19 -08:00
Nguyễn Thái Ngọc Duy
0a1bc12b6e receive-pack: allow pushes that update .git/shallow
The basic 8 steps to update .git/shallow does not fully apply here
because the user may choose to accept just a few refs (while fetch
always accepts all refs). The steps are modified a bit.

1-6. same as before. After calling assign_shallow_commits_to_refs at
   step 6, each shallow commit has a bitmap that marks all refs that
   require it.

7. mark all "ours" shallow commits that are reachable from any
   refs. We will need to do the original step 7 on them later.

8. go over all shallow commit bitmaps, mark refs that require new
   shallow commits.

9. setup a strict temporary shallow file to plug all the holes, even
   if it may cut some of our history short. This file is used by all
   hooks. The hooks could use --shallow-file=$GIT_DIR/shallow to
   overcome this and reach everything in current repo.

10. go over the new refs one by one. For each ref, do the reachability
   test if it needs a shallow commit on the list from step 7. Remove
   it if it's reachable from our refs. Gather all required shallow
   commits, run check_everything_connected() with the new ref, then
   install them to .git/shallow.

This mode is disabled by default and can be turned on with
receive.shallowupdate

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:18 -08:00
Nguyễn Thái Ngọc Duy
069c053222 add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
This may be needed when a hook is run after a new shallow pack is
received, but .git/shallow is not settled yet. A temporary shallow
file to plug all loose ends should be used instead. GIT_SHALLOW_FILE
is overriden by --shallow-file.

--shallow-file does not work in this case because the hook may spawn
many git subprocesses and the launch commands do not have
--shallow-file as it's a recent addition.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:17 -08:00
Nguyễn Thái Ngọc Duy
79d3a236c5 upload-pack: make sure deepening preserves shallow roots
When "fetch --depth=N" where N exceeds the longest chain of history in
the source repo, usually we just send an "unshallow" line to the
client so full history is obtained.

When the source repo is shallow we need to make sure to "unshallow"
the current shallow point _and_ "shallow" again when the commit
reaches its shallow bottom in the source repo.

This should fix both cases: large <N> and --unshallow.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:17 -08:00
Nguyễn Thái Ngọc Duy
8e277383e0 shallow.c: steps 6 and 7 to select new commits for .git/shallow
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Nguyễn Thái Ngọc Duy
58babfffde shallow.c: the 8 steps to select new commits for .git/shallow
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.

The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.

Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)

So the steps to figure out what ref need what new shallow commits are:

1. Split the sender shallow commit list into "ours" and "theirs" list
   by has_sha1_file. Those that exist in current repo in "ours", the
   remaining in "theirs".

2. Check the receiver .git/shallow, remove from "ours" the ones that
   also exist in .git/shallow.

3. Fetch the new pack. Either install or unpack it.

4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
   has_sha1_file. Obviously the new pack does not need them.

5. If the pack is kept, remove from "ours" the ones that do not exist
   in the new pack.

6. Walk the new refs to answer the question "what shallow commits,
   both ours and theirs, are required in .git/shallow in order to add
   this ref?". Shallow commits not associated to any refs are removed
   from their respective list.

7. (*) Check reachability (from the current refs) of all remaining
   commits in "ours". Those reachable are removed. We do not want to
   cut any part of our (reachable) history. We only check up
   commits. True reachability test is done by
   check_everything_connected() at the end as usual.

8. Combine the final "ours" and "theirs" and add them all to
   .git/shallow. Install new refs. The case where some hook rejects
   some refs on a push is explained in more detail in the push
   patches.

Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:

1) the sender has longer history than the receiver

   All shallow commits from the sender will be put into "theirs" list
   at step 1 because none of them exists in current repo. In the
   common case, "theirs" becomes empty at step 4 and exit early.

2) the sender has shorter history than the receiver

   All shallow commits from the sender are likely in "ours" list at
   step 1. In the common case, if the new pack is kept, we could empty
   "ours" and exit early at step 5.

   If the pack is not kept, we hit the expensive step 6 then exit
   after "ours" is emptied. There'll be only a handful of objects to
   walk in fast-forward case. If it's forced update, we may need to
   walk to the bottom.

3) the sender has same .git/shallow as the receiver

   This is similar to case 2 except that "ours" should be emptied at
   step 2 and exit early.

A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.

A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.

This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.

(*) in previous versions step 7 was put before step 3. I reorder it so
    that the common case that keeps the pack does not need to walk
    commits at all. In future if we implement faster commit
    reachability check (maybe with the help of pack bitmaps or commit
    cache), step 7 could become cheap and be moved up before 6 again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Nguyễn Thái Ngọc Duy
1a30f5a2f2 shallow.c: extend setup_*_shallow() to accept extra shallow commits
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Nguyễn Thái Ngọc Duy
ad491366de make the sender advertise shallow commits to the receiver
If either receive-pack or upload-pack is called on a shallow
repository, shallow commits (*) will be sent after the ref
advertisement (but before the packet flush), so that the receiver has
the full "shape" of the sender's commit graph. This will be needed for
the receiver to update its .git/shallow if necessary.

This breaks the protocol for all clients trying to push to a shallow
repo, or fetch from one. Which is basically the same end result as
today's "is_repository_shallow() && die()" in receive-pack and
upload-pack. New clients will be made aware of shallow upstream and
can make use of this information.

The sender must send all shallow commits that are sent in the
following pack. It may send more shallow commits than necessary.

upload-pack for example may choose to advertise no shallow commits if
it knows in advance that the pack it's going to send contains no
shallow commits. But upload-pack is the server, so we choose the
cheaper way, send full .git/shallow and let the client deal with it.

Smart HTTP is not affected by this patch. Shallow support on
smart-http comes later separately.

(*) A shallow commit is a commit that terminates the revision
    walker. It is usually put in .git/shallow in order to keep the
    revision walker from going out of bound because there is no
    guarantee that objects behind this commit is available.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Junio C Hamano
5bb62059f2 Merge branch 'jk/robustify-parse-commit'
* jk/robustify-parse-commit:
  checkout: do not die when leaving broken detached HEAD
  use parse_commit_or_die instead of custom message
  use parse_commit_or_die instead of segfaulting
  assume parse_commit checks for NULL commit
  assume parse_commit checks commit->object.parsed
  log_tree_diff: die when we fail to parse a commit
2013-12-05 12:54:01 -08:00
Jeff King
367068e0dd use parse_commit_or_die instead of custom message
Many calls to parse_commit detect errors and die. In some
cases, the custom error messages are more useful than what
parse_commit_or_die could produce, because they give some
context, like which ref the commit came from. Some, however,
just say "invalid commit". Let's convert the latter to use
parse_commit_or_die; its message is slightly more informative,
and it makes the error more consistent throughout git.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:43:51 -07:00
Nguyễn Thái Ngọc Duy
08ea65ad13 shallow: add setup_temporary_shallow()
This function is like setup_alternate_shallow() except that it does
not lock $GIT_DIR/shallow.  It is supposed to be used when a program
generates temporary shallow for use by another program, then throw
the shallow file away.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 11:51:54 -07:00
Nguyễn Thái Ngọc Duy
6a3bbb4db4 shallow: only add shallow graft points to new shallow file
for_each_commit_graft() goes through all graft points, and shallow
boundaries are just one special kind of grafting.

If $GIT_DIR/shallow and $GIT_DIR/info/grafts are both present,
write_shallow_commits() may catch both sets, accidentally turning
some graft points to shallow boundaries.  Don't do that.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 11:51:17 -07:00
Nguyễn Thái Ngọc Duy
3125fe528b move setup_alternate_shallow and write_shallow_commits to shallow.c
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-18 13:00:17 -07:00
Junio C Hamano
4c72ee838f Merge branch 'mk/upload-pack-off-by-one-dead-code-removal'
* mk/upload-pack-off-by-one-dead-code-removal:
  upload-pack: remove a piece of dead code
2013-07-22 11:24:05 -07:00
Matthijs Kooijman
4b796951ff upload-pack: remove a piece of dead code
Commit 682c7d2 (upload-pack: fix off-by-one depth calculation in shallow
clone) introduced a new check in get_shallow_commits to decide when to
stop traversing the history and mark the current commit as a shallow
root.

With this new check in place, the old check can no longer be true, since
the first check always fires first. This commit removes that check,
making the code a bit more simple again.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Acked-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:49:42 -07:00
Nguyễn Thái Ngọc Duy
6035d6aad8 fetch-pack: prepare updated shallow file before fetching the pack
index-pack --strict looks up and follows parent commits. If shallow
information is not ready by the time index-pack is run, index-pack may
be led to non-existent objects. Make fetch-pack save shallow file to
disk before invoking index-pack.

git learns new global option --shallow-file to pass on the alternate
shallow file path. Undocumented (and not even support --shallow-file=
syntax) because it's unlikely to be used again elsewhere.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 08:06:08 -07:00
Nguyễn Thái Ngọc Duy
682c7d2f1a upload-pack: fix off-by-one depth calculation in shallow clone
get_shallow_commits() is used to determine the cut points at a given
depth (i.e. the number of commits in a chain that the user likes to
get). However we count current depth up to the commit "commit" but we
do the cutting at its parents (i.e. current depth + 1). This makes
upload-pack always return one commit more than requested. This patch
fixes it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-11 09:10:57 -08:00
Thiago Farina
3cd474599f object.h: Add OBJECT_ARRAY_INIT macro and make use of it.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-29 22:42:49 -07:00
Junio C Hamano
ee4f06c0a6 Merge branch 'mk/maint-parse-careful'
* mk/maint-parse-careful:
  peel_onion: handle NULL
  check return value from parse_commit() in various functions
  parse_commit: don't fail, if object is NULL
  revision.c: handle tag->tagged == NULL
  reachable.c::process_tree/blob: check for NULL
  process_tag: handle tag->tagged == NULL
  check results of parse_commit in merge_bases
  list-objects.c::process_tree/blob: check for NULL
  reachable.c::add_one_tree: handle NULL from lookup_tree
  mark_blob/tree_uninteresting: check for NULL
  get_sha1_oneline: check return value of parse_object
  read_object_with_reference: don't read beyond the buffer
2008-02-18 20:56:01 -08:00
Martin Koegler
dec38c8165 check return value from parse_commit() in various functions
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18 20:49:13 -08:00
Martin Koegler
affeef12fb deref_tag: handle return value NULL
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-17 23:46:55 -08:00
Junio C Hamano
a6080a0a44 War on whitespace
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time.  There are a few files that need
to have trailing whitespaces (most notably, test vectors).  The results
still passes the test, and build result in Documentation/ area is unchanged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07 00:04:01 -07:00
Junio C Hamano
f43117a6c1 is_repository_shallow(): prototype fix.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-21 22:22:23 -08:00
Alexandre Julliard
d64d6c9fc7 get_shallow_commits: Avoid memory leak if a commit has been reached already.
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-24 15:42:50 -08:00
Alexandre Julliard
abef3a1625 Shallow clone: do not ignore shallowness when following tags
Tags should be considered when truncating the
commit list. The patch below fixes it, and fetches the right number of
commits for each tag. However the correct fix is probably to not fetch
historical tags at all.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-24 15:42:49 -08:00
Johannes Schindelin
f53514bc2d allow deepening of a shallow repository
Now, by saying "git fetch -depth <n> <repo>" you can deepen
a shallow repository.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-24 15:42:49 -08:00
Johannes Schindelin
ed09aef06f support fetching into a shallow repository
A shallow commit is a commit which has parents, which in turn are
"grafted away", i.e. the commit appears as if it were a root.

Since these shallow commits should not be edited by the user, but
only by core git, they are recorded in the file $GIT_DIR/shallow.

A repository containing shallow commits is called shallow.

The advantage of a shallow repository is that even if the upstream
contains lots of history, your local (shallow) repository needs not
occupy much disk space.

The disadvantage is that you might miss a merge base when pulling
some remote branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-11-24 15:42:49 -08:00