Memory leaks in various codepaths have been plugged.
* ma/leakplugs:
pack-bitmap[-write]: use `object_array_clear()`, don't leak
object_array: add and use `object_array_pop()`
object_array: use `object_array_clear()`, not `free()`
leak_pending: use `object_array_clear()`, not `free()`
commit: fix memory leak in `reduce_heads()`
builtin/commit: fix memory leak in `prepare_index()`
"git fast-export" with -M/-C option issued "copy" instruction on a
path that is simultaneously modified, which was incorrect.
* jt/fast-export-copy-modify-fix:
fast-export: do not copy from modified file
In a couple of places, we pop objects off an object array `foo` by
decreasing `foo.nr`. We access `foo.nr` in many places, but most if not
all other times we do so read-only, e.g., as we iterate over the array.
But when we change `foo.nr` behind the array's back, it feels a bit
nasty and looks like it might leak memory.
Leaks happen if the popped element has an allocated `name` or `path`.
At the moment, that is not the case. Still, 1) the object array might
gain more fields that want to be freed, 2) a code path where we pop
might start using names or paths, 3) one of these code paths might be
copied to somewhere where we do, and 4) using a dedicated function for
popping is conceptually cleaner.
Introduce and use `object_array_pop()` instead. Release memory in the
new function. Document that popping an object leaves the associated
elements in limbo.
The converted places were identified by grepping for "\.nr\>" and
looking for "--".
Make the new function return NULL on an empty array. This is consistent
with `pop_commit()` and allows the following:
while ((o = object_array_pop(&foo)) != NULL) {
// do something
}
But as noted above, we don't need to go out of our way to avoid reading
`foo.nr`. This is probably more readable:
while (foo.nr) {
... o = object_array_pop(&foo);
// do something
}
The name of `object_array_pop()` does not quite align with
`add_object_array()`. That is unfortunate. On the other hand, it matches
`object_array_clear()`. Arguably it's `add_...` that is the odd one out,
since it reads like it's used to "add" an "object array". For that
reason, side with `object_array_clear()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When run with the "-C" option, fast-export writes 'C' commands in its
output whenever the internal diff mechanism detects a file copy,
indicating that fast-import should copy the given existing file to the
given new filename. However, the diff mechanism works against the
prior version of the file, whereas fast-import uses whatever is current.
This causes issues when a commit both modifies a file and uses it as the
source for a copy.
Therefore, teach fast-export to refrain from writing 'C' when it has
already written a modification command for a file.
An existing test in t9350-fast-export is also fixed in this patch. The
existing line "C file6 file7" copies the wrong version of file6, but it
has coincidentally worked because file7 was subsequently overridden.
Reported-by: Juraj Oršulić <juraj.orsulic@fer.hr>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.
This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.
Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter. However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often try to open a file for reading whose existence is
optional, and silently ignore errors from open/fopen; report such
errors if they are not due to missing files.
* nd/fopen-errors:
mingw_fopen: report ENOENT for invalid file names
mingw: verify that paths are not mistaken for remote nicknames
log: fix memory leak in open_next_file()
rerere.c: move error_errno() closer to the source system call
print errno when reporting a system call error
wrapper.c: make warn_on_inaccessible() static
wrapper.c: add and use fopen_or_warn()
wrapper.c: add and use warn_on_fopen_errors()
config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
clone: use xfopen() instead of fopen()
use xfopen() in more places
git_fopen: fix a sparse 'not declared' warning
xfopen()
- provides error details
- explains error on reading, or writing, or whatever operation
- has l10n support
- prints file name in the error
Some of these are missing in the places that are replaced with xfopen(),
which is a clear win. In some other places, it's just less code (not as
clearly a win as the previous case but still is).
The only slight regresssion is in remote-testsvn, where we don't report
the file class (marks files) in the error messages anymore. But since
this is a _test_ svn remote transport, I'm not too concerned.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert lookup_blob to take a pointer to struct object_id.
The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:
@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)
@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reported by, you guessed it, Coverity.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to converting to struct object_id, write some hardcoded
buffer sizes in terms of GIT_SHA1_RAWSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead. The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:
@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash
@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths used fopen(3) when opening a fixed path in $GIT_DIR
(e.g. COMMIT_EDITMSG) that is meant to be left after the command is
done. This however did not work well if the repository is set to
be shared with core.sharedRepository and the umask of the previous
user is tighter. They have been made to work better by calling
unlink(2) and retrying after fopen(3) fails with EPERM.
* js/fopen-harder:
Handle more file writes correctly in shared repos
commit: allow editing the commit message even in shared repos
In shared repositories, we have to be careful when writing files whose
permissions do not allow users other than the owner to write them.
In particular, we force the marks file of fast-export and the FETCH_HEAD
when fetching to be rewritten from scratch.
This commit does not touch other calls to fopen() that want to
write files:
- commands that write to working tree files (core.sharedRepository
does not affect permission bits of working tree files),
e.g. .rej file created by "apply --reject", result of applying a
previous conflict resolution by "rerere", "git merge-file".
- git am, when splitting mails (git-am correctly cleans up its directory
after finishing, so there is no need to share those files between users)
- git submodule clone, when writing the .git file, because the file
will not be overwritten
- git_terminal_prompt() in compat/terminal.c, because it is not writing to
a file at all
- git diff --output, because the output file is clearly not intended to be
shared between the users of the current repository
- git fast-import, when writing a crash report, because the reports' file
names are unique due to an embedded process ID
- mailinfo() in mailinfo.c, because the output is clearly not intended to
be shared between the users of the current repository
- check_or_regenerate_marks() in remote-testsvn.c, because this is only
used for Git's internal testing
- git fsck, when writing lost&found blobs (this should probably be
changed, but left as a low-hanging fruit for future contributors).
Note that this patch does not touch callers of write_file() and
write_file_gently(), which would benefit from the same scrutiny as
to usage in shared repositories. Most notable users are branch,
daemon, submodule & worktree, and a worrisome call in transport.c
when updating one ref (which ignores the shared flag).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Some functions from the refs module were still declared in cache.h.
Move them to refs.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).
This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:
git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less
which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).
In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:
$ time git fast-export --anonymize --all \
--tag-of-filtered-object=drop >output
real 0m2.883s
user 0m2.828s
sys 0m0.052s
$ gzip output
$ ls -lh output.gz | awk '{print $5}'
2.9M
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.
And as a side effect, we abstract away the final bare uses
of commit->buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So that we can convert the exported ref names.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't want to pass arguments specific to fast-export to
setup_revisions.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
Some unchecked calls to parse_commit should obviously die on
error, because their next step is to start looking at the
parsed fields, which will cause a segfault. These are
obvious candidates for parse_commit_or_die, which will be a
strict improvement in behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert most uses of OPT_BOOLEAN/OPTION_BOOLEAN that can use
OPT_BOOL/OPTION_BOOLEAN which have much saner semantics, and turn
remaining ones into OPT_SET_INT, OPT_COUNTUP, etc. as necessary.
* sb/parseopt-boolean-removal:
revert: use the OPT_CMDMODE for parsing, reducing code
checkout-index: fix negations of even numbers of -n
config parsing options: allow one flag multiple times
hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP
branch, commit, name-rev: ease up boolean conditions
checkout: remove superfluous local variable
log, format-patch: parsing uses OPT__QUIET
Replace deprecated OPT_BOOLEAN by OPT_BOOL
Remove deprecated OPTION_BOOLEAN for parsing arguments
Split into a separate helper function get_commit() so that the part that
finds the relevant commit, and the part that does something with it
(handle tag object, etc.) are in different places.
No functional changes.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to pass it around everywhere. This would make easier
further refactoring that makes use of this variable.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's wrong to call get_sha1() if they should be SHA-1s, plus
inefficient.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need the parsed objects at this point, merely the
information that they have marks.
Seems to be three times faster in my setup with lots of objects.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We read from the marks file and keep only marked commits, but in
order to find the type of object, we are parsing the whole thing,
which is slow, specially in big repositories with lots of big files.
There's no need for that, we can query the object information with
sha1_object_info().
Before this, loading the objects of a fresh emacs import, with 260598
blobs took 14 minutes, after this patch, it takes 3 seconds.
This is the way fast-import does it. Also die if the object is not
found (like fast-import).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This issues a warning while stripping signatures from signed tags, which
allows us to use it as default behaviour for remote helpers which cannot
specify how to handle signed tags.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Correct common spelling mistakes in comments and tests
kwset: fix spelling in comments
precompose-utf8: fix spelling of "want" in error message
compat/nedmalloc: fix spelling in comments
compat/regex: fix spelling and grammar in comments
obstack: fix spelling of similar
contrib/subtree: fix spelling of accidentally
git-remote-mediawiki: spelling fixes
doc: various spelling fixes
fast-export: fix argument name in error messages
Documentation: distinguish between ref and offset deltas in pack-format
i18n: make the translation of -u advice in one go