There are no callers left of lookup_unknown_object() that aren't just
passing us the "hash" member of a "struct object_id". Let's take the
whole struct, which gets us closer to removing all raw sha1 variables.
It also matches the existing conversions of lookup_blob(), etc.
The conversions of callers were done by hand, but they're all mechanical
one-liners.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
init_skiplist() took a file consisting of SHA-1s and comments and added
the objects to an oidset. This functionality is useful for other
commands and will be moved to oidset.c in a future commit.
In preparation for that move, this commit renames it to
oidset_parse_file() to reflect its more generic usage and cleans up a
few of the names.
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update fsck.skipList implementation and documentation.
* ab/fsck-skiplist:
fsck: support comments & empty lines in skipList
fsck: use oidset instead of oid_array for skipList
fsck: use strbuf_getline() to read skiplist file
fsck: add a performance test for skipList
fsck: add a performance test
fsck: document that skipList input must be unabbreviated
fsck: document and test commented & empty line skipList input
fsck: document and test sorted skipList input
fsck tests: add a test for no skipList input
fsck tests: setup of bogus commit object
* maint-2.18:
Git 2.18.1
Git 2.17.2
fsck: detect submodule paths starting with dash
fsck: detect submodule urls starting with dash
Git 2.16.5
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
* maint-2.17:
Git 2.17.2
fsck: detect submodule paths starting with dash
fsck: detect submodule urls starting with dash
Git 2.16.5
Git 2.15.3
Git 2.14.5
submodule-config: ban submodule paths that start with a dash
submodule-config: ban submodule urls that start with dash
submodule--helper: use "--" to signal end of clone options
As with urls, submodule paths with dashes are ignored by
git, but may end up confusing older versions. Detecting them
via fsck lets us prevent modern versions of git from being a
vector to spread broken .gitmodules to older versions.
Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:
1. While such paths provide confusing and broken results,
they don't seem to actually work as option injections
against anything except "cd". In particular, the
submodule code seems to canonicalize to an absolute
path before running "git clone" (so it passes
/your/clone/-sub).
2. It's more likely that we may one day make such names
actually work correctly. Even after we revert this fsck
check, it will continue to be a hassle until hosting
servers are all updated.
On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).
So on balance, this is probably a good protection.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).
Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.
There is no notable performance impact from this change, using the
test setup described a couple of commits back:
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2%
1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8%
1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4%
1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3%
1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1%
1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3%
1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6%
1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1%
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.
This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project, and
it will be re-parsed when checking .gitmodules blobs, see
fb16287719 ("fsck: check skiplist for object in fsck_blob()",
2018-06-27).
Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.
The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:
Test HEAD~2 HEAD~ HEAD
----------------------------------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0%
1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7%
1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4%
1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3%
1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6%
1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0%
1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6%
1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6%
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The buffer is unlikely to contain a NUL character, so printing its
contents using %s in a die() format is unsafe (detected with ASan).
Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.
This fixes a bug where emitting an error about an invalid line on say
line 1 would continue printing subsequent lines, and usually continue
into uninitialized memory.
The performance impact of this, on a CentOS 7 box with RedHat GCC
4.8.5-28:
$ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh
Test HEAD~ HEAD
----------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.75(7.39+0.35) 7.68(7.29+0.39) -0.9%
1450.5: fsck with 1 skipped bad commits 7.70(7.30+0.40) 7.80(7.42+0.37) +1.3%
1450.7: fsck with 10 skipped bad commits 7.77(7.37+0.40) 7.87(7.47+0.40) +1.3%
1450.9: fsck with 100 skipped bad commits 7.82(7.41+0.40) 7.88(7.43+0.44) +0.8%
1450.11: fsck with 1000 skipped bad commits 7.88(7.49+0.39) 7.84(7.43+0.40) -0.5%
1450.13: fsck with 10000 skipped bad commits 8.02(7.63+0.39) 8.07(7.67+0.39) +0.6%
1450.15: fsck with 100000 skipped bad commits 8.01(7.60+0.41) 8.08(7.70+0.38) +0.9%
1450.17: fsck with 1000000 skipped bad commits 7.60(7.10+0.50) 7.37(7.18+0.19) -3.0%
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent "security fix" to pay attention to contents of ".gitmodules"
while accepting "git push" was a bit overly strict than necessary,
which has been adjusted.
* jk/fsck-gitmodules-gently:
fsck: downgrade gitmodulesParse default to "info"
fsck: split ".gitmodules too large" error from parse failure
fsck: silence stderr when parsing .gitmodules
config: add options parameter to git_config_from_mem
config: add CONFIG_ERROR_SILENT handler
config: turn die_on_error into caller-facing enum
"fsck.skipList" did not prevent a blob object listed there from
being inspected for is contents (e.g. we recently started to
inspect the contents of ".gitmodules" for certain malicious
patterns), which has been corrected.
* rj/submodule-fsck-skip:
fsck: check skiplist for object in fsck_blob()
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
We added an fsck check in ed8b10f631 (fsck: check
.gitmodules content, 2018-05-02) as a defense against the
vulnerability from 0383bbb901 (submodule-config: verify
submodule names as paths, 2018-04-30). With the idea that
up-to-date hosting sites could protect downstream unpatched
clients that fetch from them.
As part of that defense, we reject any ".gitmodules" entry
that is not syntactically valid. The theory is that if we
cannot even parse the file, we cannot accurately check it
for vulnerabilities. And anybody with a broken .gitmodules
file would eventually want to know anyway.
But there are a few reasons this is a bad tradeoff in
practice:
- for this particular vulnerability, the client has to be
able to parse the file. So you cannot sneak an attack
through using a broken file, assuming the config parsers
for the process running fsck and the eventual victim are
functionally equivalent.
- a broken .gitmodules file is not necessarily a problem.
Our fsck check detects .gitmodules in _any_ tree, not
just at the root. And the presence of a .gitmodules file
does not necessarily mean it will be used; you'd have to
also have gitlinks in the tree. The cgit repository, for
example, has a file named .gitmodules from a
pre-submodule attempt at sharing code, but does not
actually have any gitlinks.
- when the fsck check is used to reject a push, it's often
hard to work around. The pusher may not have full control
over the destination repository (e.g., if it's on a
hosting server, they may need to contact the hosting
site's support). And the broken .gitmodules may be too
far back in history for rewriting to be feasible (again,
this is an issue for cgit).
So we're being unnecessarily restrictive without actually
improving the security in a meaningful way. It would be more
convenient to downgrade this check to "info", which means
we'd still comment on it, but not reject a push. Site admins
can already do this via config, but we should ship sensible
defaults.
There are a few counterpoints to consider in favor of
keeping the check as an error:
- the first point above assumes that the config parsers for
the victim and the fsck process are equivalent. This is
pretty true now, but as time goes on will become less so.
Hosting sites are likely to upgrade their version of Git,
whereas vulnerable clients will be stagnant (if they did
upgrade, they'd cease to be vulnerable!). So in theory we
may see drift over time between what two config parsers
will accept.
In practice, this is probably OK. The config format is
pretty established at this point and shouldn't change a
lot. And the farther we get from the announcement of the
vulnerability, the less interesting this extra layer of
protection becomes. I.e., it was _most_ valuable on day
0, when everybody's client was still vulnerable and
hosting sites could protect people. But as time goes on
and people upgrade, the population of vulnerable clients
becomes smaller and smaller.
- In theory this could protect us from other
vulnerabilities in the future. E.g., .gitmodules are the
only way for a malicious repository to feed data to the
config parser, so this check could similarly protect
clients from a future (to-be-found) bug there.
But that's trading a hypothetical case for real-world
pain today. If we do find such a bug, the hosting site
would need to be updated to fix it, too. At which point
we could figure out whether it's possible to detect
_just_ the malicious case without hurting existing
broken-but-not-evil cases.
- Until recently, we hadn't made any restrictions on
.gitmodules content. So now in tightening that we're
hitting cases where certain things used to work, but
don't anymore. There's some moderate pain now. But as
time goes on, we'll see more (and more varied) cases that
will make tightening harder in the future. So there's
some argument for putting rules in place _now_, before
users grow more cases that violate them.
Again, this is trading pain now for hypothetical benefit
in the future. And if we try hard in the future to keep
our tightening to a minimum (i.e., rejecting true
maliciousness without hurting broken-but-not-evil repos),
then that reduces even the hypothetical benefit.
Considering both sets of arguments, it makes sense to loosen
this check for now.
Note that we have to tweak the test in t7415 since fsck will
no longer consider this a fatal error. But we still check
that it reports the warning, and that we don't get the
spurious error from the config code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since ed8b10f631 (fsck: check .gitmodules content,
2018-05-02), we'll report a gitmodulesParse error for two
conditions:
- a .gitmodules entry is not syntactically valid
- a .gitmodules entry is larger than core.bigFileThreshold
with the intent that we can detect malicious files and
protect downstream clients. E.g., from the issue in
0383bbb901 (submodule-config: verify submodule names as
paths, 2018-04-30).
But these conditions are actually quite different with
respect to that bug:
- a syntactically invalid file cannot trigger the problem,
as the victim would barf before hitting the problematic
code
- a too-big .gitmodules _can_ trigger the problem. Even
though it is obviously silly to have a 500MB .gitmodules
file, the submodule code will happily parse it if you
have enough memory.
So it may be reasonable to configure their severity
separately. Let's add a new class for the "too large" case
to allow that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:
$ git fsck
Checking object directories: 100% (256/256), done.
error: bad config line 5 in blob .gitmodules
error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
Checking objects: 100% (6626/6626), done.
$
$ git config fsck.skiplist '.git/skip'
$ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
$
$ git fsck
Checking object directories: 100% (256/256), done.
error: bad config line 5 in blob .gitmodules
Checking objects: 100% (6626/6626), done.
$
Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.
One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there's a parsing error we'll already report it via the
usual fsck report() function (or not, if the user has asked
to skip this object or warning type). The error message from
the config parser just adds confusion. Let's suppress it.
Note that we didn't test this case at all, so I've added
coverage in t7415. We may end up toning down or removing
this fsck check in the future. So take this test as checking
what happens now with a focus on stderr, and not any
ironclad guarantee that we must detect and report parse
failures in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The underlying config parser knows how to handle a
config_options struct, but git_config_from_mem() always
passes NULL. Let's allow our callers to specify the options
struct.
We could add a "_with_options" variant, but since there are
only a handful of callers, let's just update them to pass
NULL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of lookup_blob
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continuing with the idea to programatically enumerate various
pieces of data required for command line completion, teach the
codebase to report the list of configuration variables
subcommands care about to help complete them.
* nd/complete-config-vars:
completion: complete general config vars in two steps
log-tree: allow to customize 'grafted' color
completion: support case-insensitive config vars
completion: keep other config var completion in camelCase
completion: drop the hard coded list of config vars
am: move advice.amWorkDir parsing back to advice.c
advice: keep config name in camelCase in advice_config[]
fsck: produce camelCase config key names
help: add --config to list all available config
fsck: factor out msg_id_info[] lazy initialization code
grep: keep all colors in an array
Add and use generic name->id mapping code for color slot parsing
Finishing touches to a topic that already is in 'maint'.
* jk/submodule-fsck-loose-fixup:
fsck: avoid looking at NULL blob->object
t7415: don't bother creating commit for symlink test
Commit 159e7b080b (fsck: detect gitmodules files,
2018-05-02) taught fsck to look at the content of
.gitmodules files. If the object turns out not to be a blob
at all, we just complain and punt on checking the content.
And since this was such an obvious and trivial code path, I
didn't even bother to add a test.
Except it _does_ do one non-trivial thing, which is call the
report() function, which wants us to pass a pointer to a
"struct object". Which we don't have (we have only a "struct
object_id"). So we erroneously pass a NULL object to
report(), which gets dereferenced and causes a segfault.
It seems like we could refactor report() to just take the
object_id itself. But we pass the object pointer along to
a callback function, and indeed this ends up in
builtin/fsck.c's objreport() which does want to look at
other parts of the object (like the type).
So instead, let's just use lookup_unknown_object() to get
the real "struct object", and pass that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (42 commits)
merge-one-file: compute empty blob object ID
add--interactive: compute the empty tree value
Update shell scripts to compute empty tree object ID
sha1_file: only expose empty object constants through git_hash_algo
dir: use the_hash_algo for empty blob object ID
sequencer: use the_hash_algo for empty tree object ID
cache-tree: use is_empty_tree_oid
sha1_file: convert cached object code to struct object_id
builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
submodule: convert several uses of EMPTY_TREE_SHA1_HEX
sequencer: convert one use of EMPTY_TREE_SHA1_HEX
merge: convert empty tree constant to the_hash_algo
builtin/merge: switch tree functions to use object_id
builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
sha1-file: add functions for hex empty tree and blob OIDs
builtin/receive-pack: avoid hard-coded constants for push certs
diff: specify abbreviation size in terms of the_hash_algo
upload-pack: replace use of several hard-coded constants
...
Sometimes it helps to list all available config vars so the user can
search for something they want. The config man page can also be used
but it's harder to search if you want to focus on the variable name,
for example.
This is not the best way to collect the available config since it's
not precise. Ideally we should have a centralized list of config in C
code (pretty much like 'struct option'), but that's a lot more work.
This will do for now.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This array will be used by some other function than parse_msg_id() in
the following commit. Factor out this prep code so it could be called
from that one.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code has been taught to use the duplicated information stored
in the commit-graph file to learn the tree object name for a commit
to avoid opening and parsing the commit object when it makes sense
to do so.
* ds/lazy-load-trees:
coccinelle: avoid wrong transformation suggestions from commit.cocci
commit-graph: lazy-load trees for commits
treewide: replace maybe_tree with accessor methods
commit: create get_commit_tree() method
treewide: rename tree to maybe_tree
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.
Signed-off-by: Jeff King <peff@peff.net>
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).
As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).
Signed-off-by: Jeff King <peff@peff.net>
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.
But in the case of a promisor object, this absence is
expected, and we must not complain. Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.
The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).
In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.
Signed-off-by: Jeff King <peff@peff.net>
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.
Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.
In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:
1. git-fsck may show us the objects in arbitrary order
(loose objects are fed in sha1 order, but we may also
have multiple packs, and we process each pack fully in
sequence).
2. The type ordering is just what git-pack-objects happens
to write now. The pack format does not require a
specific order, and it's possible that future versions
of Git (or a custom version trying to fool official
Git's fsck checks!) may order it differently.
3. We may not even be fscking all of the relevant objects
at once. Consider pushing with transfer.fsckObjects,
where one push adds a blob at path "foo", and then a
second push adds the same blob at path ".gitmodules".
The blob is not part of the second push at all, but we
need to mark and check it.
So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.
We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.
This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.
Signed-off-by: Jeff King <peff@peff.net>
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:
1. The fsck_object() function just returns success for any
blob. Let's a noop fsck_blob(), which we can fill in
with actual logic later.
2. The fsck_loose() function in builtin/fsck.c
just threw away blob content after loading it. Let's
hold onto it until after we've called fsck_object().
The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.
3. When fsck_loose() loads the object data, it
does so with a custom read_loose_object() helper. This
function streams any blobs, regardless of size, under
the assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().
Signed-off-by: Jeff King <peff@peff.net>
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).
Signed-off-by: Jeff King <peff@peff.net>
Add a repository argument to allow callers of lookup_commit_graft to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert two static functions to use struct object_id and parse_oid_hex,
instead of relying on harcoded 20 and 40-based constants.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of making trees load lazily, create a Coccinelle
script (contrib/coccinelle/commit.cocci) to ensure that all
references to the 'maybe_tree' member of struct commit are either
mutations or accesses through get_commit_tree() or
get_commit_tree_oid().
Apply the Coccinelle script to create the rest of the patch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the commit-graph file to walk commit history removes the large
cost of parsing commits during the walk. This exposes a performance
issue: lookup_tree() takes a large portion of the computation time,
even when Git never uses those trees.
In anticipation of lazy-loading these trees, rename the 'tree' member
of struct commit to 'maybe_tree'. This serves two purposes: it hints
at the future role of possibly being NULL even if the commit has a
valid tree, and it allows for unambiguous transformation from simple
member access (i.e. commit->maybe_tree) to method access.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve behaviour of "git fsck" upon finding a missing object.
* rs/fsck-null-return-from-lookup:
fsck: handle NULL return of lookup_blob() and lookup_tree()
lookup_blob() and lookup_tree() can return NULL if they find an object
of an unexpected type. Accessing the object member is undefined in that
case. Cast the result to a struct object pointer instead; we can do
that because object is the first member of all object types. This trick
is already used in other places in the code.
An error message is already shown by object_as_type(), which is called
by the lookup functions. The walk callback functions are expected to
handle NULL object pointers passed to them, but put_object_name() needs
a valid object, so avoid calling it without one.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a
switch case falls through to the next case. The general idea
is that the compiler can't tell if this was intentional or
not, so you should annotate any intentional fall-throughs as
such, leaving it to complain about any unannotated ones.
There's a GNU __attribute__ which can be used for
annotation, but of course we'd have to #ifdef it away on
non-gcc compilers. Gcc will also recognize
specially-formatted comments, which matches our current
practice. Let's extend that practice to all of the
unannotated sites (which I did look over and verify that
they were behaving as intended).
Ideally in each case we'd actually give some reasons in the
comment about why we're falling through, or what we're
falling through to. And gcc does support that with
-Wimplicit-fallthrough=2, which relaxes the comment pattern
matching to anything that contains "fallthrough" (or a
variety of spelling variants). However, this isn't the
default for -Wimplicit-fallthrough, nor for -Wextra. In the
name of simplicity, it's probably better for us to support
the default level, which requires "fallthrough" to be the
only thing in the comment (modulo some window dressing like
"else" and some punctuation; see the gcc manual for the
complete set of patterns).
This patch suppresses all warnings due to
-Wimplicit-fallthrough. We might eventually want to add that
to the DEVELOPER Makefile knob, but we should probably wait
until gcc 7 is more widely adopted (since earlier versions
will complain about the unknown warning type).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, commit.h doesn't contain the string 'sha1' any more.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
Convert the lookup_tree function to take a pointer to struct object_id.
The commit was created with manual changes to tree.c, tree.h, and
object.c, plus the following semantic patch:
@@
@@
- lookup_tree(EMPTY_TREE_SHA1_BIN)
+ lookup_tree(&empty_tree_oid)
@@
expression E1;
@@
- lookup_tree(E1.hash)
+ lookup_tree(&E1)
@@
expression E1;
@@
- lookup_tree(E1->hash)
+ lookup_tree(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert lookup_blob to take a pointer to struct object_id.
The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:
@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)
@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, Git's source code represents all timestamps as `unsigned
long`. In preparation for using a more appropriate data type, let's
introduce a symbol `parse_timestamp` (currently being defined to
`strtoul`) where appropriate, so that we can later easily switch to,
say, use `strtoull()` instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this structure handles an array of object IDs, rename it to struct
oid_array. Also rename the accessor functions and the initialization
constant.
This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:
perl -pi -E 's/struct sha1_array/struct oid_array/g'
perl -pi -E 's/\bsha1_array_/oid_array_/g'
perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function by changing the declaration and definition and
applying the following semantic patch to update the callers:
@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2.hash)
+ sha1_array_lookup(E1, &E2)
@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2->hash)
+ sha1_array_lookup(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the callers to pass struct object_id by changing the function
declaration and definition and applying the following semantic patch:
@@
expression E1, E2;
@@
- sha1_array_append(E1, E2.hash)
+ sha1_array_append(E1, &E2)
@@
expression E1, E2;
@@
- sha1_array_append(E1, E2->hash)
+ sha1_array_append(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the internal storage for struct sha1_array use an array of struct
object_id internally. Update the users of this struct which inspect its
internals.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert a hardcoded constant buffer size to a use of GIT_MAX_HEXSZ, and
use parse_oid_hex to reduce the dependency on the size of the hash.
This function is a caller of sha1_array_append, which will be converted
later.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recent fixes to "fsck --connectivity-only" load all of
the objects with their correct types. This keeps the
connectivity-only code path close to the regular one, but it
also introduces some unnecessary inefficiency. While getting
the type of an object is cheap compared to actually opening
and parsing the object (as the non-connectivity-only case
would do), it's still not free.
For reachable non-blob objects, we end up having to parse
them later anyway (to see what they point to), making our
type lookup here redundant.
For unreachable objects, we might never hit them at all in
the reachability traversal, making the lookup completely
wasted. And in some cases, we might have quite a few
unreachable objects (e.g., when alternates are used for
shared object storage between repositories, it's normal for
there to be objects reachable from other repositories but
not the one running fsck).
The comment in mark_object_for_connectivity() claims two
benefits to getting the type up front:
1. We need to know the types during fsck_walk(). (And not
explicitly mentioned, but we also need them when
printing the types of broken or dangling commits).
We can address this by lazy-loading the types as
necessary. Most objects never need this lazy-load at
all, because they fall into one of these categories:
a. Reachable from our tips, and are coerced into the
correct type as we traverse (e.g., a parent link
will call lookup_commit(), which converts OBJ_NONE
to OBJ_COMMIT).
b. Unreachable, but not at the tip of a chunk of
unreachable history. We only mention the tips as
"dangling", so an unreachable commit which links
to hundreds of other objects needs only report the
type of the tip commit.
2. It serves as a cross-check that the coercion in (1a) is
correct (i.e., we'll complain about a parent link that
points to a blob). But we get most of this for free
already, because right after coercing, we'll parse any
non-blob objects. So we'd notice then if we expected a
commit and got a blob.
The one exception is when we expect a blob, in which
case we never actually read the object contents.
So this is a slight weakening, but given that the whole
point of --connectivity-only is to sacrifice some data
integrity checks for speed, this seems like an
acceptable tradeoff.
Here are before and after timings for an extreme case with
~5M reachable objects and another ~12M unreachable (it's the
torvalds/linux repository on GitHub, connected to shared
storage for all of the other kernel forks):
[before]
$ time git fsck --no-dangling --connectivity-only
real 3m4.323s
user 1m25.121s
sys 1m38.710s
[after]
$ time git fsck --no-dangling --connectivity-only
real 0m51.497s
user 0m49.575s
sys 0m1.776s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue. Now fsck can tell the user which tree is
bad, too.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reporting broken links between commits/trees/blobs, it would be
quite helpful at times if the user would be told how the object is
supposed to be reachable.
With the new --name-objects option, git-fsck will try to do exactly
that: name the objects in a way that shows how they are reachable.
For example, when some reflog got corrupted and a blob is missing that
should not be, the user might want to remove the corresponding reflog
entry. This option helps them find that entry: `git fsck` will now
report something like this:
broken link from tree b5eb6ff... (refs/stash@{<date>}~37:)
to blob ec5cf80...
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will need this in the next commit, where fsck will be taught to
optionally name the objects when reporting issues about them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If fsck_options->name_objects is initialized, and if it already has
name(s) for the object(s) that are to be the starting point(s) for
fsck_walk(), then that function will now add names for the objects
that were walked.
This will be highly useful for teaching git-fsck to identify root causes
for broken links, which is the task for the next patch in this series.
Note that this patch opts for decorating the objects with plain strings
instead of full-blown structs (à la `struct rev_name` in the code of
the `git name-rev` command), for several reasons:
- the code is much simpler than if it had to work with structs that
describe arbitrarily long names such as "master~14^2~5:builtin/am.c",
- the string processing is actually quite light-weight compared to the
rest of fsck's operation,
- the caller of fsck_walk() is expected to provide names for the
starting points, and using plain and simple strings is just the
easiest way to do that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" learned to catch NUL byte in a commit object as
potential error and warn.
* jc/fsck-nul-in-commit:
fsck: detect and warn a commit with embedded NUL
fsck_commit_buffer(): do not special case the last validation
Even though a Git commit object is designed to be capable of storing
any binary data as its payload, in practice people use it to describe
the changes in textual form, and tools like "git log" are designed to
treat the payload as text.
Detect and warn when we see any commit object with a NUL byte in
it.
Note that a NUL byte in the header part is already detected as a
grave error. This change is purely about the message part.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pattern taken by all the validations in this function is:
if (notice a violation exists) {
err = report(... VIOLATION_KIND ...);
if (err)
return err;
}
where report() returns zero if specified kind of violation is set to
be ignored, and otherwise shows an error message and returns non-zero.
The last validation in the function immediately before the function
returns 0 to declare "all good" can cheat and directly return the
return value from report(), and the current code does so, i.e.
if (notice a violation exists)
return report(... VIOLATION_KIND ...);
return 0;
But that is a selfish code that declares it is the ultimate and
final form of the function, never to be enhanced later. To allow
and invite future enhancements, make the last test follow the same
pattern.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More transition from "unsigned char[40]" to "struct object_id".
This needed a few merge fixups, but is mostly disentangled from other
topics.
* bc/object-id:
remote: convert functions to struct object_id
Remove get_object_hash.
Convert struct object to object_id
Add several uses of get_object_hash.
object: introduce get_object_hash macro.
ref_newer: convert to use struct object_id
push_refs_with_export: convert to struct object_id
get_remote_heads: convert to struct object_id
parse_fetch: convert to use struct object_id
add_sought_entry_mem: convert to struct object_id
Convert struct ref to use object_id.
sha1_file: introduce has_object_file helper.
We check the return value of verify_header() for commits already, so do
the same for tags as well.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Allow ignoring fsck errors on specific set of known-to-be-bad
objects, and also tweaking warning level of various kinds of non
critical breakages reported.
* js/fsck-opt:
fsck: support ignoring objects in `git fsck` via fsck.skiplist
fsck: git receive-pack: support excluding objects from fsck'ing
fsck: introduce `git fsck --connectivity-only`
fsck: support demoting errors to warnings
fsck: document the new receive.fsck.<msg-id> options
fsck: allow upgrading fsck warnings to errors
fsck: optionally ignore specific fsck issues completely
fsck: disallow demoting grave fsck errors to warnings
fsck: add a simple test for receive.fsck.<msg-id>
fsck: make fsck_tag() warn-friendly
fsck: handle multiple authors in commits specially
fsck: make fsck_commit() warn-friendly
fsck: make fsck_ident() warn-friendly
fsck: report the ID of the error/warning
fsck (receive-pack): allow demoting errors to warnings
fsck: offer a function to demote fsck errors to warnings
fsck: provide a function to parse fsck message IDs
fsck: introduce identifiers for fsck messages
fsck: introduce fsck options
A fix to a minor regression to "git fsck" in v2.2 era that started
complaining about a body-less tag object when it lacks a separator
empty line after its header to separate it with a non-existent body.
* jc/fsck-retire-require-eoh:
fsck: it is OK for a tag and a commit to lack the body
When fsck validates a commit or a tag, it scans each line in the
header of the object using helper functions such as "start_with()",
etc. that work on a NUL terminated buffer, but before a1e920a0
(index-pack: terminate object buffers with NUL, 2014-12-08), the
validation functions were fed the object data in a piece of memory
that is not necessarily terminated with a NUL.
We added a helper function require_end_of_header() to be called at
the beginning of these validation functions to insist that the
object data contains an empty line before its end. The theory is
that the validating functions will notice and stop when it hits an
empty line as a normal end of header (or a required header line that
is missing) without scanning past the end of potentially not
NUL-terminated buffer.
But the theory forgot that in the older days, Git itself happily
created objects with only the header lines without a body. This
caused Git 2.2 and later to issue an unnecessary warning in some
existing repositories.
With a1e920a0, we do not need to require an empty line (or the body)
in these objects to safely parse and validate them. Drop the
offending "must have an empty line" check from this helper function,
while keeping the other check to make sure that there is no NUL in
the header part of the object, and adjust the name of the helper to
what it does accordingly.
Noticed-by: Wolfgang Denk <wd@denx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The optional new config option `receive.fsck.skipList` specifies the path
to a file listing the names, i.e. SHA-1s, one per line, of objects that
are to be ignored by `git receive-pack` when `receive.fsckObjects = true`.
This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).
The intended use case is for server administrators to inspect objects
that are reported by `git push` as being too problematic to enter the
repository, and to add the objects' SHA-1 to a (preferably sorted) file
when the objects are legitimate, i.e. when it is determined that those
problematic objects should be allowed to enter the server.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'invalid tag name' and 'missing tagger entry' warnings can now be
upgraded to errors by specifying `invalidTagName` and
`missingTaggerEntry` in the receive.fsck.<msg-id> config setting.
Incidentally, the missing tagger warning is now really shown as a warning
(as opposed to being reported with the "error:" prefix, as it used to be
the case before this commit).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An fsck issue in a legacy repository might be so common that one would
like not to bother the user with mentioning it at all. With this change,
that is possible by setting the respective message type to "ignore".
This change "abuses" the missingEmail=warn test to verify that "ignore"
is also accepted and works correctly. And while at it, it makes sure
that multiple options work, too (they are passed to unpack-objects or
index-pack as a comma-separated list via the --strict=... command-line
option).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some kinds of errors are intrinsically unrecoverable (e.g. errors while
uncompressing objects). It does not make sense to allow demoting them to
mere warnings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck_tag() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.
Just like fsck_commit(), there are certain problems that could hide other
issues with the same tag object. For example, if the 'type' line is not
encountered in the correct position, the 'tag' line – if there is any –
would not be handled at all.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This problem has been detected in the wild, and is the primary reason
to introduce an option to demote certain fsck errors to warnings. Let's
offer to ignore this particular problem specifically.
Technically, we could handle such repositories by setting
receive.fsck.<msg-id> to missingCommitter=warn, but that could hide
missing tree objects in the same commit because we cannot continue
verifying any commit object after encountering a missing committer line,
while we can continue in the case of multiple author lines.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck_commit() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.
Note that some problems are too problematic to simply ignore. For
example, when the header lines are mixed up, we punt after encountering
an incorrect line. Therefore, demoting certain warnings to errors can
hide other problems. Example: demoting the missingauthor error to
a warning would hide a problematic committer line.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck_ident() identifies a problem with the ident, it should still
advance the pointer to the next line so that fsck can continue in the
case of a mere warning.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some repositories written by legacy code have objects with non-fatal
fsck issues. To allow the user to ignore those issues, let's print
out the ID (e.g. when encountering "missingEmail", the user might
want to call `git config --add receive.fsck.missingEmail=warn`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For example, missing emails in commit and tag objects can be demoted to
mere warnings with
git config receive.fsck.missingemail=warn
The value is actually a comma-separated list.
In case that the same key is listed in multiple receive.fsck.<msg-id>
lines in the config, the latter configuration wins (this can happen for
example when both $HOME/.gitconfig and .git/config contain message type
settings).
As git receive-pack does not actually perform the checks, it hands off
the setting to index-pack or unpack-objects in the form of an optional
argument to the --strict option.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are legacy repositories out there whose older commits and tags
have issues that prevent pushing them when 'receive.fsckObjects' is set.
One real-life example is a commit object that has been hand-crafted to
list two authors.
Often, it is not possible to fix those issues without disrupting the
work with said repositories, yet it is still desirable to perform checks
by setting `receive.fsckObjects = true`. This commit is the first step
to allow demoting specific fsck issues to mere warnings.
The `fsck_set_msg_types()` function added by this commit parses a list
of settings in the form:
missingemail=warn,badname=warn,...
Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by
git fsck so far, but other call paths (e.g. git index-pack --strict)
error out *always* no matter what type was specified. Therefore, we need
to take extra care to set all message types to FSCK_ERROR by default in
those cases.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions will be used in the next commits to allow the user to
ask fsck to handle specific problems differently, e.g. demoting certain
errors to warnings. The upcoming `fsck_set_msg_types()` function has to
handle partial strings because we would like to be able to parse, say,
'missingemail=warn,missingtaggerentry=warn' command line parameters
(which will be passed by receive-pack to index-pack and unpack-objects).
To make the parsing robust, we generate strings from the enum keys, and
using these keys, we match up strings without dashes case-insensitively
to the corresponding enum values.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.
In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.
We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.
* js/fsck-tag-validation:
index-pack: terminate object buffers with NUL
fsck: properly bound "invalid tag name" error message
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.
Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).
Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.
Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).
Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we detect an invalid tag-name header in a tag object,
like, "tag foo bar\n", we feed the pointer starting at "foo
bar" to a printf "%s" formatter. This shows the name, as we
want, but then it keeps printing the rest of the tag buffer,
rather than stopping at the end of the line.
Our tests did not notice because they look only for the
matching line, but the bug is that we print much more than
we wanted to. So we also adjust the test to be more exact.
Note that when fscking tags with "index-pack --strict", this
is even worse. index-pack does not add a trailing
NUL-terminator after the object, so we may actually read
past the buffer and print uninitialized memory. Running
t5302 with valgrind does notice the bug for that reason.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.
Since we do not want to limit 'tag' lines unduly, values that would fail
the refname check only result in warnings, not errors.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.
So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').
For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.
By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.
The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck_commit_buffer() checks that the number of items in the parents
list of a commit matches the number of parent lines in its buffer or --
if a graft is used -- the number of parents in that graft. Simplify
the code by using commit_list_count() instead of counting by hand.
Also use different variables for the number of lines and the number of
list items, making it easier to compare them.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/skip-prefix:
http-push: refactor parsing of remote object names
imap-send: use skip_prefix instead of using magic numbers
use skip_prefix to avoid repeated calculations
git: avoid magic number with skip_prefix
fetch-pack: refactor parsing in get_ack
fast-import: refactor parsing of spaces
stat_opt: check extra strlen call
daemon: use skip_prefix to avoid magic numbers
fast-import: use skip_prefix for parsing input
use skip_prefix to avoid repeating strings
use skip_prefix to avoid magic numbers
transport-helper: avoid reading past end-of-string
fast-import: fix read of uninitialized argv memory
apply: use skip_prefix instead of raw addition
refactor skip_prefix to return a boolean
avoid using skip_prefix as a boolean
daemon: mark some strings as const
parse_diff_color_slot: drop ofs parameter
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:
1. When you want to conditionally skip or keep the string
as-is, you have to introduce a temporary variable.
For example:
tmp = skip_prefix(buf, "foo");
if (tmp)
buf = tmp;
2. It is verbose to check the outcome in a conditional, as
you need extra parentheses to silence compiler
warnings. For example:
if ((cp = skip_prefix(buf, "foo"))
/* do something with cp */
Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).
This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:
if (skip_prefix(arg, "foo ", &arg))
do_foo(arg);
else if (skip_prefix(arg, "bar ", &arg))
do_bar(arg);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.
And as a side effect, we abstract away the final bare uses
of commit->buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck_tree() has two different ways to set a flag variable, either by
using a if-statement that guards an assignment, or by using a
bitwise-or assignment operator. Most are done with the former, and
only one variable is assigned with the latter.
Since all the conditions are short-and-sweet, we can afford to
uniformly use the latter style, which makes the resulting code
shorter and easier to read.
Signed-off-by: Hiroyuki Sano <sh19910711@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck_commit() uses memcmp() to check if the buffer starts with a
certain prefix, and skips the prefix if it does.
This is exactly what skip_prefix() was designed for.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since fsck_ident doesn't change the content of **ident, the type of
ident could be const char **.
This change is required to rewrite fsck_commit() to use skip_prefix().
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions. We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.
Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.
Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.
For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).
For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having a ".git" entry inside a tree can cause confusing
results on checkout. At the top-level, you could not
checkout such a tree, as it would complain about overwriting
the real ".git" directory. In a subdirectory, you might
check it out, but performing operations in the subdirectory
would confusingly consider the in-tree ".git" directory as
the repository.
The regular git tools already make it hard to accidentally
add such an entry to a tree, and do not allow such entries
to enter the index at all. Teaching fsck about it provides
an additional safety check, and let's us avoid propagating
any such bogosity when transfer.fsckObjects is on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A tree with meta-paths like '.' or '..' does not work well
with git; the index will refuse to load it or check it out
to the filesystem (and even if we did not have that safety,
it would look like we were overwriting an untracked
directory). For the same reason, it is difficult to create
such a tree with regular git.
Let's warn about these dubious entries during fsck, just in
case somebody has created a bogus tree (and this also lets
us prevent them from propagating when transfer.fsckObjects
is set).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
Short of somebody happening to beat the 1 in 2^160 odds of
actually generating content that hashes to the null sha1, we
should never see this value in a tree entry. So let's have
fsck warn if it it seen.
As in the previous commit, we test both blob and submodule
entries to future-proof the test suite against the
implementation depending on connectivity to notice the
error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error handling routines add a newline. Remove
the duplicate ones in error messages.
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck allows a name with > character in it like "name> <email>". Also for
"name email>" fsck says "missing space before email".
More precisely, it seeks for a first '<', checks that ' ' preceeds it.
Then seeks to '<' or '>' and checks that it is the '>'. Missing space is
reported if either '<' is not found or it's not preceeded with ' '.
Change it to following. Seek to '<' or '>', check that it is '<' and is
preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now
"name> <email>" is rejected as "bad name". More strict name check is the
only change in what is accepted.
Report 'missing space' only if '<' is found and is not preceeded with a
space.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jm/maint-misc-fix:
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
* fsck.c (fsck_error_function): Don't test obj->sha1 == 0.
It can never be true, since that sha1 member is an array.
* transport.c (set_upstreams): Likewise for ref->new_sha1.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a variable-args function, the code for writing into a strbuf is
non-trivial. We ended up cutting and pasting it in several places
because there was no vprintf-style function for strbufs (which in turn
was held up by a lack of va_copy).
Now that we have a fallback va_copy, we can add strbuf_vaddf, the
strbuf equivalent of vsprintf. And we can clean up the cut and paste
mess.
Signed-off-by: Jeff King <peff@peff.net>
Improved-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
daae1922 (fsck: check ident lines in commit objects, 2010-04-24)
taught fsck to expect commit objects to have the form
tree <object name>
<parents>
author <valid ident string>
committer <valid ident string>
log message
The check is overly strict: for example, it errors out with the
message “expected blank line” for perfectly valid commits with an
"encoding ISO-8859-1" line.
Later it might make sense to teach fsck about the rest of the header
and warn about unrecognized header lines, but for simplicity, let’s
accept arbitrary trailing lines for now.
Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check that email addresses do not contain <, >, or newline so they can
be quickly scanned without trouble. The copy() function in ident.c
already ensures that ordinary git commands will not write email
addresses without this property.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is common practice to use the Unix epoch as a fallback date
when a suitable date is not available. This is true of git svn
and possibly other importing tools that import non-git history
into git.
Instead of clobbering established strtoul() error reporting
semantics with our own, preserve the strtoul() error value
of ULONG_MAX for fsck.c to handle.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These variables were unused and can be removed safely:
builtin-clone.c::cmd_clone(): use_local_hardlinks, use_separate_remote
builtin-fetch-pack.c::find_common(): len
builtin-remote.c::mv(): symref
diff.c::show_stats():show_stats(): total
diffcore-break.c::should_break(): base_size
fast-import.c::validate_raw_date(): date, sign
fsck.c::fsck_tree(): o_sha1, sha1
xdiff-interface.c::parse_num(): read_some
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Fix non-literal format in printf-style calls
git-submodule: Avoid printing a spurious message.
git ls-remote: make usage string match manpage
Makefile: help people who run 'make check' by mistake
These were found using gcc 4.3.2-1ubuntu11 with the warning:
warning: format not a string literal and no format arguments
Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ba002f3 (builtin-fsck: move common object checking code to fsck.c) did
more than what it claimed to. Most notably, it wrongly made an empty tree
object an error by pretending to only move code from fsck_tree() in
builtin-fsck.c to fsck_tree() in fsck.c, but in fact adding a bogus check
to barf on an empty tree.
An empty tree object is _unusual_. Recent porcelains try reasonably hard
not to let the user create a commit that contains such a tree. Perhaps
warning about them in git-fsck may have some merit.
HOWEVER.
Being unusual and being errorneous are two quite different things. This
is especially true now we seem to use the same fsck_$object() code in
places other than git-fsck itself. For example, receive-pack should not
reject unusual objects, even if it would be a good idea to tighten it to
reject incorrect ones.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The requirements are:
* it may not crash on NULL pointers
* a callback function is needed, as index-pack/unpack-objects
need to do different things
* the type information is needed to check the expected <-> real type
and print better error messages
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The earlier change df391b192 to rename fsck-objects to fsck broke
fsck-objects. This should fix it again.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>