mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-27 20:14:30 +08:00

Author	SHA1	Message	Date
Jeff King	159e7b080b	fsck: detect gitmodules files In preparation for performing fsck checks on .gitmodules files, this commit plumbs in the actual detection of the files. Note that unlike most other fsck checks, this cannot be a property of a single object: we must know that the object is found at a ".gitmodules" path at the root tree of a commit. Since the fsck code only sees one object at a time, we have to mark the related objects to fit the puzzle together. When we see a commit we mark its tree as a root tree, and when we see a root tree with a .gitmodules file, we mark the corresponding blob to be checked. In an ideal world, we'd check the objects in topological order: commits followed by trees followed by blobs. In that case we can avoid ever loading an object twice, since all markings would be complete by the time we get to the marked objects. And indeed, if we are checking a single packfile, this is the order in which Git will generally write the objects. But we can't count on that: 1. git-fsck may show us the objects in arbitrary order (loose objects are fed in sha1 order, but we may also have multiple packs, and we process each pack fully in sequence). 2. The type ordering is just what git-pack-objects happens to write now. The pack format does not require a specific order, and it's possible that future versions of Git (or a custom version trying to fool official Git's fsck checks!) may order it differently. 3. We may not even be fscking all of the relevant objects at once. Consider pushing with transfer.fsckObjects, where one push adds a blob at path "foo", and then a second push adds the same blob at path ".gitmodules". The blob is not part of the second push at all, but we need to mark and check it. So in the general case, we need to make up to three passes over the objects: once to make sure we've seen all commits, then once to cover any trees we might have missed, and then a final pass to cover any .gitmodules blobs we found in the second pass. We can simplify things a bit by loosening the requirement that we find .gitmodules only at root trees. Technically a file like "subdir/.gitmodules" is not parsed by Git, but it's not unreasonable for us to declare that Git is aware of all ".gitmodules" files and make them eligible for checking. That lets us drop the root-tree requirement, which eliminates one pass entirely. And it makes our worst case much better: instead of potentially queueing every root tree to be re-examined, the worst case is that we queue each unique .gitmodules blob for a second look. This patch just adds the boilerplate to find .gitmodules files. The actual content checks will come in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	7ac4f3a007	fsck: actually fsck blob data Because fscking a blob has always been a noop, we didn't bother passing around the blob data. In preparation for content-level checks, let's fix up a few things: 1. The fsck_object() function just returns success for any blob. Let's a noop fsck_blob(), which we can fill in with actual logic later. 2. The fsck_loose() function in builtin/fsck.c just threw away blob content after loading it. Let's hold onto it until after we've called fsck_object(). The easiest way to do this is to just drop the parse_loose_object() helper entirely. Incidentally, this also fixes a memory leak: if we successfully loaded the object data but did not parse it, we would have left the function without freeing it. 3. When fsck_loose() loads the object data, it does so with a custom read_loose_object() helper. This function streams any blobs, regardless of size, under the assumption that we're only checking the sha1. Instead, let's actually load blobs smaller than big_file_threshold, as the normal object-reading code-paths would do. This lets us fsck small files, and a NULL return is an indication that the blob was so big that it needed to be streamed, and we can pass that information along to fsck_blob(). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	ed9c322062	fsck: simplify ".git" check There's no need for us to manually check for ".git"; it's a subset of the other filesystem-specific tests. Dropping it makes our code slightly shorter. More importantly, the existing code may make a reader wonder why ".GIT" is not covered here, and whether that is a bug (it isn't, as it's also covered in the filesystem-specific tests). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jonathan Nieder	1f93ecd1ab	commit: add repository argument to lookup_commit_graft Add a repository argument to allow callers of lookup_commit_graft to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-18 08:13:10 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
brian m. carlson	c54f5ca970	fsck: convert static functions to struct object_id Convert two static functions to use struct object_id and parse_oid_hex, instead of relying on harcoded 20 and 40-based constants. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:50 +09:00
Derrick Stolee	2e27bd7731	treewide: replace maybe_tree with accessor methods In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
Derrick Stolee	891435d55d	treewide: rename tree to maybe_tree Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
brian m. carlson	b4f5aca40e	sha1_file: convert read_sha1_file to struct object_id Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:50 -07:00
Brandon Williams	debca9d2fe	object: rename function 'typename' to 'type_name' Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-14 13:10:05 -08:00
Junio C Hamano	6defdc9fe8	Merge branch 'rs/fsck-null-return-from-lookup' Improve behaviour of "git fsck" upon finding a missing object. * rs/fsck-null-return-from-lookup: fsck: handle NULL return of lookup_blob() and lookup_tree()	2017-10-11 14:52:23 +09:00
René Scharfe	2720f6db5d	fsck: handle NULL return of lookup_blob() and lookup_tree() lookup_blob() and lookup_tree() can return NULL if they find an object of an unexpected type. Accessing the object member is undefined in that case. Cast the result to a struct object pointer instead; we can do that because object is the first member of all object types. This trick is already used in other places in the code. An error message is already shown by object_as_type(), which is called by the lookup functions. The walk callback functions are expected to handle NULL object pointers passed to them, but put_object_name() needs a valid object, so avoid calling it without one. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 11:04:34 +09:00
Jeff King	1cf01a34ea	consistently use "fallthrough" comments in switches Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-22 12:49:57 +09:00
Stefan Beller	8b65a34c4a	commit: convert lookup_commit_graft to struct object_id With this patch, commit.h doesn't contain the string 'sha1' any more. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-13 12:02:40 -07:00
Junio C Hamano	6b526ced6f	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (53 commits) object: convert parse_object* to take struct object_id tree: convert parse_tree_indirect to struct object_id sequencer: convert do_recursive_merge to struct object_id diff-lib: convert do_diff_cache to struct object_id builtin/ls-tree: convert to struct object_id merge: convert checkout_fast_forward to struct object_id sequencer: convert fast_forward_to to struct object_id builtin/ls-files: convert overlay_tree_on_cache to object_id builtin/read-tree: convert to struct object_id sha1_name: convert internals of peel_onion to object_id upload-pack: convert remaining parse_object callers to object_id revision: convert remaining parse_object callers to object_id revision: rename add_pending_sha1 to add_pending_oid http-push: convert process_ls_object and descendants to object_id refs/files-backend: convert many internals to struct object_id refs: convert struct ref_update to use struct object_id ref-filter: convert some static functions to struct object_id Convert struct ref_array_item to struct object_id Convert the verify_pack callback to struct object_id Convert lookup_tag to struct object_id ...	2017-05-29 12:34:43 +09:00
brian m. carlson	c251c83df2	object: convert parse_object* to take struct object_id Make parse_object, parse_object_or_die, and parse_object_buffer take a pointer to struct object_id. Remove the temporary variables inserted earlier, since they are no longer necessary. Transform all of the callers using the following semantic patch: @@ expression E1; @@ - parse_object(E1.hash) + parse_object(&E1) @@ expression E1; @@ - parse_object(E1->hash) + parse_object(E1) @@ expression E1, E2; @@ - parse_object_or_die(E1.hash, E2) + parse_object_or_die(&E1, E2) @@ expression E1, E2; @@ - parse_object_or_die(E1->hash, E2) + parse_object_or_die(E1, E2) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1.hash, E2, E3, E4, E5) + parse_object_buffer(&E1, E2, E3, E4, E5) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1->hash, E2, E3, E4, E5) + parse_object_buffer(E1, E2, E3, E4, E5) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:58 +09:00
brian m. carlson	740ee055c6	Convert lookup_tree to struct object_id Convert the lookup_tree function to take a pointer to struct object_id. The commit was created with manual changes to tree.c, tree.h, and object.c, plus the following semantic patch: @@ @@ - lookup_tree(EMPTY_TREE_SHA1_BIN) + lookup_tree(&empty_tree_oid) @@ expression E1; @@ - lookup_tree(E1.hash) + lookup_tree(&E1) @@ expression E1; @@ - lookup_tree(E1->hash) + lookup_tree(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
brian m. carlson	3aca1fc6c9	Convert lookup_blob to struct object_id Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
Johannes Schindelin	1aeb7e756c	parse_timestamp(): specify explicitly where we parse timestamps Currently, Git's source code represents all timestamps as `unsigned long`. In preparation for using a more appropriate data type, let's introduce a symbol `parse_timestamp` (currently being defined to `strtoul`) where appropriate, so that we can later easily switch to, say, use `strtoull()` instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-23 20:19:15 -07:00
brian m. carlson	910650d2f8	Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:56 -07:00
brian m. carlson	5d3206d501	Convert sha1_array_lookup to take struct object_id Convert this function by changing the declaration and definition and applying the following semantic patch to update the callers: @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2.hash) + sha1_array_lookup(E1, &E2) @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2->hash) + sha1_array_lookup(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	98a72ddc12	Make sha1_array_append take a struct object_id * Convert the callers to pass struct object_id by changing the function declaration and definition and applying the following semantic patch: @@ expression E1, E2; @@ - sha1_array_append(E1, E2.hash) + sha1_array_append(E1, &E2) @@ expression E1, E2; @@ - sha1_array_append(E1, E2->hash) + sha1_array_append(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	ee3051bd23	sha1-array: convert internal storage for struct sha1_array to object_id Make the internal storage for struct sha1_array use an array of struct object_id internally. Update the users of this struct which inspect its internals. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-28 09:59:34 -07:00
brian m. carlson	365c27fbff	fsck: convert init_skiplist to struct object_id Convert a hardcoded constant buffer size to a use of GIT_MAX_HEXSZ, and use parse_oid_hex to reduce the dependency on the size of the hash. This function is a caller of sha1_array_append, which will be converted later. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-28 09:59:33 -07:00
Jeff King	a2b22854bd	fsck: lazily load types under --connectivity-only The recent fixes to "fsck --connectivity-only" load all of the objects with their correct types. This keeps the connectivity-only code path close to the regular one, but it also introduces some unnecessary inefficiency. While getting the type of an object is cheap compared to actually opening and parsing the object (as the non-connectivity-only case would do), it's still not free. For reachable non-blob objects, we end up having to parse them later anyway (to see what they point to), making our type lookup here redundant. For unreachable objects, we might never hit them at all in the reachability traversal, making the lookup completely wasted. And in some cases, we might have quite a few unreachable objects (e.g., when alternates are used for shared object storage between repositories, it's normal for there to be objects reachable from other repositories but not the one running fsck). The comment in mark_object_for_connectivity() claims two benefits to getting the type up front: 1. We need to know the types during fsck_walk(). (And not explicitly mentioned, but we also need them when printing the types of broken or dangling commits). We can address this by lazy-loading the types as necessary. Most objects never need this lazy-load at all, because they fall into one of these categories: a. Reachable from our tips, and are coerced into the correct type as we traverse (e.g., a parent link will call lookup_commit(), which converts OBJ_NONE to OBJ_COMMIT). b. Unreachable, but not at the tip of a chunk of unreachable history. We only mention the tips as "dangling", so an unreachable commit which links to hundreds of other objects needs only report the type of the tip commit. 2. It serves as a cross-check that the coercion in (1a) is correct (i.e., we'll complain about a parent link that points to a blob). But we get most of this for free already, because right after coercing, we'll parse any non-blob objects. So we'd notice then if we expected a commit and got a blob. The one exception is when we expect a blob, in which case we never actually read the object contents. So this is a slight weakening, but given that the whole point of --connectivity-only is to sacrifice some data integrity checks for speed, this seems like an acceptable tradeoff. Here are before and after timings for an extreme case with ~5M reachable objects and another ~12M unreachable (it's the torvalds/linux repository on GitHub, connected to shared storage for all of the other kernel forks): [before] $ time git fsck --no-dangling --connectivity-only real 3m4.323s user 1m25.121s sys 1m38.710s [after] $ time git fsck --no-dangling --connectivity-only real 0m51.497s user 0m49.575s sys 0m1.776s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-26 10:51:09 -08:00
David Turner	8354fa3d4c	fsck: handle bad trees like other errors Instead of dying when fsck hits a malformed tree object, log the error like any other and continue. Now fsck can tell the user which tree is bad, too. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-27 14:09:10 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Johannes Schindelin	1cd772cc41	fsck: give the error function a chance to see the fsck_options We will need this in the next commit, where fsck will be taught to optionally name the objects when reporting issues about them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Johannes Schindelin	7b35efd734	fsck_walk(): optionally name objects on the go If fsck_options->name_objects is initialized, and if it already has name(s) for the object(s) that are to be the starting point(s) for fsck_walk(), then that function will now add names for the objects that were walked. This will be highly useful for teaching git-fsck to identify root causes for broken links, which is the task for the next patch in this series. Note that this patch opts for decorating the objects with plain strings instead of full-blown structs (à la `struct rev_name` in the code of the `git name-rev` command), for several reasons: - the code is much simpler than if it had to work with structs that describe arbitrarily long names such as "master~14^2~5:builtin/am.c", - the string processing is actually quite light-weight compared to the rest of fsck's operation, - the caller of fsck_walk() is expected to provide names for the starting points, and using plain and simple strings is just the easiest way to do that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Junio C Hamano	6bfb7de89e	Merge branch 'jc/fsck-nul-in-commit' "git fsck" learned to catch NUL byte in a commit object as potential error and warn. * jc/fsck-nul-in-commit: fsck: detect and warn a commit with embedded NUL fsck_commit_buffer(): do not special case the last validation	2016-05-17 14:38:34 -07:00
Junio C Hamano	6d2d780f63	fsck: detect and warn a commit with embedded NUL Even though a Git commit object is designed to be capable of storing any binary data as its payload, in practice people use it to describe the changes in textual form, and tools like "git log" are designed to treat the payload as text. Detect and warn when we see any commit object with a NUL byte in it. Note that a NUL byte in the header part is already detected as a grave error. This change is purely about the message part. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-10 10:02:06 -07:00
brian m. carlson	ce6663a9da	tree-walk: convert tree_entry_extract() to use struct object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-25 14:26:28 -07:00
brian m. carlson	7d924c9139	struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-25 14:23:42 -07:00
Junio C Hamano	5af297185e	fsck_commit_buffer(): do not special case the last validation The pattern taken by all the validations in this function is: if (notice a violation exists) { err = report(... VIOLATION_KIND ...); if (err) return err; } where report() returns zero if specified kind of violation is set to be ignored, and otherwise shows an error message and returns non-zero. The last validation in the function immediately before the function returns 0 to declare "all good" can cheat and directly return the return value from report(), and the current code does so, i.e. if (notice a violation exists) return report(... VIOLATION_KIND ...); return 0; But that is a selfish code that declares it is the ultimate and final form of the function, never to be enhanced later. To allow and invite future enhancements, make the last test follow the same pattern. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-14 11:15:48 -07:00
Jeff King	b32fa95fd8	convert trivial cases to ALLOC_ARRAY Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
Junio C Hamano	844a9ce472	Merge branch 'bc/object-id' More transition from "unsigned char[40]" to "struct object_id". This needed a few merge fixups, but is mostly disentangled from other topics. * bc/object-id: remote: convert functions to struct object_id Remove get_object_hash. Convert struct object to object_id Add several uses of get_object_hash. object: introduce get_object_hash macro. ref_newer: convert to use struct object_id push_refs_with_export: convert to struct object_id get_remote_heads: convert to struct object_id parse_fetch: convert to use struct object_id add_sought_entry_mem: convert to struct object_id Convert struct ref to use object_id. sha1_file: introduce has_object_file helper.	2015-12-10 12:36:13 -08:00
René Scharfe	8a272f291a	fsck: treat a NUL in a tag header as an error We check the return value of verify_header() for commits already, so do the same for tags as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
brian m. carlson	ed1c9977cb	Remove get_object_hash. Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	f2fd0760f6	Convert struct object to object_id struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	7999b2cf77	Add several uses of get_object_hash. Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
Junio C Hamano	b2f44feba5	Merge branch 'js/fsck-opt' Allow ignoring fsck errors on specific set of known-to-be-bad objects, and also tweaking warning level of various kinds of non critical breakages reported. * js/fsck-opt: fsck: support ignoring objects in `git fsck` via fsck.skiplist fsck: git receive-pack: support excluding objects from fsck'ing fsck: introduce `git fsck --connectivity-only` fsck: support demoting errors to warnings fsck: document the new receive.fsck.<msg-id> options fsck: allow upgrading fsck warnings to errors fsck: optionally ignore specific fsck issues completely fsck: disallow demoting grave fsck errors to warnings fsck: add a simple test for receive.fsck.<msg-id> fsck: make fsck_tag() warn-friendly fsck: handle multiple authors in commits specially fsck: make fsck_commit() warn-friendly fsck: make fsck_ident() warn-friendly fsck: report the ID of the error/warning fsck (receive-pack): allow demoting errors to warnings fsck: offer a function to demote fsck errors to warnings fsck: provide a function to parse fsck message IDs fsck: introduce identifiers for fsck messages fsck: introduce fsck options	2015-08-03 11:01:18 -07:00
Junio C Hamano	acf7189512	Merge branch 'jc/fsck-retire-require-eoh' A fix to a minor regression to "git fsck" in v2.2 era that started complaining about a body-less tag object when it lacks a separator empty line after its header to separate it with a non-existent body. * jc/fsck-retire-require-eoh: fsck: it is OK for a tag and a commit to lack the body	2015-07-13 14:00:24 -07:00
Junio C Hamano	84d18c0bcf	fsck: it is OK for a tag and a commit to lack the body When fsck validates a commit or a tag, it scans each line in the header of the object using helper functions such as "start_with()", etc. that work on a NUL terminated buffer, but before `a1e920a0` (index-pack: terminate object buffers with NUL, 2014-12-08), the validation functions were fed the object data in a piece of memory that is not necessarily terminated with a NUL. We added a helper function require_end_of_header() to be called at the beginning of these validation functions to insist that the object data contains an empty line before its end. The theory is that the validating functions will notice and stop when it hits an empty line as a normal end of header (or a required header line that is missing) without scanning past the end of potentially not NUL-terminated buffer. But the theory forgot that in the older days, Git itself happily created objects with only the header lines without a body. This caused Git 2.2 and later to issue an unnecessary warning in some existing repositories. With `a1e920a0`, we do not need to require an empty line (or the body) in these objects to safely parse and validate them. Drop the offending "must have an empty line" check from this helper function, while keeping the other check to make sure that there is no NUL in the header part of the object, and adjust the name of the helper to what it does accordingly. Noticed-by: Wolfgang Denk <wd@denx.de> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-28 14:47:28 -07:00
Johannes Schindelin	cd94c6f91e	fsck: git receive-pack: support excluding objects from fsck'ing The optional new config option `receive.fsck.skipList` specifies the path to a file listing the names, i.e. SHA-1s, one per line, of objects that are to be ignored by `git receive-pack` when `receive.fsckObjects = true`. This is extremely handy in case of legacy repositories where it would cause more pain to change incorrect objects than to live with them (e.g. a duplicate 'author' line in an early commit object). The intended use case is for server administrators to inspect objects that are reported by `git push` as being too problematic to enter the repository, and to add the objects' SHA-1 to a (preferably sorted) file when the objects are legitimate, i.e. when it is determined that those problematic objects should be allowed to enter the server. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	f27d05b170	fsck: allow upgrading fsck warnings to errors The 'invalid tag name' and 'missing tagger entry' warnings can now be upgraded to errors by specifying `invalidTagName` and `missingTaggerEntry` in the receive.fsck.<msg-id> config setting. Incidentally, the missing tagger warning is now really shown as a warning (as opposed to being reported with the "error:" prefix, as it used to be the case before this commit). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	efaba7cc77	fsck: optionally ignore specific fsck issues completely An fsck issue in a legacy repository might be so common that one would like not to bother the user with mentioning it at all. With this change, that is possible by setting the respective message type to "ignore". This change "abuses" the missingEmail=warn test to verify that "ignore" is also accepted and works correctly. And while at it, it makes sure that multiple options work, too (they are passed to unpack-objects or index-pack as a comma-separated list via the --strict=... command-line option). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	f50c440730	fsck: disallow demoting grave fsck errors to warnings Some kinds of errors are intrinsically unrecoverable (e.g. errors while uncompressing objects). It does not make sense to allow demoting them to mere warnings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	7d7d5b0568	fsck: make fsck_tag() warn-friendly When fsck_tag() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Just like fsck_commit(), there are certain problems that could hide other issues with the same tag object. For example, if the 'type' line is not encountered in the correct position, the 'tag' line – if there is any – would not be handled at all. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	c9ad147f83	fsck: handle multiple authors in commits specially This problem has been detected in the wild, and is the primary reason to introduce an option to demote certain fsck errors to warnings. Let's offer to ignore this particular problem specifically. Technically, we could handle such repositories by setting receive.fsck.<msg-id> to missingCommitter=warn, but that could hide missing tree objects in the same commit because we cannot continue verifying any commit object after encountering a missing committer line, while we can continue in the case of multiple author lines. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	b3584761eb	fsck: make fsck_commit() warn-friendly When fsck_commit() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Note that some problems are too problematic to simply ignore. For example, when the header lines are mixed up, we punt after encountering an incorrect line. Therefore, demoting certain warnings to errors can hide other problems. Example: demoting the missingauthor error to a warning would hide a problematic committer line. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	e6826e335a	fsck: make fsck_ident() warn-friendly When fsck_ident() identifies a problem with the ident, it should still advance the pointer to the next line so that fsck can continue in the case of a mere warning. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	71ab8fa840	fsck: report the ID of the error/warning Some repositories written by legacy code have objects with non-fatal fsck issues. To allow the user to ignore those issues, let's print out the ID (e.g. when encountering "missingEmail", the user might want to call `git config --add receive.fsck.missingEmail=warn`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	5d477a334a	fsck (receive-pack): allow demoting errors to warnings For example, missing emails in commit and tag objects can be demoted to mere warnings with git config receive.fsck.missingemail=warn The value is actually a comma-separated list. In case that the same key is listed in multiple receive.fsck.<msg-id> lines in the config, the latter configuration wins (this can happen for example when both $HOME/.gitconfig and .git/config contain message type settings). As git receive-pack does not actually perform the checks, it hands off the setting to index-pack or unpack-objects in the form of an optional argument to the --strict option. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:34 -07:00
Johannes Schindelin	0282f4dced	fsck: offer a function to demote fsck errors to warnings There are legacy repositories out there whose older commits and tags have issues that prevent pushing them when 'receive.fsckObjects' is set. One real-life example is a commit object that has been hand-crafted to list two authors. Often, it is not possible to fix those issues without disrupting the work with said repositories, yet it is still desirable to perform checks by setting `receive.fsckObjects = true`. This commit is the first step to allow demoting specific fsck issues to mere warnings. The `fsck_set_msg_types()` function added by this commit parses a list of settings in the form: missingemail=warn,badname=warn,... Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by git fsck so far, but other call paths (e.g. git index-pack --strict) error out always no matter what type was specified. Therefore, we need to take extra care to set all message types to FSCK_ERROR by default in those cases. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:26:46 -07:00
Johannes Schindelin	f417eed8cd	fsck: provide a function to parse fsck message IDs These functions will be used in the next commits to allow the user to ask fsck to handle specific problems differently, e.g. demoting certain errors to warnings. The upcoming `fsck_set_msg_types()` function has to handle partial strings because we would like to be able to parse, say, 'missingemail=warn,missingtaggerentry=warn' command line parameters (which will be passed by receive-pack to index-pack and unpack-objects). To make the parsing robust, we generate strings from the enum keys, and using these keys, we match up strings without dashes case-insensitively to the corresponding enum values. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 12:53:25 -07:00
Johannes Schindelin	c99ba492f1	fsck: introduce identifiers for fsck messages Instead of specifying whether a message by the fsck machinery constitutes an error or a warning, let's specify an identifier relating to the concrete problem that was encountered. This is necessary for upcoming support to be able to demote certain errors to warnings. In the process, simplify the requirements on the calling code: instead of having to handle full-blown varargs in every callback, we now send a string buffer ready to be used by the callback. We could use a simple enum for the message IDs here, but we want to guarantee that the enum values are associated with the appropriate message types (i.e. error or warning?). Besides, we want to introduce a parser in the next commit that maps the string representation to the enum value, hence we use the slightly ugly preprocessor construct that is extensible for use with said parser. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:24:27 -07:00
Johannes Schindelin	22410549fc	fsck: introduce fsck options Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:23:32 -07:00
Junio C Hamano	1cb4b3d380	Merge branch 'js/fsck-tag-validation' New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message	2014-12-22 12:27:41 -08:00
Junio C Hamano	77933f4449	Sync with v2.1.4 * maint-2.1: Git 2.1.4 Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:46:57 -08:00
Junio C Hamano	58f1d950e3	Sync with v2.0.5 * maint-2.0: Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:42:28 -08:00
Junio C Hamano	5e519fb8b0	Sync with v1.9.5 * maint-1.9: Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:28:54 -08:00
Junio C Hamano	6898b79721	Sync with v1.8.5.6 * maint-1.8.5: Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:20:31 -08:00
Johannes Schindelin	d08c13b947	fsck: complain about NTFS ".git" aliases in trees Now that the index can block pathnames that can be mistaken to mean ".git" on NTFS and FAT32, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectNTFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about NTFS. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on NTFS themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git or git~1, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectNTFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	a18fcc9ff2	fsck: complain about HFS+ ".git" aliases in trees Now that the index can block pathnames that case-fold to ".git" on HFS+, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectHFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about HFS+. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on HFS+ themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git with invisible Unicode code-points mixed in, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectHFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	76e86fc6e3	fsck: notice .git case-insensitively We complain about ".git" in a tree because it cannot be loaded into the index or checked out. Since we now also reject ".GIT" case-insensitively, fsck should notice the same, so that errors do not propagate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	7add441984	fsck: properly bound "invalid tag name" error message When we detect an invalid tag-name header in a tag object, like, "tag foo bar\n", we feed the pointer starting at "foo bar" to a printf "%s" formatter. This shows the name, as we want, but then it keeps printing the rest of the tag buffer, rather than stopping at the end of the line. Our tests did not notice because they look only for the matching line, but the bug is that we print much more than we wanted to. So we also adjust the test to be more exact. Note that when fscking tags with "index-pack --strict", this is even worse. index-pack does not add a trailing NUL-terminator after the object, so we may actually read past the buffer and print uninitialized memory. Running t5302 with valgrind does notice the bug for that reason. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-09 11:54:25 -08:00
Johannes Schindelin	cec097be3a	fsck: check tag objects' headers We inspect commit objects pretty much in detail in git-fsck, but we just glanced over the tag objects. Let's be stricter. Since we do not want to limit 'tag' lines unduly, values that would fail the refname check only result in warnings, not errors. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 10:44:26 -07:00
Johannes Schindelin	4d0d89755e	Make sure fsck_commit_buffer() does not run out of the buffer So far, we assumed that the buffer is NUL terminated, but this is not a safe assumption, now that we opened the fsck_object() API to pass a buffer directly. So let's make sure that there is at least an empty line in the buffer. That way, our checks would fail if the empty line was encountered prematurely, and consequently we can get away with the current string comparisons even with non-NUL-terminated buffers are passed to fsck_object(). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 10:44:01 -07:00
Johannes Schindelin	90a398bbd7	fsck_object(): allow passing object data separately from the object itself When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 13:54:21 -07:00
René Scharfe	9d02150cf4	fsck: simplify fsck_commit_buffer() by using commit_list_count() fsck_commit_buffer() checks that the number of items in the parents list of a commit matches the number of parent lines in its buffer or -- if a graft is used -- the number of parents in that graft. Simplify the code by using commit_list_count() instead of counting by hand. Also use different variables for the number of lines and the number of list items, making it easier to compare them. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-10 14:10:27 -07:00
Junio C Hamano	e91ae32a01	Merge branch 'jk/skip-prefix' * jk/skip-prefix: http-push: refactor parsing of remote object names imap-send: use skip_prefix instead of using magic numbers use skip_prefix to avoid repeated calculations git: avoid magic number with skip_prefix fetch-pack: refactor parsing in get_ack fast-import: refactor parsing of spaces stat_opt: check extra strlen call daemon: use skip_prefix to avoid magic numbers fast-import: use skip_prefix for parsing input use skip_prefix to avoid repeating strings use skip_prefix to avoid magic numbers transport-helper: avoid reading past end-of-string fast-import: fix read of uninitialized argv memory apply: use skip_prefix instead of raw addition refactor skip_prefix to return a boolean avoid using skip_prefix as a boolean daemon: mark some strings as const parse_diff_color_slot: drop ofs parameter	2014-07-09 11:33:28 -07:00
Jeff King	cf4fff579e	refactor skip_prefix to return a boolean The skip_prefix() function returns a pointer to the content past the prefix, or NULL if the prefix was not found. While this is nice and simple, in practice it makes it hard to use for two reasons: 1. When you want to conditionally skip or keep the string as-is, you have to introduce a temporary variable. For example: tmp = skip_prefix(buf, "foo"); if (tmp) buf = tmp; 2. It is verbose to check the outcome in a conditional, as you need extra parentheses to silence compiler warnings. For example: if ((cp = skip_prefix(buf, "foo")) /* do something with cp */ Both of these make it harder to use for long if-chains, and we tend to use starts_with() instead. However, the first line of "do something" is often to then skip forward in buf past the prefix, either using a magic constant or with an extra strlen(3) (which is generally computed at compile time, but means we are repeating ourselves). This patch refactors skip_prefix() to return a simple boolean, and to provide the pointer value as an out-parameter. If the prefix is not found, the out-parameter is untouched. This lets you write: if (skip_prefix(arg, "foo ", &arg)) do_foo(arg); else if (skip_prefix(arg, "bar ", &arg)) do_bar(arg); Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-20 10:44:43 -07:00
Jeff King	8597ea3afe	commit: record buffer length in cache Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-13 12:09:38 -07:00
Jeff King	bc6b8fc130	use get_commit_buffer everywhere Each of these sites assumes that commit->buffer is valid. Since they would segfault if this was not the case, they are likely to be correct in practice. However, we can future-proof them by using get_commit_buffer. And as a side effect, we abstract away the final bare uses of commit->buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-13 12:08:17 -07:00
Junio C Hamano	f7804e250d	Merge branch 'hs/simplify-bit-setting-in-fsck-tree' * hs/simplify-bit-setting-in-fsck-tree: fsck: use bitwise-or assignment operator to set flag	2014-03-31 16:30:44 -07:00
Junio C Hamano	40adf520a3	Merge branch 'ys/fsck-commit-parsing' * ys/fsck-commit-parsing: fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant fsck.c:fsck_ident(): ident points at a const string	2014-03-28 13:51:24 -07:00
Hiroyuki Sano	effd12ec87	fsck: use bitwise-or assignment operator to set flag fsck_tree() has two different ways to set a flag variable, either by using a if-statement that guards an assignment, or by using a bitwise-or assignment operator. Most are done with the former, and only one variable is assigned with the latter. Since all the conditions are short-and-sweet, we can afford to uniformly use the latter style, which makes the resulting code shorter and easier to read. Signed-off-by: Hiroyuki Sano <sh19910711@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 11:20:48 -07:00
Yuxuan Shui	2d820a61df	fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant fsck_commit() uses memcmp() to check if the buffer starts with a certain prefix, and skips the prefix if it does. This is exactly what skip_prefix() was designed for. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-19 15:34:56 -07:00
Yuxuan Shui	de42180f6a	fsck.c:fsck_ident(): ident points at a const string Since fsck_ident doesn't change the content of ident, the type of ident could be const char . This change is required to rewrite fsck_commit() to use skip_prefix(). Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-13 13:00:30 -07:00
Jeff King	7ca36d9398	date: check date overflow against time_t When we check whether a timestamp has overflowed, we check only against ULONG_MAX, meaning that strtoul has overflowed. However, we also feed these timestamps to system functions like gmtime, which expect a time_t. On many systems, time_t is actually smaller than "unsigned long" (e.g., because it is signed), and we would overflow when using these functions. We don't know the actual size or signedness of time_t, but we can easily check for truncation with a simple assignment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	d4b8de0420	fsck: report integer overflow in author timestamps When we check commit objects, we complain if commit->date is ULONG_MAX, which is an indication that we saw integer overflow when parsing it. However, we do not do any check at all for author lines, which also contain a timestamp. Let's actually check the timestamps on each ident line with strtoul. This catches both author and committer lines, and we can get rid of the now-redundant commit->date check. Note that like the existing check, we compare only against ULONG_MAX. Now that we are calling strtoul at the site of the check, we could be slightly more careful and also check that errno is set to ERANGE. However, this will make further refactoring in future patches a little harder, and it doesn't really matter in practice. For 32-bit systems, one would have to create a commit at the exact wrong second in 2038. But by the time we get close to that, all systems will hopefully have moved to 64-bit (and if they haven't, they have a real problem one second later). For 64-bit systems, by the time we get close to ULONG_MAX, all systems will hopefully have been consumed in the fiery wrath of our expanding Sun. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	5c17f51270	fsck: warn about ".git" in trees Having a ".git" entry inside a tree can cause confusing results on checkout. At the top-level, you could not checkout such a tree, as it would complain about overwriting the real ".git" directory. In a subdirectory, you might check it out, but performing operations in the subdirectory would confusingly consider the in-tree ".git" directory as the repository. The regular git tools already make it hard to accidentally add such an entry to a tree, and do not allow such entries to enter the index at all. Teaching fsck about it provides an additional safety check, and let's us avoid propagating any such bogosity when transfer.fsckObjects is on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 13:52:54 -08:00
Jeff King	5d34a4359d	fsck: warn about '.' and '..' in trees A tree with meta-paths like '.' or '..' does not work well with git; the index will refuse to load it or check it out to the filesystem (and even if we did not have that safety, it would look like we were overwriting an untracked directory). For the same reason, it is difficult to create such a tree with regular git. Let's warn about these dubious entries during fsck, just in case somebody has created a bogus tree (and this also lets us prevent them from propagating when transfer.fsckObjects is set). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 10:41:08 -08:00
Junio C Hamano	03adeeaad6	Merge branch 'jk/maint-null-in-trees' into maint-1.7.11 "git diff" had a confusion between taking data from a path in the working tree and taking data from an object that happens to have name 0{40} recorded in a tree. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value	2012-09-10 15:24:54 -07:00
Jeff King	c479d14a80	fsck: detect null sha1 in tree entries Short of somebody happening to beat the 1 in 2^160 odds of actually generating content that hashes to the null sha1, we should never see this value in a tree entry. So let's have fsck warn if it it seen. As in the previous commit, we test both blob and submodule entries to future-proof the test suite against the implementation depending on connectivity to notice the error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 15:14:08 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Dmitry Ivankov	53f53cff24	fsck: improve committer/author check fsck allows a name with > character in it like "name> <email>". Also for "name email>" fsck says "missing space before email". More precisely, it seeks for a first '<', checks that ' ' preceeds it. Then seeks to '<' or '>' and checks that it is the '>'. Missing space is reported if either '<' is not found or it's not preceeded with ' '. Change it to following. Seek to '<' or '>', check that it is '<' and is preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now "name> <email>" is rejected as "bad name". More strict name check is the only change in what is accepted. Report 'missing space' only if '<' is found and is not preceeded with a space. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 12:21:07 -07:00
Junio C Hamano	5b42477b59	Merge branch 'jm/maint-misc-fix' into maint * jm/maint-misc-fix: read_gitfile_gently: use ssize_t to hold read result remove tests of always-false condition rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'	2011-05-30 00:09:41 -07:00
Jim Meyering	5dd564895e	remove tests of always-false condition * fsck.c (fsck_error_function): Don't test obj->sha1 == 0. It can never be true, since that sha1 member is an array. * transport.c (set_upstreams): Likewise for ref->new_sha1. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 11:24:24 -07:00
Jeff King	ebeb60900f	strbuf: add strbuf_vaddf In a variable-args function, the code for writing into a strbuf is non-trivial. We ended up cutting and pasting it in several places because there was no vprintf-style function for strbufs (which in turn was held up by a lack of va_copy). Now that we have a fallback va_copy, we can add strbuf_vaddf, the strbuf equivalent of vsprintf. And we can clean up the cut and paste mess. Signed-off-by: Jeff King <peff@peff.net> Improved-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-26 01:06:50 -08:00
Jonathan Nieder	0adc6a3d49	fsck: fix bogus commit header check `daae1922` (fsck: check ident lines in commit objects, 2010-04-24) taught fsck to expect commit objects to have the form tree <object name> <parents> author <valid ident string> committer <valid ident string> log message The check is overly strict: for example, it errors out with the message “expected blank line” for perfectly valid commits with an "encoding ISO-8859-1" line. Later it might make sense to teach fsck about the rest of the header and warn about unrecognized header lines, but for simplicity, let’s accept arbitrary trailing lines for now. Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-28 15:08:27 -07:00
Jonathan Nieder	daae19224a	fsck: check ident lines in commit objects Check that email addresses do not contain <, >, or newline so they can be quickly scanned without trouble. The copy() function in ident.c already ensures that ordinary git commands will not write email addresses without this property. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-01 12:15:06 -07:00
Eric Wong	f290974390	Allow the Unix epoch to be a valid commit date It is common practice to use the Unix epoch as a fallback date when a suitable date is not available. This is true of git svn and possibly other importing tools that import non-git history into git. Instead of clobbering established strtoul() error reporting semantics with our own, preserve the strtoul() error value of ULONG_MAX for fsck.c to handle. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-07-06 09:36:16 -07:00
Benjamin Kramer	eb3a9dd327	Remove unused function scope local variables These variables were unused and can be removed safely: builtin-clone.c::cmd_clone(): use_local_hardlinks, use_separate_remote builtin-fetch-pack.c::find_common(): len builtin-remote.c::mv(): symref diff.c::show_stats():show_stats(): total diffcore-break.c::should_break(): base_size fast-import.c::validate_raw_date(): date, sign fsck.c::fsck_tree(): o_sha1, sha1 xdiff-interface.c::parse_num(): read_some Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-07 20:52:17 -08:00
Junio C Hamano	8bb4646dae	Merge branch 'maint' * maint: Fix non-literal format in printf-style calls git-submodule: Avoid printing a spurious message. git ls-remote: make usage string match manpage Makefile: help people who run 'make check' by mistake	2008-11-11 14:49:50 -08:00
Daniel Lowe	9db56f71b9	Fix non-literal format in printf-style calls These were found using gcc 4.3.2-1ubuntu11 with the warning: warning: format not a string literal and no format arguments Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-11-11 14:43:59 -08:00
Brandon Casey	f285a2d7ed	Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2008-10-12 12:36:19 -07:00
Junio C Hamano	79b1138e78	fsck.c: fix bogus "empty tree" check `ba002f3` (builtin-fsck: move common object checking code to fsck.c) did more than what it claimed to. Most notably, it wrongly made an empty tree object an error by pretending to only move code from fsck_tree() in builtin-fsck.c to fsck_tree() in fsck.c, but in fact adding a bogus check to barf on an empty tree. An empty tree object is _unusual_. Recent porcelains try reasonably hard not to let the user create a commit that contains such a tree. Perhaps warning about them in git-fsck may have some merit. HOWEVER. Being unusual and being errorneous are two quite different things. This is especially true now we seem to use the same fsck_$object() code in places other than git-fsck itself. For example, receive-pack should not reject unusual objects, even if it would be a good idea to tighten it to reject incorrect ones. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-03-05 10:32:01 -08:00
Martin Koegler	d6ffc8d784	add common fsck error printing function Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:35 -08:00
Martin Koegler	ba002f3b28	builtin-fsck: move common object checking code to fsck.c Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:35 -08:00
Martin Koegler	355885d531	add generic, type aware object chain walker The requirements are: * it may not crash on NULL pointers * a callback function is needed, as index-pack/unpack-objects need to do different things * the type information is needed to check the expected <-> real type and print better error messages Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:34 -08:00
Mark Wooding	b4dfefe00f	Make fsck and fsck-objects be builtins. The earlier change `df391b192` to rename fsck-objects to fsck broke fsck-objects. This should fix it again. Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-01-29 09:36:21 -08:00
Junio C Hamano	df391b192d	git-fsck-objects is now synonym to git-fsck Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-01-28 16:33:58 -08:00

... 2 3 4 5 6

253 Commits