mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-26 11:34:00 +08:00

Author	SHA1	Message	Date
Jeff King	13e0b0d3dc	use_pack: handle signed off_t overflow A v2 pack index file can specify an offset within a packfile of up to 2^64-1 bytes. On a system with a signed 64-bit off_t, we can represent only up to 2^63-1. This means that a corrupted .idx file can end up with a negative offset in the pack code. Our bounds-checking use_pack function looks for too-large offsets, but not for ones that have wrapped around to negative. Let's do so, which fixes an out-of-bounds access demonstrated in t5313. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-25 11:32:46 -08:00
Jeff King	47fe3f6ef0	nth_packed_object_offset: bounds-check extended offset If a pack .idx file has a corrupted offset for an object, we may try to access an offset in the .idx or .pack file that is larger than the file's size. For the .pack case, we have use_pack() to protect us, which realizes the access is out of bounds. But if the corrupted value asks us to look in the .idx file's secondary 64-bit offset table, we blindly add it to the mmap'd index data and access arbitrary memory. We can fix this with a simple bounds-check compared to the size we found when we opened the .idx file. Note that there's similar code in index-pack that is triggered only during "index-pack --verify". To support both, we pull the bounds-check into a separate function, which dies when it sees a corrupted file. It would be nice if we could return an error, so that the pack code could try to find a good copy of the object elsewhere. Currently nth_packed_object_offset doesn't have any way to return an error, but it could probably use "0" as a sentinel value (since no object can start there). This is the minimal fix, and we can improve the resilience later on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-25 11:32:43 -08:00
Jeff King	50a6c8efa2	use st_add and st_mult for allocation size computation If our size computation overflows size_t, we may allocate a much smaller buffer than we expected and overflow it. It's probably impossible to trigger an overflow in most of these sites in practice, but it is easy enough convert their additions and multiplications into overflow-checking variants. This may be fixing real bugs, and it makes auditing the code easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
Jeff King	b32fa95fd8	convert trivial cases to ALLOC_ARRAY Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
Junio C Hamano	3f16396228	clone/sha1_file: read info/alternates with strbuf_getline() $GIT_OBJECT_DIRECTORY/info/alternates is a text file that can be edited with a DOS editor. We do not want to use the real path with CR appended at the end. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:34:53 -08:00
Junio C Hamano	8f309aeb82	strbuf: introduce strbuf_getline_{lf,nul}() The strbuf_getline() interface allows a byte other than LF or NUL as the line terminator, but this is only because I wrote these codepaths anticipating that there might be a value other than NUL and LF that could be useful when I introduced line_termination long time ago. No useful caller that uses other value has emerged. By now, it is clear that the interface is overly broad without a good reason. Many codepaths have hardcoded preference to read either LF terminated or NUL terminated records from their input, and then call strbuf_getline() with LF or NUL as the third parameter. This step introduces two thin wrappers around strbuf_getline(), namely, strbuf_getline_lf() and strbuf_getline_nul(), and mechanically rewrites these call sites to call either one of them. The changes contained in this patch are: * introduction of these two functions in strbuf.[ch] * mechanical conversion of all callers to strbuf_getline() with either '\n' or '\0' as the third parameter to instead call the respective thin wrapper. After this step, output from "git grep 'strbuf_getline('" would become a lot smaller. An interim goal of this series is to make this an empty set, so that we can have strbuf_getline_crlf() take over the shorter name strbuf_getline(). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-01-15 10:12:51 -08:00
Junio C Hamano	fbe959dde7	Merge branch 'bc/format-patch-null-from-line' "format-patch" has learned a new option to zero-out the commit object name on the mbox "From " line. * bc/format-patch-null-from-line: format-patch: check that header line has expected format format-patch: add an option to suppress commit hash sha1_file.c: introduce a null_oid constant	2015-12-21 10:59:08 -08:00
Junio C Hamano	897b18508b	Merge branch 'jk/prune-mtime' The helper used to iterate over loose object directories to prune stale objects did not closedir() immediately when it is done with a directory--a callback such as the one used for "git prune" may want to do rmdir(), but it would fail on open directory on platforms such as WinXP. * jk/prune-mtime: prune: close directory earlier during loose-object directory traversal	2015-12-15 08:02:16 -08:00
brian m. carlson	3e56e7245c	sha1_file.c: introduce a null_oid constant null_oid is the struct object_id equivalent to null_sha1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-12-14 13:35:54 -08:00
brian m. carlson	b419aa25d5	sha1_file: introduce has_object_file helper. Add has_object_file, which is a wrapper around has_sha1_file, but for struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
Jeff King	45014beac0	Merge branch 'dk/gc-idx-wo-pack' Having a leftover .idx file without corresponding .pack file in the repository hurts performance; "git gc" learned to prune them. * dk/gc-idx-wo-pack: gc: remove garbage .idx files from pack dir t5304: test cleaning pack garbage prepare_packed_git(): refactor garbage reporting in pack directory	2015-11-20 06:55:34 -05:00
Junio C Hamano	808d119263	Merge branch 'js/misc-fixes' Various compilation fixes and squelching of warnings. * js/misc-fixes: Correct fscanf formatting string for I64u values Silence GCC's "cast of pointer to integer of a different size" warning Squelch warning about an integer overflow	2015-10-30 13:07:00 -07:00
Johannes Schindelin	56a1a3ab44	Silence GCC's "cast of pointer to integer of a different size" warning When calculating hashes from pointers, it actually makes sense to cut off the most significant bits. In that case, said warning does not make a whole lot of sense. So let's just work around it by casting the pointer first to intptr_t and then casting up/down to the final integral type. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-26 13:24:03 -07:00
Junio C Hamano	78891795df	Merge branch 'jk/war-on-sprintf' Many allocations that is manually counted (correctly) that are followed by strcpy/sprintf have been replaced with a less error prone constructs such as xstrfmt. Macintosh-specific breakage was noticed and corrected in this reroll. * jk/war-on-sprintf: (70 commits) name-rev: use strip_suffix to avoid magic numbers use strbuf_complete to conditionally append slash fsck: use for_each_loose_file_in_objdir Makefile: drop D_INO_IN_DIRENT build knob fsck: drop inode-sorting code convert strncpy to memcpy notes: document length of fanout path with a constant color: add color_set helper for copying raw colors prefer memcpy to strcpy help: clean up kfmclient munging receive-pack: simplify keep_arg computation avoid sprintf and strcpy with flex arrays use alloc_ref rather than hand-allocating "struct ref" color: add overflow checks for parsing colors drop strcpy in favor of raw sha1_to_hex use sha1_to_hex_r() instead of strcpy daemon: use cld->env_array when re-spawning stat_tracking_info: convert to argv_array http-push: use an argv_array for setup_revisions fetch-pack: use argv_array for index-pack / unpack-objects ...	2015-10-20 15:24:01 -07:00
Junio C Hamano	db5adf24bf	Merge branch 'js/clone-dissociate' "git clone --dissociate" runs a big "git repack" process at the end, and it helps to close file descriptors that are open on the packs and their idx files before doing so on filesystems that cannot remove a file that is still open. * js/clone-dissociate: clone --dissociate: avoid locking pack files sha1_file.c: add a function to release all packs sha1_file: consolidate code to close a pack's file descriptor t5700: demonstrate a Windows file locking issue with `git clone --dissociate`	2015-10-15 15:43:49 -07:00
Johannes Schindelin	38849a8116	sha1_file.c: add a function to release all packs On Windows, files that are in use cannot be removed or renamed. That means that we have to release pack files when we are about to, say, repack them. Let's introduce a convenient function to close all the pack files and their idx files. While at it, we consolidate the close windows/close fd/close index stanza in `free_pack_by_name()` into the `close_pack()` function that is used by the new `close_all_packs()` function to avoid repeated code. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-07 10:47:10 -07:00
Johannes Schindelin	71fe5d7fb0	sha1_file: consolidate code to close a pack's file descriptor There was a lot of repeated code to close the file descriptor of a given pack. Let's just refactor this code into a single function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-05 14:43:58 -07:00
Jeff King	c7ab0ba340	avoid sprintf and strcpy with flex arrays When we are allocating a struct with a FLEX_ARRAY member, we generally compute the size of the array and then sprintf or strcpy into it. Normally we could improve a dynamic allocation like this by using xstrfmt, but it doesn't work here; we have to account for the size of the rest of the struct. But we can improve things a bit by storing the length that we use for the allocation, and then feeding it to xsnprintf or memcpy, which makes it more obvious that we are not writing more than the allocated number of bytes. It would be nice if we had some kind of helper for allocating generic flex arrays, but it doesn't work that well: - the call signature is a little bit unwieldy: d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...); You need offsetof here instead of just writing to the end of the base size, because we don't know how the struct is packed (partially this is because FLEX_ARRAY might not be zero, though we can account for that; but the size of the struct may actually be rounded up for alignment, and we can't know that). - some sites do clever things, like over-allocating because they know they will write larger things into the buffer later (e.g., struct packed_git here). So we're better off to just write out each allocation (or add type-specific helpers, though many of these are one-off allocations anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-05 11:08:05 -07:00
Jeff King	d4b3d11a03	write_loose_object: convert to strbuf When creating a loose object tempfile, we use a fixed PATH_MAX-sized buffer, and strcpy directly into it. This isn't buggy, because we do a rough check of the size, but there's no verification that our guesstimate of the required space is enough (in fact, it's several bytes too big for the current naming scheme). Let's switch to a strbuf, which makes this much easier to verify. The allocation overhead should be negligible, since we are replacing a static buffer with a static strbuf, and we'll only need to allocate on the first call. While we're here, we can also document a subtle interaction with mkstemp that would be easy to overlook. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-10-05 11:08:05 -07:00
Jeff King	ac5190cc48	sha1_get_pack_name: use a strbuf We do some manual memory computation here, and there's no check that our 60 is not overflowed by the raw sprintf (it isn't, because the "which" parameter is never longer than "pack"). We can simplify this greatly with a strbuf. Technically the end result is not identical, as the original took care not to rewrite the object directory on each call for performance reasons. We could do that here, too (by saving the baselen and resetting to it), but it's not worth the complexity; this function is not called a lot (generally once per packfile that we open). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	9ae97018fb	use strip_suffix and xstrfmt to replace suffix When we want to convert "foo.pack" to "foo.idx", we do it by duplicating the original string and then munging the bytes in place. Let's use strip_suffix and xstrfmt instead, which has several advantages: 1. It's more clear what the intent is. 2. It does not implicitly rely on the fact that strlen(".idx") <= strlen(".pack") to avoid an overflow. 3. We communicate the assumption that the input file ends with ".pack" (and get a run-time check that this is so). 4. We drop calls to strcpy, which makes auditing the code base easier. Likewise, we can do this to convert ".pack" to ".bitmap", avoiding some manual memory computation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	48bcc1c3cc	add_packed_git: convert strcpy into xsnprintf We have the path "foo.idx", and we create a buffer big enough to hold "foo.pack" and "foo.keep", and then strcpy straight into it. This isn't a bug (we have enough space), but it's very hard to tell from the strcpy that this is so. Let's instead use strip_suffix to take off the ".idx", record the size of our allocation, and use xsnprintf to make sure we don't violate our assumptions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	ef1286d3c0	use xsnprintf for generating git object headers We generally use 32-byte buffers to format git's "type size" header fields. These should not generally overflow unless you can produce some truly gigantic objects (and our types come from our internal array of constant strings). But it is a good idea to use xsnprintf to make sure this is the case. Note that we slightly modify the interface to write_sha1_file_prepare, which nows uses "hdrlen" as an "in" parameter as well as an "out" (on the way in it stores the allocated size of the header, and on the way out it returns the ultimate size of the header). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Junio C Hamano	f0bc854623	Sync with 2.5.2	2015-09-09 14:30:35 -07:00
Junio C Hamano	3d3caf0b78	Sync with 2.4.9	2015-09-04 10:43:23 -07:00
Junio C Hamano	ef0e938a1a	Sync with 2.3.9	2015-09-04 10:34:19 -07:00
Junio C Hamano	8267cd11d6	Sync with 2.2.3	2015-09-04 10:29:28 -07:00
Jeff King	5015f01c12	read_info_alternates: handle paths larger than PATH_MAX This function assumes that the relative_base path passed into it is no larger than PATH_MAX, and writes into a fixed-size buffer. However, this path may not have actually come from the filesystem; for example, add_submodule_odb generates a path using a strbuf and passes it in. This is hard to trigger in practice, though, because the long submodule directory would have to exist on disk before we would try to open its info/alternates file. We can easily avoid the bug, though, by simply creating the filename on the heap. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-04 09:36:51 -07:00
Junio C Hamano	cbcd3dcaa8	Merge branch 'cb/open-noatime-clear-errno' into maint When trying to see that an object does not exist, a state errno leaked from our "first try to open a packfile with O_NOATIME and then if it fails retry without it" logic on a system that refuses O_NOATIME. This confused us and caused us to die, saying that the packfile is unreadable, when we should have just reported that the object does not exist in that packfile to the caller. * cb/open-noatime-clear-errno: git_open_noatime: return with errno=0 on success	2015-09-03 19:17:49 -07:00
Junio C Hamano	9b8d731995	Merge branch 'cb/open-noatime-clear-errno' When trying to see that an object does not exist, a state errno leaked from our "first try to open a packfile with O_NOATIME and then if it fails retry without it" logic on a system that refuses O_NOATIME. This confused us and caused us to die, saying that the packfile is unreadable, when we should have just reported that the object does not exist in that packfile to the caller. * cb/open-noatime-clear-errno: git_open_noatime: return with errno=0 on success	2015-08-25 14:57:10 -07:00
Junio C Hamano	8c9155e031	Merge branch 'jk/git-path' git_path() and mkpath() are handy helper functions but it is easy to misuse, as the callers need to be careful to keep the number of active results below 4. Their uses have been reduced. * jk/git-path: memoize common git-path "constant" files get_repo_path: refactor path-allocation find_hook: keep our own static buffer refs.c: remove_empty_directories can take a strbuf refs.c: avoid git_path assignment in lock_ref_sha1_basic refs.c: avoid repeated git_path calls in rename_tmp_log refs.c: simplify strbufs in reflog setup and writing path.c: drop git_path_submodule refs.c: remove extra git_path calls from read_loose_refs remote.c: drop extraneous local variable from migrate_file prefer mkpathdup to mkpath in assignments prefer git_pathdup to git_path in some possibly-dangerous cases add_to_alternates_file: don't add duplicate entries t5700: modernize style cache.h: complete set of git_path_submodule helpers cache.h: clarify documentation for git_path, et al	2015-08-19 14:48:56 -07:00
Junio C Hamano	51a22ce147	Merge branch 'jc/finalize-temp-file' Long overdue micro clean-up. * jc/finalize-temp-file: sha1_file.c: rename move_temp_to_file() to finalize_object_file()	2015-08-19 14:48:55 -07:00
Junio C Hamano	0a489b0680	prepare_packed_git(): refactor garbage reporting in pack directory The hook to report "garbage" files in $GIT_OBJECT_DIRECTORY/pack/ could be generic but is too specific to count-object's needs. Move the part to produce human-readable messages to count-objects, and refine the interface to callback with the "bits" with values defined in the cache.h header file, so that other callers (e.g. prune) can later use the same mechanism to enumerate different kinds of garbage files and do something intelligent about them, other than reporting in textual messages. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-17 09:14:59 -07:00
Clemens Buchacher	dff6f280df	git_open_noatime: return with errno=0 on success In read_sha1_file_extended we die if read_object fails with a fatal error. We detect a fatal error if errno is non-zero and is not ENOENT. If the object could not be read because it does not exist, this is not considered a fatal error and we want to return NULL. Somewhere down the line, read_object calls git_open_noatime to open a pack index file, for example. We first try open with O_NOATIME. If O_NOATIME fails with EPERM, we retry without O_NOATIME. When the second open succeeds, errno is however still set to EPERM from the first attempt. When we finally determine that the object does not exist, read_object returns NULL and read_sha1_file_extended dies with a fatal error: fatal: failed to read object <sha1>: Operation not permitted Fix this by resetting errno to zero before we call open again. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Clemens Buchacher <clemens.buchacher@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-12 13:56:19 -07:00
Johannes Sixt	094c7e6352	prune: close directory earlier during loose-object directory traversal `27e1e22d` (prune: factor out loose-object directory traversal, 2014-10-16) introduced a new function for_each_loose_file_in_objdir() with a helper for_each_file_in_obj_subdir(). The latter calls callbacks for each file found during a directory traversal and finally also a callback for the directory itself. git-prune uses the function to clean up the object directory. In particular, in the directory callback it calls rmdir(). On Windows XP, this rmdir call fails, because the directory is still open while the callback is called. Close the directory before calling the callback. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-12 12:06:00 -07:00
Jeff King	77b9b1d13a	add_to_alternates_file: don't add duplicate entries The add_to_alternates_file function blindly uses hold_lock_file_for_append to copy the existing contents, and then adds the new line to it. This has two minor problems: 1. We might add duplicate entries, which are ugly and inefficient. 2. We do not check that the file ends with a newline, in which case we would bogusly append to the final line. This is quite unlikely in practice, though, as we call this function only from git-clone, so presumably we are the only writers of the file (and we always add a newline). Instead of using hold_lock_file_for_append, let's copy the file line by line, which ensures all records are properly terminated. If we see an extra line, we can simply abort the update (there is no point in even copying the rest, as we know that it would be identical to the original). As a bonus, we also get rid of some calls to the static-buffer mkpath and git_path functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-10 15:15:42 -07:00
Junio C Hamano	cb5add5868	sha1_file.c: rename move_temp_to_file() to finalize_object_file() Since `5a688fe4` ("core.sharedrepository = 0mode" should set, not loosen, 2009-03-25), we kept reminding ourselves: NEEDSWORK: this should be renamed to finalize_temp_file() as "moving" is only a part of what it does, when no patch between master to pu changes the call sites of this function. without doing anything about it. Let's do so. The purpose of this function was not to move but to finalize. The detail of the primarily implementation of finalizing was to link the temporary file to its final name and then to unlink, which wasn't even "moving". The alternative implementation did "move" by calling rename(2), which is a fun tangent. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-08-10 11:10:37 -07:00
Junio C Hamano	7c696007ca	Merge branch 'jk/fix-refresh-utime' into maint Fix a small bug in our use of umask() return value. * jk/fix-refresh-utime: check_and_freshen_file: fix reversed success-check	2015-07-27 12:21:40 -07:00
Junio C Hamano	de62fe8c42	Merge branch 'jk/index-pack-reduce-recheck' into maint Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory	2015-07-27 12:21:38 -07:00
Junio C Hamano	1f9e0a5348	Merge branch 'jk/fix-refresh-utime' Fix a small bug in our use of umask() return value. * jk/fix-refresh-utime: check_and_freshen_file: fix reversed success-check	2015-07-10 14:26:15 -07:00
Junio C Hamano	c07173f215	Merge branch 'jk/maint-for-each-packed-object' The for_each_packed_object() API function did not iterate over objects in a packfile that hasn't been used yet. * jk/maint-for-each-packed-object: for_each_packed_object: automatically open pack index	2015-07-09 14:31:43 -07:00
Jeff King	3096b2ecdb	check_and_freshen_file: fix reversed success-check When we want to write out a loose object file, we have always first made sure we don't already have the object somewhere. Since `33d4221` (write_sha1_file: freshen existing objects, 2014-10-15), we also update the timestamp on the file, so that a simultaneous prune knows somebody is likely to reference it soon. If our utime() call fails, we treat this the same as not having the object in the first place; the safe thing to do is write out another copy. However, the loose-object check accidentally inverts the utime() check; it returns failure _only_ when the utime() call actually succeeded. Thus it was failing to protect us there, and in the normal case where utime() succeeds, it caused us to pointlessly write out and link the object. This passed our freshening tests, because writing out the new object is certainly _one_ way of updating its utime. So the normal case was inefficient, but not wrong. While we're here, let's also drop a comment in front of the check_and_freshen functions, making a note of their return type (since it is not our usual "0 for success, -1 for error"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-07-08 15:58:28 -07:00
Junio C Hamano	c5baf18a40	Merge branch 'jk/diagnose-config-mmap-failure' into maint The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?". * jk/diagnose-config-mmap-failure: xmmap(): drop "Out of memory?" config.c: rewrite ENODEV into EISDIR when mmap fails config.c: avoid xmmap error messages config.c: fix mmap leak when writing config read-cache.c: drop PROT_WRITE from mmap of index	2015-06-25 11:02:11 -07:00
Junio C Hamano	712b351bd3	Merge branch 'jk/index-pack-reduce-recheck' Disable "have we lost a race with competing repack?" check while receiving a huge object transfer that runs index-pack. * jk/index-pack-reduce-recheck: index-pack: avoid excessive re-reading of pack directory	2015-06-24 12:21:54 -07:00
Jeff King	f813e9ea5f	for_each_packed_object: automatically open pack index When for_each_packed_object is called, we call prepare_packed_git() to make sure we have the actual list of packs. But the latter does not actually open the pack indices, meaning that pack->nr_objects may simply be 0 if the pack has not otherwise been used since the program started. In practice, this didn't come up for the current callers, because they iterate the packed objects only after iterating all reachable objects (so for it to matter you would have to have a pack consisting only of unreachable objects). But it is a dangerous and confusing interface that should be fixed for future callers. Note that we do not end the iteration when a pack cannot be opened, but we do return an error. That lets you complete the iteration even in actively-repacked repository where an .idx file may racily go away, but it also lets callers know that they may not have gotten the complete list (which the current reachability-check caller does care about). We have to tweak one of the prune tests due to the changed return value; an earlier test creates bogus .idx files and does not clean them up. Having to make this tweak is a good thing; it means we will not prune in a broken repository, and the test confirms that we do not negatively impact a more lenient caller, count-objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:53:58 -07:00
Junio C Hamano	070d276cc1	Merge branch 'jh/filter-empty-contents' into maint The clean/smudge interface did not work well when filtering an empty contents (failed and then passed the empty input through). It can be argued that a filter that produces anything but empty for an empty input is nonsense, but if the user wants to do strange things, then why not? * jh/filter-empty-contents: sha1_file: pass empty buffer to index empty file	2015-06-16 14:33:44 -07:00
Junio C Hamano	dee47925c1	Merge branch 'jk/diagnose-config-mmap-failure' The configuration reader/writer uses mmap(2) interface to access the files; when we find a directory, it barfed with "Out of memory?". * jk/diagnose-config-mmap-failure: xmmap(): drop "Out of memory?" config.c: rewrite ENODEV into EISDIR when mmap fails config.c: avoid xmmap error messages config.c: fix mmap leak when writing config read-cache.c: drop PROT_WRITE from mmap of index	2015-06-11 09:29:55 -07:00
Jeff King	0eeb077be7	index-pack: avoid excessive re-reading of pack directory Since `45e8a74` (has_sha1_file: re-check pack directory before giving up, 2013-08-30), we spend extra effort for has_sha1_file to give the right answer when somebody else is repacking. Usually this effort does not matter, because after finding that the object does not exist, the next step is usually to die(). However, some code paths make a large number of has_sha1_file checks which are _not_ expected to return 1. The collision test in index-pack.c is such a case. On a local system, this can cause a performance slowdown of around 5%. But on a system with high-latency system calls (like NFS), it can be much worse. This patch introduces a "quick" flag to has_sha1_file which callers can use when they would prefer high performance at the cost of false negatives during repacks. There may be other code paths that can use this, but the index-pack one is the most obviously critical, so we'll start with switching that one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-09 12:26:35 -07:00
Junio C Hamano	3c91e9966a	Merge branch 'jk/sha1-file-reduce-useless-warnings' into maint * jk/sha1-file-reduce-useless-warnings: sha1_file: squelch "packfile cannot be accessed" warnings	2015-06-05 12:00:07 -07:00
Junio C Hamano	152722f155	Merge branch 'jh/filter-empty-contents' The clean/smudge interface did not work well when filtering an empty contents (failed and then passed the empty input through). It can be argued that a filter that produces anything but empty for an empty input is nonsense, but if the user wants to do strange things, then why not? * jh/filter-empty-contents: sha1_file: pass empty buffer to index empty file	2015-06-01 12:45:11 -07:00
Junio C Hamano	9ca0aaf6de	xmmap(): drop "Out of memory?" We show that message with die_errno(), but the OS is ought to know why mmap(2) failed much better than we do. There is no reason for us to say "Out of memory?" here. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-28 11:35:25 -07:00
Jeff King	1570856b51	config.c: avoid xmmap error messages The config-writing code uses xmmap to map the existing config file, which will die if the map fails. This has two downsides: 1. The error message is not very helpful, as it lacks any context about the file we are mapping: $ mkdir foo $ git config --file=foo some.key value fatal: Out of memory? mmap failed: No such device 2. We normally do not die in this code path; instead, we'd rather report the error and return an appropriate exit status (which is part of the public interface documented in git-config.1). This patch introduces a "gentle" form of xmmap which lets us produce our own error message. We do not want to use mmap directly, because we would like to use the other compatibility elements of xmmap (e.g., handling 0-length maps portably). The end result is: $ git.compile config --file=foo some.key value error: unable to mmap 'foo': No such device $ echo $? 3 Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-28 11:33:18 -07:00
Junio C Hamano	1e6c8babf8	Merge branch 'jc/hash-object' into maint "hash-object --literally" introduced in v2.2 was not prepared to take a really long object type name. * jc/hash-object: write_sha1_file(): do not use a separate sha1[] array t1007: add hash-object --literally tests hash-object --literally: fix buffer overrun with extra-long object type git-hash-object.txt: document --literally option	2015-05-26 13:49:25 -07:00
Junio C Hamano	3b7d373ae2	Merge branch 'kn/cat-file-literally' Add the "--allow-unknown-type" option to "cat-file" to allow inspecting loose objects of an experimental or a broken type. * kn/cat-file-literally: t1006: add tests for git cat-file --allow-unknown-type cat-file: teach cat-file a '--allow-unknown-type' option cat-file: make the options mutually exclusive sha1_file: support reading from a loose object of unknown type	2015-05-19 13:17:58 -07:00
Jim Hill	f6a1e1e288	sha1_file: pass empty buffer to index empty file `git add` of an empty file with a filter pops complaints from `copy_fd` about a bad file descriptor. This traces back to these lines in sha1_file.c:index_core: if (!size) { ret = index_mem(sha1, NULL, size, type, path, flags); The problem here is that content to be added to the index can be supplied from an fd, or from a memory buffer, or from a pathname. This call is supplying a NULL buffer pointer and a zero size. Downstream logic takes the complete absence of a buffer to mean the data is to be found elsewhere -- for instance, these, from convert.c: if (params->src) { write_err = (write_in_full(child_process.in, params->src, params->size) < 0); } else { write_err = copy_fd(params->fd, child_process.in); } ~If there's a buffer, write from that, otherwise the data must be coming from an open fd.~ Perfectly reasonable logic in a routine that's going to write from either a buffer or an fd. So change `index_core` to supply an empty buffer when indexing an empty file. There's a patch out there that instead changes the logic quoted above to take a `-1` fd to mean "use the buffer", but it seems to me that the distinction between a missing buffer and an empty one carries intrinsic semantics, where the logic change is adapting the code to handle incorrect arguments. Signed-off-by: Jim Hill <gjthill@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-18 10:15:20 -07:00
Junio C Hamano	ebb464f0cb	Merge branch 'jk/prune-mtime' into maint Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning. * jk/prune-mtime: sha1_file: only freshen packs once per run sha1_file: freshen pack objects before loose reachable: only mark local objects as recent	2015-05-13 14:05:50 -07:00
Junio C Hamano	051086b947	Merge branch 'jc/hash-object' "hash-object --literally" introduced in v2.2 was not prepared to take a really long object type name. * jc/hash-object: write_sha1_file(): do not use a separate sha1[] array t1007: add hash-object --literally tests hash-object --literally: fix buffer overrun with extra-long object type git-hash-object.txt: document --literally option	2015-05-11 14:23:59 -07:00
Junio C Hamano	cedeffeee0	Merge branch 'jk/sha1-file-reduce-useless-warnings' * jk/sha1-file-reduce-useless-warnings: sha1_file: squelch "packfile cannot be accessed" warnings	2015-05-11 14:23:41 -07:00
Junio C Hamano	68a2e6a2c8	Merge branch 'nd/multiple-work-trees' A replacement for contrib/workdir/git-new-workdir that does not rely on symbolic links and make sharing of objects and refs safer by making the borrowee and borrowers aware of each other. * nd/multiple-work-trees: (41 commits) prune --worktrees: fix expire vs worktree existence condition t1501: fix test with split index t2026: fix broken &&-chain t2026 needs procondition SANITY git-checkout.txt: a note about multiple checkout support for submodules checkout: add --ignore-other-wortrees checkout: pass whole struct to parse_branchname_arg instead of individual flags git-common-dir: make "modules/" per-working-directory directory checkout: do not fail if target is an empty directory t2025: add a test to make sure grafts is working from a linked checkout checkout: don't require a work tree when checking out into a new one git_path(): keep "info/sparse-checkout" per work-tree count-objects: report unused files in $GIT_DIR/worktrees/... gc: support prune --worktrees gc: factor out gc.pruneexpire parsing code gc: style change -- no SP before closing parenthesis checkout: clean up half-prepared directories in --to mode checkout: reject if the branch is already checked out elsewhere prune: strategies for linked checkouts checkout: support checking out into a new working directory ...	2015-05-11 14:23:39 -07:00
Karthik Nayak	46f034483e	sha1_file: support reading from a loose object of unknown type Update sha1_loose_object_info() to optionally allow it to read from a loose object file of unknown/bogus type; as the function usually returns the type of the object it read in the form of enum for known types, add an optional "typename" field to receive the name of the type in textual form and a flag to indicate the reading of a loose object file of unknown/bogus type. Add parse_sha1_header_extended() which acts as a wrapper around parse_sha1_header() allowing more information to be obtained. Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of unknown/corrupt objects which have a unknown sha1 header size to a strbuf structure. This was written by Junio C Hamano but tested by me. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Hepled-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-06 13:35:48 -07:00
Junio C Hamano	e3b199aef1	Merge branch 'jk/prune-mtime' Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning. * jk/prune-mtime: sha1_file: only freshen packs once per run sha1_file: freshen pack objects before loose reachable: only mark local objects as recent	2015-05-05 21:00:37 -07:00
Junio C Hamano	1427a7ff70	write_sha1_file(): do not use a separate sha1[] array In the beginning, write_sha1_file() did not have a way to tell the caller the name of the object it wrote to the caller. This was changed in `d6d3f9d0` (This implements the new "recursive tree" write-tree., 2005-04-09) by adding the "returnsha1" parameter to the function so that the callers who are interested in the value can optionally pass a pointer to receive it. It turns out that all callers do want to know the name of the object it just has written. Nobody passes a NULL to this parameter, hence it is not necessary to use a separate sha1[] array to receive the result from write_sha1_file_prepare(), and copy the result to the returnsha1 supplied by the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-05 10:17:56 -07:00
Eric Sunshine	0c3db67cc8	hash-object --literally: fix buffer overrun with extra-long object type "hash-object" learned in `5ba9a93` (hash-object: add --literally option, 2014-09-11) to allow crafting a corrupt/broken object of unknown type. When the user-provided type is particularly long, however, it can overflow the relatively small stack-based character array handed to write_sha1_file_prepare() by hash_sha1_file() and write_sha1_file(), leading to stack corruption (and crash). Introduce a custom helper to allow arbitrarily long typenames just for "hash-object --literally". [jc: Eric's original used a strbuf in the more common codepaths, and I rewrote it to avoid penalizing the non-literally code. Bugs are mine] Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-05 10:14:18 -07:00
Jeff King	ee1c6c34ac	sha1_file: only freshen packs once per run Since `33d4221` (write_sha1_file: freshen existing objects, 2014-10-15), we update the mtime of existing objects that we would have written out (had they not existed). For the common case in which many objects are packed, we may update the mtime on a single packfile repeatedly. This can result in a noticeable performance problem if calling utime() is expensive (e.g., because your storage is on NFS). We can fix this by keeping a per-pack flag that lets us freshen only once per program invocation. An alternative would be to keep the packed_git.mtime flag up to date as we freshen, and freshen only once every N seconds. In practice, it's not worth the complexity. We are racing against prune expiration times here, which inherently must be set to accomodate reasonable program running times (because they really care about the time between an object being written and it becoming referenced, and the latter is typically the last step a program takes). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-04-20 13:09:40 -07:00
Jeff King	b5f52f372e	sha1_file: freshen pack objects before loose When writing out an object file, we first check whether it already exists and if so optimize out the write. Prior to `33d4221`, we did this by calling has_sha1_file(), which will check for packed objects followed by loose. Since that commit, we check loose objects first. For the common case of a repository whose objects are mostly packed, this means we will make a lot of extra access() system calls checking for loose objects. We should follow the same packed-then-loose order that all of our other lookups use. Reported-by: Stefan Saasen <ssaasen@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-04-20 13:09:38 -07:00
Jeff King	1385bb7ba3	reachable: only mark local objects as recent When pruning and repacking a repository that has an alternate object store configured, we may traverse a large number of objects in the alternate. This serves no purpose, and may be expensive to do. A longer explanation is below. Commits `d3038d2` and `abcb865` taught prune and pack-objects (respectively) to treat "recent" objects as tips for reachability, so that we keep whole chunks of history. They built on the object traversal in `660c889` (sha1_file: add for_each iterators for loose and packed objects, 2014-10-15), which covers both local and alternate objects. In both cases, covering alternate objects is unnecessary, as both commands can only drop objects from the local repository. In the case of prune, we traverse only the local object directory. And in the case of repacking, while we may or may not include local objects in our pack, we will never reach into the alternate with "repack -d". The "-l" option is only a question of whether we are migrating objects from the alternate into our repository, or leaving them untouched. It is possible that we may drop an object that is depended upon by another object in the alternate. For example, imagine two repositories, A and B, with A pointing to B as an alternate. Now imagine a commit that is in B which references a tree that is only in A. Traversing from recent objects in B might prevent A from dropping that tree. But this case isn't worth covering. Repo B should take responsibility for its own objects. It would never have had the commit in the first place if it did not also have the tree, and assuming it is using the same "keep recent chunks of history" scheme, then it would itself keep the tree, as well. So checking the alternate objects is not worth doing, and come with a significant performance impact. In both cases, we skip any recent objects that have already been marked SEEN (i.e., that we know are already reachable for prune, or included in the pack for a repack). So there is a slight waste of time in opening the alternate packs at all, only to notice that we have already considered each object. But much worse, the alternate repository may have a large number of objects that are not reachable from the local repository at all, and we end up adding them to the traversal. We can fix this by considering only local unseen objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-04-20 13:09:27 -07:00
Jeff King	319b678a7b	sha1_file: squelch "packfile cannot be accessed" warnings When we find an object in a packfile index, we make sure we can still open the packfile itself (or that it is already open), as it might have been deleted by a simultaneous repack. If we can't access the packfile, we print a warning for the user and tell the caller that we don't have the object (we can then look in other packfiles, or find a loose version, before giving up). The warning we print to the user isn't really accomplishing anything, and it is potentially confusing to users. In the normal case, it is complete noise; we find the object elsewhere, and the user does not have to care that we racily saw a packfile index that became stale. It didn't affect the operation at all. A possibly more interesting case is when we later can't find the object, and report failure to the user. In this case the warning could be considered a clue toward that ultimate failure. But it's not really a useful clue in practice. We wouldn't even print it consistently (since we are racing with another process, we might not even see the .idx file, or we might win the race and open the packfile, completing the operation). This patch drops the warning entirely (not only from the fill_pack_entry site, but also from an identical use in pack-objects). If we did find the warning interesting in the error case, we could stuff it away and reveal it to the user when we later die() due to the broken object. But that complexity just isn't worth it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-30 21:47:39 -07:00
Junio C Hamano	a393c6bfd9	Merge branch 'rs/deflate-init-cleanup' into maint Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}	2015-03-23 11:23:38 -07:00
Junio C Hamano	6902c4da58	Merge branch 'rs/deflate-init-cleanup' Code simplification. * rs/deflate-init-cleanup: zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}	2015-03-17 16:01:26 -07:00
René Scharfe	9a6f1287fb	zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw} Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-05 15:46:03 -08:00
Junio C Hamano	cbc8d6d8f8	Merge branch 'jk/prune-mtime' into maint In v2.2.0, we broke "git prune" that runs in a repository that borrows from an alternate object store. * jk/prune-mtime: sha1_file: fix iterating loose alternate objects for_each_loose_file_in_objdir: take an optional strbuf path	2015-03-05 13:13:08 -08:00
Junio C Hamano	7cf6232e2c	Merge branch 'jk/prune-mtime' In v2.2.0, we broke "git prune" that runs in a repository that borrows from an alternate object store. * jk/prune-mtime: sha1_file: fix iterating loose alternate objects for_each_loose_file_in_objdir: take an optional strbuf path	2015-02-22 12:28:28 -08:00
Jonathon Mah	b0a4264277	sha1_file: fix iterating loose alternate objects The string in 'base' contains a path suffix to a specific object; when its value is used, the suffix must either be filled (as in stat_sha1_file, open_sha1_file, check_and_freshen_nonlocal) or cleared (as in prepare_packed_git) to avoid junk at the end. `660c889e` (sha1_file: add for_each iterators for loose and packed objects, 2014-10-15) introduced loose_from_alt_odb(), but this did neither and treated 'base' as a complete path to the "base" object directory, instead of a pointer to the "base" of the full path string. The trailing path after 'base' is still initialized to NUL, hiding the bug in some common cases. Additionally the descendent for_each_file_in_obj_subdir() function swallows ENOENT, so an error only shows if the alternate's path was last filled with a valid object (where statting /path/to/existing/00/0bjectfile/00 fails). Signed-off-by: Jonathon Mah <me@JonathonMah.com> Helped-by: Kyle J. McKay <mackyle@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-02-09 14:14:56 -08:00
Jeff King	e6f875e052	for_each_loose_file_in_objdir: take an optional strbuf path We feed a root "objdir" path to this iterator function, which then copies the result into a strbuf, so that it can repeatedly append the object sub-directories to it. Let's make it easy for callers to just pass us a strbuf in the first place. We leave the original interface as a convenience for callers who want to just pass a const string like the result of get_object_directory(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-02-09 14:14:53 -08:00
Nguyễn Thái Ngọc Duy	dcf692625a	path.c: make get_pathname() call sites return const char * Before the previous commit, get_pathname returns an array of PATH_MAX length. Even if git_path() and similar functions does not use the whole array, git_path() caller can, in theory. After the commit, get_pathname() may return a buffer that has just enough room for the returned string and git_path() caller should never write beyond that. Make git_path(), mkpath() and git_path_submodule() return a const buffer to make sure callers do not write in it at all. This could have been part of the previous commit, but the "const" conversion is too much distraction from the core changes in path.c. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-01 11:00:10 -08:00
Michael Haggerty	3383e19984	sort_string_list(): rename to string_list_sort() The new name is more consistent with the names of other string_list-related functions. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-11-25 10:11:34 -08:00
Junio C Hamano	d70e331c0e	Merge branch 'jk/prune-mtime' Tighten the logic to decide that an unreachable cruft is sufficiently old by covering corner cases such as an ancient object becoming reachable and then going unreachable again, in which case its retention period should be prolonged. * jk/prune-mtime: (28 commits) drop add_object_array_with_mode revision: remove definition of unused 'add_object' function pack-objects: double-check options before discarding objects repack: pack objects mentioned by the index pack-objects: use argv_array reachable: use revision machinery's --indexed-objects code rev-list: add --indexed-objects option rev-list: document --reflog option t5516: test pushing a tag of an otherwise unreferenced blob traverse_commit_list: support pending blobs/trees with paths make add_object_array_with_context interface more sane write_sha1_file: freshen existing objects pack-objects: match prune logic for discarding objects pack-objects: refactor unpack-unreachable expiration check prune: keep objects reachable from recent objects sha1_file: add for_each iterators for loose and packed objects count-objects: use for_each_loose_file_in_objdir count-objects: do not use xsize_t when counting object size prune-packed: use for_each_loose_file_in_objdir reachable: mark index blobs as SEEN ...	2014-10-29 10:07:56 -07:00
Jeff King	33d4221c79	write_sha1_file: freshen existing objects When we try to write a loose object file, we first check whether that object already exists. If so, we skip the write as an optimization. However, this can interfere with prune's strategy of using mtimes to mark files in progress. For example, if a branch contains a particular tree object and is deleted, that tree object may become unreachable, and have an old mtime. If a new operation then tries to write the same tree, this ends up as a noop; we notice we already have the object and do nothing. A prune running simultaneously with this operation will see the object as old, and may delete it. We can solve this by "freshening" objects that we avoid writing by updating their mtime. The algorithm for doing so is essentially the same as that of has_sha1_file. Therefore we provide a new (static) interface "check_and_freshen", which finds and optionally freshens the object. It's trivial to implement freshening and simple checking by tweaking a single parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:43 -07:00
Jeff King	660c889e46	sha1_file: add for_each iterators for loose and packed objects We typically iterate over the reachable objects in a repository by starting at the tips and walking the graph. There's no easy way to iterate over all of the objects, including unreachable ones. Let's provide a way of doing so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:41 -07:00
Jeff King	27e1e22d5e	prune: factor out loose-object directory traversal Prune has to walk $GIT_DIR/objects/?? in order to find the set of loose objects to prune. Other parts of the code (e.g., count-objects) want to do the same. Let's factor it out into a reusable for_each-style function. Note that this is not quite a straight code movement. The original code had strange behavior when it found a file of the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex digits. It executed a "break" from the loop, meaning that we stopped pruning in that directory (but still pruned other directories!). This was probably a bug; we do not want to process the file as an object, but we should keep going otherwise (and that is how the new code handles it). We are also a little more careful with loose object directories which fail to open. The original code silently ignored any failures, but the new code will complain about any problems besides ENOENT. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:39 -07:00
Jeff King	fe1b22686f	foreach_alt_odb: propagate return value from callback We check the return value of the callback and stop iterating if it is non-zero. However, we do not make the non-zero return value available to the caller, so they have no way of knowing whether the operation succeeded or not (technically they can keep their own error flag in the callback data, but that is unlike our other for_each functions). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-16 10:10:35 -07:00
Junio C Hamano	bd107e1052	Merge branch 'mh/lockfile' The lockfile API and its users have been cleaned up. * mh/lockfile: (38 commits) lockfile.h: extract new header file for the functions in lockfile.c hold_locked_index(): move from lockfile.c to read-cache.c hold_lock_file_for_append(): restore errno before returning get_locked_file_path(): new function lockfile.c: rename static functions lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF commit_lock_file_to(): refactor a helper out of commit_lock_file() trim_last_path_component(): replace last_path_elm() resolve_symlink(): take a strbuf parameter resolve_symlink(): use a strbuf for internal scratch space lockfile: change lock_file::filename into a strbuf commit_lock_file(): use a strbuf to manage temporary space try_merge_strategy(): use a statically-allocated lock_file object try_merge_strategy(): remove redundant lock_file allocation struct lock_file: declare some fields volatile lockfile: avoid transitory invalid states git_config_set_multivar_in_file(): avoid call to rollback_lock_file() dump_marks(): remove a redundant call to rollback_lock_file() api-lockfile: document edge cases commit_lock_file(): rollback lock file on failure to rename ...	2014-10-14 10:49:45 -07:00
Junio C Hamano	f0d8900175	Merge branch 'sp/stream-clean-filter' When running a required clean filter, we do not have to mmap the original before feeding the filter. Instead, stream the file contents directly to the filter and process its output. * sp/stream-clean-filter: sha1_file: don't convert off_t to size_t too early to avoid potential die() convert: stream from fd to required clean filter to reduce used address space copy_fd(): do not close the input file descriptor mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT config.c: add git_env_ulong() to parse environment variable convert: drop arguments other than 'path' from would_convert_to_git()	2014-10-08 13:05:32 -07:00
Michael Haggerty	697cc8efd9	lockfile.h: extract new header file for the functions in lockfile.c Move the interface declaration for the functions in lockfile.c from cache.h to a new file, lockfile.h. Add #includes where necessary (and remove some redundant includes of cache.h by files that already include builtin.h). Move the documentation of the lock_file state diagram from lockfile.c to the new header file. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-10-01 13:56:14 -07:00
Steffen Prohaska	9079ab7cb6	sha1_file: don't convert off_t to size_t too early to avoid potential die() xsize_t() checks if an off_t argument can be safely converted to a size_t return value. If the check is executed too early, it could fail for large files on 32-bit architectures even if the size_t code path is not taken. Other paths might be able to handle the large file. Specifically, index_stream_convert_blob() is able to handle a large file if a filter is configured that returns a small result. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-22 12:40:55 -07:00
Junio C Hamano	bedd3b4b7b	Merge branch 'nd/large-blobs' Teach a few codepaths to punt (instead of dying) when large blobs that would not fit in core are involved in the operation. * nd/large-blobs: diff: shortcut for diff'ing two binary SHA-1 objects diff --stat: mark any file larger than core.bigfilethreshold binary diff.c: allow to pass more flags to diff_populate_filespec sha1_file.c: do not die failing to malloc in unpack_compressed_entry wrapper.c: introduce gentle xmallocz that does not die()	2014-09-11 10:33:33 -07:00
Junio C Hamano	f655651e09	Merge branch 'rs/strbuf-getcwd' Reduce the use of fixed sized buffer passed to getcwd() calls by introducing xgetcwd() helper. * rs/strbuf-getcwd: use strbuf_add_absolute_path() to add absolute paths abspath: convert absolute_path() to strbuf use xgetcwd() to set $GIT_DIR use xgetcwd() to get the current directory or die wrapper: add xgetcwd() abspath: convert real_path_internal() to strbuf abspath: use strbuf_getcwd() to remember original working directory setup: convert setup_git_directory_gently_1 et al. to strbuf unix-sockets: use strbuf_getcwd() strbuf: add strbuf_getcwd()	2014-09-02 13:28:44 -07:00
Steffen Prohaska	9035d75a2b	convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-28 10:25:15 -07:00
Steffen Prohaska	02710228dd	mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size In order to test expectations about mmap in a way similar to testing expectations about malloc with GIT_ALLOC_LIMIT introduced by `d41489a6` (Add more large blob test cases, 2012-03-07), introduce a new environment variable GIT_MMAP_LIMIT to limit the largest allowed mmap length. xmmap() is modified to check the size of the requested region and fail it if it is beyond the limit. Together with GIT_ALLOC_LIMIT tests can now confirm expectations about memory consumption. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-28 10:25:14 -07:00
Steffen Prohaska	7ce7c7607b	convert: drop arguments other than 'path' from would_convert_to_git() It is only the path that matters in the decision whether to filter or not. Clarify this by making path the only argument of would_convert_to_git(). Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-21 15:27:20 -07:00
Nguyễn Thái Ngọc Duy	735efde838	sha1_file.c: do not die failing to malloc in unpack_compressed_entry Fewer die() gives better control to the caller, provided that the caller _can_ handle it. And in unpack_compressed_entry() case, it can, because unpack_compressed_entry() already returns NULL if it fails to inflate data. A side effect from this is fsck continues to run when very large blobs are present (and do not fit in memory). Noticed-by: Dale R. Worley <worley@alum.mit.edu> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-18 10:15:19 -07:00
Junio C Hamano	9f2de9c121	Merge branch 'kb/perf-trace' * kb/perf-trace: api-trace.txt: add trace API documentation progress: simplify performance measurement by using getnanotime() wt-status: simplify performance measurement by using getnanotime() git: add performance tracing for git's main() function to debug scripts trace: add trace_performance facility to debug performance issues trace: add high resolution timer function to debug performance issues trace: add 'file:line' to all trace output trace: move code around, in preparation to file:line output trace: add current timestamp to all trace output trace: disable additional trace output for unit tests trace: add infrastructure to augment trace output with additional info sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables trace: improve trace performance trace: remove redundant printf format attribute trace: consistently name the format parameter trace: move trace declarations from cache.h to new trace.h	2014-07-22 10:59:19 -07:00
Junio C Hamano	a8c565b227	Merge branch 'ek/alt-odb-entry-fix' * ek/alt-odb-entry-fix: sha1_file: do not add own object directory as alternate	2014-07-21 11:18:46 -07:00
Junio C Hamano	6e4094731a	Merge branch 'jk/strip-suffix' * jk/strip-suffix: prepare_packed_git_one: refactor duplicate-pack check verify-pack: use strbuf_strip_suffix strbuf: implement strbuf_strip_suffix index-pack: use strip_suffix to avoid magic numbers use strip_suffix instead of ends_with in simple cases replace has_extension with ends_with implement ends_with via strip_suffix add strip_suffix function sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()	2014-07-16 11:26:00 -07:00
Junio C Hamano	5a3db94539	Merge branch 'rs/fix-alt-odb-path-comparison' into maint Code to avoid adding the same alternate object store twice was subtly broken for a long time, but nobody seems to have noticed. * rs/fix-alt-odb-path-comparison: sha1_file: avoid overrunning alternate object base string	2014-07-16 11:17:08 -07:00
Ephrim Khong	539e75069f	sha1_file: do not add own object directory as alternate When adding alternate object directories, we try not to add the directory of the current repository to avoid cycles. Unfortunately, that test was broken, since it compared an absolute with a relative path. Signed-off-by: Ephrim Khong <dr.khong@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-15 11:50:15 -07:00
Karsten Blees	67dc598ec4	sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API This changes GIT_TRACE_PACK_ACCESS functionality as follows: * supports the same options as GIT_TRACE (e.g. printing to stderr) * no longer supports relative paths * appends to the trace file rather than overwriting Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-13 21:25:18 -07:00
Junio C Hamano	b41a4636ee	Merge branch 'rs/fix-alt-odb-path-comparison' * rs/fix-alt-odb-path-comparison: sha1_file: avoid overrunning alternate object base string	2014-07-10 11:27:52 -07:00
René Scharfe	80b47854ca	sha1_file: avoid overrunning alternate object base string While checking if a new alternate object database is a duplicate make sure that old and new base paths have the same length before comparing them with memcmp. This avoids overrunning the buffer of the existing entry if the new one is longer and it stops rejecting foobar/ after foo/ was already added. Signed-off-by: Rene Scharfe <ls.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-01 13:30:50 -07:00
Jeff King	47bf4b0fc5	prepare_packed_git_one: refactor duplicate-pack check When we are reloading the list of packs, we check whether a particular pack has been loaded. This is slightly tricky, because we load packs based on the presence of their ".idx" files, but record the name of the matching ".pack" file. Therefore we want to compare their bases. The existing code stripped off ".idx" from a file we found, then compared that whole base length to strings containing the ".pack" version. This meant we could end up comparing bytes past what the ".pack" string contained, if the ".idx" file name was much longer. In practice, it worked OK because memcmp would end up seeing a difference in the two strings and would return early before hitting the full length. However, memcmp may sometimes read extra bytes past a difference (e.g., because it is comparing 64-bit words), or is even free to compare in reverse order. Furthermore, our memcmp made no guarantees that we matched the whole pack name, up to ".pack". So "foo.idx" would match "foo-bar.pack", which is wrong (but does not typically happen, because our pack names have a fixed size). We can fix both issues, avoid magic numbers, and document that we expect to compare against a string with ".pack" by using strip_suffix. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-30 13:43:32 -07:00
Jeff King	2975c770ca	replace has_extension with ends_with These two are almost the same function, with the exception that has_extension only matches if there is content before the suffix. So ends_with(".exe", ".exe") is true, but has_extension would not be. This distinction does not matter to any of the callers, though, and we can just replace uses of has_extension with ends_with. We prefer the "ends_with" name because it is more generic, and there is nothing about the function that requires it to be used for file extensions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-30 13:43:16 -07:00
René Scharfe	880fb8de67	sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one() Instead of using strbuf to create a message string in case a path is too long for our fixed-size buffer, replace that buffer with a strbuf and thus get rid of the limitation. Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-30 13:43:16 -07:00
Junio C Hamano	81bd9b1000	Merge branch 'jk/report-fail-to-read-objects-better' into maint Reworded the error message given upon a failure to open an existing loose object file due to e.g. permission issues; it was reported as the object being corrupt, but that is not quite true. * jk/report-fail-to-read-objects-better: open_sha1_file: report "most interesting" errno	2014-06-25 11:43:58 -07:00
Junio C Hamano	9d1d882e9c	Merge branch 'jk/report-fail-to-read-objects-better' * jk/report-fail-to-read-objects-better: open_sha1_file: report "most interesting" errno	2014-06-16 10:06:15 -07:00
Jeff King	d6c8a05bd5	open_sha1_file: report "most interesting" errno When we try to open a loose object file, we first attempt to open in the local object database, and then try any alternates. This means that the errno value when we return will be from the last place we looked (and due to the way the code is structured, simply ENOENT if we do not have have any alternates). This can cause confusing error messages, as read_sha1_file checks for ENOENT when reporting a missing object. If errno is something else, we report that. If it is ENOENT, but has_loose_object reports that we have it, then we claim the object is corrupted. For example: $ chmod 0 .git/objects/??/* $ git rev-list --all fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt This patch instead keeps track of the "most interesting" errno we receive during our search. We consider ENOENT to be the least interesting of all, and otherwise report the first error found (so problems in the object database take precedence over ones in alternates). Here it is with this patch: $ git rev-list --all fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-05-15 10:03:06 -07:00
Junio C Hamano	d59c12d7ad	Merge branch 'jl/nor-or-nand-and' Eradicate mistaken use of "nor" (that is, essentially "nor" used not in "neither A nor B" ;-)) from in-code comments, command output strings, and documentations. * jl/nor-or-nand-and: code and test: fix misuses of "nor" comments: fix misuses of "nor" contrib: fix misuses of "nor" Documentation: fix misuses of "nor"	2014-04-08 12:00:28 -07:00
Justin Lebar	235e8d5914	code and test: fix misuses of "nor" Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-31 15:29:33 -07:00
Justin Lebar	01689909eb	comments: fix misuses of "nor" Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-31 15:29:27 -07:00
Junio C Hamano	fe9122a352	Merge branch 'dd/use-alloc-grow' Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()	2014-03-18 13:50:21 -07:00
Junio C Hamano	decba94d2c	Merge branch 'nd/sha1-file-delta-stack-leakage-fix' Fix a small leak in the delta stack used when resolving a long delta chain at runtime. * nd/sha1-file-delta-stack-leakage-fix: sha1_file: fix delta_stack memory leak in unpack_entry	2014-03-18 13:49:23 -07:00
Junio C Hamano	060be00621	Merge branch 'mh/object-code-cleanup' * mh/object-code-cleanup: sha1_file.c: document a bunch of functions defined in the file sha1_file_name(): declare to return a const string find_pack_entry(): document last_found_pack replace_object: use struct members instead of an array	2014-03-14 14:26:29 -07:00
Dmitry S. Dolzhenko	c7353967ca	sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() Helped-by: He Sun <sunheehnus@gmail.com> Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 14:54:58 -08:00
Junio C Hamano	0f9e62e084	Merge branch 'jk/pack-bitmap' Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...	2014-02-27 14:01:48 -08:00
Michael Haggerty	d40d535b89	sha1_file.c: document a bunch of functions defined in the file Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 16:01:11 -08:00
Michael Haggerty	30d6c6eabf	sha1_file_name(): declare to return a const string Change the return value of sha1_file_name() to (const char *). (Callers have no business mucking about here.) Change callers accordingly, deleting a few superfluous temporary variables along the way. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:10:22 -08:00
Michael Haggerty	1b1005d1b5	find_pack_entry(): document last_found_pack Add a comment at the declaration of last_found_pack and where it is used in find_pack_entry(). In the latter, separate the cases (1) to make a place for the new comment and (2) to turn the success case into affirmative logic. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:09:56 -08:00
Nguyễn Thái Ngọc Duy	019d1e65f5	sha1_file: fix delta_stack memory leak in unpack_entry This delta_stack array can grow to any length depending on the actual delta chain, but we forget to free it. Normally it does not matter because we use small_delta_stack[] from stack and small_delta_stack can hold 64-delta chains, more than standard --depth=50 in pack-objects. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:07:12 -08:00
Junio C Hamano	33d4669aaa	Merge branch 'ss/safe-create-leading-dir-with-slash' "git clone $origin foo\bar\baz" on Windows failed to create the leading directories (i.e. a moral-equivalent of "mkdir -p"). * ss/safe-create-leading-dir-with-slash: safe_create_leading_directories(): on Windows, \ can separate path components	2014-01-27 10:45:37 -08:00
Junio C Hamano	d0956cfa8e	Merge branch 'mh/safe-create-leading-directories' Code clean-up and protection against concurrent write access to the ref namespace. * mh/safe-create-leading-directories: rename_tmp_log(): on SCLD_VANISHED, retry rename_tmp_log(): limit the number of remote_empty_directories() attempts rename_tmp_log(): handle a possible mkdir/rmdir race rename_ref(): extract function rename_tmp_log() remove_dir_recurse(): handle disappearing files and directories remove_dir_recurse(): tighten condition for removing unreadable dir lock_ref_sha1_basic(): if locking fails with ENOENT, retry lock_ref_sha1_basic(): on SCLD_VANISHED, retry safe_create_leading_directories(): add new error value SCLD_VANISHED cmd_init_db(): when creating directories, handle errors conservatively safe_create_leading_directories(): introduce enum for return values safe_create_leading_directories(): always restore slash at end of loop safe_create_leading_directories(): split on first of multiple slashes safe_create_leading_directories(): rename local variable safe_create_leading_directories(): add explicit "slash" pointer safe_create_leading_directories(): reduce scope of local variable safe_create_leading_directories(): fix format of "if" chaining	2014-01-27 10:45:33 -08:00
Michael Haggerty	0f5274033e	safe_create_leading_directories(): on Windows, \ can separate path components When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where "foo" does not exist yet, Git would throw an error like fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory Fix this by not hard-coding a platform specific directory separator into safe_create_leading_directories(). This patch, including its entire commit message, is derived from a patch by Sebastian Schuberth. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-22 11:00:07 -08:00
Jeff King	1a6d8b9148	do not discard revindex when re-preparing packfiles When an object lookup fails, we re-read the objects/pack directory to pick up any new packfiles that may have been created since our last read. We also discard any pack revindex structs we've allocated. The discarding is a problem for the pack-bitmap code, which keeps a pointer to the revindex for the bitmapped pack. After the discard, the pointer is invalid, and we may read free()d memory. Other revindex users do not keep a bare pointer to the revindex; instead, they always access it through revindex_for_pack(), which lazily builds the revindex. So one solution is to teach the pack-bitmap code a similar trick. It would be slightly less efficient, but probably not all that noticeable. However, it turns out this discarding is not actually necessary. When we call reprepare_packed_git, we do not throw away our old pack list. We keep the existing entries, and only add in new ones. So there is no safety problem; we will still have the pack struct that matches each revindex. The packfile itself may go away, of course, but we are already prepared to handle that, and it may happen outside of reprepare_packed_git anyway. Throwing away the revindex may save some RAM if the pack never gets reused (about 12 bytes per object). But it also wastes some CPU time (to regenerate the index) if the pack does get reused. It's hard to say which is more valuable, but in either case, it happens very rarely (only when we race with a simultaneous repack). Just leaving the revindex in place is simple and safe both for current and future code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-16 14:33:46 -08:00
Junio C Hamano	b2132068c6	Merge branch 'jk/oi-delta-base' Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta. * jk/oi-delta-base: cat-file: provide %(deltabase) batch format sha1_object_info_extended: provide delta base sha1s	2014-01-10 10:33:11 -08:00
Junio C Hamano	c4bccea2d5	Merge branch 'jh/rlimit-nofile-fallback' When we figure out how many file descriptors to allocate for keeping packfiles open, a system with non-working getrlimit() could cause us to die(), but because we make this call only to get a rough estimate of how many is available and we do not even attempt to use up all file descriptors available ourselves, it is nicer to fall back to a reasonable low value rather than dying. * jh/rlimit-nofile-fallback: get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure	2014-01-10 10:32:28 -08:00
Junio C Hamano	b0504a9519	Merge branch 'cc/replace-object-info' read_sha1_file() that is the workhorse to read the contents given an object name honoured object replacements, but there is no corresponding mechanism to sha1_object_info() that is used to obtain the metainfo (e.g. type & size) about the object, leading callers to weird inconsistencies. * cc/replace-object-info: replace info: rename 'full' to 'long' and clarify in-code symbols Documentation/git-replace: describe --format option builtin/replace: unset read_replace_refs t6050: add tests for listing with --format builtin/replace: teach listing using short, medium or full formats sha1_file: perform object replacement in sha1_object_info_extended() t6050: show that git cat-file --batch fails with replace objects sha1_object_info_extended(): add an "unsigned flags" parameter sha1_file.c: add lookup_replace_object_extended() to pass flags replace_object: don't check read_replace_refs twice rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT	2014-01-10 10:32:10 -08:00
Michael Haggerty	18d37e860d	safe_create_leading_directories(): add new error value SCLD_VANISHED Add a new possible error result that can be returned by safe_create_leading_directories() and safe_create_leading_directories_const(): SCLD_VANISHED. This value indicates that a file or directory on the path existed at one point (either it already existed or the function created it), but then it disappeared. This probably indicates that another process deleted the directory while we were working. If SCLD_VANISHED is returned, the caller might want to retry the function call, as there is a chance that a new attempt will succeed. Why doesn't safe_create_leading_directories() do the retrying internally? Because an empty directory isn't really ever safe until it holds a file. So even if safe_create_leading_directories() were absolutely sure that the directory existed before it returned, there would be no guarantee that the directory still existed when the caller tried to write something in it. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:22 -08:00
Michael Haggerty	0be0521b23	safe_create_leading_directories(): introduce enum for return values Instead of returning magic integer values (which a couple of callers go to the trouble of distinguishing), return values from an enum. Add a docstring. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:21 -08:00
Michael Haggerty	9e6f885d14	safe_create_leading_directories(): always restore slash at end of loop Always restore the slash that we scribbled over at the end of the loop, rather than also fixing it up at each premature exit from the loop. This makes it harder to forget to do the cleanup as new paths are added to the code. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:21 -08:00
Michael Haggerty	bf10cf70ad	safe_create_leading_directories(): split on first of multiple slashes If the input path has multiple slashes between path components (e.g., "foo//bar"), then the old code was breaking the path at the last slash, not the first one. So in the above example, the second slash was overwritten with NUL, resulting in the parent directory being sought as "foo/". When stat() is called on "foo/", it fails with ENOTDIR if "foo" exists but is not a directory. This caused the wrong path to be taken in the subsequent logic. So instead, split path components at the first intercomponent slash rather than the last one. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:20 -08:00
Michael Haggerty	26c8ae2a57	safe_create_leading_directories(): rename local variable Rename "pos" to "next_component", because now it always points at the next component of the path name that has to be processed. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:20 -08:00
Michael Haggerty	831651fde8	safe_create_leading_directories(): add explicit "slash" pointer Keep track of the position of the slash character independently of "pos", thereby making the purpose of each variable clearer and working towards other upcoming changes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Michael Haggerty	f05023324c	safe_create_leading_directories(): reduce scope of local variable This makes it more obvious that values of "st" don't persist across loop iterations. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Michael Haggerty	53a3972171	safe_create_leading_directories(): fix format of "if" chaining Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Nguyễn Thái Ngọc Duy	d3d3e4c490	count-objects: recognize .bitmap in garbage-checking Count-objects will report any "garbage" files in the packs directory, including files whose extensions it does not know (case 1), and files whose matching ".pack" file is missing (case 2). Without having learned about ".bitmap" files, the current code reports all such files as garbage (case 1), even if their pack exists. Instead, they should be treated as case 2. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	5d642e7506	sha1_object_info_extended: provide delta base sha1s A caller of sha1_object_info_extended technically has enough information to determine the base sha1 from the results of the call. It knows the pack, offset, and delta type of the object, which is sufficient to find the base. However, the functions to do so are not publicly available, and the code itself is intimate enough with the pack details that it should be abstracted away. We could add a public helper to allow callers to query the delta base separately, but it is simpler and slightly more efficient to optionally grab it along with the rest of the object_info data. For cases where the object is not stored as a delta, we write the null sha1 into the query field. A careful caller could check "oi.whence == OI_PACKED && oi.u.packed.is_delta" before looking at the base sha1, but using the null sha1 provides a simple alternative (and gives a better sanity check for a non-careful caller than simply returning random bytes). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:53:32 -08:00
Junio C Hamano	491a8dec44	get_max_fd_limit(): fall back to OPEN_MAX upon getrlimit/sysconf failure On broken systems where RLIMIT_NOFILE is visible by the compliers but underlying getrlimit() system call does not behave, we used to simply die() when we are trying to decide how many file descriptors to allocate for keeping packfiles open. Instead, allow the fallback codepath to take over when we get such a failure from getrlimit(). The same issue exists with _SC_OPEN_MAX and sysconf(); restructure the code in a similar way to prepare for a broken sysconf() as well. Noticed-by: Joey Hess <joey@kitenet.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-18 14:59:43 -08:00
Junio C Hamano	a5d56530e0	Merge branch 'jh/loose-object-dirs-creation-race' into maint Two processes creating loose objects at the same time could have failed unnecessarily when the name of their new objects started with the same byte value, due to a race condition. * jh/loose-object-dirs-creation-race: sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs	2013-12-17 11:32:50 -08:00
Junio C Hamano	66c24cd8a4	Merge branch 'sb/sha1-loose-object-info-check-existence' into maint "git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object	2013-12-17 11:31:18 -08:00
Christian Couder	1f7117ef7a	sha1_file: perform object replacement in sha1_object_info_extended() sha1_object_info_extended() should perform object replacement if it is needed. The simplest way to do that is to make it call lookup_replace_object_extended(). And now its "unsigned flags" parameter is used as it is passed to lookup_replace_object_extended(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:53:49 -08:00
Christian Couder	de7b5d6218	sha1_object_info_extended(): add an "unsigned flags" parameter This parameter is not used yet, but it will be used to tell sha1_object_info_extended() if it should perform object replacement or not. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:53:48 -08:00
Christian Couder	bf93eea0f6	sha1_file.c: add lookup_replace_object_extended() to pass flags Currently, there is only one caller to lookup_replace_object() that can benefit from passing it some flags, but we expect that there could be more. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:53:48 -08:00
Christian Couder	ffe68cf9ac	rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT The READ_SHA1_FILE_REPLACE flag is more related to using the lookup_replace_object() function rather than the read_sha1_file() function. We also need such a flag to be used with sha1_object_info() instead of read_sha1_file(). The name LOOKUP_REPLACE_OBJECT is therefore better for this flag. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:53:48 -08:00
Junio C Hamano	dd1cec578d	Merge branch 'jk/remove-experimental-loose-object-support' * jk/remove-experimental-loose-object-support: drop support for "experimental" loose objects	2013-12-06 11:09:43 -08:00
Junio C Hamano	c17fa972d3	Merge branch 'sb/sha1-loose-object-info-check-existence' "git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object	2013-12-05 13:00:12 -08:00
Junio C Hamano	86cd8dc8e7	Merge branch 'jh/loose-object-dirs-creation-race' When two processes created one loose object file each, which fell into the same fan-out bucket that previously did not have any objects, they both tried to do an equivalent of mkdir .git/objects/$fanout && chmod $shared_perm .git/objects/$fanout before writing into their file .git/objects/$fanout/$remainder, one of which could have failed unnecessarily when the second invocation of mkdir found that the directory already has been created by the first one. * jh/loose-object-dirs-creation-race: sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs	2013-12-05 12:54:14 -08:00
Jeff King	b039718d92	drop support for "experimental" loose objects In git v1.4.3, we introduced a new loose object format that encoded some object information outside of the zlib stream. Ultimately the format was dropped in v1.5.3, but we kept the reading side around to help people migrate objects. Each time we open a loose object, we use a heuristic to check whether it is in the normal loose format, or the experimental one. This heuristic is robust in the face of valid data, but it tends to treat corrupted or garbage data as an experimental object. With the regular format, we would notice quickly that zlib's crc does not check out and complain. With the experimental object, we are likely to extract a nonsensical object size and try to allocate a huge buffer, resulting in xmalloc calling "die". This latter behavior is much worse, for two reasons. One, git reports an allocation error when the real error is corruption. And two, the program dies unconditionally, so you cannot even run fsck (which would otherwise ignore the broken object and keep going). We could try to improve the heuristic to err on the side of normal objects in the face of corruption, but there is really little point. The experimental format is long-dead, and was never enabled by default to begin with. We can instead simply remove it. The only affected repository would be one that explicitly set core.legacyheaders in 2007, and then never repacked in the intervening 6 years. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-21 11:43:42 -08:00
Junio C Hamano	4ef8d1dd03	sha1_loose_object_info(): do not return success on missing object Since `052fe5ea` (sha1_loose_object_info: make type lookup optional, 2013-07-12), sha1_loose_object_info() returns happily without checking if the object in question exists, which is not what the the caller sha1_object_info_extended() expects; the caller does not even bother checking the existence of the object itself. Noticed-by: Sven Brauch <svenbrauch@googlemail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-06 11:03:33 -08:00
Junio C Hamano	cfd10568b0	Sync with v1.8.4.2	2013-10-28 10:51:53 -07:00
Johan Herland	b2476a60bd	sha1_file.c:create_tmpfile(): Fix race when creating loose object dirs There are cases (e.g. when running concurrent fetches in a repo) where multiple Git processes concurrently attempt to create loose objects within the same objects/XX/ dir. The creation of the loose object files is (AFAICS) safe from races, but the creation of the objects/XX/ dir in which the loose objects reside is unsafe, for example: Two concurrent fetches - A and B. As part of its fetch, A needs to store 12aaaaa as a loose object. B, on the other hand, needs to store 12bbbbb as a loose object. The objects/12 directory does not already exist. Concurrently, both A and B determine that they need to create the objects/12 directory (because their first call to git_mkstemp_mode() within create_tmpfile() fails witn ENOENT). One of them - let's say A - executes the following mkdir() call before the other. This first call returns success, and A moves on. When B gets around to calling mkdir(), it fails with EEXIST, because A won the race. The mkdir() error causes B to return -1 from create_tmpfile(), which propagates all the way, resulting in the fetch failing with: error: unable to create temporary file: File exists fatal: failed to write object fatal: unpack-objects failed Although it's hard to add a testcase reproducing this issue, it's easy to provoke if we insert a sleep after the if (mkdir(buffer, 0777) \|\| adjust_shared_perm(buffer)) return -1; block, and then run two concurrent "git fetch"es against the same repo. The fix is to simply handle mkdir() failing with EEXIST as a success. If EEXIST is somehow returned for the wrong reasons (because the relevant objects/XX is not a directory, or is otherwise unsuitable for object storage), the following call to adjust_shared_perm(), or ultimately the retried call to git_mkstemp_mode() will fail, and we end up returning error from create_tmpfile() in any case. Note that there are still cases where two users with unsuitable umasks in a shared repo can end up in two races where one user first wins the mkdir() race to create an objects/XX/ directory, and then the other user wins the adjust_shared_perms() race to chmod() that directory, but fails because it is (transiently, until the first users completes its chmod()) unwriteable to the other user. However, (an equivalent of) this race also exists before this patch, and is made no worse by this patch. Signed-off-by: Johan Herland <johan@herland.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-28 09:50:34 -07:00
Christian Couder	3fc0dca9ce	sha1_file: move comment about return value where it belongs Commit `5b0864070` (sha1_object_info_extended: make type calculation optional, Jul 12 2013) changed the return value of the sha1_object_info_extended function to 0/-1 for success/error. Previously this function returned the object type for success or -1 for error. But unfortunately the above commit forgot to change or move the comment above this function that says "returns enum object_type or negative". To fix this inconsistency, let's move the comment above the sha1_object_info function where it is still true. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-28 09:07:01 -07:00
Vicent Marti	ec73f5807c	sha1_file: export `git_open_noatime` The `git_open_noatime` helper can be of general interest for other consumers of git's different on-disk formats. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:52 -07:00
Jonathan Nieder	87bcf148d7	Merge branch 'nd/unpack-entry-optim-in-pack-objects' * nd/unpack-entry-optim-in-pack-objects: pack-objects: no crc check when the cached version is used	2013-09-24 23:29:55 -07:00
Junio C Hamano	5ff9f2351a	Merge branch 'jk/has-sha1-file-retry-packed' When an object is not found after checking the packfiles and then loose object directory, read_sha1_file() re-checks the packfiles to prevent racing with a concurrent repacker; teach the same logic to has_sha1_file(). * jk/has-sha1-file-retry-packed: has_sha1_file: re-check pack directory before giving up	2013-09-17 11:41:35 -07:00
Nguyễn Thái Ngọc Duy	77965f8b29	pack-objects: no crc check when the cached version is used Current code makes pack-objects always do check_pack_crc() in unpack_entry() even if right after that we find out there's a cached version and pack access is not needed. Swap two code blocks, search for cached version first, then check crc. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-09-13 11:28:33 -07:00
Junio C Hamano	04fbba0119	Merge branch 'bc/unuse-packfile' Handle memory pressure and file descriptor pressure separately when deciding to release pack windows to honor resource limits. * bc/unuse-packfile: Don't close pack fd when free'ing pack windows sha1_file: introduce close_one_pack() to close packs on fd pressure	2013-09-04 12:30:21 -07:00
Jeff King	45e8a74873	has_sha1_file: re-check pack directory before giving up When we read a sha1 file, we first look for a packed version, then a loose version, and then re-check the pack directory again before concluding that we cannot find it. This lets us handle a process that is writing to the repository simultaneously (e.g., receive-pack writing a new pack followed by a ref update, or git-repack packing existing loose objects into a new pack). However, we do not do the same trick with has_sha1_file; we only check the packed objects once, followed by loose objects. This means that we might incorrectly report that we do not have an object, even though we could find it if we simply re-checked the pack directory. By itself, this is usually not a big deal. The other process is running simultaneously, so we may run has_sha1_file before it writes, anyway. It is a race whether we see the object or not. However, we may also see other things the writing process has done (like updating refs); and in that case, we must be able to also see the new objects. For example, imagine we are doing a for_each_ref iteration, and somebody simultaneously pushes. Receive-pack may write the pack and update a ref after we have examined the objects/pack directory, but before the iteration gets to the updated ref. When we do finally see the updated ref, for_each_ref will call has_sha1_file to check whether the ref is broken. If has_sha1_file returns the wrong answer, we erroneously will think that the ref is broken. For a normal iteration without DO_FOR_EACH_INCLUDE_BROKEN, this means that the caller does not see the ref at all (neither the old nor the new value). So not only will we fail to see the new value of the ref (which is acceptable, since we are running simultaneously with the writer, and we might well read the ref before the writer commits its write), but we will not see the old value either. For programs that act on reachability like pack-objects or prune, this can cause data loss, as we may see the objects referenced by the original ref value as dangling (and either omit them from the pack, or delete them via prune). There's no test included here, because the success case is two processes running simultaneously forever. But you can replicate the issue with: # base.sh # run this in one terminal; it creates and pushes # repeatedly to a repository git init parent && (cd parent && # create a base commit that will trigger us looking at # the objects/pack directory before we hit the updated ref echo content >file && git add file && git commit -m base && # set the unpack limit abnormally low, which # lets us simulate full-size pushes using tiny ones git config receive.unpackLimit 1 ) && git clone parent child && cd child && n=0 && while true; do echo $n >file && git add file && git commit -m $n && git push origin HEAD:refs/remotes/child/master && n=$(($n + 1)) done # fsck.sh # now run this simultaneously in another terminal; it # repeatedly fscks, looking for us to consider the # newly-pushed ref broken. We cannot use for-each-ref # here, as it uses DO_FOR_EACH_INCLUDE_BROKEN, which # skips the has_sha1_file check (and if it wants # more information on the object, it will actually read # the object, which does the proper two-step lookup) cd parent && while true; do broken=`git fsck 2>&1 \| grep remotes/child` if test -n "$broken"; then echo $broken exit 1 fi done Without this patch, the fsck loop fails within a few seconds (and almost instantly if the test repository actually has a large number of refs). With it, the two can run indefinitely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-30 14:53:45 -07:00
Brandon Casey	7c3ecb3254	Don't close pack fd when free'ing pack windows Now that close_one_pack() has been introduced to handle file descriptor pressure, it is not strictly necessary to close the pack file descriptor in unuse_one_window() when we're under memory pressure. Jeff King provided a justification for leaving the pack file open: If you close packfile descriptors, you can run into racy situations where somebody else is repacking and deleting packs, and they go away while you are trying to access them. If you keep a descriptor open, you're fine; they last to the end of the process. If you don't, then they disappear from under you. For normal object access, this isn't that big a deal; we just rescan the packs and retry. But if you are packing yourself (e.g., because you are a pack-objects started by upload-pack for a clone or fetch), it's much harder to recover (and we print some warnings). Let's do so (or uh, not do so). Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 09:27:26 -07:00
Brandon Casey	88d0db5557	sha1_file: introduce close_one_pack() to close packs on fd pressure When the number of open packs exceeds pack_max_fds, unuse_one_window() is called repeatedly to attempt to release the least-recently-used pack windows, which, as a side-effect, will also close a pack file after closing its last open window. If a pack file has been opened, but no windows have been allocated into it, it will never be selected by unuse_one_window() and hence its file descriptor will not be closed. When this happens, git may exceed the number of file descriptors permitted by the system. This latter situation can occur in show-ref or receive-pack during ref advertisement. During ref advertisement, receive-pack will iterate over every ref in the repository and advertise it to the client after ensuring that the ref exists in the local repository. If the ref is located inside a pack, then the pack is opened to ensure that it exists, but since the object is not actually read from the pack, no mmap windows are allocated. When the number of open packs exceeds pack_max_fds, unuse_one_window() will not be able to find any windows to free and will not be able to close any packs. Once the per-process file descriptor limit is exceeded, receive-pack will produce a warning, not an error, for each pack it cannot open, and will then most likely fail with an error to spawn rev-list or index-pack like: error: cannot create standard input pipe for rev-list: Too many open files error: Could not run 'git rev-list' This may also occur during upload-pack when refs are packed (in the packed-refs file) and the number of packs that must be opened to verify that these packed refs exist exceeds the file descriptor limit. If the refs are loose, then upload-pack will read each ref from the object database (if the object is in a pack, allocating one or more mmap windows for it) in order to peel tags and advertise the underlying object. But when the refs are packed and peeled, upload-pack will use the peeled sha1 in the packed-refs file and will not need to read from the pack files, so no mmap windows will be allocated and just like with receive-pack, unuse_one_window() will never select these opened packs to close. When we have file descriptor pressure, we just need to find an open pack to close. We can leave the existing mmap windows open. If additional windows need to be mapped into the pack file, it will be reopened when necessary. If the pack file has been rewritten in the mean time, open_packed_git_1() should notice when it compares the file size or the pack's sha1 checksum to what was previously read from the pack index, and reject it. Let's introduce a new function close_one_pack() designed specifically for this purpose to search for and close the least-recently-used pack, where LRU is defined as (in order of preference): * pack with oldest mtime and no allocated mmap windows * pack with the least-recently-used windows, i.e. the pack with the oldest most-recently-used window, where none of the windows are in use * pack with the least-recently-used windows Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 08:53:54 -07:00
Junio C Hamano	356df9bd8d	Merge branch 'jk/cat-file-batch-optim' If somebody wants to only know on-disk footprint of an object without having to know its type or payload size, we can bypass a lot of code to cheaply learn it. * jk/cat-file-batch-optim: Fix some sparse warnings sha1_object_info_extended: pass object_info to helpers sha1_object_info_extended: make type calculation optional packed_object_info: make type lookup optional packed_object_info: hoist delta type resolution to helper sha1_loose_object_info: make type lookup optional sha1_object_info_extended: rename "status" to "type" cat-file: disable object/refname ambiguity check for batch mode	2013-07-24 19:21:21 -07:00
Ramsay Jones	d099b7173d	Fix some sparse warnings Sparse issues some "Using plain integer as NULL pointer" warnings. Each warning relates to the use of an '{0}' initialiser expression in the declaration of an 'struct object_info'. The first field of this structure has pointer type. Thus, in order to suppress these warnings, we replace the initialiser expression with '{NULL}'. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-18 16:43:47 -07:00
Junio C Hamano	802f878b86	Merge branch 'jk/in-pack-size-measurement' "git cat-file --batch-check=<format>" is added, primarily to allow on-disk footprint of objects in packfiles (often they are a lot smaller than their true size, when expressed as deltas) to be reported. * jk/in-pack-size-measurement: pack-revindex: radix-sort the revindex pack-revindex: use unsigned to store number of objects cat-file: split --batch input lines on whitespace cat-file: add %(objectsize:disk) format atom cat-file: add --batch-check=<format> cat-file: refactor --batch option parsing cat-file: teach --batch to stream blob objects t1006: modernize output comparisons teach sha1_object_info_extended a "disk_size" query zero-initialize object_info structs	2013-07-18 12:59:41 -07:00
Jeff King	23c339c0f2	sha1_object_info_extended: pass object_info to helpers We take in a "struct object_info" which contains pointers to storage for items the caller cares about. But then rather than pass the whole object to the low-level loose/packed helper functions, we pass the individual pointers. Let's pass the whole struct instead, which will make adding more items later easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:29:27 -07:00
Jeff King	5b0864070e	sha1_object_info_extended: make type calculation optional Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real 0m8.680s user 0m8.160s sys 0m0.512s to: real 0m7.205s user 0m6.580s sys 0m0.608s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:16:36 -07:00
Jeff King	412916ee13	packed_object_info: make type lookup optional Currently, packed_object_info can save some work by not calculating the size or disk_size of the object if the caller is not interested. However, it always calculates the true object type, whether the caller cares or not, and only optionally returns the easy-to-get "representation type". Let's swap these types. The function will now return the representation type (or OBJ_BAD on failure), and will only optionally fill in the true type. There should be no behavior change yet, as the only caller, sha1_object_info_extended, will always feed it a type pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:14:06 -07:00
Jeff King	90191d37ab	packed_object_info: hoist delta type resolution to helper To calculate the type of a packed object, we must walk down its delta chain until we hit a true base object with a real type. Most of the code in packed_object_info is for handling this case. Let's hoist it out into a separate helper function, which will make it easier to make the type-lookup optional in the future (and keep our indentation level sane). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:13:23 -07:00
Jeff King	052fe5eaca	sha1_loose_object_info: make type lookup optional Until recently, the only items to request from sha1_object_info_extended were type and size. This meant that we always had to open a loose object file to determine one or the other. But with the addition of the disk_size query, it's possible that we can fulfill the query without even opening the object file at all. However, since the function interface always returns the type, we have no way of knowing whether the caller cares about it or not. This patch only modified sha1_loose_object_info to make type lookup optional using an out-parameter, similar to the way the size is handled (and the return value is "0" or "-1" for success or error, respectively). There should be no functional change yet, though, as sha1_object_info_extended, the only caller, will always ask for a type. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:04 -07:00
Jeff King	f2f57e31f6	sha1_object_info_extended: rename "status" to "type" The value we get from each low-level object_info function (e.g., loose, packed) is actually the object type (or -1 for error). Let's explicitly call it "type", which will make further refactorings easier to read. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:03 -07:00
Jeff King	161f00e708	teach sha1_object_info_extended a "disk_size" query Using sha1_object_info_extended, a caller can find out the type of an object, its size, and information about where it is stored. In addition to the object's "true" size, it can also be useful to know the size that the object takes on disk (e.g., to generate statistics about which refs consume space). This patch adds a "disk_sizep" field to "struct object_info", and fills it in during sha1_object_info_extended if it is non-NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-07 10:53:22 -07:00
Jeff King	7c07385d90	zero-initialize object_info structs The sha1_object_info_extended function expects the caller to provide a "struct object_info" which contains pointers to "query" items that will be filled in. The purpose of providing pointers rather than storing the response directly in the struct is so that callers can choose not to incur the expense in finding particular fields that they do not care about. Right now the only query item is "sizep", and all callers set it explicitly to choose whether or not to query it; they can then leave the rest of the struct uninitialized. However, as we add new query items, each caller will have to be updated to explicitly turn off the new ones (by setting them to NULL). Instead, let's teach each caller to zero-initialize the struct, so that they do not have to learn about each new query item added. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-07 10:50:13 -07:00
Junio C Hamano	ee64e345b1	Merge branch 'jk/unpack-entry-fallback-to-another' * jk/unpack-entry-fallback-to-another: unpack_entry: do not die when we fail to apply a delta t5303: drop "count=1" from corruption dd	2013-06-23 14:53:20 -07:00
Junio C Hamano	8f0c843aab	Merge branch 'nd/traces' * nd/traces: git.txt: document GIT_TRACE_PACKET core: use env variable instead of config var to turn on logging pack access	2013-06-20 16:02:28 -07:00
Jeff King	1ee886c1f0	unpack_entry: do not die when we fail to apply a delta When we try to load an object from disk and fail, our general strategy is to see if we can get it from somewhere else (e.g., a loose object). That lets users fix corruption problems by copying known-good versions of objects into the object database. We already handle the case where we were not able to read the delta from disk. However, when we find that the delta we read does not apply, we simply die. This case is harder to trigger, as corruption in the delta data itself would trigger a crc error from zlib. However, a corruption that pointed us at the wrong delta base might cause it. We can do the same "fail and try to find the object elsewhere" trick instead of dying. This not only gives us a chance to recover, but also puts us on code paths that will alert the user to the problem (with the current message, they do not even know which sha1 caused the problem). Note that unlike some other pack corruptions, we do not recover automatically from this case when doing a repack. There is nothing apparently wrong with the delta, as it points to a valid, accessible object, and we realize the error only when the resulting size does not match up. And in theory, one could even have a case where the corrupted size is the same, and the problem would only be noticed by recomputing the sha1. We can get around this by recomputing the deltas with --no-reuse-delta, which our test does (and this is probably good advice for anyone recovering from pack corruption). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-14 14:56:09 -07:00
Junio C Hamano	cf6de2968c	Merge branch 'tr/sha1-file-silence-loose-object-info-under-prune-race' * tr/sha1-file-silence-loose-object-info-under-prune-race: sha1_file: silence sha1_loose_object_info	2013-06-11 13:31:19 -07:00
Nguyễn Thái Ngọc Duy	b12ca9631f	core: use env variable instead of config var to turn on logging pack access `5f44324` (core: log offset pack data accesses happened - 2011-07-06) provides a way to observe pack access patterns via a config switch. Setting an environment variable looks more obvious than a config var, especially when you just need to _observe_, and more inline with other tracing knobs we have. Document it as it may be useful for remote troubleshooting. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-09 16:07:50 -07:00
Thomas Rast	dbea72a8c0	sha1_file: silence sha1_loose_object_info sha1_object_info() returns -1 (OBJ_BAD) if it cannot find the object for some reason, which suggests that it wants the _caller_ to report this error. However, part of its work happens in sha1_loose_object_info, which _does_ report errors itself. This is doubly strange because: * packed_object_info(), which is the other half of the duo, does _not_ report this. * In the event that an object is packed and pruned while sha1_object_info_extended() goes looking for it, we would erroneously show the error -- even though the code of the latter function purports to handle this case gracefully. * A caller might invoke sha1_object_info() to find the type of an object even if that object is not known to exist. Silence this error. The others remain untouched as a corrupt object is a much more grave error than it merely being absent. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-03 12:51:53 -07:00
Felipe Contreras	4b8f772ce4	sha1_file: trivial style cleanup Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-06-03 10:14:48 -07:00
Junio C Hamano	7c2e8fc684	Merge branch 'tr/unpack-entry-use-after-free-fix' * tr/unpack-entry-use-after-free-fix: unpack_entry: avoid freeing objects in base cache	2013-05-03 15:18:04 -07:00
Thomas Rast	756a042600	unpack_entry: avoid freeing objects in base cache In the !delta_data error path of unpack_entry(), we run free(base). This became a window for use-after-free() in `abe601b` (sha1_file: remove recursion in unpack_entry, 2013-03-27), as follows: Before `abe601b`, we got the 'base' from cache_or_unpack_entry(..., 0); keep_cache=0 tells it to also remove that entry. So the 'base' is at this point not cached, and freeing it in the error path is the right thing. After `abe601b`, the structure changed: we use a three-phase approach where phase 1 finds the innermost base or a base that is already in the cache. In phase 3 we therefore know that all bases we unpack are not part of the delta cache yet. (Observe that we pop from the cache in phase 1, so this is also true for the very first base.) So we make no further attempts to look up the bases in the cache, and just call add_delta_base_cache() on every base object we have assembled. But the !delta_data error path remained unchanged, and now calls free() on a base that has already been entered in the cache. This means that there is a use-after-free if we later use the same base again. So remove that free(); we are still going to use that data. Reported-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-04-30 15:43:48 -07:00
Junio C Hamano	193e28f050	Merge branch 'tr/packed-object-info-wo-recursion' Attempts to reduce the stack footprint of sha1_object_info() and unpack_entry() codepaths. * tr/packed-object-info-wo-recursion: sha1_file: remove recursion in unpack_entry Refactor parts of in_delta_base_cache/cache_or_unpack_entry sha1_file: remove recursion in packed_object_info	2013-04-18 11:46:23 -07:00
Junio C Hamano	b9c78e9723	Merge branch 'jk/check-corrupt-objects-carefully' Have the streaming interface and other codepaths more carefully examine for corrupt objects. * jk/check-corrupt-objects-carefully: clone: leave repo in place after checkout errors clone: run check_everything_connected clone: die on errors from unpack_trees add tests for cloning corrupted repositories streaming_write_entry: propagate streaming errors add test for streaming corrupt blobs avoid infinite loop in read_istream_loose read_istream_filtered: propagate read error from upstream check_sha1_signature: check return value from read_istream stream_blob_to_fd: detect errors reading from stream	2013-04-03 09:34:29 -07:00
Junio C Hamano	37ba4c61d0	Merge branch 'sw/safe-create-leading-dir-race' * sw/safe-create-leading-dir-race: safe_create_leading_directories: fix race that could give a false negative	2013-04-02 15:09:48 -07:00
Jeff King	f54fac5378	check_sha1_signature: check return value from read_istream It's possible for read_istream to return an error, in which case we just end up in an infinite loop (aside from EOF, we do not even look at the result, but just feed it straight into our running hash). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:46:55 -07:00
Thomas Rast	abe601bba5	sha1_file: remove recursion in unpack_entry Similar to the recursion in packed_object_info(), this leads to problems on stack-space-constrained systems in the presence of long delta chains. We proceed in three phases: 1. Dig through the delta chain, saving each delta object's offsets and size on an ad-hoc stack. 2. Unpack the base object at the bottom. 3. Unpack and apply the deltas from the stack. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:25:16 -07:00
Thomas Rast	84dd81c126	Refactor parts of in_delta_base_cache/cache_or_unpack_entry The delta base cache lookup and test were shared. Refactor them; we'll need both parts again. Also, we'll use the clearing routine later. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:24:43 -07:00
Steven Walter	928734d993	safe_create_leading_directories: fix race that could give a false negative If two processes are racing to create the same directory tree, they will both see that the directory doesn't exist, both try to mkdir(), and one of them will fail. This is okay, as we only care that the directory gets created. So, we add a check for EEXIST from mkdir, and continue when the directory exists, taking the same codepath as the case where the earlier stat() succeeds and finds a directory. Signed-off-by: Steven Walter <stevenrwalter@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-26 21:07:42 -07:00
Thomas Rast	790d96c023	sha1_file: remove recursion in packed_object_info packed_object_info() and packed_delta_info() were mutually recursive. The former would handle ordinary types and defer deltas to the latter; the latter would use the former to resolve the delta base. This arrangement, however, leads to trouble with threaded index-pack and long delta chains on platforms where thread stacks are small, as happened on OS X (512kB thread stacks by default) with the chromium repo. The task of the two functions is not all that hard to describe without any recursion, however. It proceeds in three steps: - determine the representation type and size, based on the outermost object (delta or not) - follow through the delta chain, if any - determine the object type from what is found at the end of the delta chain The only complication stems from the error recovery. If parsing fails at any step, we want to mark that object (within the pack) as bad and try getting the corresponding SHA1 from elsewhere. If that also fails, we want to repeat this process back up the delta chain until we find a reasonable solution or conclude that there is no way to reconstruct the object. (This is conveniently checked by t5303.) To achieve that within the pack, we keep track of the entire delta chain in a stack. When things go sour, we process that stack from the top, marking entries as bad and attempting to re-resolve by sha1. To avoid excessive malloc(), the stack starts out with a small stack-allocated array. The choice of 64 is based on the default of pack.depth, which is 50, in the hope that it covers "most" delta chains without any need for malloc(). It's much harder to make the actual re-resolving by sha1 nonrecursive, so we skip that. If you can't afford that recursion, your corruption problems are more serious than your stack size problems. Reported-by: Stefan Zager <szager@google.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-25 15:48:18 -07:00
Nguyễn Thái Ngọc Duy	543c5caa6c	count-objects: report garbage files in pack directory too prepare_packed_git_one() is modified to allow count-objects to hook a report function to so we don't need to duplicate the pack searching logic in count-objects.c. When report_pack_garbage is NULL, the overhead is insignificant. The garbage is reported with warning() instead of error() in packed garbage case because it's not an error to have garbage. Loose garbage is still reported as errors and will be converted to warnings later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-15 08:13:13 -08:00
Nguyễn Thái Ngọc Duy	d90906a902	sha1_file: reorder code in prepare_packed_git_one() The current loop does while (...) { if (it is not an .idx file) continue; process .idx file; } and is reordered to while (...) { if (it is an .idx file) { process .idx file; } } This makes it easier to add new extension file processing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-13 07:42:05 -08:00
Michael Haggerty	c595016402	link_alt_odb_entries(): take (char *, len) rather than two pointers Change link_alt_odb_entries() to take the length of the "alt" parameter rather than a pointer to the end of the "alt" string. This is the more common calling convention and simplifies the code a tiny bit. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Michael Haggerty	6eac50d827	link_alt_odb_entries(): use string_list_split_in_place() Change link_alt_odb_entry() to take a NUL-terminated string instead of (char *, len). Use string_list_split_in_place() rather than inline code in link_alt_odb_entries(). This approach saves some code and also avoids the (probably harmless) error of passing a non-NUL-terminated string to is_absolute_path(). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Joachim Schmitz	a0788266d3	sha1_file.c: introduce get_max_fd_limit() helper Not all platforms have getrlimit(), but there are other ways to see the maximum number of files that a process can have open. If getrlimit() is unavailable, fall back to sysconf(_SC_OPEN_MAX) if available, and use OPEN_MAX from <limits.h>. Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-24 09:46:01 -07:00
Junio C Hamano	fbea95ce10	Merge branch 'hv/link-alt-odb-entry' The code to avoid mistaken attempt to add the object directory itself as its own alternate could read beyond end of a string while comparison. * hv/link-alt-odb-entry: link_alt_odb_entry: fix read over array bounds reported by valgrind	2012-07-30 12:55:01 -07:00
Heiko Voigt	cb2912c324	link_alt_odb_entry: fix read over array bounds reported by valgrind pfxlen can be longer than the path in objdir when relative_base contains the path to gits object directory. Here we are interested in checking if ent->base[] (the part that corresponds to .git/objects) is the same string as objdir, and the code NUL-terminated ent->base[] to LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0 in preparation for these "duplicate check" step (before we return from the function, the first NUL is turned into '/' so that we can fill XX when probing for loose objects). All we need to do is to compare the string with the path to our object directory. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 18:02:51 -07:00
Junio C Hamano	4809ff858b	Merge branch 'hv/submodule-alt-odb' When peeking into object stores of submodules, the code forgot that they might borrow objects from alternate object stores on their own. By Heiko Voigt * hv/submodule-alt-odb: teach add_submodule_odb() to look for alternates	2012-05-23 13:35:06 -07:00
Heiko Voigt	5e73633dbf	teach add_submodule_odb() to look for alternates Since we allow to link other object databases when loading a submodules database we should also load possible alternates. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-14 11:56:42 -07:00
Pete Wyckoff	5eaeda70de	remove blank filename in error message When write_loose_object() finds that it is unable to create a temporary file, it complains, for instance: unable to create temporary sha1 filename : Too many open files That extra space was supposed to be the name of the file, and will be an empty string if the git_mkstemps_mode() fails. The name of the temporary file is unimportant; delete it. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:54 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Nguyễn Thái Ngọc Duy	090ea12671	parse_object: avoid putting whole blob in core Traditionally, all the callers of check_sha1_signature() first called read_sha1_file() to prepare the whole object data in core, and called this function. The function is used to revalidate what we read from the object database actually matches the object name we used to ask for the data from the object database. Update the API to allow callers to pass NULL as the object data, and have the function read and hash the object data using streaming API to recompute the object name, without having to hold everything in core at the same time. This is most useful in parse_object() that parses a blob object, because this caller does not have to keep the actual blob data around in memory after a "struct blob" is returned. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-07 09:07:38 -08:00
Junio C Hamano	a09a0c2709	Merge branch 'jk/maint-avoid-streaming-filtered-contents' into maint * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-03-04 22:16:40 -08:00
Junio C Hamano	31e3d834b3	Merge branch 'jk/maint-avoid-streaming-filtered-contents' * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-02-26 23:05:38 -08:00
Jeff King	4f22b1015d	do not stream large files to pack when filters are in use Because git's object format requires us to specify the number of bytes in the object in its header, we must know the size before streaming a blob into the object database. This is not a problem when adding a regular file, as we can get the size from stat(). However, when filters are in use (such as autocrlf, or the ident, filter, or eol gitattributes), we have no idea what the ultimate size will be. The current code just punts on the whole issue and ignores filter configuration entirely for files larger than core.bigfilethreshold. This can generate confusing results if you use filters for large binary files, as the filter will suddenly stop working as the file goes over a certain size. Rather than try to handle unknown input sizes with streaming, this patch just turns off the streaming optimization when filters are in use. This has a slight performance regression in a very specific case: if you have autocrlf on, but no gitattributes, a large binary file will avoid the streaming code path because we don't know beforehand whether it will need conversion or not. But if you are handling large binary files, you should be marking them as such via attributes (or at least not using autocrlf, and instead marking your text files as such). And the flip side is that if you have a large _non_-binary file, there is a correctness improvement; before we did not apply the conversion at all. The first half of the new t1051 script covers these failures on input. The second half tests the matching output code paths. These already work correctly, and do not need any adjustment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-24 14:18:20 -08:00
Junio C Hamano	f3ccea8dd4	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' into maint * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-21 14:56:36 -08:00
Junio C Hamano	c6a4e3f7a7	Merge branch 'mm/empty-loose-error-message' into maint * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-16 14:00:25 -08:00
Junio C Hamano	dd5253b4bd	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-12 22:43:03 -08:00
Junio C Hamano	8c18a6f3fa	Merge branch 'mm/empty-loose-error-message' * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-12 22:42:02 -08:00
Matthieu Moy	33e42de0d2	fsck: give accurate error message on empty loose object files Since `3ba7a06552` (A loose object is not corrupt if it cannot be read due to EMFILE), "git fsck" on a repository with an empty loose object file complains with the error message fatal: failed to read object <sha1>: Invalid argument This comes from a failure of mmap on this empty file, which sets errno to EINVAL. Instead of calling xmmap on empty file, we display a clean error message ourselves, and return a NULL pointer. The new message is error: object file .git/objects/09/<rest-of-sha1> is empty fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt The second line was already there before the regression in `3ba7a06552`, and the first is an additional message, that should help diagnosing the problem for the user. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-06 11:05:36 -08:00
Nguyễn Thái Ngọc Duy	c01f51cc75	find_pack_entry(): do not keep packed_git pointer locally Commit `f7c22cc` (always start looking up objects in the last used pack first - 2007-05-30) introduce a static packed_git* pointer as an optimization. The kept pointer however may become invalid if free_pack_by_name() happens to free that particular pack. Current code base does not access packs after calling free_pack_by_name() so it should not be a problem. Anyway, move the pointer out so that free_pack_by_name() can reset it to avoid running into troubles in future. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:42 -08:00
Nguyễn Thái Ngọc Duy	95099731bf	sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry() The new helper function implements the logic to find the offset for the object in one pack and fill a pack_entry structure. The next patch will restructure the loop and will call the helper from two places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:41 -08:00
Ævar Arnfjörð Bjarmason	ab1900a36e	Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-21 10:21:04 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00
Ramkumar Ramachandra	5e12e78e52	sha1_file: don't mix enum with int Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:09:20 -08:00
Junio C Hamano	ea4f9685cb	unpack_object_header_buffer(): clear the size field upon error The callers do not use the returned size when the function says it did not use any bytes and sets the type to OBJ_BAD, so this should not matter in practice, but it is a good code hygiene anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-27 11:42:57 -07:00
Junio C Hamano	2070950633	Merge branch 'jk/maint-pack-objects-compete-with-delete' * jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs	2011-10-21 16:04:33 -07:00
Jeff King	58a6a9cc43	downgrade "packfile cannot be accessed" errors to warnings These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:43:09 -07:00
Jeff King	4c08018204	pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since `c715f78`, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:42:37 -07:00
Junio C Hamano	e99f8c6dcf	Merge branch 'wh/normalize-alt-odb-path' * wh/normalize-alt-odb-path: sha1_file: normalize alt_odb path before comparing and storing	2011-10-05 12:36:22 -07:00
Hui Wang	5bdf0a8468	sha1_file: normalize alt_odb path before comparing and storing When it needs to compare and add an alt object path to the alt_odb_list, we normalize this path first since comparing normalized path is easy to get correct result. Use strbuf to replace some string operations, since it is cleaner and safer. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Hui Wang <Hui.Wang@windriver.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-07 11:47:43 -07:00
Junio C Hamano	2478bd8318	Merge branch 'jc/maint-clone-alternates' * jc/maint-clone-alternates: clone: clone from a repository with relative alternates clone: allow more than one --reference Conflicts: builtin/clone.c	2011-08-28 21:19:21 -07:00
Junio C Hamano	6fcb384869	Merge branch 'rt/zlib-smaller-window' * rt/zlib-smaller-window: test: consolidate definition of $LF Tolerate zlib deflation with window size < 32Kb	2011-08-23 15:40:33 -07:00
Junio C Hamano	e6baf4a1ae	clone: clone from a repository with relative alternates Cloning from a local repository blindly copies or hardlinks all the files under objects/ hierarchy. This results in two issues: - If the repository cloned has an "objects/info/alternates" file, and the command line of clone specifies --reference, the ones specified on the command line get overwritten by the copy from the original repository. - An entry in a "objects/info/alternates" file can specify the object stores it borrows objects from as a path relative to the "objects/" directory. When cloning a repository with such an alternates file, if the new repository is not sitting next to the original repository, such relative paths needs to be adjusted so that they can be used in the new repository. This updates add_to_alternates_file() to take the path to the alternate object store, including the "/objects" part at the end (earlier, it was taking the path to $GIT_DIR and was adding "/objects" itself), as it is technically possible to specify in objects/info/alternates file the path of a directory whose name does not end with "/objects". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-23 09:56:14 -07:00
Roberto Tyley	7f684a2aff	Tolerate zlib deflation with window size < 32Kb Git currently reports loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the experimental_loose_object() function doesn't recognise the header byte as a zlib header. This patch makes the function tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice it's accuracy in distingushing the standard loose-object format from the experimental (now abandoned) format. On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated appear 'corrupt', which is why Agit is a read-only Git client at this time - I don't want my client to generate Git repos that other clients treat as broken :( This patch makes Git tolerant of different deflate settings - it might appear that it changes experimental_loose_object() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) [single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header] % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type 1, a Commit - which, due the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. The alternative to this patch is simply removing support for the experimental format, which I am also totally cool with. References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 13:02:47 -07:00
Junio C Hamano	96790ca029	Merge branch 'jc/pack-order-tweak' * jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened	2011-08-05 14:54:57 -07:00
Junio C Hamano	d48929e1c3	Merge branch 'jc/legacy-loose-object' into maint * jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format	2011-08-01 14:43:58 -07:00
Junio C Hamano	d907bf8ef3	Merge branch 'jc/index-pack' * jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c	2011-07-19 09:54:51 -07:00
Junio C Hamano	eb4f4076aa	Merge branch 'jc/zlib-wrap' * jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c	2011-07-19 09:33:04 -07:00
Junio C Hamano	5f2e448370	Merge branch 'jc/legacy-loose-object' * jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format	2011-07-13 14:31:34 -07:00
Junio C Hamano	5f44324d88	core: log offset pack data accesses happened In a workload other than "git log" (without pathspec nor any option that causes us to inspect trees and blobs), the recency pack order is said to cause the access jump around quite a bit. Add a hook to allow us observe how bad it is. "git config core.logpackaccess /var/tmp/pal.txt" will give you the log in the specified file. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-06 19:09:29 -07:00
Junio C Hamano	ef49a7a012	zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:52:15 -07:00
Junio C Hamano	55bb5c9147	zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:10:29 -07:00
Junio C Hamano	cc5c54e78b	sha1_file.c: "legacy" is really the current format Every time I look at the read-loose-object codepath, legacy_loose_object() function makes my brain go through mental contortion. When we were playing with the experimental loose object format, it may have made sense to call the traditional format "legacy", in the hope that the experimental one will some day replace it to become official, but it never happened. This renames the function (and negates its return value) to detect if we are looking at the experimental format, and move the code around in its caller which used to do "if we are looing at legacy, do this special case, otherwise the normal case is this". The codepath to read from the loose objects in experimental format is the "unlikely" case. Someday after Git 2.0, we should drop the support of this format. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-08 16:39:33 -07:00
Junio C Hamano	3de89c9d42	verify-pack: use index-pack --verify This finally gets rid of the inefficient verify-pack implementation that walks objects in the packfile in their object name order and replaces it with a call to index-pack --verify. As a side effect, it also removes packed_object_info_detail() API which is rather expensive. As this changes the way errors are reported (verify-pack used to rely on the usual runtime error detection routine unpack_entry() to diagnose the CRC errors in an entry in the .idx file; index-pack --verify checks the whole .idx file in one go), update a test that expected the string "CRC" to appear in the error message. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-05 22:45:38 -07:00
Jim Meyering	23c7df6bdd	sha1_file: use the correct type (ssize_t, not size_t) for read-style function Using an unsigned type, we would fail to detect a read error and then proceed to try to write (size_t)-1 bytes. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 11:25:59 -07:00
Junio C Hamano	5cfe4256d9	Merge branch 'jc/bigfile' * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag	2011-05-25 16:23:26 -07:00
Junio C Hamano	f0270efd46	sha1_file.c: expose helpers to read loose objects Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header() available to the streaming read API by exporting them via cache.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 23:16:53 -07:00
Junio C Hamano	f8c8abc5b7	unpack_object_header(): make it public This function is used to read and skip over the per-object header in a packfile. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 18:38:54 -07:00
Junio C Hamano	5266d369b2	sha1_object_info_extended(): hint about objects in delta-base cache An object found in the delta-base cache is not guaranteed to stay there, but we know it came from a pack and it is likely to give us a quick access if we read_sha1_file() it right now, which is a piece of useful information. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 18:38:50 -07:00
Junio C Hamano	61d7503da1	Merge branch 'jc/replacing' * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h Conflicts: environment.c	2011-05-19 20:37:21 -07:00
Junio C Hamano	9a49059022	sha1_object_info_extended(): expose a bit more info The original interface for sha1_object_info() takes an object name and gives back a type and its size (the latter is given only when it was asked). The new interface wraps its implementation and exposes a bit more pieces of information that the interface used to discard, namely: - where the object is stored (loose? cached? packed?) - if packed, where in which packfile? Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * In the earlier round, this used u.pack.delta to record the length of the delta chain, but the caller is not necessarily interested in the length of the delta chain per-se, but may only want to know if it is a delta against another object or is stored as a deflated data. Calling packed_object_info_detail() involves walking the reverse index chain to compute the store size of the object and is unnecessarily expensive. We could resurrect the code if a new caller wants to know, but I doubt it.	2011-05-19 14:22:47 -07:00
Junio C Hamano	b9a62cbeb9	packed_object_info_detail(): do not return a string Instead return an integer that can be given to typename() if the caller wants a string, just like everybody else does. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-16 22:13:34 -07:00
Junio C Hamano	02071b27f1	Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming * jc/convert: convert: make it harder to screw up adding a conversion attribute convert: make it safer to add conversion attributes convert: give saner names to crlf/eol variables, types and functions convert: rename the "eol" global variable to "core_eol" * jc/bigfile: Bigfile: teach "git add" to send a large file straight to a pack index_fd(): split into two helper functions index_fd(): turn write_object and format_check arguments into one flag * jc/replacing: read_sha1_file(): allow selective bypassing of replacement mechanism inline lookup_replace_object() calls read_sha1_file(): get rid of read_sha1_file_repl() madness t6050: make sure we test not just commit replacement Declare lookup_replace_object() in cache.h, not in commit.h	2011-05-15 16:30:13 -07:00
Junio C Hamano	f4e516834e	git_open_noatime(): drop unused parameter Since commit `c793430` (Limit file descriptors used by packs, 2011-02-28), the extra parameter added in `f2e872aa` (Work around EMFILE when there are too many pack files, 2010-11-01) is not used anymore. Remove it. Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>	2011-05-15 15:24:52 -07:00
Junio C Hamano	ccf5ace0dc	sha1_file: typofix The number zero is spelled "zero", not "zer0". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:24:36 -07:00
Junio C Hamano	5bf29b9500	read_sha1_file(): allow selective bypassing of replacement mechanism The way "object replacement" mechanism was tucked to the read_sha1_file() interface was suboptimal in a couple of ways: - Callers that want it to die with useful diagnosis upon seeing a corrupt object does not have a way to say that they do not want any object replacement. - Callers who do not want it to die but want to handle the errors themselves are told to arrange to call read_object(), but the function does not use the replacement mechanism, and also it is a file scope static function that not many callers can call to begin with. This adds a read_sha1_file_extended() that takes a set of flags; the callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask for object replacement mechanism to kick in. Later, we could add another flag bit to tell the function to return an error instead of dying and then remove the misguided "call read_object() yourself". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:23:34 -07:00
Junio C Hamano	4bbf5a2615	read_sha1_file(): get rid of read_sha1_file_repl() madness Most callers want to silently get a replacement object, and they do not care what the real name of the replacement object is. Worse yet, no sane interface to return the underlying object without replacement is provided. Remove the function and make only the few callers that want the name of the replacement object find it themselves. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-15 15:23:33 -07:00
Junio C Hamano	4dd1fbc7b1	Bigfile: teach "git add" to send a large file straight to a pack When adding a new content to the repository, we have always slurped the blob in its entirety in-core first, and computed the object name and compressed it into a loose object file. Handling large binary files (e.g. video and audio asset for games) has been problematic because of this design. At the middle level of "git add" callchain is an internal API index_fd() that takes an open file descriptor to read from the working tree file being added with its size. Teach it to call out to fast-import when adding a large blob. The write-out codepath in entry.c::write_entry() should be taught to stream, instead of reading everything in core. This should not be so hard to implement, especially if we limit ourselves only to loose object files and non-delta representation in packfiles. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-13 16:11:18 -07:00
Junio C Hamano	7b41e1e15b	index_fd(): split into two helper functions Split out the case where we do not know the size of the input (hence we read everything into a strbuf before doing anything) to index_pipe(), and the other case where we mmap or read the whole data to index_bulk(). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 11:58:19 -07:00
Junio C Hamano	c4ce46fc7a	index_fd(): turn write_object and format_check arguments into one flag The "format_check" parameter tucked after the existing parameters is too ugly an afterthought to live in any reasonable API. Combine it with the other boolean parameter "write_object" into a single "flags" parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 11:58:19 -07:00
Jim Meyering	0353a0c4ec	remove doubled words, e.g., s/to to/to/, and fix related typos I found that some doubled words had snuck back into projects from which I'd already removed them, so now there's a "syntax-check" makefile rule in gnulib to help prevent recurrence. Running the command below spotted a few in git, too: git ls-files \| xargs perl -0777 -n \ -e 'while (/\b(then?\|[iao]n\|i[fst]\|but\|f?or\|at\|and\|[dt])\s+\1\b/gims)' \ -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \ -e 'print "$ARGV:$n:$v\n"}' Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-04-13 11:59:11 -07:00

... 3 4 5 6 7 ...

922 Commits