Support for the OBJECT_INFO_QUICK flag in sha1_object_info_extended()
was added in commit dfdd4afcf9 ("sha1_file: teach
sha1_object_info_extended more flags", 2017-06-26) in order to support
commit e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags",
2017-06-26), but it was inadvertently removed in commit 8b4c0103a9
("sha1_file: support lazily fetching missing objects", 2017-12-08).
Restore this functionality.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More abstraction of hash function from the codepath.
* bc/hash-algo:
hash: update obsolete reference to SHA1_HEADER
bulk-checkin: abstract SHA-1 usage
csum-file: abstract uses of SHA-1
csum-file: rename sha1file to hashfile
read-cache: abstract away uses of SHA-1
pack-write: switch various SHA-1 values to abstract forms
pack-check: convert various uses of SHA-1 to abstract forms
fast-import: switch various uses of SHA-1 to the_hash_algo
sha1_file: switch uses of SHA-1 to the_hash_algo
builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
builtin/index-pack: improve hash function abstraction
hash: create union for hash context allocation
hash: move SHA-1 macros to hash.h
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.
* gs/retire-mru:
mru: Replace mru.[ch] with list.h implementation
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.
* jh/fsck-promisors:
gc: do not repack promisor packfiles
rev-list: support termination at promisor objects
sha1_file: support lazily fetching missing objects
introduce fetch-object: fetch one promisor object
index-pack: refactor writing of .keep files
fsck: support promisor objects as CLI argument
fsck: support referenced promisor objects
fsck: support refs pointing to promisor objects
fsck: introduce partialclone extension
extension.partialclone: introduce partial clone extension
Switch various uses of explicit calls to SHA-1 into references to
the_hash_algo for better abstraction. Convert some calls to use struct
object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In various parts of our code, we want to allocate a structure
representing the internal state of a hash algorithm. The original
implementation of the hash algorithm abstraction assumed we would do
that using heap allocations, and added a context size element to struct
git_hash_algo. However, most of the existing code uses stack
allocations and conversion would needlessly complicate various parts of
the code. Add a union for the purpose of allocating hash contexts on
the stack and a typedef for ease of use. Use this union for defining
the init, update, and final functions to avoid casts. Remove the ctxsz
element for struct git_hash_algo, which is no longer very useful.
This does mean that stack allocations will grow slightly as additional
hash functions are added, but this should not be a significant problem,
since we don't allocate many hash contexts. The improved usability and
benefits from avoiding dynamic allocation outweigh this small downside.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was already converted to use struct object_id earlier.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of static write_loose_object
function to struct object_id.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of force_object_loose to
struct object_id and adjust usage of this function.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.
This commit also converts static function write_sha1_file_prepare, as it
is closely related.
Rename these functions to write_object_file and
write_object_file_prepare respectively.
Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.
Rename this function to hash_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of pretend_sha1_file to use
struct object_id and adjust all usages of this function. Rename it to
pretend_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As sha1_file_name() could be performance sensitive, let's
make it faster by using strbuf_addstr() and strbuf_addc()
instead of strbuf_addf().
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in many of the callers.
So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.
Suggested-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling convert_to_git(), the checksafe parameter defined what
should happen if the EOL conversion (CRLF --> LF --> CRLF) does not
roundtrip cleanly. In addition, it also defined if line endings should
be renormalized (CRLF --> LF) or kept as they are.
checksafe was an safe_crlf enum with these values:
SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors
SAFE_CRLF_FAIL: die in case of EOL roundtrip errors
SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors
SAFE_CRLF_RENORMALIZE: change CRLF to LF
SAFE_CRLF_KEEP_CRLF: keep all line endings as they are
In some cases the integer value 0 was passed as checksafe parameter
instead of the correct enum value SAFE_CRLF_FALSE. That was no problem
because SAFE_CRLF_FALSE is defined as 0.
FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore,
an enum is not ideal. Let's use a integer bit pattern instead and rename
the parameter to conv_flags to make it more generically usable. This
allows us to extend the bit pattern in a subsequent commit.
Reported-By: Randall S. Becker <rsbecker@nexbridge.com>
Helped-By: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An infrastructure to define what hash function is used in Git is
introduced, and an effort to plumb that throughout various
codepaths has been started.
* bc/hash-algo:
repository: fix a sparse 'using integer as NULL pointer' warning
Switch empty tree and blob lookups to use hash abstraction
Integrate hash algorithm support with repo setup
Add structure representing hash algorithm
setup: expose enumerated repo info
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.
The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.
However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.
In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
(1) functions in sha1_file.c that take in a hash, without the user
regarding how the object is stored (loose or packed)
(2) functions in packfile.c (because I need to check callers that know
about the loose/packed distinction and operate on both differently,
and ensure that they can handle the concept of objects that are
neither loose nor packed)
(1) is handled by the modification to sha1_object_info_extended().
For (2), I looked at for_each_packed_object and others. For
for_each_packed_object, the callers either already work or are fixed in
this patch:
- reachable - only to find recent objects
- builtin/fsck - already knows about missing objects
- builtin/cat-file - warning message added in this commit
Callers of the other functions do not need to be changed:
- parse_pack_index
- http - indirectly from http_get_info_packs
- find_pack_entry_one
- this searches a single pack that is provided as an argument; the
caller already knows (through other means) that the sought object
is in a specific pack
- find_sha1_pack
- fast-import - appears to be an optimization to not store a file if
it is already in a pack
- http-walker - to search through a struct alt_base
- http-push - to search through remote packs
- has_sha1_pack
- builtin/fsck - already knows about promisor objects
- builtin/count-objects - informational purposes only (check if loose
object is also packed)
- builtin/prune-packed - check if object to be pruned is packed (if
not, don't prune it)
- revision - used to exclude packed objects if requested by user
- diff - just for optimization
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Internaly we use 0{40} as a placeholder object name to signal the
codepath that there is no such object (e.g. the fast-forward check
while "git fetch" stores a new remote-tracking ref says "we know
there is no 'old' thing pointed at by the ref, as we are creating
it anew" by passing 0{40} for the 'old' side), and expect that a
codepath to locate an in-core object to return NULL as a sign that
the object does not exist. A look-up for an object that does not
exist however is quite costly with a repository with large number
of packfiles. This access pattern has been optimized.
* jk/fewer-pack-rescan:
sha1_file: fast-path null sha1 as a missing object
everything_local: use "quick" object existence check
p5551: add a script to test fetch pack-dir rescans
t/perf/lib-pack: use fast-import checkpoint to create packs
p5550: factor out nonsense-pack creation
Replace use of strbuf_addf() with strbuf_add() when enumerating
loose objects in for_each_file_in_obj_subdir(). Since we already
check the length and hex-values of the string before consuming
the path, we can prevent extra computation by using the lower-
level method.
One consumer of for_each_file_in_obj_subdir() is the abbreviation
code. OID abbreviations use a cached list of loose objects (per
object subdirectory) to make repeated queries fast, but there is
significant cache load time when there are many loose objects.
Most repositories do not have many loose objects before repacking,
but in the GVFS case the repos can grow to have millions of loose
objects. Profiling 'git log' performance in GitForWindows on a
GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
the CPU time was spent in strbuf_addf().
Add a new performance test to p4211-line-log.sh that is more
sensitive to this cache-loading. By limiting to 1000 commits, we
more closely resemble user wait time when reading history into a
pager.
For a copy of the Linux repo with two ~512 MB packfiles and ~572K
loose objects, running 'git log --oneline --parents --raw -1000'
had the following performance:
HEAD~1 HEAD
----------------------------------------
7.70(7.15+0.54) 7.44(7.09+0.29) -3.4%
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.
* tb/add-renormalize:
add: introduce "--renormalize"
In theory nobody should ever ask the low-level object code
for a null sha1. It's used as a sentinel for "no such
object" in lots of places, so leaking through to this level
is a sign that the higher-level code is not being careful
about its error-checking. In practice, though, quite a few
code paths seem to rely on the null sha1 lookup failing as a
way to quietly propagate non-existence (e.g., by feeding it
to lookup_commit_reference_gently(), which then returns
NULL).
When this happens, we do two inefficient things:
1. We actually search for the null sha1 in packs and in
the loose object directory.
2. When we fail to find it, we re-scan the pack directory
in case a simultaneous repack happened to move it from
loose to packed. This can be very expensive if you have
a large number of packs.
Only the second one actually causes noticeable performance
problems, so we could treat them independently. But for the
sake of simplicity (both of code and of reasoning about it),
it makes sense to just declare that the null sha1 cannot be
a real on-disk object, and looking it up will always return
"no such object".
There's no real loss of functionality to do so Its use as a
sentinel value means that anybody who is unlucky enough to
hit the 2^-160th chance of generating an object with that
sha1 is already going to find the object largely unusable.
In an ideal world, we'd simply fix all of the callers to
notice the null sha1 and avoid passing it to us. But a
simple experiment to catch this with a BUG() shows that
there are a large number of code paths that do so.
So in the meantime, let's fix the performance problem by
taking a fast exit from the object lookup when we see a null
sha1. p5551 shows off the improvement (when a fetched ref is
new, the "old" sha1 is 0{40}, which ends up being passed for
fast-forward checks, the status table abbreviations, etc):
Test HEAD^ HEAD
--------------------------------------------------------
5551.4: fetch 5.51(5.03+0.48) 0.17(0.10+0.06) -96.9%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to add an empty alternate object database to the system
that does not help anything; it has been corrected.
* jk/info-alternates-fix:
link_alt_odb_entries: make empty input a noop
Make it safer to normalize the line endings in a repository.
Files that had been commited with CRLF will be commited with LF.
The old way to normalize a repo was like this:
# Make sure that there are not untracked files
$ echo "* text=auto" >.gitattributes
$ git read-tree --empty
$ git add .
$ git commit -m "Introduce end-of-line normalization"
The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.
The new "add --renormalize" does not add untracked files:
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git commit -m "Introduce end-of-line normalization"
Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".
While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to add an empty alternate object database to the system
that does not help anything; it has been corrected.
* jk/info-alternates-fix:
link_alt_odb_entries: make empty input a noop
If an empty string is passed to link_alt_odb_entries(), our
loop finds no entries and we link nothing. But we still do
some preparatory work to normalize the object directory
path, even though we'll never look at the result. This
triggers in basically every git process, since we feed the
usually-empty ALTERNATE_DB_ENVIRONMENT to the function.
Let's detect early that there's nothing to do and return.
While we're at it, let's treat NULL the same as an empty
string as a favor to our callers. That saves
prepare_alt_odb() from having to cover this case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since in the future we want to support an additional hash algorithm, add
a structure that represents a hash algorithm and all the data that must
go along with it. Add a constant to allow easy enumeration of hash
algorithms. Implement function typedefs to create an abstract API that
can be used by any hash algorithm, and wrappers for the existing SHA1
functions that conform to this API.
Expose a value for hex size as well as binary size. While one will
always be twice the other, the two values are both used extremely
commonly throughout the codebase and providing both leads to improved
readability.
Don't include an entry in the hash algorithm structure for the null
object ID. As this value is all zeros, any suitably sized all-zero
object ID can be used, and there's no need to store a given one on a
per-hash basis.
The current hash function transition plan envisions a time when we will
accept input from the user that might be in SHA-1 or in the NewHash
format. Since we cannot know which the user has provided, add a
constant representing the unknown algorithm to allow us to indicate that
we must look the correct value up. Provide dummy API functions that die
in this case.
Finally, include git-compat-util.h in hash.h so that the required types
are available. This aids people using automated tools their editors.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (25 commits)
refs/files-backend: convert static functions to object_id
refs: convert read_raw_ref backends to struct object_id
refs: convert peel_object to struct object_id
refs: convert resolve_ref_unsafe to struct object_id
worktree: convert struct worktree to object_id
refs: convert resolve_gitlink_ref to struct object_id
Convert remaining callers of resolve_gitlink_ref to object_id
sha1_file: convert index_path and index_fd to struct object_id
refs: convert reflog_expire parameter to struct object_id
refs: convert read_ref_at to struct object_id
refs: convert peel_ref to struct object_id
builtin/pack-objects: convert to struct object_id
pack-bitmap: convert traverse_bitmap_commit_list to object_id
refs: convert dwim_log to struct object_id
builtin/reflog: convert remaining unsigned char uses to object_id
refs: convert dwim_ref and expand_ref to struct object_id
refs: convert read_ref and read_ref_full to object_id
refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
Convert check_connected to use struct object_id
refs: update ref transactions to use struct object_id
...
An earlier update made it possible to use an on-stack in-core
lockfile structure (as opposed to having to deliberately leak an
on-heap one). Many codepaths have been updated to take advantage
of this new facility.
* ma/lockfile-fixes:
read_cache: roll back lock in `update_index_if_able()`
read-cache: leave lock in right state in `write_locked_index()`
read-cache: drop explicit `CLOSE_LOCK`-flag
cache.h: document `write_locked_index()`
apply: remove `newfd` from `struct apply_state`
apply: move lockfile into `apply_state`
cache-tree: simplify locking logic
checkout-index: simplify locking logic
tempfile: fix documentation on `delete_tempfile()`
lockfile: fix documentation on `close_lock_file_gently()`
treewide: prefer lockfiles on the stack
sha1_file: do not leak `lock_file`
The path of a loose object contains its hash value encoded into two
substrings of 2 and 38 hexadecimal digits separated by a slash. The
first part is handed to for_each_file_in_obj_subdir() in decoded form as
subdir_nr. The current code builds a full hexadecimal representation of
the hash in a temporary buffer, then uses get_oid_hex() to decode it.
Avoid the intermediate step by taking subdir_nr as-is and using
hex_to_bytes() directly on the second substring. That's shorter and
easier.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A regression fix for 2.11 that made the code to read the list of
alternate object stores overrun the end of the string.
* jk/info-alternates-fix:
read_info_alternates: warn on non-trivial errors
read_info_alternates: read contents into strbuf
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.
* jk/write-in-full-fix:
read_pack_header: handle signed/unsigned comparison in read result
config: flip return value of store_write_*()
notes-merge: use ssize_t for write_in_full() return value
pkt-line: check write_in_full() errors against "< 0"
convert less-trivial versions of "write_in_full() != len"
avoid "write_in_full(fd, buf, len) != len" pattern
get-tar-commit-id: check write_in_full() return against 0
config: avoid "write_in_full(fd, buf, len) < len" pattern
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert these two functions and the functions that underlie them to take
pointers to struct object_id. This is a prerequisite to convert
resolve_gitlink_ref. Fix a stray tab in the middle of the index_mem
call in index_pipe by converting it to a space.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a caller of sha1_object_info_extended() sets the
"contentp" field in object_info, we call unpack_sha1_rest()
but do not check whether it signaled an error.
This causes two problems:
1. We pass back NULL to the caller via the contentp field,
but the function returns "0" for success. A caller
might reasonably expect after a successful return that
it can access contentp without a NULL check and
segfault.
As it happens, this is impossible to trigger in the
current code. There is exactly one caller which uses
contentp, read_object(). And the only thing it does
after a successful call is to return the content
pointer to its caller, using NULL as a sentinel for
errors. So in effect it converts the success code from
sha1_object_info_extended() back into an error!
But this is still worth addressing avoid problems for
future users of "contentp".
2. Callers of unpack_sha1_rest() are expected to close the
zlib stream themselves on error. Which means that we're
leaking the stream.
The problem in (1) comes from from c84a1f3ed4 (sha1_file:
refactor read_object, 2017-06-21), which added the contentp
field. Before that, we called unpack_sha1_rest() via
unpack_sha1_file(), which directly used the NULL to signal
an error.
But note that the leak in (2) is actually older than that.
The original unpack_sha1_file() directly returned the result
of unpack_sha1_rest() to its caller, when it should have
been closing the zlib stream itself on error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no longer any need to allocate and leak a `struct lock_file`.
Initialize it on the stack instead.
Before this patch, we set `lock = NULL` to signal that we have already
rolled back, and that we should not do any more work. We need to take
another approach now that we cannot assign NULL. We could, e.g., use
`is_lock_file_locked()`. But we already have another variable that we
could use instead, `found`. Its scope is only too small.
Bump `found` to the scope of the whole function and rearrange the "roll
back or write?"-checks to a straightforward if-else on `found`. This
also future-proves the code by making it obvious that we intend to take
exactly one of these paths.
Improved-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to prevent future mistakes by copying and pasting
code that checks the result of read_in_full() function.
* jk/read-in-full:
worktree: check the result of read_in_full()
worktree: use xsize_t to access file size
distinguish error versus short read from read_in_full()
avoid looking at errno for short read_in_full() returns
prefer "!=" when checking read_in_full() result
notes-merge: drop dead zero-write code
files-backend: prefer "0" for write_in_full() error check
When a caller tries to read a particular set of bytes via
read_in_full(), there are three possible outcomes:
1. An error, in which case -1 is returned and errno is
set.
2. A short read, in which fewer bytes are returned and
errno is unspecified (we never saw a read error, so we
may have some random value from whatever syscall failed
last).
3. The full read completed successfully.
Many callers handle cases 1 and 2 together by just checking
the result against the requested size. If their combined
error path looks at errno (e.g., by calling die_errno), they
may report a nonsense value.
Let's fix these sites by having them distinguish between the
two error cases. That avoids the random errno confusion, and
lets us give more detailed error messages.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>