Commit Graph

40 Commits

Author SHA1 Message Date
brian m. carlson
768e30ea27 hash: implement and use a context cloning function
For all of our SHA-1 implementations and most of our SHA-256
implementations, the hash context we use is a real struct.  For these
implementations, it's possible to copy a hash context by making a copy
of the struct.

However, for our libgcrypt implementation, our hash context is a
pointer.  Consequently, copying it does not lead to an independent hash
context like we intended.

Fortunately, however, libgcrypt provides us with a handy function to
copy hash contexts.  Let's add a cloning function to the hash algorithm
API, and use it in the one place we need to make a hash context copy.
With this change, our libgcrypt SHA-256 implementation is fully
functional with all of our other hash implementations.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:21 -08:00
Derrick Stolee
cfe83216e4 csum-file: refactor finalize_hashfile() method
If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from previous behavior since it would be written if either
CSUM_CLOSE or CSUM_FSYNC is provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:30 -07:00
Derrick Stolee
f2af9f5e02 csum-file: rename hashclose() to finalize_hashfile()
The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:30 -07:00
brian m. carlson
4d2735005a csum-file: abstract uses of SHA-1
Convert several direct uses of SHA-1 to use the_hash_algo instead.
Convert one use of the constant 20 as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
brian m. carlson
98a3beab6a csum-file: rename sha1file to hashfile
Rename struct sha1file to struct hashfile, along with all of its related
functions.

The transformation in this commit was made by global search-and-replace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Jeff King
61d36330b4 prefer "!=" when checking read_in_full() result
Comparing the result of read_in_full() using less-than is
potentially dangerous, as a negative return value may be
converted to an unsigned type and be considered a success.
This is discussed further in 561598cfcf (read_pack_header:
handle signed/unsigned comparison in read result,
2017-09-13).

Each of these instances is actually fine in practice:

 - in get-tar-commit-id, the HEADERSIZE macro expands to a
   signed integer. If it were switched to an unsigned type
   (e.g., a size_t), then it would be a bug.

 - the other two callers check for a short read only after
   handling a negative return separately. This is a fine
   practice, but we'd prefer to model "!=" as a general
   rule.

So all of these cases can be considered cleanups and not
actual bugfixes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 15:45:24 +09:00
Jeff King
599d223107 sha1fd_check: die when we cannot open the file
Right now we return a NULL "struct sha1file" if we encounter
an error. However, the sole caller (write_idx_file) does not
check the return value, and will segfault if we hit this
case.

One option would be to handle the error in the caller.
However, there's really nothing for it to do but die. This
code path is hit during "git index-pack --verify"; after we
verify the packfile, we check that the ".idx" we would
generate from it is byte-wise identical to what is on disk.
We hit the error (and segfault) if we can't open the .idx
file (a likely cause of this is that somebody else ran "git
repack -ad" while we were verifying). Since we can't
complete the requested verification, we really have no
choice but to die.

Furthermore, the rest of the sha1fd_* functions simply die
on errors. So if were to open the file successfully, for
example, and then hit a read error, sha1write would call
die() for us. So pushing the die() down into sha1fd_check
keeps the interface consistent.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-19 13:35:15 -07:00
Junio C Hamano
f06a5e607d Merge branch 'jk/sha1write-void'
Code clean-up.

* jk/sha1write-void:
  do not pretend sha1write returns errors
2014-01-10 10:33:09 -08:00
Jeff King
9af270e8c2 do not pretend sha1write returns errors
The sha1write function returns an int, but it will always be
"0". The failure-prone parts of the function happen in the
"flush" callback, which cannot pass an error back to us. So
we just end up calling die() during the flush.

Let's just drop the return value altogether, as it only
confuses callers into thinking that it might be useful.

Only one call site actually checked the return value. We can
drop that check, since it just led to a die() anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:50:20 -08:00
Jeff King
e74435a516 sha1write: make buffer const-correct
We are passed a "void *" and write it out without ever
touching it; let's indicate that by using "const".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:44:18 -07:00
Junio C Hamano
6c52614864 csum-file: introduce sha1file_checkpoint
It is useful to be able to rewind a check-summed file to a certain
previous state after writing data into it using sha1write() API. The
fast-import command does this after streaming a blob data to the packfile
being generated and then noticing that the same blob has already been
written, and it does this with a private code truncate_pack() that is
commented as "Yes, this is a layering violation".

Introduce two API functions, sha1file_checkpoint(), that allows the caller
to save a state of a sha1file, and then later revert it to the saved state.
Use it to reimplement truncate_pack().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-30 14:27:59 -08:00
Junio C Hamano
d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Stephen Boyd
1e4cd68c00 sparse: Fix errors and silence warnings
* load_file() returns a void pointer but is using 0 for the return
   value

 * builtin/receive-pack.c forgot to include builtin.h

 * packet_trace_prefix can be marked static

 * ll_merge takes a pointer for its last argument, not an int

 * crc32 expects a pointer as the second argument but Z_NULL is defined
   to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer,
   2006-11-18 for more info)

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-03 10:14:53 -07:00
Junio C Hamano
e337a04de2 index-pack: --verify
Given an existing .pack file and the .idx file that describes it,
this new mode of operation reads and re-index the packfile and makes
sure the existing .idx file matches the result byte-for-byte.

All the objects in the .pack file are validated during this operation as
well.  Unlike verify-pack, which visits each object described in the .idx
file in the SHA-1 order, index-pack efficiently exploits the delta-chain
to avoid rebuilding the objects that are used as the base of deltified
objects over and over again while validating the objects, resulting in
much quicker verification of the .pack file and its .idx file.

This version however cannot verify a .pack/.idx pair with a handcrafted v2
index that uses 64-bit offset representation for offsets that would fit
within 31-bit. You can create such an .idx file by giving a custom offset
to --index-version option to the command.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27 23:29:03 -08:00
Thomas Rast
d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Shawn O. Pearce
e782e12f89 Merge branch 'maint'
* maint:
  rebase -i: do not fail when there is no commit to cherry-pick
  test-lib: fix color reset in say_color()
  fix pread()'s short read in index-pack

Conflicts:
	csum-file.c
2008-10-10 08:39:20 -07:00
Nicolas Pitre
838cd34664 fix pread()'s short read in index-pack
Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for
its ability to compute a SHA1 on the written data.  This also provides
data buffering which, along with commit 92392b4a45, will confuse pread()
whenever an appended object is 1) freed due to memory pressure because
of the depth-first delta processing, and 2) needed again because it has
many delta children, and 3) its data is still buffered by sha1write().

Let's fix the issue by simply forcing cached data out when such an
object is written so it can be pread()'d at leisure.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-10 07:09:30 -07:00
Nicolas Pitre
9126f0091f fix openssl headers conflicting with custom SHA1 implementations
On ARM I have the following compilation errors:

    CC fast-import.o
In file included from cache.h:8,
                 from builtin.h:6,
                 from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1

This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version.  Compilation of git is probably broken on PPC too for the
same reason.

Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c.  But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.

As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-02 18:06:56 -07:00
Nicolas Pitre
a8032d1224 sha1write: don't copy full sized buffers
No need to memcpy() source buffer data when we might just process the
data in place instead of accumulating it into a separate buffer.
This is the case when a whole buffer would have been copied, summed,
written out and then discarded right away.

Also move the CRC32 processing within the loop so the data is more likely
to remain in the L1 CPU cache between the CRC32 sum, SHA1 sum and the
write call.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-02 17:05:47 -07:00
Nicolas Pitre
ac0463ed44 pack-objects: use fixup_pack_header_footer()'s validation mode
When limiting the pack size, a new header has to be written to the
pack and a new SHA1 computed.  Make sure that the SHA1 of what is being
read back matches the SHA1 of what was written.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-29 21:51:27 -07:00
Linus Torvalds
4c81b03e30 Make pack creation always fsync() the result
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries.  And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-31 14:46:57 -07:00
Nicolas Pitre
ec640ed1cf remove dead code from the csum-file interface
The provided name argument is always constant and valid in every
caller's context, so no need to have an array of PATH_MAX chars to copy
it into when a simple pointer will do.  Unfortunately that means getting
rid of wascally wabbits too.

The 'error' field is also unused.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 12:55:33 -08:00
Nicolas Pitre
218558af59 make display of total transferred more accurate
The throughput display needs a delay period before accounting and
displaying anything.  Yet it might be called after some amount of data
has already been transferred.  The display of total data is therefore
accounted late and therefore smaller than the reality.

Let's call display_throughput() with an absolute amount of transferred
data instead of a relative number, and let the throughput code find the
relative amount of data by itself as needed.  This way the displayed
total is always exact.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-05 12:53:14 -08:00
Nicolas Pitre
2a128d63dc add throughput display to git-push
This one triggers only when git-pack-objects is called with
--all-progress and --stdout which is the combination used by
git-push.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-30 16:08:40 -07:00
Nicolas Pitre
7ba502c47b pack-objects.c: fix some global variable abuse and memory leaks
To keep things well layered, sha1close() now returns the file descriptor
when it doesn't close the file.

An ugly cast was added to the return of write_idx_file() to avoid a
warning.  A proper fix will come separately.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-17 02:54:56 -04:00
Junio C Hamano
4175e9e3a8 More static
There still are quite a few symbols that ought to be static.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-13 02:02:10 -07:00
Dana L. How
f02153696f Alter sha1close() 3rd argument to request flush only
update=0 suppressed writing the final SHA-1 but was not used.
Now final=0 suppresses SHA-1 finalization, SHA-1 writing,
and closing -- in other words,  only flush the buffer.

Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-20 21:55:26 -07:00
Dana How
960ccca680 Custom compression levels for objects and packs
Add config variables pack.compression and core.loosecompression ,
and switch --compression=level to pack-objects.

Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION.  This is the "pack compression level".

Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules,  or if the loose
object was written while core.legacyheaders = true.  Newly
deltified loose objects are always compressed to the current
pack compression level.

Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes,  since the previous pack data
cannot be reused.

In either case,  the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level,  instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.

This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"

Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-10 15:23:09 -07:00
Nicolas Pitre
78d1e84fe5 compute a CRC32 for each object as stored in a pack
The most important optimization for performance when repacking is the
ability to reuse data from a previous pack as is and bypass any delta
or even SHA1 computation by simply copying the raw data from one pack
to another directly.

The problem with  this is that any data corruption within a copied object
would go unnoticed and the new (repacked) pack would be self-consistent
with its own checksum despite containing a corrupted object.  This is a
real issue that already happened at least once in the past.

In some attempt to prevent this, we validate the copied data by inflating
it and making sure no error is signaled by zlib.  But this is still not
perfect as a significant portion of a pack content is made of object
headers and references to delta base objects which are not deflated and
therefore not validated when repacking actually making the pack data reuse
still not as safe as it could be.

Of course a full SHA1 validation could be performed, but that implies
full data inflating and delta replaying which is extremely costly, which
cost the data reuse optimization was designed to avoid in the first place.

So the best solution to this is simply to store a CRC32 of the raw pack
data for each object in the pack index.  This way any object in a pack can
be validated before being copied as is in another pack, including header
and any other non deflated data.

Why CRC32 instead of a faster checksum like Adler32?  Quoting Wikipedia:

   Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
   short messages. He wrote "Briefly, the problem is that, for very short
   packets, Adler32 is guaranteed to give poor coverage of the available
   bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
   that sum A does not wrap for short messages. The maximum value of A for
   a 128-byte message is 32640, which is below the value 65521 used by the
   modulo operation. An extended explanation can be found in RFC 3309,
   which mandates the use of CRC32 instead of Adler-32 for SCTP, the
   Stream Control Transmission Protocol.

In the context of a GIT pack, we have lots of small objects, especially
deltas, which are likely to be quite small and in a size range for which
Adler32 is dimed not to be sufficient.  Another advantage of CRC32 is the
possibility for recovery from certain types of small corruptions like
single bit errors which are the most probable type of corruptions.

OK what this patch does is to compute the CRC32 of each object written to
a pack within pack-objects.  It is not written to the index yet and it is
obviously not validated when reusing pack data yet either.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Shawn Pearce
e702496e43 Convert memcpy(a,b,20) to hashcpy(a,b).
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

[jc: Splitted the patch to "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 13:53:10 -07:00
David Rientjes
78b713a1d0 Make sha1flush void and remove conditional return.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-14 19:01:00 -07:00
Joachim B Haga
12f6c308d5 Make zlib compression level configurable, and change default.
With the change in default, "git add ." on kernel dir is about
twice as fast as before, with only minimal (0.5%) change in
object size. The speed difference is even more noticeable
when committing large files, which is now up to 8 times faster.

The configurability is through setting core.compression = [-1..9]
which maps to the zlib constants; -1 is the default, 0 is no
compression, and 1..9 are various speed/size tradeoffs, 9
being slowest.

Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
Acked-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-03 13:55:11 -07:00
Florian Forster
1d7f171c3a Remove all void-pointer arithmetic.
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in
various ways. Usually the strategy that required the least changes was used.

Signed-off-by: Florian Forster <octo@verplant.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 01:59:46 -07:00
Junio C Hamano
1c15afb934 xread/xwrite: do not worry about EINTR at calling sites.
We had errno==EINTR check after read(2)/write(2) sprinkled all
over the places, always doing continue.  Consolidate them into
xread()/xwrite() wrapper routines.

Credits for suggestion goes to HPA -- bugs are mine.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-19 18:28:16 -08:00
Sergey Vlasov
7bf058f008 [PATCH] Plug memory leak in sha1close()
sha1create() and sha1fd() malloc the returned struct sha1file;
sha1close() should free it.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-08 22:51:46 -07:00
Junio C Hamano
f312de018b [PATCH] Let umask do its work upon filesystem object creation.
IIRC our strategy was to let the users' umask take care of the
final mode bits.  This patch fixes places that deviate from it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-06 10:39:59 -07:00
Linus Torvalds
4397f0148a csum-file: add "sha1fd()" to create a SHA1 csum file from an existing file descriptor
We'll use this soon to write pack-files to stdout.
2005-06-28 11:10:06 -07:00
Linus Torvalds
2700628e64 csum-file: fix missing buf pointer update
This would create broken pack archives for anything nontrivial.
2005-06-27 17:02:56 -07:00
Linus Torvalds
e18088451d csum-file interface updates: return resulting SHA1
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).

This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1).  That way you can
validate the pairing of the two if you want to.
2005-06-26 22:01:46 -07:00
Linus Torvalds
c38138cd78 git-pack-objects: write the pack files with a SHA1 csum
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing.  The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.

Update unpack-objects to know about the extra 20 bytes at the end
of the index.
2005-06-26 20:27:56 -07:00