Commit Graph

5 Commits

Author SHA1 Message Date
Junio C Hamano
7e71542e8b sha1dc: avoid CPP macro collisions
In an early part of sha1dc/sha1.c, the code checks the endianness of
the target platform by inspecting common CPP macros defined on
big-endian boxes, and sets BIGENDIAN macro to 1.  If these common
CPP macros are not defined, the code declares that the target
platform is little endian and does nothing (most notably, it does
not #undef its BIGENDIAN macro).

The code that does so even has this comment

    Note that all MSFT platforms are little endian,
    so none of these will be defined under the MSC compiler.

and later, the defined-ness of the BIGENDIAN macro is used to switch
the implementation of sha1_load() macro.

One thing the code did not anticipate is that somebody might define
BIGENDIAN macro in some header it includes to 0 on a little-endian
target platform.  Because the auto-detection based on common macros
do not touch BIGENDIAN macro when it detects a little-endian target,
such a definition is still valid and then defined-ness test will say
"Ah, BIGENDIAN is defined" and takes the wrong sha1_load().

As this auto-detection logic pretends as if it owns the BIGENDIAN
macro by ignoring the setting that may come from the outside and by
not explicitly unsetting when it decides that it is working for a
little-endian target, solve this problem without breaking that
assumption.  Namely, we can rename BIGENDIAN this code uses to
something much less generic, i.e. SHA1DC_BIGENDIAN.  For extra
protection, undef the macro on a little-endian target.

It is possible to work it around by instead #undef BIGENDIAN in
the auto-detection code, but a macro (or include) that happens later
in the code can be implemented in terms of BIGENDIAN on Windows and
it is possible that the implementation gets upset when it sees the
CPP macro undef'ed (instead of set to 0).  Renaming the private macro
intended to be used only in this file to a less generic name relieves
us from having to worry about that kind of breakage.

Noticed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 15:34:44 -07:00
Jeff King
8325e43b82 Makefile: add DC_SHA1 knob
This knob lets you use the sha1dc implementation from:

      https://github.com/cr-marcstevens/sha1collisiondetection

which can detect certain types of collision attacks (even
when we only see half of the colliding pair). So it
mitigates any attack which consists of getting the "good"
half of a collision into a trusted repository, and then
later replacing it with the "bad" half. The "good" half is
rejected by the victim's version of Git (and even if they
run an old version of Git, any sha1dc-enabled git will
complain loudly if it ever has to interact with the object).

The big downside is that it's slower than either the openssl
or block-sha1 implementations.

Here are some timings based off of linux.git:

  - compute sha1 over whole packfile
      sha1dc: 3.580s
    blk-sha1: 2.046s (-43%)
     openssl: 1.335s (-62%)

  - rev-list --all --objects
      sha1dc: 33.512s
    blk-sha1: 33.514s (+0.0%)
     openssl: 33.650s (+0.4%)

  - git log --no-merges -10000 -p
      sha1dc: 8.124s
    blk-sha1: 7.986s (-1.6%)
     openssl: 8.203s (+0.9%)

  - index-pack --verify
      sha1dc: 4m19s
    blk-sha1: 2m57s (-32%)
     openssl: 2m19s (-42%)

So overall the sha1 computation with collision detection is
about 1.75x slower than block-sha1, and 2.7x slower than
sha1. But of course most operations do more than just sha1.
Normal object access isn't really slowed at all (both the
+/- changes there are well within the run-to-run noise); any
changes are drowned out by the other work Git is doing.

The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. This
is similar to the `index-pack` invocation that the receiver
of a push or fetch would perform. So clearly there's some
extra CPU load here.

There will also be some latency for the user, though keep in
mind that such an operation will generally be network bound
(this is about a 1.2GB packfile). Some of that extra CPU is
"free" in the sense that we use it while the pack is
streaming in anyway. But most of it comes during the
delta-resolution phase, after the whole pack has been
received. So we can imagine that for this (quite large)
push, the user might have to wait an extra 100 seconds over
openssl (which is what we use now). If we assume they can
push to us at 20Mbit/s, that's 480s for a 1.2GB pack, which
is only 20% slower.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-17 10:40:25 -07:00
Jeff King
c0c20060af sha1dc: disable safe_hash feature
The safe_hash feature is designed to make sha1dc a drop-in
replacement for sha1, where colliding entries will get a
permuted hash to un-collide them. However, since we're
handling the collision case ourselves, this isn't helpful
(and is actually harmful, as it means you get the wrong
object id if you want to show it in a log message).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 15:16:45 -07:00
Jeff King
45a574eec8 sha1dc: adjust header includes for git
We can replace system includes with git-compat-util.h or
cache.h (and should make sure it is included first in all C
files).  And we can drop includes from headers entirely, as
every C file should include git-compat-util.h itself.

We will add in new include guards around the header files,
though (otherwise you get into trouble including both
sha1dc/sha1.h and cache.h).

And finally, we'll use the full "sha1dc/" path for including
related files. This isn't strictly necessary, but makes the
expected resolution more obvious.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 15:16:45 -07:00
Jeff King
28dc98e343 sha1dc: add collision-detecting sha1 implementation
This is pulled straight from:

  https://github.com/cr-marcstevens/sha1collisiondetection

with no modifications yet (though I've pulled in only the
subset of files necessary for Git to use).

This is commit 007905a93c973f55b2daed6585f9f6c23545bf66.

Further updates can be done like:

  git checkout -b vendor-sha1dc $this_commit
  cp /path/to/sha1dc/{LICENSE.txt,lib/*} sha1dc/
  git add -A sha1dc
  git commit -m "update sha1dc"

  git checkout -b update-sha1dc origin
  git merge vendor-sha1dc

Thanks to both Marc and Dan for making the code fit our
needs by doing both optimization work, cutting down on the
object size, and doing some syntactic changes to work better
with git. And to Linus for kicking off the "diet" work that
removed some of the unused code.

The license of the sha1dc code is the MIT license, which is
obviously compatible with the GPLv2 of git.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 15:16:40 -07:00