2013-04-18 02:33:35 +08:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='git log with invalid commit headers'
|
|
|
|
|
|
|
|
. ./test-lib.sh
|
|
|
|
|
|
|
|
test_expect_success 'setup' '
|
|
|
|
test_commit foo &&
|
|
|
|
|
|
|
|
git cat-file commit HEAD |
|
|
|
|
sed "/^author /s/>/>-<>/" >broken_email.commit &&
|
|
|
|
git hash-object -w -t commit broken_email.commit >broken_email.hash &&
|
|
|
|
git update-ref refs/heads/broken_email $(cat broken_email.hash)
|
|
|
|
'
|
|
|
|
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 06:45:00 +08:00
|
|
|
test_expect_success 'fsck notices broken commit' '
|
2014-08-30 04:31:46 +08:00
|
|
|
test_must_fail git fsck 2>actual &&
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 06:45:00 +08:00
|
|
|
test_i18ngrep invalid.author actual
|
|
|
|
'
|
|
|
|
|
2013-04-18 02:33:35 +08:00
|
|
|
test_expect_success 'git log with broken author email' '
|
|
|
|
{
|
|
|
|
echo commit $(cat broken_email.hash)
|
|
|
|
echo "Author: A U Thor <author@example.com>"
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 06:45:00 +08:00
|
|
|
echo "Date: Thu Apr 7 15:13:13 2005 -0700"
|
2013-04-18 02:33:35 +08:00
|
|
|
echo
|
|
|
|
echo " foo"
|
|
|
|
} >expect.out &&
|
|
|
|
: >expect.err &&
|
|
|
|
|
|
|
|
git log broken_email >actual.out 2>actual.err &&
|
|
|
|
|
|
|
|
test_cmp expect.out actual.out &&
|
|
|
|
test_cmp expect.err actual.err
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'git log --format with broken author email' '
|
split_ident: parse timestamp from end of line
Split_ident currently parses left to right. Given this
input:
Your Name <email@example.com> 123456789 -0500\n
We assume the name starts the line and runs until the first
"<". That starts the email address, which runs until the
first ">". Everything after that is assumed to be the
timestamp.
This works fine in the normal case, but is easily broken by
corrupted ident lines that contain an extra ">". Some
examples seen in the wild are:
1. Name <email>-<> 123456789 -0500\n
2. Name <email> <Name<email>> 123456789 -0500\n
3. Name1 <email1>, Name2 <email2> 123456789 -0500\n
Currently each of these produces some email address (which
is not necessarily the one the user intended) and end up
with a NULL date (which is generally interpreted as the
epoch by "git log" and friends).
But in each case we could get the correct timestamp simply
by parsing from the right-hand side, looking backwards for
the final ">", and then reading the timestamp from there.
In general, it's a losing battle to try to automatically
guess what the user meant with their broken crud. But this
particular workaround is probably worth doing. One, it's
dirt simple, and can't impact non-broken cases. Two, it
doesn't catch a single breakage we've seen, but rather a
large class of errors (i.e., any breakage inside the email
angle brackets may affect the email, but won't spill over
into the timestamp parsing). And three, the timestamp is
arguably more valuable to get right, because it can affect
correctness (e.g., in --until cutoffs).
This patch implements the right-to-left scheme described
above. We adjust the tests in t4212, which generate a commit
with such a broken ident, and now gets the timestamp right.
We also add a test that fsck continues to detect the
breakage.
For reference, here are pointers to the breakages seen (as
numbered above):
[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-15 06:45:00 +08:00
|
|
|
echo "A U Thor+author@example.com+Thu Apr 7 15:13:13 2005 -0700" >expect.out &&
|
2013-04-18 02:33:35 +08:00
|
|
|
: >expect.err &&
|
|
|
|
|
|
|
|
git log --format="%an+%ae+%ad" broken_email >actual.out 2>actual.err &&
|
|
|
|
|
|
|
|
test_cmp expect.out actual.out &&
|
|
|
|
test_cmp expect.err actual.err
|
|
|
|
'
|
|
|
|
|
t4212: test bogus timestamps with git-log
When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.
Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.
This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.
Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 15:36:22 +08:00
|
|
|
munge_author_date () {
|
|
|
|
git cat-file commit "$1" >commit.orig &&
|
|
|
|
sed "s/^\(author .*>\) [0-9]*/\1 $2/" <commit.orig >commit.munge &&
|
|
|
|
git hash-object -w -t commit commit.munge
|
|
|
|
}
|
|
|
|
|
|
|
|
test_expect_success 'unparsable dates produce sentinel value' '
|
|
|
|
commit=$(munge_author_date HEAD totally_bogus) &&
|
|
|
|
echo "Date: Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 $commit >actual.full &&
|
|
|
|
grep Date <actual.full >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'unparsable dates produce sentinel value (%ad)' '
|
|
|
|
commit=$(munge_author_date HEAD totally_bogus) &&
|
|
|
|
echo >expect &&
|
2015-03-20 18:06:44 +08:00
|
|
|
git log -1 --format=%ad $commit >actual &&
|
t4212: test bogus timestamps with git-log
When t4212 was originally added by 9dbe7c3d (pretty: handle
broken commit headers gracefully, 2013-04-17), it tested our
handling of commits with broken ident lines in which the
timestamps could not be parsed. It does so using a bogus line
like "Name <email>-<> 1234 -0000", because that simulates an
error that was seen in the wild.
Later, 03818a4 (split_ident: parse timestamp from end of
line, 2013-10-14) made our parser smart enough to actually
find the timestamp on such a line, and t4212 was adjusted to
match. While it's nice that we handle this real-world case,
this meant that we were not actually testing the
bogus-timestamp case anymore.
This patch adds a test with a totally incomprehensible
timestamp to make sure we are testing the code path.
Note that the behavior is slightly different between regular log
output and "--format=%ad". In the former case, we produce a
sentinel value and in the latter, we produce an empty
string. While at first this seems unnecessarily
inconsistent, it matches the original behavior given by
9dbe7c3d.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 15:36:22 +08:00
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
log: handle integer overflow in timestamps
If an ident line has a ridiculous date value like (2^64)+1,
we currently just pass ULONG_MAX along to the date code,
which can produce nonsensical dates.
On systems with a signed long time_t (e.g., 64-bit glibc
systems), this actually doesn't end up too bad. The
ULONG_MAX is converted to -1, we apply the timezone field to
that, and the result ends up somewhere between Dec 31, 1969
and Jan 1, 1970.
However, there is still a few good reasons to detect the
overflow explicitly:
1. On systems where "unsigned long" is smaller than
time_t, we get a nonsensical date in the future.
2. Even where it would produce "Dec 31, 1969", it's easier
to recognize "midnight Jan 1" as a consistent sentinel
value for "we could not parse this".
3. Values which do not overflow strtoul but do overflow a
signed time_t produce nonsensical values in the past.
For example, on a 64-bit system with a signed long
time_t, a timestamp of 18446744073000000000 produces a
date in 1947.
We also recognize overflow in the timezone field, which
could produce nonsensical results. In this case we show the
parsed date, but in UTC.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 15:46:37 +08:00
|
|
|
# date is 2^64 + 1
|
|
|
|
test_expect_success 'date parser recognizes integer overflow' '
|
|
|
|
commit=$(munge_author_date HEAD 18446744073709551617) &&
|
|
|
|
echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 --format=%ad $commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
|
|
|
# date is 2^64 - 2
|
|
|
|
test_expect_success 'date parser recognizes time_t overflow' '
|
|
|
|
commit=$(munge_author_date HEAD 18446744073709551614) &&
|
|
|
|
echo "Thu Jan 1 00:00:00 1970 +0000" >expect &&
|
|
|
|
git log -1 --format=%ad $commit >actual &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
2014-02-24 15:49:05 +08:00
|
|
|
# date is within 2^63-1, but enough to choke glibc's gmtime
|
2014-04-01 15:43:06 +08:00
|
|
|
test_expect_success 'absurdly far-in-future date' '
|
2014-02-24 15:49:05 +08:00
|
|
|
commit=$(munge_author_date HEAD 999999999999999999) &&
|
2014-04-01 15:43:06 +08:00
|
|
|
git log -1 --format=%ad $commit
|
2014-02-24 15:49:05 +08:00
|
|
|
'
|
|
|
|
|
2013-04-18 02:33:35 +08:00
|
|
|
test_done
|