global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:50:23 +08:00
|
|
|
#define USE_THE_REPOSITORY_VARIABLE
|
|
|
|
|
2023-04-11 15:41:56 +08:00
|
|
|
#include "git-compat-util.h"
|
2005-05-19 07:14:22 +08:00
|
|
|
#include "tag.h"
|
2005-04-19 02:39:48 +08:00
|
|
|
#include "commit.h"
|
2018-04-10 20:56:05 +08:00
|
|
|
#include "commit-graph.h"
|
2023-03-21 14:25:57 +08:00
|
|
|
#include "environment.h"
|
2023-03-21 14:25:54 +08:00
|
|
|
#include "gettext.h"
|
2023-02-24 08:09:27 +08:00
|
|
|
#include "hex.h"
|
2018-05-16 07:42:16 +08:00
|
|
|
#include "repository.h"
|
2023-04-11 15:41:49 +08:00
|
|
|
#include "object-name.h"
|
2023-05-16 14:34:06 +08:00
|
|
|
#include "object-store-ll.h"
|
2006-12-26 03:48:35 +08:00
|
|
|
#include "utf8.h"
|
2007-04-09 17:34:05 +08:00
|
|
|
#include "diff.h"
|
|
|
|
#include "revision.h"
|
2009-10-09 18:21:57 +08:00
|
|
|
#include "notes.h"
|
2018-05-16 05:48:42 +08:00
|
|
|
#include "alloc.h"
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
#include "gpg-interface.h"
|
2012-04-01 06:10:39 +08:00
|
|
|
#include "mergesort.h"
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
#include "commit-slab.h"
|
2013-06-07 12:58:12 +08:00
|
|
|
#include "prio-queue.h"
|
2020-12-31 19:56:23 +08:00
|
|
|
#include "hash-lookup.h"
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done by the same logic as the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but it wants the caller
to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 14:06:49 +08:00
|
|
|
#include "wt-status.h"
|
Deprecate support for .git/info/grafts
The grafts feature was a convenient way to "stitch together" ancient
history to the fresh start of linux.git.
Its implementation is, however, not up to Git's standards, as there are
too many ways where it can lead to surprising and unwelcome behavior.
For example, when pushing from a repository with active grafts, it is
possible to miss commits that have been "grafted out", resulting in a
broken state on the other side.
Also, the grafts feature is limited to "rewriting" commits' list of
parents, it cannot replace anything else.
The much younger feature implemented as `git replace` set out to remedy
those limitations and dangerous bugs.
Seeing as `git replace` is pretty mature by now (since 4228e8bc98
(replace: add --graft option, 2014-07-19) it can perform the graft
file's duties), it is time to deprecate support for the graft file, and
to retire it eventually.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-29 06:44:44 +08:00
|
|
|
#include "advice.h"
|
2018-09-05 06:00:08 +08:00
|
|
|
#include "refs.h"
|
2018-11-02 10:04:55 +08:00
|
|
|
#include "commit-reach.h"
|
2023-03-21 14:26:05 +08:00
|
|
|
#include "setup.h"
|
2020-05-01 03:48:50 +08:00
|
|
|
#include "shallow.h"
|
2023-04-23 04:17:26 +08:00
|
|
|
#include "tree.h"
|
2021-12-22 11:59:40 +08:00
|
|
|
#include "hook.h"
|
commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.
As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.
To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.
We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.
For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.
It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.
This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:
Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
Range (min … max): 2.894 s … 2.950 s 10 runs
Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
Range (min … max): 3.780 s … 3.961 s 10 runs
Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
Range (min … max): 13.714 s … 13.995 s 10 runs
Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
Range (min … max): 13.645 s … 14.038 s 10 runs
Summary
git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 15:16:18 +08:00
|
|
|
#include "parse.h"
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
#include "object-file-convert.h"
|
2005-04-19 02:39:48 +08:00
|
|
|
|
2012-09-16 04:58:15 +08:00
|
|
|
static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
|
|
|
|
|
[PATCH] Avoid wasting memory in git-rev-list
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the comment
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-16 05:43:17 +08:00
|
|
|
int save_commit_buffer = 1;
|
2021-08-23 18:44:02 +08:00
|
|
|
int no_graft_file_deprecated_advice;
|
[PATCH] Avoid wasting memory in git-rev-list
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the comment
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-16 05:43:17 +08:00
|
|
|
|
2005-04-19 02:39:48 +08:00
|
|
|
const char *commit_type = "commit";
|
|
|
|
|
2018-06-29 09:22:21 +08:00
|
|
|
struct commit *lookup_commit_reference_gently(struct repository *r,
|
2018-06-29 09:21:57 +08:00
|
|
|
const struct object_id *oid, int quiet)
|
2005-05-19 07:14:22 +08:00
|
|
|
{
|
2018-06-29 09:22:21 +08:00
|
|
|
struct object *obj = deref_tag(r,
|
|
|
|
parse_object(r, oid),
|
2018-06-29 09:21:51 +08:00
|
|
|
NULL, 0);
|
2005-05-19 07:14:22 +08:00
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return NULL;
|
2020-06-17 17:14:08 +08:00
|
|
|
return object_as_type(obj, OBJ_COMMIT, quiet);
|
2005-08-21 17:51:10 +08:00
|
|
|
}
|
|
|
|
|
2018-06-29 09:22:22 +08:00
|
|
|
struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
|
2005-08-21 17:51:10 +08:00
|
|
|
{
|
2018-06-29 09:22:22 +08:00
|
|
|
return lookup_commit_reference_gently(r, oid, 0);
|
2005-05-19 07:14:22 +08:00
|
|
|
}
|
|
|
|
|
Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-07 06:10:10 +08:00
|
|
|
struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name)
|
2011-09-17 19:57:45 +08:00
|
|
|
{
|
2018-06-29 09:21:58 +08:00
|
|
|
struct commit *c = lookup_commit_reference(the_repository, oid);
|
2011-09-17 19:57:45 +08:00
|
|
|
if (!c)
|
|
|
|
die(_("could not parse %s"), ref_name);
|
2018-08-29 05:22:48 +08:00
|
|
|
if (!oideq(oid, &c->object.oid)) {
|
2011-09-17 19:57:45 +08:00
|
|
|
warning(_("%s %s is not a commit!"),
|
Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-07 06:10:10 +08:00
|
|
|
ref_name, oid_to_hex(oid));
|
2011-09-17 19:57:45 +08:00
|
|
|
}
|
|
|
|
return c;
|
|
|
|
}
|
|
|
|
|
2022-10-17 21:17:40 +08:00
|
|
|
struct commit *lookup_commit_object(struct repository *r,
|
|
|
|
const struct object_id *oid)
|
|
|
|
{
|
|
|
|
struct object *obj = parse_object(r, oid);
|
|
|
|
return obj ? object_as_type(obj, OBJ_COMMIT, 0) : NULL;
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2018-06-29 09:22:10 +08:00
|
|
|
struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
|
2005-04-19 02:39:48 +08:00
|
|
|
{
|
2019-06-20 15:41:14 +08:00
|
|
|
struct object *obj = lookup_object(r, oid);
|
2014-07-13 14:41:55 +08:00
|
|
|
if (!obj)
|
2019-06-20 15:41:21 +08:00
|
|
|
return create_object(r, oid, alloc_commit_node(r));
|
2020-06-17 17:14:08 +08:00
|
|
|
return object_as_type(obj, OBJ_COMMIT, 0);
|
2005-04-19 02:39:48 +08:00
|
|
|
}
|
|
|
|
|
2010-11-03 03:59:07 +08:00
|
|
|
struct commit *lookup_commit_reference_by_name(const char *name)
|
2024-08-14 18:31:28 +08:00
|
|
|
{
|
|
|
|
return lookup_commit_reference_by_name_gently(name, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
struct commit *lookup_commit_reference_by_name_gently(const char *name,
|
|
|
|
int quiet)
|
2010-11-03 03:59:07 +08:00
|
|
|
{
|
2015-03-14 07:39:34 +08:00
|
|
|
struct object_id oid;
|
2010-11-03 03:59:07 +08:00
|
|
|
struct commit *commit;
|
|
|
|
|
2023-03-28 21:58:46 +08:00
|
|
|
if (repo_get_oid_committish(the_repository, name, &oid))
|
2010-11-03 03:59:07 +08:00
|
|
|
return NULL;
|
2024-08-14 18:31:28 +08:00
|
|
|
commit = lookup_commit_reference_gently(the_repository, &oid, quiet);
|
2023-03-28 21:58:48 +08:00
|
|
|
if (repo_parse_commit(the_repository, commit))
|
2010-11-03 03:59:07 +08:00
|
|
|
return NULL;
|
|
|
|
return commit;
|
|
|
|
}
|
|
|
|
|
2017-04-27 03:29:31 +08:00
|
|
|
static timestamp_t parse_commit_date(const char *buf, const char *tail)
|
2005-04-19 02:39:48 +08:00
|
|
|
{
|
2008-01-20 01:35:23 +08:00
|
|
|
const char *dateptr;
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:14:09 +08:00
|
|
|
const char *eol;
|
2005-04-19 02:39:48 +08:00
|
|
|
|
2008-01-20 01:35:23 +08:00
|
|
|
if (buf + 6 >= tail)
|
|
|
|
return 0;
|
2005-04-19 02:39:48 +08:00
|
|
|
if (memcmp(buf, "author", 6))
|
|
|
|
return 0;
|
2008-01-20 01:35:23 +08:00
|
|
|
while (buf < tail && *buf++ != '\n')
|
2005-04-19 02:39:48 +08:00
|
|
|
/* nada */;
|
2008-01-20 01:35:23 +08:00
|
|
|
if (buf + 9 >= tail)
|
|
|
|
return 0;
|
2005-04-19 02:39:48 +08:00
|
|
|
if (memcmp(buf, "committer", 9))
|
|
|
|
return 0;
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:14:09 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Jump to end-of-line so that we can walk backwards to find the
|
|
|
|
* end-of-email ">". This is more forgiving of malformed cases
|
|
|
|
* because unexpected characters tend to be in the name and email
|
|
|
|
* fields.
|
|
|
|
*/
|
|
|
|
eol = memchr(buf, '\n', tail - buf);
|
|
|
|
if (!eol)
|
2008-01-20 01:35:23 +08:00
|
|
|
return 0;
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:14:09 +08:00
|
|
|
dateptr = eol;
|
|
|
|
while (dateptr > buf && dateptr[-1] != '>')
|
|
|
|
dateptr--;
|
parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:17:15 +08:00
|
|
|
if (dateptr == buf)
|
2008-01-20 01:35:23 +08:00
|
|
|
return 0;
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:14:09 +08:00
|
|
|
|
parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:17:15 +08:00
|
|
|
/*
|
|
|
|
* Trim leading whitespace, but make sure we have at least one
|
|
|
|
* non-whitespace character, as parse_timestamp() will otherwise walk
|
|
|
|
* right past the newline we found in "eol" when skipping whitespace
|
|
|
|
* itself.
|
|
|
|
*
|
|
|
|
* In theory it would be sufficient to allow any character not matched
|
|
|
|
* by isspace(), but there's a catch: our isspace() does not
|
|
|
|
* necessarily match the behavior of parse_timestamp(), as the latter
|
|
|
|
* is implemented by system routines which match more exotic control
|
|
|
|
* codes, or even locale-dependent sequences.
|
|
|
|
*
|
|
|
|
* Since we expect the timestamp to be a number, we can check for that.
|
|
|
|
* Anything else (e.g., a non-numeric token like "foo") would just
|
|
|
|
* cause parse_timestamp() to return 0 anyway.
|
|
|
|
*/
|
|
|
|
while (dateptr < eol && isspace(*dateptr))
|
|
|
|
dateptr++;
|
|
|
|
if (!isdigit(*dateptr) && *dateptr != '-')
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We know there is at least one digit (or dash), so we'll begin
|
|
|
|
* parsing there and stop at worst case at eol.
|
2023-04-27 16:17:24 +08:00
|
|
|
*
|
|
|
|
* Note that we may feed parse_timestamp() extra characters here if the
|
|
|
|
* commit is malformed, and it will parse as far as it can. For
|
|
|
|
* example, "123foo456" would return "123". That might be questionable
|
|
|
|
* (versus returning "0"), but it would help in a hypothetical case
|
|
|
|
* like "123456+0100", where the whitespace from the timezone is
|
|
|
|
* missing. Since such syntactic errors may be baked into history and
|
|
|
|
* hard to correct now, let's err on trying to make our best guess
|
|
|
|
* here, rather than insist on perfect syntax.
|
parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 16:17:15 +08:00
|
|
|
*/
|
2017-04-21 18:45:44 +08:00
|
|
|
return parse_timestamp(dateptr, NULL, 10);
|
2005-04-19 02:39:48 +08:00
|
|
|
}
|
|
|
|
|
2021-01-28 14:20:23 +08:00
|
|
|
static const struct object_id *commit_graft_oid_access(size_t index, const void *table)
|
2014-02-27 02:49:22 +08:00
|
|
|
{
|
2021-01-28 14:20:23 +08:00
|
|
|
const struct commit_graft * const *commit_graft_table = table;
|
2021-01-28 14:19:42 +08:00
|
|
|
return &commit_graft_table[index]->oid;
|
2014-02-27 02:49:22 +08:00
|
|
|
}
|
|
|
|
|
2021-01-28 14:12:35 +08:00
|
|
|
int commit_graft_pos(struct repository *r, const struct object_id *oid)
|
2005-07-30 15:58:28 +08:00
|
|
|
{
|
2021-01-28 14:19:42 +08:00
|
|
|
return oid_pos(oid, r->parsed_objects->grafts,
|
|
|
|
r->parsed_objects->grafts_nr,
|
|
|
|
commit_graft_oid_access);
|
2005-07-30 15:58:28 +08:00
|
|
|
}
|
|
|
|
|
2024-09-05 18:09:12 +08:00
|
|
|
void unparse_commit(struct repository *r, const struct object_id *oid)
|
commit,shallow: unparse commits if grafts changed
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.
This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.
But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).
Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 01:54:37 +08:00
|
|
|
{
|
|
|
|
struct commit *c = lookup_commit(r, oid);
|
|
|
|
|
|
|
|
if (!c->object.parsed)
|
|
|
|
return;
|
|
|
|
free_commit_list(c->parents);
|
|
|
|
c->parents = NULL;
|
|
|
|
c->object.parsed = 0;
|
|
|
|
}
|
|
|
|
|
2018-05-18 06:51:48 +08:00
|
|
|
int register_commit_graft(struct repository *r, struct commit_graft *graft,
|
|
|
|
int ignore_dups)
|
2006-04-07 14:58:51 +08:00
|
|
|
{
|
2021-01-28 14:12:35 +08:00
|
|
|
int pos = commit_graft_pos(r, &graft->oid);
|
2007-06-07 15:04:01 +08:00
|
|
|
|
2006-04-07 14:58:51 +08:00
|
|
|
if (0 <= pos) {
|
|
|
|
if (ignore_dups)
|
|
|
|
free(graft);
|
|
|
|
else {
|
2018-05-18 06:51:48 +08:00
|
|
|
free(r->parsed_objects->grafts[pos]);
|
|
|
|
r->parsed_objects->grafts[pos] = graft;
|
2006-04-07 14:58:51 +08:00
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
pos = -pos - 1;
|
2018-05-18 06:51:48 +08:00
|
|
|
ALLOC_GROW(r->parsed_objects->grafts,
|
|
|
|
r->parsed_objects->grafts_nr + 1,
|
|
|
|
r->parsed_objects->grafts_alloc);
|
|
|
|
r->parsed_objects->grafts_nr++;
|
|
|
|
if (pos < r->parsed_objects->grafts_nr)
|
|
|
|
memmove(r->parsed_objects->grafts + pos + 1,
|
|
|
|
r->parsed_objects->grafts + pos,
|
|
|
|
(r->parsed_objects->grafts_nr - pos - 1) *
|
|
|
|
sizeof(*r->parsed_objects->grafts));
|
|
|
|
r->parsed_objects->grafts[pos] = graft;
|
commit,shallow: unparse commits if grafts changed
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.
This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.
But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).
Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07 01:54:37 +08:00
|
|
|
unparse_commit(r, &graft->oid);
|
2006-04-07 14:58:51 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-08-19 02:33:12 +08:00
|
|
|
struct commit_graft *read_graft_line(struct strbuf *line)
|
2006-04-07 14:58:51 +08:00
|
|
|
{
|
|
|
|
/* The format is just "Commit Parent1 Parent2 ...\n" */
|
2017-08-19 02:33:14 +08:00
|
|
|
int i, phase;
|
|
|
|
const char *tail = NULL;
|
2006-04-07 14:58:51 +08:00
|
|
|
struct commit_graft *graft = NULL;
|
2017-08-19 02:33:14 +08:00
|
|
|
struct object_id dummy_oid, *oid;
|
2006-04-07 14:58:51 +08:00
|
|
|
|
2017-08-19 02:33:12 +08:00
|
|
|
strbuf_rtrim(line);
|
|
|
|
if (!line->len || line->buf[0] == '#')
|
2006-04-17 05:24:56 +08:00
|
|
|
return NULL;
|
2017-08-19 02:33:14 +08:00
|
|
|
/*
|
|
|
|
* phase 0 verifies line, counts hashes in line and allocates graft
|
|
|
|
* phase 1 fills graft
|
|
|
|
*/
|
|
|
|
for (phase = 0; phase < 2; phase++) {
|
|
|
|
oid = graft ? &graft->oid : &dummy_oid;
|
|
|
|
if (parse_oid_hex(line->buf, oid, &tail))
|
2006-04-07 14:58:51 +08:00
|
|
|
goto bad_graft_data;
|
2017-08-19 02:33:14 +08:00
|
|
|
for (i = 0; *tail != '\0'; i++) {
|
|
|
|
oid = graft ? &graft->parent[i] : &dummy_oid;
|
|
|
|
if (!isspace(*tail++) || parse_oid_hex(tail, oid, &tail))
|
|
|
|
goto bad_graft_data;
|
|
|
|
}
|
|
|
|
if (!graft) {
|
|
|
|
graft = xmalloc(st_add(sizeof(*graft),
|
|
|
|
st_mult(sizeof(struct object_id), i)));
|
|
|
|
graft->nr_parent = i;
|
|
|
|
}
|
2006-04-07 14:58:51 +08:00
|
|
|
}
|
|
|
|
return graft;
|
2010-12-02 03:15:59 +08:00
|
|
|
|
|
|
|
bad_graft_data:
|
2017-08-19 02:33:12 +08:00
|
|
|
error("bad graft data: %s", line->buf);
|
2017-08-19 02:33:14 +08:00
|
|
|
assert(!graft);
|
2010-12-02 03:15:59 +08:00
|
|
|
return NULL;
|
2006-04-07 14:58:51 +08:00
|
|
|
}
|
|
|
|
|
2018-05-18 06:51:49 +08:00
|
|
|
static int read_graft_file(struct repository *r, const char *graft_file)
|
2005-07-30 15:58:28 +08:00
|
|
|
{
|
2017-05-03 18:16:50 +08:00
|
|
|
FILE *fp = fopen_or_warn(graft_file, "r");
|
2013-12-28 04:49:57 +08:00
|
|
|
struct strbuf buf = STRBUF_INIT;
|
2006-04-07 14:58:51 +08:00
|
|
|
if (!fp)
|
|
|
|
return -1;
|
2021-08-23 18:44:02 +08:00
|
|
|
if (!no_graft_file_deprecated_advice &&
|
|
|
|
advice_enabled(ADVICE_GRAFT_FILE_DEPRECATED))
|
Deprecate support for .git/info/grafts
The grafts feature was a convenient way to "stitch together" ancient
history to the fresh start of linux.git.
Its implementation is, however, not up to Git's standards, as there are
too many ways where it can lead to surprising and unwelcome behavior.
For example, when pushing from a repository with active grafts, it is
possible to miss commits that have been "grafted out", resulting in a
broken state on the other side.
Also, the grafts feature is limited to "rewriting" commits' list of
parents, it cannot replace anything else.
The much younger feature implemented as `git replace` set out to remedy
those limitations and dangerous bugs.
Seeing as `git replace` is pretty mature by now (since 4228e8bc98
(replace: add --graft option, 2014-07-19) it can perform the graft
file's duties), it is time to deprecate support for the graft file, and
to retire it eventually.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-29 06:44:44 +08:00
|
|
|
advise(_("Support for <GIT_DIR>/info/grafts is deprecated\n"
|
|
|
|
"and will be removed in a future Git version.\n"
|
|
|
|
"\n"
|
|
|
|
"Please use \"git replace --convert-graft-file\"\n"
|
|
|
|
"to convert the grafts into replace refs.\n"
|
|
|
|
"\n"
|
|
|
|
"Turn this message off by running\n"
|
|
|
|
"\"git config advice.graftFileDeprecated false\""));
|
2013-12-28 04:49:57 +08:00
|
|
|
while (!strbuf_getwholeline(&buf, fp, '\n')) {
|
2005-07-30 15:58:28 +08:00
|
|
|
/* The format is just "Commit Parent1 Parent2 ...\n" */
|
2017-08-19 02:33:12 +08:00
|
|
|
struct commit_graft *graft = read_graft_line(&buf);
|
2006-04-17 05:24:56 +08:00
|
|
|
if (!graft)
|
|
|
|
continue;
|
2018-05-18 06:51:49 +08:00
|
|
|
if (register_commit_graft(r, graft, 1))
|
2013-12-28 04:49:57 +08:00
|
|
|
error("duplicate graft data: %s", buf.buf);
|
2005-07-30 15:58:28 +08:00
|
|
|
}
|
|
|
|
fclose(fp);
|
2013-12-28 04:49:57 +08:00
|
|
|
strbuf_release(&buf);
|
2006-04-07 14:58:51 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-21 02:24:30 +08:00
|
|
|
void prepare_commit_graft(struct repository *r)
|
2006-04-07 14:58:51 +08:00
|
|
|
{
|
2024-09-12 19:29:35 +08:00
|
|
|
const char *graft_file;
|
2006-04-07 14:58:51 +08:00
|
|
|
|
2018-05-18 06:51:53 +08:00
|
|
|
if (r->parsed_objects->commit_graft_prepared)
|
2006-04-07 14:58:51 +08:00
|
|
|
return;
|
prepare_commit_graft: treat non-repository as a noop
The parse_commit_buffer() function consults lookup_commit_graft()
to see if we need to rewrite parents. The latter will look
at $GIT_DIR/info/grafts. If you're outside of a repository,
then this will trigger a BUG() as of b1ef400eec (setup_git_env:
avoid blind fall-back to ".git", 2016-10-20).
It's probably uncommon to actually parse a commit outside of
a repository, but you can see it in action with:
cd /not/a/git/repo
git index-pack --strict /some/file.pack
This works fine without --strict, but the fsck checks will
try to parse any commits, triggering the BUG(). We can fix
that by teaching the graft code to behave as if there are no
grafts when we aren't in a repository.
Arguably index-pack (and fsck) are wrong to consider grafts
at all. So another solution is to disable grafts entirely
for those commands. But given that the graft feature is
deprecated anyway, it's not worth even thinking through the
ramifications that might have.
There is one other corner case I considered here. What
should:
cd /not/a/git/repo
export GIT_GRAFT_FILE=/file/with/grafts
git index-pack --strict /some/file.pack
do? We don't have a repository, but the user has pointed us
directly at a graft file, which we could respect. I believe
this case did work that way prior to b1ef400eec. However,
fixing it now would be pretty invasive. Back then we would
just call into setup_git_env() even without a repository.
But these days it actually takes a git_dir argument. So
there would be a fair bit of refactoring of the setup code
involved.
Given the obscurity of this case, plus the fact that grafts
are deprecated and probably shouldn't work under index-pack
anyway, it's not worth pursuing further. This patch at least
un-breaks the common case where you're _not_ using grafts,
but we BUG() anyway trying to even find that out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01 06:42:53 +08:00
|
|
|
if (!startup_info->have_repository)
|
|
|
|
return;
|
|
|
|
|
2024-09-12 19:29:35 +08:00
|
|
|
graft_file = repo_get_graft_file(r);
|
2018-05-18 06:51:53 +08:00
|
|
|
read_graft_file(r, graft_file);
|
2006-10-31 03:09:06 +08:00
|
|
|
/* make sure shallows are read */
|
2018-05-18 06:51:53 +08:00
|
|
|
is_repository_shallow(r);
|
|
|
|
r->parsed_objects->commit_graft_prepared = 1;
|
2005-07-30 15:58:28 +08:00
|
|
|
}
|
|
|
|
|
2018-05-18 06:51:54 +08:00
|
|
|
struct commit_graft *lookup_commit_graft(struct repository *r, const struct object_id *oid)
|
2005-07-30 15:58:28 +08:00
|
|
|
{
|
|
|
|
int pos;
|
2018-05-18 06:51:54 +08:00
|
|
|
prepare_commit_graft(r);
|
2021-01-28 14:12:35 +08:00
|
|
|
pos = commit_graft_pos(r, oid);
|
2005-07-30 15:58:28 +08:00
|
|
|
if (pos < 0)
|
|
|
|
return NULL;
|
2018-05-18 06:51:54 +08:00
|
|
|
return r->parsed_objects->grafts[pos];
|
2005-07-30 15:58:28 +08:00
|
|
|
}
|
|
|
|
|
2011-08-18 20:29:35 +08:00
|
|
|
int for_each_commit_graft(each_commit_graft_fn fn, void *cb_data)
|
2006-10-31 03:09:06 +08:00
|
|
|
{
|
2011-08-18 20:29:35 +08:00
|
|
|
int i, ret;
|
2018-05-16 07:42:16 +08:00
|
|
|
for (i = ret = 0; i < the_repository->parsed_objects->grafts_nr && !ret; i++)
|
|
|
|
ret = fn(the_repository->parsed_objects->grafts[i], cb_data);
|
2011-08-18 20:29:35 +08:00
|
|
|
return ret;
|
2006-10-31 03:09:06 +08:00
|
|
|
}
|
|
|
|
|
2014-06-11 05:44:13 +08:00
|
|
|
struct commit_buffer {
|
|
|
|
void *buffer;
|
|
|
|
unsigned long size;
|
|
|
|
};
|
|
|
|
define_commit_slab(buffer_slab, struct commit_buffer);
|
2014-06-11 05:43:02 +08:00
|
|
|
|
2018-06-29 09:22:15 +08:00
|
|
|
struct buffer_slab *allocate_commit_buffer_slab(void)
|
2014-06-11 05:40:14 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct buffer_slab *bs = xmalloc(sizeof(*bs));
|
|
|
|
init_buffer_slab(bs);
|
|
|
|
return bs;
|
|
|
|
}
|
|
|
|
|
|
|
|
void free_commit_buffer_slab(struct buffer_slab *bs)
|
|
|
|
{
|
|
|
|
clear_buffer_slab(bs);
|
|
|
|
free(bs);
|
|
|
|
}
|
2014-06-11 05:43:02 +08:00
|
|
|
|
2018-06-29 09:22:16 +08:00
|
|
|
void set_commit_buffer(struct repository *r, struct commit *commit, void *buffer, unsigned long size)
|
2014-06-11 05:40:14 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct commit_buffer *v = buffer_slab_at(
|
2018-06-29 09:22:16 +08:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2014-06-11 05:44:13 +08:00
|
|
|
v->buffer = buffer;
|
|
|
|
v->size = size;
|
2014-06-11 05:40:14 +08:00
|
|
|
}
|
|
|
|
|
2018-06-29 09:22:17 +08:00
|
|
|
const void *get_cached_commit_buffer(struct repository *r, const struct commit *commit, unsigned long *sizep)
|
2014-06-11 05:40:39 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-06-29 09:22:17 +08:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2015-05-15 06:25:52 +08:00
|
|
|
if (!v) {
|
|
|
|
if (sizep)
|
|
|
|
*sizep = 0;
|
|
|
|
return NULL;
|
|
|
|
}
|
2014-06-11 05:44:13 +08:00
|
|
|
if (sizep)
|
|
|
|
*sizep = v->size;
|
|
|
|
return v->buffer;
|
2014-06-11 05:40:39 +08:00
|
|
|
}
|
|
|
|
|
2018-11-14 08:12:57 +08:00
|
|
|
const void *repo_get_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *commit,
|
|
|
|
unsigned long *sizep)
|
2014-06-11 05:40:39 +08:00
|
|
|
{
|
2018-11-14 08:12:57 +08:00
|
|
|
const void *ret = get_cached_commit_buffer(r, commit, sizep);
|
2014-06-11 05:40:39 +08:00
|
|
|
if (!ret) {
|
|
|
|
enum object_type type;
|
|
|
|
unsigned long size;
|
2018-11-14 08:12:57 +08:00
|
|
|
ret = repo_read_object_file(r, &commit->object.oid, &type, &size);
|
2014-06-11 05:40:39 +08:00
|
|
|
if (!ret)
|
|
|
|
die("cannot read commit object %s",
|
2015-11-10 10:22:28 +08:00
|
|
|
oid_to_hex(&commit->object.oid));
|
2014-06-11 05:40:39 +08:00
|
|
|
if (type != OBJ_COMMIT)
|
|
|
|
die("expected commit for %s, got %s",
|
2018-02-15 02:59:24 +08:00
|
|
|
oid_to_hex(&commit->object.oid), type_name(type));
|
2014-06-11 05:44:13 +08:00
|
|
|
if (sizep)
|
|
|
|
*sizep = size;
|
2014-06-11 05:40:39 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-11-14 08:12:58 +08:00
|
|
|
void repo_unuse_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *commit,
|
|
|
|
const void *buffer)
|
2014-06-11 05:40:39 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-11-14 08:12:58 +08:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2015-05-15 06:25:52 +08:00
|
|
|
if (!(v && v->buffer == buffer))
|
2014-06-11 05:40:39 +08:00
|
|
|
free((void *)buffer);
|
|
|
|
}
|
|
|
|
|
2018-12-15 08:09:40 +08:00
|
|
|
void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 06:05:37 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-12-15 08:09:40 +08:00
|
|
|
pool->buffer_slab, commit);
|
2015-05-15 06:25:52 +08:00
|
|
|
if (v) {
|
2017-06-16 07:15:46 +08:00
|
|
|
FREE_AND_NULL(v->buffer);
|
2015-05-15 06:25:52 +08:00
|
|
|
v->size = 0;
|
|
|
|
}
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 06:05:37 +08:00
|
|
|
}
|
|
|
|
|
2019-04-16 17:33:18 +08:00
|
|
|
static inline void set_commit_tree(struct commit *c, struct tree *t)
|
|
|
|
{
|
|
|
|
c->maybe_tree = t;
|
|
|
|
}
|
|
|
|
|
2019-04-16 17:33:19 +08:00
|
|
|
struct tree *repo_get_commit_tree(struct repository *r,
|
|
|
|
const struct commit *commit)
|
2018-04-07 03:09:34 +08:00
|
|
|
{
|
2018-04-07 03:09:46 +08:00
|
|
|
if (commit->maybe_tree || !commit->object.parsed)
|
|
|
|
return commit->maybe_tree;
|
|
|
|
|
2020-06-17 17:14:10 +08:00
|
|
|
if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
|
2019-05-08 23:37:25 +08:00
|
|
|
return get_commit_tree_in_graph(r, commit);
|
2018-04-07 03:09:46 +08:00
|
|
|
|
2019-04-10 10:13:20 +08:00
|
|
|
return NULL;
|
2018-04-07 03:09:34 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
struct object_id *get_commit_tree_oid(const struct commit *commit)
|
|
|
|
{
|
2023-03-28 21:58:48 +08:00
|
|
|
struct tree *tree = repo_get_commit_tree(the_repository, commit);
|
2019-09-06 06:04:57 +08:00
|
|
|
return tree ? &tree->object.oid : NULL;
|
2018-04-07 03:09:34 +08:00
|
|
|
}
|
|
|
|
|
2018-12-15 08:09:40 +08:00
|
|
|
void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
|
2018-05-16 05:48:42 +08:00
|
|
|
{
|
2019-04-16 17:33:18 +08:00
|
|
|
set_commit_tree(c, NULL);
|
2018-12-15 08:09:40 +08:00
|
|
|
free_commit_buffer(pool, c);
|
2019-08-26 10:01:37 +08:00
|
|
|
c->index = 0;
|
2018-05-16 05:48:42 +08:00
|
|
|
free_commit_list(c->parents);
|
|
|
|
|
|
|
|
c->object.parsed = 0;
|
|
|
|
}
|
|
|
|
|
2014-06-11 05:44:13 +08:00
|
|
|
const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 06:05:37 +08:00
|
|
|
{
|
2018-06-29 09:22:15 +08:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
|
|
|
the_repository->parsed_objects->buffer_slab, commit);
|
2014-06-11 05:44:13 +08:00
|
|
|
void *ret;
|
|
|
|
|
2015-05-15 06:25:52 +08:00
|
|
|
if (!v) {
|
|
|
|
if (sizep)
|
|
|
|
*sizep = 0;
|
|
|
|
return NULL;
|
|
|
|
}
|
2014-06-11 05:44:13 +08:00
|
|
|
ret = v->buffer;
|
|
|
|
if (sizep)
|
|
|
|
*sizep = v->size;
|
|
|
|
|
|
|
|
v->buffer = NULL;
|
|
|
|
v->size = 0;
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 06:05:37 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-06-29 09:22:13 +08:00
|
|
|
int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph)
|
2005-04-19 02:39:48 +08:00
|
|
|
{
|
2011-02-05 18:52:20 +08:00
|
|
|
const char *tail = buffer;
|
|
|
|
const char *bufptr = buffer;
|
2015-03-14 07:39:34 +08:00
|
|
|
struct object_id parent;
|
2005-06-21 11:26:03 +08:00
|
|
|
struct commit_list **pptr;
|
2005-07-30 15:58:28 +08:00
|
|
|
struct commit_graft *graft;
|
2018-07-16 09:27:56 +08:00
|
|
|
const int tree_entry_len = the_hash_algo->hexsz + 5;
|
|
|
|
const int parent_entry_len = the_hash_algo->hexsz + 7;
|
parse_commit_buffer(): treat lookup_tree() failure as parse error
If parsing a commit yields a valid tree oid, but we've seen that same
oid as a non-tree in the same process, the resulting commit struct will
end up with a NULL tree pointer, but not otherwise report a parsing
failure.
That's perhaps convenient for callers which want to operate on even
partially corrupt commits (e.g., by still looking at the parents). But
it leaves a potential trap for most callers, who now have to manually
check for a NULL tree. Most do not, and it's likely that there are
possible segfaults lurking. I say "possible" because there are many
candidates, and I don't think it's worth following through on
reproducing them when we can just fix them all in one spot. And
certainly we _have_ seen real-world cases, such as the one fixed by
806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05).
Note that we can't quite drop the check in the caller added by that
commit yet, as there's some subtlety with repeated parsings (which will
be addressed in a future commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-18 12:43:29 +08:00
|
|
|
struct tree *tree;
|
2005-05-07 01:48:34 +08:00
|
|
|
|
2005-04-19 02:39:48 +08:00
|
|
|
if (item->object.parsed)
|
|
|
|
return 0;
|
2022-04-14 04:01:34 +08:00
|
|
|
/*
|
|
|
|
* Presumably this is leftover from an earlier failed parse;
|
|
|
|
* clear it out in preparation for us re-parsing (we'll hit the
|
|
|
|
* same error, but that's good, since it lets our caller know
|
|
|
|
* the result cannot be trusted.
|
|
|
|
*/
|
|
|
|
free_commit_list(item->parents);
|
|
|
|
item->parents = NULL;
|
commit, tag: don't set parsed bit for parse failures
If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus. I.e., doing this:
parse_commit(commit);
...
if (parse_commit(commit) < 0)
die("commit is broken");
will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.
There are two obvious ways to fix this:
1. Stop setting "parsed" until we've successfully parsed.
2. Keep a second "corrupt" flag to indicate that we saw an error (and
when the parsed flag is set, return 0/-1 depending on the corrupt
flag).
This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.
There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).
We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-26 05:20:20 +08:00
|
|
|
|
2006-06-28 18:51:00 +08:00
|
|
|
tail += size;
|
2015-03-14 07:39:34 +08:00
|
|
|
if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
|
|
|
|
bufptr[tree_entry_len] != '\n')
|
2015-11-10 10:22:28 +08:00
|
|
|
return error("bogus commit object %s", oid_to_hex(&item->object.oid));
|
2018-05-02 08:25:47 +08:00
|
|
|
if (get_oid_hex(bufptr + 5, &parent) < 0)
|
2006-02-23 09:47:10 +08:00
|
|
|
return error("bad tree pointer in commit %s",
|
2015-11-10 10:22:28 +08:00
|
|
|
oid_to_hex(&item->object.oid));
|
parse_commit_buffer(): treat lookup_tree() failure as parse error
If parsing a commit yields a valid tree oid, but we've seen that same
oid as a non-tree in the same process, the resulting commit struct will
end up with a NULL tree pointer, but not otherwise report a parsing
failure.
That's perhaps convenient for callers which want to operate on even
partially corrupt commits (e.g., by still looking at the parents). But
it leaves a potential trap for most callers, who now have to manually
check for a NULL tree. Most do not, and it's likely that there are
possible segfaults lurking. I say "possible" because there are many
candidates, and I don't think it's worth following through on
reproducing them when we can just fix them all in one spot. And
certainly we _have_ seen real-world cases, such as the one fixed by
806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05).
Note that we can't quite drop the check in the caller added by that
commit yet, as there's some subtlety with repeated parsings (which will
be addressed in a future commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-18 12:43:29 +08:00
|
|
|
tree = lookup_tree(r, &parent);
|
|
|
|
if (!tree)
|
|
|
|
return error("bad tree pointer %s in commit %s",
|
|
|
|
oid_to_hex(&parent),
|
|
|
|
oid_to_hex(&item->object.oid));
|
|
|
|
set_commit_tree(item, tree);
|
2015-03-14 07:39:34 +08:00
|
|
|
bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
|
2005-06-21 11:26:03 +08:00
|
|
|
pptr = &item->parents;
|
2005-07-30 15:58:28 +08:00
|
|
|
|
2018-06-29 09:22:13 +08:00
|
|
|
graft = lookup_commit_graft(r, &item->object.oid);
|
commit.c: don't persist substituted parents when unshallowing
Since 37b9dcabfc (shallow.c: use '{commit,rollback}_shallow_file',
2020-04-22), Git knows how to reset stat-validity checks for the
$GIT_DIR/shallow file, allowing it to change between a shallow and
non-shallow state in the same process (e.g., in the case of 'git fetch
--unshallow').
However, when $GIT_DIR/shallow changes, Git does not alter or remove any
grafts (nor substituted parents) in memory.
This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph
set to true. Ordinarily in a shallow repository (and before 37b9dcabfc,
even in this case), commit_graph_compatible() would return false,
indicating that the repository should not be used to write a
commit-graphs (since commit-graph files cannot represent a shallow
history). But since 37b9dcabfc, in an --unshallow operation that check
succeeds.
Thus even though the repository isn't shallow any longer (that is, we
have all of the objects), the in-core representation of those objects
still has munged parents at the shallow boundaries. When the
commit-graph write proceeds, we use the incorrect parentage, producing
wrong results.
There are two ways for a user to work around this: either (1) set
'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after
unshallowing.
One way to fix this would be to reset the parsed object pool entirely
(flushing the cache and thus preventing subsequent reads from modifying
their parents) after unshallowing. That would produce a problem when
callers have a now-stale reference to the old pool, and so this patch
implements a different approach. Instead, attach a new bit to the pool,
'substituted_parent', which indicates if the repository *ever* stored a
commit which had its parents modified (i.e., the shallow boundary
prior to unshallowing).
This bit needs to be sticky because all reads subsequent to modifying a
commit's parents are unreliable when unshallowing. Modify the check in
'commit_graph_compatible' to take this bit into account, and correctly
avoid generating commit-graphs in this case, thus solving the bug.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Jay Conrod <jayconrod@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-09 05:10:53 +08:00
|
|
|
if (graft)
|
|
|
|
r->parsed_objects->substituted_parent = 1;
|
2015-03-14 07:39:34 +08:00
|
|
|
while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
|
2005-07-28 06:12:48 +08:00
|
|
|
struct commit *new_parent;
|
|
|
|
|
2015-03-14 07:39:34 +08:00
|
|
|
if (tail <= bufptr + parent_entry_len + 1 ||
|
2018-05-02 08:25:47 +08:00
|
|
|
get_oid_hex(bufptr + 7, &parent) ||
|
2015-03-14 07:39:34 +08:00
|
|
|
bufptr[parent_entry_len] != '\n')
|
2015-11-10 10:22:28 +08:00
|
|
|
return error("bad parents in commit %s", oid_to_hex(&item->object.oid));
|
2015-03-14 07:39:34 +08:00
|
|
|
bufptr += parent_entry_len + 1;
|
2009-07-23 23:33:49 +08:00
|
|
|
/*
|
|
|
|
* The clone is shallow if nr_parent < 0, and we must
|
|
|
|
* not traverse its real parents even when we unhide them.
|
|
|
|
*/
|
2023-07-21 20:41:56 +08:00
|
|
|
if (graft && (graft->nr_parent < 0 || !grafts_keep_true_parents))
|
2005-07-30 15:58:28 +08:00
|
|
|
continue;
|
2018-06-29 09:22:13 +08:00
|
|
|
new_parent = lookup_commit(r, &parent);
|
2019-10-18 12:42:13 +08:00
|
|
|
if (!new_parent)
|
|
|
|
return error("bad parent %s in commit %s",
|
|
|
|
oid_to_hex(&parent),
|
|
|
|
oid_to_hex(&item->object.oid));
|
|
|
|
pptr = &commit_list_insert(new_parent, pptr)->next;
|
2005-07-30 15:58:28 +08:00
|
|
|
}
|
|
|
|
if (graft) {
|
|
|
|
int i;
|
|
|
|
struct commit *new_parent;
|
|
|
|
for (i = 0; i < graft->nr_parent; i++) {
|
2018-06-29 09:22:13 +08:00
|
|
|
new_parent = lookup_commit(r,
|
2018-06-29 09:21:59 +08:00
|
|
|
&graft->parent[i]);
|
2005-07-30 15:58:28 +08:00
|
|
|
if (!new_parent)
|
2019-10-18 12:42:13 +08:00
|
|
|
return error("bad graft parent %s in commit %s",
|
|
|
|
oid_to_hex(&graft->parent[i]),
|
|
|
|
oid_to_hex(&item->object.oid));
|
2005-07-30 15:58:28 +08:00
|
|
|
pptr = &commit_list_insert(new_parent, pptr)->next;
|
|
|
|
}
|
2005-04-19 02:39:48 +08:00
|
|
|
}
|
2008-01-20 01:35:23 +08:00
|
|
|
item->date = parse_commit_date(bufptr, tail);
|
2005-11-17 13:32:44 +08:00
|
|
|
|
2018-05-01 20:47:13 +08:00
|
|
|
if (check_graph)
|
2019-05-09 22:22:31 +08:00
|
|
|
load_commit_graph_info(r, item);
|
2018-05-01 20:47:13 +08:00
|
|
|
|
commit, tag: don't set parsed bit for parse failures
If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus. I.e., doing this:
parse_commit(commit);
...
if (parse_commit(commit) < 0)
die("commit is broken");
will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.
There are two obvious ways to fix this:
1. Stop setting "parsed" until we've successfully parsed.
2. Keep a second "corrupt" flag to indicate that we saw an error (and
when the parsed flag is set, return 0/-1 depending on the corrupt
flag).
This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.
There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).
We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-26 05:20:20 +08:00
|
|
|
item->object.parsed = 1;
|
2005-04-19 02:39:48 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-11-14 08:12:50 +08:00
|
|
|
int repo_parse_commit_internal(struct repository *r,
|
|
|
|
struct commit *item,
|
|
|
|
int quiet_on_missing,
|
|
|
|
int use_commit_graph)
|
2005-05-07 01:48:34 +08:00
|
|
|
{
|
2007-02-27 03:55:59 +08:00
|
|
|
enum object_type type;
|
2005-05-07 01:48:34 +08:00
|
|
|
void *buffer;
|
|
|
|
unsigned long size;
|
commit: don't lazy-fetch commits
When parsing commits, fail fast when the commit is missing or
corrupt, instead of attempting to fetch them. This is done by inlining
repo_read_object_file() and setting the flag that prevents fetching.
This is motivated by a situation in which through a bug (not necessarily
through Git), there was corruption in the object store of a partial
clone. In this particular case, the problem was exposed when "git gc"
tried to expire reflogs, which calls repo_parse_commit(), which triggers
fetches of the missing commits.
(There are other possible solutions to this problem including passing an
argument from "git gc" to "git reflog" to inhibit all lazy fetches, but
I think that this fix is at the wrong level - fixing "git reflog" means
that this particular command works fine, or so we think (it will fail if
it somehow needs to read a legitimately missing blob, say, a .gitmodules
file), but fixing repo_parse_commit() will fix a whole class of bugs.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 03:17:43 +08:00
|
|
|
struct object_info oi = {
|
|
|
|
.typep = &type,
|
|
|
|
.sizep = &size,
|
|
|
|
.contentp = &buffer,
|
|
|
|
};
|
|
|
|
/*
|
|
|
|
* Git does not support partial clones that exclude commits, so set
|
|
|
|
* OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
|
|
|
|
*/
|
|
|
|
int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
|
|
|
|
OBJECT_INFO_DIE_IF_CORRUPT;
|
2005-05-07 01:48:34 +08:00
|
|
|
int ret;
|
|
|
|
|
2008-02-19 04:48:02 +08:00
|
|
|
if (!item)
|
|
|
|
return -1;
|
2005-05-07 01:48:34 +08:00
|
|
|
if (item->object.parsed)
|
|
|
|
return 0;
|
commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.
As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.
To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.
We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.
For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.
It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.
This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:
Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
Range (min … max): 2.894 s … 2.950 s 10 runs
Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
Range (min … max): 3.780 s … 3.961 s 10 runs
Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
Range (min … max): 13.714 s … 13.995 s 10 runs
Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
Range (min … max): 13.645 s … 14.038 s 10 runs
Summary
git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 15:16:18 +08:00
|
|
|
if (use_commit_graph && parse_commit_in_graph(r, item)) {
|
|
|
|
static int commit_graph_paranoia = -1;
|
|
|
|
|
|
|
|
if (commit_graph_paranoia == -1)
|
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:
Benchmark 1: git.v2.42.1 rev-list --count HEAD
Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
Range (min … max): 650.2 ms … 666.0 ms 10 runs
Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
Range (min … max): 1.302 s … 1.361 s 10 runs
Summary
git.v2.42.1 rev-list --count HEAD ran
2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
Range (min … max): 2.943 s … 2.949 s 3 runs
Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
Range (min … max): 2.704 s … 2.759 s 3 runs
Summary
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 19:08:21 +08:00
|
|
|
commit_graph_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 0);
|
commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.
As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.
To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.
We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.
For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.
It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.
This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:
Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
Range (min … max): 2.894 s … 2.950 s 10 runs
Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
Range (min … max): 3.780 s … 3.961 s 10 runs
Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
Range (min … max): 13.714 s … 13.995 s 10 runs
Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
Range (min … max): 13.645 s … 14.038 s 10 runs
Summary
git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 15:16:18 +08:00
|
|
|
|
|
|
|
if (commit_graph_paranoia && !has_object(r, &item->object.oid, 0)) {
|
|
|
|
unparse_commit(r, &item->object.oid);
|
|
|
|
return quiet_on_missing ? -1 :
|
|
|
|
error(_("commit %s exists in commit-graph but not in the object database"),
|
|
|
|
oid_to_hex(&item->object.oid));
|
|
|
|
}
|
|
|
|
|
2018-04-10 20:56:05 +08:00
|
|
|
return 0;
|
commit: detect commits that exist in commit-graph but not in the ODB
Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.
As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.
To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.
We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.
For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.
It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.
This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:
Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s]
Range (min … max): 2.894 s … 2.950 s 10 runs
Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s]
Range (min … max): 3.780 s … 3.961 s 10 runs
Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s]
Range (min … max): 13.714 s … 13.995 s 10 runs
Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s]
Range (min … max): 13.645 s … 14.038 s 10 runs
Summary
git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 15:16:18 +08:00
|
|
|
}
|
commit: don't lazy-fetch commits
When parsing commits, fail fast when the commit is missing or
corrupt, instead of attempting to fetch them. This is done by inlining
repo_read_object_file() and setting the flag that prevents fetching.
This is motivated by a situation in which through a bug (not necessarily
through Git), there was corruption in the object store of a partial
clone. In this particular case, the problem was exposed when "git gc"
tried to expire reflogs, which calls repo_parse_commit(), which triggers
fetches of the missing commits.
(There are other possible solutions to this problem including passing an
argument from "git gc" to "git reflog" to inhibit all lazy fetches, but
I think that this fix is at the wrong level - fixing "git reflog" means
that this particular command works fine, or so we think (it will fail if
it somehow needs to read a legitimately missing blob, say, a .gitmodules
file), but fixing repo_parse_commit() will fix a whole class of bugs.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 03:17:43 +08:00
|
|
|
|
|
|
|
if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
|
add quieter versions of parse_{tree,commit}
When we call parse_commit, it will complain to stderr if the
object does not exist or cannot be read. This means that we
may produce useless error messages if this situation is
expected (e.g., because the object is marked UNINTERESTING,
or because revs->ignore_missing_links is set).
We can fix this by adding a new "parse_X_gently" form that
takes a flag to suppress the messages. The existing
"parse_X" form is already gentle in the sense that it
returns an error rather than dying, and we could in theory
just add a "quiet" flag to it (with existing callers passing
"0"). But doing it this way means we do not have to disturb
existing callers.
Note also that the new flag is "quiet_on_missing", and not
just "quiet". We could add a flag to suppress _all_ errors,
but besides being a more invasive change (we would have to
pass the flag down to sub-functions, too), there is a good
reason not to: we would never want to use it. Missing a
linked object is expected in some circumstances, but it is
never expected to have a malformed commit, or to get a tree
when we wanted a commit. We should always complain about
these corruptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 17:56:26 +08:00
|
|
|
return quiet_on_missing ? -1 :
|
|
|
|
error("Could not read %s",
|
2015-11-10 10:22:28 +08:00
|
|
|
oid_to_hex(&item->object.oid));
|
2007-02-27 03:55:59 +08:00
|
|
|
if (type != OBJ_COMMIT) {
|
2005-05-07 01:48:34 +08:00
|
|
|
free(buffer);
|
|
|
|
return error("Object %s not a commit",
|
2015-11-10 10:22:28 +08:00
|
|
|
oid_to_hex(&item->object.oid));
|
2005-05-07 01:48:34 +08:00
|
|
|
}
|
2018-06-27 21:24:30 +08:00
|
|
|
|
2018-11-14 08:12:50 +08:00
|
|
|
ret = parse_commit_buffer(r, item, buffer, size, 0);
|
commit: avoid leaking already-saved buffer
When we parse a commit via repo_parse_commit_internal(), if
save_commit_buffer is set we'll stuff the buffer of the object contents
into a cache, overwriting any previous value.
This can result in a leak of that previously cached value, though it's
rare in practice. If we have a value in the cache it would have come
from a previous parse, and during that parse we'd set the object.parsed
flag, causing any subsequent parse attempts to exit without doing any
work.
But it's possible to "unparse" a commit, which we do when registering a
commit graft. And since shallow fetches are implemented using grafts,
the leak is triggered in practice by t5539.
There are a number of possible ways to address this:
1. the unparsing function could clear the cached commit buffer, too. I
think this would work for the case I found, but I'm not sure if
there are other ways to end up in the same state (an unparsed
commit with an entry in the commit buffer cache).
2. when we parse, we could check the buffer cache and prefer it to
reading the contents from the object database. In theory the
contents of a particular sha1 are immutable, but the code in
question is violating the immutability with grafts. So this
approach makes me a bit nervous, although I think it would work in
practice (the grafts are applied to what we parse, but we still
retain the original contents).
3. We could realize the cache is already populated and discard its
contents before overwriting. It's possible some other code could be
holding on to a pointer to the old cache entry (and we'd introduce
a use-after-free), but I think the risk of that is relatively low.
4. The reverse of (3): when the cache is populated, don't bother
saving our new copy. This is perhaps a little weird, since we'll
have just populated the commit struct based on a different buffer.
But the two buffers should be the same, even in the presence of
grafts (as in (2) above).
I went with option 4. It addresses the leak directly and doesn't carry
any risk of breaking other assumptions. And it's the same technique used
by parse_object_buffer() for this situation, though I'm not sure when it
would even come up there. The extra safety has been there since
bd1e17e245 (Make "parse_object()" also fill in commit message buffer
data., 2005-05-25).
This lets us mark t5539 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25 05:54:34 +08:00
|
|
|
if (save_commit_buffer && !ret &&
|
|
|
|
!get_cached_commit_buffer(r, item, NULL)) {
|
2018-11-14 08:12:50 +08:00
|
|
|
set_commit_buffer(r, item, buffer, size);
|
2005-05-26 09:27:14 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2005-05-07 01:48:34 +08:00
|
|
|
free(buffer);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-11-14 08:12:50 +08:00
|
|
|
int repo_parse_commit_gently(struct repository *r,
|
|
|
|
struct commit *item, int quiet_on_missing)
|
2018-06-27 21:24:30 +08:00
|
|
|
{
|
2018-11-14 08:12:50 +08:00
|
|
|
return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
|
2018-06-27 21:24:30 +08:00
|
|
|
}
|
|
|
|
|
2013-10-24 16:52:36 +08:00
|
|
|
void parse_commit_or_die(struct commit *item)
|
|
|
|
{
|
2023-03-28 21:58:48 +08:00
|
|
|
if (repo_parse_commit(the_repository, item))
|
2013-10-24 16:52:36 +08:00
|
|
|
die("unable to parse commit %s",
|
2015-11-10 10:22:28 +08:00
|
|
|
item ? oid_to_hex(&item->object.oid) : "(null)");
|
2013-10-24 16:52:36 +08:00
|
|
|
}
|
|
|
|
|
2010-07-22 21:18:30 +08:00
|
|
|
int find_commit_subject(const char *commit_buffer, const char **subject)
|
|
|
|
{
|
|
|
|
const char *eol;
|
|
|
|
const char *p = commit_buffer;
|
|
|
|
|
|
|
|
while (*p && (*p != '\n' || p[1] != '\n'))
|
|
|
|
p++;
|
|
|
|
if (*p) {
|
2016-06-23 04:20:20 +08:00
|
|
|
p = skip_blank_lines(p + 2);
|
2017-01-28 10:01:41 +08:00
|
|
|
eol = strchrnul(p, '\n');
|
2010-07-22 21:18:30 +08:00
|
|
|
} else
|
|
|
|
eol = p;
|
|
|
|
|
|
|
|
*subject = p;
|
|
|
|
|
|
|
|
return eol - p;
|
|
|
|
}
|
|
|
|
|
2021-03-15 15:54:31 +08:00
|
|
|
size_t commit_subject_length(const char *body)
|
|
|
|
{
|
|
|
|
const char *p = body;
|
|
|
|
while (*p) {
|
|
|
|
const char *next = skip_blank_lines(p);
|
|
|
|
if (next != p)
|
|
|
|
break;
|
|
|
|
p = strchrnul(p, '\n');
|
|
|
|
if (*p)
|
|
|
|
p++;
|
|
|
|
}
|
|
|
|
return p - body;
|
|
|
|
}
|
|
|
|
|
2005-05-31 09:44:02 +08:00
|
|
|
struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list_p)
|
2005-04-24 09:47:23 +08:00
|
|
|
{
|
2005-04-27 03:00:58 +08:00
|
|
|
struct commit_list *new_list = xmalloc(sizeof(struct commit_list));
|
2005-04-24 09:47:23 +08:00
|
|
|
new_list->item = item;
|
|
|
|
new_list->next = *list_p;
|
|
|
|
*list_p = new_list;
|
2005-05-31 09:44:02 +08:00
|
|
|
return new_list;
|
2005-04-24 09:47:23 +08:00
|
|
|
}
|
|
|
|
|
2020-12-09 06:04:13 +08:00
|
|
|
int commit_list_contains(struct commit *item, struct commit_list *list)
|
|
|
|
{
|
|
|
|
while (list) {
|
|
|
|
if (list->item == item)
|
|
|
|
return 1;
|
|
|
|
list = list->next;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-06-28 00:21:55 +08:00
|
|
|
unsigned commit_list_count(const struct commit_list *l)
|
|
|
|
{
|
|
|
|
unsigned c = 0;
|
|
|
|
for (; l; l = l->next )
|
|
|
|
c++;
|
|
|
|
return c;
|
|
|
|
}
|
|
|
|
|
2024-06-11 17:21:11 +08:00
|
|
|
struct commit_list *copy_commit_list(const struct commit_list *list)
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:13:20 +08:00
|
|
|
{
|
|
|
|
struct commit_list *head = NULL;
|
|
|
|
struct commit_list **pp = &head;
|
|
|
|
while (list) {
|
2014-07-10 17:47:47 +08:00
|
|
|
pp = commit_list_append(list->item, pp);
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01 04:13:20 +08:00
|
|
|
list = list->next;
|
|
|
|
}
|
|
|
|
return head;
|
|
|
|
}
|
|
|
|
|
2020-12-17 06:27:59 +08:00
|
|
|
struct commit_list *reverse_commit_list(struct commit_list *list)
|
|
|
|
{
|
|
|
|
struct commit_list *next = NULL, *current, *backup;
|
|
|
|
for (current = list; current; current = backup) {
|
|
|
|
backup = current->next;
|
|
|
|
current->next = next;
|
|
|
|
next = current;
|
|
|
|
}
|
|
|
|
return next;
|
|
|
|
}
|
|
|
|
|
2005-04-19 02:39:48 +08:00
|
|
|
void free_commit_list(struct commit_list *list)
|
|
|
|
{
|
2015-10-25 00:21:31 +08:00
|
|
|
while (list)
|
|
|
|
pop_commit(&list);
|
2005-04-19 02:39:48 +08:00
|
|
|
}
|
2005-04-24 09:47:23 +08:00
|
|
|
|
2010-11-27 09:58:14 +08:00
|
|
|
struct commit_list * commit_list_insert_by_date(struct commit *item, struct commit_list **list)
|
2005-04-24 09:47:23 +08:00
|
|
|
{
|
|
|
|
struct commit_list **pp = list;
|
|
|
|
struct commit_list *p;
|
|
|
|
while ((p = *pp) != NULL) {
|
|
|
|
if (p->item->date < item->date) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
pp = &p->next;
|
|
|
|
}
|
2005-07-07 00:31:17 +08:00
|
|
|
return commit_list_insert(item, pp);
|
2005-04-24 09:47:23 +08:00
|
|
|
}
|
|
|
|
|
2022-07-17 00:59:05 +08:00
|
|
|
static int commit_list_compare_by_date(const struct commit_list *a,
|
|
|
|
const struct commit_list *b)
|
2012-04-01 06:10:39 +08:00
|
|
|
{
|
2022-07-17 00:59:05 +08:00
|
|
|
timestamp_t a_date = a->item->date;
|
|
|
|
timestamp_t b_date = b->item->date;
|
2012-04-01 06:10:39 +08:00
|
|
|
if (a_date < b_date)
|
|
|
|
return 1;
|
|
|
|
if (a_date > b_date)
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-07-17 00:59:05 +08:00
|
|
|
DEFINE_LIST_SORT(static, commit_list_sort, struct commit_list, next);
|
2007-06-07 15:04:01 +08:00
|
|
|
|
2010-11-27 09:58:14 +08:00
|
|
|
void commit_list_sort_by_date(struct commit_list **list)
|
2005-04-24 09:47:23 +08:00
|
|
|
{
|
2022-07-17 00:59:05 +08:00
|
|
|
commit_list_sort(list, commit_list_compare_by_date);
|
2005-04-24 09:47:23 +08:00
|
|
|
}
|
|
|
|
|
2005-04-24 11:29:22 +08:00
|
|
|
struct commit *pop_most_recent_commit(struct commit_list **list,
|
|
|
|
unsigned int mark)
|
2005-04-24 09:47:23 +08:00
|
|
|
{
|
2015-10-25 00:21:31 +08:00
|
|
|
struct commit *ret = pop_commit(list);
|
2005-04-24 09:47:23 +08:00
|
|
|
struct commit_list *parents = ret->parents;
|
|
|
|
|
|
|
|
while (parents) {
|
2005-04-24 10:21:28 +08:00
|
|
|
struct commit *commit = parents->item;
|
2023-03-28 21:58:48 +08:00
|
|
|
if (!repo_parse_commit(the_repository, commit) && !(commit->object.flags & mark)) {
|
2005-04-24 11:29:22 +08:00
|
|
|
commit->object.flags |= mark;
|
2010-11-27 09:58:14 +08:00
|
|
|
commit_list_insert_by_date(commit, list);
|
2005-04-24 10:21:28 +08:00
|
|
|
}
|
2005-04-24 09:47:23 +08:00
|
|
|
parents = parents->next;
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
2005-06-01 23:34:23 +08:00
|
|
|
|
2012-01-14 20:19:53 +08:00
|
|
|
static void clear_commit_marks_1(struct commit_list **plist,
|
|
|
|
struct commit *commit, unsigned int mark)
|
2006-01-08 10:52:42 +08:00
|
|
|
{
|
2007-10-11 06:14:35 +08:00
|
|
|
while (commit) {
|
|
|
|
struct commit_list *parents;
|
2006-01-08 10:52:42 +08:00
|
|
|
|
2007-10-11 06:14:35 +08:00
|
|
|
if (!(mark & commit->object.flags))
|
|
|
|
return;
|
2006-07-05 08:45:22 +08:00
|
|
|
|
2007-10-11 06:14:35 +08:00
|
|
|
commit->object.flags &= ~mark;
|
|
|
|
|
|
|
|
parents = commit->parents;
|
|
|
|
if (!parents)
|
|
|
|
return;
|
|
|
|
|
2022-12-13 14:27:10 +08:00
|
|
|
while ((parents = parents->next)) {
|
|
|
|
if (parents->item->object.flags & mark)
|
|
|
|
commit_list_insert(parents->item, plist);
|
|
|
|
}
|
2007-10-11 06:14:35 +08:00
|
|
|
|
|
|
|
commit = commit->parents->item;
|
2006-01-08 10:52:42 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-03-06 03:42:20 +08:00
|
|
|
void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark)
|
2012-01-14 20:19:53 +08:00
|
|
|
{
|
|
|
|
struct commit_list *list = NULL;
|
2013-03-06 03:42:20 +08:00
|
|
|
|
|
|
|
while (nr--) {
|
2017-12-26 01:43:37 +08:00
|
|
|
clear_commit_marks_1(&list, *commit, mark);
|
2013-03-06 03:42:20 +08:00
|
|
|
commit++;
|
|
|
|
}
|
2012-01-14 20:19:53 +08:00
|
|
|
while (list)
|
|
|
|
clear_commit_marks_1(&list, pop_commit(&list), mark);
|
|
|
|
}
|
|
|
|
|
2013-03-06 03:42:20 +08:00
|
|
|
void clear_commit_marks(struct commit *commit, unsigned int mark)
|
|
|
|
{
|
|
|
|
clear_commit_marks_many(1, &commit, mark);
|
|
|
|
}
|
|
|
|
|
2005-06-06 23:39:40 +08:00
|
|
|
struct commit *pop_commit(struct commit_list **stack)
|
|
|
|
{
|
|
|
|
struct commit_list *top = *stack;
|
|
|
|
struct commit *item = top ? top->item : NULL;
|
|
|
|
|
|
|
|
if (top) {
|
|
|
|
*stack = top->next;
|
|
|
|
free(top);
|
|
|
|
}
|
|
|
|
return item;
|
|
|
|
}
|
|
|
|
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
/*
|
|
|
|
* Topological sort support
|
|
|
|
*/
|
2013-04-09 14:52:56 +08:00
|
|
|
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
/* count number of children that have not been emitted */
|
|
|
|
define_commit_slab(indegree_slab, int);
|
2013-04-09 14:52:56 +08:00
|
|
|
|
2018-08-22 04:54:12 +08:00
|
|
|
define_commit_slab(author_date_slab, timestamp_t);
|
2013-06-08 01:35:54 +08:00
|
|
|
|
2018-11-01 21:46:21 +08:00
|
|
|
void record_author_date(struct author_date_slab *author_date,
|
|
|
|
struct commit *commit)
|
2013-06-08 01:35:54 +08:00
|
|
|
{
|
2023-03-28 21:58:48 +08:00
|
|
|
const char *buffer = repo_get_commit_buffer(the_repository, commit,
|
|
|
|
NULL);
|
2013-06-08 01:35:54 +08:00
|
|
|
struct ident_split ident;
|
2014-08-27 15:56:55 +08:00
|
|
|
const char *ident_line;
|
|
|
|
size_t ident_len;
|
2013-06-08 01:35:54 +08:00
|
|
|
char *date_end;
|
2017-04-27 03:29:31 +08:00
|
|
|
timestamp_t date;
|
2013-06-08 01:35:54 +08:00
|
|
|
|
2014-08-27 15:56:55 +08:00
|
|
|
ident_line = find_commit_header(buffer, "author", &ident_len);
|
|
|
|
if (!ident_line)
|
|
|
|
goto fail_exit; /* no author line */
|
|
|
|
if (split_ident_line(&ident, ident_line, ident_len) ||
|
|
|
|
!ident.date_begin || !ident.date_end)
|
|
|
|
goto fail_exit; /* malformed "author" line */
|
2013-06-08 01:35:54 +08:00
|
|
|
|
2017-04-21 18:45:44 +08:00
|
|
|
date = parse_timestamp(ident.date_begin, &date_end, 10);
|
2013-06-08 01:35:54 +08:00
|
|
|
if (date_end != ident.date_end)
|
|
|
|
goto fail_exit; /* malformed date */
|
|
|
|
*(author_date_slab_at(author_date, commit)) = date;
|
|
|
|
|
|
|
|
fail_exit:
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_unuse_commit_buffer(the_repository, commit, buffer);
|
2013-06-08 01:35:54 +08:00
|
|
|
}
|
|
|
|
|
2018-11-01 21:46:21 +08:00
|
|
|
int compare_commits_by_author_date(const void *a_, const void *b_,
|
|
|
|
void *cb_data)
|
2013-06-08 01:35:54 +08:00
|
|
|
{
|
|
|
|
const struct commit *a = a_, *b = b_;
|
|
|
|
struct author_date_slab *author_date = cb_data;
|
2017-04-27 03:29:31 +08:00
|
|
|
timestamp_t a_date = *(author_date_slab_at(author_date, a));
|
|
|
|
timestamp_t b_date = *(author_date_slab_at(author_date, b));
|
2013-06-08 01:35:54 +08:00
|
|
|
|
|
|
|
/* newer commits with larger date first */
|
|
|
|
if (a_date < b_date)
|
|
|
|
return 1;
|
|
|
|
else if (a_date > b_date)
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-02-24 14:39:27 +08:00
|
|
|
int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_,
|
|
|
|
void *unused UNUSED)
|
2018-05-01 20:47:11 +08:00
|
|
|
{
|
|
|
|
const struct commit *a = a_, *b = b_;
|
2021-01-17 02:11:13 +08:00
|
|
|
const timestamp_t generation_a = commit_graph_generation(a),
|
|
|
|
generation_b = commit_graph_generation(b);
|
2018-05-01 20:47:11 +08:00
|
|
|
|
|
|
|
/* newer commits first */
|
2020-06-17 17:14:11 +08:00
|
|
|
if (generation_a < generation_b)
|
2018-05-01 20:47:11 +08:00
|
|
|
return 1;
|
2020-06-17 17:14:11 +08:00
|
|
|
else if (generation_a > generation_b)
|
2018-05-01 20:47:11 +08:00
|
|
|
return -1;
|
|
|
|
|
|
|
|
/* use date as a heuristic when generations are equal */
|
|
|
|
if (a->date < b->date)
|
|
|
|
return 1;
|
|
|
|
else if (a->date > b->date)
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-02-24 14:39:27 +08:00
|
|
|
int compare_commits_by_commit_date(const void *a_, const void *b_,
|
|
|
|
void *unused UNUSED)
|
2013-06-07 12:58:12 +08:00
|
|
|
{
|
|
|
|
const struct commit *a = a_, *b = b_;
|
|
|
|
/* newer commits with larger date first */
|
|
|
|
if (a->date < b->date)
|
|
|
|
return 1;
|
|
|
|
else if (a->date > b->date)
|
|
|
|
return -1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2005-07-07 00:39:34 +08:00
|
|
|
/*
|
|
|
|
* Performs an in-place topological sort on the list supplied.
|
|
|
|
*/
|
2013-06-07 12:58:12 +08:00
|
|
|
void sort_in_topological_order(struct commit_list **list, enum rev_sort_order sort_order)
|
2006-03-10 17:21:37 +08:00
|
|
|
{
|
2007-11-03 04:32:58 +08:00
|
|
|
struct commit_list *next, *orig = *list;
|
|
|
|
struct commit_list **pptr;
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
struct indegree_slab indegree;
|
2013-06-07 12:58:12 +08:00
|
|
|
struct prio_queue queue;
|
|
|
|
struct commit *commit;
|
2013-06-08 01:35:54 +08:00
|
|
|
struct author_date_slab author_date;
|
2007-06-07 15:04:01 +08:00
|
|
|
|
2007-11-03 04:32:58 +08:00
|
|
|
if (!orig)
|
2005-12-24 20:12:43 +08:00
|
|
|
return;
|
2007-11-03 04:32:58 +08:00
|
|
|
*list = NULL;
|
|
|
|
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
init_indegree_slab(&indegree);
|
2013-06-07 12:58:12 +08:00
|
|
|
memset(&queue, '\0', sizeof(queue));
|
2013-06-08 01:35:54 +08:00
|
|
|
|
2013-06-07 12:58:12 +08:00
|
|
|
switch (sort_order) {
|
|
|
|
default: /* REV_SORT_IN_GRAPH_ORDER */
|
|
|
|
queue.compare = NULL;
|
|
|
|
break;
|
|
|
|
case REV_SORT_BY_COMMIT_DATE:
|
|
|
|
queue.compare = compare_commits_by_commit_date;
|
|
|
|
break;
|
2013-06-08 01:35:54 +08:00
|
|
|
case REV_SORT_BY_AUTHOR_DATE:
|
|
|
|
init_author_date_slab(&author_date);
|
|
|
|
queue.compare = compare_commits_by_author_date;
|
|
|
|
queue.cb_data = &author_date;
|
|
|
|
break;
|
2013-06-07 12:58:12 +08:00
|
|
|
}
|
2013-04-09 14:52:56 +08:00
|
|
|
|
2007-11-03 04:32:58 +08:00
|
|
|
/* Mark them and clear the indegree */
|
|
|
|
for (next = orig; next; next = next->next) {
|
|
|
|
struct commit *commit = next->item;
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
*(indegree_slab_at(&indegree, commit)) = 1;
|
2013-06-08 01:35:54 +08:00
|
|
|
/* also record the author dates, if needed */
|
|
|
|
if (sort_order == REV_SORT_BY_AUTHOR_DATE)
|
|
|
|
record_author_date(&author_date, commit);
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
2007-11-03 04:32:58 +08:00
|
|
|
|
2005-07-07 00:39:34 +08:00
|
|
|
/* update the indegree */
|
2007-11-03 04:32:58 +08:00
|
|
|
for (next = orig; next; next = next->next) {
|
2013-06-07 12:58:12 +08:00
|
|
|
struct commit_list *parents = next->item->parents;
|
2005-07-07 00:39:34 +08:00
|
|
|
while (parents) {
|
2007-11-03 04:32:58 +08:00
|
|
|
struct commit *parent = parents->item;
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
int *pi = indegree_slab_at(&indegree, parent);
|
2006-03-10 17:21:37 +08:00
|
|
|
|
2013-04-09 14:52:56 +08:00
|
|
|
if (*pi)
|
|
|
|
(*pi)++;
|
2007-11-03 04:32:58 +08:00
|
|
|
parents = parents->next;
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
|
|
|
}
|
2007-11-03 04:32:58 +08:00
|
|
|
|
2007-06-07 15:04:01 +08:00
|
|
|
/*
|
2007-09-10 18:35:06 +08:00
|
|
|
* find the tips
|
|
|
|
*
|
|
|
|
* tips are nodes not reachable from any other node in the list
|
|
|
|
*
|
|
|
|
* the tips serve as a starting set for the work queue.
|
|
|
|
*/
|
2007-11-03 04:32:58 +08:00
|
|
|
for (next = orig; next; next = next->next) {
|
|
|
|
struct commit *commit = next->item;
|
2005-07-07 00:39:34 +08:00
|
|
|
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
if (*(indegree_slab_at(&indegree, commit)) == 1)
|
2013-06-07 12:58:12 +08:00
|
|
|
prio_queue_put(&queue, commit);
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
2006-02-16 14:05:33 +08:00
|
|
|
|
2013-06-07 12:58:12 +08:00
|
|
|
/*
|
|
|
|
* This is unfortunate; the initial tips need to be shown
|
|
|
|
* in the order given from the revision traversal machinery.
|
|
|
|
*/
|
|
|
|
if (sort_order == REV_SORT_IN_GRAPH_ORDER)
|
|
|
|
prio_queue_reverse(&queue);
|
|
|
|
|
|
|
|
/* We no longer need the commit list */
|
|
|
|
free_commit_list(orig);
|
2007-11-03 04:32:58 +08:00
|
|
|
|
|
|
|
pptr = list;
|
|
|
|
*list = NULL;
|
2013-06-07 12:58:12 +08:00
|
|
|
while ((commit = prio_queue_get(&queue)) != NULL) {
|
|
|
|
struct commit_list *parents;
|
2007-11-03 04:32:58 +08:00
|
|
|
|
|
|
|
for (parents = commit->parents; parents ; parents = parents->next) {
|
2011-08-26 05:46:52 +08:00
|
|
|
struct commit *parent = parents->item;
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
int *pi = indegree_slab_at(&indegree, parent);
|
2007-11-03 04:32:58 +08:00
|
|
|
|
2013-04-09 14:52:56 +08:00
|
|
|
if (!*pi)
|
2007-11-03 04:32:58 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* parents are only enqueued for emission
|
|
|
|
* when all their children have been emitted thereby
|
|
|
|
* guaranteeing topological order.
|
|
|
|
*/
|
2013-06-07 12:58:12 +08:00
|
|
|
if (--(*pi) == 1)
|
|
|
|
prio_queue_put(&queue, parent);
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
|
|
|
/*
|
2013-06-07 12:58:12 +08:00
|
|
|
* all children of commit have already been
|
|
|
|
* emitted. we can emit it now.
|
2007-09-10 18:35:06 +08:00
|
|
|
*/
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
*(indegree_slab_at(&indegree, commit)) = 0;
|
2013-06-07 12:58:12 +08:00
|
|
|
|
|
|
|
pptr = &commit_list_insert(commit, pptr)->next;
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
2013-04-09 14:52:56 +08:00
|
|
|
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-14 02:56:41 +08:00
|
|
|
clear_indegree_slab(&indegree);
|
2013-06-07 12:58:12 +08:00
|
|
|
clear_prio_queue(&queue);
|
2013-06-08 01:35:54 +08:00
|
|
|
if (sort_order == REV_SORT_BY_AUTHOR_DATE)
|
|
|
|
clear_author_date_slab(&author_date);
|
2005-07-07 00:39:34 +08:00
|
|
|
}
|
2006-06-29 21:17:32 +08:00
|
|
|
|
2018-09-05 06:00:08 +08:00
|
|
|
struct rev_collect {
|
|
|
|
struct commit **commit;
|
|
|
|
int nr;
|
|
|
|
int alloc;
|
|
|
|
unsigned int initial : 1;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void add_one_commit(struct object_id *oid, struct rev_collect *revs)
|
|
|
|
{
|
|
|
|
struct commit *commit;
|
|
|
|
|
|
|
|
if (is_null_oid(oid))
|
|
|
|
return;
|
|
|
|
|
|
|
|
commit = lookup_commit(the_repository, oid);
|
|
|
|
if (!commit ||
|
|
|
|
(commit->object.flags & TMP_MARK) ||
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_parse_commit(the_repository, commit))
|
2018-09-05 06:00:08 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
ALLOC_GROW(revs->commit, revs->nr + 1, revs->alloc);
|
|
|
|
revs->commit[revs->nr++] = commit;
|
|
|
|
commit->object.flags |= TMP_MARK;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int collect_one_reflog_ent(struct object_id *ooid, struct object_id *noid,
|
2022-08-26 01:09:48 +08:00
|
|
|
const char *ident UNUSED,
|
|
|
|
timestamp_t timestamp UNUSED, int tz UNUSED,
|
|
|
|
const char *message UNUSED, void *cbdata)
|
2018-09-05 06:00:08 +08:00
|
|
|
{
|
|
|
|
struct rev_collect *revs = cbdata;
|
|
|
|
|
|
|
|
if (revs->initial) {
|
|
|
|
revs->initial = 0;
|
|
|
|
add_one_commit(ooid, revs);
|
|
|
|
}
|
|
|
|
add_one_commit(noid, revs);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct commit *get_fork_point(const char *refname, struct commit *commit)
|
|
|
|
{
|
|
|
|
struct object_id oid;
|
|
|
|
struct rev_collect revs;
|
2024-02-28 17:44:16 +08:00
|
|
|
struct commit_list *bases = NULL;
|
2018-09-05 06:00:08 +08:00
|
|
|
int i;
|
|
|
|
struct commit *ret = NULL;
|
2019-12-10 02:51:47 +08:00
|
|
|
char *full_refname;
|
|
|
|
|
2023-03-28 21:58:54 +08:00
|
|
|
switch (repo_dwim_ref(the_repository, refname, strlen(refname), &oid,
|
|
|
|
&full_refname, 0)) {
|
2019-12-10 02:51:47 +08:00
|
|
|
case 0:
|
|
|
|
die("No such ref: '%s'", refname);
|
|
|
|
case 1:
|
|
|
|
break; /* good */
|
|
|
|
default:
|
|
|
|
die("Ambiguous refname: '%s'", refname);
|
|
|
|
}
|
2018-09-05 06:00:08 +08:00
|
|
|
|
|
|
|
memset(&revs, 0, sizeof(revs));
|
|
|
|
revs.initial = 1;
|
2024-05-07 15:11:53 +08:00
|
|
|
refs_for_each_reflog_ent(get_main_ref_store(the_repository),
|
|
|
|
full_refname, collect_one_reflog_ent, &revs);
|
2018-09-05 06:00:08 +08:00
|
|
|
|
2019-12-10 02:51:47 +08:00
|
|
|
if (!revs.nr)
|
2018-09-05 06:00:08 +08:00
|
|
|
add_one_commit(&oid, &revs);
|
|
|
|
|
|
|
|
for (i = 0; i < revs.nr; i++)
|
|
|
|
revs.commit[i]->object.flags &= ~TMP_MARK;
|
|
|
|
|
2024-02-28 17:44:16 +08:00
|
|
|
if (repo_get_merge_bases_many(the_repository, commit, revs.nr,
|
|
|
|
revs.commit, &bases) < 0)
|
|
|
|
exit(128);
|
2018-09-05 06:00:08 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* There should be one and only one merge base, when we found
|
|
|
|
* a common ancestor among reflog entries.
|
|
|
|
*/
|
|
|
|
if (!bases || bases->next)
|
|
|
|
goto cleanup_return;
|
|
|
|
|
|
|
|
/* And the found one must be one of the reflog entries */
|
|
|
|
for (i = 0; i < revs.nr; i++)
|
|
|
|
if (&bases->item->object == &revs.commit[i]->object)
|
|
|
|
break; /* found */
|
|
|
|
if (revs.nr <= i)
|
|
|
|
goto cleanup_return;
|
|
|
|
|
|
|
|
ret = bases->item;
|
|
|
|
|
|
|
|
cleanup_return:
|
2023-02-07 03:08:13 +08:00
|
|
|
free(revs.commit);
|
2018-09-05 06:00:08 +08:00
|
|
|
free_commit_list(bases);
|
2019-12-10 02:51:47 +08:00
|
|
|
free(full_refname);
|
2018-09-05 06:00:08 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-02-23 04:17:42 +08:00
|
|
|
/*
|
|
|
|
* Indexed by hash algorithm identifier.
|
|
|
|
*/
|
|
|
|
static const char *gpg_sig_headers[] = {
|
|
|
|
NULL,
|
|
|
|
"gpgsig",
|
|
|
|
"gpgsig-sha256",
|
|
|
|
};
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
|
2023-10-02 10:40:15 +08:00
|
|
|
int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
{
|
|
|
|
int inspos, copypos;
|
2016-06-29 22:14:54 +08:00
|
|
|
const char *eoh;
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
|
2020-02-23 04:17:42 +08:00
|
|
|
int gpg_sig_header_len = strlen(gpg_sig_header);
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
|
|
|
|
/* find the end of the header */
|
2016-06-29 22:14:54 +08:00
|
|
|
eoh = strstr(buf->buf, "\n\n");
|
|
|
|
if (!eoh)
|
|
|
|
inspos = buf->len;
|
|
|
|
else
|
|
|
|
inspos = eoh - buf->buf + 1;
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
for (copypos = 0; sig->buf[copypos]; ) {
|
|
|
|
const char *bol = sig->buf + copypos;
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
const char *eol = strchrnul(bol, '\n');
|
|
|
|
int len = (eol - bol) + !!*eol;
|
|
|
|
|
|
|
|
if (!copypos) {
|
|
|
|
strbuf_insert(buf, inspos, gpg_sig_header, gpg_sig_header_len);
|
|
|
|
inspos += gpg_sig_header_len;
|
|
|
|
}
|
2020-02-09 21:44:23 +08:00
|
|
|
strbuf_insertstr(buf, inspos++, " ");
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
strbuf_insert(buf, inspos, bol, len);
|
|
|
|
inspos += len;
|
|
|
|
copypos += len;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
static int sign_commit_to_strbuf(struct strbuf *sig, struct strbuf *buf, const char *keyid)
|
|
|
|
{
|
2024-09-05 18:09:07 +08:00
|
|
|
char *keyid_to_free = NULL;
|
|
|
|
int ret = 0;
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (!keyid || !*keyid)
|
2024-09-05 18:09:07 +08:00
|
|
|
keyid = keyid_to_free = get_signing_key();
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (sign_buffer(buf, sig, keyid))
|
2024-09-05 18:09:07 +08:00
|
|
|
ret = -1;
|
|
|
|
free(keyid_to_free);
|
|
|
|
return ret;
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
}
|
2021-02-11 10:08:04 +08:00
|
|
|
|
reuse cached commit buffer when parsing signatures
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).
However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:
[baseline, no signatures]
$ time git log >/dev/null
real 0m4.902s
user 0m4.784s
sys 0m0.120s
[before]
$ time git log --show-signature >/dev/null
real 0m14.735s
user 0m9.964s
sys 0m0.944s
[after]
$ time git log --show-signature >/dev/null
real 0m9.981s
user 0m5.260s
sys 0m0.936s
Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.
An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 14:32:11 +08:00
|
|
|
int parse_signed_commit(const struct commit *commit,
|
2021-01-19 07:49:11 +08:00
|
|
|
struct strbuf *payload, struct strbuf *signature,
|
|
|
|
const struct git_hash_algo *algop)
|
2011-10-19 06:53:23 +08:00
|
|
|
{
|
|
|
|
unsigned long size;
|
2023-03-28 21:58:48 +08:00
|
|
|
const char *buffer = repo_get_commit_buffer(the_repository, commit,
|
|
|
|
&size);
|
2021-02-11 10:08:04 +08:00
|
|
|
int ret = parse_buffer_signed_by_header(buffer, size, payload, signature, algop);
|
|
|
|
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_unuse_commit_buffer(the_repository, commit, buffer);
|
2021-02-11 10:08:04 +08:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
int parse_buffer_signed_by_header(const char *buffer,
|
|
|
|
unsigned long size,
|
|
|
|
struct strbuf *payload,
|
|
|
|
struct strbuf *signature,
|
|
|
|
const struct git_hash_algo *algop)
|
|
|
|
{
|
2021-01-19 07:49:11 +08:00
|
|
|
int in_signature = 0, saw_signature = 0, other_signature = 0;
|
|
|
|
const char *line, *tail, *p;
|
|
|
|
const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algop)];
|
2011-10-19 06:53:23 +08:00
|
|
|
|
|
|
|
line = buffer;
|
|
|
|
tail = buffer + size;
|
|
|
|
while (line < tail) {
|
|
|
|
const char *sig = NULL;
|
reuse cached commit buffer when parsing signatures
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).
However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:
[baseline, no signatures]
$ time git log >/dev/null
real 0m4.902s
user 0m4.784s
sys 0m0.120s
[before]
$ time git log --show-signature >/dev/null
real 0m14.735s
user 0m9.964s
sys 0m0.944s
[after]
$ time git log --show-signature >/dev/null
real 0m9.981s
user 0m5.260s
sys 0m0.936s
Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.
An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 14:32:11 +08:00
|
|
|
const char *next = memchr(line, '\n', tail - line);
|
2011-10-19 06:53:23 +08:00
|
|
|
|
|
|
|
next = next ? next + 1 : tail;
|
|
|
|
if (in_signature && line[0] == ' ')
|
|
|
|
sig = line + 1;
|
2021-01-19 07:49:11 +08:00
|
|
|
else if (skip_prefix(line, gpg_sig_header, &p) &&
|
|
|
|
*p == ' ') {
|
|
|
|
sig = line + strlen(gpg_sig_header) + 1;
|
|
|
|
other_signature = 0;
|
|
|
|
}
|
|
|
|
else if (starts_with(line, "gpgsig"))
|
|
|
|
other_signature = 1;
|
|
|
|
else if (other_signature && line[0] != ' ')
|
|
|
|
other_signature = 0;
|
2011-10-19 06:53:23 +08:00
|
|
|
if (sig) {
|
|
|
|
strbuf_add(signature, sig, next - sig);
|
|
|
|
saw_signature = 1;
|
|
|
|
in_signature = 1;
|
|
|
|
} else {
|
|
|
|
if (*line == '\n')
|
|
|
|
/* dump the whole remainder of the buffer */
|
|
|
|
next = tail;
|
2021-01-19 07:49:11 +08:00
|
|
|
if (!other_signature)
|
|
|
|
strbuf_add(payload, line, next - line);
|
2011-10-19 06:53:23 +08:00
|
|
|
in_signature = 0;
|
|
|
|
}
|
|
|
|
line = next;
|
|
|
|
}
|
|
|
|
return saw_signature;
|
|
|
|
}
|
|
|
|
|
2014-07-19 23:01:12 +08:00
|
|
|
int remove_signature(struct strbuf *buf)
|
|
|
|
{
|
|
|
|
const char *line = buf->buf;
|
|
|
|
const char *tail = buf->buf + buf->len;
|
|
|
|
int in_signature = 0;
|
2021-01-19 07:49:11 +08:00
|
|
|
struct sigbuf {
|
|
|
|
const char *start;
|
|
|
|
const char *end;
|
|
|
|
} sigs[2], *sigp = &sigs[0];
|
|
|
|
int i;
|
|
|
|
const char *orig_buf = buf->buf;
|
|
|
|
|
|
|
|
memset(sigs, 0, sizeof(sigs));
|
2014-07-19 23:01:12 +08:00
|
|
|
|
|
|
|
while (line < tail) {
|
|
|
|
const char *next = memchr(line, '\n', tail - line);
|
|
|
|
next = next ? next + 1 : tail;
|
|
|
|
|
|
|
|
if (in_signature && line[0] == ' ')
|
2021-01-19 07:49:11 +08:00
|
|
|
sigp->end = next;
|
2020-02-23 04:17:42 +08:00
|
|
|
else if (starts_with(line, "gpgsig")) {
|
|
|
|
int i;
|
|
|
|
for (i = 1; i < GIT_HASH_NALGOS; i++) {
|
|
|
|
const char *p;
|
|
|
|
if (skip_prefix(line, gpg_sig_headers[i], &p) &&
|
|
|
|
*p == ' ') {
|
2021-01-19 07:49:11 +08:00
|
|
|
sigp->start = line;
|
|
|
|
sigp->end = next;
|
2020-02-23 04:17:42 +08:00
|
|
|
in_signature = 1;
|
|
|
|
}
|
|
|
|
}
|
2014-07-19 23:01:12 +08:00
|
|
|
} else {
|
|
|
|
if (*line == '\n')
|
|
|
|
/* dump the whole remainder of the buffer */
|
|
|
|
next = tail;
|
2021-01-19 07:49:11 +08:00
|
|
|
if (in_signature && sigp - sigs != ARRAY_SIZE(sigs))
|
|
|
|
sigp++;
|
2014-07-19 23:01:12 +08:00
|
|
|
in_signature = 0;
|
|
|
|
}
|
|
|
|
line = next;
|
|
|
|
}
|
|
|
|
|
2021-01-19 07:49:11 +08:00
|
|
|
for (i = ARRAY_SIZE(sigs) - 1; i >= 0; i--)
|
|
|
|
if (sigs[i].start)
|
|
|
|
strbuf_remove(buf, sigs[i].start - orig_buf, sigs[i].end - sigs[i].start);
|
2014-07-19 23:01:12 +08:00
|
|
|
|
2021-01-19 07:49:11 +08:00
|
|
|
return sigs[0].start != NULL;
|
2014-07-19 23:01:12 +08:00
|
|
|
}
|
|
|
|
|
2024-06-11 17:20:42 +08:00
|
|
|
static void handle_signed_tag(const struct commit *parent, struct commit_extra_header ***tail)
|
2011-11-08 08:21:32 +08:00
|
|
|
{
|
|
|
|
struct merge_remote_desc *desc;
|
|
|
|
struct commit_extra_header *mergetag;
|
|
|
|
char *buf;
|
2021-02-11 10:08:03 +08:00
|
|
|
unsigned long size;
|
2011-11-08 08:21:32 +08:00
|
|
|
enum object_type type;
|
2021-02-11 10:08:03 +08:00
|
|
|
struct strbuf payload = STRBUF_INIT;
|
|
|
|
struct strbuf signature = STRBUF_INIT;
|
2011-11-08 08:21:32 +08:00
|
|
|
|
|
|
|
desc = merge_remote_util(parent);
|
|
|
|
if (!desc || !desc->obj)
|
|
|
|
return;
|
2023-03-28 21:58:50 +08:00
|
|
|
buf = repo_read_object_file(the_repository, &desc->obj->oid, &type,
|
|
|
|
&size);
|
2011-11-08 08:21:32 +08:00
|
|
|
if (!buf || type != OBJ_TAG)
|
|
|
|
goto free_return;
|
2021-02-11 10:08:03 +08:00
|
|
|
if (!parse_signature(buf, size, &payload, &signature))
|
2011-11-08 08:21:32 +08:00
|
|
|
goto free_return;
|
|
|
|
/*
|
|
|
|
* We could verify this signature and either omit the tag when
|
|
|
|
* it does not validate, but the integrator may not have the
|
2021-06-15 22:11:10 +08:00
|
|
|
* public key of the signer of the tag being merged, while a
|
2011-11-08 08:21:32 +08:00
|
|
|
* later auditor may have it while auditing, so let's not run
|
|
|
|
* verify-signed-buffer here for now...
|
|
|
|
*
|
|
|
|
* if (verify_signed_buffer(buf, len, buf + len, size - len, ...))
|
|
|
|
* warn("warning: signed tag unverified.");
|
|
|
|
*/
|
2021-03-14 00:17:22 +08:00
|
|
|
CALLOC_ARRAY(mergetag, 1);
|
2011-11-08 08:21:32 +08:00
|
|
|
mergetag->key = xstrdup("mergetag");
|
|
|
|
mergetag->value = buf;
|
|
|
|
mergetag->len = size;
|
|
|
|
|
|
|
|
**tail = mergetag;
|
|
|
|
*tail = &mergetag->next;
|
2021-02-11 10:08:03 +08:00
|
|
|
strbuf_release(&payload);
|
|
|
|
strbuf_release(&signature);
|
2011-11-08 08:21:32 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
free_return:
|
|
|
|
free(buf);
|
|
|
|
}
|
|
|
|
|
2015-06-22 07:14:40 +08:00
|
|
|
int check_commit_signature(const struct commit *commit, struct signature_check *sigc)
|
2013-04-01 00:00:14 +08:00
|
|
|
{
|
|
|
|
struct strbuf payload = STRBUF_INIT;
|
|
|
|
struct strbuf signature = STRBUF_INIT;
|
2015-06-22 07:14:40 +08:00
|
|
|
int ret = 1;
|
2013-04-01 00:00:14 +08:00
|
|
|
|
|
|
|
sigc->result = 'N';
|
|
|
|
|
2021-01-19 07:49:11 +08:00
|
|
|
if (parse_signed_commit(commit, &payload, &signature, the_hash_algo) <= 0)
|
2013-04-01 00:00:14 +08:00
|
|
|
goto out;
|
2021-12-09 16:52:43 +08:00
|
|
|
|
2021-12-09 16:52:45 +08:00
|
|
|
sigc->payload_type = SIGNATURE_PAYLOAD_COMMIT;
|
2021-12-09 16:52:43 +08:00
|
|
|
sigc->payload = strbuf_detach(&payload, &sigc->payload_len);
|
|
|
|
ret = check_signature(sigc, signature.buf, signature.len);
|
2013-04-01 00:00:14 +08:00
|
|
|
|
|
|
|
out:
|
|
|
|
strbuf_release(&payload);
|
|
|
|
strbuf_release(&signature);
|
2015-06-22 07:14:40 +08:00
|
|
|
|
|
|
|
return ret;
|
2013-04-01 00:00:14 +08:00
|
|
|
}
|
|
|
|
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 21:55:57 +08:00
|
|
|
void verify_merge_signature(struct commit *commit, int verbosity,
|
|
|
|
int check_trust)
|
2018-11-06 15:50:17 +08:00
|
|
|
{
|
|
|
|
char hex[GIT_MAX_HEXSZ + 1];
|
|
|
|
struct signature_check signature_check;
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 21:55:57 +08:00
|
|
|
int ret;
|
2018-11-06 15:50:17 +08:00
|
|
|
memset(&signature_check, 0, sizeof(signature_check));
|
|
|
|
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 21:55:57 +08:00
|
|
|
ret = check_commit_signature(commit, &signature_check);
|
2018-11-06 15:50:17 +08:00
|
|
|
|
2023-03-28 21:58:46 +08:00
|
|
|
repo_find_unique_abbrev_r(the_repository, hex, &commit->object.oid,
|
|
|
|
DEFAULT_ABBREV);
|
2018-11-06 15:50:17 +08:00
|
|
|
switch (signature_check.result) {
|
|
|
|
case 'G':
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 21:55:57 +08:00
|
|
|
if (ret || (check_trust && signature_check.trust_level < TRUST_MARGINAL))
|
|
|
|
die(_("Commit %s has an untrusted GPG signature, "
|
|
|
|
"allegedly by %s."), hex, signature_check.signer);
|
2018-11-06 15:50:17 +08:00
|
|
|
break;
|
|
|
|
case 'B':
|
|
|
|
die(_("Commit %s has a bad GPG signature "
|
|
|
|
"allegedly by %s."), hex, signature_check.signer);
|
|
|
|
default: /* 'N' */
|
|
|
|
die(_("Commit %s does not have a GPG signature."), hex);
|
|
|
|
}
|
|
|
|
if (verbosity >= 0 && signature_check.result == 'G')
|
|
|
|
printf(_("Commit %s has a good GPG signature by %s\n"),
|
|
|
|
hex, signature_check.signer);
|
2013-04-01 00:00:14 +08:00
|
|
|
|
2018-11-06 15:50:17 +08:00
|
|
|
signature_check_clear(&signature_check);
|
|
|
|
}
|
2013-04-01 00:00:14 +08:00
|
|
|
|
2024-06-11 17:20:42 +08:00
|
|
|
void append_merge_tag_headers(const struct commit_list *parents,
|
2011-11-08 08:21:32 +08:00
|
|
|
struct commit_extra_header ***tail)
|
|
|
|
{
|
|
|
|
while (parents) {
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit *parent = parents->item;
|
2011-11-08 08:21:32 +08:00
|
|
|
handle_signed_tag(parent, tail);
|
|
|
|
parents = parents->next;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-06-11 17:20:42 +08:00
|
|
|
static int convert_commit_extra_headers(const struct commit_extra_header *orig,
|
2023-10-02 10:40:14 +08:00
|
|
|
struct commit_extra_header **result)
|
|
|
|
{
|
|
|
|
const struct git_hash_algo *compat = the_repository->compat_hash_algo;
|
|
|
|
const struct git_hash_algo *algo = the_repository->hash_algo;
|
|
|
|
struct commit_extra_header *extra = NULL, **tail = &extra;
|
|
|
|
struct strbuf out = STRBUF_INIT;
|
|
|
|
while (orig) {
|
|
|
|
struct commit_extra_header *new;
|
|
|
|
CALLOC_ARRAY(new, 1);
|
|
|
|
if (!strcmp(orig->key, "mergetag")) {
|
|
|
|
if (convert_object_file(&out, algo, compat,
|
|
|
|
orig->value, orig->len,
|
|
|
|
OBJ_TAG, 1)) {
|
|
|
|
free(new);
|
|
|
|
free_commit_extra_headers(extra);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
new->key = xstrdup("mergetag");
|
|
|
|
new->value = strbuf_detach(&out, &new->len);
|
|
|
|
} else {
|
|
|
|
new->key = xstrdup(orig->key);
|
|
|
|
new->len = orig->len;
|
|
|
|
new->value = xmemdupz(orig->value, orig->len);
|
|
|
|
}
|
|
|
|
*tail = new;
|
|
|
|
tail = &new->next;
|
|
|
|
orig = orig->next;
|
|
|
|
}
|
|
|
|
*result = extra;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-11-08 08:21:32 +08:00
|
|
|
static void add_extra_header(struct strbuf *buffer,
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit_extra_header *extra)
|
2011-11-08 08:21:32 +08:00
|
|
|
{
|
|
|
|
strbuf_addstr(buffer, extra->key);
|
2011-11-09 07:38:07 +08:00
|
|
|
if (extra->len)
|
|
|
|
strbuf_add_lines(buffer, " ", extra->value, extra->len);
|
|
|
|
else
|
|
|
|
strbuf_addch(buffer, '\n');
|
|
|
|
}
|
|
|
|
|
2012-01-06 02:54:14 +08:00
|
|
|
struct commit_extra_header *read_commit_extra_headers(struct commit *commit,
|
|
|
|
const char **exclude)
|
2011-11-09 07:38:07 +08:00
|
|
|
{
|
|
|
|
struct commit_extra_header *extra = NULL;
|
|
|
|
unsigned long size;
|
2023-03-28 21:58:48 +08:00
|
|
|
const char *buffer = repo_get_commit_buffer(the_repository, commit,
|
|
|
|
&size);
|
reuse cached commit buffer when parsing signatures
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).
However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:
[baseline, no signatures]
$ time git log >/dev/null
real 0m4.902s
user 0m4.784s
sys 0m0.120s
[before]
$ time git log --show-signature >/dev/null
real 0m14.735s
user 0m9.964s
sys 0m0.944s
[after]
$ time git log --show-signature >/dev/null
real 0m9.981s
user 0m5.260s
sys 0m0.936s
Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.
An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 14:32:11 +08:00
|
|
|
extra = read_commit_extra_header_lines(buffer, size, exclude);
|
2023-03-28 21:58:48 +08:00
|
|
|
repo_unuse_commit_buffer(the_repository, commit, buffer);
|
2011-11-09 07:38:07 +08:00
|
|
|
return extra;
|
|
|
|
}
|
|
|
|
|
2018-04-25 17:54:04 +08:00
|
|
|
int for_each_mergetag(each_mergetag_fn fn, struct commit *commit, void *data)
|
2014-07-07 14:35:37 +08:00
|
|
|
{
|
|
|
|
struct commit_extra_header *extra, *to_free;
|
2018-04-25 17:54:04 +08:00
|
|
|
int res = 0;
|
2014-07-07 14:35:37 +08:00
|
|
|
|
|
|
|
to_free = read_commit_extra_headers(commit, NULL);
|
2018-04-25 17:54:04 +08:00
|
|
|
for (extra = to_free; !res && extra; extra = extra->next) {
|
2014-07-07 14:35:37 +08:00
|
|
|
if (strcmp(extra->key, "mergetag"))
|
|
|
|
continue; /* not a merge tag */
|
2018-04-25 17:54:04 +08:00
|
|
|
res = fn(commit, extra, data);
|
2014-07-07 14:35:37 +08:00
|
|
|
}
|
|
|
|
free_commit_extra_headers(to_free);
|
2018-04-25 17:54:04 +08:00
|
|
|
return res;
|
2014-07-07 14:35:37 +08:00
|
|
|
}
|
|
|
|
|
2011-11-09 07:38:07 +08:00
|
|
|
static inline int standard_header_field(const char *field, size_t len)
|
|
|
|
{
|
2017-02-26 03:27:40 +08:00
|
|
|
return ((len == 4 && !memcmp(field, "tree", 4)) ||
|
|
|
|
(len == 6 && !memcmp(field, "parent", 6)) ||
|
|
|
|
(len == 6 && !memcmp(field, "author", 6)) ||
|
|
|
|
(len == 9 && !memcmp(field, "committer", 9)) ||
|
|
|
|
(len == 8 && !memcmp(field, "encoding", 8)));
|
2011-11-09 07:38:07 +08:00
|
|
|
}
|
|
|
|
|
2012-01-06 02:54:14 +08:00
|
|
|
static int excluded_header_field(const char *field, size_t len, const char **exclude)
|
|
|
|
{
|
|
|
|
if (!exclude)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
while (*exclude) {
|
|
|
|
size_t xlen = strlen(*exclude);
|
2017-02-26 03:27:40 +08:00
|
|
|
if (len == xlen && !memcmp(field, *exclude, xlen))
|
2012-01-06 02:54:14 +08:00
|
|
|
return 1;
|
|
|
|
exclude++;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-09-16 04:58:15 +08:00
|
|
|
static struct commit_extra_header *read_commit_extra_header_lines(
|
|
|
|
const char *buffer, size_t size,
|
|
|
|
const char **exclude)
|
2011-11-09 07:38:07 +08:00
|
|
|
{
|
|
|
|
struct commit_extra_header *extra = NULL, **tail = &extra, *it = NULL;
|
|
|
|
const char *line, *next, *eof, *eob;
|
|
|
|
struct strbuf buf = STRBUF_INIT;
|
|
|
|
|
|
|
|
for (line = buffer, eob = line + size;
|
|
|
|
line < eob && *line != '\n';
|
|
|
|
line = next) {
|
|
|
|
next = memchr(line, '\n', eob - line);
|
|
|
|
next = next ? next + 1 : eob;
|
|
|
|
if (*line == ' ') {
|
|
|
|
/* continuation */
|
|
|
|
if (it)
|
|
|
|
strbuf_add(&buf, line + 1, next - (line + 1));
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (it)
|
|
|
|
it->value = strbuf_detach(&buf, &it->len);
|
|
|
|
strbuf_reset(&buf);
|
|
|
|
it = NULL;
|
|
|
|
|
2017-02-26 03:21:52 +08:00
|
|
|
eof = memchr(line, ' ', next - line);
|
|
|
|
if (!eof)
|
2011-11-09 07:38:07 +08:00
|
|
|
eof = next;
|
2017-02-26 03:27:40 +08:00
|
|
|
else if (standard_header_field(line, eof - line) ||
|
|
|
|
excluded_header_field(line, eof - line, exclude))
|
2011-11-09 07:38:07 +08:00
|
|
|
continue;
|
|
|
|
|
2021-03-14 00:17:22 +08:00
|
|
|
CALLOC_ARRAY(it, 1);
|
2011-11-09 07:38:07 +08:00
|
|
|
it->key = xmemdupz(line, eof-line);
|
|
|
|
*tail = it;
|
|
|
|
tail = &it->next;
|
|
|
|
if (eof + 1 < next)
|
|
|
|
strbuf_add(&buf, eof + 1, next - (eof + 1));
|
|
|
|
}
|
|
|
|
if (it)
|
|
|
|
it->value = strbuf_detach(&buf, &it->len);
|
|
|
|
return extra;
|
2011-11-08 08:21:32 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void free_commit_extra_headers(struct commit_extra_header *extra)
|
|
|
|
{
|
|
|
|
while (extra) {
|
|
|
|
struct commit_extra_header *next = extra->next;
|
|
|
|
free(extra->key);
|
|
|
|
free(extra->value);
|
|
|
|
free(extra);
|
|
|
|
extra = next;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-28 08:13:16 +08:00
|
|
|
int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit_list *parents, struct object_id *ret,
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
const char *author, const char *sign_commit)
|
2011-11-08 08:21:32 +08:00
|
|
|
{
|
|
|
|
struct commit_extra_header *extra = NULL, **tail = &extra;
|
|
|
|
int result;
|
|
|
|
|
|
|
|
append_merge_tag_headers(parents, &tail);
|
2020-08-18 01:40:01 +08:00
|
|
|
result = commit_tree_extended(msg, msg_len, tree, parents, ret, author,
|
|
|
|
NULL, sign_commit, extra);
|
2011-11-08 08:21:32 +08:00
|
|
|
free_commit_extra_headers(extra);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2012-06-29 02:24:14 +08:00
|
|
|
static int find_invalid_utf8(const char *buf, int len)
|
|
|
|
{
|
|
|
|
int offset = 0;
|
2013-07-05 01:20:34 +08:00
|
|
|
static const unsigned int max_codepoint[] = {
|
|
|
|
0x7f, 0x7ff, 0xffff, 0x10ffff
|
|
|
|
};
|
2012-06-29 02:24:14 +08:00
|
|
|
|
|
|
|
while (len) {
|
|
|
|
unsigned char c = *buf++;
|
|
|
|
int bytes, bad_offset;
|
2013-07-05 01:19:43 +08:00
|
|
|
unsigned int codepoint;
|
2013-07-05 01:20:34 +08:00
|
|
|
unsigned int min_val, max_val;
|
2012-06-29 02:24:14 +08:00
|
|
|
|
|
|
|
len--;
|
|
|
|
offset++;
|
|
|
|
|
|
|
|
/* Simple US-ASCII? No worries. */
|
|
|
|
if (c < 0x80)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
bad_offset = offset-1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Count how many more high bits set: that's how
|
|
|
|
* many more bytes this sequence should have.
|
|
|
|
*/
|
|
|
|
bytes = 0;
|
|
|
|
while (c & 0x40) {
|
|
|
|
c <<= 1;
|
|
|
|
bytes++;
|
|
|
|
}
|
|
|
|
|
2013-07-05 01:19:43 +08:00
|
|
|
/*
|
|
|
|
* Must be between 1 and 3 more bytes. Longer sequences result in
|
|
|
|
* codepoints beyond U+10FFFF, which are guaranteed never to exist.
|
|
|
|
*/
|
|
|
|
if (bytes < 1 || 3 < bytes)
|
2012-06-29 02:24:14 +08:00
|
|
|
return bad_offset;
|
|
|
|
|
|
|
|
/* Do we *have* that many bytes? */
|
|
|
|
if (len < bytes)
|
|
|
|
return bad_offset;
|
|
|
|
|
2013-07-05 01:20:34 +08:00
|
|
|
/*
|
|
|
|
* Place the encoded bits at the bottom of the value and compute the
|
|
|
|
* valid range.
|
|
|
|
*/
|
2013-07-05 01:19:43 +08:00
|
|
|
codepoint = (c & 0x7f) >> bytes;
|
2013-07-05 01:20:34 +08:00
|
|
|
min_val = max_codepoint[bytes-1] + 1;
|
|
|
|
max_val = max_codepoint[bytes];
|
2013-07-05 01:19:43 +08:00
|
|
|
|
2012-06-29 02:24:14 +08:00
|
|
|
offset += bytes;
|
|
|
|
len -= bytes;
|
|
|
|
|
|
|
|
/* And verify that they are good continuation bytes */
|
|
|
|
do {
|
2013-07-05 01:19:43 +08:00
|
|
|
codepoint <<= 6;
|
|
|
|
codepoint |= *buf & 0x3f;
|
2012-06-29 02:24:14 +08:00
|
|
|
if ((*buf++ & 0xc0) != 0x80)
|
|
|
|
return bad_offset;
|
|
|
|
} while (--bytes);
|
|
|
|
|
2013-07-05 01:20:34 +08:00
|
|
|
/* Reject codepoints that are out of range for the sequence length. */
|
|
|
|
if (codepoint < min_val || codepoint > max_val)
|
2013-07-05 01:19:43 +08:00
|
|
|
return bad_offset;
|
|
|
|
/* Surrogates are only for UTF-16 and cannot be encoded in UTF-8. */
|
|
|
|
if ((codepoint & 0x1ff800) == 0xd800)
|
|
|
|
return bad_offset;
|
2013-07-09 19:16:33 +08:00
|
|
|
/* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
|
2013-08-06 00:52:28 +08:00
|
|
|
if ((codepoint & 0xfffe) == 0xfffe)
|
2013-07-09 19:16:33 +08:00
|
|
|
return bad_offset;
|
|
|
|
/* So are anything in the range U+FDD0..U+FDEF. */
|
|
|
|
if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
|
2013-07-05 01:19:43 +08:00
|
|
|
return bad_offset;
|
2012-06-29 02:24:14 +08:00
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This verifies that the buffer is in proper utf8 format.
|
|
|
|
*
|
|
|
|
* If it isn't, it assumes any non-utf8 characters are Latin1,
|
|
|
|
* and does the conversion.
|
|
|
|
*/
|
|
|
|
static int verify_utf8(struct strbuf *buf)
|
|
|
|
{
|
|
|
|
int ok = 1;
|
|
|
|
long pos = 0;
|
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
int bad;
|
|
|
|
unsigned char c;
|
|
|
|
unsigned char replace[2];
|
|
|
|
|
|
|
|
bad = find_invalid_utf8(buf->buf + pos, buf->len - pos);
|
|
|
|
if (bad < 0)
|
|
|
|
return ok;
|
|
|
|
pos += bad;
|
|
|
|
ok = 0;
|
|
|
|
c = buf->buf[pos];
|
|
|
|
strbuf_remove(buf, pos, 1);
|
|
|
|
|
|
|
|
/* We know 'c' must be in the range 128-255 */
|
|
|
|
replace[0] = 0xc0 + (c >> 6);
|
|
|
|
replace[1] = 0x80 + (c & 0x3f);
|
|
|
|
strbuf_insert(buf, pos, replace, 2);
|
|
|
|
pos += 2;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
static const char commit_utf8_warn[] =
|
2016-09-19 21:08:16 +08:00
|
|
|
N_("Warning: commit message did not conform to UTF-8.\n"
|
|
|
|
"You may want to amend it after fixing the message, or set the config\n"
|
2022-06-17 18:03:09 +08:00
|
|
|
"variable i18n.commitEncoding to the encoding your project uses.\n");
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
static void write_commit_tree(struct strbuf *buffer, const char *msg, size_t msg_len,
|
|
|
|
const struct object_id *tree,
|
|
|
|
const struct object_id *parents, size_t parents_len,
|
|
|
|
const char *author, const char *committer,
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit_extra_header *extra)
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
{
|
|
|
|
int encoding_is_utf8;
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
size_t i;
|
2011-12-15 21:47:23 +08:00
|
|
|
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
/* Not having i18n.commitencoding is the same as having utf-8 */
|
|
|
|
encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
|
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_grow(buffer, 8192); /* should avoid reallocs for the headers */
|
|
|
|
strbuf_addf(buffer, "tree %s\n", oid_to_hex(tree));
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* NOTE! This ordering means that the same exact tree merged with a
|
|
|
|
* different order of parents will be a _different_ changeset even
|
|
|
|
* if everything else stays the same.
|
|
|
|
*/
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
for (i = 0; i < parents_len; i++)
|
|
|
|
strbuf_addf(buffer, "parent %s\n", oid_to_hex(&parents[i]));
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
|
|
|
/* Person/date information */
|
|
|
|
if (!author)
|
2012-05-25 07:28:40 +08:00
|
|
|
author = git_author_info(IDENT_STRICT);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_addf(buffer, "author %s\n", author);
|
2020-08-18 01:40:01 +08:00
|
|
|
if (!committer)
|
|
|
|
committer = git_committer_info(IDENT_STRICT);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_addf(buffer, "committer %s\n", committer);
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
if (!encoding_is_utf8)
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_addf(buffer, "encoding %s\n", git_commit_encoding);
|
2011-11-08 08:21:32 +08:00
|
|
|
|
|
|
|
while (extra) {
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
add_extra_header(buffer, extra);
|
2011-11-08 08:21:32 +08:00
|
|
|
extra = extra->next;
|
|
|
|
}
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_addch(buffer, '\n');
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
|
|
|
/* And add the comment */
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_add(buffer, msg, msg_len);
|
|
|
|
}
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
int commit_tree_extended(const char *msg, size_t msg_len,
|
|
|
|
const struct object_id *tree,
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit_list *parents, struct object_id *ret,
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
const char *author, const char *committer,
|
|
|
|
const char *sign_commit,
|
2024-06-11 17:20:42 +08:00
|
|
|
const struct commit_extra_header *extra)
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
{
|
|
|
|
struct repository *r = the_repository;
|
|
|
|
int result = 0;
|
|
|
|
int encoding_is_utf8;
|
|
|
|
struct strbuf buffer = STRBUF_INIT, compat_buffer = STRBUF_INIT;
|
|
|
|
struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
|
|
|
|
struct object_id *parent_buf = NULL, *compat_oid = NULL;
|
|
|
|
struct object_id compat_oid_buf;
|
|
|
|
size_t i, nparents;
|
|
|
|
|
|
|
|
/* Not having i18n.commitencoding is the same as having utf-8 */
|
|
|
|
encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
|
|
|
|
|
|
|
|
assert_oid_type(tree, OBJ_TREE);
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (memchr(msg, '\0', msg_len))
|
|
|
|
return error("a NUL byte in commit log message not allowed.");
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
nparents = commit_list_count(parents);
|
|
|
|
CALLOC_ARRAY(parent_buf, nparents);
|
|
|
|
i = 0;
|
2024-06-11 17:20:42 +08:00
|
|
|
for (const struct commit_list *p = parents; p; p = p->next)
|
|
|
|
oidcpy(&parent_buf[i++], &p->item->object.oid);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
|
|
|
|
write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
|
|
|
|
if (sign_commit && sign_commit_to_strbuf(&sig, &buffer, sign_commit)) {
|
2017-08-31 01:49:38 +08:00
|
|
|
result = -1;
|
|
|
|
goto out;
|
|
|
|
}
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (r->compat_hash_algo) {
|
2023-10-02 10:40:14 +08:00
|
|
|
struct commit_extra_header *compat_extra = NULL;
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
struct object_id mapped_tree;
|
|
|
|
struct object_id *mapped_parents;
|
|
|
|
|
|
|
|
CALLOC_ARRAY(mapped_parents, nparents);
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) {
|
|
|
|
result = -1;
|
|
|
|
free(mapped_parents);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
for (i = 0; i < nparents; i++)
|
|
|
|
if (repo_oid_to_algop(r, &parent_buf[i], r->compat_hash_algo, &mapped_parents[i])) {
|
|
|
|
result = -1;
|
|
|
|
free(mapped_parents);
|
|
|
|
goto out;
|
|
|
|
}
|
2023-10-02 10:40:14 +08:00
|
|
|
if (convert_commit_extra_headers(extra, &compat_extra)) {
|
|
|
|
result = -1;
|
|
|
|
free(mapped_parents);
|
|
|
|
goto out;
|
|
|
|
}
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
|
2023-10-02 10:40:14 +08:00
|
|
|
mapped_parents, nparents, author, committer, compat_extra);
|
|
|
|
free_commit_extra_headers(compat_extra);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
free(mapped_parents);
|
|
|
|
|
|
|
|
if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
|
|
|
|
result = -1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sign_commit) {
|
|
|
|
struct sig_pairs {
|
|
|
|
struct strbuf *sig;
|
|
|
|
const struct git_hash_algo *algo;
|
|
|
|
} bufs [2] = {
|
|
|
|
{ &compat_sig, r->compat_hash_algo },
|
|
|
|
{ &sig, r->hash_algo },
|
|
|
|
};
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We write algorithms in the order they were implemented in
|
|
|
|
* Git to produce a stable hash when multiple algorithms are
|
|
|
|
* used.
|
|
|
|
*/
|
|
|
|
if (r->compat_hash_algo && hash_algo_by_ptr(bufs[0].algo) > hash_algo_by_ptr(bufs[1].algo))
|
|
|
|
SWAP(bufs[0], bufs[1]);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We traverse each algorithm in order, and apply the signature
|
|
|
|
* to each buffer.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < ARRAY_SIZE(bufs); i++) {
|
|
|
|
if (!bufs[i].algo)
|
|
|
|
continue;
|
2023-10-02 10:40:15 +08:00
|
|
|
add_header_signature(&buffer, bufs[i].sig, bufs[i].algo);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
if (r->compat_hash_algo)
|
2023-10-02 10:40:15 +08:00
|
|
|
add_header_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
}
|
|
|
|
}
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn how a signature
block would look like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that acts on the raw commit object should have been
fixed to ignore headers they do not understand, so it is not like that
new header would be more likely to break than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 08:23:20 +08:00
|
|
|
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
/* And check the encoding. */
|
|
|
|
if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
|
|
|
|
fprintf(stderr, _(commit_utf8_warn));
|
|
|
|
|
|
|
|
if (r->compat_hash_algo) {
|
|
|
|
hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
|
|
|
|
OBJ_COMMIT, &compat_oid_buf);
|
|
|
|
compat_oid = &compat_oid_buf;
|
|
|
|
}
|
|
|
|
|
|
|
|
result = write_object_file_flags(buffer.buf, buffer.len, OBJ_COMMIT,
|
|
|
|
ret, compat_oid, 0);
|
2017-08-31 01:49:38 +08:00
|
|
|
out:
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
free(parent_buf);
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
strbuf_release(&buffer);
|
commit: write commits for both hashes
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 10:40:13 +08:00
|
|
|
strbuf_release(&compat_buffer);
|
|
|
|
strbuf_release(&sig);
|
|
|
|
strbuf_release(&compat_sig);
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 08:05:23 +08:00
|
|
|
return result;
|
|
|
|
}
|
2011-11-08 05:26:22 +08:00
|
|
|
|
2018-05-19 13:28:30 +08:00
|
|
|
define_commit_slab(merge_desc_slab, struct merge_remote_desc *);
|
|
|
|
static struct merge_desc_slab merge_desc_slab = COMMIT_SLAB_INIT(1, merge_desc_slab);
|
|
|
|
|
2024-06-11 17:20:42 +08:00
|
|
|
struct merge_remote_desc *merge_remote_util(const struct commit *commit)
|
2018-05-19 13:28:30 +08:00
|
|
|
{
|
|
|
|
return *merge_desc_slab_at(&merge_desc_slab, commit);
|
|
|
|
}
|
|
|
|
|
2016-08-13 20:11:27 +08:00
|
|
|
void set_merge_remote_desc(struct commit *commit,
|
|
|
|
const char *name, struct object *obj)
|
|
|
|
{
|
|
|
|
struct merge_remote_desc *desc;
|
2016-08-13 20:21:27 +08:00
|
|
|
FLEX_ALLOC_STR(desc, name, name);
|
2016-08-13 20:11:27 +08:00
|
|
|
desc->obj = obj;
|
2018-05-19 13:28:30 +08:00
|
|
|
*merge_desc_slab_at(&merge_desc_slab, commit) = desc;
|
2016-08-13 20:11:27 +08:00
|
|
|
}
|
|
|
|
|
2011-11-08 05:26:22 +08:00
|
|
|
struct commit *get_merge_parent(const char *name)
|
|
|
|
{
|
|
|
|
struct object *obj;
|
|
|
|
struct commit *commit;
|
2015-03-14 07:39:34 +08:00
|
|
|
struct object_id oid;
|
2023-03-28 21:58:46 +08:00
|
|
|
if (repo_get_oid(the_repository, name, &oid))
|
2011-11-08 05:26:22 +08:00
|
|
|
return NULL;
|
2018-06-29 09:21:51 +08:00
|
|
|
obj = parse_object(the_repository, &oid);
|
2023-03-28 21:58:46 +08:00
|
|
|
commit = (struct commit *)repo_peel_to_type(the_repository, name, 0,
|
|
|
|
obj, OBJ_COMMIT);
|
2018-05-19 13:28:30 +08:00
|
|
|
if (commit && !merge_remote_util(commit))
|
2016-08-13 20:11:27 +08:00
|
|
|
set_merge_remote_desc(commit, name, obj);
|
2011-11-08 05:26:22 +08:00
|
|
|
return commit;
|
|
|
|
}
|
2012-04-26 04:35:27 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Append a commit to the end of the commit_list.
|
|
|
|
*
|
|
|
|
* next starts by pointing to the variable that holds the head of an
|
|
|
|
* empty commit_list, and is updated to point to the "next" field of
|
|
|
|
* the last item on the list as new commits are appended.
|
|
|
|
*
|
|
|
|
* Usage example:
|
|
|
|
*
|
|
|
|
* struct commit_list *list;
|
|
|
|
* struct commit_list **next = &list;
|
|
|
|
*
|
|
|
|
* next = commit_list_append(c1, next);
|
|
|
|
* next = commit_list_append(c2, next);
|
|
|
|
* assert(commit_list_count(list) == 2);
|
|
|
|
* return list;
|
|
|
|
*/
|
|
|
|
struct commit_list **commit_list_append(struct commit *commit,
|
|
|
|
struct commit_list **next)
|
|
|
|
{
|
2018-02-15 02:59:37 +08:00
|
|
|
struct commit_list *new_commit = xmalloc(sizeof(struct commit_list));
|
|
|
|
new_commit->item = commit;
|
|
|
|
*next = new_commit;
|
|
|
|
new_commit->next = NULL;
|
|
|
|
return &new_commit->next;
|
2012-04-26 04:35:27 +08:00
|
|
|
}
|
2012-10-26 23:53:51 +08:00
|
|
|
|
2024-06-20 01:13:19 +08:00
|
|
|
const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
|
commit: provide a function to find a header in a buffer
Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header). Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.
Let's provide a reusable function to locate a header within
the commit. The code is modeled after pretty.c's
get_header, which is used to extract the encoding.
Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter. The only
thing lost is a warning for truncated commits, but that's
OK. This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 15:56:01 +08:00
|
|
|
{
|
|
|
|
int key_len = strlen(key);
|
|
|
|
const char *line = msg;
|
|
|
|
|
2024-06-20 01:13:19 +08:00
|
|
|
while (line) {
|
commit: provide a function to find a header in a buffer
Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header). Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.
Let's provide a reusable function to locate a header within
the commit. The code is modeled after pretty.c's
get_header, which is used to extract the encoding.
Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter. The only
thing lost is a warning for truncated commits, but that's
OK. This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 15:56:01 +08:00
|
|
|
const char *eol = strchrnul(line, '\n');
|
|
|
|
|
|
|
|
if (line == eol)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
if (eol - line > key_len &&
|
|
|
|
!strncmp(line, key, key_len) &&
|
|
|
|
line[key_len] == ' ') {
|
|
|
|
*out_len = eol - line - key_len - 1;
|
|
|
|
return line + key_len + 1;
|
|
|
|
}
|
|
|
|
line = *eol ? eol + 1 : NULL;
|
|
|
|
}
|
|
|
|
return NULL;
|
|
|
|
}
|
2014-12-23 04:26:23 +08:00
|
|
|
|
2014-11-09 17:23:41 +08:00
|
|
|
/*
|
2016-11-03 01:29:17 +08:00
|
|
|
* Inspect the given string and determine the true "end" of the log message, in
|
Documentation: stylistically normalize references to Signed-off-by:
Ted reported an old typo in the git-commit.txt and merge-options.txt.
Namely, the phrase "Signed-off-by line" was used without either a
definite nor indefinite article.
Upon examination, it seems that the documentation (including items in
Documentation/, but also option help strings) have been quite
inconsistent on usage when referring to `Signed-off-by`.
First, very few places used a definite or indefinite article with the
phrase "Signed-off-by line", but that was the initial typo that led
to this investigation. So, normalize using either an indefinite or
definite article consistently.
The original phrasing, in Commit 3f971fc425b (Documentation updates,
2005-08-14), is "Add Signed-off-by line". Commit 6f855371a53 (Add
--signoff, --check, and long option-names. 2005-12-09) switched to
using "Add `Signed-off-by:` line", but didn't normalize the former
commit to match. Later commits seem to have cut and pasted from one
or the other, which is likely how the usage became so inconsistent.
Junio stated on the git mailing list in
<xmqqy2k1dfoh.fsf@gitster.c.googlers.com> a preference to leave off
the colon. Thus, prefer `Signed-off-by` (with backticks) for the
documentation files and Signed-off-by (without backticks) for option
help strings.
Additionally, Junio argued that "trailer" is now the standard term to
refer to `Signed-off-by`, saying that "becomes plenty clear that we
are not talking about any random line in the log message". As such,
prefer "trailer" over "line" anywhere the former word fits.
However, leave alone those few places in documentation that use
Signed-off-by to refer to the process (rather than the specific
trailer), or in places where mail headers are generally discussed in
comparison with Signed-off-by.
Reported-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 09:03:55 +08:00
|
|
|
* order to find where to put a new Signed-off-by trailer. Ignored are
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done by the same logic as the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but it wants the caller
to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 14:06:49 +08:00
|
|
|
* trailing comment lines and blank lines. To support "git commit -s
|
|
|
|
* --amend" on an existing commit, we also ignore "Conflicts:". To
|
|
|
|
* support "git commit -v", we truncate at cut lines.
|
2014-11-09 17:23:41 +08:00
|
|
|
*
|
|
|
|
* Returns the number of bytes from the tail to ignore, to be fed as
|
|
|
|
* the second parameter to append_signoff().
|
|
|
|
*/
|
2023-10-21 03:01:33 +08:00
|
|
|
size_t ignored_log_message_bytes(const char *buf, size_t len)
|
2014-11-09 17:23:41 +08:00
|
|
|
{
|
2018-08-23 08:50:51 +08:00
|
|
|
size_t boc = 0;
|
|
|
|
size_t bol = 0;
|
2014-11-09 17:23:41 +08:00
|
|
|
int in_old_conflicts_block = 0;
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done by the same logic as the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but it wants the caller
to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 14:06:49 +08:00
|
|
|
size_t cutoff = wt_status_locate_end(buf, len);
|
2014-11-09 17:23:41 +08:00
|
|
|
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done by the same logic as the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but it wants the caller
to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 14:06:49 +08:00
|
|
|
while (bol < cutoff) {
|
2016-11-03 01:29:17 +08:00
|
|
|
const char *next_line = memchr(buf + bol, '\n', len - bol);
|
2014-11-09 17:23:41 +08:00
|
|
|
|
2016-11-03 01:29:17 +08:00
|
|
|
if (!next_line)
|
|
|
|
next_line = buf + len;
|
2014-11-09 17:23:41 +08:00
|
|
|
else
|
|
|
|
next_line++;
|
|
|
|
|
find multi-byte comment chars in unterminated buffers
As with the previous patch, we need to swap out single-byte matching for
something like starts_with() to match all bytes of a multi-byte comment
character. But for cases where the buffer is not NUL-terminated (and we
instead have an explicit size or end pointer), it's not safe to use
starts_with(), as it might walk off the end of the buffer.
Let's introduce a new starts_with_mem() that does the same thing but
also accepts the length of the "haystack" str and makes sure not to walk
past it.
Note that in most cases the existing code did not need a length check at
all, since it was written in a way that knew we had at least one byte
available (and that was all we checked). So I had to read each one to
find the appropriate bounds. The one exception is sequencer.c's
add_commented_lines(), where we can actually get rid of the length
check. Just like starts_with(), our starts_with_mem() handles an empty
haystack variable by not matching (assuming a non-empty prefix).
A few notes on the implementation of starts_with_mem():
- it would be equally correct to take an "end" pointer (and indeed,
many of the callers have this and have to subtract to come up with
the length). I think taking a ptr/size combo is a more usual
interface for our codebase, though, and has the added benefit that
the function signature makes it harder to mix up the three
parameters.
- we could obviously build starts_with() on top of this by passing
strlen(str) as the length. But it's possible that starts_with() is a
relatively hot code path, and it should not pay that penalty (it can
generally return an answer proportional to the size of the prefix,
not the whole string).
- it naively feels like xstrncmpz() should be able to do the same
thing, but that's not quite true. If you pass the length of the
haystack buffer, then strncmp() finds that a shorter prefix string
is "less than" than the haystack, even if the haystack starts with
the prefix. If you pass the length of the prefix, then you risk
reading past the end of the haystack if it is shorter than the
prefix. So I think we really do need a new function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-12 17:17:39 +08:00
|
|
|
if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
|
|
|
|
buf[bol] == '\n') {
|
2014-11-09 17:23:41 +08:00
|
|
|
/* is this the first of the run of comments? */
|
|
|
|
if (!boc)
|
|
|
|
boc = bol;
|
|
|
|
/* otherwise, it is just continuing */
|
2016-11-03 01:29:17 +08:00
|
|
|
} else if (starts_with(buf + bol, "Conflicts:\n")) {
|
2014-11-09 17:23:41 +08:00
|
|
|
in_old_conflicts_block = 1;
|
|
|
|
if (!boc)
|
|
|
|
boc = bol;
|
2016-11-03 01:29:17 +08:00
|
|
|
} else if (in_old_conflicts_block && buf[bol] == '\t') {
|
2014-11-09 17:23:41 +08:00
|
|
|
; /* a pathname in the conflicts block */
|
|
|
|
} else if (boc) {
|
|
|
|
/* the previous was not trailing comment */
|
|
|
|
boc = 0;
|
|
|
|
in_old_conflicts_block = 0;
|
|
|
|
}
|
2016-11-03 01:29:17 +08:00
|
|
|
bol = next_line - buf;
|
2014-11-09 17:23:41 +08:00
|
|
|
}
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done by the same logic as the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but it wants the caller
to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 14:06:49 +08:00
|
|
|
return boc ? len - boc : len - cutoff;
|
2014-11-09 17:23:41 +08:00
|
|
|
}
|
2019-10-15 18:25:31 +08:00
|
|
|
|
|
|
|
int run_commit_hook(int editor_is_used, const char *index_file,
|
hooks: fix an obscure TOCTOU "did we just run a hook?" race
Fix a Time-of-check to time-of-use (TOCTOU) race in code added in
680ee550d72 (commit: skip discarding the index if there is no
pre-commit hook, 2017-08-14).
This obscure race condition can occur if we e.g. ran the "pre-commit"
hook and it modified the index, but hook_exists() returns false later
on (e.g., because the hook itself went away, the directory became
unreadable, etc.). Then we won't call discard_cache() when we should
have.
The race condition itself probably doesn't matter, and users would
have been unlikely to run into it in practice. This problem has been
noted on-list when 680ee550d72 was discussed[1], but had not been
fixed.
This change is mainly intended to improve the readability of the code
involved, and to make reasoning about it more straightforward. It
wasn't as obvious what we were trying to do here, but by having an
"invoked_hook" it's clearer that e.g. our discard_cache() is happening
because of the earlier hook execution.
Let's also change this for the push-to-checkout hook. Now instead of
checking if the hook exists and either doing a push to checkout or a
push to deploy we'll always attempt a push to checkout. If the hook
doesn't exist we'll fall back on push to deploy. The same behavior as
before, without the TOCTOU race. See 0855331941b (receive-pack:
support push-to-checkout hook, 2014-12-01) for the introduction of the
previous behavior.
This leaves uses of hook_exists() in two places that matter. The
"reference-transaction" check in refs.c, see 67541597670 (refs:
implement reference transaction hook, 2020-06-19), and the
"prepare-commit-msg" hook, see 66618a50f9c (sequencer: run
'prepare-commit-msg' hook, 2018-01-24).
In both of those cases we're saving ourselves CPU time by not
preparing data for the hook that we'll then do nothing with if we
don't have the hook. So using this "invoked_hook" pattern doesn't make
sense in those cases.
The "reference-transaction" and "prepare-commit-msg" hook also aren't
racy. In those cases we'll skip the hook runs if we race with a new
hook being added, whereas in the TOCTOU races being fixed here we were
incorrectly skipping the required post-hook logic.
1. https://lore.kernel.org/git/20170810191613.kpmhzg4seyxy3cpq@sigill.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-07 20:33:46 +08:00
|
|
|
int *invoked_hook, const char *name, ...)
|
2019-10-15 18:25:31 +08:00
|
|
|
{
|
2021-12-22 11:59:40 +08:00
|
|
|
struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
|
2019-10-15 18:25:31 +08:00
|
|
|
va_list args;
|
2021-12-22 11:59:40 +08:00
|
|
|
const char *arg;
|
2019-10-15 18:25:31 +08:00
|
|
|
|
2021-12-22 11:59:40 +08:00
|
|
|
strvec_pushf(&opt.env, "GIT_INDEX_FILE=%s", index_file);
|
2019-10-15 18:25:31 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Let the hook know that no editor will be launched.
|
|
|
|
*/
|
|
|
|
if (!editor_is_used)
|
2021-12-22 11:59:40 +08:00
|
|
|
strvec_push(&opt.env, "GIT_EDITOR=:");
|
2019-10-15 18:25:31 +08:00
|
|
|
|
|
|
|
va_start(args, name);
|
2021-12-22 11:59:40 +08:00
|
|
|
while ((arg = va_arg(args, const char *)))
|
|
|
|
strvec_push(&opt.args, arg);
|
2019-10-15 18:25:31 +08:00
|
|
|
va_end(args);
|
|
|
|
|
2022-03-22 07:15:13 +08:00
|
|
|
opt.invoked_hook = invoked_hook;
|
2024-08-13 17:13:28 +08:00
|
|
|
return run_hooks_opt(the_repository, name, &opt);
|
2019-10-15 18:25:31 +08:00
|
|
|
}
|