git/tag.c

245 lines
5.8 KiB
C
Raw Normal View History

global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:50:23 +08:00
#define USE_THE_REPOSITORY_VARIABLE
#include "git-compat-util.h"
#include "environment.h"
#include "tag.h"
#include "object-name.h"
#include "object-store-ll.h"
#include "commit.h"
#include "tree.h"
#include "blob.h"
#include "alloc.h"
#include "gpg-interface.h"
#include "hex.h"
#include "packfile.h"
const char *tag_type = "tag";
static int run_gpg_verify(const char *buf, unsigned long size, unsigned flags)
{
struct signature_check sigc;
struct strbuf payload = STRBUF_INIT;
struct strbuf signature = STRBUF_INIT;
int ret;
memset(&sigc, 0, sizeof(sigc));
if (!parse_signature(buf, size, &payload, &signature)) {
if (flags & GPG_VERIFY_VERBOSE)
write_in_full(1, buf, size);
return error("no signature found");
}
sigc.payload_type = SIGNATURE_PAYLOAD_TAG;
sigc.payload = strbuf_detach(&payload, &sigc.payload_len);
ret = check_signature(&sigc, signature.buf, signature.len);
if (!(flags & GPG_VERIFY_OMIT_STATUS))
print_signature_buffer(&sigc, flags);
signature_check_clear(&sigc);
strbuf_release(&payload);
strbuf_release(&signature);
return ret;
}
int gpg_verify_tag(const struct object_id *oid, const char *name_to_report,
unsigned flags)
{
enum object_type type;
char *buf;
unsigned long size;
int ret;
type = oid_object_info(the_repository, oid, NULL);
if (type != OBJ_TAG)
return error("%s: cannot verify a non-tag object of type %s.",
name_to_report ?
name_to_report :
repo_find_unique_abbrev(the_repository, oid, DEFAULT_ABBREV),
type_name(type));
buf = repo_read_object_file(the_repository, oid, &type, &size);
if (!buf)
return error("%s: unable to read file.",
name_to_report ?
name_to_report :
repo_find_unique_abbrev(the_repository, oid, DEFAULT_ABBREV));
ret = run_gpg_verify(buf, size, flags);
free(buf);
return ret;
}
struct object *deref_tag(struct repository *r, struct object *o, const char *warn, int warnlen)
{
struct object_id *last_oid = NULL;
while (o && o->type == OBJ_TAG)
if (((struct tag *)o)->tagged) {
last_oid = &((struct tag *)o)->tagged->oid;
o = parse_object(r, last_oid);
} else {
last_oid = NULL;
o = NULL;
}
if (!o && warn) {
if (last_oid && is_promisor_object(last_oid))
return NULL;
if (!warnlen)
warnlen = strlen(warn);
error("missing object referenced by '%.*s'", warnlen, warn);
}
return o;
}
struct object *deref_tag_noverify(struct repository *r, struct object *o)
upload-pack: avoid parsing tag destinations When upload-pack advertises refs, it dereferences any tags it sees, and shows the resulting sha1 to the client. It does this by calling deref_tag. That function must load and parse each tag object to find the sha1 of the tagged object. However, it also ends up parsing the tagged object itself, which is not strictly necessary for upload-pack's use. Each tag produces two object loads (assuming it is not a recursive tag), when it could get away with only a single one. Dropping the second load halves the effort we spend. The downside is that we are no longer verifying the resulting object by loading it. In particular: 1. We never cross-check the "type" field given in the tag object with the type of the pointed-to object. If the tag says it points to a tag but doesn't, then we will keep peeling and realize the error. If the tag says it points to a non-tag but actually points to a tag, we will stop peeling and just advertise the pointed-to tag. 2. If we are missing the pointed-to object, we will not realize (because we never even look it up in the object db). However, both of these are errors in the object database, and both will be detected if a client actually requests the broken objects in question. So we are simply pushing the verification away from the advertising stage, and down to the actual fetching stage. On my test repo with 120K refs, this drops the time to advertise the refs from ~3.2s to ~2.0s. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-07 03:18:01 +08:00
{
while (o && o->type == OBJ_TAG) {
o = parse_object(r, &o->oid);
upload-pack: avoid parsing tag destinations When upload-pack advertises refs, it dereferences any tags it sees, and shows the resulting sha1 to the client. It does this by calling deref_tag. That function must load and parse each tag object to find the sha1 of the tagged object. However, it also ends up parsing the tagged object itself, which is not strictly necessary for upload-pack's use. Each tag produces two object loads (assuming it is not a recursive tag), when it could get away with only a single one. Dropping the second load halves the effort we spend. The downside is that we are no longer verifying the resulting object by loading it. In particular: 1. We never cross-check the "type" field given in the tag object with the type of the pointed-to object. If the tag says it points to a tag but doesn't, then we will keep peeling and realize the error. If the tag says it points to a non-tag but actually points to a tag, we will stop peeling and just advertise the pointed-to tag. 2. If we are missing the pointed-to object, we will not realize (because we never even look it up in the object db). However, both of these are errors in the object database, and both will be detected if a client actually requests the broken objects in question. So we are simply pushing the verification away from the advertising stage, and down to the actual fetching stage. On my test repo with 120K refs, this drops the time to advertise the refs from ~3.2s to ~2.0s. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-07 03:18:01 +08:00
if (o && o->type == OBJ_TAG && ((struct tag *)o)->tagged)
o = ((struct tag *)o)->tagged;
else
o = NULL;
}
return o;
}
struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
{
struct object *obj = lookup_object(r, oid);
if (!obj)
return create_object(r, oid, alloc_tag_node(r));
return object_as_type(obj, OBJ_TAG, 0);
}
static timestamp_t parse_tag_date(const char *buf, const char *tail)
{
const char *dateptr;
while (buf < tail && *buf++ != '>')
/* nada */;
if (buf >= tail)
return 0;
dateptr = buf;
while (buf < tail && *buf++ != '\n')
/* nada */;
if (buf >= tail)
return 0;
/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
return parse_timestamp(dateptr, NULL, 10);
}
void release_tag_memory(struct tag *t)
{
free(t->tag);
t->tagged = NULL;
t->object.parsed = 0;
t->date = 0;
}
int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
{
struct object_id oid;
char type[20];
const char *bufptr = data;
const char *tail = bufptr + size;
const char *nl;
if (item->object.parsed)
return 0;
commit, tag: don't set parsed bit for parse failures If we can't parse a commit, then parse_commit() will return an error code. But it _also_ sets the "parsed" flag, which tells us not to bother trying to re-parse the object. That means that subsequent parses have no idea that the information in the struct may be bogus. I.e., doing this: parse_commit(commit); ... if (parse_commit(commit) < 0) die("commit is broken"); will never trigger the die(). The second parse_commit() will see the "parsed" flag and quietly return success. There are two obvious ways to fix this: 1. Stop setting "parsed" until we've successfully parsed. 2. Keep a second "corrupt" flag to indicate that we saw an error (and when the parsed flag is set, return 0/-1 depending on the corrupt flag). This patch does option 1. The obvious downside versus option 2 is that we might continually re-parse a broken object. But in practice, corruption like this is rare, and we typically die() or return an error in the caller. So it's OK not to worry about optimizing for corruption. And it's much simpler: we don't need to use an extra bit in the object struct, and callers which check the "parsed" flag don't need to learn about the corrupt bit, too. There's no new test here, because this case is already covered in t5318. Note that we do need to update the expected message there, because we now detect the problem in the return from "parse_commit()", and not with a separate check for a NULL tree. In fact, we can now ditch that explicit tree check entirely, as we're covered robustly by this change (and the previous recent change to treat a NULL tree as a parse error). We'll also give tags the same treatment. I don't know offhand of any cases where the problem can be triggered (it implies somebody ignoring a parse error earlier in the process), but consistently returning an error should cause the least surprise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-26 05:20:20 +08:00
if (item->tag) {
/*
* Presumably left over from a previous failed parse;
* clear it out in preparation for re-parsing (we'll probably
* hit the same error, which lets us tell our current caller
* about the problem).
*/
FREE_AND_NULL(item->tag);
}
if (size < the_hash_algo->hexsz + 24)
return -1;
if (memcmp("object ", bufptr, 7) || parse_oid_hex(bufptr + 7, &oid, &bufptr) || *bufptr++ != '\n')
return -1;
if (!starts_with(bufptr, "type "))
return -1;
bufptr += 5;
nl = memchr(bufptr, '\n', tail - bufptr);
if (!nl || sizeof(type) <= (nl - bufptr))
return -1;
memcpy(type, bufptr, nl - bufptr);
type[nl - bufptr] = '\0';
bufptr = nl + 1;
if (!strcmp(type, blob_type)) {
item->tagged = (struct object *)lookup_blob(r, &oid);
} else if (!strcmp(type, tree_type)) {
item->tagged = (struct object *)lookup_tree(r, &oid);
} else if (!strcmp(type, commit_type)) {
item->tagged = (struct object *)lookup_commit(r, &oid);
} else if (!strcmp(type, tag_type)) {
item->tagged = (struct object *)lookup_tag(r, &oid);
} else {
return error("unknown tag type '%s' in %s",
type, oid_to_hex(&item->object.oid));
}
if (!item->tagged)
return error("bad tag pointer to %s in %s",
oid_to_hex(&oid),
oid_to_hex(&item->object.oid));
if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
; /* good */
else
return -1;
bufptr += 4;
nl = memchr(bufptr, '\n', tail - bufptr);
if (!nl)
return -1;
item->tag = xmemdupz(bufptr, nl - bufptr);
bufptr = nl + 1;
if (bufptr + 7 < tail && starts_with(bufptr, "tagger "))
item->date = parse_tag_date(bufptr, tail);
else
item->date = 0;
commit, tag: don't set parsed bit for parse failures If we can't parse a commit, then parse_commit() will return an error code. But it _also_ sets the "parsed" flag, which tells us not to bother trying to re-parse the object. That means that subsequent parses have no idea that the information in the struct may be bogus. I.e., doing this: parse_commit(commit); ... if (parse_commit(commit) < 0) die("commit is broken"); will never trigger the die(). The second parse_commit() will see the "parsed" flag and quietly return success. There are two obvious ways to fix this: 1. Stop setting "parsed" until we've successfully parsed. 2. Keep a second "corrupt" flag to indicate that we saw an error (and when the parsed flag is set, return 0/-1 depending on the corrupt flag). This patch does option 1. The obvious downside versus option 2 is that we might continually re-parse a broken object. But in practice, corruption like this is rare, and we typically die() or return an error in the caller. So it's OK not to worry about optimizing for corruption. And it's much simpler: we don't need to use an extra bit in the object struct, and callers which check the "parsed" flag don't need to learn about the corrupt bit, too. There's no new test here, because this case is already covered in t5318. Note that we do need to update the expected message there, because we now detect the problem in the return from "parse_commit()", and not with a separate check for a NULL tree. In fact, we can now ditch that explicit tree check entirely, as we're covered robustly by this change (and the previous recent change to treat a NULL tree as a parse error). We'll also give tags the same treatment. I don't know offhand of any cases where the problem can be triggered (it implies somebody ignoring a parse error earlier in the process), but consistently returning an error should cause the least surprise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-26 05:20:20 +08:00
item->object.parsed = 1;
return 0;
}
int parse_tag(struct tag *item)
{
enum object_type type;
void *data;
unsigned long size;
int ret;
if (item->object.parsed)
return 0;
data = repo_read_object_file(the_repository, &item->object.oid, &type,
&size);
if (!data)
return error("Could not read %s",
oid_to_hex(&item->object.oid));
if (type != OBJ_TAG) {
free(data);
return error("Object %s not a tag",
oid_to_hex(&item->object.oid));
}
ret = parse_tag_buffer(the_repository, item, data, size);
free(data);
return ret;
}
struct object_id *get_tagged_oid(struct tag *tag)
{
if (!tag->tagged)
die("bad tag");
return &tag->tagged->oid;
}