2018-10-18 06:13:26 +08:00
|
|
|
#ifndef MIDX_H
|
|
|
|
#define MIDX_H
|
2018-07-13 03:39:21 +08:00
|
|
|
|
2021-09-29 09:55:01 +08:00
|
|
|
#include "string-list.h"
|
2018-07-13 03:39:33 +08:00
|
|
|
|
2018-09-19 08:13:36 +08:00
|
|
|
struct object_id;
|
|
|
|
struct pack_entry;
|
2019-04-30 00:18:55 +08:00
|
|
|
struct repository;
|
midx: implement `BTMP` chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that
is, when verbatim chunks from an on-disk packfile are copied
directly[^1]), it does so by using its "preferred pack" as the source
for pack-reuse.
This allows repositories to pack the majority of their objects into a
single (often large) pack, and then use it as the single source for
verbatim pack reuse. This increases the amount of objects that are
reused verbatim (and consequently, decrease the amount of time it takes
to generate many packs). But this performance comes at a cost, which is
that the preferred packfile must pace its growth with that of the entire
repository in order to maintain the utility of verbatim pack reuse.
As repositories grow beyond what we can reasonably store in a single
packfile, the utility of verbatim pack reuse diminishes. Or, at the very
least, it becomes increasingly more expensive to maintain as the pack
grows larger and larger.
It would be beneficial to be able to perform this same optimization over
multiple packs, provided some modest constraints (most importantly, that
the set of packs eligible for verbatim reuse are disjoint with respect
to the subset of their objects being sent).
If we assume that the packs which we treat as candidates for verbatim
reuse are disjoint with respect to any of their objects we may output,
we need to make only modest modifications to the verbatim pack-reuse
code itself. Most notably, we need to remove the assumption that the
bits in the reachability bitmap corresponding to objects from the single
reuse pack begin at the first bit position.
Future patches will unwind these assumptions and reimplement their
existing functionality as special cases of the more general assumptions
(e.g. that reuse bits can start anywhere within the bitset, but happen
to start at 0 for all existing cases).
This patch does not yet relax any of those assumptions. Instead, it
implements a foundational data-structure, the "Bitampped Packs" (`BTMP`)
chunk of the multi-pack index. The `BTMP` chunk's contents are described
in detail here. Importantly, the `BTMP` chunk contains information to
map regions of a multi-pack index's reachability bitmap to the packs
whose objects they represent.
For now, this chunk is only written, not read (outside of the test-tool
used in this patch to test the new chunk's behavior). Future patches
will begin to make use of this new chunk.
[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the
pack that wasn't used verbatim.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 06:23:51 +08:00
|
|
|
struct bitmapped_pack;
|
2018-09-19 08:13:36 +08:00
|
|
|
|
2024-04-02 05:16:34 +08:00
|
|
|
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
|
|
|
|
#define MIDX_VERSION 1
|
|
|
|
#define MIDX_BYTE_FILE_VERSION 4
|
|
|
|
#define MIDX_BYTE_HASH_VERSION 5
|
|
|
|
#define MIDX_BYTE_NUM_CHUNKS 6
|
|
|
|
#define MIDX_BYTE_NUM_PACKS 8
|
|
|
|
#define MIDX_HEADER_SIZE 12
|
|
|
|
|
|
|
|
#define MIDX_CHUNK_ALIGNMENT 4
|
|
|
|
#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
|
|
|
|
#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */
|
|
|
|
#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
|
|
|
|
#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
|
|
|
|
#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
|
|
|
|
#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
|
|
|
|
#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
|
|
|
|
#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
|
|
|
|
#define MIDX_LARGE_OFFSET_NEEDED 0x80000000
|
|
|
|
|
2018-10-13 01:34:20 +08:00
|
|
|
#define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
|
2021-09-01 04:52:43 +08:00
|
|
|
#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
|
|
|
|
"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
|
2018-10-13 01:34:20 +08:00
|
|
|
|
2018-07-13 03:39:23 +08:00
|
|
|
struct multi_pack_index {
|
2018-07-13 03:39:33 +08:00
|
|
|
struct multi_pack_index *next;
|
|
|
|
|
2018-07-13 03:39:23 +08:00
|
|
|
const unsigned char *data;
|
|
|
|
size_t data_len;
|
|
|
|
|
pack-revindex: read multi-pack reverse indexes
Implement reading for multi-pack reverse indexes, as described in the
previous patch.
Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.
There are three new functions exposed by the revindex API:
- load_midx_revindex(): loads the reverse index corresponding to the
given multi-pack index.
- midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
multi-pack index and pseudo-pack order.
load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.
load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
pack_pos_to_midx() simply reads the corresponding entry out of the
table.
midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the psuedo-pack order, but that order can only be recovered
in the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.
So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 23:04:26 +08:00
|
|
|
const uint32_t *revindex_data;
|
|
|
|
const uint32_t *revindex_map;
|
|
|
|
size_t revindex_len;
|
|
|
|
|
2018-07-13 03:39:23 +08:00
|
|
|
uint32_t signature;
|
|
|
|
unsigned char version;
|
|
|
|
unsigned char hash_len;
|
|
|
|
unsigned char num_chunks;
|
|
|
|
uint32_t num_packs;
|
|
|
|
uint32_t num_objects;
|
midx: implement `midx_preferred_pack()`
When performing a binary search over the objects in a MIDX's bitmap
(i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack
ordering using a combination of (a) the preferred pack, (b) the pack's
lexical position in the MIDX based on pack names, and (c) the object
offset within the pack.
In order to perform this binary search, the reader must know the
identity of the preferred pack. This could be stored in the MIDX, but
isn't for historical reasons, mostly because it can easily be inferred
at read-time by looking at the object in the first bit position and
finding out which pack it was selected from in the MIDX, like so:
nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
In midx_to_pack_pos() which performs this binary search, we look up the
identity of the preferred pack before each search. This is relatively
quick, since it involves two table-driven lookups (one in the MIDX's
revindex for `pack_pos_to_midx()`, and another in the MIDX's object
table for `nth_midxed_pack_int_id()`).
But since the preferred pack does not change after the MIDX is written,
it is safe to cache this value on the MIDX itself.
Write a helper to do just that, and rewrite all of the existing
call-sites that care about the identity of the preferred pack in terms
of this new helper.
This will prepare us for a subsequent patch where we will need to binary
search through the MIDX's pseudo-pack order multiple times.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 06:24:25 +08:00
|
|
|
int preferred_pack_idx;
|
2018-07-13 03:39:23 +08:00
|
|
|
|
2018-08-21 00:51:55 +08:00
|
|
|
int local;
|
|
|
|
|
2018-07-13 03:39:27 +08:00
|
|
|
const unsigned char *chunk_pack_names;
|
2023-10-10 05:05:14 +08:00
|
|
|
size_t chunk_pack_names_len;
|
midx: implement `BTMP` chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that
is, when verbatim chunks from an on-disk packfile are copied
directly[^1]), it does so by using its "preferred pack" as the source
for pack-reuse.
This allows repositories to pack the majority of their objects into a
single (often large) pack, and then use it as the single source for
verbatim pack reuse. This increases the amount of objects that are
reused verbatim (and consequently, decrease the amount of time it takes
to generate many packs). But this performance comes at a cost, which is
that the preferred packfile must pace its growth with that of the entire
repository in order to maintain the utility of verbatim pack reuse.
As repositories grow beyond what we can reasonably store in a single
packfile, the utility of verbatim pack reuse diminishes. Or, at the very
least, it becomes increasingly more expensive to maintain as the pack
grows larger and larger.
It would be beneficial to be able to perform this same optimization over
multiple packs, provided some modest constraints (most importantly, that
the set of packs eligible for verbatim reuse are disjoint with respect
to the subset of their objects being sent).
If we assume that the packs which we treat as candidates for verbatim
reuse are disjoint with respect to any of their objects we may output,
we need to make only modest modifications to the verbatim pack-reuse
code itself. Most notably, we need to remove the assumption that the
bits in the reachability bitmap corresponding to objects from the single
reuse pack begin at the first bit position.
Future patches will unwind these assumptions and reimplement their
existing functionality as special cases of the more general assumptions
(e.g. that reuse bits can start anywhere within the bitset, but happen
to start at 0 for all existing cases).
This patch does not yet relax any of those assumptions. Instead, it
implements a foundational data-structure, the "Bitampped Packs" (`BTMP`)
chunk of the multi-pack index. The `BTMP` chunk's contents are described
in detail here. Importantly, the `BTMP` chunk contains information to
map regions of a multi-pack index's reachability bitmap to the packs
whose objects they represent.
For now, this chunk is only written, not read (outside of the test-tool
used in this patch to test the new chunk's behavior). Future patches
will begin to make use of this new chunk.
[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the
pack that wasn't used verbatim.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 06:23:51 +08:00
|
|
|
const uint32_t *chunk_bitmapped_packs;
|
|
|
|
size_t chunk_bitmapped_packs_len;
|
2018-07-13 03:39:31 +08:00
|
|
|
const uint32_t *chunk_oid_fanout;
|
2018-07-13 03:39:30 +08:00
|
|
|
const unsigned char *chunk_oid_lookup;
|
2018-07-13 03:39:32 +08:00
|
|
|
const unsigned char *chunk_object_offsets;
|
|
|
|
const unsigned char *chunk_large_offsets;
|
2023-10-10 05:05:30 +08:00
|
|
|
size_t chunk_large_offsets_len;
|
midx: read `RIDX` chunk when present
When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
is read from it instead of the on-disk .rev file. Since we need to
encode the object order in the MIDX itself for correctness reasons,
there is no point in storing the same data again outside of the MIDX.
So, this patch stops writing separate .rev files, and reads it out of
the MIDX itself. This is possible to do with relatively little new code,
since the format of the RIDX chunk is identical to the data in the .rev
file. In other words, we can implement this by pointing the
`revindex_data` field at the reverse index chunk of the MIDX instead of
the .rev file without any other changes.
Note that we have two knobs that are adjusted for the new tests:
GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
whether the MIDX .rev is written at all, and the latter controls whether
we read the MIDX's RIDX chunk.
Both are necessary to ensure that the test added at the beginning of
this series continues to work. This is because we always need to write
the RIDX chunk in the MIDX in order to change its checksum, but we want
to make sure reading the existing .rev file still works (since the RIDX
chunk takes precedence by default).
Arguably this isn't a very interesting mode to test, because the
precedence rules mean that we'll always read the RIDX chunk over the
.rev file. But it makes it impossible for a user to induce corruption in
their repository by adjusting the test knobs (since if we had an
either/or knob they could stop writing the RIDX chunk, allowing them to
tweak the MIDX's object order without changing its checksum).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 06:41:17 +08:00
|
|
|
const unsigned char *chunk_revindex;
|
2023-10-10 05:05:33 +08:00
|
|
|
size_t chunk_revindex_len;
|
2018-07-13 03:39:27 +08:00
|
|
|
|
2018-07-13 03:39:28 +08:00
|
|
|
const char **pack_names;
|
2018-07-13 03:39:34 +08:00
|
|
|
struct packed_git **packs;
|
2018-07-13 03:39:23 +08:00
|
|
|
char object_dir[FLEX_ARRAY];
|
|
|
|
};
|
|
|
|
|
2019-10-22 02:39:58 +08:00
|
|
|
#define MIDX_PROGRESS (1 << 0)
|
pack-revindex: write multi-pack reverse indexes
Implement the writing half of multi-pack reverse indexes. This is
nothing more than the format describe a few patches ago, with a new set
of helper functions that will be used to clear out stale .rev files
corresponding to old MIDXs.
Unfortunately, a very similar comparison function as the one implemented
recently in pack-revindex.c is reimplemented here, this time accepting a
MIDX-internal type. An effort to DRY these up would create more
indirection and overhead than is necessary, so it isn't pursued here.
Currently, there are no callers which pass the MIDX_WRITE_REV_INDEX
flag, meaning that this is all dead code. But, that won't be the case
for long, since subsequent patches will introduce the multi-pack bitmap,
which will begin passing this field.
(In midx.c:write_midx_internal(), the two adjacent if statements share a
conditional, but are written separately since the first one will
eventually also handle the MIDX_WRITE_BITMAP flag, which does not yet
exist.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 23:04:32 +08:00
|
|
|
#define MIDX_WRITE_REV_INDEX (1 << 1)
|
2021-09-01 04:52:24 +08:00
|
|
|
#define MIDX_WRITE_BITMAP (1 << 2)
|
2021-09-15 06:06:06 +08:00
|
|
|
#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
|
2022-08-15 00:55:09 +08:00
|
|
|
#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
|
2019-10-22 02:39:58 +08:00
|
|
|
|
2024-05-30 06:55:42 +08:00
|
|
|
#define MIDX_EXT_REV "rev"
|
|
|
|
#define MIDX_EXT_BITMAP "bitmap"
|
|
|
|
|
2021-09-01 04:52:21 +08:00
|
|
|
const unsigned char *get_midx_checksum(struct multi_pack_index *m);
|
midx.c: write MIDX filenames to strbuf
To ask for the name of a MIDX and its corresponding .rev file, callers
invoke get_midx_filename() and get_midx_rev_filename(), respectively.
These both invoke xstrfmt(), allocating a chunk of memory which must be
freed later on.
This makes callers in pack-bitmap.c somewhat awkward. Specifically,
midx_bitmap_filename(), which is implemented like:
return xstrfmt("%s-%s.bitmap",
get_midx_filename(midx->object_dir),
hash_to_hex(get_midx_checksum(midx)));
this leaks the second argument to xstrfmt(), which itself was allocated
with xstrfmt(). This caller could assign both the result of
get_midx_filename() and the outer xstrfmt() to a temporary variable,
remembering to free() the former before returning. But that involves a
wasteful copy.
Instead, get_midx_filename() and get_midx_rev_filename() take a strbuf
as an output parameter. This way midx_bitmap_filename() can manipulate
and pass around a temporary buffer which it detaches back to its caller.
That allows us to implement the function without copying or open-coding
get_midx_filename() in a way that doesn't leak.
Update the other callers of get_midx_filename() and
get_midx_rev_filename() accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-27 05:01:21 +08:00
|
|
|
void get_midx_filename(struct strbuf *out, const char *object_dir);
|
2024-05-30 06:55:42 +08:00
|
|
|
void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
|
|
|
|
const unsigned char *hash, const char *ext);
|
pack-revindex: read multi-pack reverse indexes
Implement reading for multi-pack reverse indexes, as described in the
previous patch.
Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.
There are three new functions exposed by the revindex API:
- load_midx_revindex(): loads the reverse index corresponding to the
given multi-pack index.
- midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
multi-pack index and pseudo-pack order.
load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.
load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
pack_pos_to_midx() simply reads the corresponding entry out of the
table.
midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the psuedo-pack order, but that order can only be recovered
in the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.
So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 23:04:26 +08:00
|
|
|
|
2018-08-21 00:51:55 +08:00
|
|
|
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
|
2019-04-30 00:18:55 +08:00
|
|
|
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
|
midx: implement `BTMP` chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that
is, when verbatim chunks from an on-disk packfile are copied
directly[^1]), it does so by using its "preferred pack" as the source
for pack-reuse.
This allows repositories to pack the majority of their objects into a
single (often large) pack, and then use it as the single source for
verbatim pack reuse. This increases the amount of objects that are
reused verbatim (and consequently, decrease the amount of time it takes
to generate many packs). But this performance comes at a cost, which is
that the preferred packfile must pace its growth with that of the entire
repository in order to maintain the utility of verbatim pack reuse.
As repositories grow beyond what we can reasonably store in a single
packfile, the utility of verbatim pack reuse diminishes. Or, at the very
least, it becomes increasingly more expensive to maintain as the pack
grows larger and larger.
It would be beneficial to be able to perform this same optimization over
multiple packs, provided some modest constraints (most importantly, that
the set of packs eligible for verbatim reuse are disjoint with respect
to the subset of their objects being sent).
If we assume that the packs which we treat as candidates for verbatim
reuse are disjoint with respect to any of their objects we may output,
we need to make only modest modifications to the verbatim pack-reuse
code itself. Most notably, we need to remove the assumption that the
bits in the reachability bitmap corresponding to objects from the single
reuse pack begin at the first bit position.
Future patches will unwind these assumptions and reimplement their
existing functionality as special cases of the more general assumptions
(e.g. that reuse bits can start anywhere within the bitset, but happen
to start at 0 for all existing cases).
This patch does not yet relax any of those assumptions. Instead, it
implements a foundational data-structure, the "Bitampped Packs" (`BTMP`)
chunk of the multi-pack index. The `BTMP` chunk's contents are described
in detail here. Importantly, the `BTMP` chunk contains information to
map regions of a multi-pack index's reachability bitmap to the packs
whose objects they represent.
For now, this chunk is only written, not read (outside of the test-tool
used in this patch to test the new chunk's behavior). Future patches
will begin to make use of this new chunk.
[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the
pack that wasn't used verbatim.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 06:23:51 +08:00
|
|
|
int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
|
|
|
|
struct bitmapped_pack *bp, uint32_t pack_int_id);
|
2018-07-13 03:39:34 +08:00
|
|
|
int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
|
2021-03-30 23:04:20 +08:00
|
|
|
off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos);
|
|
|
|
uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos);
|
2018-07-13 03:39:35 +08:00
|
|
|
struct object_id *nth_midxed_object_oid(struct object_id *oid,
|
|
|
|
struct multi_pack_index *m,
|
|
|
|
uint32_t n);
|
2019-04-30 00:18:55 +08:00
|
|
|
int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m);
|
2023-12-15 06:23:54 +08:00
|
|
|
int midx_contains_pack(struct multi_pack_index *m,
|
|
|
|
const char *idx_or_pack_name);
|
|
|
|
int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name,
|
|
|
|
uint32_t *pos);
|
midx: implement `midx_preferred_pack()`
When performing a binary search over the objects in a MIDX's bitmap
(i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack
ordering using a combination of (a) the preferred pack, (b) the pack's
lexical position in the MIDX based on pack names, and (c) the object
offset within the pack.
In order to perform this binary search, the reader must know the
identity of the preferred pack. This could be stored in the MIDX, but
isn't for historical reasons, mostly because it can easily be inferred
at read-time by looking at the object in the first bit position and
finding out which pack it was selected from in the MIDX, like so:
nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
In midx_to_pack_pos() which performs this binary search, we look up the
identity of the preferred pack before each search. This is relatively
quick, since it involves two table-driven lookups (one in the MIDX's
revindex for `pack_pos_to_midx()`, and another in the MIDX's object
table for `nth_midxed_pack_int_id()`).
But since the preferred pack does not change after the MIDX is written,
it is safe to cache this value on the MIDX itself.
Write a helper to do just that, and rewrite all of the existing
call-sites that care about the identity of the preferred pack in terms
of this new helper.
This will prepare us for a subsequent patch where we will need to binary
search through the MIDX's pseudo-pack order multiple times.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-15 06:24:25 +08:00
|
|
|
int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
|
2018-08-21 00:51:55 +08:00
|
|
|
int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
|
2018-07-13 03:39:23 +08:00
|
|
|
|
2021-09-29 09:55:01 +08:00
|
|
|
/*
|
|
|
|
* Variant of write_midx_file which writes a MIDX containing only the packs
|
|
|
|
* specified in packs_to_include.
|
|
|
|
*/
|
midx: preliminary support for `--refs-snapshot`
To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.
This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.
This can happen when a multi-pack index contains some pack which refers
to loose objects (e.g., if a pack was pushed after starting the repack
but before generating the MIDX which depends on an object which is
stored as loose in the repository, and by definition isn't included in
the multi-pack index).
By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.
This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e,. are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.
The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).
When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 09:55:07 +08:00
|
|
|
int write_midx_file(const char *object_dir,
|
|
|
|
const char *preferred_pack_name,
|
|
|
|
const char *refs_snapshot,
|
|
|
|
unsigned flags);
|
2021-09-29 09:55:01 +08:00
|
|
|
int write_midx_file_only(const char *object_dir,
|
|
|
|
struct string_list *packs_to_include,
|
|
|
|
const char *preferred_pack_name,
|
midx: preliminary support for `--refs-snapshot`
To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.
This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.
This can happen when a multi-pack index contains some pack which refers
to loose objects (e.g., if a pack was pushed after starting the repack
but before generating the MIDX which depends on an object which is
stored as loose in the repository, and by definition isn't included in
the multi-pack index).
By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.
This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e,. are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.
The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).
When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 09:55:07 +08:00
|
|
|
const char *refs_snapshot,
|
2021-09-29 09:55:01 +08:00
|
|
|
unsigned flags);
|
2018-10-13 01:34:19 +08:00
|
|
|
void clear_midx_file(struct repository *r);
|
2019-10-22 02:39:58 +08:00
|
|
|
int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
|
|
|
|
int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
|
|
|
|
int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);
|
2018-07-13 03:39:21 +08:00
|
|
|
|
2018-10-13 01:34:19 +08:00
|
|
|
void close_midx(struct multi_pack_index *m);
|
2018-07-13 03:39:21 +08:00
|
|
|
|
|
|
|
#endif
|