git/builtin/pack-objects.c

3002 lines
78 KiB
C
Raw Normal View History

#include "builtin.h"
#include "cache.h"
#include "attr.h"
#include "object.h"
#include "blob.h"
#include "commit.h"
#include "tag.h"
#include "tree.h"
#include "delta.h"
#include "pack.h"
#include "pack-revindex.h"
#include "csum-file.h"
#include "tree-walk.h"
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
#include "pack-objects.h"
#include "progress.h"
#include "refs.h"
#include "streaming.h"
#include "thread-utils.h"
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
#include "pack-bitmap.h"
#include "reachable.h"
#include "sha1-array.h"
#include "argv-array.h"
pack-objects: use mru list when iterating over packs In the original implementation of want_object_in_pack(), we always looked for the object in every pack, so the order did not matter for performance. As of the last few patches, however, we can now often break out of the loop early after finding the first instance, and avoid looking in the other packs at all. In this case, pack order can make a big difference, because we'd like to find the objects by looking at as few packs as possible. This patch switches us to the same packed_git_mru list that is now used by normal object lookups. Here are timings for p5303 on linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.31(31.07+0.23) 31.28(31.00+0.27) -0.1% 5303.4: repack (1) 40.35(38.84+2.60) 40.53(39.31+2.32) +0.4% 5303.6: rev-list (50) 31.37(31.15+0.21) 31.41(31.16+0.24) +0.1% 5303.7: repack (50) 58.25(68.54+2.03) 47.28(57.66+1.89) -18.8% 5303.9: rev-list (1000) 31.91(31.57+0.33) 31.93(31.64+0.28) +0.1% 5303.10: repack (1000) 304.80(376.00+3.92) 87.21(159.54+2.84) -71.4% The rev-list numbers are unchanged, which makes sense (they are not exercising this code at all). The 50- and 1000-pack repack cases show considerable improvement. The single-pack repack case doesn't, of course; there's nothing to improve. In fact, it gives us a baseline for how fast we could possibly go. You can see that though rev-list can approach the single-pack case even with 1000 packs, repack doesn't. The reason is simple: the loop we are optimizing is only part of what the repack is doing. After the "counting" phase, we do delta compression, which is much more expensive when there are multiple packs, because we have fewer deltas we can reuse (you can also see that these numbers come from a multicore machine; the CPU times are much higher than the wall-clock times due to the delta phase). So the good news is that in cases with many packs, we used to be dominated by the "counting" phase, and now we are dominated by the delta compression (which is faster, and which we have already parallelized). Here are similar numbers for git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.55(1.51+0.02) 1.54(1.53+0.00) -0.6% 5303.4: repack (1) 1.82(1.80+0.08) 1.82(1.78+0.09) +0.0% 5303.6: rev-list (50) 1.58(1.57+0.00) 1.58(1.56+0.01) +0.0% 5303.7: repack (50) 2.50(3.12+0.07) 2.31(2.95+0.06) -7.6% 5303.9: rev-list (1000) 2.22(2.20+0.02) 2.23(2.19+0.03) +0.5% 5303.10: repack (1000) 10.47(16.78+0.22) 7.50(13.76+0.22) -28.4% Not as impressive in terms of percentage, but still measurable wins. If you look at the wall-clock time improvements in the 1000-pack case, you can see that linux improved by roughly 10x as many seconds as git. That's because it has roughly 10x as many objects, and we'd expect this improvement to scale linearly with the number of objects (since the number of packs is kept constant). It's just that the "counting" phase is a smaller percentage of the total time spent for a git.git repack, and hence the percentage win is smaller. The implementation itself is a straightforward use of the MRU code. We only bother marking a pack as used when we know that we are able to break early out of the loop, for two reasons: 1. If we can't break out early, it does no good; we have to visit each pack anyway, so we might as well avoid even the minor overhead of managing the cache order. 2. The mru_mark() function reorders the list, which would screw up our traversal. So it is only safe to mark when we are about to break out of the loop. We could record the found pack and mark it after the loop finishes, of course, but that's more complicated and it doesn't buy us anything due to (1). Note that this reordering does have a potential impact on the final pack, as we store only a single "found" pack for each object, even if it is present in multiple packs. In principle, any copy is acceptable, as they all refer to the same content. But in practice, they may differ in whether they are stored as deltas, against which base, etc. This may have an impact on delta reuse, and even the delta search (since we skip pairs that were already in the same pack). It's not clear whether this change of order would hurt or even help average cases, though. The most likely reason to have duplicate objects is from the completion of thin packs (e.g., you have some objects in a "base" pack, then receive several pushes; the packs you receive may be thin on the wire, with deltas that refer to bases outside the pack, but we complete them with duplicate base objects when indexing them). In such a case the current code would always find the thin duplicates (because we currently walk the packs in reverse chronological order). Whereas with this patch, some of those duplicates would be found in the base pack instead. In my tests repacking a real-world case of linux.git with 3600 thin-pack pushes (on top of a large "base" pack), the resulting pack was about 0.04% larger with this patch. On the other hand, because we were more likely to hit the base pack, there were more opportunities for delta reuse, and we had 50,000 fewer objects to examine in the delta search. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:57 +08:00
#include "mru.h"
static const char *pack_usage[] = {
N_("git pack-objects --stdout [<options>...] [< <ref-list> | < <object-list>]"),
N_("git pack-objects [<options>...] <base-name> [< <ref-list> | < <object-list>]"),
NULL
};
/*
* Objects we are going to pack are collected in the `to_pack` structure.
* It contains an array (dynamically expanded) of the object data, and a map
* that can resolve SHA1s to their position in the array.
*/
static struct packing_data to_pack;
static struct pack_idx_entry **written_list;
static uint32_t nr_result, nr_written;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
static int keep_unreachable, unpack_unreachable, include_tag;
static unsigned long unpack_unreachable_expiration;
static int pack_loose_unreachable;
static int local;
pack-objects: compute local/ignore_pack_keep early In want_object_in_pack(), we can exit early from our loop if neither "local" nor "ignore_pack_keep" are set. If they are, however, we must examine each pack to see if it has the object and is non-local or has a ".keep". It's quite common for there to be no non-local or .keep packs at all, in which case we know ahead of time that looking further will be pointless. We can pre-compute this by simply iterating over the list of packs ahead of time, and dropping the flags if there are no packs that could match. Another similar strategy would be to modify the loop in want_object_in_pack() to notice that we have already found the object once, and that we are looping only to check for "local" and "keep" attributes. If a pack has neither of those, we can skip the call to find_pack_entry_one(), which is the expensive part of the loop. This has two advantages: - it isn't all-or-nothing; we still get some improvement when there's a small number of kept or non-local packs, and a large number of non-kept local packs - it eliminates any possible race where we add new non-local or kept packs after our initial scan. In practice, I don't think this race matters; we already cache the packed_git information, so somebody who adds a new pack or .keep file after we've started will not be noticed at all, unless we happen to need to call reprepare_packed_git() because a lookup fails. In other words, we're already racy, and the race is not a big deal (losing the race means we might include an object in the pack that would not otherwise be, which is an acceptable outcome). However, it also has a disadvantage: we still loop over the rest of the packs for each object to check their flags. This is much less expensive than doing the object lookup, but still not free. So if we wanted to implement that strategy to cover the non-all-or-nothing cases, we could do so in addition to this one (so you get the most speedup in the all-or-nothing case, and the best we can do in the other cases). But given that the all-or-nothing case is likely the most common, it is probably not worth the trouble, and we can revisit this later if evidence points otherwise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 12:11:31 +08:00
static int have_non_local_packs;
static int incremental;
static int ignore_packed_keep;
static int allow_ofs_delta;
static struct pack_idx_option pack_idx_opts;
static const char *base_name;
static int progress = 1;
static int window = 10;
static unsigned long pack_size_limit;
static int depth = 50;
static int delta_search_threads;
static int pack_to_stdout;
static int num_preferred_base;
static struct progress *progress_state;
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
static struct packed_git *reuse_packfile;
static uint32_t reuse_packfile_objects;
static off_t reuse_packfile_offset;
pack-objects: use reachability bitmap index when generating non-stdout pack Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:44 +08:00
static int use_bitmap_index_default = 1;
static int use_bitmap_index = -1;
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
static int write_bitmap_index;
pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:45 +08:00
static uint16_t write_bitmap_options;
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
static unsigned long cache_max_small_delta_size = 1000;
static unsigned long window_memory_limit = 0;
/*
* stats
*/
static uint32_t written, written_delta;
static uint32_t reused, reused_delta;
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
/*
* Indexed commits
*/
static struct commit **indexed_commits;
static unsigned int indexed_commits_nr;
static unsigned int indexed_commits_alloc;
static void index_commit_for_bitmap(struct commit *commit)
{
if (indexed_commits_nr >= indexed_commits_alloc) {
indexed_commits_alloc = (indexed_commits_alloc + 32) * 2;
REALLOC_ARRAY(indexed_commits, indexed_commits_alloc);
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
}
indexed_commits[indexed_commits_nr++] = commit;
}
static void *get_delta(struct object_entry *entry)
{
unsigned long size, base_size, delta_size;
void *buf, *base_buf, *delta_buf;
enum object_type type;
buf = read_sha1_file(entry->idx.sha1, &type, &size);
if (!buf)
die("unable to read %s", sha1_to_hex(entry->idx.sha1));
base_buf = read_sha1_file(entry->delta->idx.sha1, &type, &base_size);
if (!base_buf)
die("unable to read %s", sha1_to_hex(entry->delta->idx.sha1));
delta_buf = diff_delta(base_buf, base_size,
buf, size, &delta_size, 0);
if (!delta_buf || delta_size != entry->delta_size)
die("delta size changed");
free(buf);
free(base_buf);
return delta_buf;
}
static unsigned long do_compress(void **pptr, unsigned long size)
{
2011-06-11 02:52:15 +08:00
git_zstream stream;
void *in, *out;
unsigned long maxsize;
git_deflate_init(&stream, pack_compression_level);
maxsize = git_deflate_bound(&stream, size);
in = *pptr;
out = xmalloc(maxsize);
*pptr = out;
stream.next_in = in;
stream.avail_in = size;
stream.next_out = out;
stream.avail_out = maxsize;
while (git_deflate(&stream, Z_FINISH) == Z_OK)
; /* nothing */
git_deflate_end(&stream);
free(in);
return stream.total_out;
}
static unsigned long write_large_blob_data(struct git_istream *st, struct sha1file *f,
const unsigned char *sha1)
{
git_zstream stream;
unsigned char ibuf[1024 * 16];
unsigned char obuf[1024 * 16];
unsigned long olen = 0;
git_deflate_init(&stream, pack_compression_level);
for (;;) {
ssize_t readlen;
int zret = Z_OK;
readlen = read_istream(st, ibuf, sizeof(ibuf));
if (readlen == -1)
die(_("unable to read %s"), sha1_to_hex(sha1));
stream.next_in = ibuf;
stream.avail_in = readlen;
while ((stream.avail_in || readlen == 0) &&
(zret == Z_OK || zret == Z_BUF_ERROR)) {
stream.next_out = obuf;
stream.avail_out = sizeof(obuf);
zret = git_deflate(&stream, readlen ? 0 : Z_FINISH);
sha1write(f, obuf, stream.next_out - obuf);
olen += stream.next_out - obuf;
}
if (stream.avail_in)
die(_("deflate error (%d)"), zret);
if (readlen == 0) {
if (zret != Z_STREAM_END)
die(_("deflate error (%d)"), zret);
break;
}
}
git_deflate_end(&stream);
return olen;
}
/*
* we are going to reuse the existing object data as is. make
* sure it is not corrupt.
*/
static int check_pack_inflate(struct packed_git *p,
struct pack_window **w_curs,
off_t offset,
off_t len,
unsigned long expect)
{
2011-06-11 02:52:15 +08:00
git_zstream stream;
unsigned char fakebuf[4096], *in;
int st;
memset(&stream, 0, sizeof(stream));
git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_in = in;
stream.next_out = fakebuf;
stream.avail_out = sizeof(fakebuf);
st = git_inflate(&stream, Z_FINISH);
offset += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR);
git_inflate_end(&stream);
return (st == Z_STREAM_END &&
stream.total_out == expect &&
stream.total_in == len) ? 0 : -1;
}
static void copy_pack_data(struct sha1file *f,
struct packed_git *p,
struct pack_window **w_curs,
off_t offset,
off_t len)
{
unsigned char *in;
2011-06-11 02:52:15 +08:00
unsigned long avail;
while (len) {
in = use_pack(p, w_curs, offset, &avail);
if (avail > len)
2011-06-11 02:52:15 +08:00
avail = (unsigned long)len;
sha1write(f, in, avail);
offset += avail;
len -= avail;
}
}
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
/* Return 0 if we will bust the pack-size limit */
static unsigned long write_no_reuse_object(struct sha1file *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
unsigned long size, datalen;
unsigned char header[10], dheader[10];
unsigned hdrlen;
enum object_type type;
void *buf;
struct git_istream *st = NULL;
if (!usable_delta) {
if (entry->type == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.sha1, &type, &size, NULL)) != NULL)
buf = NULL;
else {
buf = read_sha1_file(entry->idx.sha1, &type, &size);
if (!buf)
die(_("unable to read %s"), sha1_to_hex(entry->idx.sha1));
}
/*
* make sure no cached delta data remains from a
* previous attempt before a pack split occurred.
*/
free(entry->delta_data);
entry->delta_data = NULL;
entry->z_delta_size = 0;
} else if (entry->delta_data) {
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && entry->delta->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
type = (allow_ofs_delta && entry->delta->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
if (st) /* large blob case, just assume we don't compress well */
datalen = size;
else if (entry->z_delta_size)
datalen = entry->z_delta_size;
else
datalen = do_compress(&buf, size);
/*
* The object header is a byte of 'type' followed by zero or
* more bytes of length.
*/
hdrlen = encode_in_pack_object_header(type, size, header);
if (type == OBJ_OFS_DELTA) {
/*
* Deltas with relative base contain an additional
* encoding of the relative offset for the delta
* base from this object's position in the pack.
*/
off_t ofs = entry->idx.offset - entry->delta->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
dheader[--pos] = 128 | (--ofs & 127);
if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) {
if (st)
close_istream(st);
free(buf);
return 0;
}
sha1write(f, header, hdrlen);
sha1write(f, dheader + pos, sizeof(dheader) - pos);
hdrlen += sizeof(dheader) - pos;
} else if (type == OBJ_REF_DELTA) {
/*
* Deltas with a base reference contain
* an additional 20 bytes for the base sha1.
*/
if (limit && hdrlen + 20 + datalen + 20 >= limit) {
if (st)
close_istream(st);
free(buf);
return 0;
}
sha1write(f, header, hdrlen);
sha1write(f, entry->delta->idx.sha1, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
if (st)
close_istream(st);
free(buf);
return 0;
}
sha1write(f, header, hdrlen);
}
if (st) {
datalen = write_large_blob_data(st, f, entry->idx.sha1);
close_istream(st);
} else {
sha1write(f, buf, datalen);
free(buf);
}
return hdrlen + datalen;
}
/* Return 0 if we will bust the pack-size limit */
static off_t write_reuse_object(struct sha1file *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
{
struct packed_git *p = entry->in_pack;
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
enum object_type type = entry->type;
off_t datalen;
unsigned char header[10], dheader[10];
unsigned hdrlen;
if (entry->delta)
type = (allow_ofs_delta && entry->delta->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(type, entry->size, header);
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
datalen = revidx[1].offset - offset;
if (!pack_to_stdout && p->index_version > 1 &&
check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) {
error("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1));
unuse_pack(&w_curs);
return write_no_reuse_object(f, entry, limit, usable_delta);
}
offset += entry->in_pack_header_size;
datalen -= entry->in_pack_header_size;
if (!pack_to_stdout && p->index_version == 1 &&
check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) {
error("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1));
unuse_pack(&w_curs);
return write_no_reuse_object(f, entry, limit, usable_delta);
}
if (type == OBJ_OFS_DELTA) {
off_t ofs = entry->idx.offset - entry->delta->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
dheader[--pos] = 128 | (--ofs & 127);
if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) {
unuse_pack(&w_curs);
return 0;
}
sha1write(f, header, hdrlen);
sha1write(f, dheader + pos, sizeof(dheader) - pos);
hdrlen += sizeof(dheader) - pos;
reused_delta++;
} else if (type == OBJ_REF_DELTA) {
if (limit && hdrlen + 20 + datalen + 20 >= limit) {
unuse_pack(&w_curs);
return 0;
}
sha1write(f, header, hdrlen);
sha1write(f, entry->delta->idx.sha1, 20);
hdrlen += 20;
reused_delta++;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
unuse_pack(&w_curs);
return 0;
}
sha1write(f, header, hdrlen);
}
copy_pack_data(f, p, &w_curs, offset, datalen);
unuse_pack(&w_curs);
reused++;
return hdrlen + datalen;
}
/* Return 0 if we will bust the pack-size limit */
static off_t write_object(struct sha1file *f,
struct object_entry *entry,
off_t write_offset)
{
unsigned long limit;
off_t len;
int usable_delta, to_reuse;
compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-09 13:06:31 +08:00
if (!pack_to_stdout)
crc32_begin(f);
pack-objects: fix pack generation when using pack_size_limit Current handling of pack_size_limit is quite suboptimal. Let's consider a list of objects to pack which contain alternatively big and small objects (which pretty matches reality when big blobs are interlaced with tree objects). Currently, the code simply close the pack and opens a new one when the next object in line breaks the size limit. The current code may degenerate into: - small tree object => store into pack #1 - big blob object busting the pack size limit => store into pack #2 - small blob but pack #2 is over the limit already => pack #3 - big blob busting the size limit => pack #4 - small tree but pack #4 is over the limit => pack #5 - big blob => pack #6 - small tree => pack #7 - ... and so on. The reality is that the content of packs 1, 3, 5 and 7 could well be stored more efficiently (and delta compressed) together in pack #1 if the big blobs were not forcing an immediate transition to a new pack. Incidentally this can be fixed pretty easily by simply skipping over those objects that are too big to fit in the current pack while trying the whole list of unwritten objects, and then that list considered from the beginning again when a new pack is opened. This creates much fewer smallish pack files and help making more predictable test cases for the test suite. This change made one of the self sanity checks useless so it is removed as well. That check was rather redundant already anyway. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-04 11:48:27 +08:00
/* apply size limit if limited packsize and not first object */
if (!pack_size_limit || !nr_written)
limit = 0;
else if (pack_size_limit <= write_offset)
/*
* the earlier object did not fit the limit; avoid
* mistaking this with unlimited (i.e. limit = 0).
*/
limit = 1;
else
limit = pack_size_limit - write_offset;
if (!entry->delta)
usable_delta = 0; /* no delta */
else if (!pack_size_limit)
usable_delta = 1; /* unlimited packfile */
else if (entry->delta->idx.offset == (off_t)-1)
usable_delta = 0; /* base was written to another pack */
else if (entry->delta->idx.offset)
usable_delta = 1; /* base already exists in this pack */
else
usable_delta = 0; /* base could end up in another pack */
if (!reuse_object)
to_reuse = 0; /* explicit */
else if (!entry->in_pack)
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
to_reuse = 0; /* can't reuse what we don't have */
else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
else if (entry->type != entry->in_pack_type)
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
to_reuse = 0; /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0; /* we want to pack afresh */
else
to_reuse = 1; /* we have it in-pack undeltified,
* and we do not need to deltify it.
*/
if (!to_reuse)
len = write_no_reuse_object(f, entry, limit, usable_delta);
else
len = write_reuse_object(f, entry, limit, usable_delta);
if (!len)
return 0;
if (usable_delta)
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
written_delta++;
written++;
compute a CRC32 for each object as stored in a pack The most important optimization for performance when repacking is the ability to reuse data from a previous pack as is and bypass any delta or even SHA1 computation by simply copying the raw data from one pack to another directly. The problem with this is that any data corruption within a copied object would go unnoticed and the new (repacked) pack would be self-consistent with its own checksum despite containing a corrupted object. This is a real issue that already happened at least once in the past. In some attempt to prevent this, we validate the copied data by inflating it and making sure no error is signaled by zlib. But this is still not perfect as a significant portion of a pack content is made of object headers and references to delta base objects which are not deflated and therefore not validated when repacking actually making the pack data reuse still not as safe as it could be. Of course a full SHA1 validation could be performed, but that implies full data inflating and delta replaying which is extremely costly, which cost the data reuse optimization was designed to avoid in the first place. So the best solution to this is simply to store a CRC32 of the raw pack data for each object in the pack index. This way any object in a pack can be validated before being copied as is in another pack, including header and any other non deflated data. Why CRC32 instead of a faster checksum like Adler32? Quoting Wikipedia: Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol. In the context of a GIT pack, we have lots of small objects, especially deltas, which are likely to be quite small and in a size range for which Adler32 is dimed not to be sufficient. Another advantage of CRC32 is the possibility for recovery from certain types of small corruptions like single bit errors which are the most probable type of corruptions. OK what this patch does is to compute the CRC32 of each object written to a pack within pack-objects. It is not written to the index yet and it is obviously not validated when reusing pack data yet either. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-09 13:06:31 +08:00
if (!pack_to_stdout)
entry->idx.crc32 = crc32_end(f);
return len;
}
enum write_one_status {
WRITE_ONE_SKIP = -1, /* already written */
WRITE_ONE_BREAK = 0, /* writing this will bust the limit; not written */
WRITE_ONE_WRITTEN = 1, /* normal */
WRITE_ONE_RECURSIVE = 2 /* already scheduled to be written */
};
static enum write_one_status write_one(struct sha1file *f,
struct object_entry *e,
off_t *offset)
{
off_t size;
int recursing;
/*
* we set offset to 1 (which is an impossible value) to mark
* the fact that this object is involved in "write its base
* first before writing a deltified object" recursion.
*/
recursing = (e->idx.offset == 1);
if (recursing) {
warning("recursive delta detected for object %s",
sha1_to_hex(e->idx.sha1));
return WRITE_ONE_RECURSIVE;
} else if (e->idx.offset || e->preferred_base) {
/* offset is non zero if object is written already. */
return WRITE_ONE_SKIP;
}
/* if we are deltified, write out base object first. */
if (e->delta) {
e->idx.offset = 1; /* now recurse */
switch (write_one(f, e->delta, offset)) {
case WRITE_ONE_RECURSIVE:
/* we cannot depend on this one */
e->delta = NULL;
break;
default:
break;
case WRITE_ONE_BREAK:
e->idx.offset = recursing;
return WRITE_ONE_BREAK;
}
}
e->idx.offset = *offset;
size = write_object(f, e, *offset);
if (!size) {
e->idx.offset = recursing;
return WRITE_ONE_BREAK;
}
written_list[nr_written++] = &e->idx;
/* make sure off_t is sufficiently large not to wrap */
if (signed_add_overflows(*offset, size))
die("pack too large for current definition of off_t");
*offset += size;
return WRITE_ONE_WRITTEN;
}
static int mark_tagged(const char *path, const struct object_id *oid, int flag,
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
void *cb_data)
{
unsigned char peeled[20];
struct object_entry *entry = packlist_find(&to_pack, oid->hash, NULL);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (entry)
entry->tagged = 1;
if (!peel_ref(path, peeled)) {
entry = packlist_find(&to_pack, peeled, NULL);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (entry)
entry->tagged = 1;
}
return 0;
}
static inline void add_to_write_order(struct object_entry **wo,
unsigned int *endp,
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry *e)
{
if (e->filled)
return;
wo[(*endp)++] = e;
e->filled = 1;
}
static void add_descendants_to_write_order(struct object_entry **wo,
unsigned int *endp,
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry *e)
{
int add_to_order = 1;
while (e) {
if (add_to_order) {
struct object_entry *s;
/* add this node... */
add_to_write_order(wo, endp, e);
/* all its siblings... */
for (s = e->delta_sibling; s; s = s->delta_sibling) {
add_to_write_order(wo, endp, s);
}
}
/* drop down a level to add left subtree nodes if possible */
if (e->delta_child) {
add_to_order = 1;
e = e->delta_child;
} else {
add_to_order = 0;
/* our sibling might have some children, it is next */
if (e->delta_sibling) {
e = e->delta_sibling;
continue;
}
/* go back to our parent node */
e = e->delta;
while (e && !e->delta_sibling) {
/* we're on the right side of a subtree, keep
* going up until we can go right again */
e = e->delta;
}
if (!e) {
/* done- we hit our original root node */
return;
}
/* pass it off to sibling at this level */
e = e->delta_sibling;
}
};
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
}
static void add_family_to_write_order(struct object_entry **wo,
unsigned int *endp,
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry *e)
{
struct object_entry *root;
for (root = e; root->delta; root = root->delta)
; /* nothing */
add_descendants_to_write_order(wo, endp, root);
}
static struct object_entry **compute_write_order(void)
{
unsigned int i, wo_end, last_untagged;
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry **wo;
struct object_entry *objects = to_pack.objects;
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
for (i = 0; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
objects[i].tagged = 0;
objects[i].filled = 0;
objects[i].delta_child = NULL;
objects[i].delta_sibling = NULL;
}
/*
* Fully connect delta_child/delta_sibling network.
* Make sure delta_sibling is sorted in the original
* recency order.
*/
for (i = to_pack.nr_objects; i > 0;) {
struct object_entry *e = &objects[--i];
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (!e->delta)
continue;
/* Mark me as the first child */
e->delta_sibling = e->delta->delta_child;
e->delta->delta_child = e;
}
/*
* Mark objects that are at the tip of tags.
*/
for_each_tag_ref(mark_tagged, NULL);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
/*
* Give the objects in the original recency order until
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
* we see a tagged tip.
*/
ALLOC_ARRAY(wo, to_pack.nr_objects);
for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (objects[i].tagged)
break;
add_to_write_order(wo, &wo_end, &objects[i]);
}
last_untagged = i;
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
/*
* Then fill all the tagged tips.
*/
for (; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (objects[i].tagged)
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (objects[i].type != OBJ_COMMIT &&
objects[i].type != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
if (objects[i].type != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* Finally all the rest in really tight order
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled)
add_family_to_write_order(wo, &wo_end, &objects[i]);
}
if (wo_end != to_pack.nr_objects)
die("ordered %u objects, expected %"PRIu32, wo_end, to_pack.nr_objects);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
return wo;
}
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
static off_t write_reused_pack(struct sha1file *f)
{
unsigned char buffer[8192];
pack-objects: show progress for reused packfiles When the "--all-progress" option is in effect, pack-objects shows a progress report for the "writing" phase. If the repository has bitmaps and we are reusing a packfile, the user sees no progress update until the whole packfile is sent. Since this is typically the bulk of what is being written, it can look like git hangs during this phase, even though the transfer is proceeding. This generally only happens with "git push" from a repository with bitmaps. We do not use "--all-progress" for fetch (since the result is going to index-pack on the client, which takes care of progress reporting). And for regular repacks to disk, we do not reuse packfiles. We already have the progress meter setup during write_reused_pack; we just need to call display_progress whiel we are writing out the pack. The progress meter is attached to our output descriptor, so it automatically handles the throughput measurements. However, we need to update the object count as we go, since that is what feeds the percentage we show. We aren't actually parsing the packfile as we send it, so we have no idea how many objects we have sent; we only know that at the end of N bytes, we will have sent M objects. So we cheat a little and assume each object is M/N bytes (i.e., the mean of the objects we are sending). While this isn't strictly true, it actually produces a more pleasing progress meter for the user, as it moves smoothly and predictably (and nobody really cares about the object count; they care about the percentage, and the object count is a proxy for that). One alternative would be to actually show two progress meters: one for the reused pack, and one for the rest of the objects. That would more closely reflect the data we have (the first would be measured in bytes, and the second measured in objects). But it would also be more complex and annoying to the user; rather than seeing one progress meter counting up to 100%, they would finish one meter, then start another one at zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 10:26:21 +08:00
off_t to_write, total;
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
int fd;
if (!is_pack_valid(reuse_packfile))
die("packfile is invalid: %s", reuse_packfile->pack_name);
fd = git_open(reuse_packfile->pack_name);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
if (fd < 0)
die_errno("unable to open packfile for reuse: %s",
reuse_packfile->pack_name);
if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
die_errno("unable to seek in reused packfile");
if (reuse_packfile_offset < 0)
reuse_packfile_offset = reuse_packfile->pack_size - 20;
pack-objects: show progress for reused packfiles When the "--all-progress" option is in effect, pack-objects shows a progress report for the "writing" phase. If the repository has bitmaps and we are reusing a packfile, the user sees no progress update until the whole packfile is sent. Since this is typically the bulk of what is being written, it can look like git hangs during this phase, even though the transfer is proceeding. This generally only happens with "git push" from a repository with bitmaps. We do not use "--all-progress" for fetch (since the result is going to index-pack on the client, which takes care of progress reporting). And for regular repacks to disk, we do not reuse packfiles. We already have the progress meter setup during write_reused_pack; we just need to call display_progress whiel we are writing out the pack. The progress meter is attached to our output descriptor, so it automatically handles the throughput measurements. However, we need to update the object count as we go, since that is what feeds the percentage we show. We aren't actually parsing the packfile as we send it, so we have no idea how many objects we have sent; we only know that at the end of N bytes, we will have sent M objects. So we cheat a little and assume each object is M/N bytes (i.e., the mean of the objects we are sending). While this isn't strictly true, it actually produces a more pleasing progress meter for the user, as it moves smoothly and predictably (and nobody really cares about the object count; they care about the percentage, and the object count is a proxy for that). One alternative would be to actually show two progress meters: one for the reused pack, and one for the rest of the objects. That would more closely reflect the data we have (the first would be measured in bytes, and the second measured in objects). But it would also be more complex and annoying to the user; rather than seeing one progress meter counting up to 100%, they would finish one meter, then start another one at zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 10:26:21 +08:00
total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
while (to_write) {
int read_pack = xread(fd, buffer, sizeof(buffer));
if (read_pack <= 0)
die_errno("unable to read from reused packfile");
if (read_pack > to_write)
read_pack = to_write;
sha1write(f, buffer, read_pack);
to_write -= read_pack;
pack-objects: show progress for reused packfiles When the "--all-progress" option is in effect, pack-objects shows a progress report for the "writing" phase. If the repository has bitmaps and we are reusing a packfile, the user sees no progress update until the whole packfile is sent. Since this is typically the bulk of what is being written, it can look like git hangs during this phase, even though the transfer is proceeding. This generally only happens with "git push" from a repository with bitmaps. We do not use "--all-progress" for fetch (since the result is going to index-pack on the client, which takes care of progress reporting). And for regular repacks to disk, we do not reuse packfiles. We already have the progress meter setup during write_reused_pack; we just need to call display_progress whiel we are writing out the pack. The progress meter is attached to our output descriptor, so it automatically handles the throughput measurements. However, we need to update the object count as we go, since that is what feeds the percentage we show. We aren't actually parsing the packfile as we send it, so we have no idea how many objects we have sent; we only know that at the end of N bytes, we will have sent M objects. So we cheat a little and assume each object is M/N bytes (i.e., the mean of the objects we are sending). While this isn't strictly true, it actually produces a more pleasing progress meter for the user, as it moves smoothly and predictably (and nobody really cares about the object count; they care about the percentage, and the object count is a proxy for that). One alternative would be to actually show two progress meters: one for the reused pack, and one for the rest of the objects. That would more closely reflect the data we have (the first would be measured in bytes, and the second measured in objects). But it would also be more complex and annoying to the user; rather than seeing one progress meter counting up to 100%, they would finish one meter, then start another one at zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 10:26:21 +08:00
/*
* We don't know the actual number of objects written,
* only how many bytes written, how many bytes total, and
* how many objects total. So we can fake it by pretending all
* objects we are writing are the same size. This gives us a
* smooth progress meter, and at the end it matches the true
* answer.
*/
written = reuse_packfile_objects *
(((double)(total - to_write)) / total);
display_progress(progress_state, written);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
}
close(fd);
pack-objects: show progress for reused packfiles When the "--all-progress" option is in effect, pack-objects shows a progress report for the "writing" phase. If the repository has bitmaps and we are reusing a packfile, the user sees no progress update until the whole packfile is sent. Since this is typically the bulk of what is being written, it can look like git hangs during this phase, even though the transfer is proceeding. This generally only happens with "git push" from a repository with bitmaps. We do not use "--all-progress" for fetch (since the result is going to index-pack on the client, which takes care of progress reporting). And for regular repacks to disk, we do not reuse packfiles. We already have the progress meter setup during write_reused_pack; we just need to call display_progress whiel we are writing out the pack. The progress meter is attached to our output descriptor, so it automatically handles the throughput measurements. However, we need to update the object count as we go, since that is what feeds the percentage we show. We aren't actually parsing the packfile as we send it, so we have no idea how many objects we have sent; we only know that at the end of N bytes, we will have sent M objects. So we cheat a little and assume each object is M/N bytes (i.e., the mean of the objects we are sending). While this isn't strictly true, it actually produces a more pleasing progress meter for the user, as it moves smoothly and predictably (and nobody really cares about the object count; they care about the percentage, and the object count is a proxy for that). One alternative would be to actually show two progress meters: one for the reused pack, and one for the rest of the objects. That would more closely reflect the data we have (the first would be measured in bytes, and the second measured in objects). But it would also be more complex and annoying to the user; rather than seeing one progress meter counting up to 100%, they would finish one meter, then start another one at zero. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-15 10:26:21 +08:00
written = reuse_packfile_objects;
display_progress(progress_state, written);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
return reuse_packfile_offset - sizeof(struct pack_header);
}
static const char no_split_warning[] = N_(
"disabling bitmap writing, packs are split due to pack.packSizeLimit"
);
static void write_pack_file(void)
{
uint32_t i = 0, j;
struct sha1file *f;
off_t offset;
uint32_t nr_remaining = nr_result;
time_t last_mtime = 0;
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry **write_order;
if (progress > pack_to_stdout)
progress_state = start_progress(_("Writing objects"), nr_result);
ALLOC_ARRAY(written_list, to_pack.nr_objects);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
write_order = compute_write_order();
do {
unsigned char sha1[20];
char *pack_tmp_name = NULL;
if (pack_to_stdout)
f = sha1fd_throughput(1, "<stdout>", progress_state);
else
f = create_tmp_packfile(&pack_tmp_name);
offset = write_pack_header(f, nr_remaining);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
if (reuse_packfile) {
off_t packfile_size;
assert(pack_to_stdout);
packfile_size = write_reused_pack(f);
offset += packfile_size;
}
nr_written = 0;
for (; i < to_pack.nr_objects; i++) {
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
struct object_entry *e = write_order[i];
if (write_one(f, e, &offset) == WRITE_ONE_BREAK)
break;
display_progress(progress_state, written);
}
pack-objects: learn about pack index version 2 Pack index version 2 goes as follows: - 8 bytes of header with signature and version. - 256 entries of 4-byte first-level fan-out table. - Table of sorted 20-byte SHA1 records for each object in pack. - Table of 4-byte CRC32 entries for raw pack object data. - Table of 4-byte offset entries for objects in the pack if offset is representable with 31 bits or less, otherwise it is an index in the next table with top bit set. - Table of 8-byte offset entries indexed from previous table for offsets which are 32 bits or more (optional). - 20-byte SHA1 checksum of sorted object names. - 20-byte SHA1 checksum of the above. The object SHA1 table is all contiguous so future pack format that would contain this table directly won't require big changes to the code. It is also tighter for slightly better cache locality when looking up entries. Support for large packs exceeding 31 bits in size won't impose an index size bloat for packs within that range that don't need a 64-bit offset. And because newer objects which are likely to be the most frequently used are located at the beginning of the pack, they won't pay the 64-bit offset lookup at run time either even if the pack is large. Right now an index version 2 is created only when the biggest offset in a pack reaches 31 bits. It might be a good idea to always use index version 2 eventually to benefit from the CRC32 it contains when reusing pack data while repacking. [jc: with the "oops" fix to keep track of the last offset correctly] Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-09 13:06:33 +08:00
/*
* Did we write the wrong # entries in the header?
* If so, rewrite it like in fast-import
*/
if (pack_to_stdout) {
sha1close(f, sha1, CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
sha1close(f, sha1, CSUM_FSYNC);
} else {
int fd = sha1close(f, sha1, 0);
fixup_pack_header_footer(fd, sha1, pack_tmp_name,
nr_written, sha1, offset);
close(fd);
if (write_bitmap_index) {
warning(_(no_split_warning));
write_bitmap_index = 0;
}
}
if (!pack_to_stdout) {
struct stat st;
struct strbuf tmpname = STRBUF_INIT;
/*
* Packs are runtime accessed in their mtime
* order since newer packs are more likely to contain
* younger objects. So if we are creating multiple
* packs then we should modify the mtime of later ones
* to preserve this property.
*/
if (stat(pack_tmp_name, &st) < 0) {
warning_errno("failed to stat %s", pack_tmp_name);
} else if (!last_mtime) {
last_mtime = st.st_mtime;
} else {
struct utimbuf utb;
utb.actime = st.st_atime;
utb.modtime = --last_mtime;
if (utime(pack_tmp_name, &utb) < 0)
warning_errno("failed utime() on %s", pack_tmp_name);
}
strbuf_addf(&tmpname, "%s-", base_name);
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
if (write_bitmap_index) {
bitmap_writer_set_checksum(sha1);
bitmap_writer_build_type_index(written_list, nr_written);
}
finish_tmp_packfile(&tmpname, pack_tmp_name,
written_list, nr_written,
&pack_idx_opts, sha1);
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
if (write_bitmap_index) {
strbuf_addf(&tmpname, "%s.bitmap", sha1_to_hex(sha1));
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
stop_progress(&progress_state);
bitmap_writer_show_progress(progress);
bitmap_writer_reuse_bitmaps(&to_pack);
bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
bitmap_writer_build(&to_pack);
pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:45 +08:00
bitmap_writer_finish(written_list, nr_written,
tmpname.buf, write_bitmap_options);
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
write_bitmap_index = 0;
}
strbuf_release(&tmpname);
free(pack_tmp_name);
puts(sha1_to_hex(sha1));
}
/* mark written objects as written to previous pack */
for (j = 0; j < nr_written; j++) {
written_list[j]->offset = (off_t)-1;
}
nr_remaining -= nr_written;
} while (nr_remaining && i < to_pack.nr_objects);
free(written_list);
pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. $ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-01 07:21:58 +08:00
free(write_order);
stop_progress(&progress_state);
if (written != nr_result)
die("wrote %"PRIu32" objects while expecting %"PRIu32,
written, nr_result);
}
static void setup_delta_attr_check(struct git_attr_check *check)
{
static struct git_attr *attr_delta;
if (!attr_delta)
attr_delta = git_attr("delta");
check[0].attr = attr_delta;
}
static int no_try_delta(const char *path)
{
struct git_attr_check check[1];
setup_delta_attr_check(check);
if (git_check_attr(path, ARRAY_SIZE(check), check))
return 0;
if (ATTR_FALSE(check->value))
return 1;
return 0;
}
/*
* When adding an object, check whether we have already added it
* to our packing list. If so, we can skip. However, if we are
* being asked to excludei t, but the previous mention was to include
* it, make sure to adjust its flags and tweak our numbers accordingly.
*
* As an optimization, we pass out the index position where we would have
* found the item, since that saves us from having to look it up again a
* few lines later when we want to add the new entry.
*/
static int have_duplicate_entry(const unsigned char *sha1,
int exclude,
uint32_t *index_pos)
{
struct object_entry *entry;
entry = packlist_find(&to_pack, sha1, index_pos);
if (!entry)
return 0;
if (exclude) {
if (!entry->preferred_base)
nr_result--;
entry->preferred_base = 1;
}
return 1;
}
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
static int want_found_object(int exclude, struct packed_git *p)
{
if (exclude)
return 1;
if (incremental)
return 0;
/*
* When asked to do --local (do not include an object that appears in a
* pack we borrow from elsewhere) or --honor-pack-keep (do not include
* an object that appears in a pack marked with .keep), finding a pack
* that matches the criteria is sufficient for us to decide to omit it.
* However, even if this pack does not satisfy the criteria, we need to
* make sure no copy of this object appears in _any_ pack that makes us
* to omit the object, so we need to check all the packs.
*
* We can however first check whether these options can possible matter;
* if they do not matter we know we want the object in generated pack.
* Otherwise, we signal "-1" at the end to tell the caller that we do
* not know either way, and it needs to check more packs.
*/
if (!ignore_packed_keep &&
(!local || !have_non_local_packs))
return 1;
if (local && !p->pack_local)
return 0;
if (ignore_packed_keep && p->pack_local && p->pack_keep)
return 0;
/* we don't know yet; keep looking for more packs */
return -1;
}
/*
* Check whether we want the object in the pack (e.g., we do not want
* objects found in non-local stores if the "--local" option was used).
*
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
* If the caller already knows an existing pack it wants to take the object
* from, that is passed in *found_pack and *found_offset; otherwise this
* function finds if there is any pack that has the object and returns the pack
* and its offset in these variables.
*/
static int want_object_in_pack(const unsigned char *sha1,
int exclude,
struct packed_git **found_pack,
off_t *found_offset)
{
pack-objects: use mru list when iterating over packs In the original implementation of want_object_in_pack(), we always looked for the object in every pack, so the order did not matter for performance. As of the last few patches, however, we can now often break out of the loop early after finding the first instance, and avoid looking in the other packs at all. In this case, pack order can make a big difference, because we'd like to find the objects by looking at as few packs as possible. This patch switches us to the same packed_git_mru list that is now used by normal object lookups. Here are timings for p5303 on linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.31(31.07+0.23) 31.28(31.00+0.27) -0.1% 5303.4: repack (1) 40.35(38.84+2.60) 40.53(39.31+2.32) +0.4% 5303.6: rev-list (50) 31.37(31.15+0.21) 31.41(31.16+0.24) +0.1% 5303.7: repack (50) 58.25(68.54+2.03) 47.28(57.66+1.89) -18.8% 5303.9: rev-list (1000) 31.91(31.57+0.33) 31.93(31.64+0.28) +0.1% 5303.10: repack (1000) 304.80(376.00+3.92) 87.21(159.54+2.84) -71.4% The rev-list numbers are unchanged, which makes sense (they are not exercising this code at all). The 50- and 1000-pack repack cases show considerable improvement. The single-pack repack case doesn't, of course; there's nothing to improve. In fact, it gives us a baseline for how fast we could possibly go. You can see that though rev-list can approach the single-pack case even with 1000 packs, repack doesn't. The reason is simple: the loop we are optimizing is only part of what the repack is doing. After the "counting" phase, we do delta compression, which is much more expensive when there are multiple packs, because we have fewer deltas we can reuse (you can also see that these numbers come from a multicore machine; the CPU times are much higher than the wall-clock times due to the delta phase). So the good news is that in cases with many packs, we used to be dominated by the "counting" phase, and now we are dominated by the delta compression (which is faster, and which we have already parallelized). Here are similar numbers for git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.55(1.51+0.02) 1.54(1.53+0.00) -0.6% 5303.4: repack (1) 1.82(1.80+0.08) 1.82(1.78+0.09) +0.0% 5303.6: rev-list (50) 1.58(1.57+0.00) 1.58(1.56+0.01) +0.0% 5303.7: repack (50) 2.50(3.12+0.07) 2.31(2.95+0.06) -7.6% 5303.9: rev-list (1000) 2.22(2.20+0.02) 2.23(2.19+0.03) +0.5% 5303.10: repack (1000) 10.47(16.78+0.22) 7.50(13.76+0.22) -28.4% Not as impressive in terms of percentage, but still measurable wins. If you look at the wall-clock time improvements in the 1000-pack case, you can see that linux improved by roughly 10x as many seconds as git. That's because it has roughly 10x as many objects, and we'd expect this improvement to scale linearly with the number of objects (since the number of packs is kept constant). It's just that the "counting" phase is a smaller percentage of the total time spent for a git.git repack, and hence the percentage win is smaller. The implementation itself is a straightforward use of the MRU code. We only bother marking a pack as used when we know that we are able to break early out of the loop, for two reasons: 1. If we can't break out early, it does no good; we have to visit each pack anyway, so we might as well avoid even the minor overhead of managing the cache order. 2. The mru_mark() function reorders the list, which would screw up our traversal. So it is only safe to mark when we are about to break out of the loop. We could record the found pack and mark it after the loop finishes, of course, but that's more complicated and it doesn't buy us anything due to (1). Note that this reordering does have a potential impact on the final pack, as we store only a single "found" pack for each object, even if it is present in multiple packs. In principle, any copy is acceptable, as they all refer to the same content. But in practice, they may differ in whether they are stored as deltas, against which base, etc. This may have an impact on delta reuse, and even the delta search (since we skip pairs that were already in the same pack). It's not clear whether this change of order would hurt or even help average cases, though. The most likely reason to have duplicate objects is from the completion of thin packs (e.g., you have some objects in a "base" pack, then receive several pushes; the packs you receive may be thin on the wire, with deltas that refer to bases outside the pack, but we complete them with duplicate base objects when indexing them). In such a case the current code would always find the thin duplicates (because we currently walk the packs in reverse chronological order). Whereas with this patch, some of those duplicates would be found in the base pack instead. In my tests repacking a real-world case of linux.git with 3600 thin-pack pushes (on top of a large "base" pack), the resulting pack was about 0.04% larger with this patch. On the other hand, because we were more likely to hit the base pack, there were more opportunities for delta reuse, and we had 50,000 fewer objects to examine in the delta search. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:57 +08:00
struct mru_entry *entry;
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
int want;
if (!exclude && local && has_loose_object_nonlocal(sha1))
return 0;
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
/*
* If we already know the pack object lives in, start checks from that
* pack - in the usual case when neither --local was given nor .keep files
* are present we will determine the answer right now.
*/
if (*found_pack) {
want = want_found_object(exclude, *found_pack);
if (want != -1)
return want;
}
pack-objects: use mru list when iterating over packs In the original implementation of want_object_in_pack(), we always looked for the object in every pack, so the order did not matter for performance. As of the last few patches, however, we can now often break out of the loop early after finding the first instance, and avoid looking in the other packs at all. In this case, pack order can make a big difference, because we'd like to find the objects by looking at as few packs as possible. This patch switches us to the same packed_git_mru list that is now used by normal object lookups. Here are timings for p5303 on linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.31(31.07+0.23) 31.28(31.00+0.27) -0.1% 5303.4: repack (1) 40.35(38.84+2.60) 40.53(39.31+2.32) +0.4% 5303.6: rev-list (50) 31.37(31.15+0.21) 31.41(31.16+0.24) +0.1% 5303.7: repack (50) 58.25(68.54+2.03) 47.28(57.66+1.89) -18.8% 5303.9: rev-list (1000) 31.91(31.57+0.33) 31.93(31.64+0.28) +0.1% 5303.10: repack (1000) 304.80(376.00+3.92) 87.21(159.54+2.84) -71.4% The rev-list numbers are unchanged, which makes sense (they are not exercising this code at all). The 50- and 1000-pack repack cases show considerable improvement. The single-pack repack case doesn't, of course; there's nothing to improve. In fact, it gives us a baseline for how fast we could possibly go. You can see that though rev-list can approach the single-pack case even with 1000 packs, repack doesn't. The reason is simple: the loop we are optimizing is only part of what the repack is doing. After the "counting" phase, we do delta compression, which is much more expensive when there are multiple packs, because we have fewer deltas we can reuse (you can also see that these numbers come from a multicore machine; the CPU times are much higher than the wall-clock times due to the delta phase). So the good news is that in cases with many packs, we used to be dominated by the "counting" phase, and now we are dominated by the delta compression (which is faster, and which we have already parallelized). Here are similar numbers for git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.55(1.51+0.02) 1.54(1.53+0.00) -0.6% 5303.4: repack (1) 1.82(1.80+0.08) 1.82(1.78+0.09) +0.0% 5303.6: rev-list (50) 1.58(1.57+0.00) 1.58(1.56+0.01) +0.0% 5303.7: repack (50) 2.50(3.12+0.07) 2.31(2.95+0.06) -7.6% 5303.9: rev-list (1000) 2.22(2.20+0.02) 2.23(2.19+0.03) +0.5% 5303.10: repack (1000) 10.47(16.78+0.22) 7.50(13.76+0.22) -28.4% Not as impressive in terms of percentage, but still measurable wins. If you look at the wall-clock time improvements in the 1000-pack case, you can see that linux improved by roughly 10x as many seconds as git. That's because it has roughly 10x as many objects, and we'd expect this improvement to scale linearly with the number of objects (since the number of packs is kept constant). It's just that the "counting" phase is a smaller percentage of the total time spent for a git.git repack, and hence the percentage win is smaller. The implementation itself is a straightforward use of the MRU code. We only bother marking a pack as used when we know that we are able to break early out of the loop, for two reasons: 1. If we can't break out early, it does no good; we have to visit each pack anyway, so we might as well avoid even the minor overhead of managing the cache order. 2. The mru_mark() function reorders the list, which would screw up our traversal. So it is only safe to mark when we are about to break out of the loop. We could record the found pack and mark it after the loop finishes, of course, but that's more complicated and it doesn't buy us anything due to (1). Note that this reordering does have a potential impact on the final pack, as we store only a single "found" pack for each object, even if it is present in multiple packs. In principle, any copy is acceptable, as they all refer to the same content. But in practice, they may differ in whether they are stored as deltas, against which base, etc. This may have an impact on delta reuse, and even the delta search (since we skip pairs that were already in the same pack). It's not clear whether this change of order would hurt or even help average cases, though. The most likely reason to have duplicate objects is from the completion of thin packs (e.g., you have some objects in a "base" pack, then receive several pushes; the packs you receive may be thin on the wire, with deltas that refer to bases outside the pack, but we complete them with duplicate base objects when indexing them). In such a case the current code would always find the thin duplicates (because we currently walk the packs in reverse chronological order). Whereas with this patch, some of those duplicates would be found in the base pack instead. In my tests repacking a real-world case of linux.git with 3600 thin-pack pushes (on top of a large "base" pack), the resulting pack was about 0.04% larger with this patch. On the other hand, because we were more likely to hit the base pack, there were more opportunities for delta reuse, and we had 50,000 fewer objects to examine in the delta search. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:57 +08:00
for (entry = packed_git_mru->head; entry; entry = entry->next) {
struct packed_git *p = entry->item;
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
off_t offset;
if (p == *found_pack)
offset = *found_offset;
else
offset = find_pack_entry_one(sha1, p);
if (offset) {
if (!*found_pack) {
sha1_file: squelch "packfile cannot be accessed" warnings When we find an object in a packfile index, we make sure we can still open the packfile itself (or that it is already open), as it might have been deleted by a simultaneous repack. If we can't access the packfile, we print a warning for the user and tell the caller that we don't have the object (we can then look in other packfiles, or find a loose version, before giving up). The warning we print to the user isn't really accomplishing anything, and it is potentially confusing to users. In the normal case, it is complete noise; we find the object elsewhere, and the user does not have to care that we racily saw a packfile index that became stale. It didn't affect the operation at all. A possibly more interesting case is when we later can't find the object, and report failure to the user. In this case the warning could be considered a clue toward that ultimate failure. But it's not really a useful clue in practice. We wouldn't even print it consistently (since we are racing with another process, we might not even see the .idx file, or we might win the race and open the packfile, completing the operation). This patch drops the warning entirely (not only from the fill_pack_entry site, but also from an identical use in pack-objects). If we did find the warning interesting in the error case, we could stuff it away and reveal it to the user when we later die() due to the broken object. But that complexity just isn't worth it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-31 08:47:38 +08:00
if (!is_pack_valid(p))
pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since c715f78, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-15 02:03:48 +08:00
continue;
*found_offset = offset;
*found_pack = p;
}
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
want = want_found_object(exclude, p);
if (!exclude && want > 0)
pack-objects: use mru list when iterating over packs In the original implementation of want_object_in_pack(), we always looked for the object in every pack, so the order did not matter for performance. As of the last few patches, however, we can now often break out of the loop early after finding the first instance, and avoid looking in the other packs at all. In this case, pack order can make a big difference, because we'd like to find the objects by looking at as few packs as possible. This patch switches us to the same packed_git_mru list that is now used by normal object lookups. Here are timings for p5303 on linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.31(31.07+0.23) 31.28(31.00+0.27) -0.1% 5303.4: repack (1) 40.35(38.84+2.60) 40.53(39.31+2.32) +0.4% 5303.6: rev-list (50) 31.37(31.15+0.21) 31.41(31.16+0.24) +0.1% 5303.7: repack (50) 58.25(68.54+2.03) 47.28(57.66+1.89) -18.8% 5303.9: rev-list (1000) 31.91(31.57+0.33) 31.93(31.64+0.28) +0.1% 5303.10: repack (1000) 304.80(376.00+3.92) 87.21(159.54+2.84) -71.4% The rev-list numbers are unchanged, which makes sense (they are not exercising this code at all). The 50- and 1000-pack repack cases show considerable improvement. The single-pack repack case doesn't, of course; there's nothing to improve. In fact, it gives us a baseline for how fast we could possibly go. You can see that though rev-list can approach the single-pack case even with 1000 packs, repack doesn't. The reason is simple: the loop we are optimizing is only part of what the repack is doing. After the "counting" phase, we do delta compression, which is much more expensive when there are multiple packs, because we have fewer deltas we can reuse (you can also see that these numbers come from a multicore machine; the CPU times are much higher than the wall-clock times due to the delta phase). So the good news is that in cases with many packs, we used to be dominated by the "counting" phase, and now we are dominated by the delta compression (which is faster, and which we have already parallelized). Here are similar numbers for git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.55(1.51+0.02) 1.54(1.53+0.00) -0.6% 5303.4: repack (1) 1.82(1.80+0.08) 1.82(1.78+0.09) +0.0% 5303.6: rev-list (50) 1.58(1.57+0.00) 1.58(1.56+0.01) +0.0% 5303.7: repack (50) 2.50(3.12+0.07) 2.31(2.95+0.06) -7.6% 5303.9: rev-list (1000) 2.22(2.20+0.02) 2.23(2.19+0.03) +0.5% 5303.10: repack (1000) 10.47(16.78+0.22) 7.50(13.76+0.22) -28.4% Not as impressive in terms of percentage, but still measurable wins. If you look at the wall-clock time improvements in the 1000-pack case, you can see that linux improved by roughly 10x as many seconds as git. That's because it has roughly 10x as many objects, and we'd expect this improvement to scale linearly with the number of objects (since the number of packs is kept constant). It's just that the "counting" phase is a smaller percentage of the total time spent for a git.git repack, and hence the percentage win is smaller. The implementation itself is a straightforward use of the MRU code. We only bother marking a pack as used when we know that we are able to break early out of the loop, for two reasons: 1. If we can't break out early, it does no good; we have to visit each pack anyway, so we might as well avoid even the minor overhead of managing the cache order. 2. The mru_mark() function reorders the list, which would screw up our traversal. So it is only safe to mark when we are about to break out of the loop. We could record the found pack and mark it after the loop finishes, of course, but that's more complicated and it doesn't buy us anything due to (1). Note that this reordering does have a potential impact on the final pack, as we store only a single "found" pack for each object, even if it is present in multiple packs. In principle, any copy is acceptable, as they all refer to the same content. But in practice, they may differ in whether they are stored as deltas, against which base, etc. This may have an impact on delta reuse, and even the delta search (since we skip pairs that were already in the same pack). It's not clear whether this change of order would hurt or even help average cases, though. The most likely reason to have duplicate objects is from the completion of thin packs (e.g., you have some objects in a "base" pack, then receive several pushes; the packs you receive may be thin on the wire, with deltas that refer to bases outside the pack, but we complete them with duplicate base objects when indexing them). In such a case the current code would always find the thin duplicates (because we currently walk the packs in reverse chronological order). Whereas with this patch, some of those duplicates would be found in the base pack instead. In my tests repacking a real-world case of linux.git with 3600 thin-pack pushes (on top of a large "base" pack), the resulting pack was about 0.04% larger with this patch. On the other hand, because we were more likely to hit the base pack, there were more opportunities for delta reuse, and we had 50,000 fewer objects to examine in the delta search. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:57 +08:00
mru_mark(packed_git_mru, entry);
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
if (want != -1)
return want;
}
}
return 1;
}
static void create_object_entry(const unsigned char *sha1,
enum object_type type,
uint32_t hash,
int exclude,
int no_try_delta,
uint32_t index_pos,
struct packed_git *found_pack,
off_t found_offset)
{
struct object_entry *entry;
entry = packlist_alloc(&to_pack, sha1, index_pos);
entry->hash = hash;
if (type)
entry->type = type;
if (exclude)
entry->preferred_base = 1;
else
nr_result++;
if (found_pack) {
entry->in_pack = found_pack;
entry->in_pack_offset = found_offset;
}
entry->no_try_delta = no_try_delta;
}
static const char no_closure_warning[] = N_(
"disabling bitmap writing, as some objects are not being packed"
);
static int add_object_entry(const unsigned char *sha1, enum object_type type,
const char *name, int exclude)
{
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
struct packed_git *found_pack = NULL;
off_t found_offset = 0;
uint32_t index_pos;
if (have_duplicate_entry(sha1, exclude, &index_pos))
return 0;
if (!want_object_in_pack(sha1, exclude, &found_pack, &found_offset)) {
/* The pack is missing an object, so it will not have closure */
if (write_bitmap_index) {
warning(_(no_closure_warning));
write_bitmap_index = 0;
}
return 0;
}
create_object_entry(sha1, type, pack_name_hash(name),
exclude, name && no_try_delta(name),
index_pos, found_pack, found_offset);
display_progress(progress_state, nr_result);
return 1;
}
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
static int add_object_entry_from_bitmap(const unsigned char *sha1,
enum object_type type,
int flags, uint32_t name_hash,
struct packed_git *pack, off_t offset)
{
uint32_t index_pos;
if (have_duplicate_entry(sha1, 0, &index_pos))
return 0;
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there are two codepaths in pack-objects: with & without using bitmap reachability index. However add_object_entry_from_bitmap(), despite its non-bitmapped counterpart add_object_entry(), in no way does check for whether --local or --honor-pack-keep or --incremental should be respected. In non-bitmapped codepath this is handled in want_object_in_pack(), but bitmapped codepath has simply no such checking at all. The bitmapped codepath however was allowing to pass in all those options and with bitmap indices still being used under such conditions - potentially giving wrong output (e.g. including objects from non-local or .keep'ed pack). We can easily fix this by noting the following: when an object comes to add_object_entry_from_bitmap() it can come for two reasons: 1. entries coming from main pack covered by bitmap index, and 2. object coming from, possibly alternate, loose or other packs. "2" can be already handled by want_object_in_pack() and to cover "1" we can teach want_object_in_pack() to expect that *found_pack can be non-NULL, meaning calling client already found object's pack entry. In want_object_in_pack() we care to start the checks from already found pack, if we have one, this way determining the answer right away in case neither --local nor --honour-pack-keep are active. In particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do not do harm to served-with-bitmap clones performance-wise: Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1% 5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5% 5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0% 5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0% 5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5% 5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0% 5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5% Test 56dfeb62 this tree ----------------------------------------------------------------- 5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7% 5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5% 5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0% 5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0% with delta timings showing they are all within noise from run to run. In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. I appreciate help and discussing this change with Junio C Hamano and Jeff King. Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:10 +08:00
if (!want_object_in_pack(sha1, 0, &pack, &offset))
return 0;
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
display_progress(progress_state, nr_result);
return 1;
}
struct pbase_tree_cache {
unsigned char sha1[20];
int ref;
int temporary;
void *tree_data;
unsigned long tree_size;
};
static struct pbase_tree_cache *(pbase_tree_cache[256]);
static int pbase_tree_cache_ix(const unsigned char *sha1)
{
return sha1[0] % ARRAY_SIZE(pbase_tree_cache);
}
static int pbase_tree_cache_ix_incr(int ix)
{
return (ix+1) % ARRAY_SIZE(pbase_tree_cache);
}
static struct pbase_tree {
struct pbase_tree *next;
/* This is a phony "cache" entry; we are not
* going to evict it or find it through _get()
* mechanism -- this is for the toplevel node that
* would almost always change with any commit.
*/
struct pbase_tree_cache pcache;
} *pbase_tree;
static struct pbase_tree_cache *pbase_tree_get(const unsigned char *sha1)
{
struct pbase_tree_cache *ent, *nent;
void *data;
unsigned long size;
enum object_type type;
int neigh;
int my_ix = pbase_tree_cache_ix(sha1);
int available_ix = -1;
/* pbase-tree-cache acts as a limited hashtable.
* your object will be found at your index or within a few
* slots after that slot if it is cached.
*/
for (neigh = 0; neigh < 8; neigh++) {
ent = pbase_tree_cache[my_ix];
if (ent && !hashcmp(ent->sha1, sha1)) {
ent->ref++;
return ent;
}
else if (((available_ix < 0) && (!ent || !ent->ref)) ||
((0 <= available_ix) &&
(!ent && pbase_tree_cache[available_ix])))
available_ix = my_ix;
if (!ent)
break;
my_ix = pbase_tree_cache_ix_incr(my_ix);
}
/* Did not find one. Either we got a bogus request or
* we need to read and perhaps cache.
*/
data = read_sha1_file(sha1, &type, &size);
if (!data)
return NULL;
if (type != OBJ_TREE) {
free(data);
return NULL;
}
/* We need to either cache or return a throwaway copy */
if (available_ix < 0)
ent = NULL;
else {
ent = pbase_tree_cache[available_ix];
my_ix = available_ix;
}
if (!ent) {
nent = xmalloc(sizeof(*nent));
nent->temporary = (available_ix < 0);
}
else {
/* evict and reuse */
free(ent->tree_data);
nent = ent;
}
hashcpy(nent->sha1, sha1);
nent->tree_data = data;
nent->tree_size = size;
nent->ref = 1;
if (!nent->temporary)
pbase_tree_cache[my_ix] = nent;
return nent;
}
static void pbase_tree_put(struct pbase_tree_cache *cache)
{
if (!cache->temporary) {
cache->ref--;
return;
}
free(cache->tree_data);
free(cache);
}
static int name_cmp_len(const char *name)
{
int i;
for (i = 0; name[i] && name[i] != '\n' && name[i] != '/'; i++)
;
return i;
}
static void add_pbase_object(struct tree_desc *tree,
const char *name,
int cmplen,
const char *fullname)
{
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
struct name_entry entry;
int cmp;
tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 00:45:45 +08:00
while (tree_entry(tree,&entry)) {
if (S_ISGITLINK(entry.mode))
continue;
cmp = tree_entry_len(&entry) != cmplen ? 1 :
memcmp(name, entry.path, cmplen);
if (cmp > 0)
continue;
if (cmp < 0)
return;
if (name[cmplen] != '/') {
add_object_entry(entry.oid->hash,
object_type(entry.mode),
fullname, 1);
return;
}
if (S_ISDIR(entry.mode)) {
struct tree_desc sub;
struct pbase_tree_cache *tree;
const char *down = name+cmplen+1;
int downlen = name_cmp_len(down);
tree = pbase_tree_get(entry.oid->hash);
if (!tree)
return;
init_tree_desc(&sub, tree->tree_data, tree->tree_size);
add_pbase_object(&sub, down, downlen, fullname);
pbase_tree_put(tree);
}
}
}
static unsigned *done_pbase_paths;
static int done_pbase_paths_num;
static int done_pbase_paths_alloc;
static int done_pbase_path_pos(unsigned hash)
{
int lo = 0;
int hi = done_pbase_paths_num;
while (lo < hi) {
int mi = (hi + lo) / 2;
if (done_pbase_paths[mi] == hash)
return mi;
if (done_pbase_paths[mi] < hash)
hi = mi;
else
lo = mi + 1;
}
return -lo-1;
}
static int check_pbase_path(unsigned hash)
{
int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash);
if (0 <= pos)
return 1;
pos = -pos - 1;
ALLOC_GROW(done_pbase_paths,
done_pbase_paths_num + 1,
done_pbase_paths_alloc);
done_pbase_paths_num++;
if (pos < done_pbase_paths_num)
memmove(done_pbase_paths + pos + 1,
done_pbase_paths + pos,
(done_pbase_paths_num - pos - 1) * sizeof(unsigned));
done_pbase_paths[pos] = hash;
return 0;
}
static void add_preferred_base_object(const char *name)
{
struct pbase_tree *it;
int cmplen;
unsigned hash = pack_name_hash(name);
if (!num_preferred_base || check_pbase_path(hash))
return;
cmplen = name_cmp_len(name);
for (it = pbase_tree; it; it = it->next) {
if (cmplen == 0) {
add_object_entry(it->pcache.sha1, OBJ_TREE, NULL, 1);
}
else {
struct tree_desc tree;
init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size);
add_pbase_object(&tree, name, cmplen, name);
}
}
}
static void add_preferred_base(unsigned char *sha1)
{
struct pbase_tree *it;
void *data;
unsigned long size;
unsigned char tree_sha1[20];
if (window <= num_preferred_base++)
return;
data = read_object_with_reference(sha1, tree_type, &size, tree_sha1);
if (!data)
return;
for (it = pbase_tree; it; it = it->next) {
if (!hashcmp(it->pcache.sha1, tree_sha1)) {
free(data);
return;
}
}
it = xcalloc(1, sizeof(*it));
it->next = pbase_tree;
pbase_tree = it;
hashcpy(it->pcache.sha1, tree_sha1);
it->pcache.tree_data = data;
it->pcache.tree_size = size;
}
static void cleanup_preferred_base(void)
{
struct pbase_tree *it;
unsigned i;
it = pbase_tree;
pbase_tree = NULL;
while (it) {
struct pbase_tree *this = it;
it = this->next;
free(this->pcache.tree_data);
free(this);
}
for (i = 0; i < ARRAY_SIZE(pbase_tree_cache); i++) {
if (!pbase_tree_cache[i])
continue;
free(pbase_tree_cache[i]->tree_data);
free(pbase_tree_cache[i]);
pbase_tree_cache[i] = NULL;
}
free(done_pbase_paths);
done_pbase_paths = NULL;
done_pbase_paths_num = done_pbase_paths_alloc = 0;
}
static void check_object(struct object_entry *entry)
{
if (entry->in_pack) {
struct packed_git *p = entry->in_pack;
Replace use_packed_git with window cursors. Part of the implementation concept of the sliding mmap window for pack access is to permit multiple windows per pack to be mapped independently. Since the inuse_cnt is associated with the mmap and not with the file, this value is in struct pack_window and needs to be incremented/decremented for each pack_window accessed by any code. To faciliate that implementation we need to replace all uses of use_packed_git() and unuse_packed_git() with a different API that follows struct pack_window objects rather than struct packed_git. The way this works is when we need to start accessing a pack for the first time we should setup a new window 'cursor' by declaring a local and setting it to NULL: struct pack_windows *w_curs = NULL; To obtain the memory region which contains a specific section of the pack file we invoke use_pack(), supplying the address of our current window cursor: unsigned int len; unsigned char *addr = use_pack(p, &w_curs, offset, &len); the returned address `addr` will be the first byte at `offset` within the pack file. The optional variable len will also be updated with the number of bytes remaining following the address. Multiple calls to use_pack() with the same window cursor will update the window cursor, moving it from one window to another when necessary. In this way each window cursor variable maintains only one struct pack_window inuse at a time. Finally before exiting the scope which originally declared the window cursor we must invoke unuse_pack() to unuse the current window (which may be different from the one that was first obtained from use_pack): unuse_pack(&w_curs); This implementation is still not complete with regards to multiple windows, as only one window per pack file is supported right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-23 15:34:08 +08:00
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
unsigned long used, used_0;
2011-06-11 02:52:15 +08:00
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
buf = use_pack(p, &w_curs, entry->in_pack_offset, &avail);
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
/*
* We want in_pack_type even if we do not reuse delta
* since non-delta representations could still be reused.
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
*/
used = unpack_object_header_buffer(buf, avail,
&entry->in_pack_type,
&entry->size);
if (used == 0)
goto give_up;
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
/*
* Determine if this is a delta and if so whether we can
* reuse it or not. Otherwise let's find out as cheaply as
* possible what the actual type and size for this object is.
*/
switch (entry->in_pack_type) {
default:
/* Not a delta hence we've already got all we need. */
entry->type = entry->in_pack_type;
entry->in_pack_header_size = used;
if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB)
goto give_up;
unuse_pack(&w_curs);
return;
case OBJ_REF_DELTA:
if (reuse_delta && !entry->preferred_base)
base_ref = use_pack(p, &w_curs,
entry->in_pack_offset + used, NULL);
entry->in_pack_header_size = used + 20;
break;
case OBJ_OFS_DELTA:
buf = use_pack(p, &w_curs,
entry->in_pack_offset + used, NULL);
used_0 = 0;
c = buf[used_0++];
ofs = c & 127;
while (c & 128) {
ofs += 1;
if (!ofs || MSB(ofs, 7)) {
error("delta base offset overflow in pack for %s",
sha1_to_hex(entry->idx.sha1));
goto give_up;
}
c = buf[used_0++];
ofs = (ofs << 7) + (c & 127);
}
ofs = entry->in_pack_offset - ofs;
if (ofs <= 0 || ofs >= entry->in_pack_offset) {
error("delta base offset out of bound for %s",
sha1_to_hex(entry->idx.sha1));
goto give_up;
}
if (reuse_delta && !entry->preferred_base) {
struct revindex_entry *revidx;
revidx = find_pack_revindex(p, ofs);
if (!revidx)
goto give_up;
base_ref = nth_packed_object_sha1(p, revidx->nr);
}
entry->in_pack_header_size = used + used_0;
break;
}
if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, NULL))) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and we even found that base
* in the list of objects we want to pack. Goodie!
*
* Depth value does not matter - find_deltas() will
* never consider reused delta as the base object to
* deltify other objects against, in order to avoid
* circular deltas.
*/
entry->type = entry->in_pack_type;
entry->delta = base_entry;
entry->delta_size = entry->size;
entry->delta_sibling = base_entry->delta_child;
base_entry->delta_child = entry;
unuse_pack(&w_curs);
return;
}
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
if (entry->type) {
/*
* This must be a delta and we already know what the
* final object type is. Let's extract the actual
* object size from the delta header.
*/
entry->size = get_size_from_delta(p, &w_curs,
entry->in_pack_offset + entry->in_pack_header_size);
if (entry->size == 0)
goto give_up;
unuse_pack(&w_curs);
return;
}
/*
* No choice but to fall back to the recursive delta walk
* with sha1_object_info() to find about the object type
* at this point...
*/
give_up:
unuse_pack(&w_curs);
}
entry->type = sha1_object_info(entry->idx.sha1, &entry->size);
/*
* The error condition is checked in prepare_pack(). This is
* to permit a missing preferred base object to be ignored
* as a preferred base. Doing so can result in a larger
* pack file, but the transfer will still take place.
*/
}
static int pack_offset_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
/* avoid filesystem trashing with loose objects */
if (!a->in_pack && !b->in_pack)
return hashcmp(a->idx.sha1, b->idx.sha1);
if (a->in_pack < b->in_pack)
return -1;
if (a->in_pack > b->in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
}
pack-objects: break delta cycles before delta-search phase We do not allow cycles in the delta graph of a pack (i.e., A is a delta of B which is a delta of A) for the obvious reason that you cannot actually access any of the objects in such a case. There's a last-ditch attempt to notice cycles during the write phase, during which we issue a warning to the user and write one of the objects out in full. However, this is "last-ditch" for two reasons: 1. By this time, it's too late to find another delta for the object, so the resulting pack is larger than it otherwise could be. 2. The warning is there because this is something that _shouldn't_ ever happen. If it does, then either: a. a pack we are reusing deltas from had its own cycle b. we are reusing deltas from multiple packs, and we found a cycle among them (i.e., A is a delta of B in one pack, but B is a delta of A in another, and we choose to use both deltas). c. there is a bug in the delta-search code So this code serves as a final check that none of these things has happened, warns the user, and prevents us from writing a bogus pack. Right now, (2b) should never happen because of the static ordering of packs in want_object_in_pack(). If two objects have a delta relationship, then they must be in the same pack, and therefore we will find them from that same pack. However, a future patch would like to change that static ordering, which will make (2b) a common occurrence. In preparation, we should be able to handle those kinds of cycles better. This patch does by introducing a cycle-breaking step during the get_object_details() phase, when we are deciding which deltas can be reused. That gives us the chance to feed the objects into the delta search as if the cycle did not exist. We'll leave the detection and warning in the write_object() phase in place, as it still serves as a check for case (2c). This does mean we will stop warning for (2a). That case is caused by bogus input packs, and we ideally would warn the user about it. However, since those cycles show up after picking reusable deltas, they look the same as (2b) to us; our new code will break the cycles early and the last-ditch check will never see them. We could do analysis on any cycles that we find to distinguish the two cases (i.e., it is a bogus pack if and only if every delta in the cycle is in the same pack), but we don't need to. If there is a cycle inside a pack, we'll run into problems not only reusing the delta, but accessing the object data at all. So when we try to dig up the actual size of the object, we'll hit that same cycle and kick in our usual complain-and-try-another-source code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:36 +08:00
/*
* Drop an on-disk delta we were planning to reuse. Naively, this would
* just involve blanking out the "delta" field, but we have to deal
* with some extra book-keeping:
*
* 1. Removing ourselves from the delta_sibling linked list.
*
* 2. Updating our size/type to the non-delta representation. These were
* either not recorded initially (size) or overwritten with the delta type
* (type) when check_object() decided to reuse the delta.
*/
static void drop_reused_delta(struct object_entry *entry)
{
struct object_entry **p = &entry->delta->delta_child;
struct object_info oi = OBJECT_INFO_INIT;
while (*p) {
if (*p == entry)
*p = (*p)->delta_sibling;
else
p = &(*p)->delta_sibling;
}
entry->delta = NULL;
oi.sizep = &entry->size;
oi.typep = &entry->type;
if (packed_object_info(entry->in_pack, entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to sha1_object_info, which may find another copy.
* And if that fails, the error will be recorded in entry->type
* and dealt with in prepare_pack().
*/
entry->type = sha1_object_info(entry->idx.sha1, &entry->size);
}
}
/*
* Follow the chain of deltas from this entry onward, throwing away any links
* that cause us to hit a cycle (as determined by the DFS state flags in
* the entries).
*/
static void break_delta_chains(struct object_entry *entry)
{
/* If it's not a delta, it can't be part of a cycle. */
if (!entry->delta) {
entry->dfs_state = DFS_DONE;
return;
}
switch (entry->dfs_state) {
case DFS_NONE:
/*
* This is the first time we've seen the object. We mark it as
* part of the active potential cycle and recurse.
*/
entry->dfs_state = DFS_ACTIVE;
break_delta_chains(entry->delta);
entry->dfs_state = DFS_DONE;
break;
case DFS_DONE:
/* object already examined, and not part of a cycle */
break;
case DFS_ACTIVE:
/*
* We found a cycle that needs broken. It would be correct to
* break any link in the chain, but it's convenient to
* break this one.
*/
drop_reused_delta(entry);
entry->dfs_state = DFS_DONE;
break;
}
}
static void get_object_details(void)
{
uint32_t i;
struct object_entry **sorted_by_offset;
sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct object_entry *));
for (i = 0; i < to_pack.nr_objects; i++)
sorted_by_offset[i] = to_pack.objects + i;
QSORT(sorted_by_offset, to_pack.nr_objects, pack_offset_sort);
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
if (big_file_threshold < entry->size)
entry->no_try_delta = 1;
}
pack-objects: break delta cycles before delta-search phase We do not allow cycles in the delta graph of a pack (i.e., A is a delta of B which is a delta of A) for the obvious reason that you cannot actually access any of the objects in such a case. There's a last-ditch attempt to notice cycles during the write phase, during which we issue a warning to the user and write one of the objects out in full. However, this is "last-ditch" for two reasons: 1. By this time, it's too late to find another delta for the object, so the resulting pack is larger than it otherwise could be. 2. The warning is there because this is something that _shouldn't_ ever happen. If it does, then either: a. a pack we are reusing deltas from had its own cycle b. we are reusing deltas from multiple packs, and we found a cycle among them (i.e., A is a delta of B in one pack, but B is a delta of A in another, and we choose to use both deltas). c. there is a bug in the delta-search code So this code serves as a final check that none of these things has happened, warns the user, and prevents us from writing a bogus pack. Right now, (2b) should never happen because of the static ordering of packs in want_object_in_pack(). If two objects have a delta relationship, then they must be in the same pack, and therefore we will find them from that same pack. However, a future patch would like to change that static ordering, which will make (2b) a common occurrence. In preparation, we should be able to handle those kinds of cycles better. This patch does by introducing a cycle-breaking step during the get_object_details() phase, when we are deciding which deltas can be reused. That gives us the chance to feed the objects into the delta search as if the cycle did not exist. We'll leave the detection and warning in the write_object() phase in place, as it still serves as a check for case (2c). This does mean we will stop warning for (2a). That case is caused by bogus input packs, and we ideally would warn the user about it. However, since those cycles show up after picking reusable deltas, they look the same as (2b) to us; our new code will break the cycles early and the last-ditch check will never see them. We could do analysis on any cycles that we find to distinguish the two cases (i.e., it is a bogus pack if and only if every delta in the cycle is in the same pack), but we don't need to. If there is a cycle inside a pack, we'll run into problems not only reusing the delta, but accessing the object data at all. So when we try to dig up the actual size of the object, we'll hit that same cycle and kick in our usual complain-and-try-another-source code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 17:26:36 +08:00
/*
* This must happen in a second pass, since we rely on the delta
* information for the whole list being completed.
*/
for (i = 0; i < to_pack.nr_objects; i++)
break_delta_chains(&to_pack.objects[i]);
free(sorted_by_offset);
}
/*
* We search for deltas in a list sorted by type, by filename hash, and then
* by size, so that we see progressively smaller and smaller files.
* That's because we prefer deltas to be from the bigger file
* to the smaller -- deletes are potentially cheaper, but perhaps
* more importantly, the bigger file is likely the more recent
* one. The deepest deltas are therefore the oldest objects which are
* less susceptible to be accessed often.
*/
static int type_size_sort(const void *_a, const void *_b)
{
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
if (a->type > b->type)
return -1;
if (a->type < b->type)
return 1;
if (a->hash > b->hash)
return -1;
if (a->hash < b->hash)
return 1;
if (a->preferred_base > b->preferred_base)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
if (a->size > b->size)
return -1;
if (a->size < b->size)
return 1;
return a < b ? -1 : (a > b); /* newest first */
}
struct unpacked {
struct object_entry *entry;
void *data;
struct delta_index *index;
unsigned depth;
};
static int delta_cacheable(unsigned long src_size, unsigned long trg_size,
unsigned long delta_size)
{
if (max_delta_cache_size && delta_cache_size + delta_size > max_delta_cache_size)
return 0;
if (delta_size < cache_max_small_delta_size)
return 1;
/* cache delta, if objects are large enough compared to delta size */
if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10))
return 1;
return 0;
}
#ifndef NO_PTHREADS
static pthread_mutex_t read_mutex;
#define read_lock() pthread_mutex_lock(&read_mutex)
#define read_unlock() pthread_mutex_unlock(&read_mutex)
static pthread_mutex_t cache_mutex;
#define cache_lock() pthread_mutex_lock(&cache_mutex)
#define cache_unlock() pthread_mutex_unlock(&cache_mutex)
static pthread_mutex_t progress_mutex;
#define progress_lock() pthread_mutex_lock(&progress_mutex)
#define progress_unlock() pthread_mutex_unlock(&progress_mutex)
#else
#define read_lock() (void)0
#define read_unlock() (void)0
#define cache_lock() (void)0
#define cache_unlock() (void)0
#define progress_lock() (void)0
#define progress_unlock() (void)0
#endif
static int try_delta(struct unpacked *trg, struct unpacked *src,
unsigned max_depth, unsigned long *mem_usage)
{
struct object_entry *trg_entry = trg->entry;
struct object_entry *src_entry = src->entry;
unsigned long trg_size, src_size, delta_size, sizediff, max_size, sz;
unsigned ref_depth;
enum object_type type;
void *delta_buf;
/* Don't bother doing diffs between different types */
if (trg_entry->type != src_entry->type)
return -1;
/*
thin-pack: try harder to use preferred base objects as base When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset | reflog | tags --------------------------------- before | 53358 | 2750977 size after | 32398 | 2668479 change | -39% | -3% --------------------------------- before | 0.18 | 1.12 CPU after | 0.18 | 1.15 change | +0% | +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-13 06:32:34 +08:00
* We do not bother to try a delta that we discarded on an
* earlier try, but only when reusing delta data. Note that
* src_entry that is marked as the preferred_base should always
* be considered, as even if we produce a suboptimal delta against
* it, we will still save the transfer cost, as we already know
* the other side has it and we won't send src_entry at all.
*/
if (reuse_delta && trg_entry->in_pack &&
trg_entry->in_pack == src_entry->in_pack &&
thin-pack: try harder to use preferred base objects as base When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset | reflog | tags --------------------------------- before | 53358 | 2750977 size after | 32398 | 2668479 change | -39% | -3% --------------------------------- before | 0.18 | 1.12 CPU after | 0.18 | 1.15 change | +0% | +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-13 06:32:34 +08:00
!src_entry->preferred_base &&
trg_entry->in_pack_type != OBJ_REF_DELTA &&
trg_entry->in_pack_type != OBJ_OFS_DELTA)
return 0;
/* Let's not bust the allowed depth. */
if (src->depth >= max_depth)
return 0;
/* Now some size filtering heuristics. */
trg_size = trg_entry->size;
if (!trg_entry->delta) {
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
max_size = trg_entry->delta_size;
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
(max_depth - ref_depth + 1);
if (max_size == 0)
return 0;
src_size = src_entry->size;
sizediff = src_size < trg_size ? trg_size - src_size : 0;
if (sizediff >= max_size)
return 0;
if (trg_size < src_size / 32)
return 0;
/* Load data if not already done */
if (!trg->data) {
read_lock();
trg->data = read_sha1_file(trg_entry->idx.sha1, &type, &sz);
read_unlock();
if (!trg->data)
die("object %s cannot be read",
sha1_to_hex(trg_entry->idx.sha1));
if (sz != trg_size)
die("object %s inconsistent object length (%lu vs %lu)",
sha1_to_hex(trg_entry->idx.sha1), sz, trg_size);
*mem_usage += sz;
}
if (!src->data) {
read_lock();
src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
read_unlock();
if (!src->data) {
if (src_entry->preferred_base) {
static int warned = 0;
if (!warned++)
warning("object %s cannot be read",
sha1_to_hex(src_entry->idx.sha1));
/*
* Those objects are not included in the
* resulting pack. Be resilient and ignore
* them if they can't be read, in case the
* pack could be created nevertheless.
*/
return 0;
}
die("object %s cannot be read",
sha1_to_hex(src_entry->idx.sha1));
}
if (sz != src_size)
die("object %s inconsistent object length (%lu vs %lu)",
sha1_to_hex(src_entry->idx.sha1), sz, src_size);
*mem_usage += sz;
}
if (!src->index) {
src->index = create_delta_index(src->data, src_size);
if (!src->index) {
static int warned = 0;
if (!warned++)
warning("suboptimal pack - out of memory");
return 0;
}
*mem_usage += sizeof_delta_index(src->index);
}
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
if (trg_entry->delta) {
/* Prefer only shallower same-sized deltas. */
if (delta_size == trg_entry->delta_size &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
}
}
/*
* Handle memory allocation outside of the cache
* accounting lock. Compiler will optimize the strangeness
* away when NO_PTHREADS is defined.
*/
Avoid unnecessary "if-before-free" tests. This change removes all obvious useless if-before-free tests. E.g., it replaces code like this: if (some_expression) free (some_expression); with the now-equivalent: free (some_expression); It is equivalent not just because POSIX has required free(NULL) to work for a long time, but simply because it has worked for so long that no reasonable porting target fails the test. Here's some evidence from nearly 1.5 years ago: http://www.winehq.org/pipermail/wine-patches/2006-October/031544.html FYI, the change below was prepared by running the following: git ls-files -z | xargs -0 \ perl -0x3b -pi -e \ 's/\bif\s*\(\s*(\S+?)(?:\s*!=\s*NULL)?\s*\)\s+(free\s*\(\s*\1\s*\))/$2/s' Note however, that it doesn't handle brace-enclosed blocks like "if (x) { free (x); }". But that's ok, since there were none like that in git sources. Beware: if you do use the above snippet, note that it can produce syntactically invalid C code. That happens when the affected "if"-statement has a matching "else". E.g., it would transform this if (x) free (x); else foo (); into this: free (x); else foo (); There were none of those here, either. If you're interested in automating detection of the useless tests, you might like the useless-if-before-free script in gnulib: [it *does* detect brace-enclosed free statements, and has a --name=S option to make it detect free-like functions with different names] http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=build-aux/useless-if-before-free Addendum: Remove one more (in imap-send.c), spotted by Jean-Luc Herren <jlh@gmx.ch>. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-01 01:26:32 +08:00
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
delta_cache_size -= trg_entry->delta_size;
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
delta_cache_size += delta_size;
cache_unlock();
trg_entry->delta_data = xrealloc(delta_buf, delta_size);
} else {
cache_unlock();
free(delta_buf);
}
trg_entry->delta = src_entry;
trg_entry->delta_size = delta_size;
trg->depth = src->depth + 1;
return 1;
}
static unsigned int check_delta_limit(struct object_entry *me, unsigned int n)
{
struct object_entry *child = me->delta_child;
unsigned int m = n;
while (child) {
unsigned int c = check_delta_limit(child, n + 1);
if (m < c)
m = c;
child = child->delta_sibling;
}
return m;
}
static unsigned long free_unpacked(struct unpacked *n)
{
unsigned long freed_mem = sizeof_delta_index(n->index);
free_delta_index(n->index);
n->index = NULL;
if (n->data) {
freed_mem += n->entry->size;
free(n->data);
n->data = NULL;
}
n->entry = NULL;
n->depth = 0;
return freed_mem;
}
static void find_deltas(struct object_entry **list, unsigned *list_size,
int window, int depth, unsigned *processed)
{
uint32_t i, idx = 0, count = 0;
struct unpacked *array;
unsigned long mem_usage = 0;
array = xcalloc(window, sizeof(struct unpacked));
for (;;) {
struct object_entry *entry;
struct unpacked *n = array + idx;
int j, max_depth, best_base = -1;
progress_lock();
if (!*list_size) {
progress_unlock();
break;
}
entry = *list++;
(*list_size)--;
if (!entry->preferred_base) {
(*processed)++;
display_progress(progress_state, *processed);
}
progress_unlock();
mem_usage -= free_unpacked(n);
n->entry = entry;
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
while (window_memory_limit &&
mem_usage > window_memory_limit &&
count > 1) {
uint32_t tail = (idx + window - count) % window;
mem_usage -= free_unpacked(array + tail);
count--;
}
/* We do not compute delta to *create* objects we are not
* going to pack.
*/
if (entry->preferred_base)
goto next;
/*
* If the current object is at pack edge, take the depth the
* objects that depend on the current object into account
* otherwise they would become too deep.
*/
max_depth = depth;
if (entry->delta_child) {
max_depth -= check_delta_limit(entry, 0);
if (max_depth <= 0)
goto next;
}
j = window;
while (--j > 0) {
Keep last used delta base in the delta window This is based on Martin Koegler's idea to keep the object that was successfully used as the base of the delta when it is about to fall off the edge of the window. Instead of doing so only for the objects at the edge of the window, this makes the window a lru eviction mechanism. If an entry is used as a base, it is moved to the last of the queue to be evicted. This is a quick-and-dirty implementation, as it keeps the original implementation of the data structure used for the window. This originally was done as an array, not as an array of pointers, because it was meant to be used as a cyclic FIFO buffer and a plain array avoids an extra pointer indirection, while its FIFOness eant that we are not "moving" the entries like this patch does. The runtime from three versions were comparable. It seems to make the resulting chain even shorter, which can only be good. (stock "master") 15782196 bytes chain length = 1: 2972 objects chain length = 2: 2651 objects chain length = 3: 2369 objects chain length = 4: 2121 objects chain length = 5: 1877 objects ... chain length = 46: 490 objects chain length = 47: 515 objects chain length = 48: 527 objects chain length = 49: 570 objects chain length = 50: 408 objects (with your patch) 15745736 bytes (0.23% smaller) chain length = 1: 3137 objects chain length = 2: 2688 objects chain length = 3: 2322 objects chain length = 4: 2146 objects chain length = 5: 1824 objects ... chain length = 46: 503 objects chain length = 47: 509 objects chain length = 48: 536 objects chain length = 49: 588 objects chain length = 50: 357 objects (with this patch) 15612086 bytes (1.08% smaller) chain length = 1: 4831 objects chain length = 2: 3811 objects chain length = 3: 2964 objects chain length = 4: 2352 objects chain length = 5: 1944 objects ... chain length = 46: 327 objects chain length = 47: 353 objects chain length = 48: 304 objects chain length = 49: 298 objects chain length = 50: 135 objects [jc: this is with code simplification follow-up from Nico] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-02 14:53:47 +08:00
int ret;
uint32_t other_idx = idx + j;
struct unpacked *m;
if (other_idx >= window)
other_idx -= window;
m = array + other_idx;
if (!m->entry)
break;
ret = try_delta(n, m, max_depth, &mem_usage);
Keep last used delta base in the delta window This is based on Martin Koegler's idea to keep the object that was successfully used as the base of the delta when it is about to fall off the edge of the window. Instead of doing so only for the objects at the edge of the window, this makes the window a lru eviction mechanism. If an entry is used as a base, it is moved to the last of the queue to be evicted. This is a quick-and-dirty implementation, as it keeps the original implementation of the data structure used for the window. This originally was done as an array, not as an array of pointers, because it was meant to be used as a cyclic FIFO buffer and a plain array avoids an extra pointer indirection, while its FIFOness eant that we are not "moving" the entries like this patch does. The runtime from three versions were comparable. It seems to make the resulting chain even shorter, which can only be good. (stock "master") 15782196 bytes chain length = 1: 2972 objects chain length = 2: 2651 objects chain length = 3: 2369 objects chain length = 4: 2121 objects chain length = 5: 1877 objects ... chain length = 46: 490 objects chain length = 47: 515 objects chain length = 48: 527 objects chain length = 49: 570 objects chain length = 50: 408 objects (with your patch) 15745736 bytes (0.23% smaller) chain length = 1: 3137 objects chain length = 2: 2688 objects chain length = 3: 2322 objects chain length = 4: 2146 objects chain length = 5: 1824 objects ... chain length = 46: 503 objects chain length = 47: 509 objects chain length = 48: 536 objects chain length = 49: 588 objects chain length = 50: 357 objects (with this patch) 15612086 bytes (1.08% smaller) chain length = 1: 4831 objects chain length = 2: 3811 objects chain length = 3: 2964 objects chain length = 4: 2352 objects chain length = 5: 1944 objects ... chain length = 46: 327 objects chain length = 47: 353 objects chain length = 48: 304 objects chain length = 49: 298 objects chain length = 50: 135 objects [jc: this is with code simplification follow-up from Nico] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-02 14:53:47 +08:00
if (ret < 0)
break;
Keep last used delta base in the delta window This is based on Martin Koegler's idea to keep the object that was successfully used as the base of the delta when it is about to fall off the edge of the window. Instead of doing so only for the objects at the edge of the window, this makes the window a lru eviction mechanism. If an entry is used as a base, it is moved to the last of the queue to be evicted. This is a quick-and-dirty implementation, as it keeps the original implementation of the data structure used for the window. This originally was done as an array, not as an array of pointers, because it was meant to be used as a cyclic FIFO buffer and a plain array avoids an extra pointer indirection, while its FIFOness eant that we are not "moving" the entries like this patch does. The runtime from three versions were comparable. It seems to make the resulting chain even shorter, which can only be good. (stock "master") 15782196 bytes chain length = 1: 2972 objects chain length = 2: 2651 objects chain length = 3: 2369 objects chain length = 4: 2121 objects chain length = 5: 1877 objects ... chain length = 46: 490 objects chain length = 47: 515 objects chain length = 48: 527 objects chain length = 49: 570 objects chain length = 50: 408 objects (with your patch) 15745736 bytes (0.23% smaller) chain length = 1: 3137 objects chain length = 2: 2688 objects chain length = 3: 2322 objects chain length = 4: 2146 objects chain length = 5: 1824 objects ... chain length = 46: 503 objects chain length = 47: 509 objects chain length = 48: 536 objects chain length = 49: 588 objects chain length = 50: 357 objects (with this patch) 15612086 bytes (1.08% smaller) chain length = 1: 4831 objects chain length = 2: 3811 objects chain length = 3: 2964 objects chain length = 4: 2352 objects chain length = 5: 1944 objects ... chain length = 46: 327 objects chain length = 47: 353 objects chain length = 48: 304 objects chain length = 49: 298 objects chain length = 50: 135 objects [jc: this is with code simplification follow-up from Nico] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-02 14:53:47 +08:00
else if (ret > 0)
best_base = other_idx;
}
/*
* If we decided to cache the delta data, then it is best
* to compress it right away. First because we have to do
* it anyway, and doing it here while we're threaded will
* save a lot of time in the non threaded write phase,
* as well as allow for caching more deltas within
* the same cache size limit.
* ...
* But only if not writing to stdout, since in that case
* the network is most likely throttling writes anyway,
* and therefore it is best to go to the write phase ASAP
* instead, as we can afford spending more time compressing
* between writes at that moment.
*/
if (entry->delta_data && !pack_to_stdout) {
entry->z_delta_size = do_compress(&entry->delta_data,
entry->delta_size);
cache_lock();
delta_cache_size -= entry->delta_size;
delta_cache_size += entry->z_delta_size;
cache_unlock();
}
/* if we made n a delta, and if n is already at max
* depth, leaving it in the window is pointless. we
* should evict it first.
*/
if (entry->delta && max_depth <= n->depth)
continue;
Keep last used delta base in the delta window This is based on Martin Koegler's idea to keep the object that was successfully used as the base of the delta when it is about to fall off the edge of the window. Instead of doing so only for the objects at the edge of the window, this makes the window a lru eviction mechanism. If an entry is used as a base, it is moved to the last of the queue to be evicted. This is a quick-and-dirty implementation, as it keeps the original implementation of the data structure used for the window. This originally was done as an array, not as an array of pointers, because it was meant to be used as a cyclic FIFO buffer and a plain array avoids an extra pointer indirection, while its FIFOness eant that we are not "moving" the entries like this patch does. The runtime from three versions were comparable. It seems to make the resulting chain even shorter, which can only be good. (stock "master") 15782196 bytes chain length = 1: 2972 objects chain length = 2: 2651 objects chain length = 3: 2369 objects chain length = 4: 2121 objects chain length = 5: 1877 objects ... chain length = 46: 490 objects chain length = 47: 515 objects chain length = 48: 527 objects chain length = 49: 570 objects chain length = 50: 408 objects (with your patch) 15745736 bytes (0.23% smaller) chain length = 1: 3137 objects chain length = 2: 2688 objects chain length = 3: 2322 objects chain length = 4: 2146 objects chain length = 5: 1824 objects ... chain length = 46: 503 objects chain length = 47: 509 objects chain length = 48: 536 objects chain length = 49: 588 objects chain length = 50: 357 objects (with this patch) 15612086 bytes (1.08% smaller) chain length = 1: 4831 objects chain length = 2: 3811 objects chain length = 3: 2964 objects chain length = 4: 2352 objects chain length = 5: 1944 objects ... chain length = 46: 327 objects chain length = 47: 353 objects chain length = 48: 304 objects chain length = 49: 298 objects chain length = 50: 135 objects [jc: this is with code simplification follow-up from Nico] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-02 14:53:47 +08:00
/*
* Move the best delta base up in the window, after the
* currently deltified object, to keep it longer. It will
* be the first base object to be attempted next.
*/
if (entry->delta) {
struct unpacked swap = array[best_base];
int dist = (window + idx - best_base) % window;
int dst = best_base;
while (dist--) {
int src = (dst + 1) % window;
array[dst] = array[src];
dst = src;
}
array[dst] = swap;
}
next:
idx++;
if (count + 1 < window)
count++;
if (idx >= window)
idx = 0;
}
for (i = 0; i < window; ++i) {
free_delta_index(array[i].index);
free(array[i].data);
}
free(array);
}
#ifndef NO_PTHREADS
static void try_to_free_from_threads(size_t size)
{
read_lock();
release_pack_memory(size);
read_unlock();
}
static try_to_free_t old_try_to_free_routine;
/*
* The main thread waits on the condition that (at least) one of the workers
* has stopped working (which is indicated in the .working member of
* struct thread_params).
* When a work thread has completed its work, it sets .working to 0 and
* signals the main thread and waits on the condition that .data_ready
* becomes 1.
*/
struct thread_params {
pthread_t thread;
struct object_entry **list;
unsigned list_size;
unsigned remaining;
int window;
int depth;
int working;
int data_ready;
pthread_mutex_t mutex;
pthread_cond_t cond;
unsigned *processed;
};
static pthread_cond_t progress_cond;
/*
* Mutex and conditional variable can't be statically-initialized on Windows.
*/
static void init_threaded_search(void)
{
init_recursive_mutex(&read_mutex);
pthread_mutex_init(&cache_mutex, NULL);
pthread_mutex_init(&progress_mutex, NULL);
pthread_cond_init(&progress_cond, NULL);
old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
}
static void cleanup_threaded_search(void)
{
set_try_to_free_routine(old_try_to_free_routine);
pthread_cond_destroy(&progress_cond);
pthread_mutex_destroy(&read_mutex);
pthread_mutex_destroy(&cache_mutex);
pthread_mutex_destroy(&progress_mutex);
}
static void *threaded_find_deltas(void *arg)
{
struct thread_params *me = arg;
while (me->remaining) {
find_deltas(me->list, &me->remaining,
me->window, me->depth, me->processed);
progress_lock();
me->working = 0;
pthread_cond_signal(&progress_cond);
progress_unlock();
/*
* We must not set ->data_ready before we wait on the
* condition because the main thread may have set it to 1
* before we get here. In order to be sure that new
* work is available if we see 1 in ->data_ready, it
* was initialized to 0 before this thread was spawned
* and we reset it to 0 right away.
*/
pthread_mutex_lock(&me->mutex);
while (!me->data_ready)
pthread_cond_wait(&me->cond, &me->mutex);
me->data_ready = 0;
pthread_mutex_unlock(&me->mutex);
}
/* leave ->working 1 so that this doesn't get more work assigned */
return NULL;
}
static void ll_find_deltas(struct object_entry **list, unsigned list_size,
int window, int depth, unsigned *processed)
{
struct thread_params *p;
int i, ret, active_threads = 0;
init_threaded_search();
if (delta_search_threads <= 1) {
find_deltas(list, &list_size, window, depth, processed);
cleanup_threaded_search();
return;
}
if (progress > pack_to_stdout)
fprintf(stderr, "Delta compression using up to %d threads.\n",
delta_search_threads);
p = xcalloc(delta_search_threads, sizeof(*p));
/* Partition the work amongst work threads. */
for (i = 0; i < delta_search_threads; i++) {
unsigned sub_size = list_size / (delta_search_threads - i);
/* don't use too small segments or no deltas will be found */
if (sub_size < 2*window && i+1 < delta_search_threads)
sub_size = 0;
p[i].window = window;
p[i].depth = depth;
p[i].processed = processed;
p[i].working = 1;
p[i].data_ready = 0;
/* try to split chunks on "path" boundaries */
while (sub_size && sub_size < list_size &&
list[sub_size]->hash &&
list[sub_size]->hash == list[sub_size-1]->hash)
sub_size++;
p[i].list = list;
p[i].list_size = sub_size;
p[i].remaining = sub_size;
list += sub_size;
list_size -= sub_size;
}
/* Start work threads. */
for (i = 0; i < delta_search_threads; i++) {
if (!p[i].list_size)
continue;
pthread_mutex_init(&p[i].mutex, NULL);
pthread_cond_init(&p[i].cond, NULL);
ret = pthread_create(&p[i].thread, NULL,
threaded_find_deltas, &p[i]);
if (ret)
die("unable to create thread: %s", strerror(ret));
active_threads++;
}
/*
* Now let's wait for work completion. Each time a thread is done
* with its work, we steal half of the remaining work from the
* thread with the largest number of unprocessed objects and give
* it to that newly idle thread. This ensure good load balancing
* until the remaining object list segments are simply too short
* to be worth splitting anymore.
*/
while (active_threads) {
struct thread_params *target = NULL;
struct thread_params *victim = NULL;
unsigned sub_size = 0;
progress_lock();
for (;;) {
for (i = 0; !target && i < delta_search_threads; i++)
if (!p[i].working)
target = &p[i];
if (target)
break;
pthread_cond_wait(&progress_cond, &progress_mutex);
}
for (i = 0; i < delta_search_threads; i++)
if (p[i].remaining > 2*window &&
(!victim || victim->remaining < p[i].remaining))
victim = &p[i];
if (victim) {
sub_size = victim->remaining / 2;
list = victim->list + victim->list_size - sub_size;
while (sub_size && list[0]->hash &&
list[0]->hash == list[-1]->hash) {
list++;
sub_size--;
}
if (!sub_size) {
/*
* It is possible for some "paths" to have
* so many objects that no hash boundary
* might be found. Let's just steal the
* exact half in that case.
*/
sub_size = victim->remaining / 2;
list -= sub_size;
}
target->list = list;
victim->list_size -= sub_size;
victim->remaining -= sub_size;
}
target->list_size = sub_size;
target->remaining = sub_size;
target->working = 1;
progress_unlock();
pthread_mutex_lock(&target->mutex);
target->data_ready = 1;
pthread_cond_signal(&target->cond);
pthread_mutex_unlock(&target->mutex);
if (!sub_size) {
pthread_join(target->thread, NULL);
pthread_cond_destroy(&target->cond);
pthread_mutex_destroy(&target->mutex);
active_threads--;
}
}
cleanup_threaded_search();
free(p);
}
#else
#define ll_find_deltas(l, s, w, d, p) find_deltas(l, &s, w, d, p)
#endif
pack-objects: walk tag chains for --include-tag When pack-objects is given --include-tag, it peels each tag ref down to a non-tag object, and if that non-tag object is going to be packed, we include the tag, too. But what happens if we have a chain of tags (e.g., tag "A" points to tag "B", which points to commit "C")? We'll peel down to "C" and realize that we want to include tag "A", but we do not ever consider tag "B", leading to a broken pack (assuming "B" was not otherwise selected). Instead, we have to walk the whole chain, adding any tags we find to the pack. Interestingly, it doesn't seem possible to trigger this problem with "git fetch", but you can with "git clone --single-branch". The reason is that we generate the correct pack when the client explicitly asks for "A" (because we do a real reachability analysis there), and "fetch" is more willing to do so. There are basically two cases: 1. If "C" is already a ref tip, then the client can deduce that it needs "A" itself (via find_non_local_tags), and will ask for it explicitly rather than relying on the include-tag capability. Everything works. 2. If "C" is not already a ref tip, then we hope for include-tag to send us the correct tag. But it doesn't; it generates a broken pack. However, the next step is to do a follow-up run of find_non_local_tags(), followed by fetch_refs() to backfill any tags we learned about. In the normal case, fetch_refs() calls quickfetch(), which does a connectivity check and sees we have no new objects to fetch. We just write the refs. But for the broken-pack case, the connectivity check fails, and quickfetch will follow-up with the remote, asking explicitly for each of the ref tips. This picks up the missing object in a new pack. For a regular "git clone", we are similarly OK, because we explicitly request all of the tag refs, and get a correct pack. But with "--single-branch", we kick in tag auto-following via "include-tag", but do _not_ do a follow-up backfill. We just take whatever the server sent us via include-tag and write out tag refs for any tag objects we were sent. So prior to c6807a4 (clone: open a shortcut for connectivity check, 2013-05-26), we actually claimed the clone was a success, but the result was silently corrupted! Since c6807a4, index-pack's connectivity check catches this case, and we correctly complain. The included test directly checks that pack-objects does not generate a broken pack, but also confirms that "clone --single-branch" does not hit the bug. Note that tag chains introduce another interesting question: if we are packing the tag "B" but not the commit "C", should "A" be included? Both before and after this patch, we do not include "A", because the initial peel_ref() check only knows about the bottom-most level, "C". To realize that "B" is involved at all, we would have to switch to an incremental peel, in which we examine each tagged object, asking if it is being packed (and including the outer tag if so). But that runs contrary to the optimizations in peel_ref(), which avoid accessing the objects at all, in favor of using the value we pull from packed-refs. It's OK to walk the whole chain once we know we're going to include the tag (we have to access it anyway, so the effort is proportional to the pack we're generating). But for the initial selection, we have to look at every ref. If we're only packing a few objects, we'd still have to parse every single referenced tag object just to confirm that it isn't part of a tag chain. This could be addressed if packed-refs stored the complete tag chain for each peeled ref (in most cases, this would be the same cost as now, as each "chain" is only a single link). But given the size of that project, it's out of scope for this fix (and probably nobody cares enough anyway, as it's such an obscure situation). This commit limits itself to just avoiding the creation of a broken pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-06 05:52:26 +08:00
static void add_tag_chain(const struct object_id *oid)
{
struct tag *tag;
/*
* We catch duplicates already in add_object_entry(), but we'd
* prefer to do this extra check to avoid having to parse the
* tag at all if we already know that it's being packed (e.g., if
* it was included via bitmaps, we would not have parsed it
* previously).
*/
if (packlist_find(&to_pack, oid->hash, NULL))
return;
tag = lookup_tag(oid->hash);
while (1) {
if (!tag || parse_tag(tag) || !tag->tagged)
die("unable to pack objects reachable from tag %s",
oid_to_hex(oid));
add_object_entry(tag->object.oid.hash, OBJ_TAG, NULL, 0);
if (tag->tagged->type != OBJ_TAG)
return;
tag = (struct tag *)tag->tagged;
}
}
static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
{
struct object_id peeled;
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, peeled.hash) && /* peelable? */
packlist_find(&to_pack, peeled.hash, NULL)) /* object packed? */
pack-objects: walk tag chains for --include-tag When pack-objects is given --include-tag, it peels each tag ref down to a non-tag object, and if that non-tag object is going to be packed, we include the tag, too. But what happens if we have a chain of tags (e.g., tag "A" points to tag "B", which points to commit "C")? We'll peel down to "C" and realize that we want to include tag "A", but we do not ever consider tag "B", leading to a broken pack (assuming "B" was not otherwise selected). Instead, we have to walk the whole chain, adding any tags we find to the pack. Interestingly, it doesn't seem possible to trigger this problem with "git fetch", but you can with "git clone --single-branch". The reason is that we generate the correct pack when the client explicitly asks for "A" (because we do a real reachability analysis there), and "fetch" is more willing to do so. There are basically two cases: 1. If "C" is already a ref tip, then the client can deduce that it needs "A" itself (via find_non_local_tags), and will ask for it explicitly rather than relying on the include-tag capability. Everything works. 2. If "C" is not already a ref tip, then we hope for include-tag to send us the correct tag. But it doesn't; it generates a broken pack. However, the next step is to do a follow-up run of find_non_local_tags(), followed by fetch_refs() to backfill any tags we learned about. In the normal case, fetch_refs() calls quickfetch(), which does a connectivity check and sees we have no new objects to fetch. We just write the refs. But for the broken-pack case, the connectivity check fails, and quickfetch will follow-up with the remote, asking explicitly for each of the ref tips. This picks up the missing object in a new pack. For a regular "git clone", we are similarly OK, because we explicitly request all of the tag refs, and get a correct pack. But with "--single-branch", we kick in tag auto-following via "include-tag", but do _not_ do a follow-up backfill. We just take whatever the server sent us via include-tag and write out tag refs for any tag objects we were sent. So prior to c6807a4 (clone: open a shortcut for connectivity check, 2013-05-26), we actually claimed the clone was a success, but the result was silently corrupted! Since c6807a4, index-pack's connectivity check catches this case, and we correctly complain. The included test directly checks that pack-objects does not generate a broken pack, but also confirms that "clone --single-branch" does not hit the bug. Note that tag chains introduce another interesting question: if we are packing the tag "B" but not the commit "C", should "A" be included? Both before and after this patch, we do not include "A", because the initial peel_ref() check only knows about the bottom-most level, "C". To realize that "B" is involved at all, we would have to switch to an incremental peel, in which we examine each tagged object, asking if it is being packed (and including the outer tag if so). But that runs contrary to the optimizations in peel_ref(), which avoid accessing the objects at all, in favor of using the value we pull from packed-refs. It's OK to walk the whole chain once we know we're going to include the tag (we have to access it anyway, so the effort is proportional to the pack we're generating). But for the initial selection, we have to look at every ref. If we're only packing a few objects, we'd still have to parse every single referenced tag object just to confirm that it isn't part of a tag chain. This could be addressed if packed-refs stored the complete tag chain for each peeled ref (in most cases, this would be the same cost as now, as each "chain" is only a single link). But given the size of that project, it's out of scope for this fix (and probably nobody cares enough anyway, as it's such an obscure situation). This commit limits itself to just avoiding the creation of a broken pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-06 05:52:26 +08:00
add_tag_chain(oid);
return 0;
}
static void prepare_pack(int window, int depth)
{
struct object_entry **delta_list;
uint32_t i, nr_deltas;
unsigned n;
get_object_details();
close another possibility for propagating pack corruption Abstract -------- With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here. The Context ----------- The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database. First, the loose object format is entirely compressed with zlib which already provide a CRC verification of its own when inflating data. Any disk corruption would be caught already in this case. Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leave them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and deflated size must always match the size recorded in the object header, so those cases are pretty much covered as well. Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking. Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such case. The Solution ------------ As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs. So the best compromize is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack. Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack. What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusty and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositoryes already. It is therefore worthless doing local validation in that case. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-31 23:31:08 +08:00
/*
* If we're locally repacking then we need to be doubly careful
* from now on in order to make sure no stealth corruption gets
* propagated to the new pack. Clients receiving streamed packs
* should validate everything they get anyway so no need to incur
* the additional cost here in that case.
*/
if (!pack_to_stdout)
do_check_packed_object_crc = 1;
if (!to_pack.nr_objects || !window || !depth)
return;
ALLOC_ARRAY(delta_list, to_pack.nr_objects);
nr_deltas = n = 0;
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = to_pack.objects + i;
if (entry->delta)
/* This happens if we decided to reuse existing
* delta from a pack. "reuse_delta &&" is implied.
*/
continue;
if (entry->size < 50)
continue;
if (entry->no_try_delta)
continue;
if (!entry->preferred_base) {
nr_deltas++;
if (entry->type < 0)
die("unable to get type of object %s",
sha1_to_hex(entry->idx.sha1));
} else {
if (entry->type < 0) {
/*
* This object is not found, but we
* don't have to include it anyway.
*/
continue;
}
}
delta_list[n++] = entry;
}
if (nr_deltas && n > 1) {
unsigned nr_done = 0;
if (progress)
progress_state = start_progress(_("Compressing objects"),
nr_deltas);
QSORT(delta_list, n, type_size_sort);
ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
stop_progress(&progress_state);
if (nr_done != nr_deltas)
die("inconsistency with delta count");
}
free(delta_list);
}
static int git_pack_config(const char *k, const char *v, void *cb)
{
if (!strcmp(k, "pack.window")) {
window = git_config_int(k, v);
return 0;
}
if (!strcmp(k, "pack.windowmemory")) {
window_memory_limit = git_config_ulong(k, v);
return 0;
}
if (!strcmp(k, "pack.depth")) {
depth = git_config_int(k, v);
return 0;
}
if (!strcmp(k, "pack.deltacachesize")) {
max_delta_cache_size = git_config_int(k, v);
return 0;
}
if (!strcmp(k, "pack.deltacachelimit")) {
cache_max_small_delta_size = git_config_int(k, v);
return 0;
}
pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:45 +08:00
if (!strcmp(k, "pack.writebitmaphashcache")) {
if (git_config_bool(k, v))
write_bitmap_options |= BITMAP_OPT_HASH_CACHE;
else
write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
}
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
if (!strcmp(k, "pack.usebitmaps")) {
pack-objects: use reachability bitmap index when generating non-stdout pack Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:44 +08:00
use_bitmap_index_default = git_config_bool(k, v);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
return 0;
}
if (!strcmp(k, "pack.threads")) {
delta_search_threads = git_config_int(k, v);
if (delta_search_threads < 0)
die("invalid number of threads specified (%d)",
delta_search_threads);
#ifdef NO_PTHREADS
if (delta_search_threads != 1)
warning("no threads support, ignoring %s", k);
#endif
return 0;
}
if (!strcmp(k, "pack.indexversion")) {
pack_idx_opts.version = git_config_int(k, v);
if (pack_idx_opts.version > 2)
die("bad pack.indexversion=%"PRIu32,
pack_idx_opts.version);
return 0;
}
return git_default_config(k, v, cb);
}
static void read_object_list_from_stdin(void)
{
char line[40 + 1 + PATH_MAX + 2];
unsigned char sha1[20];
for (;;) {
if (!fgets(line, sizeof(line), stdin)) {
if (feof(stdin))
break;
if (!ferror(stdin))
die("fgets returned NULL, not EOF, not error!");
if (errno != EINTR)
die_errno("fgets");
clearerr(stdin);
continue;
}
if (line[0] == '-') {
if (get_sha1_hex(line+1, sha1))
die("expected edge sha1, got garbage:\n %s",
line);
add_preferred_base(sha1);
continue;
}
if (get_sha1_hex(line, sha1))
die("expected sha1, got garbage:\n %s", line);
add_preferred_base_object(line+41);
add_object_entry(sha1, 0, line+41, 0);
}
}
#define OBJECT_ADDED (1u<<20)
static void show_commit(struct commit *commit, void *data)
{
add_object_entry(commit->object.oid.hash, OBJ_COMMIT, NULL, 0);
commit->object.flags |= OBJECT_ADDED;
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
if (write_bitmap_index)
index_commit_for_bitmap(commit);
}
static void show_object(struct object *obj, const char *name, void *data)
{
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 08:27:58 +08:00
add_preferred_base_object(name);
add_object_entry(obj->oid.hash, obj->type, name, 0);
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 08:27:58 +08:00
obj->flags |= OBJECT_ADDED;
}
static void show_edge(struct commit *commit)
{
add_preferred_base(commit->object.oid.hash);
}
struct in_pack_object {
off_t offset;
struct object *object;
};
struct in_pack {
int alloc;
int nr;
struct in_pack_object *array;
};
static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
{
in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->oid.hash, p);
in_pack->array[in_pack->nr].object = object;
in_pack->nr++;
}
/*
* Compare the objects in the offset order, in order to emulate the
* "git rev-list --objects" output that produced the pack originally.
*/
static int ofscmp(const void *a_, const void *b_)
{
struct in_pack_object *a = (struct in_pack_object *)a_;
struct in_pack_object *b = (struct in_pack_object *)b_;
if (a->offset < b->offset)
return -1;
else if (a->offset > b->offset)
return 1;
else
return oidcmp(&a->object->oid, &b->object->oid);
}
static void add_objects_in_unpacked_packs(struct rev_info *revs)
{
struct packed_git *p;
struct in_pack in_pack;
uint32_t i;
memset(&in_pack, 0, sizeof(in_pack));
for (p = packed_git; p; p = p->next) {
const unsigned char *sha1;
struct object *o;
if (!p->pack_local || p->pack_keep)
continue;
if (open_pack_index(p))
die("cannot open pack index");
ALLOC_GROW(in_pack.array,
in_pack.nr + p->num_objects,
in_pack.alloc);
for (i = 0; i < p->num_objects; i++) {
sha1 = nth_packed_object_sha1(p, i);
o = lookup_unknown_object(sha1);
if (!(o->flags & OBJECT_ADDED))
mark_in_pack_object(o, p, &in_pack);
o->flags |= OBJECT_ADDED;
}
}
if (in_pack.nr) {
QSORT(in_pack.array, in_pack.nr, ofscmp);
for (i = 0; i < in_pack.nr; i++) {
struct object *o = in_pack.array[i].object;
add_object_entry(o->oid.hash, o->type, "", 0);
}
}
free(in_pack.array);
}
static int add_loose_object(const unsigned char *sha1, const char *path,
void *data)
{
enum object_type type = sha1_object_info(sha1, NULL);
if (type < 0) {
warning("loose object at %s could not be examined", path);
return 0;
}
add_object_entry(sha1, type, "", 0);
return 0;
}
/*
* We actually don't even have to worry about reachability here.
* add_object_entry will weed out duplicates, so we just add every
* loose object we find.
*/
static void add_unreachable_loose_objects(void)
{
for_each_loose_file_in_objdir(get_object_directory(),
add_loose_object,
NULL, NULL, NULL);
}
static int has_sha1_pack_kept_or_nonlocal(const unsigned char *sha1)
{
static struct packed_git *last_found = (void *)1;
struct packed_git *p;
p = (last_found != (void *)1) ? last_found : packed_git;
while (p) {
if ((!p->pack_local || p->pack_keep) &&
find_pack_entry_one(sha1, p)) {
last_found = p;
return 1;
}
if (p == last_found)
p = packed_git;
else
p = p->next;
if (p == last_found)
p = p->next;
}
return 0;
}
/*
* Store a list of sha1s that are should not be discarded
* because they are either written too recently, or are
* reachable from another object that was.
*
* This is filled by get_object_list.
*/
static struct sha1_array recent_objects;
static int loosened_object_can_be_discarded(const unsigned char *sha1,
unsigned long mtime)
{
if (!unpack_unreachable_expiration)
return 0;
if (mtime > unpack_unreachable_expiration)
return 0;
if (sha1_array_lookup(&recent_objects, sha1) >= 0)
return 0;
return 1;
}
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
static void loosen_unused_packed_objects(struct rev_info *revs)
{
struct packed_git *p;
uint32_t i;
const unsigned char *sha1;
for (p = packed_git; p; p = p->next) {
if (!p->pack_local || p->pack_keep)
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
continue;
if (open_pack_index(p))
die("cannot open pack index");
for (i = 0; i < p->num_objects; i++) {
sha1 = nth_packed_object_sha1(p, i);
if (!packlist_find(&to_pack, sha1, NULL) &&
!has_sha1_pack_kept_or_nonlocal(sha1) &&
!loosened_object_can_be_discarded(sha1, p->mtime))
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
if (force_object_loose(sha1, p->mtime))
die("unable to force loose object");
}
}
}
/*
pack-objects: use reachability bitmap index when generating non-stdout pack Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:44 +08:00
* This tracks any options which pack-reuse code expects to be on, or which a
* reader of the pack might not understand, and which would therefore prevent
* blind reuse of what we have on disk.
*/
static int pack_options_allow_reuse(void)
{
pack-objects: use reachability bitmap index when generating non-stdout pack Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:44 +08:00
return pack_to_stdout && allow_ofs_delta;
}
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
static int get_object_list_from_bitmap(struct rev_info *revs)
{
if (prepare_bitmap_walk(revs) < 0)
return -1;
if (pack_options_allow_reuse() &&
!reuse_partial_packfile_from_bitmap(
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
&reuse_packfile,
&reuse_packfile_objects,
&reuse_packfile_offset)) {
assert(reuse_packfile_objects);
nr_result += reuse_packfile_objects;
display_progress(progress_state, nr_result);
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
}
traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
return 0;
}
static void record_recent_object(struct object *obj,
const char *name,
void *data)
{
sha1_array_append(&recent_objects, obj->oid.hash);
}
static void record_recent_commit(struct commit *commit, void *data)
{
sha1_array_append(&recent_objects, commit->object.oid.hash);
}
static void get_object_list(int ac, const char **av)
{
struct rev_info revs;
char line[1000];
int flags = 0;
init_revisions(&revs, NULL);
save_commit_buffer = 0;
setup_revisions(ac, av, &revs, NULL);
/* make sure shallows are read */
is_repository_shallow();
while (fgets(line, sizeof(line), stdin) != NULL) {
int len = strlen(line);
if (len && line[len - 1] == '\n')
line[--len] = 0;
if (!len)
break;
if (*line == '-') {
if (!strcmp(line, "--not")) {
flags ^= UNINTERESTING;
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
write_bitmap_index = 0;
continue;
}
if (starts_with(line, "--shallow ")) {
unsigned char sha1[20];
if (get_sha1_hex(line + 10, sha1))
die("not an SHA-1 '%s'", line + 10);
register_shallow(sha1);
pack-objects: turn off bitmaps when we see --shallow lines Reachability bitmaps do not work with shallow operations, because they cache a view of the object reachability that represents the true objects. Whereas a shallow repository (or a shallow operation in a repository) is inherently cutting off the object graph with a graft. We explicitly disallow the use of bitmaps in shallow repositories by checking is_repository_shallow(), and we should continue to do that. However, we also want to disallow bitmaps when we are serving a fetch to a shallow client, since we momentarily take on their grafted view of the world. It used to be enough to call is_repository_shallow at the start of pack-objects. Upload-pack wrote the other side's shallow state to a temporary file and pointed the whole pack-objects process at this state with "git --shallow-file", and from the perspective of pack-objects, we really were in a shallow repo. But since b790e0f (upload-pack: send shallow info over stdin to pack-objects, 2014-03-11), we do it differently: we send --shallow lines to pack-objects over stdin, and it registers them itself. This means that our is_repository_shallow check is way too early (we have not been told about the shallowness yet), and that it is insufficient (calling is_repository_shallow is not enough, as the shallow grafts we register do not change its return value). Instead, we can just turn off bitmaps explicitly when we see these lines. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-12 12:34:53 +08:00
use_bitmap_index = 0;
continue;
}
die("not a rev '%s'", line);
}
if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
die("bad revision '%s'", line);
}
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
return;
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
traverse_commit_list(&revs, show_commit, show_object, NULL);
if (unpack_unreachable_expiration) {
revs.ignore_missing_links = 1;
if (add_unseen_recent_objects_to_traversal(&revs,
unpack_unreachable_expiration))
die("unable to add recent objects");
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
traverse_commit_list(&revs, record_recent_commit,
record_recent_object, NULL);
}
if (keep_unreachable)
add_objects_in_unpacked_packs(&revs);
if (pack_loose_unreachable)
add_unreachable_loose_objects();
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
if (unpack_unreachable)
loosen_unused_packed_objects(&revs);
sha1_array_clear(&recent_objects);
}
static int option_parse_index_version(const struct option *opt,
const char *arg, int unset)
{
char *c;
const char *val = arg;
pack_idx_opts.version = strtoul(val, &c, 10);
if (pack_idx_opts.version > 2)
die(_("unsupported index version %s"), val);
if (*c == ',' && c[1])
pack_idx_opts.off32_limit = strtoul(c+1, &c, 0);
if (*c || pack_idx_opts.off32_limit & 0x80000000)
die(_("bad index version '%s'"), val);
return 0;
}
static int option_parse_unpack_unreachable(const struct option *opt,
const char *arg, int unset)
{
if (unset) {
unpack_unreachable = 0;
unpack_unreachable_expiration = 0;
}
else {
unpack_unreachable = 1;
if (arg)
unpack_unreachable_expiration = approxidate(arg);
}
return 0;
}
int cmd_pack_objects(int argc, const char **argv, const char *prefix)
{
int use_internal_rev_list = 0;
int thin = 0;
int shallow = 0;
int all_progress_implied = 0;
struct argv_array rp = ARGV_ARRAY_INIT;
int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
int rev_list_index = 0;
struct option pack_objects_options[] = {
OPT_SET_INT('q', "quiet", &progress,
N_("do not show progress meter"), 0),
OPT_SET_INT(0, "progress", &progress,
N_("show progress meter"), 1),
OPT_SET_INT(0, "all-progress", &progress,
N_("show progress meter during object writing phase"), 2),
OPT_BOOL(0, "all-progress-implied",
&all_progress_implied,
N_("similar to --all-progress when progress meter is shown")),
{ OPTION_CALLBACK, 0, "index-version", NULL, N_("version[,offset]"),
N_("write the pack index file in the specified idx format version"),
0, option_parse_index_version },
OPT_MAGNITUDE(0, "max-pack-size", &pack_size_limit,
N_("maximum size of each output pack file")),
OPT_BOOL(0, "local", &local,
N_("ignore borrowed objects from alternate object store")),
OPT_BOOL(0, "incremental", &incremental,
N_("ignore packed objects")),
OPT_INTEGER(0, "window", &window,
N_("limit pack window by objects")),
OPT_MAGNITUDE(0, "window-memory", &window_memory_limit,
N_("limit pack window by memory in addition to object limit")),
OPT_INTEGER(0, "depth", &depth,
N_("maximum length of delta chain allowed in the resulting pack")),
OPT_BOOL(0, "reuse-delta", &reuse_delta,
N_("reuse existing deltas")),
OPT_BOOL(0, "reuse-object", &reuse_object,
N_("reuse existing objects")),
OPT_BOOL(0, "delta-base-offset", &allow_ofs_delta,
N_("use OFS_DELTA objects")),
OPT_INTEGER(0, "threads", &delta_search_threads,
N_("use threads when searching for best delta matches")),
OPT_BOOL(0, "non-empty", &non_empty,
N_("do not create an empty pack output")),
OPT_BOOL(0, "revs", &use_internal_rev_list,
N_("read revision arguments from standard input")),
{ OPTION_SET_INT, 0, "unpacked", &rev_list_unpacked, NULL,
N_("limit the objects to those that are not yet packed"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
{ OPTION_SET_INT, 0, "all", &rev_list_all, NULL,
N_("include objects reachable from any reference"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
{ OPTION_SET_INT, 0, "reflog", &rev_list_reflog, NULL,
N_("include objects referred by reflog entries"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
{ OPTION_SET_INT, 0, "indexed-objects", &rev_list_index, NULL,
N_("include objects referred to by the index"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
OPT_BOOL(0, "stdout", &pack_to_stdout,
N_("output pack to stdout")),
OPT_BOOL(0, "include-tag", &include_tag,
N_("include tag objects that refer to objects to be packed")),
OPT_BOOL(0, "keep-unreachable", &keep_unreachable,
N_("keep unreachable objects")),
OPT_BOOL(0, "pack-loose-unreachable", &pack_loose_unreachable,
N_("pack loose unreachable objects")),
{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
N_("unpack unreachable objects newer than <time>"),
PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
OPT_BOOL(0, "thin", &thin,
N_("create thin packs")),
OPT_BOOL(0, "shallow", &shallow,
N_("create packs suitable for shallow fetches")),
OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep,
N_("ignore packs that have companion .keep file")),
OPT_INTEGER(0, "compression", &pack_compression_level,
N_("pack compression level")),
OPT_SET_INT(0, "keep-true-parents", &grafts_replace_parents,
N_("do not hide commits by grafts"), 0),
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
OPT_BOOL(0, "use-bitmap-index", &use_bitmap_index,
N_("use a bitmap index if available to speed up counting objects")),
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
N_("write a bitmap index together with the pack index")),
OPT_END(),
};
check_replace_refs = 0;
reset_pack_idx_option(&pack_idx_opts);
git_config(git_pack_config, NULL);
progress = isatty(2);
argc = parse_options(argc, argv, prefix, pack_objects_options,
pack_usage, 0);
if (argc) {
base_name = argv[0];
argc--;
}
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
argv_array_push(&rp, shallow
? "--objects-edge-aggressive"
: "--objects-edge");
} else
argv_array_push(&rp, "--objects");
if (rev_list_all) {
use_internal_rev_list = 1;
argv_array_push(&rp, "--all");
}
if (rev_list_reflog) {
use_internal_rev_list = 1;
argv_array_push(&rp, "--reflog");
}
if (rev_list_index) {
use_internal_rev_list = 1;
argv_array_push(&rp, "--indexed-objects");
}
if (rev_list_unpacked) {
use_internal_rev_list = 1;
argv_array_push(&rp, "--unpacked");
}
if (!reuse_object)
reuse_delta = 0;
if (pack_compression_level == -1)
pack_compression_level = Z_DEFAULT_COMPRESSION;
else if (pack_compression_level < 0 || pack_compression_level > Z_BEST_COMPRESSION)
die("bad pack compression level %d", pack_compression_level);
if (!delta_search_threads) /* --threads=0 means autodetect */
delta_search_threads = online_cpus();
#ifdef NO_PTHREADS
if (delta_search_threads != 1)
warning("no threads support, ignoring --threads");
#endif
if (!pack_to_stdout && !pack_size_limit)
pack_size_limit = pack_size_limit_cfg;
if (pack_to_stdout && pack_size_limit)
die("--max-pack-size cannot be used to build a pack for transfer.");
if (pack_size_limit && pack_size_limit < 1024*1024) {
warning("minimum pack size limit is 1 MiB");
pack_size_limit = 1024*1024;
}
if (!pack_to_stdout && thin)
die("--thin cannot be used to build an indexable pack.");
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
if (keep_unreachable && unpack_unreachable)
die("--keep-unreachable and --unpack-unreachable are incompatible.");
if (!rev_list_all || !rev_list_reflog || !rev_list_index)
unpack_unreachable_expiration = 0;
let pack-objects do the writing of unreachable objects as loose objects Commit ccc1297226b184c40459e9d373cc9eebfb7bd898 changed the behavior of 'git repack -A' so unreachable objects are stored as loose objects. However it did so in a naive and inn efficient way by making packs about to be deleted inaccessible and feeding their content through 'git unpack-objects'. While this works, there are major flaws with this approach: - It is unacceptably sloooooooooooooow. In the Linux kernel repository with no actual unreachable objects, doing 'git repack -A -d' before: real 2m33.220s user 2m21.675s sys 0m3.510s And with this change: real 0m36.849s user 0m24.365s sys 0m1.950s For reference, here's the timing for 'git repack -a -d': real 0m35.816s user 0m22.571s sys 0m2.011s This is explained by the fact that 'git unpack-objects' was used to unpack _every_ objects even if (almost) 100% of them were thrown away. - There is a black out period. Between the removal of the .idx file for the redundant pack and the completion of its unpacking, the unreachable objects become completely unaccessible. This is not a big issue as we're talking about unreachable objects, but some consistency is always good. - There is no way to easily set a sensible mtime for the newly created unreachable loose objects. So, while having a command called "pack-objects" to perform object unpacking looks really odd, this is probably the best compromize to be able to solve the above issues in an efficient way. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-14 13:33:53 +08:00
pack-objects: use reachability bitmap index when generating non-stdout pack Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects) if a repository has bitmap index, pack-objects can nicely speedup "Counting objects" graph traversal phase. That however was done only for case when resultant pack is sent to stdout, not written into a file. The reason here is for on-disk repack by default we want: - to produce good pack (with bitmap index not-yet-packed objects are emitted to pack in suboptimal order). - to use more robust pack-generation codepath (avoiding possible bugs in bitmap code and possible bitmap index corruption). Jeff King further explains: The reason for this split is that pack-objects tries to determine how "careful" it should be based on whether we are packing to disk or to stdout. Packing to disk implies "git repack", and that we will likely delete the old packs after finishing. We want to be more careful (so as not to carry forward a corruption, and to generate a more optimal pack), and we presumably run less frequently and can afford extra CPU. Whereas packing to stdout implies serving a remote via "git fetch" or "git push". This happens more frequently (e.g., a server handling many fetching clients), and we assume the receiving end takes more responsibility for verifying the data. But this isn't always the case. One might want to generate on-disk packfiles for a specialized object transfer. Just using "--stdout" and writing to a file is not optimal, as it will not generate the matching pack index. So it would be useful to have some way of overriding this heuristic: to tell pack-objects that even though it should generate on-disk files, it is still OK to use the reachability bitmaps to do the traversal. So we can teach pack-objects to use bitmap index for initial object counting phase when generating resultant pack file too: - if we take care to not let it be activated under git-repack: See above about repack robustness and not forward-carrying corruption. - if we know bitmap index generation is not enabled for resultant pack: The current code has singleton bitmap_git, so it cannot work simultaneously with two bitmap indices. We also want to avoid (at least with current implementation) generating bitmaps off of bitmaps. The reason here is: when generating a pack, not-yet-packed objects will be emitted into pack in suboptimal order and added to tail of the bitmap as "extended entries". When the resultant pack + some new objects in associated repository are in turn used to generate another pack with bitmap, the situation repeats: new objects are again not emitted optimally and just added to bitmap tail - not in recency order. So the pack badness can grow over time when at each step we have bitmapped pack + some other objects. That's why we want to avoid generating bitmaps off of bitmaps, not to let pack badness grow. - if we keep pack reuse enabled still only for "send-to-stdout" case: Because pack-to-file needs to generate index for destination pack, and currently on pack reuse raw entries are directly written out to the destination pack by write_reused_pack(), bypassing needed for pack index generation bookkeeping done by regular codepath in write_one() and friends. ( In the future we might teach pack-reuse code about cases when index also needs to be generated for resultant pack and remove pack-reuse-only-for-stdout limitation ) This way for pack-objects -> file we get nice speedup: erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup repository managed by git-backup[2] via time echo 0186ac99 | git pack-objects --revs erp5pack before: 37.2s after: 26.2s And for `git repack -adb` packed git.git time echo 5c589a73 | git pack-objects --revs gitpack before: 7.1s after: 3.6s i.e. it can be 30% - 50% speedup for pack extraction. git-backup extracts many packs on repositories restoration. That was my initial motivation for the patch. [1] https://lab.nexedi.com/nexedi/erp5 [2] https://lab.nexedi.com/kirr/git-backup NOTE Jeff also suggests that pack.useBitmaps was probably a mistake to introduce originally. This way we are not adding another config point, but instead just always default to-file pack-objects not to use bitmap index: Tools which need to generate on-disk packs with using bitmap, can pass --use-bitmap-index explicitly. And git-repack does never pass --use-bitmap-index, so this way we can be sure regular on-disk repacking remains robust. NOTE2 `git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower than `git pack-objects file.pack`. Extracting erp5.git pack from lab.nexedi.com backup repository: $ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack real 0m22.309s user 0m21.148s sys 0m0.932s $ time git index-pack erp5pack-stdout.pack real 0m50.873s <-- more than 2 times slower than time to generate pack itself! user 0m49.300s sys 0m1.360s So the time for `pack-object --stdout >file.pack` + `index-pack file.pack` is 72s, while `pack-objects file.pack` which does both pack and index is 27s. And even `pack-objects --no-use-bitmap-index file.pack` is 37s. Jeff explains: The packfile does not carry the sha1 of the objects. A receiving index-pack has to compute them itself, including inflating and applying all of the deltas. that's why for `git-backup restore` we want to teach `git pack-objects file.pack` to use bitmaps instead of using `git pack-objects --stdout >file.pack` + `git index-pack file.pack`. NOTE3 The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh Test 56dfeb62 this tree -------------------------------------------------------------------------------- 5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8% 5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5% 5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0% 5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3% 5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0% 5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5% 5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2% More context: http://marc.info/?t=146792101400001&r=1&w=2 http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t Cc: Vicent Marti <tanoku@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 23:01:44 +08:00
/*
* "soft" reasons not to use bitmaps - for on-disk repack by default we want
*
* - to produce good pack (with bitmap index not-yet-packed objects are
* packed in suboptimal order).
*
* - to use more robust pack-generation codepath (avoiding possible
* bugs in bitmap code and possible bitmap index corruption).
*/
if (!pack_to_stdout)
use_bitmap_index_default = 0;
if (use_bitmap_index < 0)
use_bitmap_index = use_bitmap_index_default;
/* "hard" reasons not to use bitmaps; these just won't work at all */
if (!use_internal_rev_list || (!pack_to_stdout && write_bitmap_index) || is_repository_shallow())
pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:09 +08:00
use_bitmap_index = 0;
pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-21 22:00:16 +08:00
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
if (progress && all_progress_implied)
progress = 2;
prepare_packed_git();
pack-objects: compute local/ignore_pack_keep early In want_object_in_pack(), we can exit early from our loop if neither "local" nor "ignore_pack_keep" are set. If they are, however, we must examine each pack to see if it has the object and is non-local or has a ".keep". It's quite common for there to be no non-local or .keep packs at all, in which case we know ahead of time that looking further will be pointless. We can pre-compute this by simply iterating over the list of packs ahead of time, and dropping the flags if there are no packs that could match. Another similar strategy would be to modify the loop in want_object_in_pack() to notice that we have already found the object once, and that we are looping only to check for "local" and "keep" attributes. If a pack has neither of those, we can skip the call to find_pack_entry_one(), which is the expensive part of the loop. This has two advantages: - it isn't all-or-nothing; we still get some improvement when there's a small number of kept or non-local packs, and a large number of non-kept local packs - it eliminates any possible race where we add new non-local or kept packs after our initial scan. In practice, I don't think this race matters; we already cache the packed_git information, so somebody who adds a new pack or .keep file after we've started will not be noticed at all, unless we happen to need to call reprepare_packed_git() because a lookup fails. In other words, we're already racy, and the race is not a big deal (losing the race means we might include an object in the pack that would not otherwise be, which is an acceptable outcome). However, it also has a disadvantage: we still loop over the rest of the packs for each object to check their flags. This is much less expensive than doing the object lookup, but still not free. So if we wanted to implement that strategy to cover the non-all-or-nothing cases, we could do so in addition to this one (so you get the most speedup in the all-or-nothing case, and the best we can do in the other cases). But given that the all-or-nothing case is likely the most common, it is probably not worth the trouble, and we can revisit this later if evidence points otherwise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 12:11:31 +08:00
if (ignore_packed_keep) {
struct packed_git *p;
for (p = packed_git; p; p = p->next)
if (p->pack_local && p->pack_keep)
break;
if (!p) /* no keep-able packs found */
ignore_packed_keep = 0;
}
if (local) {
/*
* unlike ignore_packed_keep above, we do not want to
* unset "local" based on looking at packs, as it
* also covers non-local objects
*/
struct packed_git *p;
for (p = packed_git; p; p = p->next) {
if (!p->pack_local) {
have_non_local_packs = 1;
break;
}
}
}
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
read_object_list_from_stdin();
else {
get_object_list(rp.argc, rp.argv);
argv_array_clear(&rp);
}
cleanup_preferred_base();
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
if (non_empty && !nr_result)
return 0;
if (nr_result)
prepare_pack(window, depth);
write_pack_file();
pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 03:55:51 +08:00
if (progress)
fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),"
" reused %"PRIu32" (delta %"PRIu32")\n",
written, written_delta, reused, reused_delta);
return 0;
}