mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-24 02:17:02 +08:00

Author	SHA1	Message	Date
Elijah Newren	ea305a68fd	merge-ort: add a special minimal index just for renormalization renormalize_buffer() requires an index_state, which is something that merge-ort does not operate with. However, all the renormalization code needs is an index with a .gitattributes file...plus a little bit of setup. Create such an index, along with the deallocation and attr_direction handling. A subsequent commit will add a function to finish the initialization of this index. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-20 12:35:40 -07:00
Elijah Newren	72b3091040	merge-ort: use STABLE_QSORT instead of QSORT where required rename/rename conflict handling depends on the fact that if both sides renamed the same path, that the one on the MERGE_SIDE1 will appear first in the combined diff_queue_struct passed to process_renames(). Since we add all pairs from MERGE_SIDE1 to combined first, and then all pairs from MERGE_SIDE2, and then sort based on filename, this will only be true if the sort used is stable. This was found due to the fact that Mac, unlike Linux, apparently has a system-defined qsort that is not stable. While we are at it, review the other callers of QSORT and add comments about why they can remain as calls to QSORT instead of being modified to call STABLE_QSORT. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-20 12:35:39 -07:00
Elijah Newren	ec59da6015	merge-ort: record the reason that we want a rename for a file There are two different reasons we might want a rename for a file -- for three-way content merging or as part of directory rename detection. Record the reason. diffcore-rename will potentially be able to filter some of the ones marked as needed only for directory rename detection, if it can determine those directory renames based solely on renames found via exact rename detection and basename-guided rename detection. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:56 -07:00
Elijah Newren	bf238b7137	diffcore-rename: add computation of number of unknown renames The previous commit can only be effective if we have a computation of the number of paths under a given directory which are still have pending renames, and expected this number to be recorded in the dir_rename_count map under the key UNKNOWN_DIR. Add the code necessary to compute these values. Note that this change means dir_rename_count might have a directory whose only entry (for UNKNOWN_DIR) was removed by the time merge-ort goes to check it. To account for this, merge-ort needs to check for the case where the max count is 0. With this change we are now computing the necessary value for each directory in dirs_removed, but are not using that value anywhere. The next two commits will make use of the values stored in dirs_removed in order to compute whether each relevant_source (that is needed only for directory rename detection) has become unnecessary. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:56 -07:00
Elijah Newren	fb52938eec	merge-ort: record the reason that we want a rename for a directory When one side of history renames a directory, and the other side of history added files to the old directory, directory rename detection is used to warn about the location of the added files so the user can move them to the old directory or keep them with the new one. This sets up three different types of directories: * directories that had new files added to them * directories underneath a directory that had new files added to them * directories where no new files were added to it or any leading path Save this information in dirs_removed; the next several commits will make use of this information. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:55 -07:00
Elijah Newren	a49b55d52e	merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type As noted in the previous commit, we want to be able to take advantage of the "majority rules" portion of directory rename detection to avoid detecting more renames than necessary. However, for diffcore-rename to take advantage of that, it needs to know whether a rename source file was needed for just directory rename detection reasons, or if it is wanted for potential three-way content merging. Modify relevant_sources from a strset to a strintmap, so we can encode additional information. We also modify dirs_removed from a strset to a strintmap at the same time because trying to determine what files are needed for directory rename detection will require us tracking a bit more information for each directory. This commit only changes the types of the two variables from strset to strintmap; it does not actually store any special values yet and for now only checks for presence of entries in the strintmap. Thus, the code is functionally identical to how it behaved before. Future commits will start associating values with each key for these two maps. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:55 -07:00
Elijah Newren	f89b4f2bee	merge-ort: skip rename detection entirely if possible diffcore_rename_extended() will do a bunch of setup, then check for exact renames, then abort before inexact rename detection if there are no more sources or destinations that need to be matched. It will sometimes be the case, however, that either * we start with neither any sources or destinations * we start with no relevant sources In the first of these two cases, the setup and exact rename detection will be very cheap since there are 0 files to operate on. In the second case, it is quite possible to have thousands of files with none of the source ones being relevant. Avoid calling diffcore_rename_extended() or even some of the setup before diffcore_rename_extended() when we can determine that rename detection is unnecessary. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 6.003 s ± 0.048 s 5.708 s ± 0.111 s mega-renames: 114.009 s ± 0.236 s 102.171 s ± 0.440 s just-one-mega: 3.489 s ± 0.017 s 3.471 s ± 0.015 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:05 -08:00
Elijah Newren	174791f0fb	merge-ort: use relevant_sources to filter possible rename sources The past several commits determined conditions when rename sources might be needed, and filled a relevant_sources strset with those paths. Pass these along to diffcore_rename_extended() to use to limit the sources that we need to detect renames for. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 12.596 s ± 0.061 s 6.003 s ± 0.048 s mega-renames: 130.465 s ± 0.259 s 114.009 s ± 0.236 s just-one-mega: 3.958 s ± 0.010 s 3.489 s ± 0.017 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:05 -08:00
Elijah Newren	2fd9eda462	merge-ort: precompute whether directory rename detection is needed The point of directory rename detection is that if one side of history renames a directory, and the other side adds new files under the old directory, then the merge can move those new files into the new directory. This leads to the following important observation: * If the other side does not add any new files under the old directory, we do not need to detect any renames for that directory. Similarly, directory rename detection had an important requirement: * If a directory still exists on one side of history, it has not been renamed on that side of history. (See section 4 of t6423 or Documentation/technical/directory-rename-detection.txt for more details). Using these two bits of information, we note that directory rename detection is only needed in cases where (1) directories exist in the merge base and on one side of history (i.e. dirmask == 3 or dirmask == 5), and (2) where there is some new file added to that directory on the side where it still exists (thus where the file has filemask == 2 or filemask == 4, respectively). This has to be done in two steps, because we have the dirmask when we are first considering the directory, and won't get the filemasks for the files within it until we recurse into that directory. So, we save dir_rename_mask = dirmask - 1 when we hit a directory that is missing on one side, and then later look for cases of filemask == dir_rename_mask One final note is that as soon as we hit a directory that needs directory rename detection, we will need to detect renames in all subdirectories of that directory as well due to the "majority rules" decision when files are renamed into different directory hierarchies. We arbitrarily use the special value of 0x07 to record when we've hit such a directory. The combination of all the above mean that we introduce a variable named dir_rename_mask (couldn't think of a better name) which has one of the following values as we traverse into a directory: * 0x00: directory rename detection not needed * 0x02 or 0x04: directory rename detection only needed if files added * 0x07: directory rename detection definitely needed We then pass this value through to add_pairs() so that it can mark location_relevant as true only when dir_rename_mask is 0x07. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:05 -08:00
Elijah Newren	a68e6cea59	merge-ort: introduce wrappers for alternate tree traversal Add traverse_trees_wrapper() and traverse_trees_wrapper_callback() functions. The former runs traverse_trees() with info->fn set to traverse_trees_wrapper_callback, in order to simply save all the entries without processing or recursing into any of them. This step allows extra computation to be done (e.g. checking some condition across all files) that can be used later. Then, after that is completed, it iterates over all the saved entries and calls the original info->fn callback with the saved data. Currently, this does nothing more than marginally slowing down the tree traversal since we do not take advantage of the opportunity to compute anything special in traverse_trees_wrapper_callback(), and thus the real callback will be called identically as it would have been without this extra wrapper. However, a subsequent commit will add some special computation of some values that the real callback will be able to use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:05 -08:00
Elijah Newren	beb06145f8	merge-ort: add data structures for an alternate tree traversal In order to determine whether directory rename detection is needed, we as a pre-requisite need a way to traverse through all the files in a given tree before visiting any directories within that tree. traverse_trees() only iterates through the entries in the order they appear, so add some data structures that will store all the entries as we iterate through them in traverse_trees(), which will allow us to re-traverse them in our desired order. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:04 -08:00
Elijah Newren	32a56dfb99	merge-ort: precompute subset of sources for which we need rename detection rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. Add a simple testcase showing the two kinds of reasons renames are relevant; it's a testcase that will only pass if we detect both kinds of needed renames. Other than the testcase added above, this commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on both sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. * If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename isn't detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename is detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit `ac193e0e0a` ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits `2a7c16c980` ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and `e8eb99d4a6` ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits `1f3c9ba707` ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and `727c75b23f` ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:04 -08:00
Elijah Newren	9799889f2e	diffcore-rename: enable filtering possible rename sources Add the ability to diffcore_rename_extended() to allow external callers to declare that they only need renames detected for a subset of source files, and use that information to skip detecting renames for them. There are two important pieces to this optimization that may not be obvious at first glance: * We do not require callers to just filter the filepairs out to remove the non-relevant sources, because exact rename detection is fast and when it finds a match it can remove both a source and a destination whereas the relevant_sources filter can only remove a source. * We need to filter out the source pairs in a preliminary pass instead of adding a strset_contains(relevant_sources, one->path) check within the nested matrix loop. The reason for that is if we have 30k renames, doing 30k * 30k = 900M strset_contains() calls becomes extraordinarily expensive and defeats the performance gains from this change; we only want to do 30k such calls instead. If callers pass NULL for relevant_sources, that is special cases to treat all sources as relevant. Since all callers currently pass NULL, this optimization does not yet have any effect. Subsequent commits will have merge-ort compute a set of relevant_sources to restrict which sources we detect renames for, and have merge-ort pass that set of relevant_sources to diffcore_rename_extended(). A note about filtering order: Some may be curious why we don't filter out irrelevant sources at the same time we filter out exact renames. While that technically could be done at this point, there are two reasons to defer it: First, was to reinforce a lesson that was too easy to forget. As I mentioned above, in the past I filtered irrelevant sources out before exact rename checking, and then discovered that exact renames' ability to remove both sources and destinations was an important consideration and thus doing the filtering after exact rename checking would speed things up. Then at some point I realized that basename matching could also remove both sources and destinations, and decided to put irrelevant source filtering after basename filtering. That slowed things down a lot. But, despite learning about this important ordering, in later restructuring I forgot and made the same mistake of putting the filtering after basename guided rename detection again. So, I have this series of patches structured to do the irrelevant filtering last to start to show how much extra it costs, and then add relevant filtering in to find_basename_matches() to show how much it speeds things up. Basically, it's a way to reinforce something that apparently was too easy to forget, and make sure the commit messages record this lesson. Second, the items in the "relevant_sources" in this patch series will include all sources that might be relevant. It has to be conservative and catch anything that might need a rename, but in the patch series after this one we'll find ways to weed out more of the might be relevant ones. Unfortunately, merge-ort does not have sufficient information to weed those ones out, and there isn't enough information at the time of filtering exact renames out to remove the extra ones either. It has to be deferred. So the deferral is in part to simplify some later additions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:04 -08:00
Elijah Newren	cd52e0050f	diffcore-rename: add function for clearing dir_rename_count As we adjust the usage of dir_rename_count we want to have a function for clearing, or partially clearing it out. Add a partial_clear_dir_rename_count() function for this purpose. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 17:53:11 -08:00
Elijah Newren	0c4fd732f0	Move computation of dir_rename_count from merge-ort to diffcore-rename Move the computation of dir_rename_count from merge-ort.c to diffcore-rename.c, making slight adjustments to the data structures based on the move. While the diffstat looks large, viewing this commit with --color-moved makes it clear that only about 20 lines changed. With this patch, the computation of dir_rename_count is still only done after inexact rename detection, but subsequent commits will add a preliminary computation of dir_rename_count after exact rename detection, followed by some updates after inexact rename detection. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 17:53:11 -08:00
Elijah Newren	f78cf97617	merge-ort: call diffcore_rename() directly We want to pass additional information to diffcore_rename() (or some variant thereof) without plumbing that extra information through diff_tree_oid() and diffcore_std(). Further, since we will need to gather additional special information related to diffs and are walking the trees anyway in collect_merge_info(), it seems odd to have diff_tree_oid()/diffcore_std() repeat those tree walks. And there may be times where we can avoid traversing into a subtree in collect_merge_info() (based on additional information at our disposal), that the basic diff logic would be unable to take advantage of. For all these reasons, just create the add and delete pairs ourself and then call diffcore_rename() directly. This change is primarily about enabling future optimizations; the advantage of avoiding extra tree traversals is small compared to the cost of rename detection, and the advantage of avoiding the extra tree traversals is somewhat offset by the extra time spent in collect_merge_info() collecting the additional data anyway. However... For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 13.294 s ± 0.103 s 12.775 s ± 0.062 s mega-renames: 187.248 s ± 0.882 s 188.754 s ± 0.284 s just-one-mega: 5.557 s ± 0.017 s 5.599 s ± 0.019 s Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-15 18:02:16 -08:00
Elijah Newren	557ac0350d	merge-ort: begin performance work; instrument with trace2_region_* calls Add some timing instrumentation for both merge-ort and diffcore-rename; I used these to measure and optimize performance in both, and several future patch series will build on these to reduce the timings of some select testcases. === Setup === The primary testcase I used involved rebasing a random topic in the linux kernel (consisting of 35 patches) against an older version. I added two variants, one where I rename a toplevel directory, and another where I only rebase one patch instead of the whole topic. The setup is as follows: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34 $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e $ git switch -c 5.4-renames v5.4 $ git mv drivers pilots # Introduce over 26,000 renames $ git commit -m "Rename drivers/ to pilots/" $ git config merge.renameLimit 30000 $ git config merge.directoryRenames true === Testcases === Now with REBASE standing for either "git rebase [--merge]" (using merge-recursive) or "test-tool fast-rebase" (using merge-ort), the testcases are: Testcase #1: no-renames $ git checkout v5.4^0 $ REBASE --onto HEAD base hwmon-updates Note: technically the name is misleading; there are some renames, but very few. Rename detection only takes about half the overall time. Testcase #2: mega-renames $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-updates Testcase #3: just-one-mega $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-just-one === Timing results === Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames, 10 runs for the other two cases): merge-recursive merge-ort no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s A single re-run of each with some breakdowns: --- no-renames --- merge-recursive merge-ort overall runtime: 19.302 s 14.257 s inexact rename detection: 7.603 s 7.906 s everything else: 11.699 s 6.351 s --- mega-renames --- merge-recursive merge-ort overall runtime: 5950.195 s 5499.672 s inexact rename detection: 5746.309 s 5487.120 s everything else: 203.886 s 17.552 s --- just-one-mega --- merge-recursive merge-ort overall runtime: 151.001 s 158.582 s inexact rename detection: 143.448 s 157.835 s everything else: 7.553 s 0.747 s === Timing observations === 0) Maximum speedup The "everything else" row represents the maximum speedup we could achieve if we were to somehow infinitely parallelize inexact rename detection, but leave everything else alone. The fact that this is so much smaller than the real runtime (even in the case with virtually no renames) makes it clear just how overwhelmingly large the time spent on rename detection can be. 1) no-renames 1a) merge-ort is faster than merge-recursive, which is nice. However, this still should not be considered good enough. Although the "merge" backend to rebase (merge-recursive) is sometimes faster than the "apply" backend, this is one of those cases where it is not. In fact, even merge-ort is slower. The "apply" backend can complete this testcase in 6.940 s ± 0.485 s which is about 2x faster than merge-ort and 3x faster than merge-recursive. One goal of the merge-ort performance work will be to make it faster than git-am on this (and similar) testcases. 2) mega-renames 2a) Obviously rename detection is a huge cost; it's where most the time is spent. We need to cut that down. If we could somehow infinitely parallelize it and drive its time to 0, the merge-recursive time would drop to about 204s, and the merge-ort time would drop to about 17s. I think this particular stat shows I've subtly baked a couple performance improvements into merge-ort and into fast-rebase already. 3) just-one-mega 3a) not much to say here, it just gives some flavor for how rebasing only one patch compares to rebasing 35. === Goals === This patch is obviously just the beginning. Here are some of my goals that this measurement will help us achieve: * Drive the cost of rename detection down considerably for merges * After the above has been achieved, see if there are other slowness factors (which would have previously been overshadowed by rename detection costs) which we can then focus on and also optimize. * Ensure our rebase testcase that requires little rename detection is noticeably faster with merge-ort than with apply-based rebase. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <ttaylorr@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	5ced7c3da0	merge-ort: ignore the directory rename split conflict for now get_provisional_directory_renames() has code to detect directories being evenly split between different locations. However, as noted previously, if there are no new files added to that directory that was split evenly, our inability to determine where the directory was renamed to doesn't matter since there are no new files to try to move into the new location. Unfortunately, that code is unaware of whether there are new files under the directory in question and we just ignore that, causing us to fail t6423 test 2b but pass test 2a; turn off the error for now, swapping which tests pass and fail. The motivating reason for switching this off as a temporary measure is that as we add optimizations, we'll start looking at only subsets of renames, and subsets of renames can start switching the result we get when this error is (wrongly) on. Once we get enough optimizations, however, we can prevent that code from even running when there are no new files added to the relevant directory, at which point we can revert this commit and then both testcases 2a and 2b will pass simultaneously. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	cf8937acde	merge-ort: fix massive leak When a series of merges was performed (such as for a rebase or series of cherry-picks), only the data structures allocated by the final merge operation were being freed. The problem was that while picking out pieces of merge-ort to upstream, I previously misread a certain section of merge_start() and assumed it was associated with a later optimization. Include that section now, which ensures that if there was a previous merge operation, that we clear out result->priv and then re-use it for opt->priv, and otherwise we allocate opt->priv. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Junio C Hamano	fe2f4d0031	Merge branch 'en/ort-directory-rename' into en/merge-ort-perf * en/ort-directory-rename: (28 commits) merge-ort: fix a directory rename detection bug merge-ort: process_renames() now needs more defensiveness merge-ort: implement apply_directory_rename_modifications() merge-ort: add a new toplevel_dir field merge-ort: implement handle_path_level_conflicts() merge-ort: implement check_for_directory_rename() merge-ort: implement apply_dir_rename() and check_dir_renamed() merge-ort: implement compute_collisions() merge-ort: modify collect_renames() for directory rename handling merge-ort: implement handle_directory_level_conflicts() merge-ort: implement compute_rename_counts() merge-ort: copy get_renamed_dir_portion() from merge-recursive.c merge-ort: add outline of get_provisional_directory_renames() merge-ort: add outline for computing directory renames merge-ort: collect which directories are removed in dirs_removed merge-ort: initialize and free new directory rename data structures merge-ort: add new data structures for directory rename detection merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions ...	2021-01-20 22:52:50 -08:00
Elijah Newren	203c872c4f	merge-ort: fix a directory rename detection bug As noted in commit `902c521a35` ("t6423: more involved directory rename test", 2020-10-15), when we have a case where * dir/subdir/ has several files * almost all files in dir/subdir/ are renamed to folder/subdir/ * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/ * the other side of history (that doesn't do the renames) adds a new file to dir/subdir/ Then for the majority of the file renames, the directory rename of dir/subdir/ -> folder/subdir/ is actually not represented that way but as dir/ -> folder/ We also had one rename that was represented as dir/subdir/ -> folder/subdir/newsubdir/ Now, since there's a new file in dir/subdir/, where does it go? Well, there's only one rule for dir/subdir/, so the code previously noted that this rule had the "majority" of the one "relevant" rename and thus erroneously used it to place the file in folder/subdir/newsubdir/. We really want the heavy weight associated with dir/ -> folder/ to also be treated as dir/subdir/ -> folder/subdir/, so that we correctly place the file in folder/subdir/. Add a bunch of logic to make sure that we use all relevant renamings in directory rename detection. Note that testcase 12f of t6423 still fails after this, but it gets further than merge-recursive does. There are some performance related bits in that testcase (the region_enter messages) that do not yet succeed, but the rest of the testcase works after this patch. Subsequent patch series will fix up the performance side. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	1b6b902d95	merge-ort: process_renames() now needs more defensiveness Since directory rename detection adds new paths to opt->priv->paths and removes old ones, process_renames() needs to now check whether pair->one->path actually exists in opt->priv->paths instead of just assuming it does. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	089d82bc18	merge-ort: implement apply_directory_rename_modifications() This function roughly follows the same outline as the function of the same name from merge-recursive.c, but the code diverges in multiple ways due to some special considerations: * merge-ort's version needs to update opt->priv->paths with any new paths (and opt->priv->paths points to struct conflict_infos which track quite a bit of metadata for each path); merge-recursive's version would directly update the index * merge-ort requires that opt->priv->paths has any leading directories of any relevant files also be included in the set of paths. And due to pointer equality requirements on merged_info.directory_name, we have to be careful how we compute and insert these. * due to the above requirements on opt->priv->paths, merge-ort's version starts with a long comment to explain all the special considerations that need to be handled * merge-ort can use the full data stored in opt->priv->paths to avoid making expensive get_tree_entry() calls to regather the necessary data. * due to messages being deferred automatically in merge-ort, this is the best place to handle conflict messages whereas in merge-recursive.c they are deferred manually so that processing of entries does all the printing Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	05b85c6eeb	merge-ort: add a new toplevel_dir field Due to the string-equality-iff-pointer-equality requirements placed on merged_info.directory_name, apply_directory_rename_modifications() will need to have access to the exact toplevel directory name string pointer and can't just use a new empty string. Store it in a field that we can use. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	bea433655a	merge-ort: implement handle_path_level_conflicts() This is copied from merge-recursive.c, with minor tweaks due to: * using strmap API * merge-ort not using the non_unique_new_dir field, since it'll obviate its need entirely later with performance improvements * adding a new path_in_way() function that uses opt->priv->paths instead of doing an expensive tree_has_path() lookup to see if a tree has a given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	47325e8533	merge-ort: implement check_for_directory_rename() This is copied from merge-recursive.c, with minor tweaks due to using strmap API and the fact that it can use opt->priv->paths to get all pathnames that exist instead of taking a tree object. This depends on a new function, handle_path_level_conflicts(), which just has a placeholder die-not-yet-implemented implementation for now; a subsequent patch will implement it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	fbcfc0cc17	merge-ort: implement apply_dir_rename() and check_dir_renamed() Both of these are copied from merge-recursive.c, with just minor tweaks due to using strmap API and not having a non_unique_new_dir field. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	d9d015df4a	merge-ort: implement compute_collisions() This is nearly a wholesale copy of compute_collisions() from merge-recursive.c, and the logic remains the same, but it has been tweaked slightly due to: * using strmap.h API (instead of direct hashmaps) * allocation/freeing of data structures were done separately in merge_start() and clear_or_reinit_internal_opts() in an earlier patch in this series * there is no non_unique_new_dir data field in merge-ort; that will be handled a different way It does depend on two new functions, apply_dir_rename() and check_dir_renamed() which were introduced with simple die-not-yet-implemented shells and will be implemented in subsequent patches. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	fa5e06d690	merge-ort: modify collect_renames() for directory rename handling collect_renames() is similar to merge-recursive.c's get_renames(), but lacks the directory rename handling found in the latter. Port that code structure over to merge-ort. This introduces three new die-not-yet-implemented functions that will be defined in future commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	98d0d08128	merge-ort: implement handle_directory_level_conflicts() This is modelled on the version of handle_directory_level_conflicts() from merge-recursive.c, but is massively simplified due to the following factors: * strmap API provides simplifications over using direct hashmap * we have a dirs_removed field in struct rename_info that we have an easy way to populate from collect_merge_info(); this was already used in compute_rename_counts() and thus we do not need to check for condition #2. * The removal of condition #2 by handling it earlier in the code also obviates the need to check for condition #3 -- if both sides renamed a directory, meaning that the directory no longer exists on either side, then neither side could have added any new files to that directory, and thus there are no files whose locations we need to move due to such a directory rename. In fact, the same logic that makes condition #3 irrelevant means condition #1 is also irrelevant so we could drop this function. However, it is cheap to check if both sides rename the same directory, and doing so can save future computation. So, simply remove any directories that both sides renamed from the list of directory renames. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	2f620a4f19	merge-ort: implement compute_rename_counts() This function is based on the first half of get_directory_renames() from merge-recursive.c; as part of the implementation, factor out a routine, increment_count(), to update the bookkeeping to track the number of items renamed into new directories. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	9fe37e7bb9	merge-ort: copy get_renamed_dir_portion() from merge-recursive.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	04264d4079	merge-ort: add outline of get_provisional_directory_renames() This function is based on merge-recursive.c's get_directory_renames(), except that the first half has been split out into a not-yet-implemented compute_rename_counts(). The primary difference here is our lack of the non_unique_new_dir boolean in our strmap. The lack of that field will at first cause us to fail testcase 2b of t6423; however, future optimizations will obviate the need for that ugly field so we have just left it out. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Elijah Newren	112e11126b	merge-ort: add outline for computing directory renames Port some directory rename handling changes from merge-recursive.c's detect_and_process_renames() to the same-named function of merge-ort.c. This does not yet add any use or handling of directory renames, just the outline for where we start to compute them. Thus, a future patch will add port additional changes to merge-ort's detect_and_process_renames(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 22:18:55 -08:00
Junio C Hamano	12aa5552a9	Merge branch 'en/ort-conflict-handling' into en/merge-ort-perf * en/ort-conflict-handling: merge-ort: add handling for different types of files at same path merge-ort: copy find_first_merges() implementation from merge-recursive.c merge-ort: implement format_commit() merge-ort: copy and adapt merge_submodule() from merge-recursive.c merge-ort: copy and adapt merge_3way() from merge-recursive.c merge-ort: flesh out implementation of handle_content_merge() merge-ort: handle book-keeping around two- and three-way content merge merge-ort: implement unique_path() helper merge-ort: handle directory/file conflicts that remain merge-ort: handle D/F conflict where directory disappears due to merge	2021-01-14 12:41:54 -08:00
Elijah Newren	eb3e3e1ddf	merge-ort: collect which directories are removed in dirs_removed Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:03 -08:00
Elijah Newren	f5d9fbc2e9	merge-ort: initialize and free new directory rename data structures Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:03 -08:00
Elijah Newren	c09376d55f	merge-ort: add new data structures for directory rename detection Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-07 15:30:02 -08:00
Junio C Hamano	8f894b2263	Merge branch 'en/merge-ort-3' into en/ort-directory-rename * en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames	2021-01-07 15:29:49 -08:00
Elijah Newren	4ef88fc3a8	merge-ort: add handling for different types of files at same path Add some handling that explicitly considers collisions of the following types: * file/submodule * file/symlink * submodule/symlink Leaving them as conflicts at the same path are hard for users to resolve, so move one or both of them aside so that they each get their own path. Note that in the case of recursive handling (i.e. call_depth > 0), we can just use the merge base of the two merge bases as the merge result much like we do with modify/delete conflicts, binary files, conflicting submodule values, and so on. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	4204cd591b	merge-ort: copy find_first_merges() implementation from merge-recursive.c Code is identical for the function body in the two files, the call signature is just slightly different in merge-ort than merge-recursive as noted a couple commits ago. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	70f19c7fce	merge-ort: implement format_commit() This implementation is based on a mixture of print_commit() and output_commit_title() from merge-recursive.c so that it can be used to take over both functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	c73cda76b1	merge-ort: copy and adapt merge_submodule() from merge-recursive.c Take merge_submodule() from merge-recursive.c and make slight adjustments, predominantly around deferring output using path_msg() instead of using merge-recursive's output() and show() functions. There's also a fix for recursive cases (when call_depth > 0) and a slight change to argument order for find_first_merges(). find_first_merges() and format_commit() are left unimplemented for now, but will be added by subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	f591c47246	merge-ort: copy and adapt merge_3way() from merge-recursive.c Take merge_3way() from merge-recursive.c and make slight adjustments based on different data structures (direct usage of object_id rather diff_filespec, separate pathnames which based on our careful interning of pathnames in opt->priv->paths can be compared with '!=' rather than 'strcmp'). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	62fdec17a1	merge-ort: flesh out implementation of handle_content_merge() This implementation is based heavily on merge_mode_and_contents() from merge-recursive.c, though it has some fixes for recursive merges (i.e. when call_depth > 0), and has a number of changes throughout based on slight differences in data structures and in how the functions are called. It is, however, based on two new helper functions -- merge_3way() and merge_submodule -- for which we only provide die-not-implemented stubs at this point. Future commits will add implementations of these functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	991bbdcab9	merge-ort: handle book-keeping around two- and three-way content merge In addition to the content merge (which will go in a subsequent commit), we need to worry about conflict messages, placing results in higher order stages in case of a df_conflict, and making sure the results are placed in ci->merged.result so that they will show up in the working tree. Take care of all that external book-keeping, moving the simplistic just-take-HEAD code into the barebones handle_content_merge() function for now. Subsequent commits will flesh out handle_content_merge(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	5a1a1e8ea9	merge-ort: implement unique_path() helper Implement unique_path(), based on the one from merge-recursive.c. It is simplified, however, due to: (1) using strmaps, and (2) the fact that merge-ort lets the checkout codepath handle possible collisions with the working tree means that other code locations don't have to. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	23366d2aa9	merge-ort: handle directory/file conflicts that remain When a directory/file conflict remains, we can leave the directory where it is, but need to move the information about the file to a different pathname. After moving the file to a different pathname, we allow subsequent process_entry() logic to handle any additional details that might be relevant. This depends on a new helper function, unique_path(), that dies with an unimplemented error currently but will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	0ccfa4e5d8	merge-ort: handle D/F conflict where directory disappears due to merge When one side has a directory at a given path and the other side of history has a file at the path, but the merge resolves the directory away (e.g. because no path within that directory was modified and the other side deleted it, or because renaming moved all the files elsewhere), then we don't actually have a conflict anymore. We just need to clear away any information related to the relevant directory, and then the subsequent process_entry() handling can handle the given path. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 10:40:45 -08:00
Elijah Newren	8119214f4e	merge-ort: implement merge_incore_recursive() Implement merge_incore_recursive(), mostly through the use of a new helper function, merge_ort_internal(), which itself is based off merge_recursive_internal() from merge-recursive.c. This drops the number of failures in the testsuite when run under GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00

1 2

90 Commits