git/refs.c

2952 lines
75 KiB
C
Raw Normal View History

/*
* The backend-independent part of the reference module.
*/
#define USE_THE_REPOSITORY_VARIABLE
#include "git-compat-util.h"
#include "advice.h"
#include "config.h"
#include "environment.h"
#include "strmap.h"
#include "gettext.h"
#include "hex.h"
#include "lockfile.h"
#include "iterator.h"
#include "refs.h"
#include "refs/refs-internal.h"
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
#include "run-command.h"
#include "hook.h"
#include "object-name.h"
#include "object-store-ll.h"
#include "object.h"
#include "path.h"
#include "submodule.h"
#include "worktree.h"
#include "strvec.h"
#include "repo-settings.h"
#include "setup.h"
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
#include "sigchain.h"
#include "date.h"
#include "commit.h"
#include "wildmatch.h"
/*
* List of all available backends
*/
static const struct ref_storage_be *refs_backends[] = {
[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
refs: introduce reftable backend Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07 15:20:31 +08:00
[REF_STORAGE_FORMAT_REFTABLE] = &refs_be_reftable,
};
static const struct ref_storage_be *find_ref_storage_backend(
enum ref_storage_format ref_storage_format)
{
if (ref_storage_format < ARRAY_SIZE(refs_backends))
return refs_backends[ref_storage_format];
return NULL;
}
enum ref_storage_format ref_storage_format_by_name(const char *name)
{
for (unsigned int i = 0; i < ARRAY_SIZE(refs_backends); i++)
if (refs_backends[i] && !strcmp(refs_backends[i]->name, name))
return i;
return REF_STORAGE_FORMAT_UNKNOWN;
}
const char *ref_storage_format_to_name(enum ref_storage_format ref_storage_format)
{
const struct ref_storage_be *be = find_ref_storage_backend(ref_storage_format);
if (!be)
return "unknown";
return be->name;
}
/*
* How to handle various characters in refnames:
* 0: An acceptable character for refs
* 1: End-of-component
* 2: ., look for a preceding . to reject .. in refs
* 3: {, look for a preceding @ to reject @{ in refs
* 4: A bad character: ASCII control characters, and
* ":", "?", "[", "\", "^", "~", SP, or TAB
* 5: *, reject unless REFNAME_REFSPEC_PATTERN is set
*/
static unsigned char refname_disposition[256] = {
1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 2, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 4, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
};
struct ref_namespace_info ref_namespace[] = {
[NAMESPACE_HEAD] = {
.ref = "HEAD",
.decoration = DECORATION_REF_HEAD,
.exact = 1,
},
[NAMESPACE_BRANCHES] = {
.ref = "refs/heads/",
.decoration = DECORATION_REF_LOCAL,
},
[NAMESPACE_TAGS] = {
.ref = "refs/tags/",
.decoration = DECORATION_REF_TAG,
},
[NAMESPACE_REMOTE_REFS] = {
/*
* The default refspec for new remotes copies refs from
* refs/heads/ on the remote into refs/remotes/<remote>/.
* As such, "refs/remotes/" has special handling.
*/
.ref = "refs/remotes/",
.decoration = DECORATION_REF_REMOTE,
},
[NAMESPACE_STASH] = {
/*
* The single ref "refs/stash" stores the latest stash.
* Older stashes can be found in the reflog.
*/
.ref = "refs/stash",
.exact = 1,
.decoration = DECORATION_REF_STASH,
},
[NAMESPACE_REPLACE] = {
/*
* This namespace allows Git to act as if one object ID
* points to the content of another. Unlike the other
* ref namespaces, this one can be changed by the
* GIT_REPLACE_REF_BASE environment variable. This
* .namespace value will be overwritten in setup_git_env().
*/
.ref = "refs/replace/",
.decoration = DECORATION_GRAFTED,
},
[NAMESPACE_NOTES] = {
/*
* The refs/notes/commit ref points to the tip of a
* parallel commit history that adds metadata to commits
* in the normal history. This ref can be overwritten
* by the core.notesRef config variable or the
* GIT_NOTES_REFS environment variable.
*/
.ref = "refs/notes/commit",
.exact = 1,
},
[NAMESPACE_PREFETCH] = {
/*
* Prefetch refs are written by the background 'fetch'
* maintenance task. It allows faster foreground fetches
* by advertising these previously-downloaded tips without
* updating refs/remotes/ without user intervention.
*/
.ref = "refs/prefetch/",
},
[NAMESPACE_REWRITTEN] = {
/*
* Rewritten refs are used by the 'label' command in the
* sequencer. These are particularly useful during an
* interactive rebase that uses the 'merge' command.
*/
.ref = "refs/rewritten/",
},
};
void update_ref_namespace(enum ref_namespace namespace, char *ref)
{
struct ref_namespace_info *info = &ref_namespace[namespace];
if (info->ref_updated)
free((char *)info->ref);
info->ref = ref;
info->ref_updated = 1;
}
/*
* Try to read one refname component from the front of refname.
* Return the length of the component found, or -1 if the component is
* not legal. It is legal if it is something reasonable to have under
* ".git/refs/"; We do not like it if:
*
* - it begins with ".", or
* - it has double dots "..", or
* - it has ASCII control characters, or
* - it has ":", "?", "[", "\", "^", "~", SP, or TAB anywhere, or
* - it has "*" anywhere unless REFNAME_REFSPEC_PATTERN is set, or
* - it ends with a "/", or
* - it ends with ".lock", or
* - it contains a "@{" portion
*
* When sanitized is not NULL, instead of rejecting the input refname
* as an error, try to come up with a usable replacement for the input
* refname in it.
*/
static int check_refname_component(const char *refname, int *flags,
struct strbuf *sanitized)
{
const char *cp;
char last = '\0';
size_t component_start = 0; /* garbage - not a reasonable initial value */
if (sanitized)
component_start = sanitized->len;
for (cp = refname; ; cp++) {
int ch = *cp & 255;
unsigned char disp = refname_disposition[ch];
if (sanitized && disp != 1)
strbuf_addch(sanitized, ch);
switch (disp) {
case 1:
goto out;
case 2:
if (last == '.') { /* Refname contains "..". */
if (sanitized)
/* collapse ".." to single "." */
strbuf_setlen(sanitized, sanitized->len - 1);
else
return -1;
}
break;
case 3:
if (last == '@') { /* Refname contains "@{". */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
}
break;
case 4:
/* forbidden char */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
break;
case 5:
if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
/* refspec can't be a pattern */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
}
/*
* Unset the pattern flag so that we only accept
* a single asterisk for one side of refspec.
*/
*flags &= ~ REFNAME_REFSPEC_PATTERN;
break;
}
last = ch;
}
out:
if (cp == refname)
return 0; /* Component has zero length. */
if (refname[0] == '.') { /* Component starts with '.'. */
if (sanitized)
sanitized->buf[component_start] = '-';
else
return -1;
}
if (cp - refname >= LOCK_SUFFIX_LEN &&
!memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
if (!sanitized)
return -1;
/* Refname ends with ".lock". */
while (strbuf_strip_suffix(sanitized, LOCK_SUFFIX)) {
/* try again in case we have .lock.lock */
}
}
return cp - refname;
}
static int check_or_sanitize_refname(const char *refname, int flags,
struct strbuf *sanitized)
{
int component_len, component_count = 0;
if (!strcmp(refname, "@")) {
/* Refname is a single character '@'. */
if (sanitized)
strbuf_addch(sanitized, '-');
else
return -1;
}
while (1) {
if (sanitized && sanitized->len)
strbuf_complete(sanitized, '/');
/* We are at the start of a path component. */
component_len = check_refname_component(refname, &flags,
sanitized);
if (sanitized && component_len == 0)
; /* OK, omit empty component */
else if (component_len <= 0)
return -1;
component_count++;
if (refname[component_len] == '\0')
break;
/* Skip to next component. */
refname += component_len + 1;
}
if (refname[component_len - 1] == '.') {
/* Refname ends with '.'. */
if (sanitized)
; /* omit ending dot */
else
return -1;
}
if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
return -1; /* Refname has only one component. */
return 0;
}
int check_refname_format(const char *refname, int flags)
{
return check_or_sanitize_refname(refname, flags, NULL);
}
int refs_fsck(struct ref_store *refs, struct fsck_options *o)
{
return refs->be->fsck(refs, o);
}
void sanitize_refname_component(const char *refname, struct strbuf *out)
{
if (check_or_sanitize_refname(refname, REFNAME_ALLOW_ONELEVEL, out))
BUG("sanitizing refname '%s' check returned error", refname);
}
int refname_is_safe(const char *refname)
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
{
const char *rest;
if (skip_prefix(refname, "refs/", &rest)) {
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
char *buf;
int result;
size_t restlen = strlen(rest);
/* rest must not be empty, or start or end with "/" */
if (!restlen || *rest == '/' || rest[restlen - 1] == '/')
return 0;
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
/*
* Does the refname try to escape refs/?
* For example: refs/foo/../bar is safe but refs/foo/../../bar
* is not.
*/
buf = xmallocz(restlen);
result = !normalize_path_copy(buf, rest) && !strcmp(buf, rest);
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
free(buf);
return result;
}
do {
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
if (!isupper(*refname) && *refname != '_')
return 0;
refname++;
} while (*refname);
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
return 1;
}
/*
* Return true if refname, which has the specified oid and flags, can
* be resolved to an object in the database. If the referred-to object
* does not exist, emit a warning and return false.
*/
int ref_resolves_to_object(const char *refname,
struct repository *repo,
const struct object_id *oid,
unsigned int flags)
{
if (flags & REF_ISBROKEN)
return 0;
if (!repo_has_object_file(repo, oid)) {
error(_("%s does not point to a valid object!"), refname);
return 0;
}
return 1;
}
char *refs_resolve_refdup(struct ref_store *refs,
const char *refname, int resolve_flags,
struct object_id *oid, int *flags)
{
const char *result;
result = refs_resolve_ref_unsafe(refs, refname, resolve_flags,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
oid, flags);
return xstrdup_or_null(result);
}
/* The argument to for_each_filter_refs */
struct for_each_ref_filter {
const char *pattern;
const char *prefix;
each_ref_fn *fn;
void *cb_data;
};
int refs_read_ref_full(struct ref_store *refs, const char *refname,
int resolve_flags, struct object_id *oid, int *flags)
{
if (refs_resolve_ref_unsafe(refs, refname, resolve_flags,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
oid, flags))
return 0;
return -1;
}
int refs_read_ref(struct ref_store *refs, const char *refname, struct object_id *oid)
{
return refs_read_ref_full(refs, refname, RESOLVE_REF_READING, oid, NULL);
}
int refs_ref_exists(struct ref_store *refs, const char *refname)
{
return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
NULL, NULL);
}
static int for_each_filter_refs(const char *refname, const char *referent,
const struct object_id *oid,
int flags, void *data)
{
struct for_each_ref_filter *filter = data;
if (wildmatch(filter->pattern, refname, 0))
return 0;
if (filter->prefix)
skip_prefix(refname, filter->prefix, &refname);
return filter->fn(refname, referent, oid, flags, filter->cb_data);
}
struct warn_if_dangling_data {
struct ref_store *refs;
FILE *fp;
const char *refname;
const struct string_list *refnames;
const char *msg_fmt;
};
static int warn_if_dangling_symref(const char *refname, const char *referent UNUSED,
const struct object_id *oid UNUSED,
int flags, void *cb_data)
{
struct warn_if_dangling_data *d = cb_data;
const char *resolves_to;
if (!(flags & REF_ISSYMREF))
return 0;
resolves_to = refs_resolve_ref_unsafe(d->refs, refname, 0, NULL, NULL);
if (!resolves_to
|| (d->refname
? strcmp(resolves_to, d->refname)
: !string_list_has_string(d->refnames, resolves_to))) {
return 0;
}
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
return 0;
}
void refs_warn_dangling_symref(struct ref_store *refs, FILE *fp,
const char *msg_fmt, const char *refname)
{
struct warn_if_dangling_data data = {
.refs = refs,
.fp = fp,
.refname = refname,
.msg_fmt = msg_fmt,
};
refs_for_each_rawref(refs, warn_if_dangling_symref, &data);
}
void refs_warn_dangling_symrefs(struct ref_store *refs, FILE *fp,
const char *msg_fmt, const struct string_list *refnames)
{
struct warn_if_dangling_data data = {
.refs = refs,
.fp = fp,
.refnames = refnames,
.msg_fmt = msg_fmt,
};
refs_for_each_rawref(refs, warn_if_dangling_symref, &data);
}
int refs_for_each_tag_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/tags/", fn, cb_data);
}
int refs_for_each_branch_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/heads/", fn, cb_data);
}
int refs_for_each_remote_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/remotes/", fn, cb_data);
}
int refs_head_ref_namespaced(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
struct strbuf buf = STRBUF_INIT;
int ret = 0;
struct object_id oid;
int flag;
strbuf_addf(&buf, "%sHEAD", get_git_namespace());
if (!refs_read_ref_full(refs, buf.buf, RESOLVE_REF_READING, &oid, &flag))
ret = fn(buf.buf, NULL, &oid, flag, cb_data);
strbuf_release(&buf);
return ret;
}
log: add option to choose which refs to decorate When `log --decorate` is used, git will decorate commits with all available refs. While in most cases this may give the desired effect, under some conditions it can lead to excessively verbose output. Introduce two command line options, `--decorate-refs=<pattern>` and `--decorate-refs-exclude=<pattern>` to allow the user to select which refs are used in decoration. When "--decorate-refs=<pattern>" is given, only the refs that match the pattern are used in decoration. The refs that match the pattern when "--decorate-refs-exclude=<pattern>" is given, are never used in decoration. These options follow the same convention for mixing negative and positive patterns across the system, assuming that the inclusive default is to match all refs available. (1) if there is no positive pattern given, pretend as if an inclusive default positive pattern was given; (2) for each candidate, reject it if it matches no positive pattern, or if it matches any one of the negative patterns. The rules for what is considered a match are slightly different from the rules used elsewhere. Commands like `log --glob` assume a trailing '/*' when glob chars are not present in the pattern. This makes it difficult to specify a single ref. On the other hand, commands like `describe --match --all` allow specifying exact refs, but do not have the convenience of allowing "shorthand refs" like 'refs/heads' or 'heads' to refer to 'refs/heads/*'. The commands introduced in this patch consider a match if: (a) the pattern contains globs chars, and regular pattern matching returns a match. (b) the pattern does not contain glob chars, and ref '<pattern>' exists, or if ref exists under '<pattern>/' This allows both behaviours (allowing single refs and shorthand refs) yet remaining compatible with existent commands. Helped-by: Kevin Daudt <me@ikke.info> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 05:33:41 +08:00
void normalize_glob_ref(struct string_list_item *item, const char *prefix,
const char *pattern)
{
struct strbuf normalized_pattern = STRBUF_INIT;
if (*pattern == '/')
BUG("pattern must not start with '/'");
if (prefix)
log: add option to choose which refs to decorate When `log --decorate` is used, git will decorate commits with all available refs. While in most cases this may give the desired effect, under some conditions it can lead to excessively verbose output. Introduce two command line options, `--decorate-refs=<pattern>` and `--decorate-refs-exclude=<pattern>` to allow the user to select which refs are used in decoration. When "--decorate-refs=<pattern>" is given, only the refs that match the pattern are used in decoration. The refs that match the pattern when "--decorate-refs-exclude=<pattern>" is given, are never used in decoration. These options follow the same convention for mixing negative and positive patterns across the system, assuming that the inclusive default is to match all refs available. (1) if there is no positive pattern given, pretend as if an inclusive default positive pattern was given; (2) for each candidate, reject it if it matches no positive pattern, or if it matches any one of the negative patterns. The rules for what is considered a match are slightly different from the rules used elsewhere. Commands like `log --glob` assume a trailing '/*' when glob chars are not present in the pattern. This makes it difficult to specify a single ref. On the other hand, commands like `describe --match --all` allow specifying exact refs, but do not have the convenience of allowing "shorthand refs" like 'refs/heads' or 'heads' to refer to 'refs/heads/*'. The commands introduced in this patch consider a match if: (a) the pattern contains globs chars, and regular pattern matching returns a match. (b) the pattern does not contain glob chars, and ref '<pattern>' exists, or if ref exists under '<pattern>/' This allows both behaviours (allowing single refs and shorthand refs) yet remaining compatible with existent commands. Helped-by: Kevin Daudt <me@ikke.info> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 05:33:41 +08:00
strbuf_addstr(&normalized_pattern, prefix);
else if (!starts_with(pattern, "refs/") &&
strcmp(pattern, "HEAD"))
log: add option to choose which refs to decorate When `log --decorate` is used, git will decorate commits with all available refs. While in most cases this may give the desired effect, under some conditions it can lead to excessively verbose output. Introduce two command line options, `--decorate-refs=<pattern>` and `--decorate-refs-exclude=<pattern>` to allow the user to select which refs are used in decoration. When "--decorate-refs=<pattern>" is given, only the refs that match the pattern are used in decoration. The refs that match the pattern when "--decorate-refs-exclude=<pattern>" is given, are never used in decoration. These options follow the same convention for mixing negative and positive patterns across the system, assuming that the inclusive default is to match all refs available. (1) if there is no positive pattern given, pretend as if an inclusive default positive pattern was given; (2) for each candidate, reject it if it matches no positive pattern, or if it matches any one of the negative patterns. The rules for what is considered a match are slightly different from the rules used elsewhere. Commands like `log --glob` assume a trailing '/*' when glob chars are not present in the pattern. This makes it difficult to specify a single ref. On the other hand, commands like `describe --match --all` allow specifying exact refs, but do not have the convenience of allowing "shorthand refs" like 'refs/heads' or 'heads' to refer to 'refs/heads/*'. The commands introduced in this patch consider a match if: (a) the pattern contains globs chars, and regular pattern matching returns a match. (b) the pattern does not contain glob chars, and ref '<pattern>' exists, or if ref exists under '<pattern>/' This allows both behaviours (allowing single refs and shorthand refs) yet remaining compatible with existent commands. Helped-by: Kevin Daudt <me@ikke.info> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 05:33:41 +08:00
strbuf_addstr(&normalized_pattern, "refs/");
/*
* NEEDSWORK: Special case other symrefs such as REBASE_HEAD,
* MERGE_HEAD, etc.
*/
log: add option to choose which refs to decorate When `log --decorate` is used, git will decorate commits with all available refs. While in most cases this may give the desired effect, under some conditions it can lead to excessively verbose output. Introduce two command line options, `--decorate-refs=<pattern>` and `--decorate-refs-exclude=<pattern>` to allow the user to select which refs are used in decoration. When "--decorate-refs=<pattern>" is given, only the refs that match the pattern are used in decoration. The refs that match the pattern when "--decorate-refs-exclude=<pattern>" is given, are never used in decoration. These options follow the same convention for mixing negative and positive patterns across the system, assuming that the inclusive default is to match all refs available. (1) if there is no positive pattern given, pretend as if an inclusive default positive pattern was given; (2) for each candidate, reject it if it matches no positive pattern, or if it matches any one of the negative patterns. The rules for what is considered a match are slightly different from the rules used elsewhere. Commands like `log --glob` assume a trailing '/*' when glob chars are not present in the pattern. This makes it difficult to specify a single ref. On the other hand, commands like `describe --match --all` allow specifying exact refs, but do not have the convenience of allowing "shorthand refs" like 'refs/heads' or 'heads' to refer to 'refs/heads/*'. The commands introduced in this patch consider a match if: (a) the pattern contains globs chars, and regular pattern matching returns a match. (b) the pattern does not contain glob chars, and ref '<pattern>' exists, or if ref exists under '<pattern>/' This allows both behaviours (allowing single refs and shorthand refs) yet remaining compatible with existent commands. Helped-by: Kevin Daudt <me@ikke.info> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 05:33:41 +08:00
strbuf_addstr(&normalized_pattern, pattern);
strbuf_strip_suffix(&normalized_pattern, "/");
item->string = strbuf_detach(&normalized_pattern, NULL);
item->util = has_glob_specials(pattern) ? NULL : item->string;
strbuf_release(&normalized_pattern);
}
int refs_for_each_glob_ref_in(struct ref_store *refs, each_ref_fn fn,
const char *pattern, const char *prefix, void *cb_data)
{
struct strbuf real_pattern = STRBUF_INIT;
struct for_each_ref_filter filter;
int ret;
if (!prefix && !starts_with(pattern, "refs/"))
strbuf_addstr(&real_pattern, "refs/");
else if (prefix)
strbuf_addstr(&real_pattern, prefix);
strbuf_addstr(&real_pattern, pattern);
if (!has_glob_specials(pattern)) {
/* Append implied '/' '*' if not present. */
2015-09-25 05:08:35 +08:00
strbuf_complete(&real_pattern, '/');
/* No need to check for '*', there is none. */
strbuf_addch(&real_pattern, '*');
}
filter.pattern = real_pattern.buf;
filter.prefix = prefix;
filter.fn = fn;
filter.cb_data = cb_data;
ret = refs_for_each_ref(refs, for_each_filter_refs, &filter);
strbuf_release(&real_pattern);
return ret;
}
int refs_for_each_glob_ref(struct ref_store *refs, each_ref_fn fn,
const char *pattern, void *cb_data)
{
return refs_for_each_glob_ref_in(refs, fn, pattern, NULL, cb_data);
}
const char *prettify_refname(const char *name)
{
if (skip_prefix(name, "refs/heads/", &name) ||
skip_prefix(name, "refs/tags/", &name) ||
skip_prefix(name, "refs/remotes/", &name))
; /* nothing */
return name;
}
static const char *ref_rev_parse_rules[] = {
2007-11-11 22:01:46 +08:00
"%.*s",
"refs/%.*s",
"refs/tags/%.*s",
"refs/heads/%.*s",
"refs/remotes/%.*s",
"refs/remotes/%.*s/HEAD",
NULL
};
remote: make refspec follow the same disambiguation rule as local refs When matching a non-wildcard LHS of a refspec against a list of refs, find_ref_by_name_abbrev() returns the first ref that matches using any DWIM rules used by refname_match() in refs.c, even if a better match occurs later in the list of refs. This causes unexpected behavior when (for example) fetching using the refspec "refs/heads/s:<something>" from a remote with both "refs/heads/refs/heads/s" and "refs/heads/s"; even if the former was inadvertently created, one would still expect the latter to be fetched. Similarly, when both a tag T and a branch T exist, fetching T should favor the tag, just like how local refname disambiguation rule works. But because the code walks over ls-remote output from the remote, which happens to be sorted in alphabetical order and has refs/heads/T before refs/tags/T, a request to fetch T is (mis)interpreted as fetching refs/heads/T. Update refname_match(), all of whose current callers care only if it returns non-zero (i.e. matches) to see if an abbreviated name can mean the full name being tested, so that it returns a positive integer whose magnitude can be used to tell the precedence, and fix the find_ref_by_name_abbrev() function not to stop at the first match but find the match with the highest precedence. This is based on an earlier work, which special cased only the exact matches, by Jonathan Tan. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 00:22:37 +08:00
#define NUM_REV_PARSE_RULES (ARRAY_SIZE(ref_rev_parse_rules) - 1)
/*
* Is it possible that the caller meant full_name with abbrev_name?
* If so return a non-zero value to signal "yes"; the magnitude of
* the returned value gives the precedence used for disambiguation.
*
* If abbrev_name cannot mean full_name, return 0.
*/
int refname_match(const char *abbrev_name, const char *full_name)
2007-11-11 22:01:46 +08:00
{
const char **p;
const int abbrev_name_len = strlen(abbrev_name);
remote: make refspec follow the same disambiguation rule as local refs When matching a non-wildcard LHS of a refspec against a list of refs, find_ref_by_name_abbrev() returns the first ref that matches using any DWIM rules used by refname_match() in refs.c, even if a better match occurs later in the list of refs. This causes unexpected behavior when (for example) fetching using the refspec "refs/heads/s:<something>" from a remote with both "refs/heads/refs/heads/s" and "refs/heads/s"; even if the former was inadvertently created, one would still expect the latter to be fetched. Similarly, when both a tag T and a branch T exist, fetching T should favor the tag, just like how local refname disambiguation rule works. But because the code walks over ls-remote output from the remote, which happens to be sorted in alphabetical order and has refs/heads/T before refs/tags/T, a request to fetch T is (mis)interpreted as fetching refs/heads/T. Update refname_match(), all of whose current callers care only if it returns non-zero (i.e. matches) to see if an abbreviated name can mean the full name being tested, so that it returns a positive integer whose magnitude can be used to tell the precedence, and fix the find_ref_by_name_abbrev() function not to stop at the first match but find the match with the highest precedence. This is based on an earlier work, which special cased only the exact matches, by Jonathan Tan. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 00:22:37 +08:00
const int num_rules = NUM_REV_PARSE_RULES;
2007-11-11 22:01:46 +08:00
remote: make refspec follow the same disambiguation rule as local refs When matching a non-wildcard LHS of a refspec against a list of refs, find_ref_by_name_abbrev() returns the first ref that matches using any DWIM rules used by refname_match() in refs.c, even if a better match occurs later in the list of refs. This causes unexpected behavior when (for example) fetching using the refspec "refs/heads/s:<something>" from a remote with both "refs/heads/refs/heads/s" and "refs/heads/s"; even if the former was inadvertently created, one would still expect the latter to be fetched. Similarly, when both a tag T and a branch T exist, fetching T should favor the tag, just like how local refname disambiguation rule works. But because the code walks over ls-remote output from the remote, which happens to be sorted in alphabetical order and has refs/heads/T before refs/tags/T, a request to fetch T is (mis)interpreted as fetching refs/heads/T. Update refname_match(), all of whose current callers care only if it returns non-zero (i.e. matches) to see if an abbreviated name can mean the full name being tested, so that it returns a positive integer whose magnitude can be used to tell the precedence, and fix the find_ref_by_name_abbrev() function not to stop at the first match but find the match with the highest precedence. This is based on an earlier work, which special cased only the exact matches, by Jonathan Tan. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02 00:22:37 +08:00
for (p = ref_rev_parse_rules; *p; p++)
if (!strcmp(full_name, mkpath(*p, abbrev_name_len, abbrev_name)))
return &ref_rev_parse_rules[num_rules] - p;
2007-11-11 22:01:46 +08:00
return 0;
}
/*
* Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
* the results to 'prefixes'
*/
void expand_ref_prefix(struct strvec *prefixes, const char *prefix)
{
const char **p;
int len = strlen(prefix);
for (p = ref_rev_parse_rules; *p; p++)
strvec_pushf(prefixes, *p, len, prefix);
}
static const char default_branch_name_advice[] = N_(
"Using '%s' as the name for the initial branch. This default branch name\n"
"is subject to change. To configure the initial branch name to use in all\n"
"of your new repositories, which will suppress this warning, call:\n"
"\n"
"\tgit config --global init.defaultBranch <name>\n"
"\n"
"Names commonly chosen instead of 'master' are 'main', 'trunk' and\n"
"'development'. The just-created branch can be renamed via this command:\n"
"\n"
"\tgit branch -m <name>\n"
);
char *repo_default_branch_name(struct repository *r, int quiet)
{
const char *config_key = "init.defaultbranch";
const char *config_display_key = "init.defaultBranch";
char *ret = NULL, *full_ref;
const char *env = getenv("GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME");
if (env && *env)
ret = xstrdup(env);
else if (repo_config_get_string(r, config_key, &ret) < 0)
die(_("could not retrieve `%s`"), config_display_key);
if (!ret) {
ret = xstrdup("master");
if (!quiet)
advise(_(default_branch_name_advice), ret);
}
full_ref = xstrfmt("refs/heads/%s", ret);
if (check_refname_format(full_ref, 0))
die(_("invalid branch name: %s = %s"), config_display_key, ret);
free(full_ref);
return ret;
}
/*
* *string and *len will only be substituted, and *string returned (for
* later free()ing) if the string passed in is a magic short-hand form
* to name a branch.
*/
static char *substitute_branch_name(struct repository *r,
const char **string, int *len,
int nonfatal_dangling_mark)
{
struct strbuf buf = STRBUF_INIT;
struct interpret_branch_name_options options = {
.nonfatal_dangling_mark = nonfatal_dangling_mark
};
int ret = repo_interpret_branch_name(r, *string, *len, &buf, &options);
if (ret == *len) {
size_t size;
*string = strbuf_detach(&buf, &size);
*len = size;
return (char *)*string;
}
return NULL;
}
int repo_dwim_ref(struct repository *r, const char *str, int len,
struct object_id *oid, char **ref, int nonfatal_dangling_mark)
{
char *last_branch = substitute_branch_name(r, &str, &len,
nonfatal_dangling_mark);
int refs_found = expand_ref(r, str, len, oid, ref);
free(last_branch);
return refs_found;
}
int expand_ref(struct repository *repo, const char *str, int len,
struct object_id *oid, char **ref)
{
const char **p, *r;
int refs_found = 0;
struct strbuf fullref = STRBUF_INIT;
*ref = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
struct object_id oid_from_ref;
struct object_id *this_result;
int flag;
struct ref_store *refs = get_main_ref_store(repo);
this_result = refs_found ? &oid_from_ref : oid;
strbuf_reset(&fullref);
strbuf_addf(&fullref, *p, len, str);
r = refs_resolve_ref_unsafe(refs, fullref.buf,
RESOLVE_REF_READING,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
this_result, &flag);
if (r) {
if (!refs_found++)
*ref = xstrdup(r);
if (!repo_settings_get_warn_ambiguous_refs(repo))
break;
} else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, "HEAD")) {
warning(_("ignoring dangling symref %s"), fullref.buf);
} else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) {
warning(_("ignoring broken ref %s"), fullref.buf);
}
}
strbuf_release(&fullref);
return refs_found;
}
int repo_dwim_log(struct repository *r, const char *str, int len,
struct object_id *oid, char **log)
{
struct ref_store *refs = get_main_ref_store(r);
char *last_branch = substitute_branch_name(r, &str, &len, 0);
const char **p;
int logs_found = 0;
struct strbuf path = STRBUF_INIT;
*log = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
struct object_id hash;
const char *ref, *it;
strbuf_reset(&path);
strbuf_addf(&path, *p, len, str);
ref = refs_resolve_ref_unsafe(refs, path.buf,
RESOLVE_REF_READING,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
oid ? &hash : NULL, NULL);
if (!ref)
continue;
if (refs_reflog_exists(refs, path.buf))
it = path.buf;
else if (strcmp(ref, path.buf) &&
refs_reflog_exists(refs, ref))
it = ref;
else
continue;
if (!logs_found++) {
*log = xstrdup(it);
if (oid)
oidcpy(oid, &hash);
}
if (!repo_settings_get_warn_ambiguous_refs(r))
break;
}
strbuf_release(&path);
free(last_branch);
return logs_found;
}
int is_per_worktree_ref(const char *refname)
{
return starts_with(refname, "refs/worktree/") ||
starts_with(refname, "refs/bisect/") ||
starts_with(refname, "refs/rewritten/");
}
int is_pseudo_ref(const char *refname)
{
static const char * const pseudo_refs[] = {
"FETCH_HEAD",
"MERGE_HEAD",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(pseudo_refs); i++)
if (!strcmp(refname, pseudo_refs[i]))
return 1;
return 0;
}
static int is_root_ref_syntax(const char *refname)
{
const char *c;
for (c = refname; *c; c++) {
if (!isupper(*c) && *c != '-' && *c != '_')
return 0;
}
return 1;
}
refs: do not check ref existence in `is_root_ref()` Before this patch series, root refs except for "HEAD" and our special refs were classified as pseudorefs. Furthermore, our terminology clarified that pseudorefs must not be symbolic refs. This restriction is enforced in `is_root_ref()`, which explicitly checks that a supposed root ref resolves to an object ID without recursing. This has been extremely confusing right from the start because (in old terminology) a ref name may sometimes be a pseudoref and sometimes not depending on whether it is a symbolic or regular ref. This behaviour does not seem reasonable at all and I very much doubt that it results in anything sane. Last but not least, the current behaviour can actually lead to a segfault when calling `is_root_ref()` with a reference that either does not exist or that is a symbolic ref because we never initialized `oid`, but then read it via `is_null_oid()`. We have now changed terminology to clarify that pseudorefs are really only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live in the root of the ref hierarchy are just plain refs. Thus, we do not need to check whether the ref is symbolic or not. In fact, we can now avoid looking up the ref completely as the name is sufficient for us to figure out whether something would be a root ref or not. This change of course changes semantics for our callers. As there are only three of them we can assess each of them individually: - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs. It's clear that the intent is to classify based on the ref name, only. - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to filter root refs. Again, using existence checks is pointless here as the iterator has just surfaced the ref, so we know it does exist. - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to determine whether it should add a ref to the root directory of its iterator. This had the effect that we skipped over any files that are either a symbolic ref, or which are not a ref at all. The new behaviour is to include symbolic refs know, which aligns us with the adapted terminology. Furthermore, files which look like root refs but aren't are now mark those as "broken". As broken refs are not surfaced by our tooling, this should not lead to a change in user-visible behaviour, but may cause us to emit warnings. This feels like the right thing to do as we would otherwise just silently ignore corrupted root refs completely. So in all cases the existence check was either superfluous, not in line with the adapted terminology or masked potential issues. This commit thus changes the behaviour as proposed and drops the existence check altogether. Add a test that verifies that this does not change user-visible behaviour. Namely, we still don't want to show broken refs to the user by default in git-for-each-ref(1). What this does allow though is for internal callers to surface dangling root refs when they pass in the `DO_FOR_EACH_INCLUDE_BROKEN` flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 14:50:51 +08:00
int is_root_ref(const char *refname)
2024-02-23 18:01:08 +08:00
{
static const char *const irregular_root_refs[] = {
"HEAD",
2024-02-23 18:01:08 +08:00
"AUTO_MERGE",
"BISECT_EXPECTED_REV",
"NOTES_MERGE_PARTIAL",
"NOTES_MERGE_REF",
"MERGE_AUTOSTASH",
};
size_t i;
if (!is_root_ref_syntax(refname) ||
is_pseudo_ref(refname))
2024-02-23 18:01:08 +08:00
return 0;
refs: do not check ref existence in `is_root_ref()` Before this patch series, root refs except for "HEAD" and our special refs were classified as pseudorefs. Furthermore, our terminology clarified that pseudorefs must not be symbolic refs. This restriction is enforced in `is_root_ref()`, which explicitly checks that a supposed root ref resolves to an object ID without recursing. This has been extremely confusing right from the start because (in old terminology) a ref name may sometimes be a pseudoref and sometimes not depending on whether it is a symbolic or regular ref. This behaviour does not seem reasonable at all and I very much doubt that it results in anything sane. Last but not least, the current behaviour can actually lead to a segfault when calling `is_root_ref()` with a reference that either does not exist or that is a symbolic ref because we never initialized `oid`, but then read it via `is_null_oid()`. We have now changed terminology to clarify that pseudorefs are really only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live in the root of the ref hierarchy are just plain refs. Thus, we do not need to check whether the ref is symbolic or not. In fact, we can now avoid looking up the ref completely as the name is sufficient for us to figure out whether something would be a root ref or not. This change of course changes semantics for our callers. As there are only three of them we can assess each of them individually: - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs. It's clear that the intent is to classify based on the ref name, only. - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to filter root refs. Again, using existence checks is pointless here as the iterator has just surfaced the ref, so we know it does exist. - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to determine whether it should add a ref to the root directory of its iterator. This had the effect that we skipped over any files that are either a symbolic ref, or which are not a ref at all. The new behaviour is to include symbolic refs know, which aligns us with the adapted terminology. Furthermore, files which look like root refs but aren't are now mark those as "broken". As broken refs are not surfaced by our tooling, this should not lead to a change in user-visible behaviour, but may cause us to emit warnings. This feels like the right thing to do as we would otherwise just silently ignore corrupted root refs completely. So in all cases the existence check was either superfluous, not in line with the adapted terminology or masked potential issues. This commit thus changes the behaviour as proposed and drops the existence check altogether. Add a test that verifies that this does not change user-visible behaviour. Namely, we still don't want to show broken refs to the user by default in git-for-each-ref(1). What this does allow though is for internal callers to surface dangling root refs when they pass in the `DO_FOR_EACH_INCLUDE_BROKEN` flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 14:50:51 +08:00
if (ends_with(refname, "_HEAD"))
return 1;
2024-02-23 18:01:08 +08:00
for (i = 0; i < ARRAY_SIZE(irregular_root_refs); i++)
refs: do not check ref existence in `is_root_ref()` Before this patch series, root refs except for "HEAD" and our special refs were classified as pseudorefs. Furthermore, our terminology clarified that pseudorefs must not be symbolic refs. This restriction is enforced in `is_root_ref()`, which explicitly checks that a supposed root ref resolves to an object ID without recursing. This has been extremely confusing right from the start because (in old terminology) a ref name may sometimes be a pseudoref and sometimes not depending on whether it is a symbolic or regular ref. This behaviour does not seem reasonable at all and I very much doubt that it results in anything sane. Last but not least, the current behaviour can actually lead to a segfault when calling `is_root_ref()` with a reference that either does not exist or that is a symbolic ref because we never initialized `oid`, but then read it via `is_null_oid()`. We have now changed terminology to clarify that pseudorefs are really only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live in the root of the ref hierarchy are just plain refs. Thus, we do not need to check whether the ref is symbolic or not. In fact, we can now avoid looking up the ref completely as the name is sufficient for us to figure out whether something would be a root ref or not. This change of course changes semantics for our callers. As there are only three of them we can assess each of them individually: - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs. It's clear that the intent is to classify based on the ref name, only. - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to filter root refs. Again, using existence checks is pointless here as the iterator has just surfaced the ref, so we know it does exist. - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to determine whether it should add a ref to the root directory of its iterator. This had the effect that we skipped over any files that are either a symbolic ref, or which are not a ref at all. The new behaviour is to include symbolic refs know, which aligns us with the adapted terminology. Furthermore, files which look like root refs but aren't are now mark those as "broken". As broken refs are not surfaced by our tooling, this should not lead to a change in user-visible behaviour, but may cause us to emit warnings. This feels like the right thing to do as we would otherwise just silently ignore corrupted root refs completely. So in all cases the existence check was either superfluous, not in line with the adapted terminology or masked potential issues. This commit thus changes the behaviour as proposed and drops the existence check altogether. Add a test that verifies that this does not change user-visible behaviour. Namely, we still don't want to show broken refs to the user by default in git-for-each-ref(1). What this does allow though is for internal callers to surface dangling root refs when they pass in the `DO_FOR_EACH_INCLUDE_BROKEN` flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 14:50:51 +08:00
if (!strcmp(refname, irregular_root_refs[i]))
return 1;
2024-02-23 18:01:08 +08:00
return 0;
}
static int is_current_worktree_ref(const char *ref) {
return is_root_ref_syntax(ref) || is_per_worktree_ref(ref);
refs: new ref types to make per-worktree refs visible to all worktrees One of the problems with multiple worktree is accessing per-worktree refs of one worktree from another worktree. This was sort of solved by multiple ref store, where the code can open the ref store of another worktree and has access to the ref space of that worktree. The problem with this is reporting. "HEAD" in another ref space is also called "HEAD" like in the current ref space. In order to differentiate them, all the code must somehow carry the ref store around and print something like "HEAD from this ref store". But that is not feasible (or possible with a _lot_ of work). With the current design, we pass a reference around as a string (so called "refname"). Extending this design to pass a string _and_ a ref store is a nightmare, especially when handling extended SHA-1 syntax. So we do it another way. Instead of entering a separate ref space, we make refs from other worktrees available in the current ref space. So "HEAD" is always HEAD of the current worktree, but then we can have "worktrees/blah/HEAD" to denote HEAD from a worktree named "blah". This syntax coincidentally matches the underlying directory structure which makes implementation a bit easier. The main worktree has to be treated specially because well... it's special from the beginning. So HEAD from the main worktree is acccessible via the name "main-worktree/HEAD" instead of "worktrees/main/HEAD" because "main" could be just another secondary worktree. This patch also makes it possible to specify refs from one worktree in another one, e.g. git log worktrees/foo/HEAD Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:08:54 +08:00
}
enum ref_worktree_type parse_worktree_ref(const char *maybe_worktree_ref,
const char **worktree_name, int *worktree_name_length,
const char **bare_refname)
refs: new ref types to make per-worktree refs visible to all worktrees One of the problems with multiple worktree is accessing per-worktree refs of one worktree from another worktree. This was sort of solved by multiple ref store, where the code can open the ref store of another worktree and has access to the ref space of that worktree. The problem with this is reporting. "HEAD" in another ref space is also called "HEAD" like in the current ref space. In order to differentiate them, all the code must somehow carry the ref store around and print something like "HEAD from this ref store". But that is not feasible (or possible with a _lot_ of work). With the current design, we pass a reference around as a string (so called "refname"). Extending this design to pass a string _and_ a ref store is a nightmare, especially when handling extended SHA-1 syntax. So we do it another way. Instead of entering a separate ref space, we make refs from other worktrees available in the current ref space. So "HEAD" is always HEAD of the current worktree, but then we can have "worktrees/blah/HEAD" to denote HEAD from a worktree named "blah". This syntax coincidentally matches the underlying directory structure which makes implementation a bit easier. The main worktree has to be treated specially because well... it's special from the beginning. So HEAD from the main worktree is acccessible via the name "main-worktree/HEAD" instead of "worktrees/main/HEAD" because "main" could be just another secondary worktree. This patch also makes it possible to specify refs from one worktree in another one, e.g. git log worktrees/foo/HEAD Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:08:54 +08:00
{
const char *name_dummy;
int name_length_dummy;
const char *ref_dummy;
refs: new ref types to make per-worktree refs visible to all worktrees One of the problems with multiple worktree is accessing per-worktree refs of one worktree from another worktree. This was sort of solved by multiple ref store, where the code can open the ref store of another worktree and has access to the ref space of that worktree. The problem with this is reporting. "HEAD" in another ref space is also called "HEAD" like in the current ref space. In order to differentiate them, all the code must somehow carry the ref store around and print something like "HEAD from this ref store". But that is not feasible (or possible with a _lot_ of work). With the current design, we pass a reference around as a string (so called "refname"). Extending this design to pass a string _and_ a ref store is a nightmare, especially when handling extended SHA-1 syntax. So we do it another way. Instead of entering a separate ref space, we make refs from other worktrees available in the current ref space. So "HEAD" is always HEAD of the current worktree, but then we can have "worktrees/blah/HEAD" to denote HEAD from a worktree named "blah". This syntax coincidentally matches the underlying directory structure which makes implementation a bit easier. The main worktree has to be treated specially because well... it's special from the beginning. So HEAD from the main worktree is acccessible via the name "main-worktree/HEAD" instead of "worktrees/main/HEAD" because "main" could be just another secondary worktree. This patch also makes it possible to specify refs from one worktree in another one, e.g. git log worktrees/foo/HEAD Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:08:54 +08:00
if (!worktree_name)
worktree_name = &name_dummy;
if (!worktree_name_length)
worktree_name_length = &name_length_dummy;
if (!bare_refname)
bare_refname = &ref_dummy;
if (skip_prefix(maybe_worktree_ref, "worktrees/", bare_refname)) {
const char *slash = strchr(*bare_refname, '/');
*worktree_name = *bare_refname;
if (!slash) {
*worktree_name_length = strlen(*worktree_name);
/* This is an error condition, and the caller tell because the bare_refname is "" */
*bare_refname = *worktree_name + *worktree_name_length;
return REF_WORKTREE_OTHER;
}
*worktree_name_length = slash - *bare_refname;
*bare_refname = slash + 1;
if (is_current_worktree_ref(*bare_refname))
return REF_WORKTREE_OTHER;
}
*worktree_name = NULL;
*worktree_name_length = 0;
if (skip_prefix(maybe_worktree_ref, "main-worktree/", bare_refname)
&& is_current_worktree_ref(*bare_refname))
return REF_WORKTREE_MAIN;
*bare_refname = maybe_worktree_ref;
if (is_current_worktree_ref(maybe_worktree_ref))
return REF_WORKTREE_CURRENT;
return REF_WORKTREE_SHARED;
}
long get_files_ref_lock_timeout_ms(void)
{
static int configured = 0;
/* The default timeout is 100 ms: */
static int timeout_ms = 100;
if (!configured) {
git_config_get_int("core.filesreflocktimeout", &timeout_ms);
configured = 1;
}
return timeout_ms;
}
int refs_delete_ref(struct ref_store *refs, const char *msg,
const char *refname,
const struct object_id *old_oid,
unsigned int flags)
{
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
transaction = ref_store_transaction_begin(refs, &err);
if (!transaction ||
ref_transaction_delete(transaction, refname, old_oid,
NULL, flags, msg, &err) ||
ref_transaction_commit(transaction, &err)) {
error("%s", err.buf);
ref_transaction_free(transaction);
strbuf_release(&err);
return 1;
}
ref_transaction_free(transaction);
strbuf_release(&err);
return 0;
}
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
static void copy_reflog_msg(struct strbuf *sb, const char *msg)
{
char c;
int wasspace = 1;
while ((c = *msg++)) {
if (wasspace && isspace(c))
continue;
wasspace = isspace(c);
if (wasspace)
c = ' ';
strbuf_addch(sb, c);
}
strbuf_rtrim(sb);
}
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
static char *normalize_reflog_message(const char *msg)
{
struct strbuf sb = STRBUF_INIT;
if (msg && *msg)
copy_reflog_msg(&sb, msg);
return strbuf_detach(&sb, NULL);
}
int should_autocreate_reflog(enum log_refs_config log_all_ref_updates,
const char *refname)
{
switch (log_all_ref_updates) {
case LOG_REFS_ALWAYS:
return 1;
case LOG_REFS_NORMAL:
return starts_with(refname, "refs/heads/") ||
starts_with(refname, "refs/remotes/") ||
starts_with(refname, "refs/notes/") ||
!strcmp(refname, "HEAD");
default:
return 0;
}
}
int is_branch(const char *refname)
{
return !strcmp(refname, "HEAD") || starts_with(refname, "refs/heads/");
}
struct read_ref_at_cb {
const char *refname;
timestamp_t at_time;
int cnt;
int reccnt;
struct object_id *oid;
int found_it;
struct object_id ooid;
struct object_id noid;
int tz;
timestamp_t date;
char **msg;
timestamp_t *cutoff_time;
int *cutoff_tz;
int *cutoff_cnt;
};
static void set_read_ref_cutoffs(struct read_ref_at_cb *cb,
timestamp_t timestamp, int tz, const char *message)
{
if (cb->msg)
*cb->msg = xstrdup(message);
if (cb->cutoff_time)
*cb->cutoff_time = timestamp;
if (cb->cutoff_tz)
*cb->cutoff_tz = tz;
if (cb->cutoff_cnt)
*cb->cutoff_cnt = cb->reccnt;
}
static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
const char *email UNUSED,
timestamp_t timestamp, int tz,
const char *message, void *cb_data)
{
struct read_ref_at_cb *cb = cb_data;
cb->tz = tz;
cb->date = timestamp;
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
if (timestamp <= cb->at_time || cb->cnt == 0) {
set_read_ref_cutoffs(cb, timestamp, tz, message);
/*
* we have not yet updated cb->[n|o]oid so they still
* hold the values for the previous record.
*/
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
if (!is_null_oid(&cb->ooid)) {
oidcpy(cb->oid, noid);
if (!oideq(&cb->ooid, noid))
warning(_("log for ref %s has gap after %s"),
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-26 00:55:02 +08:00
cb->refname, show_date(cb->date, cb->tz, DATE_MODE(RFC2822)));
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
}
else if (cb->date == cb->at_time)
oidcpy(cb->oid, noid);
else if (!oideq(noid, cb->oid))
warning(_("log for ref %s unexpectedly ended on %s"),
cb->refname, show_date(cb->date, cb->tz,
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-26 00:55:02 +08:00
DATE_MODE(RFC2822)));
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
cb->reccnt++;
oidcpy(&cb->ooid, ooid);
oidcpy(&cb->noid, noid);
cb->found_it = 1;
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
return 1;
}
cb->reccnt++;
oidcpy(&cb->ooid, ooid);
oidcpy(&cb->noid, noid);
Revert "refs: allow @{n} to work with n-sized reflog" This reverts commit 6436a20284f33d42103cac93bd82e65bebb31526. The idea of that commit is that if read_ref_at() is counting back to the Nth reflog but the reflog is short by one entry (e.g., because it was pruned), we can find the oid of the missing entry by looking at the "before" oid value of the entry that comes after it (whereas before, we looked at the "after" value of each entry and complained that we couldn't find the one from before the truncation). This works fine for resolving the oid of ref@{n}, as it is used by get_oid_basic(), which does not look at any other aspect of the reflog we found (e.g., its timestamp or message). But there's another caller of read_ref_at(): in show-branch we use it to walk over the reflog, and we do care about the reflog entry. And so that commit broke "show-branch --reflog"; it shows the reflog message for ref@{0} as ref@{1}, ref@{1} as ref@{2}, and so on. For example, in the new test in t3202 we produce: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (0 seconds ago) commit: three ! [branch@{2}] (60 seconds ago) commit: two ! [branch@{3}] (2 minutes ago) reset: moving to HEAD^ instead of the correct: ! [branch@{0}] (0 seconds ago) commit: three ! [branch@{1}] (60 seconds ago) commit: two ! [branch@{2}] (2 minutes ago) reset: moving to HEAD^ ! [branch@{3}] (2 minutes ago) commit: one But there's another bug, too: because it is looking at the "old" value of the reflog after the one we're interested in, it has to special-case ref@{0} (since there isn't anything after it). That's why it doesn't show the offset bug in the output above. But this special-case code fails to handle the situation where the reflog is empty or missing; it returns success even though the reflog message out-parameter has been left uninitialized. You can't trigger this through get_oid_basic(), but "show-branch --reflog" will pretty reliably segfault as it tries to access the garbage pointer. Fixing the segfault would be pretty easy. But the off-by-one problem is inherent in this approach. So let's start by reverting the commit to give us a clean slate to work with. This isn't a pure revert; all of the code changes are reverted, but for the tests: 1. We'll flip the cases in t1508 to expect_failure; making these work was the goal of 6436a2028, and we'll want to use them for our replacement approach. 2. There's a test in t3202 for "show-branch --reflog", but it expects the broken output! It was added by f2463490c4 (show-branch: show reflog message, 2021-12-02) which was fixing another bug, and I think the author simply didn't notice that the second line showed the wrong reflog. Rather than fixing that test, let's replace it with one that is more thorough (while still covering the reflog message fix from that commit). We'll use a longer reflog, which lets us see more entries (thus making the "off by one" pattern much more clear). And we'll use a more recent timestamp for "now" so that our relative dates have more resolution. That lets us see that the reflog dates are correct (whereas when you are 4 years away, two entries that are 60 seconds apart will have the same "4 years ago" relative date). Because we're adjusting the repository state, I've moved this new test to the end of the script, leaving the other tests undisturbed. We'll also add a new test which covers the missing reflog case; previously it segfaulted, but now it reports the empty reflog). Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:02:26 +08:00
if (cb->cnt > 0)
cb->cnt--;
return 0;
}
static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid,
const char *email UNUSED,
timestamp_t timestamp, int tz,
const char *message, void *cb_data)
{
struct read_ref_at_cb *cb = cb_data;
set_read_ref_cutoffs(cb, timestamp, tz, message);
oidcpy(cb->oid, ooid);
get_oid_basic(): special-case ref@{n} for oldest reflog entry The goal of 6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07) was that if we have "n" entries in a reflog, we should still be able to resolve ref@{n} by looking at the "old" value of the oldest entry. Commit 6436a20284 tried to put the logic into read_ref_at() by shifting its idea of "n" by one. But we reverted that in the previous commit, since it led to bugs in other callers which cared about the details of the reflog entry we found. Instead, let's put the special case into the caller that resolves @{n}, as it cares only about the oid. read_ref_at() is even kind enough to return the "old" value from the final reflog; it just returns "1" to signal to us that we ran off the end of the reflog. But we can notice in the caller that we read just enough records for that "old" value to be the one we're looking for, and use it. Note that read_ref_at() could notice this case, too, and just return 0. But we don't want to do that, because the caller must be made aware that we only found the oid, not an actual reflog entry (and the call sites in show-branch do care about this). There is one complication, though. When read_ref_at() hits a truncated reflog, it will return the "old" value of the oldest entry only if it is not the null oid. Otherwise, it actually returns the "new" value from that entry! This bit of fudging is due to d1a4489a56 (avoid null SHA1 in oldest reflog, 2008-07-08), where asking for "ref@{20.years.ago}" for a ref created recently will produce the initial value as a convenience (even though technically it did not exist 20 years ago). But this convenience is only useful for time-based cutoffs. For count-based cutoffs, get_oid_basic() has always simply complained about going too far back: $ git rev-parse HEAD@{20} fatal: log for 'HEAD' only has 16 entries and we should continue to do so, rather than returning a nonsense value (there's even a test in t1508 already which covers this). So let's have the d1a4489a56 code kick in only when doing timestamp-based cutoffs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:04:07 +08:00
if (cb->at_time && is_null_oid(cb->oid))
oidcpy(cb->oid, noid);
/* We just want the first entry */
return 1;
}
int read_ref_at(struct ref_store *refs, const char *refname,
unsigned int flags, timestamp_t at_time, int cnt,
struct object_id *oid, char **msg,
timestamp_t *cutoff_time, int *cutoff_tz, int *cutoff_cnt)
{
struct read_ref_at_cb cb;
memset(&cb, 0, sizeof(cb));
cb.refname = refname;
cb.at_time = at_time;
cb.cnt = cnt;
cb.msg = msg;
cb.cutoff_time = cutoff_time;
cb.cutoff_tz = cutoff_tz;
cb.cutoff_cnt = cutoff_cnt;
cb.oid = oid;
refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
if (!cb.reccnt) {
read_ref_at(): special-case ref@{0} for an empty reflog The previous commit special-cased get_oid_basic()'s handling of ref@{n} for a reflog with n entries. But its special case doesn't work for ref@{0} in an empty reflog, because read_ref_at() dies when it notices the empty reflog! We can make this work by special-casing this in read_ref_at(). It's somewhat gross, for two reasons: 1. We have no reflog entry to describe in the "msg" out-parameter. So we have to leave it uninitialized or make something up. 2. Likewise, we have no oid to put in the "oid" out-parameter. Leaving it untouched is actually the best thing here, as all of the callers will have initialized it with the current ref value via repo_dwim_log(). This is rather subtle, but it is how things worked in 6436a20284 (refs: allow @{n} to work with n-sized reflog, 2021-01-07) before we reverted it. The key difference from 6436a20284 here is that we'll return "1" to indicate that we _didn't_ find the requested reflog entry. Coupled with the special-casing in get_oid_basic() in the previous commit, that's enough to make looking up ref@{0} work, and we can flip 6436a20284's test back to expect_success. It also means that the call in show-branch which segfaulted with 6436a20284 (and which is now tested in t3202) remains OK. The caller notices that we could not find any reflog entry, and so it breaks out of its loop, showing nothing. This is different from the current behavior of producing an error, but it's just as reasonable (and is exactly what we'd do if you asked it to walk starting at ref@{1} but there was only 1 entry). Thus nobody should actually look at the reflog entry info we return. But we'll still put in some fake values just to be on the safe side, since this is such a subtle and confusing interface. Likewise, we'll document what's going on in a comment above the function declaration. If this were a function with a lot of callers, the footgun would probably not be worth it. But it has only ever had two callers in its 18-year existence, and it seems unlikely to grow more. So let's hold our noses and let users enjoy the convenience of a simulated ref@{0}. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26 18:08:03 +08:00
if (cnt == 0) {
/*
* The caller asked for ref@{0}, and we had no entries.
* It's a bit subtle, but in practice all callers have
* prepped the "oid" field with the current value of
* the ref, which is the most reasonable fallback.
*
* We'll put dummy values into the out-parameters (so
* they're not just uninitialized garbage), and the
* caller can take our return value as a hint that
* we did not find any such reflog.
*/
set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
return 1;
}
if (flags & GET_OID_QUIETLY)
exit(128);
else
die(_("log for %s is empty"), refname);
}
if (cb.found_it)
return 0;
refs_for_each_reflog_ent(refs, refname, read_ref_at_ent_oldest, &cb);
return 1;
}
struct ref_transaction *ref_store_transaction_begin(struct ref_store *refs,
struct strbuf *err)
{
struct ref_transaction *tr;
assert(err);
CALLOC_ARRAY(tr, 1);
tr->ref_store = refs;
return tr;
}
void ref_transaction_free(struct ref_transaction *transaction)
{
size_t i;
if (!transaction)
return;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
case REF_TRANSACTION_CLOSED:
/* OK */
break;
case REF_TRANSACTION_PREPARED:
BUG("free called on a prepared reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
for (i = 0; i < transaction->nr; i++) {
free(transaction->updates[i]->msg);
free((char *)transaction->updates[i]->new_target);
free((char *)transaction->updates[i]->old_target);
free(transaction->updates[i]);
}
free(transaction->updates);
free(transaction);
}
struct ref_update *ref_transaction_add_update(
struct ref_transaction *transaction,
const char *refname, unsigned int flags,
const struct object_id *new_oid,
const struct object_id *old_oid,
const char *new_target, const char *old_target,
const char *msg)
{
struct ref_update *update;
if (transaction->state != REF_TRANSACTION_OPEN)
BUG("update called for transaction that is not open");
if (old_oid && old_target)
BUG("only one of old_oid and old_target should be non NULL");
if (new_oid && new_target)
BUG("only one of new_oid and new_target should be non NULL");
FLEX_ALLOC_STR(update, refname, refname);
ALLOC_GROW(transaction->updates, transaction->nr + 1, transaction->alloc);
transaction->updates[transaction->nr++] = update;
update->flags = flags;
update->new_target = xstrdup_or_null(new_target);
update->old_target = xstrdup_or_null(old_target);
if ((flags & REF_HAVE_NEW) && new_oid)
oidcpy(&update->new_oid, new_oid);
if ((flags & REF_HAVE_OLD) && old_oid)
oidcpy(&update->old_oid, old_oid);
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
update->msg = normalize_reflog_message(msg);
return update;
}
int ref_transaction_update(struct ref_transaction *transaction,
const char *refname,
const struct object_id *new_oid,
const struct object_id *old_oid,
const char *new_target,
const char *old_target,
unsigned int flags, const char *msg,
struct strbuf *err)
{
assert(err);
if ((flags & REF_FORCE_CREATE_REFLOG) &&
(flags & REF_SKIP_CREATE_REFLOG)) {
strbuf_addstr(err, _("refusing to force and skip creation of reflog"));
return -1;
}
if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
((new_oid && !is_null_oid(new_oid)) ?
check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
!refname_is_safe(refname))) {
strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-04 02:45:43 +08:00
refname);
return -1;
}
if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
is_pseudo_ref(refname)) {
strbuf_addf(err, _("refusing to update pseudoref '%s'"),
refname);
return -1;
}
if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);
refs: work around gcc-11 warning with REF_HAVE_NEW Using gcc-11 (or 12) to compile refs.o with -O3 results in: In file included from hashmap.h:4, from cache.h:6, from refs.c:5: In function ‘oidcpy’, inlined from ‘ref_transaction_add_update’ at refs.c:1065:3, inlined from ‘ref_transaction_update’ at refs.c:1094:2, inlined from ‘ref_transaction_verify’ at refs.c:1132:9: hash.h:262:9: warning: argument 2 null where non-null expected [-Wnonnull] 262 | memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from git-compat-util.h:177, from cache.h:4, from refs.c:5: refs.c: In function ‘ref_transaction_verify’: /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared ‘nonnull’ 43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src, | ^~~~~~ That call to memcpy() is in a conditional block that requires REF_HAVE_NEW to be set. But in ref_transaction_update(), we make sure it isn't set coming in: if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS) BUG("illegal flags 0x%x passed to ref_transaction_update()", flags); and then only set it if the variable isn't NULL: flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0); So it should be impossible to reach that memcpy() with a NULL oid. But for whatever reason, gcc doesn't accept that hitting the BUG() means we won't go any further, even though it's marked with the noreturn attribute. And the conditional is correct; ALLOWED_FLAGS doesn't contain HAVE_NEW or HAVE_OLD, and you can even simplify it to check for those flags explicitly and the compiler still complains. We can work around this by just clearing the disallowed flags explicitly. This should be a noop because of the BUG() check, but it makes the compiler happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-20 05:28:30 +08:00
/*
* Clear flags outside the allowed set; this should be a noop because
* of the BUG() check above, but it works around a -Wnonnull warning
* with some versions of "gcc -O3".
*/
flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
flags |= (new_target ? REF_HAVE_NEW : 0) | (old_target ? REF_HAVE_OLD : 0);
ref_transaction_add_update(transaction, refname, flags,
new_oid, old_oid, new_target,
old_target, msg);
return 0;
}
int ref_transaction_create(struct ref_transaction *transaction,
const char *refname,
const struct object_id *new_oid,
const char *new_target,
unsigned int flags, const char *msg,
struct strbuf *err)
{
if (new_oid && new_target)
BUG("create called with both new_oid and new_target set");
if ((!new_oid || is_null_oid(new_oid)) && !new_target) {
strbuf_addf(err, "'%s' has neither a valid OID nor a target", refname);
clone: die() instead of BUG() on bad refs When cloning directly from a local repository, we load a list of refs based on scanning the $GIT_DIR/refs/ directory of the "server" repository. If files exist in that directory that do not parse as hexadecimal hashes, then the ref array used by write_remote_refs() ends up with some entries with null OIDs. This causes us to hit a BUG() statement in ref_transaction_create(): BUG: create called without valid new_oid This BUG() call used to be a die() until 033abf97f (Replace all die("BUG: ...") calls by BUG() ones, 2018-05-02). Before that, the die() was added by f04c5b552 (ref_transaction_create(): check that new_sha1 is valid, 2015-02-17). The original report for this bug [1] mentioned that this problem did not exist in Git 2.27.0. The failure bisects unsurprisingly to 968f12fda (refs: turn on GIT_REF_PARANOIA by default, 2021-09-24). When GIT_REF_PARANOIA is enabled, this case always fails as far back as I am able to successfully compile and test the Git codebase. [1] https://github.com/git-for-windows/git/issues/3781 There are two approaches to consider here. One would be to remove this BUG() statement in favor of returning with an error. There are only two callers to ref_transaction_create(), so this would have a limited impact. The other approach would be to add special casing in 'git clone' to avoid this faulty input to the method. While I originally started with changing 'git clone', I decided that modifying ref_transaction_create() was a more complete solution. This prevents failing with a BUG() statement when we already have a good way to report an error (including a reason for that error) within the method. Both callers properly check the return value and die() with the error message, so this is an appropriate direction. The added test helps check against a regression, but does check that our intended error message is handled correctly. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-25 21:47:30 +08:00
return 1;
}
return ref_transaction_update(transaction, refname, new_oid,
null_oid(), new_target, NULL, flags,
msg, err);
}
int ref_transaction_delete(struct ref_transaction *transaction,
const char *refname,
const struct object_id *old_oid,
const char *old_target,
unsigned int flags,
const char *msg,
struct strbuf *err)
{
if (old_oid && is_null_oid(old_oid))
BUG("delete called with old_oid set to zeros");
if (old_oid && old_target)
BUG("delete called with both old_oid and old_target set");
if (old_target && !(flags & REF_NO_DEREF))
BUG("delete cannot operate on symrefs with deref mode");
return ref_transaction_update(transaction, refname,
null_oid(), old_oid,
NULL, old_target, flags,
msg, err);
}
int ref_transaction_verify(struct ref_transaction *transaction,
const char *refname,
const struct object_id *old_oid,
const char *old_target,
unsigned int flags,
struct strbuf *err)
{
if (!old_target && !old_oid)
BUG("verify called with old_oid and old_target set to NULL");
if (old_oid && old_target)
BUG("verify called with both old_oid and old_target set");
if (old_target && !(flags & REF_NO_DEREF))
BUG("verify cannot operate on symrefs with deref mode");
return ref_transaction_update(transaction, refname,
NULL, old_oid,
NULL, old_target,
flags, NULL, err);
}
int refs_update_ref(struct ref_store *refs, const char *msg,
const char *refname, const struct object_id *new_oid,
const struct object_id *old_oid, unsigned int flags,
enum action_on_err onerr)
{
struct ref_transaction *t = NULL;
struct strbuf err = STRBUF_INIT;
int ret = 0;
t = ref_store_transaction_begin(refs, &err);
if (!t ||
ref_transaction_update(t, refname, new_oid, old_oid, NULL, NULL,
flags, msg, &err) ||
ref_transaction_commit(t, &err)) {
ret = 1;
ref_transaction_free(t);
}
if (ret) {
const char *str = _("update_ref failed for ref '%s': %s");
switch (onerr) {
case UPDATE_REFS_MSG_ON_ERR:
error(str, refname, err.buf);
break;
case UPDATE_REFS_DIE_ON_ERR:
die(str, refname, err.buf);
break;
case UPDATE_REFS_QUIET_ON_ERR:
break;
}
strbuf_release(&err);
return 1;
}
strbuf_release(&err);
if (t)
ref_transaction_free(t);
return 0;
}
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
/*
* Check that the string refname matches a rule of the form
* "{prefix}%.*s{suffix}". So "foo/bar/baz" would match the rule
* "foo/%.*s/baz", and return the string "bar".
*/
static const char *match_parse_rule(const char *refname, const char *rule,
size_t *len)
{
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
/*
* Check that rule matches refname up to the first percent in the rule.
* We can bail immediately if not, but otherwise we leave "rule" at the
* %-placeholder, and "refname" at the start of the potential matched
* name.
*/
while (*rule != '%') {
if (!*rule)
BUG("rev-parse rule did not have percent");
if (*refname++ != *rule++)
return NULL;
}
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
/*
* Check that our "%" is the expected placeholder. This assumes there
* are no other percents (placeholder or quoted) in the string, but
* that is sufficient for our rev-parse rules.
*/
if (!skip_prefix(rule, "%.*s", &rule))
return NULL;
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
/*
* And now check that our suffix (if any) matches.
*/
if (!strip_suffix(refname, rule, len))
return NULL;
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
return refname; /* len set by strip_suffix() */
}
char *refs_shorten_unambiguous_ref(struct ref_store *refs,
const char *refname, int strict)
{
int i;
struct strbuf resolved_buf = STRBUF_INIT;
/* skip first rule, it will always match */
shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant The ref_rev_parse_rules[] array is terminated with a NULL entry, and we count it and store the result in the local nr_rules variable. But we don't need to do so; since the array is a constant, we can compute its size directly. The original code probably didn't do that because it was written as part of for-each-ref, and saw the array only as a pointer. It was migrated in 7c2b3029df (make get_short_ref a public function, 2009-04-07) and could have been updated then, but that subtlety was not noticed. We even have a constant that represents this value already, courtesy of 60650a48c0 (remote: make refspec follow the same disambiguation rule as local refs, 2018-08-01), though again, nobody noticed at the time that it could be used here, too. The current count-up isn't a big deal, as we need to preprocess that array anyway. But it will become more cumbersome as we refactor the shortening code. So let's get rid of it and just use the constant everywhere. Note that there are two things here that aren't just simple text replacements: 1. We also use nr_rules to see if a previous call has initialized the static pre-processing variables. We can just use the scanf_fmts pointer to do the same thing, as it is non-NULL only after we've done that initialization. 2. If nr_rules is zero after we've counted it up, we bail from the function. This code is unreachable, though, as the set of rules is hard-coded and non-empty. And that becomes even more apparent now that we are using the constant. So we can drop this conditional completely (and ironically, the code would have the same output if it _did_ trigger, as we'd simply skip the loop entirely and return the whole refname). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:18 +08:00
for (i = NUM_REV_PARSE_RULES - 1; i > 0 ; --i) {
int j;
int rules_to_fail = i;
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
const char *short_name;
shorten_unambiguous_ref(): avoid integer truncation We parse the shortened name "foo" out of the full refname "refs/heads/foo", and then assign the result of strlen(short_name) to an int, which may truncate or wrap to negative. In practice, this should never happen, as it requires a 2GB refname. And even somebody trying to do something malicious should at worst end up with a confused answer (we use the size only to feed back as a placeholder length to strbuf_addf() to see if there are any collisions in the lookup rules). And it may even be impossible to trigger this, as we parse the string with sscanf(), and stdio formatting functions are not known for handling large strings well. I didn't test, but I wouldn't be surprised if sscanf() on many platforms simply reports no match here. But even if it is not a problem in practice so far, it is worth fixing for two reasons: 1. We'll shortly be replacing the sscanf() call with a real parser which will handle arbitrary-sized strings. 2. Assigning strlen() to an int is an anti-pattern that requires people to look twice when auditing for real overflow problems. So we'll make this a size_t. Unfortunately we still have to cast to int eventually for the strbuf_addf() call, but at least we can localize the cast there, and check that it will be valid. I used our new cast helper here, which will just bail completely. That should be OK, as anybody with a 2GB refname is up to no good, but if we really wanted to, we could detect it manually and just refuse to shorten the refname. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:14 +08:00
size_t short_name_len;
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
short_name = match_parse_rule(refname, ref_rev_parse_rules[i],
&short_name_len);
if (!short_name)
continue;
/*
* in strict mode, all (except the matched one) rules
* must fail to resolve to a valid non-ambiguous ref
*/
if (strict)
shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant The ref_rev_parse_rules[] array is terminated with a NULL entry, and we count it and store the result in the local nr_rules variable. But we don't need to do so; since the array is a constant, we can compute its size directly. The original code probably didn't do that because it was written as part of for-each-ref, and saw the array only as a pointer. It was migrated in 7c2b3029df (make get_short_ref a public function, 2009-04-07) and could have been updated then, but that subtlety was not noticed. We even have a constant that represents this value already, courtesy of 60650a48c0 (remote: make refspec follow the same disambiguation rule as local refs, 2018-08-01), though again, nobody noticed at the time that it could be used here, too. The current count-up isn't a big deal, as we need to preprocess that array anyway. But it will become more cumbersome as we refactor the shortening code. So let's get rid of it and just use the constant everywhere. Note that there are two things here that aren't just simple text replacements: 1. We also use nr_rules to see if a previous call has initialized the static pre-processing variables. We can just use the scanf_fmts pointer to do the same thing, as it is non-NULL only after we've done that initialization. 2. If nr_rules is zero after we've counted it up, we bail from the function. This code is unreachable, though, as the set of rules is hard-coded and non-empty. And that becomes even more apparent now that we are using the constant. So we can drop this conditional completely (and ironically, the code would have the same output if it _did_ trigger, as we'd simply skip the loop entirely and return the whole refname). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:18 +08:00
rules_to_fail = NUM_REV_PARSE_RULES;
/*
* check if the short name resolves to a valid ref,
* but use only rules prior to the matched one
*/
for (j = 0; j < rules_to_fail; j++) {
const char *rule = ref_rev_parse_rules[j];
/* skip matched rule */
if (i == j)
continue;
/*
* the short name is ambiguous, if it resolves
* (with this previous rule) to a valid ref
* read_ref() returns 0 on success
*/
strbuf_reset(&resolved_buf);
strbuf_addf(&resolved_buf, rule,
shorten_unambiguous_ref(): avoid integer truncation We parse the shortened name "foo" out of the full refname "refs/heads/foo", and then assign the result of strlen(short_name) to an int, which may truncate or wrap to negative. In practice, this should never happen, as it requires a 2GB refname. And even somebody trying to do something malicious should at worst end up with a confused answer (we use the size only to feed back as a placeholder length to strbuf_addf() to see if there are any collisions in the lookup rules). And it may even be impossible to trigger this, as we parse the string with sscanf(), and stdio formatting functions are not known for handling large strings well. I didn't test, but I wouldn't be surprised if sscanf() on many platforms simply reports no match here. But even if it is not a problem in practice so far, it is worth fixing for two reasons: 1. We'll shortly be replacing the sscanf() call with a real parser which will handle arbitrary-sized strings. 2. Assigning strlen() to an int is an anti-pattern that requires people to look twice when auditing for real overflow problems. So we'll make this a size_t. Unfortunately we still have to cast to int eventually for the strbuf_addf() call, but at least we can localize the cast there, and check that it will be valid. I used our new cast helper here, which will just bail completely. That should be OK, as anybody with a 2GB refname is up to no good, but if we really wanted to, we could detect it manually and just refuse to shorten the refname. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:14 +08:00
cast_size_t_to_int(short_name_len),
short_name);
if (refs_ref_exists(refs, resolved_buf.buf))
break;
}
/*
* short name is non-ambiguous if all previous rules
* haven't resolved to a valid ref
*/
if (j == rules_to_fail) {
strbuf_release(&resolved_buf);
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 23:16:21 +08:00
return xmemdupz(short_name, short_name_len);
}
}
strbuf_release(&resolved_buf);
return xstrdup(refname);
}
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
int parse_hide_refs_config(const char *var, const char *value, const char *section,
struct strvec *hide_refs)
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
{
const char *key;
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
if (!strcmp("transfer.hiderefs", var) ||
(!parse_config_key(var, section, NULL, NULL, &key) &&
!strcmp(key, "hiderefs"))) {
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
char *ref;
int len;
if (!value)
return config_error_nonbool(var);
/* drop const to remove trailing '/' characters */
ref = (char *)strvec_push(hide_refs, value);
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
len = strlen(ref);
while (len && ref[len - 1] == '/')
ref[--len] = '\0';
}
return 0;
}
int ref_is_hidden(const char *refname, const char *refname_full,
const struct strvec *hide_refs)
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
{
refs: support negative transfer.hideRefs If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig). So you can now do: $ git config --system transfer.hideRefs refs/secret $ git config transfer.hideRefs '!refs/secret/not-so-secret' to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do: $ git clone \ -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \ remote:repo.git to clone remote:repo.git, overriding any hiding it has configured. There are two alternatives that were considered and rejected: 1. A generic config mechanism for removing an item from a list. E.g.: (e.g., "[transfer] hideRefs -= refs/foo"). This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say: [transfer] hideRefs = refs/secret hideRefs = refs/secret/not-so-secret Having explicit negative specifications means we can override previous entries, even if they are not the same literal string. 2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example: [transfer] hideRefs = refs/secret exposeRefs = refs/secret/not-so-secret hideRefs = refs/secret/not-so-secret/no-really-its-secret With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends. Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-29 04:23:26 +08:00
int i;
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
refs: support negative transfer.hideRefs If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig). So you can now do: $ git config --system transfer.hideRefs refs/secret $ git config transfer.hideRefs '!refs/secret/not-so-secret' to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do: $ git clone \ -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \ remote:repo.git to clone remote:repo.git, overriding any hiding it has configured. There are two alternatives that were considered and rejected: 1. A generic config mechanism for removing an item from a list. E.g.: (e.g., "[transfer] hideRefs -= refs/foo"). This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say: [transfer] hideRefs = refs/secret hideRefs = refs/secret/not-so-secret Having explicit negative specifications means we can override previous entries, even if they are not the same literal string. 2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example: [transfer] hideRefs = refs/secret exposeRefs = refs/secret/not-so-secret hideRefs = refs/secret/not-so-secret/no-really-its-secret With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends. Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-29 04:23:26 +08:00
for (i = hide_refs->nr - 1; i >= 0; i--) {
const char *match = hide_refs->v[i];
const char *subject;
refs: support negative transfer.hideRefs If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig). So you can now do: $ git config --system transfer.hideRefs refs/secret $ git config transfer.hideRefs '!refs/secret/not-so-secret' to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do: $ git clone \ -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \ remote:repo.git to clone remote:repo.git, overriding any hiding it has configured. There are two alternatives that were considered and rejected: 1. A generic config mechanism for removing an item from a list. E.g.: (e.g., "[transfer] hideRefs -= refs/foo"). This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say: [transfer] hideRefs = refs/secret hideRefs = refs/secret/not-so-secret Having explicit negative specifications means we can override previous entries, even if they are not the same literal string. 2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example: [transfer] hideRefs = refs/secret exposeRefs = refs/secret/not-so-secret hideRefs = refs/secret/not-so-secret/no-really-its-secret With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends. Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-29 04:23:26 +08:00
int neg = 0;
const char *p;
refs: support negative transfer.hideRefs If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig). So you can now do: $ git config --system transfer.hideRefs refs/secret $ git config transfer.hideRefs '!refs/secret/not-so-secret' to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do: $ git clone \ -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \ remote:repo.git to clone remote:repo.git, overriding any hiding it has configured. There are two alternatives that were considered and rejected: 1. A generic config mechanism for removing an item from a list. E.g.: (e.g., "[transfer] hideRefs -= refs/foo"). This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say: [transfer] hideRefs = refs/secret hideRefs = refs/secret/not-so-secret Having explicit negative specifications means we can override previous entries, even if they are not the same literal string. 2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example: [transfer] hideRefs = refs/secret exposeRefs = refs/secret/not-so-secret hideRefs = refs/secret/not-so-secret/no-really-its-secret With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends. Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-29 04:23:26 +08:00
if (*match == '!') {
neg = 1;
match++;
}
if (*match == '^') {
subject = refname_full;
match++;
} else {
subject = refname;
}
/* refname can be NULL when namespaces are used. */
if (subject &&
skip_prefix(subject, match, &p) &&
(!*p || *p == '/'))
refs: support negative transfer.hideRefs If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig). So you can now do: $ git config --system transfer.hideRefs refs/secret $ git config transfer.hideRefs '!refs/secret/not-so-secret' to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do: $ git clone \ -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \ remote:repo.git to clone remote:repo.git, overriding any hiding it has configured. There are two alternatives that were considered and rejected: 1. A generic config mechanism for removing an item from a list. E.g.: (e.g., "[transfer] hideRefs -= refs/foo"). This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say: [transfer] hideRefs = refs/secret hideRefs = refs/secret/not-so-secret Having explicit negative specifications means we can override previous entries, even if they are not the same literal string. 2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example: [transfer] hideRefs = refs/secret exposeRefs = refs/secret/not-so-secret hideRefs = refs/secret/not-so-secret/no-really-its-secret With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends. Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/"). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-29 04:23:26 +08:00
return !neg;
upload/receive-pack: allow hiding ref hierarchies A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network. Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables. Any ref that is under the hierarchies listed on the value of these variable is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack). Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent. An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and rejects the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-19 08:08:30 +08:00
}
return 0;
}
const char **hidden_refs_to_excludes(const struct strvec *hide_refs)
{
const char **pattern;
for (pattern = hide_refs->v; *pattern; pattern++) {
/*
* We can't feed any excludes from hidden refs config
* sections, since later rules may override previous
* ones. For example, with rules "refs/foo" and
* "!refs/foo/bar", we should show "refs/foo/bar" (and
* everything underneath it), but the earlier exclusion
* would cause us to skip all of "refs/foo". We
* likewise don't implement the namespace stripping
* required for '^' rules.
*
* Both are possible to do, but complicated, so avoid
* populating the jump list at all if we see either of
* these patterns.
*/
if (**pattern == '!' || **pattern == '^')
return NULL;
}
return hide_refs->v;
}
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
const char **get_namespaced_exclude_patterns(const char **exclude_patterns,
const char *namespace,
struct strvec *out)
{
if (!namespace || !*namespace || !exclude_patterns || !*exclude_patterns)
return exclude_patterns;
for (size_t i = 0; exclude_patterns[i]; i++)
strvec_pushf(out, "%s%s", namespace, exclude_patterns[i]);
return out->v;
}
const char *find_descendant_ref(const char *dirname,
const struct string_list *extras,
const struct string_list *skip)
{
int pos;
if (!extras)
return NULL;
/*
* Look at the place where dirname would be inserted into
* extras. If there is an entry at that position that starts
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
for (pos = string_list_find_insert_index(extras, dirname, 0);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
if (!starts_with(extra_refname, dirname))
break;
if (!skip || !string_list_has_string(skip, extra_refname))
return extra_refname;
}
return NULL;
}
int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
struct object_id oid;
int flag;
if (refs_resolve_ref_unsafe(refs, "HEAD", RESOLVE_REF_READING,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
&oid, &flag))
return fn("HEAD", NULL, &oid, flag, cb_data);
return 0;
}
struct ref_iterator *refs_ref_iterator_begin(
struct ref_store *refs,
const char *prefix,
const char **exclude_patterns,
int trim,
enum do_for_each_ref_flags flags)
{
struct ref_iterator *iter;
if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
static int ref_paranoia = -1;
if (ref_paranoia < 0)
refs: turn on GIT_REF_PARANOIA by default The original point of the GIT_REF_PARANOIA flag was to include broken refs in iterations, so that possibly-destructive operations would not silently ignore them (and would generally instead try to operate on the oids and fail when the objects could not be accessed). We already turned this on by default for some dangerous operations, like "repack -ad" (where missing a reachability tip would mean dropping the associated history). But it was not on for general use, even though it could easily result in the spreading of corruption (e.g., imagine cloning a repository which simply omits some of its refs because their objects are missing; the result quietly succeeds even though you did not clone everything!). This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned above would actually fail (upload-pack tells us about the broken ref, and when we ask for the objects, pack-objects fails to deliver them). This may be inconvenient when working with a corrupted repository, but: - we are better off to err on the side of complaining about corruption, and then provide mechanisms for explicitly loosening safety. - this is only one type of corruption anyway. If we are missing any other objects in the history that _aren't_ ref tips, then we'd behave similarly (happily show the ref, but then barf when we started traversing). We retain the GIT_REF_PARANOIA variable, but simply default it to "1" instead of "0". That gives the user an escape hatch for loosening this when working with a corrupt repository. It won't work across a remote connection to upload-pack (because we can't necessarily set environment variables on the remote), but there the client has other options (e.g., choosing which refs to fetch). As a bonus, this also makes ref iteration faster in general (because we don't have to call has_object_file() for each ref), though probably not noticeably so in the general case. In a repo with a million refs, it shaved a few hundred milliseconds off of upload-pack's advertisement; that's noticeable, but most repos are not nearly that large. The possible downside here is that any operation which iterates refs but doesn't ever open their objects may now quietly claim to have X when the object is corrupted (e.g., "git rev-list new-branch --not --all" will treat a broken ref as uninteresting). But again, that's not really any different than corruption below the ref level. We might have refs/heads/old-branch as non-corrupt, but we are not actively checking that we have the entire reachable history. Or the pointed-to object could even be corrupted on-disk (but our "do we have it" check would still succeed). In that sense, this is merely bringing ref-corruption in line with general object corruption. One alternative implementation would be to actually check for broken refs, and then _immediately die_ if we see any. That would cause the "rev-list --not --all" case above to abort immediately. But in many ways that's the worst of all worlds: - it still spends time looking up the objects an extra time - it still doesn't catch corruption below the ref level - it's even more inconvenient; with the current implementation of GIT_REF_PARANOIA for something like upload-pack, we can make the advertisement and let the client choose a non-broken piece of history. If we bail as soon as we see a broken ref, they cannot even see the advertisement. The test changes here show some of the fallout. A non-destructive "git repack -adk" now fails by default (but we can override it). Deleting a broken ref now actually tells the hooks the correct "before" state, rather than a confusing null oid. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-25 02:46:13 +08:00
ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 1);
if (ref_paranoia) {
flags |= DO_FOR_EACH_INCLUDE_BROKEN;
flags |= DO_FOR_EACH_OMIT_DANGLING_SYMREFS;
}
}
iter = refs->be->iterator_begin(refs, prefix, exclude_patterns, flags);
/*
* `iterator_begin()` already takes care of prefix, but we
* might need to do some trimming:
*/
if (trim)
iter = prefix_ref_iterator_begin(iter, "", trim);
return iter;
}
static int do_for_each_ref(struct ref_store *refs, const char *prefix,
const char **exclude_patterns,
each_ref_fn fn, int trim,
enum do_for_each_ref_flags flags, void *cb_data)
do_for_each_ref(): reimplement using reference iteration Use the reference iterator interface to implement do_for_each_ref(). Delete a bunch of code supporting the old for_each_ref() implementation. And now that do_for_each_ref() is generic code (it is no longer tied to the files backend), move it to refs.c. The implementation is via a new function, do_for_each_ref_iterator(), which takes a reference iterator as argument and calls a callback function for each of the references in the iterator. This change requires the current_ref performance hack for peel_ref() to be implemented via ref_iterator_peel() rather than peel_entry() because we don't have a ref_entry handy (it is hidden under three layers: file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So: * do_for_each_ref_iterator() records the active iterator in current_ref_iter while it is running. * peel_ref() checks whether current_ref_iter is pointing at the requested reference. If so, it asks the iterator to peel the reference (which it can do efficiently via its "peel" virtual function). For extra safety, we do the optimization only if the refname *addresses* are the same, not only if the refname *strings* are the same, to forestall possible mixups between refnames that come from different ref_iterators. Please note that this optimization of peel_ref() is only available when iterating via do_for_each_ref_iterator() (including all of the for_each_ref() functions, which call it indirectly). It would be complicated to implement a similar optimization when iterating directly using a reference iterator, because multiple reference iterators can be in use at the same time, with interleaved calls to ref_iterator_advance(). (In fact we do exactly that in merge_ref_iterator.) But that is not necessary. peel_ref() is only called while iterating over references. Callers who iterate using the for_each_ref() functions benefit from the optimization described above. Callers who iterate using reference iterators directly have access to the ref_iterator, so they can call ref_iterator_peel() themselves to get an analogous optimization in a more straightforward manner. If we rewrite all callers to use the reference iteration API, then we can remove the current_ref_iter hack permanently. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-18 12:15:16 +08:00
{
struct ref_iterator *iter;
if (!refs)
return 0;
iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
flags);
do_for_each_ref(): reimplement using reference iteration Use the reference iterator interface to implement do_for_each_ref(). Delete a bunch of code supporting the old for_each_ref() implementation. And now that do_for_each_ref() is generic code (it is no longer tied to the files backend), move it to refs.c. The implementation is via a new function, do_for_each_ref_iterator(), which takes a reference iterator as argument and calls a callback function for each of the references in the iterator. This change requires the current_ref performance hack for peel_ref() to be implemented via ref_iterator_peel() rather than peel_entry() because we don't have a ref_entry handy (it is hidden under three layers: file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So: * do_for_each_ref_iterator() records the active iterator in current_ref_iter while it is running. * peel_ref() checks whether current_ref_iter is pointing at the requested reference. If so, it asks the iterator to peel the reference (which it can do efficiently via its "peel" virtual function). For extra safety, we do the optimization only if the refname *addresses* are the same, not only if the refname *strings* are the same, to forestall possible mixups between refnames that come from different ref_iterators. Please note that this optimization of peel_ref() is only available when iterating via do_for_each_ref_iterator() (including all of the for_each_ref() functions, which call it indirectly). It would be complicated to implement a similar optimization when iterating directly using a reference iterator, because multiple reference iterators can be in use at the same time, with interleaved calls to ref_iterator_advance(). (In fact we do exactly that in merge_ref_iterator.) But that is not necessary. peel_ref() is only called while iterating over references. Callers who iterate using the for_each_ref() functions benefit from the optimization described above. Callers who iterate using reference iterators directly have access to the ref_iterator, so they can call ref_iterator_peel() themselves to get an analogous optimization in a more straightforward manner. If we rewrite all callers to use the reference iteration API, then we can remove the current_ref_iter hack permanently. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-18 12:15:16 +08:00
return do_for_each_ref_iterator(iter, fn, cb_data);
do_for_each_ref(): reimplement using reference iteration Use the reference iterator interface to implement do_for_each_ref(). Delete a bunch of code supporting the old for_each_ref() implementation. And now that do_for_each_ref() is generic code (it is no longer tied to the files backend), move it to refs.c. The implementation is via a new function, do_for_each_ref_iterator(), which takes a reference iterator as argument and calls a callback function for each of the references in the iterator. This change requires the current_ref performance hack for peel_ref() to be implemented via ref_iterator_peel() rather than peel_entry() because we don't have a ref_entry handy (it is hidden under three layers: file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So: * do_for_each_ref_iterator() records the active iterator in current_ref_iter while it is running. * peel_ref() checks whether current_ref_iter is pointing at the requested reference. If so, it asks the iterator to peel the reference (which it can do efficiently via its "peel" virtual function). For extra safety, we do the optimization only if the refname *addresses* are the same, not only if the refname *strings* are the same, to forestall possible mixups between refnames that come from different ref_iterators. Please note that this optimization of peel_ref() is only available when iterating via do_for_each_ref_iterator() (including all of the for_each_ref() functions, which call it indirectly). It would be complicated to implement a similar optimization when iterating directly using a reference iterator, because multiple reference iterators can be in use at the same time, with interleaved calls to ref_iterator_advance(). (In fact we do exactly that in merge_ref_iterator.) But that is not necessary. peel_ref() is only called while iterating over references. Callers who iterate using the for_each_ref() functions benefit from the optimization described above. Callers who iterate using reference iterators directly have access to the ref_iterator, so they can call ref_iterator_peel() themselves to get an analogous optimization in a more straightforward manner. If we rewrite all callers to use the reference iteration API, then we can remove the current_ref_iter hack permanently. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-18 12:15:16 +08:00
}
int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, "", NULL, fn, 0, 0, cb_data);
}
int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, prefix, NULL, fn, strlen(prefix), 0, cb_data);
}
int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
const char **exclude_patterns,
each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
}
int refs_for_each_replace_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
const char *git_replace_ref_base = ref_namespace[NAMESPACE_REPLACE].ref;
return do_for_each_ref(refs, git_replace_ref_base, NULL, fn,
strlen(git_replace_ref_base),
DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
}
int refs_for_each_namespaced_ref(struct ref_store *refs,
const char **exclude_patterns,
each_ref_fn fn, void *cb_data)
{
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
struct strvec namespaced_exclude_patterns = STRVEC_INIT;
struct strbuf prefix = STRBUF_INIT;
int ret;
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
exclude_patterns = get_namespaced_exclude_patterns(exclude_patterns,
get_git_namespace(),
&namespaced_exclude_patterns);
strbuf_addf(&prefix, "%srefs/", get_git_namespace());
ret = do_for_each_ref(refs, prefix.buf, exclude_patterns, fn, 0, 0, cb_data);
strvec_clear(&namespaced_exclude_patterns);
strbuf_release(&prefix);
return ret;
}
int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, "", NULL, fn, 0,
DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
}
int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
void *cb_data)
{
return do_for_each_ref(refs, "", NULL, fn, 0,
DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
}
static int qsort_strcmp(const void *va, const void *vb)
{
const char *a = *(const char **)va;
const char *b = *(const char **)vb;
return strcmp(a, b);
}
static void find_longest_prefixes_1(struct string_list *out,
struct strbuf *prefix,
const char **patterns, size_t nr)
{
size_t i;
for (i = 0; i < nr; i++) {
char c = patterns[i][prefix->len];
if (!c || is_glob_special(c)) {
string_list_append(out, prefix->buf);
return;
}
}
i = 0;
while (i < nr) {
size_t end;
/*
* Set "end" to the index of the element _after_ the last one
* in our group.
*/
for (end = i + 1; end < nr; end++) {
if (patterns[i][prefix->len] != patterns[end][prefix->len])
break;
}
strbuf_addch(prefix, patterns[i][prefix->len]);
find_longest_prefixes_1(out, prefix, patterns + i, end - i);
strbuf_setlen(prefix, prefix->len - 1);
i = end;
}
}
static void find_longest_prefixes(struct string_list *out,
const char **patterns)
{
struct strvec sorted = STRVEC_INIT;
struct strbuf prefix = STRBUF_INIT;
strvec_pushv(&sorted, patterns);
QSORT(sorted.v, sorted.nr, qsort_strcmp);
find_longest_prefixes_1(out, &prefix, sorted.v, sorted.nr);
strvec_clear(&sorted);
strbuf_release(&prefix);
}
int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
const char *namespace,
const char **patterns,
const char **exclude_patterns,
each_ref_fn fn, void *cb_data)
{
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
struct strvec namespaced_exclude_patterns = STRVEC_INIT;
struct string_list prefixes = STRING_LIST_INIT_DUP;
struct string_list_item *prefix;
struct strbuf buf = STRBUF_INIT;
int ret = 0, namespace_len;
find_longest_prefixes(&prefixes, patterns);
if (namespace)
strbuf_addstr(&buf, namespace);
namespace_len = buf.len;
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
exclude_patterns = get_namespaced_exclude_patterns(exclude_patterns,
namespace,
&namespaced_exclude_patterns);
for_each_string_list_item(prefix, &prefixes) {
strbuf_addstr(&buf, prefix->string);
ret = refs_for_each_fullref_in(ref_store, buf.buf,
exclude_patterns, fn, cb_data);
if (ret)
break;
strbuf_setlen(&buf, namespace_len);
}
refs: properly apply exclude patterns to namespaced refs Reference namespaces allow commands like git-upload-pack(1) to serve different sets of references to the client depending on which namespace is enabled, which is for example useful in fork networks. Namespaced refs are stored with a `refs/namespaces/$namespace` prefix, but all the user will ultimately see is a stripped version where that prefix is removed. The way that this interacts with "transfer.hideRefs" is not immediately obvious: the hidden refs can either apply to the stripped references, or to the non-stripped ones that still have the namespace prefix. In fact, the "transfer.hideRefs" machinery does the former and applies to the stripped reference by default, but rules can have "^" prefixed to switch this behaviour to instead match against the full reference name. Namespaces are exclusively handled at the generic "refs" layer, the respective backends have no clue that such a thing even exists. This also has the consequence that they cannot handle hiding references as soon as reference namespaces come into play because they neither know whether a namespace is active, nor do they know how to strip references if they are active. Handling such exclude patterns in `refs_for_each_namespaced_ref()` and `refs_for_each_fullref_in_prefixes()` is broken though, as both support that the user passes both namespaces and exclude patterns. In the case where both are set we will exclude references with unstripped names, even though we really wanted to exclude references based on their stripped names. This only surfaces when: - A repository uses reference namespaces. - "transfer.hideRefs" is active. - The namespaced references are packed into the "packed-refs" file. None of our tests exercise this scenario, and thus we haven't ever hit it. While t5509 exercises both (1) and (2), it does not happen to hit (3). It is trivial to demonstrate the bug though by explicitly packing refs in the tests, and then we indeed surface the breakage. Fix this bug by prefixing exclude patterns with the namespace in the generic layer. The newly introduced function will be used outside of "refs.c" in the next patch, so we add a declaration to "refs.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-16 16:50:03 +08:00
strvec_clear(&namespaced_exclude_patterns);
string_list_clear(&prefixes, 0);
strbuf_release(&buf);
return ret;
}
static int refs_read_special_head(struct ref_store *ref_store,
const char *refname, struct object_id *oid,
struct strbuf *referent, unsigned int *type,
int *failure_errno)
{
struct strbuf full_path = STRBUF_INIT;
struct strbuf content = STRBUF_INIT;
int result = -1;
strbuf_addf(&full_path, "%s/%s", ref_store->gitdir, refname);
if (strbuf_read_file(&content, full_path.buf, 0) < 0) {
*failure_errno = errno;
goto done;
}
result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
oid, referent, type, failure_errno);
done:
strbuf_release(&full_path);
strbuf_release(&content);
return result;
}
int refs_read_raw_ref(struct ref_store *ref_store, const char *refname,
struct object_id *oid, struct strbuf *referent,
unsigned int *type, int *failure_errno)
{
assert(failure_errno);
if (is_pseudo_ref(refname))
return refs_read_special_head(ref_store, refname, oid, referent,
type, failure_errno);
return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
type, failure_errno);
}
refs: add ability for backends to special-case reading of symbolic refs Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 17:33:46 +08:00
int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
struct strbuf *referent)
{
return ref_store->be->read_symbolic_ref(ref_store, refname, referent);
refs: add ability for backends to special-case reading of symbolic refs Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 17:33:46 +08:00
}
const char *refs_resolve_ref_unsafe(struct ref_store *refs,
const char *refname,
int resolve_flags,
2021-10-16 17:39:08 +08:00
struct object_id *oid,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
int *flags)
{
static struct strbuf sb_refname = STRBUF_INIT;
struct object_id unused_oid;
int unused_flags;
int symref_count;
if (!oid)
oid = &unused_oid;
if (!flags)
flags = &unused_flags;
*flags = 0;
if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
!refname_is_safe(refname))
return NULL;
/*
* repo_dwim_ref() uses REF_ISBROKEN to distinguish between
* missing refs and refs that were present but invalid,
* to complain about the latter to stderr.
*
* We don't know whether the ref exists, so don't set
* REF_ISBROKEN yet.
*/
*flags |= REF_BAD_NAME;
}
for (symref_count = 0; symref_count < SYMREF_MAXDEPTH; symref_count++) {
unsigned int read_flags = 0;
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
int failure_errno;
if (refs_read_raw_ref(refs, refname, oid, &sb_refname,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
&read_flags, &failure_errno)) {
*flags |= read_flags;
refs_resolve_ref_unsafe: handle d/f conflicts for writes If our call to refs_read_raw_ref() fails, we check errno to see if the ref is simply missing, or if we encountered a more serious error. If it's just missing, then in "write" mode (i.e., when RESOLVE_REFS_READING is not set), this is perfectly fine. However, checking for ENOENT isn't sufficient to catch all missing-ref cases. In the filesystem backend, we may also see EISDIR when we try to resolve "a" and "a/b" exists. Likewise, we may see ENOTDIR if we try to resolve "a/b" and "a" exists. In both of those cases, we know that our resolved ref doesn't exist, but we return an error (rather than reporting the refname and returning a null sha1). This has been broken for a long time, but nobody really noticed because the next step after resolving without the READING flag is usually to lock the ref and write it. But in both of those cases, the write will fail with the same errno due to the directory/file conflict. There are two cases where we can notice this, though: 1. If we try to write "a" and there's a leftover directory already at "a", even though there is no ref "a/b". The actual write is smart enough to move the empty "a" out of the way. This is reasonably rare, if only because the writing code has to do an independent resolution before trying its write (because the actual update_ref() code handles this case fine). The notes-merge code does this, and before the fix in the prior commit t3308 erroneously expected this case to fail. 2. When resolving symbolic refs, we typically do not use the READING flag because we want to resolve even symrefs that point to unborn refs. Even if those unborn refs could not actually be written because of d/f conflicts with existing refs. You can see this by asking "git symbolic-ref" to report the target of a symref pointing past a d/f conflict. We can fix the problem by recognizing the other "missing" errnos and treating them like ENOENT. This should be safe to do even for callers who are then going to actually write the ref, because the actual writing process will fail if the d/f conflict is a real one (and t1404 checks these cases). Arguably this should be the responsibility of the files-backend to normalize all "missing ref" errors into ENOENT (since something like EISDIR may not be meaningful at all to a database backend). However other callers of refs_read_raw_ref() may actually care about the distinction; putting this into resolve_ref() is the minimal fix for now. The new tests in t1401 use git-symbolic-ref, which is the most direct way to check the resolution by itself. Interestingly we actually had a test that setup this case already, but we only used it to verify that the funny state could be overwritten, not that it could be resolved. We also add a new test in t3200, as "branch -m" was the original motivation for looking into this. What happens is this: 0. HEAD is pointing to branch "a" 1. The user asks to rename "a" to "a/b". 2. We create "a/b" and delete "a". 3. We then try to update any worktree HEADs that point to the renamed ref (including the main repo HEAD). To do that, we have to resolve each HEAD. But now our HEAD is pointing at "a", and we get EISDIR due to the loose "a/b". As a result, we think there is no HEAD, and we do not update it. It now points to the bogus "a". Interestingly this case used to work, but only accidentally. Before 31824d180d (branch: fix branch renaming not updating HEADs correctly, 2017-08-24), we'd update any HEAD which we couldn't resolve. That was wrong, but it papered over the fact that we were incorrectly failing to resolve HEAD. So while the bug demonstrated by the git-symbolic-ref is quite old, the regression to "branch -m" is recent. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:42:17 +08:00
/* In reading mode, refs must eventually resolve */
if (resolve_flags & RESOLVE_REF_READING)
return NULL;
/*
* Otherwise a missing ref is OK. But the files backend
* may show errors besides ENOENT if there are
* similarly-named refs.
*/
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
if (failure_errno != ENOENT &&
failure_errno != EISDIR &&
failure_errno != ENOTDIR)
return NULL;
refs_resolve_ref_unsafe: handle d/f conflicts for writes If our call to refs_read_raw_ref() fails, we check errno to see if the ref is simply missing, or if we encountered a more serious error. If it's just missing, then in "write" mode (i.e., when RESOLVE_REFS_READING is not set), this is perfectly fine. However, checking for ENOENT isn't sufficient to catch all missing-ref cases. In the filesystem backend, we may also see EISDIR when we try to resolve "a" and "a/b" exists. Likewise, we may see ENOTDIR if we try to resolve "a/b" and "a" exists. In both of those cases, we know that our resolved ref doesn't exist, but we return an error (rather than reporting the refname and returning a null sha1). This has been broken for a long time, but nobody really noticed because the next step after resolving without the READING flag is usually to lock the ref and write it. But in both of those cases, the write will fail with the same errno due to the directory/file conflict. There are two cases where we can notice this, though: 1. If we try to write "a" and there's a leftover directory already at "a", even though there is no ref "a/b". The actual write is smart enough to move the empty "a" out of the way. This is reasonably rare, if only because the writing code has to do an independent resolution before trying its write (because the actual update_ref() code handles this case fine). The notes-merge code does this, and before the fix in the prior commit t3308 erroneously expected this case to fail. 2. When resolving symbolic refs, we typically do not use the READING flag because we want to resolve even symrefs that point to unborn refs. Even if those unborn refs could not actually be written because of d/f conflicts with existing refs. You can see this by asking "git symbolic-ref" to report the target of a symref pointing past a d/f conflict. We can fix the problem by recognizing the other "missing" errnos and treating them like ENOENT. This should be safe to do even for callers who are then going to actually write the ref, because the actual writing process will fail if the d/f conflict is a real one (and t1404 checks these cases). Arguably this should be the responsibility of the files-backend to normalize all "missing ref" errors into ENOENT (since something like EISDIR may not be meaningful at all to a database backend). However other callers of refs_read_raw_ref() may actually care about the distinction; putting this into resolve_ref() is the minimal fix for now. The new tests in t1401 use git-symbolic-ref, which is the most direct way to check the resolution by itself. Interestingly we actually had a test that setup this case already, but we only used it to verify that the funny state could be overwritten, not that it could be resolved. We also add a new test in t3200, as "branch -m" was the original motivation for looking into this. What happens is this: 0. HEAD is pointing to branch "a" 1. The user asks to rename "a" to "a/b". 2. We create "a/b" and delete "a". 3. We then try to update any worktree HEADs that point to the renamed ref (including the main repo HEAD). To do that, we have to resolve each HEAD. But now our HEAD is pointing at "a", and we get EISDIR due to the loose "a/b". As a result, we think there is no HEAD, and we do not update it. It now points to the bogus "a". Interestingly this case used to work, but only accidentally. Before 31824d180d (branch: fix branch renaming not updating HEADs correctly, 2017-08-24), we'd update any HEAD which we couldn't resolve. That was wrong, but it papered over the fact that we were incorrectly failing to resolve HEAD. So while the bug demonstrated by the git-symbolic-ref is quite old, the regression to "branch -m" is recent. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:42:17 +08:00
oidclr(oid, refs->repo->hash_algo);
if (*flags & REF_BAD_NAME)
*flags |= REF_ISBROKEN;
return refname;
}
*flags |= read_flags;
if (!(read_flags & REF_ISSYMREF)) {
if (*flags & REF_BAD_NAME) {
oidclr(oid, refs->repo->hash_algo);
*flags |= REF_ISBROKEN;
}
return refname;
}
refname = sb_refname.buf;
if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
oidclr(oid, refs->repo->hash_algo);
return refname;
}
if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
!refname_is_safe(refname))
return NULL;
*flags |= REF_ISBROKEN | REF_BAD_NAME;
}
}
return NULL;
}
/* backend functions */
int ref_store_create_on_disk(struct ref_store *refs, int flags, struct strbuf *err)
{
return refs->be->create_on_disk(refs, flags, err);
}
int ref_store_remove_on_disk(struct ref_store *refs, struct strbuf *err)
{
return refs->be->remove_on_disk(refs, err);
}
int repo_resolve_gitlink_ref(struct repository *r,
const char *submodule, const char *refname,
struct object_id *oid)
{
struct ref_store *refs;
int flags;
refs = repo_get_submodule_ref_store(r, submodule);
if (!refs)
return -1;
refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 22:37:01 +08:00
if (!refs_resolve_ref_unsafe(refs, refname, 0, oid, &flags) ||
is_null_oid(oid))
return -1;
return 0;
}
/*
* Look up a ref store by name. If that ref_store hasn't been
* registered yet, return NULL.
*/
static struct ref_store *lookup_ref_store_map(struct strmap *map,
const char *name)
{
struct strmap_entry *entry;
if (!map->map.tablesize)
/* It's initialized on demand in register_ref_store(). */
return NULL;
entry = strmap_get_entry(map, name);
return entry ? entry->value : NULL;
}
/*
* Create, record, and return a ref_store instance for the specified
* gitdir using the given ref storage format.
*/
static struct ref_store *ref_store_init(struct repository *repo,
enum ref_storage_format format,
const char *gitdir,
unsigned int flags)
{
const struct ref_storage_be *be;
struct ref_store *refs;
be = find_ref_storage_backend(format);
if (!be)
BUG("reference backend is unknown");
refs = be->init(repo, gitdir, flags);
return refs;
}
void ref_store_release(struct ref_store *ref_store)
{
ref_store->be->release(ref_store);
free(ref_store->gitdir);
}
struct ref_store *get_main_ref_store(struct repository *r)
{
if (r->refs_private)
return r->refs_private;
if (!r->gitdir)
BUG("attempting to get main_ref_store outside of repository");
r->refs_private = ref_store_init(r, r->ref_storage_format,
r->gitdir, REF_STORE_ALL_CAPS);
r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
return r->refs_private;
}
/*
* Associate a ref store with a name. It is a fatal error to call this
* function twice for the same name.
*/
static void register_ref_store_map(struct strmap *map,
const char *type,
struct ref_store *refs,
const char *name)
{
if (!map->map.tablesize)
strmap_init(map);
if (strmap_put(map, name, refs))
BUG("%s ref_store '%s' initialized twice", type, name);
}
struct ref_store *repo_get_submodule_ref_store(struct repository *repo,
const char *submodule)
{
struct strbuf submodule_sb = STRBUF_INIT;
struct ref_store *refs;
char *to_free = NULL;
size_t len;
struct repository *subrepo;
if (!submodule)
return NULL;
len = strlen(submodule);
while (len && is_dir_sep(submodule[len - 1]))
len--;
if (!len)
return NULL;
if (submodule[len])
/* We need to strip off one or more trailing slashes */
submodule = to_free = xmemdupz(submodule, len);
refs = lookup_ref_store_map(&repo->submodule_ref_stores, submodule);
if (refs)
goto done;
strbuf_addstr(&submodule_sb, submodule);
if (!is_nonbare_repository_dir(&submodule_sb))
goto done;
if (submodule_to_gitdir(&submodule_sb, submodule))
goto done;
subrepo = xmalloc(sizeof(*subrepo));
if (repo_submodule_init(subrepo, repo, submodule,
null_oid())) {
free(subrepo);
goto done;
}
refs = ref_store_init(subrepo, subrepo->ref_storage_format,
submodule_sb.buf,
REF_STORE_READ | REF_STORE_ODB);
register_ref_store_map(&repo->submodule_ref_stores, "submodule",
refs, submodule);
done:
strbuf_release(&submodule_sb);
free(to_free);
return refs;
}
struct ref_store *get_worktree_ref_store(const struct worktree *wt)
{
struct ref_store *refs;
const char *id;
if (wt->is_current)
return get_main_ref_store(wt->repo);
id = wt->id ? wt->id : "/";
refs = lookup_ref_store_map(&wt->repo->worktree_ref_stores, id);
if (refs)
return refs;
if (wt->id) {
struct strbuf common_path = STRBUF_INIT;
strbuf_git_common_path(&common_path, wt->repo,
"worktrees/%s", wt->id);
refs = ref_store_init(wt->repo, wt->repo->ref_storage_format,
common_path.buf, REF_STORE_ALL_CAPS);
strbuf_release(&common_path);
} else {
refs = ref_store_init(wt->repo, wt->repo->ref_storage_format,
wt->repo->commondir, REF_STORE_ALL_CAPS);
}
if (refs)
register_ref_store_map(&wt->repo->worktree_ref_stores,
"worktree", refs, id);
return refs;
}
void base_ref_store_init(struct ref_store *refs, struct repository *repo,
const char *path, const struct ref_storage_be *be)
{
refs->be = be;
refs->repo = repo;
refs->gitdir = xstrdup(path);
}
/* backend functions */
int refs_pack_refs(struct ref_store *refs, struct pack_refs_opts *opts)
{
return refs->be->pack_refs(refs, opts);
}
int peel_iterated_oid(struct repository *r, const struct object_id *base, struct object_id *peeled)
{
refs: switch peel_ref() to peel_iterated_oid() The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char *refname, const struct object_id *oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 03:44:43 +08:00
if (current_ref_iter &&
(current_ref_iter->oid == base ||
oideq(current_ref_iter->oid, base)))
return ref_iterator_peel(current_ref_iter, peeled);
return peel_object(r, base, peeled) ? -1 : 0;
}
int refs_update_symref(struct ref_store *refs, const char *ref,
const char *target, const char *logmsg)
{
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
int ret = 0;
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
transaction = ref_store_transaction_begin(refs, &err);
if (!transaction ||
ref_transaction_update(transaction, ref, NULL, NULL,
target, NULL, REF_NO_DEREF,
logmsg, &err) ||
ref_transaction_commit(transaction, &err)) {
ret = error("%s", err.buf);
}
strbuf_release(&err);
if (transaction)
ref_transaction_free(transaction);
return ret;
}
int ref_update_reject_duplicates(struct string_list *refnames,
struct strbuf *err)
{
size_t i, n = refnames->nr;
assert(err);
for (i = 1; i < n; i++) {
int cmp = strcmp(refnames->items[i - 1].string,
refnames->items[i].string);
if (!cmp) {
strbuf_addf(err,
_("multiple updates for ref '%s' not allowed"),
refnames->items[i].string);
return 1;
} else if (cmp > 0) {
BUG("ref_update_reject_duplicates() received unsorted list");
}
}
return 0;
}
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
static int run_transaction_hook(struct ref_transaction *transaction,
const char *state)
{
struct child_process proc = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
refs: remove lookup cache for reference-transaction hook When adding the reference-transaction hook, there were concerns about the performance impact it may have on setups which do not make use of the new hook at all. After all, it gets executed every time a reftx is prepared, committed or aborted, which linearly scales with the number of reference-transactions created per session. And as there are code paths like `git push` which create a new transaction for each reference to be updated, this may translate to calling `find_hook()` quite a lot. To address this concern, a cache was added with the intention to not repeatedly do negative hook lookups. Turns out this cache caused a regression, which was fixed via e5256c82e5 (refs: fix interleaving hook calls with reference-transaction hook, 2020-08-07). In the process of discussing the fix, we realized that the cache doesn't really help even in the negative-lookup case. While performance tests added to benchmark this did show a slight improvement in the 1% range, this really doesn't warrent having a cache. Furthermore, it's quite flaky, too. E.g. running it twice in succession produces the following results: Test master pks-reftx-hook-remove-cache -------------------------------------------------------------------------- 1400.2: update-ref 2.79(2.16+0.74) 2.73(2.12+0.71) -2.2% 1400.3: update-ref --stdin 0.22(0.08+0.14) 0.21(0.08+0.12) -4.5% Test master pks-reftx-hook-remove-cache -------------------------------------------------------------------------- 1400.2: update-ref 2.70(2.09+0.72) 2.74(2.13+0.71) +1.5% 1400.3: update-ref --stdin 0.21(0.10+0.10) 0.21(0.08+0.13) +0.0% One case notably absent from those benchmarks is a single executable searching for the hook hundreds of times, which is exactly the case for which the negative cache was added. p1400.2 will spawn a new update-ref for each transaction and p1400.3 only has a single reference-transaction for all reference updates. So this commit adds a third benchmark, which performs an non-atomic push of a thousand references. This will create a new reference transaction per reference. But even for this case, the negative cache doesn't consistently improve performance: Test master pks-reftx-hook-remove-cache -------------------------------------------------------------------------- 1400.4: nonatomic push 6.63(6.50+0.13) 6.81(6.67+0.14) +2.7% 1400.4: nonatomic push 6.35(6.21+0.14) 6.39(6.23+0.16) +0.6% 1400.4: nonatomic push 6.43(6.31+0.13) 6.42(6.28+0.15) -0.2% So let's just remove the cache altogether to simplify the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-25 18:35:24 +08:00
const char *hook;
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
int ret = 0, i;
hook = find_hook(transaction->ref_store->repo, "reference-transaction");
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
if (!hook)
return ret;
strvec_pushl(&proc.args, hook, state, NULL);
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
proc.in = -1;
proc.stdout_to_stderr = 1;
proc.trace2_hook_name = "reference-transaction";
ret = start_command(&proc);
if (ret)
return ret;
sigchain_push(SIGPIPE, SIG_IGN);
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
strbuf_reset(&buf);
if (!(update->flags & REF_HAVE_OLD))
strbuf_addf(&buf, "%s ", oid_to_hex(null_oid()));
else if (update->old_target)
strbuf_addf(&buf, "ref:%s ", update->old_target);
else
strbuf_addf(&buf, "%s ", oid_to_hex(&update->old_oid));
if (!(update->flags & REF_HAVE_NEW))
strbuf_addf(&buf, "%s ", oid_to_hex(null_oid()));
else if (update->new_target)
strbuf_addf(&buf, "ref:%s ", update->new_target);
else
strbuf_addf(&buf, "%s ", oid_to_hex(&update->new_oid));
strbuf_addf(&buf, "%s\n", update->refname);
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
if (errno != EPIPE) {
/* Don't leak errno outside this API */
errno = 0;
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
ret = -1;
}
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
break;
}
}
close(proc.in);
sigchain_pop(SIGPIPE);
strbuf_release(&buf);
ret |= finish_command(&proc);
return ret;
}
int ref_transaction_prepare(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
int ret;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* Good. */
break;
case REF_TRANSACTION_PREPARED:
BUG("prepare called twice on reference transaction");
break;
case REF_TRANSACTION_CLOSED:
BUG("prepare called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
if (refs->repo->objects->odb->disable_ref_updates) {
strbuf_addstr(err,
_("ref updates forbidden inside quarantine environment"));
return -1;
}
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
ret = refs->be->transaction_prepare(refs, transaction, err);
if (ret)
return ret;
ret = run_transaction_hook(transaction, "prepared");
if (ret) {
ref_transaction_abort(transaction, err);
die(_("ref updates aborted by hook"));
}
return 0;
}
int ref_transaction_abort(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
int ret = 0;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* No need to abort explicitly. */
break;
case REF_TRANSACTION_PREPARED:
ret = refs->be->transaction_abort(refs, transaction, err);
break;
case REF_TRANSACTION_CLOSED:
BUG("abort called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
run_transaction_hook(transaction, "aborted");
ref_transaction_free(transaction);
return ret;
}
int ref_transaction_commit(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
int ret;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* Need to prepare first. */
ret = ref_transaction_prepare(transaction, err);
if (ret)
return ret;
break;
case REF_TRANSACTION_PREPARED:
/* Fall through to finish. */
break;
case REF_TRANSACTION_CLOSED:
BUG("commit called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
refs: implement reference transaction hook The low-level reference transactions used to update references are currently completely opaque to the user. While certainly desirable in most usecases, there are some which might want to hook into the transaction to observe all queued reference updates as well as observing the abortion or commit of a prepared transaction. One such usecase would be to have a set of replicas of a given Git repository, where we perform Git operations on all of the repositories at once and expect the outcome to be the same in all of them. While there exist hooks already for a certain subset of Git commands that could be used to implement a voting mechanism for this, many others currently don't have any mechanism for this. The above scenario is the motivation for the new "reference-transaction" hook that reaches directly into Git's reference transaction mechanism. The hook receives as parameter the current state the transaction was moved to ("prepared", "committed" or "aborted") and gets via its standard input all queued reference updates. While the exit code gets ignored in the "committed" and "aborted" states, a non-zero exit code in the "prepared" state will cause the transaction to be aborted prematurely. Given the usecase described above, a voting mechanism can now be implemented via this hook: as soon as it gets called, it will take all of stdin and use it to cast a vote to a central service. When all replicas of the repository agree, the hook will exit with zero, otherwise it will abort the transaction by returning non-zero. The most important upside is that this will catch _all_ commands writing references at once, allowing to implement strong consistency for reference updates via a single mechanism. In order to test the impact on the case where we don't have any "reference-transaction" hook installed in the repository, this commit introduce two new performance tests for git-update-refs(1). Run against an empty repository, it produces the following results: Test origin/master HEAD -------------------------------------------------------------------- 1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4% 1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0% The performance test p1400.2 creates, updates and deletes a branch a thousand times, thus averaging runtime of git-update-refs over 3000 invocations. p1400.3 instead calls `git-update-refs --stdin` three times and queues a thousand creations, updates and deletes respectively. As expected, p1400.3 consistently shows no noticeable impact, as for each batch of updates there's a single call to access(3P) for the negative hook lookup. On the other hand, for p1400.2, one can see an impact caused by this patchset. But doing five runs of the performance tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead ranged from -1.5% to +1.1%. These inconsistent performance numbers can be explained by the overhead of spawning 3000 processes. This shows that the overhead of assembling the hook path and executing access(3P) once to check if it's there is mostly outweighed by the operating system's overhead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:56:14 +08:00
ret = refs->be->transaction_finish(refs, transaction, err);
if (!ret)
run_transaction_hook(transaction, "committed");
return ret;
}
int refs_verify_refname_available(struct ref_store *refs,
const char *refname,
const struct string_list *extras,
const struct string_list *skip,
struct strbuf *err)
{
const char *slash;
const char *extra_refname;
struct strbuf dirname = STRBUF_INIT;
struct strbuf referent = STRBUF_INIT;
struct object_id oid;
unsigned int type;
struct ref_iterator *iter;
int ok;
int ret = -1;
/*
* For the sake of comments in this function, suppose that
* refname is "refs/foo/bar".
*/
assert(err);
strbuf_grow(&dirname, strlen(refname) + 1);
for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
/*
* Just saying "Is a directory" when we e.g. can't
* lock some multi-level ref isn't very informative,
* the user won't be told *what* is a directory, so
* let's not use strerror() below.
*/
int ignore_errno;
/* Expand dirname to the new prefix, not including the trailing slash: */
strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
/*
* We are still at a leading dir of the refname (e.g.,
* "refs/foo"; if there is a reference with that name,
* it is a conflict, *unless* it is in skip.
*/
if (skip && string_list_has_string(skip, dirname.buf))
continue;
if (!refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
&type, &ignore_errno)) {
strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
dirname.buf, refname);
goto cleanup;
}
if (extras && string_list_has_string(extras, dirname.buf)) {
strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
refname, dirname.buf);
goto cleanup;
}
}
/*
* We are at the leaf of our refname (e.g., "refs/foo/bar").
* There is no point in searching for a reference with that
* name, because a refname isn't considered to conflict with
* itself. But we still need to check for references whose
* names are in the "refs/foo/bar/" namespace, because they
* *do* conflict.
*/
strbuf_addstr(&dirname, refname + dirname.len);
strbuf_addch(&dirname, '/');
iter = refs_ref_iterator_begin(refs, dirname.buf, NULL, 0,
DO_FOR_EACH_INCLUDE_BROKEN);
while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
if (skip &&
string_list_has_string(skip, iter->refname))
continue;
strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
iter->refname, refname);
ref_iterator_abort(iter);
goto cleanup;
}
if (ok != ITER_DONE)
BUG("error while iterating over references");
extra_refname = find_descendant_ref(dirname.buf, extras, skip);
if (extra_refname)
strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
refname, extra_refname);
else
ret = 0;
cleanup:
strbuf_release(&referent);
strbuf_release(&dirname);
return ret;
}
struct do_for_each_reflog_help {
each_reflog_fn *fn;
void *cb_data;
};
static int do_for_each_reflog_helper(const char *refname,
const char *referent UNUSED,
const struct object_id *oid UNUSED,
int flags UNUSED,
void *cb_data)
{
struct do_for_each_reflog_help *hp = cb_data;
return hp->fn(refname, hp->cb_data);
}
int refs_for_each_reflog(struct ref_store *refs, each_reflog_fn fn, void *cb_data)
{
struct ref_iterator *iter;
struct do_for_each_reflog_help hp = { fn, cb_data };
iter = refs->be->reflog_iterator_begin(refs);
return do_for_each_ref_iterator(iter, do_for_each_reflog_helper, &hp);
}
int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
const char *refname,
each_reflog_ent_fn fn,
void *cb_data)
{
return refs->be->for_each_reflog_ent_reverse(refs, refname,
fn, cb_data);
}
int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
each_reflog_ent_fn fn, void *cb_data)
{
return refs->be->for_each_reflog_ent(refs, refname, fn, cb_data);
}
int refs_reflog_exists(struct ref_store *refs, const char *refname)
{
return refs->be->reflog_exists(refs, refname);
}
int refs_create_reflog(struct ref_store *refs, const char *refname,
struct strbuf *err)
{
return refs->be->create_reflog(refs, refname, err);
}
int refs_delete_reflog(struct ref_store *refs, const char *refname)
{
return refs->be->delete_reflog(refs, refname);
}
int refs_reflog_expire(struct ref_store *refs,
const char *refname,
unsigned int flags,
reflog_expiry_prepare_fn prepare_fn,
reflog_expiry_should_prune_fn should_prune_fn,
reflog_expiry_cleanup_fn cleanup_fn,
void *policy_cb_data)
{
return refs->be->reflog_expire(refs, refname, flags,
prepare_fn, should_prune_fn,
cleanup_fn, policy_cb_data);
}
int initial_ref_transaction_commit(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
return refs->be->initial_transaction_commit(refs, transaction, err);
}
void ref_transaction_for_each_queued_update(struct ref_transaction *transaction,
ref_transaction_for_each_queued_update_fn cb,
void *cb_data)
{
int i;
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
cb(update->refname,
(update->flags & REF_HAVE_OLD) ? &update->old_oid : NULL,
(update->flags & REF_HAVE_NEW) ? &update->new_oid : NULL,
cb_data);
}
}
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
int refs_delete_refs(struct ref_store *refs, const char *logmsg,
struct string_list *refnames, unsigned int flags)
{
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
struct string_list_item *item;
int ret = 0, failures = 0;
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
char *msg;
if (!refnames->nr)
return 0;
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
msg = normalize_reflog_message(logmsg);
/*
* Since we don't check the references' old_oids, the
* individual updates can't fail, so we can pack all of the
* updates into a single transaction.
*/
transaction = ref_store_transaction_begin(refs, &err);
if (!transaction) {
ret = error("%s", err.buf);
goto out;
}
for_each_string_list_item(item, refnames) {
ret = ref_transaction_delete(transaction, item->string,
NULL, NULL, flags, msg, &err);
if (ret) {
warning(_("could not delete reference %s: %s"),
item->string, err.buf);
strbuf_reset(&err);
failures = 1;
}
}
ret = ref_transaction_commit(transaction, &err);
if (ret) {
if (refnames->nr == 1)
error(_("could not delete reference %s: %s"),
refnames->items[0].string, err.buf);
else
error(_("could not delete references: %s"), err.buf);
}
out:
if (!ret && failures)
ret = -1;
ref_transaction_free(transaction);
strbuf_release(&err);
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
free(msg);
return ret;
}
int refs_rename_ref(struct ref_store *refs, const char *oldref,
const char *newref, const char *logmsg)
{
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->rename_ref(refs, oldref, newref, msg);
free(msg);
return retval;
}
branch: add a --copy (-c) option to go with --move (-m) Add the ability to --copy a branch and its reflog and configuration, this uses the same underlying machinery as the --move (-m) option except the reflog and configuration is copied instead of being moved. This is useful for e.g. copying a topic branch to a new version, e.g. work to work-2 after submitting the work topic to the list, while preserving all the tracking info and other configuration that goes with the branch, and unlike --move keeping the other already-submitted branch around for reference. Like --move, when the source branch is the currently checked out branch the HEAD is moved to the destination branch. In the case of --move we don't really have a choice (other than remaining on a detached HEAD) and in order to keep the functionality consistent, we are doing it in similar way for --copy too. The most common usage of this feature is expected to be moving to a new topic branch which is a copy of the current one, in that case moving to the target branch is what the user wants, and doesn't unexpectedly behave differently than --move would. One outstanding caveat of this implementation is that: git checkout maint && git checkout master && git branch -c topic && git checkout - Will check out 'maint' instead of 'master'. This is because the @{-N} feature (or its -1 shorthand "-") relies on HEAD reflogs created by the checkout command, so in this case we'll checkout maint instead of master, as the user might expect. What to do about that is left to a future change. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Sahil Dua <sahildua2305@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-19 05:19:16 +08:00
int refs_copy_existing_ref(struct ref_store *refs, const char *oldref,
const char *newref, const char *logmsg)
{
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-11 01:19:53 +08:00
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->copy_ref(refs, oldref, newref, msg);
free(msg);
return retval;
branch: add a --copy (-c) option to go with --move (-m) Add the ability to --copy a branch and its reflog and configuration, this uses the same underlying machinery as the --move (-m) option except the reflog and configuration is copied instead of being moved. This is useful for e.g. copying a topic branch to a new version, e.g. work to work-2 after submitting the work topic to the list, while preserving all the tracking info and other configuration that goes with the branch, and unlike --move keeping the other already-submitted branch around for reference. Like --move, when the source branch is the currently checked out branch the HEAD is moved to the destination branch. In the case of --move we don't really have a choice (other than remaining on a detached HEAD) and in order to keep the functionality consistent, we are doing it in similar way for --copy too. The most common usage of this feature is expected to be moving to a new topic branch which is a copy of the current one, in that case moving to the target branch is what the user wants, and doesn't unexpectedly behave differently than --move would. One outstanding caveat of this implementation is that: git checkout maint && git checkout master && git branch -c topic && git checkout - Will check out 'maint' instead of 'master'. This is because the @{-N} feature (or its -1 shorthand "-") relies on HEAD reflogs created by the checkout command, so in this case we'll checkout maint instead of master, as the user might expect. What to do about that is left to a future change. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Sahil Dua <sahildua2305@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-19 05:19:16 +08:00
}
const char *ref_update_original_update_refname(struct ref_update *update)
{
while (update->parent_update)
update = update->parent_update;
return update->refname;
}
int ref_update_has_null_new_value(struct ref_update *update)
{
return !update->new_target && is_null_oid(&update->new_oid);
}
int ref_update_check_old_target(const char *referent, struct ref_update *update,
struct strbuf *err)
{
if (!update->old_target)
BUG("called without old_target set");
if (!strcmp(referent, update->old_target))
return 0;
if (!strcmp(referent, ""))
strbuf_addf(err, "verifying symref target: '%s': "
"reference is missing but expected %s",
ref_update_original_update_refname(update),
update->old_target);
else
strbuf_addf(err, "verifying symref target: '%s': "
"is at %s but expected %s",
ref_update_original_update_refname(update),
referent, update->old_target);
return -1;
}
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
struct migration_data {
struct ref_store *old_refs;
struct ref_transaction *transaction;
struct strbuf *errbuf;
};
static int migrate_one_ref(const char *refname, const char *referent UNUSED, const struct object_id *oid,
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
int flags, void *cb_data)
{
struct migration_data *data = cb_data;
struct strbuf symref_target = STRBUF_INIT;
int ret;
if (flags & REF_ISSYMREF) {
ret = refs_read_symbolic_ref(data->old_refs, refname, &symref_target);
if (ret < 0)
goto done;
ret = ref_transaction_update(data->transaction, refname, NULL, null_oid(),
symref_target.buf, NULL,
REF_SKIP_CREATE_REFLOG | REF_NO_DEREF, NULL, data->errbuf);
if (ret < 0)
goto done;
} else {
ret = ref_transaction_create(data->transaction, refname, oid, NULL,
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
REF_SKIP_CREATE_REFLOG | REF_SKIP_OID_VERIFICATION,
NULL, data->errbuf);
if (ret < 0)
goto done;
}
done:
strbuf_release(&symref_target);
return ret;
}
static int move_files(const char *from_path, const char *to_path, struct strbuf *errbuf)
{
struct strbuf from_buf = STRBUF_INIT, to_buf = STRBUF_INIT;
size_t from_len, to_len;
DIR *from_dir;
int ret;
from_dir = opendir(from_path);
if (!from_dir) {
strbuf_addf(errbuf, "could not open source directory '%s': %s",
from_path, strerror(errno));
ret = -1;
goto done;
}
strbuf_addstr(&from_buf, from_path);
strbuf_complete(&from_buf, '/');
from_len = from_buf.len;
strbuf_addstr(&to_buf, to_path);
strbuf_complete(&to_buf, '/');
to_len = to_buf.len;
while (1) {
struct dirent *ent;
errno = 0;
ent = readdir(from_dir);
if (!ent)
break;
if (!strcmp(ent->d_name, ".") ||
!strcmp(ent->d_name, ".."))
continue;
strbuf_setlen(&from_buf, from_len);
strbuf_addstr(&from_buf, ent->d_name);
strbuf_setlen(&to_buf, to_len);
strbuf_addstr(&to_buf, ent->d_name);
ret = rename(from_buf.buf, to_buf.buf);
if (ret < 0) {
strbuf_addf(errbuf, "could not link file '%s' to '%s': %s",
from_buf.buf, to_buf.buf, strerror(errno));
goto done;
}
}
if (errno) {
strbuf_addf(errbuf, "could not read entry from directory '%s': %s",
from_path, strerror(errno));
ret = -1;
goto done;
}
ret = 0;
done:
strbuf_release(&from_buf);
strbuf_release(&to_buf);
if (from_dir)
closedir(from_dir);
return ret;
}
static int count_reflogs(const char *reflog UNUSED, void *payload)
{
size_t *reflog_count = payload;
(*reflog_count)++;
return 0;
}
static int has_worktrees(void)
{
struct worktree **worktrees = get_worktrees();
int ret = 0;
size_t i;
for (i = 0; worktrees[i]; i++) {
if (is_main_worktree(worktrees[i]))
continue;
ret = 1;
}
free_worktrees(worktrees);
return ret;
}
int repo_migrate_ref_storage_format(struct repository *repo,
enum ref_storage_format format,
unsigned int flags,
struct strbuf *errbuf)
{
struct ref_store *old_refs = NULL, *new_refs = NULL;
struct ref_transaction *transaction = NULL;
struct strbuf new_gitdir = STRBUF_INIT;
struct migration_data data;
size_t reflog_count = 0;
int did_migrate_refs = 0;
int ret;
if (repo->ref_storage_format == format) {
strbuf_addstr(errbuf, "current and new ref storage format are equal");
ret = -1;
goto done;
}
old_refs = get_main_ref_store(repo);
/*
* We do not have any interfaces that would allow us to write many
* reflog entries. Once we have them we can remove this restriction.
*/
if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
strbuf_addstr(errbuf, "cannot count reflogs");
ret = -1;
goto done;
}
if (reflog_count) {
strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
ret = -1;
goto done;
}
/*
* Worktrees complicate the migration because every worktree has a
* separate ref storage. While it should be feasible to implement, this
* is pushed out to a future iteration.
*
* TODO: we should really be passing the caller-provided repository to
* `has_worktrees()`, but our worktree subsystem doesn't yet support
* that.
*/
if (has_worktrees()) {
strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet");
ret = -1;
goto done;
}
/*
* The overall logic looks like this:
*
* 1. Set up a new temporary directory and initialize it with the new
* format. This is where all refs will be migrated into.
*
* 2. Enumerate all refs and write them into the new ref storage.
* This operation is safe as we do not yet modify the main
* repository.
*
* 3. If we're in dry-run mode then we are done and can hand over the
* directory to the caller for inspection. If not, we now start
* with the destructive part.
*
* 4. Delete the old ref storage from disk. As we have a copy of refs
* in the new ref storage it's okay(ish) if we now get interrupted
* as there is an equivalent copy of all refs available.
*
* 5. Move the new ref storage files into place.
*
* 6. Change the repository format to the new ref format.
*/
strbuf_addf(&new_gitdir, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
if (!mkdtemp(new_gitdir.buf)) {
strbuf_addf(errbuf, "cannot create migration directory: %s",
strerror(errno));
ret = -1;
goto done;
}
new_refs = ref_store_init(repo, format, new_gitdir.buf,
REF_STORE_ALL_CAPS);
ret = ref_store_create_on_disk(new_refs, 0, errbuf);
if (ret < 0)
goto done;
transaction = ref_store_transaction_begin(new_refs, errbuf);
if (!transaction)
goto done;
data.old_refs = old_refs;
data.transaction = transaction;
data.errbuf = errbuf;
/*
* We need to use the internal `do_for_each_ref()` here so that we can
* also include broken refs and symrefs. These would otherwise be
* skipped silently.
*
* Ideally, we would do this call while locking the old ref storage
* such that there cannot be any concurrent modifications. We do not
* have the infra for that though, and the "files" backend does not
* allow for a central lock due to its design. It's thus on the user to
* ensure that there are no concurrent writes.
*/
ret = do_for_each_ref(old_refs, "", NULL, migrate_one_ref, 0,
DO_FOR_EACH_INCLUDE_ROOT_REFS | DO_FOR_EACH_INCLUDE_BROKEN,
&data);
if (ret < 0)
goto done;
/*
* TODO: we might want to migrate to `initial_ref_transaction_commit()`
* here, which is more efficient for the files backend because it would
* write new refs into the packed-refs file directly. At this point,
* the files backend doesn't handle pseudo-refs and symrefs correctly
* though, so this requires some more work.
*/
ret = ref_transaction_commit(transaction, errbuf);
if (ret < 0)
goto done;
did_migrate_refs = 1;
if (flags & REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN) {
printf(_("Finished dry-run migration of refs, "
"the result can be found at '%s'\n"), new_gitdir.buf);
ret = 0;
goto done;
}
/*
* Release the new ref store such that any potentially-open files will
* be closed. This is required for platforms like Cygwin, where
* renaming an open file results in EPERM.
*/
ref_store_release(new_refs);
FREE_AND_NULL(new_refs);
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
/*
* Until now we were in the non-destructive phase, where we only
* populated the new ref store. From hereon though we are about
* to get hands by deleting the old ref store and then moving
* the new one into place.
*
* Assuming that there were no concurrent writes, the new ref
* store should have all information. So if we fail from hereon
* we may be in an in-between state, but it would still be able
* to recover by manually moving remaining files from the
* temporary migration directory into place.
*/
ret = ref_store_remove_on_disk(old_refs, errbuf);
if (ret < 0)
goto done;
ret = move_files(new_gitdir.buf, old_refs->gitdir, errbuf);
if (ret < 0)
goto done;
if (rmdir(new_gitdir.buf) < 0)
warning_errno(_("could not remove temporary migration directory '%s'"),
new_gitdir.buf);
/*
* We have migrated the repository, so we now need to adjust the
* repository format so that clients will use the new ref store.
* We also need to swap out the repository's main ref store.
*/
initialize_repository_version(hash_algo_by_ptr(repo->hash_algo), format, 1);
/*
* Unset the old ref store and release it. `get_main_ref_store()` will
* make sure to lazily re-initialize the repository's ref store with
* the new format.
*/
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
ref_store_release(old_refs);
FREE_AND_NULL(old_refs);
repo->refs_private = NULL;
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
ret = 0;
done:
if (ret && did_migrate_refs) {
strbuf_complete(errbuf, '\n');
strbuf_addf(errbuf, _("migrated refs can be found at '%s'"),
new_gitdir.buf);
}
if (new_refs) {
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
ref_store_release(new_refs);
free(new_refs);
}
refs: implement logic to migrate between ref storage formats With the introduction of the new "reftable" backend, users may want to migrate repositories between the backends without having to recreate the whole repository. Add the logic to do so. The implementation is generic and works with arbitrary ref storage formats so that a backend does not need to implement any migration logic. It does have a few limitations though: - We do not migrate repositories with worktrees, because worktrees have separate ref storages. It makes the overall affair more complex if we have to migrate multiple storages at once. - We do not migrate reflogs, because we have no interfaces to write many reflog entries. - We do not lock the repository for concurrent access, and thus concurrent writes may end up with weird in-between states. There is no way to fully lock the "files" backend for writes due to its format, and thus we punt on this topic altogether and defer to the user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which often have neither worktrees nor reflogs. But it will not work for many other repositories without some preparations. These limitations are not set into stone though, and ideally we will eventually address them over time. The logic is not yet used by anything, and thus there are no tests for it. Those will be added in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 13:29:45 +08:00
ref_transaction_free(transaction);
strbuf_release(&new_gitdir);
return ret;
}
int ref_update_expects_existing_old_ref(struct ref_update *update)
{
return (update->flags & REF_HAVE_OLD) &&
(!is_null_oid(&update->old_oid) || update->old_target);
}