The `get_git_common_dir()` function retrieves the path to the common
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_git_dir()` function retrieves the path to the Git directory for
`the_repository`. Make it accept a `struct repository` such that it can
work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark unused parameters as UNUSED to squelch -Wunused warnings.
* jk/mark-unused-parameters:
t-hashmap: stop calling setup() for t_intern() test
scalar: mark unused parameters in dummy function
daemon: mark unused parameters in non-posix fallbacks
setup: mark unused parameter in config callback
test-mergesort: mark unused parameters in trivial callback
t-hashmap: mark unused parameters in callback function
reftable: mark unused parameters in virtual functions
reftable: drop obsolete test function declarations
reftable: ignore unused argc/argv in test functions
unit-tests: ignore unused argc/argv
t/helper: mark more unused argv/argc arguments
oss-fuzz: mark unused argv/argc argument
refs: mark unused parameters in do_for_each_reflog_helper()
refs: mark unused parameters in ref_store fsck callbacks
update-ref: mark more unused parameters in parser callbacks
imap-send: mark unused parameter in ssl_socket_connect() fallback
The default object hash and ref backend format used to be settable
only with explicit command line option to "git init" and
environment variables, but now they can be configured in the user's
global and system wide configuration.
* ps/hash-and-ref-format-from-config:
setup: make ref storage format configurable via config
setup: make object format configurable via config
setup: merge configuration of repository formats
t0001: delete repositories when object format tests finish
t0001: exercise initialization with ref formats more thoroughly
This is logically a continuation of 783a86c142 (config: mark unused
callback parameters, 2022-08-19), but this case was introduced much
later in 4412a04fe6 (init.templateDir: consider this config setting
protected, 2024-03-29).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, introduce a new "init.defaultRefFormat"
config that allows the user to globally set the ref storage format used
by newly created repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object format for repositories can either be configured explicitly
by passing the `--object-format=` option to git-init(1) or git-clone(1),
or globally by setting the `GIT_DEFAULT_HASH` environment variable.
While the former makes sense, setting random environment variables is
not really a good user experience in case someone decides to only use
SHA256 repositories.
It is only natural to expect for a user that things like this can also
be configured via their config. As such, introduce a new config
"init.defaultObjectFormat", similar to "init.defaultBranch", that allows
the user to configure the default object format when creating new repos.
The precedence order now is the following, where the first one wins:
1. The `--object-format=` switch.
2. The `GIT_DEFAULT_HASH` environment variable.
3. The `init.defaultObjectFormat` config variable.
This matches the typical precedence order we use in Git. We typically
let the environment override the config such that the latter can easily
be overridden on an ephemeral basis, for example by scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The configuration of repository formats is split up across two functions
`validate_hash_algorithm()` and `validate_ref_storage_format()`. This is
fine as-is, but we are about to extend the logic to also read default
values from the config. With the logic split across two functions, we
would either have to pass in additional parameters read from the config,
or read the config multiple times. Both of these options feel a bit
unwieldy.
Merge the code into a new function `repository_format_configure()` that
is responsible for configuring the whole repository's format. Like this,
we can easily read the config in a single place, only.
Furthermore, move the calls to `repo_set_ref_storage_format()` and
`repo_set_hash_algo()` into this new function as well, such that all the
logic to configure the repository format is self-contained here.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathname of a repository comes from getcwd() and it could be a
path aliased via symbolic links, e.g., the real directory may be
/home/u/repository but a symbolic link /home/u/repo may point at it,
and the clone request may come as "git clone file:///home/u/repo/"
A request to check if /home/u/repository is safe would be rejected
if the safe.directory configuration allows /home/u/repo/ but not its
alias /home/u/repository/. Normalize the paths configured for the
safe.directory configuration variable before comparing them with the
path being checked.
Two and a half things to note, compared to the previous step to
normalize the actual path of the suspected repository, are:
- A configured safe.directory may be coming from .gitignore in the
home directory that may be shared across machines. The path
meant to match with an entry may not necessarily exist on all of
such machines, so not being able to convert them to real path on
this machine is *not* a condition that is worthy of warning.
Hence, we ignore a path that cannot be converted to a real path.
- A configured safe.directory is essentially a random string that
user throws at us, written completely unrelated to the directory
the current process happens to be in. Hence it makes little
sense to give a non-absolute path. Hence we ignore any
non-absolute paths, except for ".".
- The safe.directory set to "." was once advertised on the list as
a valid workaround for the regression caused by the overly tight
safe.directory check introduced in 2.45.1; we treat it to mean
"if we are at the top level of a repository, it is OK".
(cf. <834862fd-b579-438a-b9b3-5246bf27ce8a@gmail.com>).
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathname of a repository comes from getcwd() and it could be a
path aliased via symbolic links, e.g., the real directory may be
/home/u/repository but a symbolic link /home/u/repo may point at it,
and the clone request may come as "git clone file:///home/u/repo/".
A request to check if /home/u/repo is safe would be rejected if the
safe.directory configuration allows /home/u/repository/ but not its
alias /home/u/repo/. Normalize the path being checked before
comparing with safe.directory value(s).
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The paths given in the safe.directory configuration variable are
allowed to contain "~user" (which interpolates to user's home
directory) and "%(prefix)" (which interpolates to the installation
location in RUNTIME_PREFIX-enabled builds, and a call to the
git_config_pathname() function is tasked to obtain a copy of the
path with these constructs interpolated.
The function, when it succeeds, always yields an allocated string in
the location given as the out-parameter; even when there is nothing
to interpolate in the original, a literal copy is made. The code
path that contains this caller somehow made two contradicting and
incorrect assumptions of the behaviour when there is no need for
interpolation, and was written with extra defensiveness against
two phantom risks that do not exist.
One wrong assumption was that the function might yield NULL when
there is no interpolation. This led to the use of an extra "check"
variable, conditionally holding either the interpolated or the
original string. The assumption was with us since 8959555c
(setup_git_directory(): add an owner check for the top-level
directory, 2022-03-02) originally introduced the safe.directory
feature.
Another wrong assumption was that the function might yield the same
pointer as the input when there is no interpolation. This led to a
conditional free'ing of the interpolated copy, that the conditional
never skipped, as we always received an allocated string.
Simplify the code by removing the extra defensiveness.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
We forgot to normalize the result of getcwd() to NFC on macOS where
all other paths are normalized, which has been corrected. This still
does not address the case where core.precomposeUnicode configuration
is not defined globally.
* tb/precompose-getcwd:
macOS: ls-files path fails if path of workdir is NFD
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
The safe.directory configuration knob has been updated to
optionally allow leading path matches.
* jc/safe-directory-leading-path:
safe.directory: allow "lead/ing/path/*" match
Leakfixes.
* ps/leakfixes:
builtin/mv: fix leaks for submodule gitfile paths
builtin/mv: refactor to use `struct strvec`
builtin/mv duplicate string list memory
builtin/mv: refactor `add_slash()` to always return allocated strings
strvec: add functions to replace and remove strings
submodule: fix leaking memory for submodule entries
commit-reach: fix memory leak in `ahead_behind()`
builtin/credential: clear credential before exit
config: plug various memory leaks
config: clarify memory ownership in `git_config_string()`
builtin/log: stop using globals for format config
builtin/log: stop using globals for log config
convert: refactor code to clarify ownership of check_roundtrip_encoding
diff: refactor code to clarify memory ownership of prefixes
config: clarify memory ownership in `git_config_pathname()`
http: refactor code to clarify memory ownership
checkout: clarify memory ownership in `unique_tracking_name()`
strbuf: fix leak when `appendwholeline()` fails with EOF
transport-helper: fix leaking helper name
The ref storage format is tracked as a simple unsigned integer, which
makes it harder than necessary to discover what that integer actually is
or where its values are defined.
Convert the ref storage format to instead be an enum.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reinitializing a repository's version we may end up unsetting the
hash algorithm when it matches the default hash algorithm. If we didn't
do that then the previously configured value might remain intact.
While the same issue exists for the ref storage extension, we don't do
this here. This has been fine for most of the part because it is not
supported to re-initialize a repository with a different ref storage
format anyway. We're about to introduce a new command to migrate ref
storages though, so this is about to become an issue there.
Prepare for this and unset the ref storage format when reinitializing a
repository with the "files" format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git init" in an already created directory, when the user
configuration has includeif.onbranch, started to fail recently,
which has been corrected.
* ps/fix-reinit-includeif-onbranch:
setup: fix bug with "includeIf.onbranch" when initializing dir
Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:
$ cd somewhere # some safe place, /tmp or ~/tmp etc.
$ mkdir $'u\xcc\x88' # ü in NFD
$ cd ü # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
$ git init
$ git ls-files $'/somewhere/u\xcc\x88' # NFD
fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
$ git ls-files $'/somewhere/\xc3\xbc' # NFC
(the same error as above)
In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.
Add test cases that follows the bug report, with the simplification
that the 'ü' is replaced by an 'ä', which is already used as NFD and
NFC in t3910.
The solution is to add a call to precompose_string_if_needed()
to this code in setup.c :
`work_tree = precompose_string_if_needed(get_git_work_tree());`
There is, however, a limitation with this very usage of Git:
The (repo) local .gitconfig file is not used, only the global
"core.precomposeunicode" is taken into account, if it is set (or not).
To set it to true is a good recommendation anyway, and here is the
analyzes from Jun T :
The problem is the_repository->config->hash_initialized
is set to 1 before the_repository->commondir is set to ".git".
Due to this, .git/config is never read, and precomposed_unicode
is never set to 1 (remains -1).
run_builtin() {
setup_git_directory() {
strbuf_getcwd() { # setup.c:1542
precompose_{strbuf,string}_if_needed() {
# precomposed_unicode is still -1
git_congig_get_bool("core.precomposeunicode") {
git_config_check_init() {
repo_read_config() {
git_config_init() {
# !!!
the_repository->config->hash_initialized=1
# !!!
}
# does not read .git/config since
# the_repository->commondir is still NULL
}
}
}
returns without converting to NFC
}
returns cwd in NFD
}
setup_discovered_git_dir() {
set_git_work_tree(".") {
repo_set_worktree() {
# this function indirectly calls strbuf_getcwd()
# --> precompose_{strbuf,string}_if_needed() -->
# {git,repo}_config_get_bool("core.precomposeunicode"),
# but does not try to read .git/config since
# the_repository->config->hash_initialized
# is already set to 1 above. And it will not read
# .git/config even if hash_initialized is 0
# since the_repository->commondir is still NULL.
the_repository->worktree = NFD
}
}
}
setup_git_env() {
repo_setup_gitdir() {
repo_set_commondir() {
# finally commondir is set here
the_repository->commondir = ".git"
}
}
}
} // END setup_git_directory
Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.
* ps/refs-without-the-repository-updates:
refs/packed: remove references to `the_hash_algo`
refs/files: remove references to `the_hash_algo`
refs/files: use correct repository
refs: remove `dwim_log()`
refs: drop `git_default_branch_name()`
refs: pass repo when peeling objects
refs: move object peeling into "object.c"
refs: pass ref store when detecting dangling symrefs
refs: convert iteration over replace refs to accept ref store
refs: retrieve worktree ref stores via associated repository
refs: refactor `resolve_gitlink_ref()` to accept a repository
refs: pass repo when retrieving submodule ref store
refs: track ref stores via strmap
refs: implement releasing ref storages
refs: rename `init_db` callback to avoid confusion
refs: adjust names for `init` and `init_db` callbacks
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.
* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
When safe.directory was introduced in v2.30.3 timeframe, 8959555c
(setup_git_directory(): add an owner check for the top-level
directory, 2022-03-02), it only allowed specific opt-out
directories. Immediately after an embargoed release that included
the change, 0f85c4a3 (setup: opt-out of check with safe.directory=*,
2022-04-13) was done as a response to loosen the check so that a
single '*' can be used to say "I trust all repositories" for folks
who host too many repositories to list individually.
Let's further loosen the check to allow people to say "everything
under this hierarchy is deemed safe" by specifying such a leading
directory with "/*" appended to it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported that git-init(1) can fail when initializing an existing
directory in case the config contains an "includeIf.onbranch:"
condition:
$ mkdir repo
$ git -c includeIf.onbranch:main.path=nonexistent init repo
BUG: refs.c:2056: reference backend is unknown
The same error can also be triggered when re-initializing an already
existing repository.
The bug has been introduced in 173761e21b (setup: start tracking ref
storage format, 2023-12-29), which wired up the ref storage format. The
root cause is in `init_db()`, which tries to read the config before we
have initialized `the_repository` and most importantly its ref storage
format. We eventually end up calling `include_by_branch()` and execute
`refs_resolve_ref_unsafe()`, but because we have not initialized the ref
storage format yet this will trigger the above bug.
Interestingly, `include_by_branch()` has a mechanism that will only
cause us to resolve the ref when `the_repository->gitdir` is set. This
is also the reason why this only happens when we initialize an already
existing directory or repository: `gitdir` is set in those cases, but
not when creating a new directory.
Now there are two ways to address the issue:
- We can adapt `include_by_branch()` to also make the code conditional
on whether `the_repository->ref_storage_format` is set.
- We can shift around code such that we initialize the repository
format before we read the config.
While the first approach would be safe, it may also cause us to paper
over issues where a ref store should have been set up. In our case for
example, it may be reasonable to expect that re-initializing the repo
will cause the "onbranch:" condition to trigger, but we would not do
that if the ref storage format was not set up yet. This also used to
work before the above commit that introduced this bug.
Rearrange the code such that we set up the repository format before
reading the config. This fixes the bug and ensures that "onbranch:"
conditions can trigger.
Reported-by: Heghedus Razvan <heghedus.razvan@protonmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Tested-by: Heghedus Razvan <heghedus.razvan@protonmail.com>
[jc: fixed a test and backported to v2.44.0 codebase]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Updates to symbolic refs can now be made as a part of ref
transaction.
* kn/ref-transaction-symref:
refs: remove `create_symref` and associated dead code
refs: rename `refs_create_symref()` to `refs_update_symref()`
refs: use transaction in `refs_create_symref()`
refs: add support for transactional symref updates
refs: move `original_update_refname` to 'refs.c'
refs: support symrefs in 'reference-transaction' hook
files-backend: extract out `create_symref_lock()`
refs: accept symref values in `ref_transaction_update()`
The `git_default_branch_name()` function is a thin wrapper around
`repo_default_branch_name()` with two differences:
- We implicitly rely on `the_repository`.
- We cache the default branch name.
None of the callsites of `git_default_branch_name()` are hot code paths
though, so the caching of the branch name is not really required.
Refactor the callsites to use `repo_default_branch_name()` instead and
drop `git_default_branch_name()`, thus getting rid of one more case
where we rely on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference backends have two callbacks `init` and `init_db`. The
similarity of these two callbacks has repeatedly confused me whenever I
was looking at them, where I always had to look up which of them does
what.
Rename the `init_db` callback to `create_on_disk`, which should
hopefully be clearer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs API lost functions that implicitly assumes to work on the
primary ref_store by forcing the callers to pass a ref_store as an
argument.
* ps/refs-without-the-repository:
refs: remove functions without ref store
cocci: apply rules to rewrite callers of "refs" interfaces
cocci: introduce rules to transform "refs" to pass ref store
refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
refs: introduce missing functions that accept a `struct ref_store`
* ps/refs-without-the-repository:
refs: remove functions without ref store
cocci: apply rules to rewrite callers of "refs" interfaces
cocci: introduce rules to transform "refs" to pass ref store
refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
refs: introduce missing functions that accept a `struct ref_store`
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.44: (41 commits)
Git 2.44.1
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
...
* maint-2.43: (40 commits)
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
...
* maint-2.42: (39 commits)
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
...
* maint-2.41: (38 commits)
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
...
* maint-2.40: (39 commits)
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
upload-pack: disable lazy-fetching by default
...
The ability to configuring the template directory is a delicate feature:
It allows defining hooks that will be run e.g. during a `git clone`
operation, such as the `post-checkout` hook.
As such, it is of utmost importance that Git would not allow that config
setting to be changed during a `git clone` by mistake, allowing an
attacker a chance for a Remote Code Execution, allowing attackers to run
arbitrary code on unsuspecting users' machines.
As a defense-in-depth measure, to prevent minor vulnerabilities in the
`git clone` code from ballooning into higher-serverity attack vectors,
let's make this a protected setting just like `safe.directory` and
friends, i.e. ignore any `init.templateDir` entries from any local
config.
Note: This does not change the behavior of any recursive clone (modulo
bugs), as the local repository config is not even supposed to be written
while cloning the superproject, except in one scenario: If a config
template is configured that sets the template directory. This might be
done because `git clone --recurse-submodules --template=<directory>`
does not pass that template directory on to the submodules'
initialization.
Another scenario where this commit changes behavior is where
repositories are _not_ cloned recursively, and then some (intentional,
benign) automation configures the template directory to be used before
initializing the submodules.
So the caveat is that this could theoretically break existing processes.
In both scenarios, there is a way out, though: configuring the template
directory via the environment variable `GIT_TEMPLATE_DIR`.
This change in behavior is a trade-off between security and
backwards-compatibility that is struck in favor of security.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We will need to call this function from `hook.c` to be able to prevent
hooks from running that were written as part of a `clone` but did not
originate from the template directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When cloning from somebody else's repositories, it is possible that,
say, the `upload-pack` command is overridden in the repository that is
about to be cloned, which would then be run in the user's context who
started the clone.
To remind the user that this is a potentially unsafe operation, let's
extend the ownership checks we have already established for regular
gitdir discovery to extend also to local repositories that are about to
be cloned.
This protection extends also to file:// URLs.
The fixes in this commit address CVE-2024-32004.
Note: This commit does not touch the `fetch`/`clone` code directly, but
instead the function used implicitly by both: `enter_repo()`. This
function is also used by `git receive-pack` (i.e. pushes), by `git
upload-archive`, by `git daemon` and by `git http-backend`. In setups
that want to serve repositories owned by different users than the
account running the service, this will require `safe.*` settings to be
configured accordingly.
Also note: there are tiny time windows where a time-of-check-time-of-use
("TOCTOU") race is possible. The real solution to those would be to work
with `fstat()` and `openat()`. However, the latter function is not
available on Windows (and would have to be emulated with rather
expensive low-level `NtCreateFile()` calls), and the changes would be
quite extensive, for my taste too extensive for the little gain given
that embargoed releases need to pay extra attention to avoid introducing
inadvertent bugs.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
Users with safe.bareRepository=explicit can still work from within
$GIT_DIR of a seconary worktree (which resides at .git/worktrees/$name/)
of the primary worktree without explicitly specifying the $GIT_DIR
environment variable or the --git-dir=<path> option.
* jc/safe-implicit-bare:
setup: notice more types of implicit bare repositories
Setting the safe.bareRepository configuration variable to explicit
stops git from using a bare repository, unless the repository is
explicitly specified, either by the "--git-dir=<path>" command line
option, or by exporting $GIT_DIR environment variable. This may be
a reasonable measure to safeguard users from accidentally straying
into a bare repository in unexpected places, but often gets in the
way of users who need valid accesses to the repository.
Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit,
2024-01-20) loosened the rule such that being inside the ".git"
directory of a non-bare repository does not really count as
accessing a "bare" repository. The reason why such a loosening is
needed is because often hooks and third-party tools run from within
$GIT_DIR while working with a non-bare repository.
More importantly, the reason why this is safe is because a directory
whose contents look like that of a "bare" repository cannot be a
bare repository that came embedded within a checkout of a malicious
project, as long as its directory name is ".git", because ".git" is
not a name allowed for a directory in payload.
There are at least two other cases where tools have to work in a
bare-repository looking directory that is not an embedded bare
repository, and accesses to them are still not allowed by the recent
change.
- A secondary worktree (whose name is $name) has its $GIT_DIR
inside "worktrees/$name/" subdirectory of the $GIT_DIR of the
primary worktree of the same repository.
- A submodule worktree (whose name is $name) has its $GIT_DIR
inside "modules/$name/" subdirectory of the $GIT_DIR of its
superproject.
As long as the primary worktree or the superproject in these cases
are not bare, the pathname of these "looks like bare but not really"
directories will have "/.git/worktrees/" and "/.git/modules/" as a
substring in its leading part, and we can take advantage of the same
security guarantee allow git to work from these places.
Extend the earlier "in a directory called '.git' we are OK" logic
used for the primary worktree to also cover the secondary worktree's
and non-embedded submodule's $GIT_DIR, by moving the logic to a
helper function "is_implicit_bare_repo()". We deliberately exclude
secondary worktrees and submodules of a bare repository, as these
are exactly what safe.bareRepository=explicit setting is designed to
forbid accesses to without an explicit GIT_DIR/--git-dir=<path>
Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The TODO comment suggested to heed core.bare from template config file
if no command line override given. And the prev_bare_repository
variable seems to have been placed for this sole purpose as it is not
used anywhere else.
However, it was clarified by Junio [1] that such values (including
core.bare) are ignored intentionally and does not make sense to
propagate them from template config to repository config. Also, the
directories for the worktree and repository are already created, and
therefore the bare/non-bare decision has already been made, by the
point we reach the codepath where the TODO comment is placed.
Therefore, prev_bare_repository does not have a usecase with/without
supporting core.bare from template. And the removal of
prev_bare_repository is safe as proved by the later part of the
comment:
"Unfortunately, the line above is equivalent to
is_bare_repository_cfg = !work_tree;
which ignores the config entirely even if no `--[no-]bare`
command line option was present.
To see why, note that before this function, there was this call:
prev_bare_repository = is_bare_repository()
expanding the right hand side:
= is_bare_repository_cfg && !get_git_work_tree()
= is_bare_repository_cfg && !work_tree
note that the last simplification above is valid because nothing
calls repo_init() or set_git_work_tree() between any of the
relevant calls in the code, and thus the !get_git_work_tree()
calls will return the same result each time. So, what we are
interested in computing is the right hand side of the line of
code just above this comment:
prev_bare_repository || !work_tree
= is_bare_repository_cfg && !work_tree || !work_tree
= !work_tree
because "A && !B || !B == !B" for all boolean values of A & B."
Therefore, remove the TODO comment and remove prev_bare_repository
variable. Also, update relevant testcases and remove one redundant
testcase.
[1]: https://lore.kernel.org/git/xmqqjzonpy9l.fsf@gitster.g/
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 18c9cb7524 (builtin/clone: create the refdb with the correct object
format, 2023-12-12), we have changed git-clone(1) so that it delays
creation of the refdb until after it has learned about the remote's
object format. This change was required for the reftable backend, which
encodes the object format into the tables. So if we pre-initialized the
refdb with the default object format, but the remote uses a different
object format than that, then the resulting tables would have encoded
the wrong object format.
This change unfortunately breaks remote helpers which try to access the
repository that is about to be created. Because the refdb has not yet
been initialized at the point where we spawn the remote helper, we also
don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
the remote helper which try to access the repository would fail because
it cannot be discovered.
This is essentially a chicken-and-egg problem: we cannot initialize the
refdb because we don't know about the object format. But we cannot learn
about the object format because the remote helper may be unable to
access the partially-initialized repository.
Ideally, we would address this issue via capabilities. But the remote
helper protocol is not structured in a way that guarantees that the
capability announcement happens before the remote helper tries to access
the repository.
Instead, fix this issue by partially initializing the refdb up to the
point where it becomes discoverable by Git commands.
Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files