git/git-submodule.sh

678 lines
11 KiB
Bash
Raw Normal View History

#!/bin/sh
#
# git-submodule.sh: add, init, update or list git submodules
#
# Copyright (c) 2007 Lars Hjemli
dashless=$(basename "$0" | sed -e 's/-/ /')
USAGE="[--quiet] [--cached]
or: $dashless [--quiet] add [-b <branch>] [-f|--force] [--name <name>] [--reference <repository>] [--] <repository> [<path>]
or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
or: $dashless [--quiet] init [--] [<path>...]
or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...)
clone, submodule: pass partial clone filters to submodules When cloning a repo with a --filter and with --recurse-submodules enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects which include large submodules. For example, a user might wish to make a partial clone of Gerrit and would run: `git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; all the submodules would have all blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, meaning the expected bandwidth and disk savings apply consistently. To avoid changing default behavior, add a new clone flag, `--also-filter-submodules`. When this is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, such that submodule clones also have the filter applied. This applies the same filter to the superproject and all submodules. Users who need to customize the filter per-submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter. Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects, so any submodule object access now triggers a lazy fetch from the submodule's promisor remote if the accessed object is missing. This patch is a reworked version of [4], which was created prior to Jonathan Tan's work. [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16) [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20) [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25) [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05 13:00:49 +08:00
or: $dashless [--quiet] update [--init [--filter=<filter-spec>]] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--[no-]single-branch] [--] [<path>...]
or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path>
or: $dashless [--quiet] set-url [--] <path> <newurl>
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
or: $dashless [--quiet] foreach [--recursive] <command>
or: $dashless [--quiet] sync [--recursive] [--] [<path>...]
or: $dashless [--quiet] absorbgitdirs [--] [<path>...]"
OPTIONS_SPEC=
SUBDIRECTORY_OK=Yes
. git-sh-setup
require_work_tree
wt_prefix=$(git rev-parse --show-prefix)
cd_to_toplevel
transport: add protocol policy config option Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to specify a whitelist of protocols to be used in clone/fetch/push commands. This patch introduces new configuration options for more fine-grained control for allowing/disallowing protocols. This also has the added benefit of allowing easier construction of a protocol whitelist on systems where setting an environment variable is non-trivial. Now users can specify a policy to be used for each type of protocol via the 'protocol.<name>.allow' config option. A default policy for all unconfigured protocols can be set with the 'protocol.allow' config option. If no user configured default is made git will allow known-safe protocols (http, https, git, ssh, file), disallow known-dangerous protocols (ext), and have a default policy of `user` for all other protocols. The supported policies are `always`, `never`, and `user`. The `user` policy can be used to configure a protocol to be usable when explicitly used by a user, while disallowing it for commands which run clone/fetch/push commands without direct user intervention (e.g. recursive initialization of submodules). Commands which can potentially clone/fetch/push from untrusted repositories without user intervention can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent protocols configured to the `user` policy from being used. Fix remote-ext tests to use the new config to allow the ext protocol to be tested. Based on a patch by Jeff King <peff@peff.net> Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-15 06:39:52 +08:00
# Tell the rest of git that any URLs we get don't come
# directly from the user, so it can apply policy as appropriate.
GIT_PROTOCOL_FROM_USER=0
export GIT_PROTOCOL_FROM_USER
command=
quiet=
branch=
force=
reference=
cached=
recursive=
init=
clone --recurse-submodules: prevent name squatting on Windows In addition to preventing `.git` from being tracked by Git, on Windows we also have to prevent `git~1` from being tracked, as the default NTFS short name (also known as the "8.3 filename") for the file name `.git` is `git~1`, otherwise it would be possible for malicious repositories to write directly into the `.git/` directory, e.g. a `post-checkout` hook that would then be executed _during_ a recursive clone. When we implemented appropriate protections in 2b4c6efc821 (read-cache: optionally disallow NTFS .git variants, 2014-12-16), we had analyzed carefully that the `.git` directory or file would be guaranteed to be the first directory entry to be written. Otherwise it would be possible e.g. for a file named `..git` to be assigned the short name `git~1` and subsequently, the short name generated for `.git` would be `git~2`. Or `git~3`. Or even `~9999999` (for a detailed explanation of the lengths we have to go to protect `.gitmodules`, see the commit message of e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)). However, by exploiting two issues (that will be addressed in a related patch series close by), it is currently possible to clone a submodule into a non-empty directory: - On Windows, file names cannot end in a space or a period (for historical reasons: the period separating the base name from the file extension was not actually written to disk, and the base name/file extension was space-padded to the full 8/3 characters, respectively). Helpfully, when creating a directory under the name, say, `sub.`, that trailing period is trimmed automatically and the actual name on disk is `sub`. This means that while Git thinks that the submodule names `sub` and `sub.` are different, they both access `.git/modules/sub/`. - While the backslash character is a valid file name character on Linux, it is not so on Windows. As Git tries to be cross-platform, it therefore allows backslash characters in the file names stored in tree objects. Which means that it is totally possible that a submodule `c` sits next to a file `c\..git`, and on Windows, during recursive clone a file called `..git` will be written into `c/`, of course _before_ the submodule is cloned. Note that the actual exploit is not quite as simple as having a submodule `c` next to a file `c\..git`, as we have to make sure that the directory `.git/modules/b` already exists when the submodule is checked out, otherwise a different code path is taken in `module_clone()` that does _not_ allow a non-empty submodule directory to exist already. Even if we will address both issues nearby (the next commit will disallow backslash characters in tree entries' file names on Windows, and another patch will disallow creating directories/files with trailing spaces or periods), it is a wise idea to defend in depth against this sort of attack vector: when submodules are cloned recursively, we now _require_ the directory to be empty, addressing CVE-2019-1349. Note: the code path we patch is shared with the code path of `git submodule update --init`, which must not expect, in general, that the directory is empty. Hence we have to introduce the new option `--force-init` and hand it all the way down from `git submodule` to the actual `git submodule--helper` process that performs the initial clone. Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-09-12 20:20:39 +08:00
require_init=
files=
submodule update: add --remote for submodule's upstream changes The current `update` command incorporates the superproject's gitlinked SHA-1 ($sha1) into the submodule HEAD ($subsha1). Depending on the options you use, it may checkout $sha1, rebase the $subsha1 onto $sha1, or merge $sha1 into $subsha1. This helps you keep up with changes in the upstream superproject. However, it's also useful to stay up to date with changes in the upstream subproject. Previous workflows for incorporating such changes include the ungainly: $ git submodule foreach 'git checkout $(git config --file $toplevel/.gitmodules submodule.$name.branch) && git pull' With this patch, all of the useful functionality for incorporating superproject changes can be reused to incorporate upstream subproject updates. When you specify --remote, the target $sha1 is replaced with a $sha1 of the submodule's origin/master tracking branch. If you want to merge a different tracking branch, you can configure the `submodule.<name>.branch` option in `.gitmodules`. You can override the `.gitmodules` configuration setting for a particular superproject by configuring the option in that superproject's default configuration (using the usual configuration hierarchy, e.g. `.git/config`, `~/.gitconfig`, etc.). Previous use of submodule.<name>.branch ======================================= Because we're adding a new configuration option, it's a good idea to check if anyone else is already using the option. The foreach-pull example above was described by Ævar in commit f030c96d8643fa0a1a9b2bd9c2f36a77721fb61f Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Date: Fri May 21 16:10:10 2010 +0000 git-submodule foreach: Add $toplevel variable Gerrit uses the same interpretation for the setting, but because Gerrit has direct access to the subproject repositories, it updates the superproject repositories automatically when a subproject changes. Gerrit also accepts the special value '.', which it expands into the superproject's branch name. Although the --remote functionality is using `submodule.<name>.branch` slightly differently, the effect is the same. The foreach-pull example uses the option to record the name of the local branch to checkout before pulls. The tracking branch to be pulled is recorded in `.git/modules/<name>/config`, which was initialized by the module clone during `submodule add` or `submodule init`. Because the branch name stored in `submodule.<name>.branch` was likely the same as the branch name used during the initial `submodule add`, the same branch will be pulled in each workflow. Implementation details ====================== In order to ensure a current tracking branch state, `update --remote` fetches the submodule's remote repository before calculating the SHA-1. However, I didn't change the logic guarding the existing fetch: if test -z "$nofetch" then # Run fetch only if $sha1 isn't present or it # is not reachable from a ref. (clear_local_git_env; cd "$path" && ( (rev=$(git rev-list -n 1 $sha1 --not --all 2>/dev/null) && test -z "$rev") || git-fetch)) || die "$(eval_gettext "Unable to fetch in submodule path '\$path'")" fi There will not be a double-fetch, because the new $sha1 determined after the `--remote` triggered fetch should always exist in the repository. If it doesn't, it's because some racy process removed it from the submodule's repository and we *should* be re-fetching. Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-20 00:03:32 +08:00
remote=
nofetch=
rebase=
merge=
checkout=
custom_name=
depth=
clone: pass --progress decision to recursive submodules When cloning with "--recursive", we'd generally expect submodules to show progress reports if the main clone did, too. In older versions of git, this mostly worked out of the box. Since we show progress by default when stderr is a tty, and since the child clones inherit the parent stderr, then both processes would come to the same decision by default. If the parent clone was asked for "--quiet", we passed down "--quiet" to the child. However, if stderr was not a tty and the user specified "--progress", we did not propagate this to the child. That's a minor bug, but things got much worse when we switched recently to submodule--helper's update_clone command. With that change, the stderr of the child clones are always connected to a pipe, and we never output progress at all. This patch teaches git-submodule and git-submodule--helper how to pass down an explicit "--progress" flag when cloning. The clone command then decides to propagate that flag based on the cloning decision made earlier (which takes into account isatty(2) of the parent process, existing --progress or --quiet flags, etc). Since the child processes always run without a tty on stderr, we don't have to worry about passing an explicit "--no-progress"; it's the default for them. This fixes the recent loss of progress during recursive clones. And as a bonus, it makes: git clone --recursive --progress ... 2>&1 | cat work by triggering progress explicitly in the children. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 13:24:46 +08:00
progress=
dissociate=
single_branch=
jobs=
recommend_shallow=
clone, submodule: pass partial clone filters to submodules When cloning a repo with a --filter and with --recurse-submodules enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects which include large submodules. For example, a user might wish to make a partial clone of Gerrit and would run: `git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; all the submodules would have all blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, meaning the expected bandwidth and disk savings apply consistently. To avoid changing default behavior, add a new clone flag, `--also-filter-submodules`. When this is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, such that submodule clones also have the filter applied. This applies the same filter to the superproject and all submodules. Users who need to customize the filter per-submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter. Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects, so any submodule object access now triggers a lazy fetch from the submodule's promisor remote if the accessed object is missing. This patch is a reworked version of [4], which was created prior to Jonathan Tan's work. [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16) [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20) [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25) [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05 13:00:49 +08:00
filter=
isnumber()
{
n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
}
#
# Add a new submodule to the working tree, .gitmodules and the index
#
git-submodule - make "submodule add" more strict, and document it This change makes "submodule add" much more strict in the arguments it takes, and is intended to address confusion as recently noted on the git-list. With this change, the required syntax is: $ git submodule add URL path Specifically, this eliminates the form $ git submodule add URL which was confused by more than one person as $ git submodule add path With this patch, the URL locating the submodule's origin repository can be either an absolute URL, or (if it begins with ./ or ../) can express the submodule's repository location relative to the superproject's origin. This patch also eliminates a third form of URL, which was relative to the superproject's top-level directory (not its repository). Any URL that was neither absolute nor matched ./*|../* was assumed to point to a subdirectory of the superproject as the location of the submodule's origin repository. This URL form was confusing and does not seem to correspond to an important use-case. Specifically, no-one has identified the need to clone from a repository already in the superproject's tree, but if this is needed it is easily done using an absolute URL: $(pwd)/relative-path. So, no functionality is lost with this patch. (t6008-rev-list-submodule.sh did rely upon this relative URL, fixed by using $(pwd).) Following this change, there are exactly four variants of submodule-add, as both arguments have two flavors: URL can be absolute, or can begin with ./|../ and thus names the submodule's origin relative to the superproject's origin. Note: With this patch, "submodule add" discerns an absolute URL as matching /*|*:*: e.g., URL begins with /, or it contains a :. This works for all valid URLs, an absolute path in POSIX, as well as an absolute path on Windows). path can either already exist as a valid git repo, or will be cloned from the given URL. The first form here eases creation of a new submodule in an existing superproject as the submodule can be added and tested in-tree before pushing to the public repository. However, the more usual form is the second, where the repo is cloned from the given URL. This specifically addresses the issue of $ git submodule add a/b/c attempting to clone from a repository at "a/b/c" to create a new module in "c". This also simplifies description of "relative URL" as there is now exactly *one* form: a URL relative to the parent's origin repo. Signed-off-by: Mark Levedahl <mlevedahl@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-10 09:05:40 +08:00
# $@ = repo path
#
# optional branch is stored in global branch variable
#
cmd_add()
{
# parse $args after "submodule ... add".
reference_path=
while test $# -ne 0
do
case "$1" in
-b | --branch)
case "$2" in '') usage ;; esac
branch=$2
shift
;;
-f | --force)
force=$1
;;
-q|--quiet)
quiet=1
;;
--progress)
progress=1
;;
--reference)
case "$2" in '') usage ;; esac
reference_path=$2
shift
;;
--reference=*)
reference_path="${1#--reference=}"
;;
--ref-format)
case "$2" in '') usage ;; esac
ref_format="--ref-format=$2"
shift
;;
--ref-format=*)
ref_format="$1"
;;
--dissociate)
dissociate=1
;;
--name)
case "$2" in '') usage ;; esac
custom_name=$2
shift
;;
--depth)
case "$2" in '') usage ;; esac
depth="--depth=$2"
shift
;;
--depth=*)
depth=$1
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
if test -z "$1"
submodule: support reading .gitmodules when it's not in the working tree When the .gitmodules file is not available in the working tree, try using the content from the index and from the current branch. This covers the case when the file is part of the repository but for some reason it is not checked out, for example because of a sparse checkout. This makes it possible to use at least the 'git submodule' commands which *read* the gitmodules configuration file without fully populating the working tree. Writing to .gitmodules will still require that the file is checked out, so check for that before calling config_set_in_gitmodules_file_gently. Add a similar check also in git-submodule.sh::cmd_add() to anticipate the eventual failure of the "git submodule add" command when .gitmodules is not safely writeable; this prevents the command from leaving the repository in a spurious state (e.g. the submodule repository was cloned but .gitmodules was not updated because config_set_in_gitmodules_file_gently failed). Moreover, since config_from_gitmodules() now accesses the global object store, it is necessary to protect all code paths which call the function against concurrent access to the global object store. Currently this only happens in builtin/grep.c::grep_submodules(), so call grep_read_lock() before invoking code involving config_from_gitmodules(). Finally, add t7418-submodule-sparse-gitmodules.sh to verify that reading from .gitmodules succeeds and that writing to it fails when the file is not checked out. NOTE: there is one rare case where this new feature does not work properly yet: nested submodules without .gitmodules in their working tree. This has been documented with a warning and a test_expect_failure item in t7814, and in this case the current behavior is not altered: no config is read. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-26 00:18:12 +08:00
then
usage
fi
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper add \
${quiet:+--quiet} \
${force:+--force} \
${progress:+"--progress"} \
${branch:+--branch "$branch"} \
${reference_path:+--reference "$reference_path"} \
${ref_format:+"$ref_format"} \
${dissociate:+--dissociate} \
${custom_name:+--name "$custom_name"} \
${depth:+"$depth"} \
-- \
"$@"
}
#
# Execute an arbitrary command sequence in each checked out
# submodule
#
# $@ = command to execute
#
cmd_foreach()
{
# parse $args after "submodule ... foreach".
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
;;
--recursive)
recursive=1
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper foreach \
${quiet:+--quiet} \
${recursive:+--recursive} \
-- \
"$@"
}
#
# Register submodules in .git/config
#
# $@ = requested paths (default to all)
#
cmd_init()
{
# parse $args after "submodule ... init".
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper init \
${quiet:+--quiet} \
-- \
"$@"
}
#
# Unregister submodules from .git/config and remove their work tree
#
cmd_deinit()
{
# parse $args after "submodule ... deinit".
deinit_all=
while test $# -ne 0
do
case "$1" in
-f|--force)
force=$1
;;
-q|--quiet)
quiet=1
;;
--all)
deinit_all=t
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper deinit \
${quiet:+--quiet} \
${force:+--force} \
${deinit_all:+--all} \
-- \
"$@"
}
#
# Update each submodule path to correct revision, using clone and checkout as needed
#
# $@ = requested paths (default to all)
#
cmd_update()
{
# parse $args after "submodule ... update".
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
;;
-v|--verbose)
quiet=0
;;
clone: pass --progress decision to recursive submodules When cloning with "--recursive", we'd generally expect submodules to show progress reports if the main clone did, too. In older versions of git, this mostly worked out of the box. Since we show progress by default when stderr is a tty, and since the child clones inherit the parent stderr, then both processes would come to the same decision by default. If the parent clone was asked for "--quiet", we passed down "--quiet" to the child. However, if stderr was not a tty and the user specified "--progress", we did not propagate this to the child. That's a minor bug, but things got much worse when we switched recently to submodule--helper's update_clone command. With that change, the stderr of the child clones are always connected to a pipe, and we never output progress at all. This patch teaches git-submodule and git-submodule--helper how to pass down an explicit "--progress" flag when cloning. The clone command then decides to propagate that flag based on the cloning decision made earlier (which takes into account isatty(2) of the parent process, existing --progress or --quiet flags, etc). Since the child processes always run without a tty on stderr, we don't have to worry about passing an explicit "--no-progress"; it's the default for them. This fixes the recent loss of progress during recursive clones. And as a bonus, it makes: git clone --recursive --progress ... 2>&1 | cat work by triggering progress explicitly in the children. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 13:24:46 +08:00
--progress)
progress=1
clone: pass --progress decision to recursive submodules When cloning with "--recursive", we'd generally expect submodules to show progress reports if the main clone did, too. In older versions of git, this mostly worked out of the box. Since we show progress by default when stderr is a tty, and since the child clones inherit the parent stderr, then both processes would come to the same decision by default. If the parent clone was asked for "--quiet", we passed down "--quiet" to the child. However, if stderr was not a tty and the user specified "--progress", we did not propagate this to the child. That's a minor bug, but things got much worse when we switched recently to submodule--helper's update_clone command. With that change, the stderr of the child clones are always connected to a pipe, and we never output progress at all. This patch teaches git-submodule and git-submodule--helper how to pass down an explicit "--progress" flag when cloning. The clone command then decides to propagate that flag based on the cloning decision made earlier (which takes into account isatty(2) of the parent process, existing --progress or --quiet flags, etc). Since the child processes always run without a tty on stderr, we don't have to worry about passing an explicit "--no-progress"; it's the default for them. This fixes the recent loss of progress during recursive clones. And as a bonus, it makes: git clone --recursive --progress ... 2>&1 | cat work by triggering progress explicitly in the children. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 13:24:46 +08:00
;;
-i|--init)
init=1
;;
clone --recurse-submodules: prevent name squatting on Windows In addition to preventing `.git` from being tracked by Git, on Windows we also have to prevent `git~1` from being tracked, as the default NTFS short name (also known as the "8.3 filename") for the file name `.git` is `git~1`, otherwise it would be possible for malicious repositories to write directly into the `.git/` directory, e.g. a `post-checkout` hook that would then be executed _during_ a recursive clone. When we implemented appropriate protections in 2b4c6efc821 (read-cache: optionally disallow NTFS .git variants, 2014-12-16), we had analyzed carefully that the `.git` directory or file would be guaranteed to be the first directory entry to be written. Otherwise it would be possible e.g. for a file named `..git` to be assigned the short name `git~1` and subsequently, the short name generated for `.git` would be `git~2`. Or `git~3`. Or even `~9999999` (for a detailed explanation of the lengths we have to go to protect `.gitmodules`, see the commit message of e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)). However, by exploiting two issues (that will be addressed in a related patch series close by), it is currently possible to clone a submodule into a non-empty directory: - On Windows, file names cannot end in a space or a period (for historical reasons: the period separating the base name from the file extension was not actually written to disk, and the base name/file extension was space-padded to the full 8/3 characters, respectively). Helpfully, when creating a directory under the name, say, `sub.`, that trailing period is trimmed automatically and the actual name on disk is `sub`. This means that while Git thinks that the submodule names `sub` and `sub.` are different, they both access `.git/modules/sub/`. - While the backslash character is a valid file name character on Linux, it is not so on Windows. As Git tries to be cross-platform, it therefore allows backslash characters in the file names stored in tree objects. Which means that it is totally possible that a submodule `c` sits next to a file `c\..git`, and on Windows, during recursive clone a file called `..git` will be written into `c/`, of course _before_ the submodule is cloned. Note that the actual exploit is not quite as simple as having a submodule `c` next to a file `c\..git`, as we have to make sure that the directory `.git/modules/b` already exists when the submodule is checked out, otherwise a different code path is taken in `module_clone()` that does _not_ allow a non-empty submodule directory to exist already. Even if we will address both issues nearby (the next commit will disallow backslash characters in tree entries' file names on Windows, and another patch will disallow creating directories/files with trailing spaces or periods), it is a wise idea to defend in depth against this sort of attack vector: when submodules are cloned recursively, we now _require_ the directory to be empty, addressing CVE-2019-1349. Note: the code path we patch is shared with the code path of `git submodule update --init`, which must not expect, in general, that the directory is empty. Hence we have to introduce the new option `--force-init` and hand it all the way down from `git submodule` to the actual `git submodule--helper` process that performs the initial clone. Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-09-12 20:20:39 +08:00
--require-init)
require_init=1
;;
submodule update: add --remote for submodule's upstream changes The current `update` command incorporates the superproject's gitlinked SHA-1 ($sha1) into the submodule HEAD ($subsha1). Depending on the options you use, it may checkout $sha1, rebase the $subsha1 onto $sha1, or merge $sha1 into $subsha1. This helps you keep up with changes in the upstream superproject. However, it's also useful to stay up to date with changes in the upstream subproject. Previous workflows for incorporating such changes include the ungainly: $ git submodule foreach 'git checkout $(git config --file $toplevel/.gitmodules submodule.$name.branch) && git pull' With this patch, all of the useful functionality for incorporating superproject changes can be reused to incorporate upstream subproject updates. When you specify --remote, the target $sha1 is replaced with a $sha1 of the submodule's origin/master tracking branch. If you want to merge a different tracking branch, you can configure the `submodule.<name>.branch` option in `.gitmodules`. You can override the `.gitmodules` configuration setting for a particular superproject by configuring the option in that superproject's default configuration (using the usual configuration hierarchy, e.g. `.git/config`, `~/.gitconfig`, etc.). Previous use of submodule.<name>.branch ======================================= Because we're adding a new configuration option, it's a good idea to check if anyone else is already using the option. The foreach-pull example above was described by Ævar in commit f030c96d8643fa0a1a9b2bd9c2f36a77721fb61f Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Date: Fri May 21 16:10:10 2010 +0000 git-submodule foreach: Add $toplevel variable Gerrit uses the same interpretation for the setting, but because Gerrit has direct access to the subproject repositories, it updates the superproject repositories automatically when a subproject changes. Gerrit also accepts the special value '.', which it expands into the superproject's branch name. Although the --remote functionality is using `submodule.<name>.branch` slightly differently, the effect is the same. The foreach-pull example uses the option to record the name of the local branch to checkout before pulls. The tracking branch to be pulled is recorded in `.git/modules/<name>/config`, which was initialized by the module clone during `submodule add` or `submodule init`. Because the branch name stored in `submodule.<name>.branch` was likely the same as the branch name used during the initial `submodule add`, the same branch will be pulled in each workflow. Implementation details ====================== In order to ensure a current tracking branch state, `update --remote` fetches the submodule's remote repository before calculating the SHA-1. However, I didn't change the logic guarding the existing fetch: if test -z "$nofetch" then # Run fetch only if $sha1 isn't present or it # is not reachable from a ref. (clear_local_git_env; cd "$path" && ( (rev=$(git rev-list -n 1 $sha1 --not --all 2>/dev/null) && test -z "$rev") || git-fetch)) || die "$(eval_gettext "Unable to fetch in submodule path '\$path'")" fi There will not be a double-fetch, because the new $sha1 determined after the `--remote` triggered fetch should always exist in the repository. If it doesn't, it's because some racy process removed it from the submodule's repository and we *should* be re-fetching. Signed-off-by: W. Trevor King <wking@tremily.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-20 00:03:32 +08:00
--remote)
remote=1
;;
-N|--no-fetch)
nofetch=1
;;
-f|--force)
force=$1
;;
-r|--rebase)
rebase=1
;;
--ref-format)
case "$2" in '') usage ;; esac
ref_format="--ref-format=$2"
shift
;;
--ref-format=*)
ref_format="$1"
;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
shift
;;
--reference=*)
reference="$1"
;;
--dissociate)
dissociate=1
;;
-m|--merge)
merge=1
;;
--recursive)
recursive=1
;;
--checkout)
checkout=1
;;
--recommend-shallow)
recommend_shallow="--recommend-shallow"
;;
--no-recommend-shallow)
recommend_shallow="--no-recommend-shallow"
;;
--depth)
case "$2" in '') usage ;; esac
depth="--depth=$2"
shift
;;
--depth=*)
depth=$1
;;
-j|--jobs)
case "$2" in '') usage ;; esac
jobs="--jobs=$2"
shift
;;
--jobs=*)
jobs=$1
;;
--single-branch)
single_branch="--single-branch"
;;
--no-single-branch)
single_branch="--no-single-branch"
;;
clone, submodule: pass partial clone filters to submodules When cloning a repo with a --filter and with --recurse-submodules enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects which include large submodules. For example, a user might wish to make a partial clone of Gerrit and would run: `git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; all the submodules would have all blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, meaning the expected bandwidth and disk savings apply consistently. To avoid changing default behavior, add a new clone flag, `--also-filter-submodules`. When this is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, such that submodule clones also have the filter applied. This applies the same filter to the superproject and all submodules. Users who need to customize the filter per-submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter. Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects, so any submodule object access now triggers a lazy fetch from the submodule's promisor remote if the accessed object is missing. This patch is a reworked version of [4], which was created prior to Jonathan Tan's work. [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16) [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20) [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25) [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05 13:00:49 +08:00
--filter)
case "$2" in '') usage ;; esac
filter="--filter=$2"
shift
;;
--filter=*)
filter="$1"
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper update \
${quiet:+--quiet} \
${force:+--force} \
${progress:+"--progress"} \
${remote:+--remote} \
${recursive:+--recursive} \
${init:+--init} \
${nofetch:+--no-fetch} \
${rebase:+--rebase} \
${merge:+--merge} \
${checkout:+--checkout} \
${ref_format:+"$ref_format"} \
${reference:+"$reference"} \
${dissociate:+"--dissociate"} \
${depth:+"$depth"} \
clone --recurse-submodules: prevent name squatting on Windows In addition to preventing `.git` from being tracked by Git, on Windows we also have to prevent `git~1` from being tracked, as the default NTFS short name (also known as the "8.3 filename") for the file name `.git` is `git~1`, otherwise it would be possible for malicious repositories to write directly into the `.git/` directory, e.g. a `post-checkout` hook that would then be executed _during_ a recursive clone. When we implemented appropriate protections in 2b4c6efc821 (read-cache: optionally disallow NTFS .git variants, 2014-12-16), we had analyzed carefully that the `.git` directory or file would be guaranteed to be the first directory entry to be written. Otherwise it would be possible e.g. for a file named `..git` to be assigned the short name `git~1` and subsequently, the short name generated for `.git` would be `git~2`. Or `git~3`. Or even `~9999999` (for a detailed explanation of the lengths we have to go to protect `.gitmodules`, see the commit message of e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)). However, by exploiting two issues (that will be addressed in a related patch series close by), it is currently possible to clone a submodule into a non-empty directory: - On Windows, file names cannot end in a space or a period (for historical reasons: the period separating the base name from the file extension was not actually written to disk, and the base name/file extension was space-padded to the full 8/3 characters, respectively). Helpfully, when creating a directory under the name, say, `sub.`, that trailing period is trimmed automatically and the actual name on disk is `sub`. This means that while Git thinks that the submodule names `sub` and `sub.` are different, they both access `.git/modules/sub/`. - While the backslash character is a valid file name character on Linux, it is not so on Windows. As Git tries to be cross-platform, it therefore allows backslash characters in the file names stored in tree objects. Which means that it is totally possible that a submodule `c` sits next to a file `c\..git`, and on Windows, during recursive clone a file called `..git` will be written into `c/`, of course _before_ the submodule is cloned. Note that the actual exploit is not quite as simple as having a submodule `c` next to a file `c\..git`, as we have to make sure that the directory `.git/modules/b` already exists when the submodule is checked out, otherwise a different code path is taken in `module_clone()` that does _not_ allow a non-empty submodule directory to exist already. Even if we will address both issues nearby (the next commit will disallow backslash characters in tree entries' file names on Windows, and another patch will disallow creating directories/files with trailing spaces or periods), it is a wise idea to defend in depth against this sort of attack vector: when submodules are cloned recursively, we now _require_ the directory to be empty, addressing CVE-2019-1349. Note: the code path we patch is shared with the code path of `git submodule update --init`, which must not expect, in general, that the directory is empty. Hence we have to introduce the new option `--force-init` and hand it all the way down from `git submodule` to the actual `git submodule--helper` process that performs the initial clone. Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-09-12 20:20:39 +08:00
${require_init:+--require-init} \
${dissociate:+"--dissociate"} \
$single_branch \
$recommend_shallow \
$jobs \
clone, submodule: pass partial clone filters to submodules When cloning a repo with a --filter and with --recurse-submodules enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects which include large submodules. For example, a user might wish to make a partial clone of Gerrit and would run: `git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; all the submodules would have all blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, meaning the expected bandwidth and disk savings apply consistently. To avoid changing default behavior, add a new clone flag, `--also-filter-submodules`. When this is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, such that submodule clones also have the filter applied. This applies the same filter to the superproject and all submodules. Users who need to customize the filter per-submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter. Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects, so any submodule object access now triggers a lazy fetch from the submodule's promisor remote if the accessed object is missing. This patch is a reworked version of [4], which was created prior to Jonathan Tan's work. [1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16) [2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20) [3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25) [4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/ Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05 13:00:49 +08:00
$filter \
submodule foreach: fix "<command> --quiet" not being respected Robin reported that git submodule foreach --quiet git pull --quiet origin is not really quiet anymore [1]. "git pull" behaves as if --quiet is not given. This happens because parseopt in submodule--helper will try to parse both --quiet options as if they are foreach's options, not git-pull's. The parsed options are removed from the command line. So when we do pull later, we execute just this git pull origin When calling submodule helper, adding "--" in front of "git pull" will stop parseopt for parsing options that do not really belong to submodule--helper foreach. PARSE_OPT_KEEP_UNKNOWN is removed as a safety measure. parseopt should never see unknown options or something has gone wrong. There are also a couple usage string update while I'm looking at them. While at it, I also add "--" to other subcommands that pass "$@" to submodule--helper. "$@" in these cases are paths and less likely to be --something-like-this. But the point still stands, git-submodule has parsed and classified what are options, what are paths. submodule--helper should never consider paths passed by git-submodule to be options even if they look like one. The test case is also contributed by Robin. [1] it should be quiet before fc1b9243cd (submodule: port submodule subcommand 'foreach' from shell to C, 2018-05-10) because parseopt can't accidentally eat options then. Reported-by: Robin H. Johnson <robbat2@gentoo.org> Tested-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-12 18:08:19 +08:00
-- \
"$@"
}
#
# Configures a submodule's default branch
#
# $@ = requested path
#
cmd_set_branch() {
default=
branch=
while test $# -ne 0
do
case "$1" in
-q|--quiet)
# we don't do anything with this but we need to accept it
;;
-d|--default)
default=1
;;
-b|--branch)
case "$2" in '') usage ;; esac
branch=$2
shift
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper set-branch \
${quiet:+--quiet} \
${branch:+--branch "$branch"} \
${default:+--default} \
-- \
"$@"
}
#
# Configures a submodule's remote url
#
# $@ = requested path, requested url
#
cmd_set_url() {
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper set-url \
${quiet:+--quiet} \
-- \
"$@"
}
#
# Show commit summary for submodules in index or working tree
#
# If '--cached' is given, show summary between index and given commit,
# or between working tree and given commit
#
# $@ = [commit (default 'HEAD'),] requested paths (default all)
#
cmd_summary() {
summary_limit=-1
for_status=
diff_cmd=diff-index
# parse $args after "submodule ... summary".
while test $# -ne 0
do
case "$1" in
--cached)
cached=1
;;
--files)
files="$1"
;;
--for-status)
for_status="$1"
;;
-n|--summary-limit)
summary_limit="$2"
isnumber "$summary_limit" || usage
shift
;;
--summary-limit=*)
summary_limit="${1#--summary-limit=}"
isnumber "$summary_limit" || usage
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper summary \
${files:+--files} \
${cached:+--cached} \
${for_status:+--for-status} \
${summary_limit:+-n $summary_limit} \
-- \
"$@"
}
#
# List all submodules, prefixed with:
# - submodule not initialized
# + different revision checked out
#
# If --cached was specified the revision in the index will be printed
# instead of the currently checked out revision.
#
# $@ = requested paths (default to all)
#
cmd_status()
{
# parse $args after "submodule ... status".
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
;;
--cached)
cached=1
;;
--recursive)
recursive=1
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper status \
${quiet:+--quiet} \
${cached:+--cached} \
${recursive:+--recursive} \
-- \
"$@"
}
#
# Sync remote urls for submodules
# This makes the value for remote.$remote.url match the value
# specified in .gitmodules.
#
cmd_sync()
{
while test $# -ne 0
do
case "$1" in
-q|--quiet)
quiet=1
shift
;;
--recursive)
recursive=1
shift
;;
--)
shift
break
;;
-*)
usage
;;
*)
break
;;
esac
done
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper sync \
${quiet:+--quiet} \
${recursive:+--recursive} \
-- \
"$@"
}
cmd_absorbgitdirs()
{
git ${wt_prefix:+-C "$wt_prefix"} submodule--helper absorbgitdirs "$@"
}
# This loop parses the command line arguments to find the
# subcommand name to dispatch. Parsing of the subcommand specific
# options are primarily done by the subcommand implementations.
# Subcommand specific options such as --branch and --cached are
# parsed here as well, for backward compatibility.
while test $# != 0 && test -z "$command"
do
case "$1" in
add | foreach | init | deinit | update | set-branch | set-url | status | summary | sync | absorbgitdirs)
command=$1
;;
-q|--quiet)
quiet=1
;;
--cached)
cached=1
;;
--)
break
;;
-*)
usage
;;
*)
break
;;
esac
shift
done
# No command word defaults to "status"
if test -z "$command"
then
if test $# = 0
then
command=status
else
usage
fi
fi
# "--cached" is accepted only by "status" and "summary"
if test -n "$cached" && test "$command" != status && test "$command" != summary
then
usage
fi
"cmd_$(echo $command | sed -e s/-/_/g)" "$@"