In the past, we used to execute `git gc --auto` as part of our automatic
housekeeping routines. As git-gc(1) may require quite some time to
perform the housekeeping, it knows to detach itself and run in the
background so that the user can continue their work.
Eventually, we refactored our automatic housekeeping to instead use the
more flexible git-maintenance(1) command. The upside of this new infra
is that the user can configure which maintenance tasks are performed, at
least to a certain degree. So while it continues to run git-gc(1) by
default, it can also be adapted to e.g. use git-multi-pack-index(1) for
maintenance of the object database.
The auto-detach of the new infra is somewhat broken though once the user
configures non-standard tasks. The problem is essentially that we detach
at the wrong level in the process hierarchy: git-maintenance(1) never
detaches itself, but instead it continues to be git-gc(1) which does.
When configured to only run the git-gc(1) maintenance task, then the
result is basically the same as before. But when configured to run other
tasks, then git-maintenance(1) will wait for these to run to completion.
Even worse, it may be that git-gc(1) runs concurrently with other
housekeeping tasks, stomping on each others feet.
Fix this bug by asking git-gc(1) to not detach when it is being invoked
via git-maintenance(1). Instead, git-maintenance(1) now respects a new
config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
detaches itself into the background when running as part of our auto
maintenance. This should continue to behave the same for all users which
use the git-gc(1) task, only. For others though, it means that we now
properly perform all tasks in the background. The default behaviour of
git-maintenance(1) when executed by the user does not change, it will
remain in the foreground unless they pass the `--detach` option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The intention is to use it in `git var GIT_SHELL_PATH`, therefore we
need this function to stop being file-local only.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 776297548e (Do not use SHELL_PATH from build system in
prepare_shell_cmd on Windows, 2012-04-17), the hard-coded path to the
Unix shell was replaced by passing `sh` instead when executing Unix
shell scripts in Git.
This was done because the hard-coded path to the Unix shell is incorrect
on Windows because it not only is a Unix-style absolute path instead of
a Windows one, but Git uses the runtime prefix feature on Windows, i.e.
the correct path cannot be hard-coded.
Naturally, the `sh` argument will be resolved to the full path of said
executable eventually.
To help fixing the bug where `git var GIT_SHELL_PATH` currently does not
reflect that logic, but shows that incorrect hard-coded Unix-style
absolute path, let's resolve the full path to the `sh` executable early
in the `git_shell_path()` function so that we can use it in `git var`,
too, and be sure that the output is equivalent to what `run_command()`
does when it is asked to execute a command-line using a Unix shell.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This encapsulates the platform-specific logic better.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
Building with "-Werror -Wwrite-strings" is now supported.
* ps/no-writable-strings: (27 commits)
config.mak.dev: enable `-Wwrite-strings` warning
builtin/merge: always store allocated strings in `pull_twohead`
builtin/rebase: always store allocated string in `options.strategy`
builtin/rebase: do not assign default backend to non-constant field
imap-send: fix leaking memory in `imap_server_conf`
imap-send: drop global `imap_server_conf` variable
mailmap: always store allocated strings in mailmap blob
revision: always store allocated strings in output encoding
remote-curl: avoid assigning string constant to non-const variable
send-pack: always allocate receive status
parse-options: cast long name for OPTION_ALIAS
http: do not assign string constant to non-const field
compat/win32: fix const-correctness with string constants
pretty: add casts for decoration option pointers
object-file: make `buf` parameter of `index_mem()` a constant
object-file: mark cached object buffers as const
ident: add casts for fallback name and GECOS
entry: refactor how we remove items for delayed checkouts
line-log: always allocate the output prefix
line-log: stop assigning string constant to file parent buffer
...
Varargs functions that are unannotated as printf-like or execl-like
have been annotated as such.
* jc/varargs-attributes:
__attribute__: add a few missing format attributes
__attribute__: mark some functions with LAST_ARG_MUST_BE_NULL
__attribute__: remove redundant attribute declaration for git_die_config()
__attribute__: trace2_region_enter_printf() is like "printf"
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last part of the parameter list the function takes is like
parameters to printf. Mark it as such.
An existing call that formats a value of type size_t using "%d" was
found by the compiler with the help with this annotation; fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to enable `-Wwrite-strings`, which changes the type of
string constants to `const char[]`. Fix various sites where we assign
such constants to non-const variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a trace point in start_command so we can see the full
command invocation without having to resort to strace/code inspection.
For example:
$ GIT_TRACE=1 git test foo
git.c:755 trace: exec: git-test foo
run-command.c:657 trace: run_command: git-test foo
run-command.c:657 trace: run_command: 'echo $*' foo
run-command.c:749 trace: start_command: /bin/sh -c 'echo $* "$@"' 'echo $*' foo
Prior changes have made the documentation around the internals of the
alias command execution clearer, but I have still found this detailed
view of the aliased command being run helpful for debugging purposes.
A test case is added to ensure the full command output is present in
the execution flow.
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `run_auto_maintenance()` function is responsible for spawning a new
`git maintenance run --auto` process. To do so, it sets up the `sturct
child_process` and then runs it by executing `run_command()` directly.
This is rather inflexible in case callers want to modify the child
process somewhat, e.g. to redirect stderr or stdout.
Introduce a new `prepare_auto_maintenance()` function to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git bisect visualize" stopped running "gitk" on Git for Windows
when the command was reimplemented in C around Git 2.34 timeframe.
This has been corrected.
* ma/locate-in-path-for-windows:
docs: update when `git bisect visualize` uses `gitk`
compat/mingw: implement a native locate_in_PATH()
run-command: conditionally define locate_in_PATH()
This commit doesn't change any behaviour by itself, but allows us to easily
define compat replacements for locate_in_PATH(). It prepares us for the next
commit that adds a native Windows implementation of locate_in_PATH().
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Header files cleanup.
* en/header-split-cache-h-part-3: (28 commits)
fsmonitor-ll.h: split this header out of fsmonitor.h
hash-ll, hashmap: move oidhash() to hash-ll
object-store-ll.h: split this header out of object-store.h
khash: name the structs that khash declares
merge-ll: rename from ll-merge
git-compat-util.h: remove unneccessary include of wildmatch.h
builtin.h: remove unneccessary includes
list-objects-filter-options.h: remove unneccessary include
diff.h: remove unnecessary include of oidset.h
repository: remove unnecessary include of path.h
log-tree: replace include of revision.h with simple forward declaration
cache.h: remove this no-longer-used header
read-cache*.h: move declarations for read-cache.c functions from cache.h
repository.h: move declaration of the_index from cache.h
merge.h: move declarations for merge.c from cache.h
diff.h: move declaration for global in diff.c from cache.h
preload-index.h: move declarations for preload-index.c from elsewhere
sparse-index.h: move declarations for sparse-index.c from cache.h
name-hash.h: move declarations for name-hash.c from cache.h
run-command.h: move declarations for run-command.c from cache.h
...
Simplify error message when run-command fails to start a command.
* rs/run-command-exec-error-on-noent:
run-command: report exec error even on ENOENT
t1800: loosen matching of error message for bad shebang
If execve(2) fails with ENOENT and we report the error, we use the
format "cannot run %s", followed by the actual error message. For other
errors we use "cannot exec '%s'".
Stop making this subtle distinction and use the second format for all
execve(2) errors. This simplifies the code and makes the prefix more
precise by indicating the failed operation. It also allows us to
slightly simplify t1800.16.
On Windows -- which lacks execve(2) -- we already use a single format in
all cases: "cannot spawn %s".
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the build problem with NO_PTHREADS defined, a fallout from
recent header file shuffling.
* tb/run-command-needs-alloc-h:
run-command.c: fix missing include under `NO_PTHREADS`
Git 2.41-rc0 fails to compile run-command.c with `NO_PTHREADS` defined,
i.e.
$ make NO_PTHREADS=1 run-command.o
as `ALLOC_GROW()` macro is used in the `atexit()` emulation but the
macro definition is not available.
This bisects to 36bf195890 (alloc.h: move ALLOC_GROW() functions from
cache.h, 2023-02-24), which replaced includes of <cache.h> with
<alloc.h>, which is the new home of `ALLOC_GROW()` (and
`ALLOC_GROW_BY()`).
We can see that run-command.c is the only one that try to use these
macros without including <alloc.h>.
$ git grep -l -e '[^_]ALLOC_GROW(' -e 'ALLOC_GROW_BY(' \*.c | sort >/tmp/1
$ git grep -l 'alloc\.h' \*.c | sort >/tmp/2
$ comm -23 /tmp/[12]
compat/simple-ipc/ipc-unix-socket.c
run-command.c
The "compat/" file only talks about the macro in the comment,
and is not broken.
We could fix this by conditionally including "alloc.h" when
`NO_PTHREADS` is defined. But we have relatively few examples of
conditional includes throughout the tree[^1].
Instead, include "alloc.h" unconditionally in run-command.c to allow it
to successfully compile even when NO_PTHREADS is defined.
[^1]: With `git grep -E -A1 '#if(n)?def' -- **/*.c | grep '#include' -B1`.
Reported-by: Randall S. Becker <randall.becker@nexbridge.ca>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After forking but before exec-ing a command, we install special
error/warn/die handlers in the child. These ignore the error messages
they get, since the idea is that they shouldn't be called in the first
place.
Arguably they could pass along that error message _in addition_ to
saying "error() should not be called in a child", but since the whole
point is to avoid any conflicts on stdio/malloc locks, etc, we're better
to just keep these simple. Seeing them trigger is effectively a bug, and
the developer is probably better off grabbing a stack trace.
But we do want to mark the functions so that -Wunused-parameter doesn't
complain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it makes sense not to inherit stdin from the parent process to
avoid deadlocking, it's not necessary to completely ban stdin to
children. An informed user should be able to configure stdin safely. By
setting `some_child.process.no_stdin=1` before calling `get_next_task()`
we provide a reasonable default behavior but enable users to set up
stdin streaming for themselves during the callback.
`some_child.process.stdout_to_stderr`, however, remains unmodifiable by
`get_next_task()` - the rest of the run_processes_parallel() API depends
on child output in stderr.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove code that's been unused since it was added in
c553c72eed (run-command: add an asynchronous parallel child
processor, 2015-12-15).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix typos in code comments which repeat various words. Most of the
cases are simple in that they repeat a word that usually cannot be
repeated in a grammatically correct sentence. Just remove the
incorrectly duplicated word in these cases and rewrap text, if needed.
A tricky case is usage of "that that", which is sometimes grammatically
correct. However, an instance of this in "t7527-builtin-fsmonitor.sh"
doesn't need two words "that", because there is only one daemon being
discussed, so replace the second "that" with "the".
Reword code comment "entries exist on on-disk index" in function
update_one in file cache-tree.c, by replacing incorrect preposition "on"
with "in".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for pthread_create and pthread_sigmask state that:
"On success, pthread_create() returns 0;
on error, it returns an error number"
As such, we ought to check for an error
by seeing if the output is not 0.
Checking for "less than" is a mistake
as the error code numbers can be greater than 0.
Signed-off-by: Seija <doremylover123@gmail.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the run-command API.
* rs/no-more-run-command-v:
replace and remove run_command_v_opt()
replace and remove run_command_v_opt_cd_env_tr2()
replace and remove run_command_v_opt_tr2()
replace and remove run_command_v_opt_cd_env()
use child_process members "args" and "env" directly
use child_process member "args" instead of string array variable
sequencer: simplify building argument list in do_exec()
bisect--helper: factor out do_bisect_run()
bisect: simplify building "checkout" argument list
am: simplify building "show" argument list
run-command: fix return value comment
merge: remove always-the-same "verbose" arguments
Replace the remaining calls of run_command_v_opt() with run_command()
calls and explict struct child_process variables. This is more verbose,
but not by much overall. The code becomes more flexible, e.g. it's easy
to extend to conditionally add a new argument.
Then remove the now unused function and its own flag names, simplifying
the run-command API.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The convenience function run_command_v_opt_cd_env_tr2() has no external
callers left. Inline it and remove it from the API.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The convenience function run_command_v_opt_tr2() is only used by a
single caller. Use struct child_process and run_command() directly
instead and remove the underused function.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
run_command_v_opt_cd_env() is only used in an example in a comment. Use
the struct child_process member "env" and run_command() directly instead
and then remove the unused convenience function.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As with the *_fn members removed in a preceding commit, let's not copy
the "processes" member of the "struct run_process_parallel_opts" over
to the "struct parallel_processes".
In this case we need the number of processes for the kill_children()
function, which will be called from a signal handler. To do that
adjust this code added in c553c72eed (run-command: add an
asynchronous parallel child processor, 2015-12-15) so that we use a
dedicated "struct parallel_processes_for_signal" for passing data to
the signal handler, in addition to the "struct parallel_process" it'll
now have access to our "opts" variable.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the migration away from the "max_processes" member of "struct
parallel_processes" to the "processes" member of the "struct
run_process_parallel_opts", in this case we needed to pass the "opts"
further down into pp_cleanup() and pp_buffer_stderr().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Neither the "processes" nor "max_processes" members ever change after
their initialization, and they're always equivalent, but some existing
code used "pp->max_processes" when we were already passing the "opts"
to the function, let's use the "opts" directly instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the *_fn members removed in a preceding commit, let's not copy
the "data" member of the "struct run_process_parallel_opts" over to
the "struct parallel_processes". Now that we're passing the "opts"
down there's no reason to do so.
This makes the code easier to follow, as we have a "const" attribute
on the "struct run_process_parallel_opts", but not "struct
parallel_processes". We do not alter the "ungroup" argument, so
storing it in the non-const structure would make this control flow
less obvious.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the *_fn members removed in the preceding commit, let's not
copy the "ungroup" member of the "struct run_process_parallel_opts"
over to the "struct parallel_processes". Now that we're passing the
"opts" down there's no reason to do so.
This makes the code easier to follow, as we have a "const" attribute
on the "struct run_process_parallel_opts", but not "struct
parallel_processes". We do not alter the "ungroup" argument, so
storing it in the non-const structure would make this control flow
less obvious.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only remaining reason for copying the callbacks in the "struct
run_process_parallel_opts" over to the "struct parallel_processes" was
to avoid two if/else statements in case the "start_failure" and
"task_finished" callbacks were NULL.
Let's handle those cases in pp_start_one() and pp_collect_finished()
instead, and avoid the default_* stub functions, and the need to copy
this data around.
Organizing the code like this made more sense before the "struct
run_parallel_parallel_opts" existed, as we'd have needed to pass each
of these as a separate parameter.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "const" to two "struct parallel_processes" parameters where
we're not modifying anything in "pp". For kill_children() we'll call
it from both the signal handler, and from run_processes_parallel()
itself. Adding a "const" there makes it clear that we don't need to
modify any state when killing our children.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Have the users of the "run_processes_parallel_tr2()" function use
"run_processes_parallel()" instead. In preceding commits the latter
was refactored to take a "struct run_process_parallel_opts" argument,
since the only reason for "run_processes_parallel_tr2()" to exist was
to take arguments that are now a part of that struct we can do away
with it.
See ee4512ed48 (trace2: create new combined trace facility,
2019-02-22) for the addition of the "*_tr2()" variant of the function,
it was used by every caller except "t/helper/test-run-command.c"..
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in fd3aaf53f7 (run-command: add an "ungroup" option to
run_process_parallel(), 2022-06-07) which added the "ungroup" passing
it to "run_process_parallel()" via the global
"run_processes_parallel_ungroup" variable was a compromise to get the
smallest possible regression fix for "maint" at the time.
This follow-up to that is a start at passing that parameter and others
via a new "struct run_process_parallel_opts", as the earlier
version[1] of what became fd3aaf53f7 did.
Since we need to change all of the occurrences of "n" to
"opt->SOMETHING" let's take the opportunity and rename the terse "n"
to "processes". We could also have picked "max_processes", "jobs",
"threads" etc., but as the API is named "run_processes_parallel()"
let's go with "processes".
Since the new "run_processes_parallel()" function is able to take an
optional "tr2_category" and "tr2_label" via the struct we can at this
point migrate all of the users of "run_processes_parallel_tr2()" over
to it.
But let's not migrate all the API users yet, only the two users that
passed the "ungroup" parameter via the
"run_processes_parallel_ungroup" global
1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a designated initializer to initialize those parts of pp_init()
that don't need any conditionals for their initialization, this sets
us on a path to pp_init() itself into mostly a validation and
allocation function.
Since we're doing that we can add "const" to some of the members of
the "struct parallel_processes", which helps to clarify and
self-document this code. E.g. we never alter the "data" pointer we
pass t user callbacks, nor (after the preceding change to stop
invoking online_cpus()) do we change "max_processes", the same goes
for the "ungroup" option.
We can also do away with a call to strbuf_init() in favor of macro
initialization, and to rely on other fields being NULL'd or zero'd.
Making members of a struct "const" rather that the pointer to the
struct itself is usually painful, as e.g. it precludes us from
incrementally setting up the structure. In this case we only set it up
with the assignment in run_process_parallel() and pp_init(), and don't
pass the struct pointer around as "const", so making individual
members "const" is worth the potential hassle for extra safety.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a "jobs = 0" is passed let's BUG() out rather than fall back on
online_cpus(). The default behavior was added when this API was
implemented in c553c72eed (run-command: add an asynchronous parallel
child processor, 2015-12-15).
Most of our code in-tree that scales up to "online_cpus()" by default
calls that function by itself. Keeping this default behavior just for
the sake of two callers means that we'd need to maintain this one spot
where we're second-guessing the config passed down into pp_init().
The preceding commit has an overview of the API callers that passed
"jobs = 0". There were only two of them (actually three, but they
resolved to these two config parsing codepaths).
The "fetch.parallel" caller already had a test for the
"fetch.parallel=0" case added in 0353c68818 (fetch: do not run a
redundant fetch from submodule, 2022-05-16), but there was no such
test for "submodule.fetchJobs". Let's add one here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the "n" variable added in c553c72eed (run-command: add an
asynchronous parallel child processor, 2015-12-15) a "size_t". As
we'll see in a subsequent commit we do pass "0" here, but never "jobs
< 0".
We could have made it an "unsigned int", but as we're having to change
this let's not leave another case in the codebase where a size_t and
"unsigned int" size differ on some platforms. In this case it's likely
to never matter, but it's easier to not need to worry about it.
After this and preceding changes:
make run-command.o DEVOPTS=extra-all CFLAGS=-Wno-unused-parameter
Only has one (and new) -Wsigned-compare warning relevant to a
comparison about our "n" or "{nr,max}_processes": About using our
"n" (size_t) in the same expression as online_cpus() (int). A
subsequent commit will adjust & deal with online_cpus() and that
warning.
The only users of the "n" parameter are:
* builtin/fetch.c: defaults to 1, reads from the "fetch.parallel"
config. As seen in the code that parses the config added in
d54dea77db (fetch: let --jobs=<n> parallelize --multiple, too,
2019-10-05) will die if the git_config_int() return value is < 0.
It will however pass us n = 0, as we'll see in a subsequent commit.
* submodule.c: defaults to 1, reads from "submodule.fetchJobs"
config. Read via code originally added in a028a1930c (fetching
submodules: respect `submodule.fetchJobs` config option, 2016-02-29).
It now piggy-backs on the the submodule.fetchJobs code and
validation added in f20e7c1ea2 (submodule: remove
submodule.fetchjobs from submodule-config parsing, 2017-08-02).
Like builtin/fetch.c it will die if the git_config_int() return
value is < 0, but like builtin/fetch.c it will pass us n = 0.
* builtin/submodule--helper.c: defaults to 1. Read via code
originally added in 2335b870fa (submodule update: expose parallelism
to the user, 2016-02-29).
Since f20e7c1ea2 (submodule: remove submodule.fetchjobs from
submodule-config parsing, 2017-08-02) it shares a config parser and
semantics with the submodule.c caller.
* hook.c: hardcoded to 1, see 96e7225b31 (hook: add 'run'
subcommand, 2021-12-22).
* t/helper/test-run-command.c: can be -1 after parsing the arguments,
but will then be overridden to online_cpus() before passing it to
this API. See be5d88e112 (test-tool run-command: learn to run (parts
of) the testsuite, 2019-10-04).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "run_processes_parallel{,_tr2}()" functions to return void,
instead of int. Ever since c553c72eed (run-command: add an
asynchronous parallel child processor, 2015-12-15) they have
unconditionally returned 0.
To get a "real" return value out of this function the caller needs to
get it via the "task_finished_fn" callback, see the example in hook.c
added in 96e7225b31 (hook: add 'run' subcommand, 2021-12-22).
So the "result = " and "if (!result)" code added to "builtin/fetch.c"
d54dea77db (fetch: let --jobs=<n> parallelize --multiple, too,
2019-10-05) has always been redundant, we always took that "if"
path. Likewise the "ret =" in "t/helper/test-run-command.c" added in
be5d88e112 (test-tool run-command: learn to run (parts of) the
testsuite, 2019-10-04) wasn't used, instead we got the return value
from the "if (suite.failed.nr > 0)" block seen in the context.
Subsequent commits will alter this API interface, getting rid of this
always-zero return value makes it easier to understand those changes.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:
write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.
An easy way to trigger this is:
git -c add.interactive.useBuiltin=true \
-c interactive.diffFilter=cat \
checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).
If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.
We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).
The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>