mirrors/git

mirror of https://github.com/git/git.git synced 2025-01-09 17:13:47 +08:00

Author	SHA1	Message	Date
brian m. carlson	ad9bb6dfe6	http: add support for authtype and credential Now that we have the credential helper code set up to handle arbitrary authentications schemes, let's add support for this in the HTTP code, where we really want to use it. If we're using this new functionality, don't set a username and password, and instead set a header wherever we'd normally do so, including for proxy authentication. Since we can now handle this case, ask the credential helper to enable the appropriate capabilities. Finally, if we're using the authtype value, set "Expect: 100-continue". Any type of authentication that requires multiple rounds (such as NTLM or Kerberos) requires a 100 Continue (if we're larger than http.postBuffer) because otherwise we send the pack data before we're authenticated, the push gets a 401 response, and we can't rewind the stream. We don't know for certain what other custom schemes might require this, the HTTP/1.1 standard has required handling this since 1999, the broken HTTP server for which we disabled this (Google's) is now fixed and has been for some time, and libcurl has a 1-second fallback in case the HTTP server is still broken. In addition, it is not unreasonable to require compliance with a 25-year old standard to use new Git features. For all of these reasons, do so here. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:07 -07:00
brian m. carlson	5af5cc68aa	docs: indicate new credential protocol fields Now that we have new fields (authtype and credential), let's document them for users and credential helper implementers. Indicate specifically what common values of authtype are and what values are allowed. Note that, while common, digest and NTLM authentication are insecure because they require unsalted, uniterated password hashes to be stored. Tell users that they can continue to use a username and password even if the new capability is supported. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:07 -07:00
brian m. carlson	2ae6dc686d	credential: add a field called "ephemeral" Now that we have support for a wide variety of types of authentication, it's important to indicate to other credential helpers whether they should store credentials, since not every credential helper may intuitively understand all possible values of the authtype field. Do so with a boolean field called "ephemeral", to indicate whether the credential is expected to be temporary. For example, in HTTP Digest authentication, the Authorization header value is based off a nonce. It isn't useful to store this value for later use because reusing the credential long term will not result in successful authentication due to the nonce necessarily differing. An additional case is potentially short-lived credentials, which may last only a few hours. It similarly wouldn't be helper for other credential helpers to attempt to provide these much later. We do still pass the value to "git credential store" or "git credential erase", since it may be helpful to the original helper to know whether the operation was successful. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:07 -07:00
brian m. carlson	ca9ccbf674	credential: gate new fields on capability We support the new credential and authtype fields, but we lack a way to indicate to a credential helper that we'd like them to be used. Without some sort of indication, the credential helper doesn't know if it should try to provide us a username and password, or a pre-encoded credential. For example, the helper might prefer a more restricted Bearer token if pre-encoded credentials are possible, but might have to fall back to more general username and password if not. Let's provide a simple way to indicate whether Git (or, for that matter, the helper) is capable of understanding the authtype and credential fields. We send this capability when we generate a request, and the other side may reply to indicate to us that it does, too. For now, don't enable sending capabilities for the HTTP code. In a future commit, we'll introduce appropriate handling for that code, which requires more in-depth work. The logic for determining whether a capability is supported may seem complex, but it is not. At each stage, we emit the capability to the following stage if all preceding stages have declared it. Thus, if the caller to git credential fill didn't declare it, then we won't send it to the helper, and if fill's caller did send but the helper doesn't understand it, then we won't send it on in the response. If we're an internal user, then we know about all capabilities and will request them. For "git credential approve" and "git credential reject", we set the helper capability before calling the helper, since we assume that the input we're getting from the external program comes from a previous call to "git credential fill", and thus we'll invoke send a capability to the helper if and only if we got one from the standard input, which is the correct behavior. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:06 -07:00
brian m. carlson	6a6d6fb12e	credential: add a field for pre-encoded credentials At the moment, our credential code wants to find a username and password for access, which, for HTTP, it will pass to libcurl to encode and process. However, many users want to use authentication schemes that libcurl doesn't support, such as Bearer authentication. In these schemes, the secret is not a username and password pair, but some sort of token that meets the production for authentication data in the RFC. In fact, in general, it's useful to allow our credential helper to have knowledge about what specifically to put in the protocol header. Thus, add a field, credential, which contains data that's preencoded to be suitable for the protocol in question. If we have such data, we need neither a username nor a password, so make that adjustment as well. It is in theory possible to reuse the password field for this. However, if we do so, we must know whether the credential helper supports our new scheme before sending it data, which necessitates some sort of capability inquiry, because otherwise an uninformed credential helper would store our preencoded data as a password, which would fail the next time we attempted to connect to the remote server. This design is substantially simpler, and we can hint to the credential helper that we support this approach with a simple new field instead of needing to query it first. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:06 -07:00
brian m. carlson	d01c76f1cf	http: use new headers for each object request Currently we create one set of headers for all object requests and reuse it. However, we'll need to adjust the headers for authentication purposes in the future, so let's create a new set for each request so that we can adjust them if the authentication changes. Note that the cost of allocation here is tiny compared to the fact that we're making a network call, not to mention probably a full TLS connection, so this shouldn't have a significant impact on performance. Moreover, nobody who cares about performance is using the dumb HTTP protocol anyway, since it often makes huge numbers of requests compared to the smart protocol. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:06 -07:00
brian m. carlson	90765ea81e	remote-curl: reset headers on new request When we retry a post_rpc request, we currently reuse the same headers as before. In the future, we'd like to be able to modify them based on the result we get back, so let's reset them on each retry so we can avoid sending potentially duplicate headers if the values change. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:06 -07:00
brian m. carlson	7046f1d572	credential: add an authtype field When Git makes an HTTP request, it can negotiate the type of authentication to use with the server provided the authentication scheme is one of a few well-known types (Basic, Digest, NTLM, or Negotiate). However, some servers wish to use other types of authentication, such as the Bearer type from OAuth2. Since libcurl doesn't natively support this type, it isn't possible to use it, and the user is forced to specify the Authorization header using the http.extraheader setting. However, storing a plaintext token in the repository configuration is not very secure, especially if a repository can be shared by multiple parties. We already have support for many types of secure credential storage by using credential helpers, so let's teach credential helpers how to produce credentials for an arbitrary scheme. If the credential helper specifies an authtype field, then it specifies an authentication scheme (e.g., Bearer) and the password field specifies the raw authentication token, with any encoding already specified. We reuse the password field for this because some credential helpers store the metadata without encryption even though the password is encrypted, and we'd like to avoid insecure storage if an older version of the credential helper gets ahold of the data. The username is not used in this case, but it is still preserved for the purpose of finding the right credential if the user has multiple accounts. If the authtype field is not specified, then the password behaves as normal and it is passed along with the username to libcurl. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:39:06 -07:00
Linus Arver	8882ee9d68	mailmap: change primary address for Linus Arver Linus will lose access to his work email soon. Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 22:25:11 -07:00
Junio C Hamano	21306a098c	The twentieth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 14:50:31 -07:00
Junio C Hamano	93e3f9df7a	Merge branch 'pw/t3428-cleanup' Test cleanup. * pw/t3428-cleanup: t3428: restore coverage for "apply" backend t3428: use test_commit_message t3428: modernize test setup	2024-04-16 14:50:31 -07:00
Junio C Hamano	51c15ac1b6	Merge branch 'ba/osxkeychain-updates' Update osxkeychain backend with features required for the recent credential subsystem. * ba/osxkeychain-updates: osxkeychain: store new attributes osxkeychain: erase matching passwords only osxkeychain: erase all matching credentials osxkeychain: replace deprecated SecKeychain API	2024-04-16 14:50:30 -07:00
Junio C Hamano	82a31ec324	Merge branch 'jt/reftable-geometric-compaction' The strategy to compact multiple tables of reftables after many operations accumulate many entries has been improved to avoid accumulating too many tables uncollected. * jt/reftable-geometric-compaction: reftable/stack: use geometric table compaction reftable/stack: add env to disable autocompaction reftable/stack: expose option to disable auto-compaction	2024-04-16 14:50:30 -07:00
Junio C Hamano	2b49e41155	Merge branch 'tb/make-indent-conditional-with-non-spaces' Adjust to an upcoming changes to GNU make that breaks our Makefiles. * tb/make-indent-conditional-with-non-spaces: Makefile(s): do not enforce "all indents must be done with tab" Makefile(s): avoid recipe prefix in conditional statements	2024-04-16 14:50:29 -07:00
Junio C Hamano	a7589384d5	Merge branch 'rs/usage-fallback-to-show-message-format' vreportf(), which is usede by error() and friends, has been taught to give the error message printf-format string when its vsnprintf() call fails, instead of showing nothing useful to identify the nature of the error. * rs/usage-fallback-to-show-message-format: usage: report vsnprintf(3) failure	2024-04-16 14:50:29 -07:00
Junio C Hamano	107313eb11	Merge branch 'rs/date-mode-pass-by-value' The codepaths that reach date_mode_from_type() have been updated to pass "struct date_mode" by value to make them thread safe. * rs/date-mode-pass-by-value: date: make DATE_MODE thread-safe	2024-04-16 14:50:29 -07:00
Junio C Hamano	2d642afb0a	Merge branch 'sj/userdiff-c-sharp' The userdiff patterns for C# has been updated. Acked-by: Johannes Sixt <j6t@kdbg.org> cf. <c2154457-3f2f-496e-9b8b-c8ea7257027b@kdbg.org> * sj/userdiff-c-sharp: userdiff: better method/property matching for C#	2024-04-16 14:50:28 -07:00
Junio C Hamano	625ef1c6f1	Merge branch 'tb/t7700-fixup' Test fix. * tb/t7700-fixup: t/t7700-repack.sh: fix test breakages with `GIT_TEST_MULTI_PACK_INDEX=1 `	2024-04-16 14:50:28 -07:00
Junio C Hamano	92e8388bd3	Merge branch 'jc/local-extern-shell-rules' Document and apply workaround for a buggy version of dash that mishandles "local var=val" construct. * jc/local-extern-shell-rules: t1016: local VAR="VAL" fix t0610: local VAR="VAL" fix t: teach lint that RHS of 'local VAR=VAL' needs to be quoted t: local VAR="VAL" (quote ${magic-reference}) t: local VAR="VAL" (quote command substitution) t: local VAR="VAL" (quote positional parameters) CodingGuidelines: quote assigned value in 'local var=$val' CodingGuidelines: describe "export VAR=VAL" rule	2024-04-16 14:50:27 -07:00
René Scharfe	20fee9af9e	apply: avoid using fixed-size buffer in write_out_one_reject() On some systems PATH_MAX is not a hard limit. Support longer paths by building them on the heap instead of using static buffers. Take care to work around (arguably buggy) implementations of free(3) that change errno by calling it only after using the errno value. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 13:38:35 -07:00
Marcel Röthke	167395bb47	rerere: fix crashes due to unmatched opening conflict markers When rerere handles a conflict with an unmatched opening conflict marker in a file with other conflicts, it will fail create a preimage and also fail allocate the status member of struct rerere_dir. Currently the status member is allocated after the error handling. This will lead to a SEGFAULT when the status member is accessed during cleanup of the failed parse. Additionally, in subsequent executions of rerere, after removing the MERGE_RR.lock manually, rerere crashes for a similar reason. MERGE_RR points to a conflict id that has no preimage, therefore the status member is not allocated and a SEGFAULT happens when trying to check if a preimage exists. Solve this by making sure the status field is allocated correctly and add tests to prevent the bug from reoccurring. This does not fix the root cause, failing to parse stray conflict markers, but I don't think we can do much better than recognizing it, printing an error, and moving on gracefully. Signed-off-by: Marcel Röthke <marcel@roethke.info> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-16 08:42:36 -07:00
Junio C Hamano	548fe35913	The ninteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 14:11:44 -07:00
Junio C Hamano	cb25f97eab	Merge branch 'jc/t2104-style-fixes' Test style fixes. * jc/t2104-style-fixes: t2104: style fixes	2024-04-15 14:11:44 -07:00
Junio C Hamano	b415f15b49	Merge branch 'jc/unleak-core-excludesfile' The variable that holds the value read from the core.excludefile configuration variable used to leak, which has been corrected. * jc/unleak-core-excludesfile: config: do not leak excludes_file	2024-04-15 14:11:44 -07:00
Junio C Hamano	eba498a774	Merge branch 'jk/libcurl-8.7-regression-workaround' Fix was added to work around a regression in libcURL 8.7.0 (which has already been fixed in their tip of the tree). * jk/libcurl-8.7-regression-workaround: remote-curl: add Transfer-Encoding header only for older curl INSTALL: bump libcurl version to 7.21.3 http: reset POSTFIELDSIZE when clearing curl handle	2024-04-15 14:11:44 -07:00
Junio C Hamano	372aabe912	Merge branch 'ps/t0610-umask-fix' The "shared repository" test in the t0610 reftable test failed under restrictive umask setting (e.g. 007), which has been corrected. * ps/t0610-umask-fix: t0610: execute git-pack-refs(1) with specified umask t0610: make `--shared=` tests reusable	2024-04-15 14:11:43 -07:00
Junio C Hamano	d75ec4c627	Merge branch 'gt/add-u-commit-i-pathspec-check' "git add -u <pathspec>" and "git commit [-i] <pathspec>" did not diagnose a pathspec element that did not match any files in certain situations, unlike "git add <pathspec>" did. * gt/add-u-commit-i-pathspec-check: builtin/add: error out when passing untracked path with -u builtin/commit: error out when passing untracked path with -i revision: optionally record matches with pathspec elements	2024-04-15 14:11:43 -07:00
Junio C Hamano	6c142bc846	Merge branch 'ds/fetch-config-parse-microfix' A config parser callback function fell through instead of returning after recognising and processing a variable, wasting cycles, which has been corrected. * ds/fetch-config-parse-microfix: fetch: return when parsing submodule.recurse	2024-04-15 14:11:43 -07:00
Junio C Hamano	ce729ea9ba	Merge branch 'rs/apply-reject-fd-leakfix' A file descriptor leak in an error codepath, used when "git apply --reject" fails to create the .rej file, has been corrected. rs/apply-reject-fd-leakfix: apply: don't leak fd on fdopen() error	2024-04-15 14:11:43 -07:00
Junio C Hamano	c7a9ec4728	Merge branch 'rs/apply-lift-path-length-limit' "git apply" has been updated to lift the hardcoded pathname length limit, which in turn allowed a mksnpath() function that is no longer used. * rs/apply-lift-path-length-limit: path: remove mksnpath() apply: avoid fixed-size buffer in create_one_file()	2024-04-15 14:11:42 -07:00
Junio C Hamano	509cc1d413	Merge branch 'ma/win32-unix-domain-socket' Windows binary used to decide the use of unix-domain socket at build time, but it learned to make the decision at runtime instead. * ma/win32-unix-domain-socket: Win32: detect unix socket support at runtime	2024-04-15 14:11:42 -07:00
René Scharfe	21b5821acd	imap-send: increase command size limit nfvasprintf() has a 8KB limit, but it's not relevant, as its result is combined with other strings and added to a 1KB buffer by its caller. That 1KB limit is not mentioned in RFC 9051, which specifies IMAP. While 1KB is plenty for user names, passwords and mailbox names, there's no point in limiting our commands like that. Call xstrvfmt() instead of open-coding it and use strbuf to format the command to send, as we need its length. Fail hard if it exceeds INT_MAX, because socket_write() can't take more than that. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 11:34:17 -07:00
Peter Krefting	8198993c81	bisect: report the found commit with "show" When "git bisect" finds the first bad commit and shows it to the user, it calls "git diff-tree" to do so, whose output is meant to be stable and deliberately ignores end-user customizations. As the output is supposed to be consumed by humans, replace this with a call to "git show". This command honors configuration options (such as "log.date" and "log.mailmap") and other UI improvements (renames are detected). Pass some hard-coded options to "git show" to make the output similar to the one we are replacing, such as showing a patch summary only. Reported-by: Michael Osipov <michael.osipov@innomotics.com> Signed-off-By: Peter Krefting <peter@softwolves.pp.se> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 11:29:09 -07:00
Yehezkel Bernat	f412d72c19	Documentation: fix linkgit reference In git-replay documentation, linkgit to git-rev-parse is missing the man section, which breaks its rendering. Add section number as done in other references to this command. Signed-off-by: Yehezkel Bernat <YehezkelShB@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 11:02:43 -07:00
René Scharfe	44bdba2fa6	git-compat-util: fix NO_OPENSSL on current macOS `b195aa00c1` (git-compat-util: suppress unavoidable Apple-specific deprecation warnings, 2014-12-16) started to define __AVAILABILITY_MACROS_USES_AVAILABILITY in git-compat-util.h. On current versions it is already defined (e.g. on macOS 14.4.1). Undefine it before redefining it to avoid a compilation error. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 11:01:31 -07:00
Patrick Steinhardt	795006fff4	pack-bitmap: gracefully handle missing BTMP chunks In `0fea6b73f1` (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12) we have introduced multi-pack verbatim reuse of objects. This series has introduced a new BTMP chunk, which encodes information about bitmapped objects in the multi-pack index. Starting with `dab60934e3` (pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use this information to figure out objects which we can reuse from each of the packfiles. One thing that we glossed over though is backwards compatibility with repositories that do not yet have BTMP chunks in their multi-pack index. In that case, `nth_bitmapped_pack()` would return an error, which causes us to emit a warning followed by another error message. These warnings are visible to users that fetch from a repository: ``` $ git fetch ... remote: error: MIDX does not contain the BTMP chunk remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse remote: Enumerating objects: 40, done. remote: Counting objects: 100% (40/40), done. remote: Compressing objects: 100% (39/39), done. remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0) ... ``` While the fetch succeeds the user is left wondering what they did wrong. Furthermore, as visible both from the warning and from the reuse stats, pack-reuse is completely disabled in such repositories. What is quite interesting is that this issue can even be triggered in case `pack.allowPackReuse=single` is set, which is the default value. One could have expected that in this case we fall back to the old logic, which is to use the preferred packfile without consulting BTMP chunks at all. But either we fail with the above error in case they are missing, or we use the first pack in the multi-pack-index. The former case disables pack-reuse altogether, whereas the latter case may result in reusing objects from a suboptimal packfile. Fix this issue by partially reverting the logic back to what we had before this patch series landed. Namely, in the case where we have no BTMP chunks or when `pack.allowPackReuse=single` are set, we use the preferred pack instead of consulting the BTMP chunks. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:42:00 -07:00
Patrick Steinhardt	9da5c992dd	reftable/block: avoid copying block iterators on seek When seeking a reftable record in a block we need to position the iterator _before_ the sought-after record so that the next call to `block_iter_next()` would yield that record. To achieve this, the loop that performs the linear seeks to restore the previous position once it has found the record. This is done by advancing two `block_iter`s: one to check whether the next record is our sought-after record, and one that we update after every iteration. This of course involves quite a lot of copying and also leads to needless memory allocations. Refactor the code to get rid of the `next` iterator and the copying this involves. Instead, we can restore the previous offset such that the call to `next` will return the correct record. Next to being simpler conceptually this also leads to a nice speedup. The following benchmark parser 10k refs out of 100k existing refs via `git-rev-list --no-walk`: Benchmark 1: rev-list: print many refs (HEAD~) Time (mean ± σ): 170.2 ms ± 1.7 ms [User: 86.1 ms, System: 83.6 ms] Range (min … max): 166.4 ms … 180.3 ms 500 runs Benchmark 2: rev-list: print many refs (HEAD~) Time (mean ± σ): 161.6 ms ± 1.6 ms [User: 78.1 ms, System: 83.0 ms] Range (min … max): 158.4 ms … 172.3 ms 500 runs Summary rev-list: print many refs (HEAD) ran 1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:37:59 -07:00
Patrick Steinhardt	ce1f213cc9	reftable/block: reuse `zstream` state on inflation When calling `inflateInit()` and `inflate()`, the zlib library will allocate several data structures for the underlying `zstream` to keep track of various information. Thus, when inflating repeatedly, it is possible to optimize memory allocation patterns by reusing the `zstream` and then calling `inflateReset()` on it to prepare it for the next chunk of data to inflate. This is exactly what the reftable code is doing: when iterating through reflogs we need to potentially inflate many log blocks, but we discard the `zstream` every single time. Instead, as we reuse the `block_reader` for each of the blocks anyway, we can initialize the `zstream` once and then reuse it for subsequent inflations. Refactor the code to do so, which leads to a significant reduction in the number of allocations. The following measurements were done when iterating through 1 million reflog entries. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	15a60b747e	reftable/block: open-code call to `uncompress2()` The reftable format stores log blocks in a compressed format. Thus, whenever we want to read such a block we first need to decompress it. This is done by calling the convenience function `uncompress2()` of the zlib library, which is a simple wrapper that manages the lifecycle of the `zstream` structure for us. While nice for one-off inflation of data, when iterating through reflogs we will likely end up inflating many such log blocks. This requires us to reallocate the state of the `zstream` every single time, which adds up over time. It would thus be great to reuse the `zstream` instead of discarding it after every inflation. Open-code the call to `uncompress2()` such that we can start reusing the `zstream` in the subsequent commit. Note that our open-coded variant is different from `uncompress2()` in two ways: - We do not loop around `inflate()` until we have processed all input. As our input is limited by the maximum block size, which is 16MB, we should not hit limits of `inflate()`. - We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()` documentation: "inflate() should normally be called until it returns Z_STREAM_END or an error. However if all decompression is to be performed in a single step (a single call of inflate), the parameter flush should be set to Z_FINISH." Furthermore, "Z_FINISH also informs inflate to not maintain a sliding window if the stream completes, which reduces inflate's memory footprint." Other than that this commit is expected to be functionally equivalent and does not yet reuse the `zstream`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	dd347bbce6	reftable/block: reuse uncompressed blocks The reftable backend stores reflog entries in a compressed format and thus needs to uncompress blocks before one can read records from it. For each reflog block we thus have to allocate an array that we can decompress the block contents into. This block is being discarded whenever the table iterator moves to the next block. Consequently, we reallocate a new array on every block, which is quite wasteful. Refactor the code to reuse the uncompressed block data when moving the block reader to a new block. This significantly reduces the number of allocations when iterating through many compressed blocks. The following measurements are done with `git reflog list` when listing 100k reflogs. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	b00bcb7c49	reftable/reader: iterate to next block in place The table iterator has to iterate towards the next block once it has yielded all records of the current block. This is done by creating a new table iterator, initializing it to the next block, releasing the old iterator and then copying over the data. Refactor the code to instead advance the table iterator in place. This is simpler and unlocks some optimizations in subsequent patches. Also, it allows us to avoid some allocations. The following measurements show a single matching ref out of 1 million refs. Before this change: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated After: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	bcdc586db0	reftable/block: move ownership of block reader into `struct table_iter` The table iterator allows the caller to iterate through all records in a reftable table. To do so it iterates through all blocks of the desired type one by one, where for each block it creates a new block iterator and yields all its entries. One of the things that is somewhat confusing in this context is who owns the block reader that is being used to read the blocks and pass them to the block iterator. Intuitively, as the table iterator is responsible for iterating through the blocks, one would assume that this iterator is also responsible for managing the lifecycle of the reader. And while it somewhat is, the block reader is ultimately stored inside of the block iterator. Refactor the code such that the block reader is instead fully managed by the table iterator. Instead of passing the reader to the block iterator, we now only end up passing the block data to it. Despite clearing up the lifecycle of the reader, it will also allow for better reuse of the reader in subsequent patches. The following benchmark prints a single matching ref out of 1 million refs. Before: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated After: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated Note that while there are more allocation and free calls now, the overall number of bytes allocated is significantly lower. The number of allocations will be reduced significantly by the next patch though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	b371221a60	reftable/block: introduce `block_reader_release()` Introduce a new function `block_reader_release()` that releases resources acquired by the block reader. This function will be extended in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	aac8c03cc4	reftable/block: better grouping of functions Function definitions and declaration of `struct block_reader` and `struct block_iter` are somewhat mixed up, making it hard to see which functions belong together. Rearrange them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	42c7bdc36d	reftable/block: merge `block_iter_seek()` and `block_reader_seek()` The function `block_iter_seek()` is merely a simple wrapper around `block_reader_seek()`. Merge those two functions into a new function `block_iter_seek_key()` that more clearly says what it is actually doing. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	3122d44025	reftable/block: rename `block_reader_start()` The function `block_reader_start()` does not really apply to the block reader, but to the block iterator. It's name is thus somewhat confusing. Rename it to `block_iter_seek_start()` to clarify. We will rename `block_reader_seek()` in similar spirit in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Rubén Justo	48e1ca27b1	launch_editor: waiting message on error When advice.waitingForEditor configuration is not set to false, we show a hint telling that we are waiting for user's editor to close the file when we launch an editor and wait for it to return control back to us. We give the message on an incomplete line, expecting that we can go back to the beginning of the line and clear the message when the editor returns. However, it is possible that the editor exits with an error status, in which case we show an error message and then return to our caller. In such a case, the error message is given where the terminal cursor happens to be, which is most likely after the "we are waiting for your editor" message on the same line. Clear the line before showing the error. While we're here, make the error message follow our CodingGuideLines. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:13:32 -07:00
Thalia Archibald	ab4ad1fa8a	fast-import: make comments more precise The former is somewhat imprecise. The latter became out of sync with the behavior in `e814c39c2f` (fast-import: refactor parsing of spaces, 2014-06-18). Signed-off-by: Thalia Archibald <thalia@archibald.dev> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:06:18 -07:00
Thalia Archibald	be4d6a371e	fast-import: forbid escaped NUL in paths NUL cannot appear in paths. Even disregarding filesystem path limitations, the tree object format delimits with NUL, so such a path cannot be encoded by Git. When a quoted path is unquoted, it could possibly contain NUL from "\000". Forbid it so it isn't truncated. fast-import still has other issues with NUL, but those will be addressed later. Signed-off-by: Thalia Archibald <thalia@archibald.dev> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:06:18 -07:00
Thalia Archibald	a923a04b80	fast-import: document C-style escapes for paths Simply saying “C-style” string quoting is imprecise, as only a subset of C escapes are supported. Document the exact escapes. Signed-off-by: Thalia Archibald <thalia@archibald.dev> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:06:18 -07:00

... 2 3 4 5 6 ...

73174 Commits