The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.
* jk/dumb-http-finalize:
packfile: use oidread() instead of hashcpy() to fill object_id
packfile: use object_id in find_pack_entry_one()
packfile: convert find_sha1_pack() to use object_id
http-walker: use object_id instead of bare hash
packfile: warn people away from parse_packed_git()
packfile: drop sha1_pack_index_name()
packfile: drop sha1_pack_name()
packfile: drop has_pack_index()
dumb-http: store downloaded pack idx as tempfile
t5550: count fetches in "previously-fetched .idx" test
midx: avoid duplicate packed_git entries
The sha1_pack_name() function has a few ugly bits:
- it writes into a static strbuf (and not even a ring buffer of them),
which can lead to subtle invalidation problems
- it uses the term "sha1", but it's really using the_hash_algo, which
could be sha256
There's only one caller of it left. And in fact that caller is better
off using the underlying odb_pack_name() function itself, since it's
just copying the result into its own strbuf anyway.
Converting that caller lets us get rid of this now-obselete function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The has_pack_index() function has several oddities that may make it
surprising if you are trying to find out if we have a pack with some
$hash:
- it is not looking for a valid pack that we found while searching
object directories. It just looks for any pack-$hash.idx file in the
pack directory.
- it only looks in the local directory, not any alternates
- it takes a bare "unsigned char" hash, which we try to avoid these
days
The only caller it has is in the dumb http code; it wants to know if we
already have the pack idx in question. This can happen if we downloaded
the pack (and generated its index) during a previous fetch.
Before the previous patch ("dumb-http: store downloaded pack idx as
tempfile"), it could also happen if we downloaded the .idx from the
remote but didn't get the matching .pack. But since that patch, we don't
hold on to those .idx files. So there's no need to look for the .idx
file in the filesystem; we can just scan through the packed_git list to
see if we have it.
That lets us simplify the dumb http code a bit, as we know that if we
have the .idx we have the matching .pack already. And it lets us get rid
of this odd function that is unlikely to be needed again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This patch fixes a regression in b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26) where fetching a v1 pack idx file
over the dumb-http protocol would cause the fetch to fail.
The core of the issue is that dumb-http stores the idx we fetch from the
remote at the same path that will eventually hold the idx we generate
from "index-pack --stdin". The sequence is something like this:
0. We realize we need some object X, which we don't have locally, and
nor does the other side have it as a loose object.
1. We download the list of remote packs from objects/info/packs.
2. For each entry in that file, we download each pack index and store
it locally in .git/objects/pack/pack-$hash.idx (the $hash is not
something we can verify yet and is given to us by the remote).
3. We check each pack index we got to see if it has object X. When we
find a match, we download the matching .pack file from the remote
to a tempfile. We feed that to "index-pack --stdin", which
reindexes the pack, rather than trusting that it has what the other
side claims it does. In most cases, this will end up generating the
exact same (byte-for-byte) pack index which we'll store at the same
pack-$hash.idx path, because the index generation and $hash id are
computed based on what's in the packfile. But:
a. The other side might have used other options to generate the
index. For instance we use index v2 by default, but long ago
it was v1 (and you can still ask for v1 explicitly).
b. The other side might even use a different mechanism to
determine $hash. E.g., long ago it was based on the sorted
list of objects in the packfile, but we switched to using the
pack checksum in 1190a1acf8 (pack-objects: name pack files
after trailer hash, 2013-12-05).
The regression we saw in the real world was (3a). A recent client
fetching from a server with a v1 index downloaded that index, then
complained about trying to overwrite it with its own v2 index. This
collision is otherwise harmless; we know we want to replace the remote
version with our local one, but the collision check doesn't realize
that.
There are a few options to fix it:
- we could teach index-pack a command-line option to ignore only pack
idx collisions, and use it when the dumb-http code invokes
index-pack. This would be an awkward thing to expose users to and
would involve a lot of boilerplate to get the option down to the
collision code.
- we could delete the remote .idx file right before running
index-pack. It should be redundant at that point (since we've just
downloaded the matching pack). But it feels risky to delete
something from our own .git/objects based on what the other side has
said. I'm not entirely positive that a malicious server couldn't lie
about which pack-$hash.idx it has and get us to delete something
precious.
- we can stop co-mingling the downloaded idx files in our local
objects directory. This is a slightly bigger change but I think
fixes the root of the problem more directly.
This patch implements the third option. The big design questions are:
where do we store the downloaded files, and how do we manage their
lifetimes?
There are some additional quirks to the dumb-http system we should
consider. Remember that in step 2 we downloaded every pack index, but in
step 3 we may only download some of the matching packs. What happens to
those other idx files now? They sit in the .git/objects/pack directory,
possibly waiting to be used at a later date. That may save bandwidth for
a subsequent fetch, but it also creates a lot of weird corner cases:
- our local object directory now has semi-untrusted .idx files sitting
around, without their matching .pack
- in case 3b, we noted that we might not generate the same hash as the
other side. In that case even if we download the matching pack,
our index-pack invocation will store it in a different
pack-$hash.idx file. And the unmatched .idx will sit there forever.
- if the server repacks, it may delete the old packs. Now we have
these orphaned .idx files sitting around locally that will never be
used (nor deleted).
- if we repack locally we may delete our local version of the server's
pack index and not realize we have it. So we'll download it again,
even though we have all of the objects it mentions.
I think the right solution here is probably some more complex cache
management system: download the remote .idx files to their own storage
directory, mark them as "seen" when we get their matching pack (to avoid
re-downloading even if we repack), and then delete them when the
server's objects/info/refs no longer mentions them.
But since the dumb http protocol is so ancient and so inferior to the
smart http protocol, I don't think it's worth spending a lot of time
creating such a system. For this patch I'm just downloading the idx
files to .git/objects/tmp_pack_*, and marking them as tempfiles to be
deleted when we exit (and due to the name, any we miss due to a crash,
etc, should eventually be removed by "git gc" runs based on timestamps).
That is slightly worse for one case: if we download an idx but not the
matching pack, we won't retain that idx for subsequent runs. But the
flip side is that we're making other cases better (we never hold on to
useless idx files forever). I suspect that worse case does not even come
up often, since it implies that the packs are generated to match
distinct parts of history (i.e., in practice even in a repo with many
packs you're going to end up grabbing all of those packs to do a clone).
If somebody really cares about that, I think the right path forward is a
managed cache directory as above, and this patch is providing the first
step in that direction anyway (by moving things out of the objects/pack/
directory).
There are two test changes. One demonstrates the broken v1 index case
(it double-checks the resulting clone with fsck to be careful, but prior
to this patch it actually fails at the clone step). The other tweaks the
expectation for a test that covers the "slightly worse" case to
accommodate the extra index download.
The code changes are fairly simple. We stop using finalize_object_file()
to copy the remote's index file into place, and leave it as a tempfile.
We give the tempfile a real ".idx" name, since the packfile code expects
that, and thus we make sure it is out of the usual packs/ directory (so
we'd never mistake it for a real local .idx).
We also have to change parse_pack_index(), which creates a temporary
packed_git to access our index (we need this because all of the pack idx
code assumes we have that struct). It reads the index data from the
tempfile, but prior to this patch would speculatively write the
finalized name into the packed_git struct using the pack-$hash we expect
to use.
I was mildly surprised that this worked at all, since we call
verify_pack_index() on the packed_git which mentions the final name
before moving the file into place! But it works because
parse_pack_index() leaves the mmap-ed data in the struct, so the
lazy-open in verify_pack_index() never triggers, and we read from the
tempfile, ignoring the filename in the struct completely. Hacky, but it
works.
After this patch, parse_pack_index() now uses the index filename we pass
in to derive a matching .pack name. This is OK to change because there
are only two callers, both in the dumb http code (and the other passes
in an existing pack-$hash.idx name, so the derived name is going to be
pack-$hash.pack, which is what we were using anyway).
I'll follow up with some more cleanups in that area, but this patch is
sufficient to fix the regression.
Reported-by: fox <fox.gbr@townlong-yak.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `result` parameter passed to `http_request_reauth()` may either
point to a `struct strbuf` or a `FILE *`, where the `target` parameter
tells us which of either it actually is. To accommodate for both types
the pointer is a `void *`, which we then pass directly to functions
without doing a cast.
This is fine on most platforms, but it breaks on FreeBSD because
`fileno()` is implemented as a macro that tries to directly access the
`FILE *` structure.
Fix this issue by storing the `FILE *` in a local variable before we
pass it on to other functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Leakfixes.
* jk/http-leakfixes: (28 commits)
http-push: clean up local_refs at exit
http-push: clean up loose request when falling back to packed
http-push: clean up objects list
http-push: free xml_ctx.cdata after use
http-push: free remote_ls_ctx.dentry_name
http-push: free transfer_request strbuf
http-push: free transfer_request dest field
http-push: free curl header lists
http-push: free repo->url string
http-push: clear refspecs before exiting
http-walker: free fake packed_git list
remote-curl: free HEAD ref with free_one_ref()
http: stop leaking buffer in http_get_info_packs()
http: call git_inflate_end() when releasing http_object_request
http: fix leak of http_object_request struct
http: fix leak when redacting cookies from curl trace
transport-helper: fix leak of dummy refs_list
fetch-pack: clear pack lockfiles list
fetch: free "raw" string when shrinking refspec
transport-helper: fix strbuf leak in push_refs_with_push()
...
Typofix.
* ak/typofix-2.46-maint:
upload-pack: fix a typo
sideband: fix a typo
setup: fix a typo
run-command: fix a typo
revision: fix a typo
refs: fix typos
rebase: fix a typo
read-cache-ll: fix a typo
pretty: fix a typo
object-file: fix a typo
merge-ort: fix typos
merge-ll: fix a typo
http: fix a typo
gpg-interface: fix a typo
git-p4: fix typos
git-instaweb: fix a typo
fsmonitor-settings: fix a typo
diffcore-rename: fix typos
config.mak.dev: fix a typo
We use http_get_strbuf() to fetch the remote info/packs content into a
strbuf, but never free it, causing a leak. There's no need to hold onto
it, as we've already parsed it completely.
This lets us mark t5619 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In new_http_object_request(), we initialize the zlib stream with
git_inflate_init(). We must have a matching git_inflate_end() to avoid
leaking any memory allocated by zlib.
In most cases this happens in finish_http_object_request(), but we don't
always get there. If we abort a request mid-stream, then we may clean it
up without hitting that function.
We can't just add a git_inflate_end() call to the release function,
though. That would double-free the cases that did actually finish.
Instead, we'll move the call from the finish function to the release
function. This does delay it for the cases that do finish, but I don't
think it matters. We should have already reached Z_STREAM_END (and
complain if we didn't), and we do not record any status code from
git_inflate_end().
This leak is triggered by t5550 at least (and probably other dumb-http
tests).
I did find one other related spot of interest. If we try to read a
previously downloaded file and fail, we reset the stream by calling
memset() followed by a fresh git_inflate_init(). I don't think this case
is triggered in the test suite, but it seemed like an obvious leak, so I
added the appropriate git_inflate_end() before the memset() there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new_http_object_request() function allocates a struct on the heap,
along with some fields inside the struct. But the matching function to
clean it up, release_http_object_request(), only frees the interior
fields without freeing the struct itself, causing a leak.
The related http_pack_request new/release pair gets this right, and at
first glance we should be able to do the same thing and just add a
single free() call. But there's a catch.
These http_object_request structs are typically embedded in the
object_request struct of http-walker.c. And when we clean up that parent
struct, it sanity-checks the embedded struct to make sure we are not
leaking descriptors. Which means a use-after-free if we simply free()
the embedded struct.
I have no idea how valuable that sanity-check is, or whether it can
simply be deleted. This all goes back to 5424bc557f (http*: add helper
methods for fetching objects (loose), 2009-06-06). But the obvious way
to make it all work is to be sure we set the pointer to NULL after
freeing it (and our freeing process closes the descriptor, so we know
there is no leak).
To make sure we do that consistently, we'll switch the pointer we take
in release_http_object_request() to a pointer-to-pointer, and we'll set
it to NULL ourselves. And then the compiler can help us find each caller
which needs to be updated.
Most cases will just pass "&obj_req->req", which will obviously do the
right thing. In a few cases, like http-push's finish_request(), we are
working with a copy of the pointer, so we don't NULL the original. But
it's OK because the next step is to free the struct containing the
original pointer anyway.
This lets us mark t5551 as leak-free. Ironically this is the "smart"
http test, and the leak here only affects dumb http. But there's a
single dumb-http invocation in there. The full dumb tests are in t5550,
which still has some more leaks.
This also makes t5559 leak-free, as it's just an HTTP/2 variant of
t5551. But we don't need to mark it as such, since it inherits the flag
from t5551.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When redacting headers for GIT_TRACE_CURL, we build up a redacted cookie
header in a local strbuf, and then copy it into the output. But we
forget to release the temporary strbuf, leaking it for every cookie
header we show.
The other redacted headers don't run into this problem, since they're
able to work in-place in the output buffer. But the cookie parsing is
too complicated for that, since we redact the cookies individually.
This leak is triggered by the cookie tests in t5551.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for `http.proxy` describes that option, and the
environment variables it overrides, as supporting "the syntax understood
by curl". curl allows SOCKS proxies to use a path to a Unix domain
socket, like `socks5h://localhost/path/to/socket.sock`. Git should
therefore include, if present, the path part of the proxy URL in what it
passes to libcurl.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The http.cookieFile and http.saveCookies configuration variables
have a few values that need to be avoided, which are now ignored
with warning messages.
* jc/http-cookiefile:
http.c: cookie file tightening
The http transport can now be told to send request with
authentication material without first getting a 401 response.
* bc/http-proactive-auth:
http: allow authenticating proactively
The http.cookiefile configuration variable is used to call
curl_easy_setopt() to set CURLOPT_COOKIEFILE and if http.savecookies
is set, the same value is used for CURLOPT_COOKIEJAR. The former is
used only to read cookies at startup, the latter is used to write
cookies at the end.
The manual pages https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html
and https://curl.se/libcurl/c/CURLOPT_COOKIEJAR.html talk about two
interesting special values.
* "" (an empty string) given to CURLOPT_COOKIEFILE means not to
read cookies from any file upon startup.
* It is not specified what "" (an empty string) given to
CURLOPT_COOKIEJAR does; presumably open a file whose name is an
empty string and write cookies to it? In any case, that is not
what we want to see happen, ever.
* "-" (a dash) given to CURLOPT_COOKIEFILE makes cURL read cookies
from the standard input, and given to CURLOPT_COOKIEJAR makes
cURL write cookies to the standard output. Neither of which we
want ever to happen.
So, let's make sure we avoid these nonsense cases. Specifically,
when http.cookies is set to "-", ignore it with a warning, and when
it is set to "" and http.savecookies is set, ignore http.savecookies
with a warning.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When making a request over HTTP(S), Git only sends authentication if it
receives a 401 response. Thus, if a repository is open to the public
for reading, Git will typically never ask for authentication for fetches
and clones.
However, there may be times when a user would like to authenticate
nevertheless. For example, a forge may give higher rate limits to users
who authenticate because they are easier to contact in case of excessive
use. Or it may be useful for a known heavy user, such as an internal
service, to proactively authenticate so its use can be monitored and, if
necessary, throttled.
Let's make this possible with a new option, "http.proactiveAuth". This
option specifies a type of authentication which can be used to
authenticate against the host in question. This is necessary because we
lack the WWW-Authenticate header to provide us details; similarly, we
cannot accept certain types of authentication because we require
information from the server, such as a nonce or challenge, to
successfully authenticate.
If we're in auto mode and we got a username and password, set the
authentication scheme to Basic. libcurl will not send authentication
proactively unless there's a single choice of allowed authentication,
and we know in this case we didn't get an authtype entry telling us what
scheme to use, or we would have taken a different codepath and written
the header ourselves. In any event, of the other schemes that libcurl
supports, Digest and NTLM require a nonce or challenge, which means that
they cannot work with proactive auth, and GSSAPI does not use a username
and password at all, so Basic is the only logical choice among the
built-in options.
Note that the existing http_proactive_auth variable signifies proactive
auth if there are already credentials, which is different from the
functionality we're adding, which always seeks credentials even if none
are provided. Nonetheless, t5540 tests the existing behavior for
WebDAV-based pushes to an open repository without credentials, so we
preserve it. While at first this may seem an insecure and bizarre
decision, it may be that authentication is done with TLS certificates,
in which case it might actually provide a quite high level of security.
Expand the variable to use an enum to handle the additional cases and a
helper function to distinguish our new cases from the old ones.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.
* ps/use-the-repository:
hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
t/helper: remove dependency on `the_repository` in "proc-receive"
t/helper: fix segfault in "oid-array" command without repository
t/helper: use correct object hash in partial-clone helper
compat/fsmonitor: fix socket path in networked SHA256 repos
replace-object: use hash algorithm from passed-in repository
protocol-caps: use hash algorithm from passed-in repository
oidset: pass hash algorithm when parsing file
http-fetch: don't crash when parsing packfile without a repo
hash-ll: merge with "hash.h"
refs: avoid include cycle with "repository.h"
global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
hash: require hash algorithm in `empty_tree_oid_hex()`
hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
hash: make `is_null_oid()` independent of `the_repository`
hash: convert `oidcmp()` and `oideq()` to compare whole hash
global: ensure that object IDs are always padded
hash: require hash algorithm in `oidread()` and `oidclr()`
hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `write_accept_language()`, we put all acceptable languages into an
array. While all entries in that array are allocated strings, the final
entry in that array is a string constant. This is fine because we
explicitly skip over the last entry when freeing the array, but will
cause warnings once we enable `-Wwrite-strings`.
Adapt the code to also allocate the final entry.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are various variables assigned via `git_config_string()` and
`git_config_pathname()` which are never free'd. This bug is relatable
because the out parameter of those functions are a `const char **`, even
though memory ownership is transferred to the caller.
We're about to adapt the functions to instead use `char **`. Prepare the
code accordingly. Note that the `(const char **)` casts will go away
once we have adapted the functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The credential helper protocol, together with the HTTP layer, have
been enhanced to support authentication schemes different from
username & password pair, like Bearer and NTLM.
* bc/credential-scheme-enhancement:
credential: add method for querying capabilities
credential-cache: implement authtype capability
t: add credential tests for authtype
credential: add support for multistage credential rounds
t5563: refactor for multi-stage authentication
docs: set a limit on credential line length
credential: enable state capability
credential: add an argument to keep state
http: add support for authtype and credential
docs: indicate new credential protocol fields
credential: add a field called "ephemeral"
credential: gate new fields on capability
credential: add a field for pre-encoded credentials
http: use new headers for each object request
remote-curl: reset headers on new request
credential: add an authtype field
Over HTTP, NTLM and Kerberos require two rounds of authentication on the
client side. It's possible that there are custom authentication schemes
that also implement this same approach. Since these are tricky schemes
to implement and the HTTP library in use may not always handle them
gracefully on all systems, it would be helpful to allow the credential
helper to implement them instead for increased portability and
robustness.
To allow this to happen, add a boolean flag, continue, that indicates
that instead of failing when we get a 401, we should retry another round
of authentication. However, this necessitates some changes in our
current credential code so that we can make this work.
Keep the state[] headers between iterations, but only use them to send
to the helper and only consider the new ones we read from the credential
helper to be valid on subsequent iterations. That avoids us passing
stale data when we finally approve or reject the credential. Similarly,
clear the multistage and wwwauth[] values appropriately so that we
don't pass stale data or think we're trying a multiround response when
we're not. Remove the credential values so that we can actually fill a
second time with new responses.
Limit the number of iterations of reauthentication we do to 3. This
means that if there's a problem, we'll terminate with an error message
instead of retrying indefinitely and not informing the user (and
possibly conducting a DoS on the server).
In our tests, handle creating multiple response output files from our
helper so we can verify that each of the messages sent is correct.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have the credential helper code set up to handle arbitrary
authentications schemes, let's add support for this in the HTTP code,
where we really want to use it. If we're using this new functionality,
don't set a username and password, and instead set a header wherever
we'd normally do so, including for proxy authentication.
Since we can now handle this case, ask the credential helper to enable
the appropriate capabilities.
Finally, if we're using the authtype value, set "Expect: 100-continue".
Any type of authentication that requires multiple rounds (such as NTLM
or Kerberos) requires a 100 Continue (if we're larger than
http.postBuffer) because otherwise we send the pack data before we're
authenticated, the push gets a 401 response, and we can't rewind the
stream. We don't know for certain what other custom schemes might
require this, the HTTP/1.1 standard has required handling this since
1999, the broken HTTP server for which we disabled this (Google's) is
now fixed and has been for some time, and libcurl has a 1-second
fallback in case the HTTP server is still broken. In addition, it is
not unreasonable to require compliance with a 25-year old standard to
use new Git features. For all of these reasons, do so here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We support the new credential and authtype fields, but we lack a way to
indicate to a credential helper that we'd like them to be used. Without
some sort of indication, the credential helper doesn't know if it should
try to provide us a username and password, or a pre-encoded credential.
For example, the helper might prefer a more restricted Bearer token if
pre-encoded credentials are possible, but might have to fall back to
more general username and password if not.
Let's provide a simple way to indicate whether Git (or, for that matter,
the helper) is capable of understanding the authtype and credential
fields. We send this capability when we generate a request, and the
other side may reply to indicate to us that it does, too.
For now, don't enable sending capabilities for the HTTP code. In a
future commit, we'll introduce appropriate handling for that code,
which requires more in-depth work.
The logic for determining whether a capability is supported may seem
complex, but it is not. At each stage, we emit the capability to the
following stage if all preceding stages have declared it. Thus, if the
caller to git credential fill didn't declare it, then we won't send it
to the helper, and if fill's caller did send but the helper doesn't
understand it, then we won't send it on in the response. If we're an
internal user, then we know about all capabilities and will request
them.
For "git credential approve" and "git credential reject", we set the
helper capability before calling the helper, since we assume that the
input we're getting from the external program comes from a previous call
to "git credential fill", and thus we'll invoke send a capability to the
helper if and only if we got one from the standard input, which is the
correct behavior.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we create one set of headers for all object requests and reuse
it. However, we'll need to adjust the headers for authentication
purposes in the future, so let's create a new set for each request so
that we can adjust them if the authentication changes.
Note that the cost of allocation here is tiny compared to the fact that
we're making a network call, not to mention probably a full TLS
connection, so this shouldn't have a significant impact on performance.
Moreover, nobody who cares about performance is using the dumb HTTP
protocol anyway, since it often makes huge numbers of requests compared
to the smart protocol.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix was added to work around a regression in libcURL 8.7.0 (which has
already been fixed in their tip of the tree).
* jk/libcurl-8.7-regression-workaround:
remote-curl: add Transfer-Encoding header only for older curl
INSTALL: bump libcurl version to 7.21.3
http: reset POSTFIELDSIZE when clearing curl handle
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
HTTP Header redaction code has been adjusted for a newer version of
cURL library that shows its traces differently from earlier
versions.
* jk/redact-h2h3-headers-fix:
http: update curl http/2 info matching for curl 8.3.0
http: factor out matching of curl http/2 trace lines
This backport to `maint-2.39` is needed to bring the following test
cases back to a working state in conjunction with recent libcurl
versions:
- t5559.17 GIT_TRACE_CURL redacts auth details
- t5559.18 GIT_CURL_VERBOSE redacts auth details
- t5559.38 cookies are redacted by default
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.
We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.
This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.
So everything works, but we're better off resetting the size manually
for a few reasons:
- there was a regression in curl 8.7.0 where the chunked header
detection didn't kick in, causing any large HTTP requests made by
Git to fail. This has since been fixed (but not yet released). In
the issue, curl folks recommended setting it explicitly to -1:
https://github.com/curl/curl/issues/13229#issuecomment-2029826058
and it indeed works around the regression. So even though it won't
be strictly necessary after the fix there, this will help folks who
end up using the affected libcurl versions.
- it's consistent with what a new curl handle would look like. Since
get_active_slot() may or may not return a used handle, this reduces
the possibility of heisenbugs that only appear with certain request
patterns.
Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.
This is a backport of 3242311742 (http: reset POSTFIELDSIZE when
clearing curl handle, 2024-04-02) into the `maint-2.39` branch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.
We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.
This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.
So everything works, but we're better off resetting the size manually
for a few reasons:
- there was a regression in curl 8.7.0 where the chunked header
detection didn't kick in, causing any large HTTP requests made by
Git to fail. This has since been fixed (but not yet released). In
the issue, curl folks recommended setting it explicitly to -1:
https://github.com/curl/curl/issues/13229#issuecomment-2029826058
and it indeed works around the regression. So even though it won't
be strictly necessary after the fix there, this will help folks who
end up using the affected libcurl versions.
- it's consistent with what a new curl handle would look like. Since
get_active_slot() may or may not return a used handle, this reduces
the possibility of heisenbugs that only appear with certain request
patterns.
Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's somewhat traditional to respect sites' self-identification.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
HTTP Header redaction code has been adjusted for a newer version of
cURL library that shows its traces differently from earlier
versions.
* jk/redact-h2h3-headers-fix:
http: update curl http/2 info matching for curl 8.3.0
http: factor out matching of curl http/2 trace lines
To redact header lines in http/2 curl traces, we have to parse past some
prefix bytes that curl sticks in the info lines it passes to us. That
changed once already, and we adapted in db30130165 (http: handle both
"h2" and "h2h3" in curl info lines, 2023-06-17).
Now it has changed again, in curl's fbacb14c4 (http2: cleanup trace
messages, 2023-08-04), which was released in curl 8.3.0. Running a build
of git linked against that version will fail to redact the trace (and as
before, t5559 notices and complains).
The format here is a little more complicated than the other ones, as it
now includes a "stream id". This is not constant but is always numeric,
so we can easily parse past it.
We'll continue to match the old versions, of course, since we want to
work with many different versions of curl. We can't even select one
format at compile time, because the behavior depends on the runtime
version of curl we use, not the version we build against.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have to parse out curl's http/2 trace lines so we can redact their
headers. We already match two different types of lines from various
vintages of curl. In preparation for adding another (which will be
slightly more complex), let's pull the matching into its own function,
rather than doing it in the middle of a conditional.
While we're doing so, let's expand the comment a bit to describe the two
matches. That probably should have been part of db30130165 (http: handle
both "h2" and "h2h3" in curl info lines, 2023-06-17), but will become
even more important as we add new types.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions are all used as callbacks for curl, so they have to
conform to a particular interface. But they don't need all of their
parameters:
- fwrite_null() throws away the input, so it doesn't look at most
parameters
- fwrite_wwwauth() in theory could take the auth struct in its void
pointer, but instead we just access it as the global http_auth
(matching the rest of the code in this file)
- curl_trace() always writes via the trace mechanism, so it doesn't
need its void pointer to know where to send things. Likewise, it
ignores the CURL parameter, since nothing we trace requires querying
the handle.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
Header files cleanup.
* en/header-split-cache-h-part-3: (28 commits)
fsmonitor-ll.h: split this header out of fsmonitor.h
hash-ll, hashmap: move oidhash() to hash-ll
object-store-ll.h: split this header out of object-store.h
khash: name the structs that khash declares
merge-ll: rename from ll-merge
git-compat-util.h: remove unneccessary include of wildmatch.h
builtin.h: remove unneccessary includes
list-objects-filter-options.h: remove unneccessary include
diff.h: remove unnecessary include of oidset.h
repository: remove unnecessary include of path.h
log-tree: replace include of revision.h with simple forward declaration
cache.h: remove this no-longer-used header
read-cache*.h: move declarations for read-cache.c functions from cache.h
repository.h: move declaration of the_index from cache.h
merge.h: move declarations for merge.c from cache.h
diff.h: move declaration for global in diff.c from cache.h
preload-index.h: move declarations for preload-index.c from elsewhere
sparse-index.h: move declarations for sparse-index.c from cache.h
name-hash.h: move declarations for name-hash.c from cache.h
run-command.h: move declarations for run-command.c from cache.h
...
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When redacting auth tokens in trace output from curl, we look for http/2
headers of the form "h2h3 [header: value]". This comes from b637a41ebe
(http: redact curl h2h3 headers in info, 2022-11-11).
But the "h2h3" prefix changed to just "h2" in curl's fc2f1e547 (http2:
support HTTP/2 to forward proxies, non-tunneling, 2023-04-14). That's in
released version curl 8.1.0; linking against that version means we'll
fail to correctly redact the trace. Our t5559.17 notices and fails.
We can fix this by matching either prefix, which should handle both old
and new versions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>