git/remote-curl.c

1415 lines
35 KiB
C
Raw Normal View History

#include "cache.h"
#include "config.h"
#include "remote.h"
#include "connect.h"
#include "strbuf.h"
#include "walker.h"
#include "http.h"
#include "exec-cmd.h"
#include "run-command.h"
#include "pkt-line.h"
#include "string-list.h"
#include "sideband.h"
#include "argv-array.h"
http: hoist credential request out of handle_curl_result When we are handling a curl response code in http_request or in the remote-curl RPC code, we use the handle_curl_result helper to translate curl's response into an easy-to-use code. When we see an HTTP 401, we do one of two things: 1. If we already had a filled-in credential, we mark it as rejected, and then return HTTP_NOAUTH to indicate to the caller that we failed. 2. If we didn't, then we ask for a new credential and tell the caller HTTP_REAUTH to indicate that they may want to try again. Rejecting in the first case makes sense; it is the natural result of the request we just made. However, prompting for more credentials in the second step does not always make sense. We do not know for sure that the caller is going to make a second request, and nor are we sure that it will be to the same URL. Logically, the prompt belongs not to the request we just finished, but to the request we are (maybe) about to make. In practice, it is very hard to trigger any bad behavior. Currently, if we make a second request, it will always be to the same URL (even in the face of redirects, because curl handles the redirects internally). And we almost always retry on HTTP_REAUTH these days. The one exception is if we are streaming a large RPC request to the server (e.g., a pushed packfile), in which case we cannot restart. It's extremely unlikely to see a 401 response at this stage, though, as we would typically have seen it when we sent a probe request, before streaming the data. This patch drops the automatic prompt out of case 2, and instead requires the caller to do it. This is a few extra lines of code, and the bug it fixes is unlikely to come up in practice. But it is conceptually cleaner, and paves the way for better handling of credentials across redirects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:31:45 +08:00
#include "credential.h"
#include "sha1-array.h"
#include "send-pack.h"
#include "protocol.h"
#include "quote.h"
static struct remote *remote;
/* always ends with a trailing slash */
static struct strbuf url = STRBUF_INIT;
struct options {
int verbosity;
unsigned long depth;
char *deepen_since;
struct string_list deepen_not;
struct string_list push_options;
char *filter;
unsigned progress : 1,
check_self_contained_and_connected : 1,
cloning : 1,
update_shallow : 1,
followtags : 1,
dry_run : 1,
signed push: teach smart-HTTP to pass "git push --signed" around The "--signed" option received by "git push" is first passed to the transport layer, which the native transport directly uses to notice that a push certificate needs to be sent. When the transport-helper is involved, however, the option needs to be told to the helper with set_helper_option(), and the helper needs to take necessary action. For the smart-HTTP helper, the "necessary action" involves spawning the "git send-pack" subprocess with the "--signed" option. Once the above all gets wired in, the smart-HTTP transport now can use the push certificate mechanism to authenticate its pushes. Add a test that is modeled after tests for the native transport in t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test Apache configuration to pass GNUPGHOME environment variable through. As PassEnv would trigger warnings for an environment variable that is not set, export it from test-lib.sh set to a harmless value when GnuPG is not being used in the tests. Note that the added test is deliberately loose and does not check the nonce in this step. This is because the stateless RPC mode is inevitably flaky and a nonce that comes back in the actual push processing is one issued by a different process; if the two interactions with the server crossed a second boundary, the nonces will not match and such a check will fail. A later patch in the series will work around this shortcoming. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-16 05:59:00 +08:00
thin : 1,
/* One of the SEND_PACK_PUSH_CERT_* constants. */
fetch, upload-pack: --deepen=N extends shallow boundary by N commits In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 18:54:09 +08:00
push_cert : 2,
deepen_relative : 1,
from_promisor : 1,
no_dependents : 1;
};
static struct options options;
static struct string_list cas_options = STRING_LIST_INIT_DUP;
static int set_option(const char *name, const char *value)
{
if (!strcmp(name, "verbosity")) {
char *end;
int v = strtol(value, &end, 10);
if (value == end || *end)
return -1;
options.verbosity = v;
return 0;
}
else if (!strcmp(name, "progress")) {
if (!strcmp(value, "true"))
options.progress = 1;
else if (!strcmp(value, "false"))
options.progress = 0;
else
return -1;
return 0;
}
else if (!strcmp(name, "depth")) {
char *end;
unsigned long v = strtoul(value, &end, 10);
if (value == end || *end)
return -1;
options.depth = v;
return 0;
}
else if (!strcmp(name, "deepen-since")) {
options.deepen_since = xstrdup(value);
return 0;
}
else if (!strcmp(name, "deepen-not")) {
string_list_append(&options.deepen_not, value);
return 0;
}
fetch, upload-pack: --deepen=N extends shallow boundary by N commits In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 18:54:09 +08:00
else if (!strcmp(name, "deepen-relative")) {
if (!strcmp(value, "true"))
options.deepen_relative = 1;
else if (!strcmp(value, "false"))
options.deepen_relative = 0;
else
return -1;
return 0;
}
else if (!strcmp(name, "followtags")) {
if (!strcmp(value, "true"))
options.followtags = 1;
else if (!strcmp(value, "false"))
options.followtags = 0;
else
return -1;
return 0;
}
else if (!strcmp(name, "dry-run")) {
if (!strcmp(value, "true"))
options.dry_run = 1;
else if (!strcmp(value, "false"))
options.dry_run = 0;
else
return -1;
return 0;
}
else if (!strcmp(name, "check-connectivity")) {
if (!strcmp(value, "true"))
options.check_self_contained_and_connected = 1;
else if (!strcmp(value, "false"))
options.check_self_contained_and_connected = 0;
else
return -1;
return 0;
}
else if (!strcmp(name, "cas")) {
struct strbuf val = STRBUF_INIT;
strbuf_addf(&val, "--" CAS_OPT_NAME "=%s", value);
string_list_append(&cas_options, val.buf);
strbuf_release(&val);
return 0;
} else if (!strcmp(name, "cloning")) {
if (!strcmp(value, "true"))
options.cloning = 1;
else if (!strcmp(value, "false"))
options.cloning = 0;
else
return -1;
return 0;
} else if (!strcmp(name, "update-shallow")) {
if (!strcmp(value, "true"))
options.update_shallow = 1;
else if (!strcmp(value, "false"))
options.update_shallow = 0;
else
return -1;
return 0;
signed push: teach smart-HTTP to pass "git push --signed" around The "--signed" option received by "git push" is first passed to the transport layer, which the native transport directly uses to notice that a push certificate needs to be sent. When the transport-helper is involved, however, the option needs to be told to the helper with set_helper_option(), and the helper needs to take necessary action. For the smart-HTTP helper, the "necessary action" involves spawning the "git send-pack" subprocess with the "--signed" option. Once the above all gets wired in, the smart-HTTP transport now can use the push certificate mechanism to authenticate its pushes. Add a test that is modeled after tests for the native transport in t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test Apache configuration to pass GNUPGHOME environment variable through. As PassEnv would trigger warnings for an environment variable that is not set, export it from test-lib.sh set to a harmless value when GnuPG is not being used in the tests. Note that the added test is deliberately loose and does not check the nonce in this step. This is because the stateless RPC mode is inevitably flaky and a nonce that comes back in the actual push processing is one issued by a different process; if the two interactions with the server crossed a second boundary, the nonces will not match and such a check will fail. A later patch in the series will work around this shortcoming. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-16 05:59:00 +08:00
} else if (!strcmp(name, "pushcert")) {
if (!strcmp(value, "true"))
options.push_cert = SEND_PACK_PUSH_CERT_ALWAYS;
signed push: teach smart-HTTP to pass "git push --signed" around The "--signed" option received by "git push" is first passed to the transport layer, which the native transport directly uses to notice that a push certificate needs to be sent. When the transport-helper is involved, however, the option needs to be told to the helper with set_helper_option(), and the helper needs to take necessary action. For the smart-HTTP helper, the "necessary action" involves spawning the "git send-pack" subprocess with the "--signed" option. Once the above all gets wired in, the smart-HTTP transport now can use the push certificate mechanism to authenticate its pushes. Add a test that is modeled after tests for the native transport in t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test Apache configuration to pass GNUPGHOME environment variable through. As PassEnv would trigger warnings for an environment variable that is not set, export it from test-lib.sh set to a harmless value when GnuPG is not being used in the tests. Note that the added test is deliberately loose and does not check the nonce in this step. This is because the stateless RPC mode is inevitably flaky and a nonce that comes back in the actual push processing is one issued by a different process; if the two interactions with the server crossed a second boundary, the nonces will not match and such a check will fail. A later patch in the series will work around this shortcoming. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-16 05:59:00 +08:00
else if (!strcmp(value, "false"))
options.push_cert = SEND_PACK_PUSH_CERT_NEVER;
else if (!strcmp(value, "if-asked"))
options.push_cert = SEND_PACK_PUSH_CERT_IF_ASKED;
signed push: teach smart-HTTP to pass "git push --signed" around The "--signed" option received by "git push" is first passed to the transport layer, which the native transport directly uses to notice that a push certificate needs to be sent. When the transport-helper is involved, however, the option needs to be told to the helper with set_helper_option(), and the helper needs to take necessary action. For the smart-HTTP helper, the "necessary action" involves spawning the "git send-pack" subprocess with the "--signed" option. Once the above all gets wired in, the smart-HTTP transport now can use the push certificate mechanism to authenticate its pushes. Add a test that is modeled after tests for the native transport in t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test Apache configuration to pass GNUPGHOME environment variable through. As PassEnv would trigger warnings for an environment variable that is not set, export it from test-lib.sh set to a harmless value when GnuPG is not being used in the tests. Note that the added test is deliberately loose and does not check the nonce in this step. This is because the stateless RPC mode is inevitably flaky and a nonce that comes back in the actual push processing is one issued by a different process; if the two interactions with the server crossed a second boundary, the nonces will not match and such a check will fail. A later patch in the series will work around this shortcoming. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-16 05:59:00 +08:00
else
return -1;
return 0;
} else if (!strcmp(name, "push-option")) {
if (*value != '"')
string_list_append(&options.push_options, value);
else {
struct strbuf unquoted = STRBUF_INIT;
if (unquote_c_style(&unquoted, value, NULL) < 0)
die("invalid quoting in push-option value");
string_list_append_nodup(&options.push_options,
strbuf_detach(&unquoted, NULL));
}
return 0;
#if LIBCURL_VERSION_NUM >= 0x070a08
} else if (!strcmp(name, "family")) {
if (!strcmp(value, "ipv4"))
git_curl_ipresolve = CURL_IPRESOLVE_V4;
else if (!strcmp(value, "ipv6"))
git_curl_ipresolve = CURL_IPRESOLVE_V6;
else if (!strcmp(value, "all"))
git_curl_ipresolve = CURL_IPRESOLVE_WHATEVER;
else
return -1;
return 0;
#endif /* LIBCURL_VERSION_NUM >= 0x070a08 */
} else if (!strcmp(name, "from-promisor")) {
options.from_promisor = 1;
return 0;
} else if (!strcmp(name, "no-dependents")) {
options.no_dependents = 1;
return 0;
} else if (!strcmp(name, "filter")) {
options.filter = xstrdup(value);
return 0;
} else {
return 1 /* unsupported */;
}
}
struct discovery {
char *service;
char *buf_alloc;
char *buf;
size_t len;
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
struct ref *refs;
struct oid_array shallow;
enum protocol_version version;
unsigned proto_git : 1;
};
static struct discovery *last_discovery;
static struct ref *parse_git_refs(struct discovery *heads, int for_push)
{
struct ref *list = NULL;
struct packet_reader reader;
packet_reader_init(&reader, -1, heads->buf, heads->len,
PACKET_READ_CHOMP_NEWLINE |
PACKET_READ_GENTLE_ON_EOF);
heads->version = discover_version(&reader);
switch (heads->version) {
case protocol_v2:
/*
* Do nothing. This isn't a list of refs but rather a
* capability advertisement. Client would have run
* 'stateless-connect' so we'll dump this capability listing
* and let them request the refs themselves.
*/
break;
case protocol_v1:
case protocol_v0:
get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
NULL, &heads->shallow);
break;
case protocol_unknown_version:
BUG("unknown protocol version");
}
return list;
}
static struct ref *parse_info_refs(struct discovery *heads)
{
char *data, *start, *mid;
char *ref_name;
int i = 0;
struct ref *refs = NULL;
struct ref *ref = NULL;
struct ref *last_ref = NULL;
data = heads->buf;
start = NULL;
mid = data;
while (i < heads->len) {
if (!start) {
start = &data[i];
}
if (data[i] == '\t')
mid = &data[i];
if (data[i] == '\n') {
if (mid - start != 40)
die("%sinfo/refs not valid: is this a git repository?",
url.buf);
data[i] = 0;
ref_name = mid + 1;
ref = alloc_ref(ref_name);
get_oid_hex(start, &ref->old_oid);
if (!refs)
refs = ref;
if (last_ref)
last_ref->next = ref;
last_ref = ref;
start = NULL;
}
i++;
}
ref = alloc_ref("HEAD");
if (!http_fetch_ref(url.buf, ref) &&
!resolve_remote_symref(ref, refs)) {
ref->next = refs;
refs = ref;
} else {
free(ref);
}
return refs;
}
static void free_discovery(struct discovery *d)
{
if (d) {
if (d == last_discovery)
last_discovery = NULL;
free(d->shallow.oid);
free(d->buf_alloc);
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
free_refs(d->refs);
free(d->service);
free(d);
}
}
static int show_http_message(struct strbuf *type, struct strbuf *charset,
struct strbuf *msg)
remote-curl: show server content on http errors If an http request to a remote git server fails, we show only the http response code, or sometimes a custom message for particular codes. This gives the server no opportunity to offer a more detailed explanation of the reason for the failure, or to give extra advice. This patch teaches remote-curl to record and display the body content of a failed http response. We only display such responses when the content-type is advertised as text/plain, as it is the most likely to look presentable on the user's terminal (and it is hoped to be a good indication that the message is intended for git clients, and not for a web browser). Each line of the new output is prepended with "remote:". Example output may look like this (assuming the server is configured to display such a helpful message): $ GIT_SMART_HTTP=0 git clone https://example.com/some/repo.git Cloning into 'repo'... remote: Sorry, fetching via dumb http is forbidden. remote: Please upgrade your git client to v1.6.6 or greater remote: and make sure that smart-http is enabled. error: The requested URL returned error: 403 while accessing http://localhost:5001/some/repo.git/info/refs fatal: HTTP request failed For the sake of simplicity, we only record and display these errors during the initial fetch of the ref list, as that is the initial contact with the server and where the most common, interesting errors happen (and there is already precedent, as that is the only place we currently massage http error codes into more helpful messages). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-06 06:17:23 +08:00
{
const char *p, *eol;
/*
* We only show text/plain parts, as other types are likely
* to be ugly to look at on the user's terminal.
*/
if (strcmp(type->buf, "text/plain"))
remote-curl: show server content on http errors If an http request to a remote git server fails, we show only the http response code, or sometimes a custom message for particular codes. This gives the server no opportunity to offer a more detailed explanation of the reason for the failure, or to give extra advice. This patch teaches remote-curl to record and display the body content of a failed http response. We only display such responses when the content-type is advertised as text/plain, as it is the most likely to look presentable on the user's terminal (and it is hoped to be a good indication that the message is intended for git clients, and not for a web browser). Each line of the new output is prepended with "remote:". Example output may look like this (assuming the server is configured to display such a helpful message): $ GIT_SMART_HTTP=0 git clone https://example.com/some/repo.git Cloning into 'repo'... remote: Sorry, fetching via dumb http is forbidden. remote: Please upgrade your git client to v1.6.6 or greater remote: and make sure that smart-http is enabled. error: The requested URL returned error: 403 while accessing http://localhost:5001/some/repo.git/info/refs fatal: HTTP request failed For the sake of simplicity, we only record and display these errors during the initial fetch of the ref list, as that is the initial contact with the server and where the most common, interesting errors happen (and there is already precedent, as that is the only place we currently massage http error codes into more helpful messages). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-06 06:17:23 +08:00
return -1;
if (charset->len)
strbuf_reencode(msg, charset->buf, get_log_output_encoding());
remote-curl: show server content on http errors If an http request to a remote git server fails, we show only the http response code, or sometimes a custom message for particular codes. This gives the server no opportunity to offer a more detailed explanation of the reason for the failure, or to give extra advice. This patch teaches remote-curl to record and display the body content of a failed http response. We only display such responses when the content-type is advertised as text/plain, as it is the most likely to look presentable on the user's terminal (and it is hoped to be a good indication that the message is intended for git clients, and not for a web browser). Each line of the new output is prepended with "remote:". Example output may look like this (assuming the server is configured to display such a helpful message): $ GIT_SMART_HTTP=0 git clone https://example.com/some/repo.git Cloning into 'repo'... remote: Sorry, fetching via dumb http is forbidden. remote: Please upgrade your git client to v1.6.6 or greater remote: and make sure that smart-http is enabled. error: The requested URL returned error: 403 while accessing http://localhost:5001/some/repo.git/info/refs fatal: HTTP request failed For the sake of simplicity, we only record and display these errors during the initial fetch of the ref list, as that is the initial contact with the server and where the most common, interesting errors happen (and there is already precedent, as that is the only place we currently massage http error codes into more helpful messages). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-06 06:17:23 +08:00
strbuf_trim(msg);
if (!msg->len)
return -1;
p = msg->buf;
do {
eol = strchrnul(p, '\n');
fprintf(stderr, "remote: %.*s\n", (int)(eol - p), p);
p = eol + 1;
} while(*eol);
return 0;
}
static int get_protocol_http_header(enum protocol_version version,
struct strbuf *header)
{
if (version > 0) {
strbuf_addf(header, GIT_PROTOCOL_HEADER ": version=%d",
version);
return 1;
}
return 0;
}
static struct discovery *discover_refs(const char *service, int for_push)
{
struct strbuf exp = STRBUF_INIT;
struct strbuf type = STRBUF_INIT;
struct strbuf charset = STRBUF_INIT;
struct strbuf buffer = STRBUF_INIT;
struct strbuf refs_url = STRBUF_INIT;
remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:35:35 +08:00
struct strbuf effective_url = STRBUF_INIT;
struct strbuf protocol_header = STRBUF_INIT;
struct string_list extra_headers = STRING_LIST_INIT_DUP;
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
enum protocol_version version = get_protocol_version_config();
if (last && !strcmp(service, last->service))
return last;
free_discovery(last);
strbuf_addf(&refs_url, "%sinfo/refs", url.buf);
if ((starts_with(url.buf, "http://") || starts_with(url.buf, "https://")) &&
git_env_bool("GIT_SMART_HTTP", 1)) {
maybe_smart = 1;
if (!strchr(url.buf, '?'))
strbuf_addch(&refs_url, '?');
else
strbuf_addch(&refs_url, '&');
strbuf_addf(&refs_url, "service=%s", service);
}
/*
* NEEDSWORK: If we are trying to use protocol v2 and we are planning
* to perform a push, then fallback to v0 since the client doesn't know
* how to push yet using v2.
*/
if (version == protocol_v2 && !strcmp("git-receive-pack", service))
version = protocol_v0;
/* Add the extra Git-Protocol header */
if (get_protocol_http_header(version, &protocol_header))
string_list_append(&extra_headers, protocol_header.buf);
memset(&http_options, 0, sizeof(http_options));
http_options.content_type = &type;
http_options.charset = &charset;
http_options.effective_url = &effective_url;
http_options.base_url = &url;
http_options.extra_headers = &extra_headers;
http: make redirects more obvious We instruct curl to always follow HTTP redirects. This is convenient, but it creates opportunities for malicious servers to create confusing situations. For instance, imagine Alice is a git user with access to a private repository on Bob's server. Mallory runs her own server and wants to access objects from Bob's repository. Mallory may try a few tricks that involve asking Alice to clone from her, build on top, and then push the result: 1. Mallory may simply redirect all fetch requests to Bob's server. Git will transparently follow those redirects and fetch Bob's history, which Alice may believe she got from Mallory. The subsequent push seems like it is just feeding Mallory back her own objects, but is actually leaking Bob's objects. There is nothing in git's output to indicate that Bob's repository was involved at all. The downside (for Mallory) of this attack is that Alice will have received Bob's entire repository, and is likely to notice that when building on top of it. 2. If Mallory happens to know the sha1 of some object X in Bob's repository, she can instead build her own history that references that object. She then runs a dumb http server, and Alice's client will fetch each object individually. When it asks for X, Mallory redirects her to Bob's server. The end result is that Alice obtains objects from Bob, but they may be buried deep in history. Alice is less likely to notice. Both of these attacks are fairly hard to pull off. There's a social component in getting Mallory to convince Alice to work with her. Alice may be prompted for credentials in accessing Bob's repository (but not always, if she is using a credential helper that caches). Attack (1) requires a certain amount of obliviousness on Alice's part while making a new commit. Attack (2) requires that Mallory knows a sha1 in Bob's repository, that Bob's server supports dumb http, and that the object in question is loose on Bob's server. But we can probably make things a bit more obvious without any loss of functionality. This patch does two things to that end. First, when we encounter a whole-repo redirect during the initial ref discovery, we now inform the user on stderr, making attack (1) much more obvious. Second, the decision to follow redirects is now configurable. The truly paranoid can set the new http.followRedirects to false to avoid any redirection entirely. But for a more practical default, we will disallow redirects only after the initial ref discovery. This is enough to thwart attacks similar to (2), while still allowing the common use of redirects at the repository level. Since c93c92f30 (http: update base URLs when we see redirects, 2013-09-28) we re-root all further requests from the redirect destination, which should generally mean that no further redirection is necessary. As an escape hatch, in case there really is a server that needs to redirect individual requests, the user can set http.followRedirects to "true" (and this can be done on a per-server basis via http.*.followRedirects config). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 02:24:41 +08:00
http_options.initial_request = 1;
http_options.no_cache = 1;
http_options.keep_error = 1;
http_ret = http_get_strbuf(refs_url.buf, &buffer, &http_options);
switch (http_ret) {
case HTTP_OK:
break;
case HTTP_MISSING_TARGET:
show_http_message(&type, &charset, &buffer);
die("repository '%s' not found", url.buf);
case HTTP_NOAUTH:
show_http_message(&type, &charset, &buffer);
die("Authentication failed for '%s'", url.buf);
default:
show_http_message(&type, &charset, &buffer);
die("unable to access '%s': %s", url.buf, curl_errorstr);
}
http: make redirects more obvious We instruct curl to always follow HTTP redirects. This is convenient, but it creates opportunities for malicious servers to create confusing situations. For instance, imagine Alice is a git user with access to a private repository on Bob's server. Mallory runs her own server and wants to access objects from Bob's repository. Mallory may try a few tricks that involve asking Alice to clone from her, build on top, and then push the result: 1. Mallory may simply redirect all fetch requests to Bob's server. Git will transparently follow those redirects and fetch Bob's history, which Alice may believe she got from Mallory. The subsequent push seems like it is just feeding Mallory back her own objects, but is actually leaking Bob's objects. There is nothing in git's output to indicate that Bob's repository was involved at all. The downside (for Mallory) of this attack is that Alice will have received Bob's entire repository, and is likely to notice that when building on top of it. 2. If Mallory happens to know the sha1 of some object X in Bob's repository, she can instead build her own history that references that object. She then runs a dumb http server, and Alice's client will fetch each object individually. When it asks for X, Mallory redirects her to Bob's server. The end result is that Alice obtains objects from Bob, but they may be buried deep in history. Alice is less likely to notice. Both of these attacks are fairly hard to pull off. There's a social component in getting Mallory to convince Alice to work with her. Alice may be prompted for credentials in accessing Bob's repository (but not always, if she is using a credential helper that caches). Attack (1) requires a certain amount of obliviousness on Alice's part while making a new commit. Attack (2) requires that Mallory knows a sha1 in Bob's repository, that Bob's server supports dumb http, and that the object in question is loose on Bob's server. But we can probably make things a bit more obvious without any loss of functionality. This patch does two things to that end. First, when we encounter a whole-repo redirect during the initial ref discovery, we now inform the user on stderr, making attack (1) much more obvious. Second, the decision to follow redirects is now configurable. The truly paranoid can set the new http.followRedirects to false to avoid any redirection entirely. But for a more practical default, we will disallow redirects only after the initial ref discovery. This is enough to thwart attacks similar to (2), while still allowing the common use of redirects at the repository level. Since c93c92f30 (http: update base URLs when we see redirects, 2013-09-28) we re-root all further requests from the redirect destination, which should generally mean that no further redirection is necessary. As an escape hatch, in case there really is a server that needs to redirect individual requests, the user can set http.followRedirects to "true" (and this can be done on a per-server basis via http.*.followRedirects config). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-07 02:24:41 +08:00
if (options.verbosity && !starts_with(refs_url.buf, url.buf))
warning(_("redirecting to %s"), url.buf);
last= xcalloc(1, sizeof(*last_discovery));
last->service = xstrdup(service);
last->buf_alloc = strbuf_detach(&buffer, &last->len);
last->buf = last->buf_alloc;
strbuf_addf(&exp, "application/x-%s-advertisement", service);
if (maybe_smart &&
(5 <= last->len && last->buf[4] == '#') &&
!strbuf_cmp(&exp, &type)) {
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
char *line;
/*
* smart HTTP response; validate that the service
* pkt-line matches our request.
*/
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
line = packet_read_line_buf(&last->buf, &last->len, NULL);
if (!line)
die("invalid server response; expected service, got flush packet");
strbuf_reset(&exp);
strbuf_addf(&exp, "# service=%s", service);
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
if (strcmp(line, exp.buf))
die("invalid server response; got '%s'", line);
strbuf_release(&exp);
/* The header can include additional metadata lines, up
* until a packet flush marker. Ignore these now, but
* in the future we might start to scan them.
*/
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
while (packet_read_line_buf(&last->buf, &last->len, NULL))
;
last->proto_git = 1;
} else if (maybe_smart &&
last->len > 5 && starts_with(last->buf + 4, "version 2")) {
last->proto_git = 1;
}
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
if (last->proto_git)
last->refs = parse_git_refs(last, for_push);
else
last->refs = parse_info_refs(last);
strbuf_release(&refs_url);
strbuf_release(&exp);
strbuf_release(&type);
strbuf_release(&charset);
remote-curl: rewrite base url from info/refs redirects For efficiency and security reasons, an earlier commit in this series taught http_get_* to re-write the base url based on redirections we saw while making a specific request. This commit wires that option into the info/refs request, meaning that a redirect from http://example.com/foo.git/info/refs to https://example.com/bar.git/info/refs will behave as if "https://example.com/bar.git" had been provided to git in the first place. The tests bear some explanation. We introduce two new hierearchies into the httpd test config: 1. Requests to /smart-redir-limited will work only for the initial info/refs request, but not any subsequent requests. As a result, we can confirm whether the client is re-rooting its requests after the initial contact, since otherwise it will fail (it will ask for "repo.git/git-upload-pack", which is not redirected). 2. Requests to smart-redir-auth will redirect, and require auth after the redirection. Since we are using the redirected base for further requests, we also update the credential struct, in order not to mislead the user (or credential helpers) about which credential is needed. We can therefore check the GIT_ASKPASS prompts to make sure we are prompting for the new location. Because we have neither multiple servers nor https support in our test setup, we can only redirect between paths, meaning we need to turn on credential.useHttpPath to see the difference. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:35:35 +08:00
strbuf_release(&effective_url);
strbuf_release(&buffer);
strbuf_release(&protocol_header);
string_list_clear(&extra_headers, 0);
last_discovery = last;
return last;
}
static struct ref *get_refs(int for_push)
{
struct discovery *heads;
if (for_push)
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
heads = discover_refs("git-receive-pack", for_push);
else
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
heads = discover_refs("git-upload-pack", for_push);
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
return heads->refs;
}
static void output_refs(struct ref *refs)
{
struct ref *posn;
for (posn = refs; posn; posn = posn->next) {
if (posn->symref)
printf("@%s %s\n", posn->symref, posn->name);
else
printf("%s %s\n", oid_to_hex(&posn->old_oid), posn->name);
}
printf("\n");
fflush(stdout);
}
struct rpc_state {
const char *service_name;
const char **argv;
struct strbuf *stdin_preamble;
char *service_url;
char *hdr_content_type;
char *hdr_accept;
char *protocol_header;
char *buf;
size_t alloc;
size_t len;
size_t pos;
int in;
int out;
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-19 04:30:49 +08:00
int any_written;
struct strbuf result;
unsigned gzip_request : 1;
unsigned initial_buffer : 1;
};
static size_t rpc_out(void *ptr, size_t eltsize,
size_t nmemb, void *buffer_)
{
size_t max = eltsize * nmemb;
struct rpc_state *rpc = buffer_;
size_t avail = rpc->len - rpc->pos;
if (!avail) {
rpc->initial_buffer = 0;
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
avail = packet_read(rpc->out, NULL, NULL, rpc->buf, rpc->alloc, 0);
if (!avail)
return 0;
rpc->pos = 0;
rpc->len = avail;
}
if (max < avail)
avail = max;
memcpy(ptr, rpc->buf + rpc->pos, avail);
rpc->pos += avail;
return avail;
}
#ifndef NO_CURL_IOCTL
static curlioerr rpc_ioctl(CURL *handle, int cmd, void *clientp)
{
struct rpc_state *rpc = clientp;
switch (cmd) {
case CURLIOCMD_NOP:
return CURLIOE_OK;
case CURLIOCMD_RESTARTREAD:
if (rpc->initial_buffer) {
rpc->pos = 0;
return CURLIOE_OK;
}
error("unable to rewind rpc post data - try increasing http.postBuffer");
return CURLIOE_FAILRESTART;
default:
return CURLIOE_UNKNOWNCMD;
}
}
#endif
static size_t rpc_in(char *ptr, size_t eltsize,
size_t nmemb, void *buffer_)
{
size_t size = eltsize * nmemb;
struct rpc_state *rpc = buffer_;
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-19 04:30:49 +08:00
if (size)
rpc->any_written = 1;
write_or_die(rpc->in, ptr, size);
return size;
}
static int run_slot(struct active_request_slot *slot,
struct slot_results *results)
{
http: prompt for credentials on failed POST All of the smart-http GET requests go through the http_get_* functions, which will prompt for credentials and retry if we see an HTTP 401. POST requests, however, do not go through any central point. Moreover, it is difficult to retry in the general case; we cannot assume the request body fits in memory or is even seekable, and we don't know how much of it was consumed during the attempt. Most of the time, this is not a big deal; for both fetching and pushing, we make a GET request before doing any POSTs, so typically we figure out the credentials during the first request, then reuse them during the POST. However, some servers may allow a client to get the list of refs from receive-pack without authentication, and then require authentication when the client actually tries to POST the pack. This is not ideal, as the client may do a non-trivial amount of work to generate the pack (e.g., delta-compressing objects). However, for a long time it has been the recommended example configuration in git-http-backend(1) for setting up a repository with anonymous fetch and authenticated push. This setup has always been broken without putting a username into the URL. Prior to commit 986bbc0, it did work with a username in the URL, because git would prompt for credentials before making any requests at all. However, post-986bbc0, it is totally broken. Since it has been advertised in the manpage for some time, we should make sure it works. Unfortunately, it is not as easy as simply calling post_rpc again when it fails, due to the input issue mentioned above. However, we can still make this specific case work by retrying in two specific instances: 1. If the request is large (bigger than LARGE_PACKET_MAX), we will first send a probe request with a single flush packet. Since this request is static, we can freely retry it. 2. If the request is small and we are not using gzip, then we have the whole thing in-core, and we can freely retry. That means we will not retry in some instances, including: 1. If we are using gzip. However, we only do so when calling git-upload-pack, so it does not apply to pushes. 2. If we have a large request, the probe succeeds, but then the real POST wants authentication. This is an extremely unlikely configuration and not worth worrying about. While it might be nice to cover those instances, doing so would be significantly more complex for very little real-world gain. In the long run, we will be much better off when curl learns to internally handle authentication as a callback, and we can cleanly handle all cases that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 21:27:15 +08:00
int err;
struct slot_results results_buf;
if (!results)
results = &results_buf;
http: never use curl_easy_perform We currently don't reuse http connections when fetching via the smart-http protocol. This is bad because the TCP handshake introduces latency, and especially because SSL connection setup may be non-trivial. We can fix it by consistently using curl's "multi" interface. The reason is rather complicated: Our http code has two ways of being used: queuing many "slots" to be fetched in parallel, or fetching a single request in a blocking manner. The parallel code is built on curl's "multi" interface. Most of the single-request code uses http_request, which is built on top of the parallel code (we just feed it one slot, and wait until it finishes). However, one could also accomplish the single-request scheme by avoiding curl's multi interface entirely and just using curl_easy_perform. This is simpler, and is used by post_rpc in the smart-http protocol. It does work to use the same curl handle in both contexts, as long as it is not at the same time. However, internally curl may not share all of the cached resources between both contexts. In particular, a connection formed using the "multi" code will go into a reuse pool connected to the "multi" object. Further requests using the "easy" interface will not be able to reuse that connection. The smart http protocol does ref discovery via http_request, which uses the "multi" interface, and then follows up with the "easy" interface for its rpc calls. As a result, we make two HTTP connections rather than reusing a single one. We could teach the ref discovery to use the "easy" interface. But it is only once we have done this discovery that we know whether the protocol will be smart or dumb. If it is dumb, then our further requests, which want to fetch objects in parallel, will not be able to reuse the same connection. Instead, this patch switches post_rpc to build on the parallel interface, which means that we use it consistently everywhere. It's a little more complicated to use, but since we have the infrastructure already, it doesn't add any code; we can just factor out the relevant bits from http_request. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-18 18:34:20 +08:00
err = run_one_slot(slot, results);
http: prompt for credentials on failed POST All of the smart-http GET requests go through the http_get_* functions, which will prompt for credentials and retry if we see an HTTP 401. POST requests, however, do not go through any central point. Moreover, it is difficult to retry in the general case; we cannot assume the request body fits in memory or is even seekable, and we don't know how much of it was consumed during the attempt. Most of the time, this is not a big deal; for both fetching and pushing, we make a GET request before doing any POSTs, so typically we figure out the credentials during the first request, then reuse them during the POST. However, some servers may allow a client to get the list of refs from receive-pack without authentication, and then require authentication when the client actually tries to POST the pack. This is not ideal, as the client may do a non-trivial amount of work to generate the pack (e.g., delta-compressing objects). However, for a long time it has been the recommended example configuration in git-http-backend(1) for setting up a repository with anonymous fetch and authenticated push. This setup has always been broken without putting a username into the URL. Prior to commit 986bbc0, it did work with a username in the URL, because git would prompt for credentials before making any requests at all. However, post-986bbc0, it is totally broken. Since it has been advertised in the manpage for some time, we should make sure it works. Unfortunately, it is not as easy as simply calling post_rpc again when it fails, due to the input issue mentioned above. However, we can still make this specific case work by retrying in two specific instances: 1. If the request is large (bigger than LARGE_PACKET_MAX), we will first send a probe request with a single flush packet. Since this request is static, we can freely retry it. 2. If the request is small and we are not using gzip, then we have the whole thing in-core, and we can freely retry. That means we will not retry in some instances, including: 1. If we are using gzip. However, we only do so when calling git-upload-pack, so it does not apply to pushes. 2. If we have a large request, the probe succeeds, but then the real POST wants authentication. This is an extremely unlikely configuration and not worth worrying about. While it might be nice to cover those instances, doing so would be significantly more complex for very little real-world gain. In the long run, we will be much better off when curl learns to internally handle authentication as a callback, and we can cleanly handle all cases that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 21:27:15 +08:00
if (err != HTTP_OK && err != HTTP_REAUTH) {
struct strbuf msg = STRBUF_INIT;
if (results->http_code && results->http_code != 200)
strbuf_addf(&msg, "HTTP %ld", results->http_code);
if (results->curl_result != CURLE_OK) {
if (msg.len)
strbuf_addch(&msg, ' ');
strbuf_addf(&msg, "curl %d", results->curl_result);
if (curl_errorstr[0]) {
strbuf_addch(&msg, ' ');
strbuf_addstr(&msg, curl_errorstr);
}
}
error("RPC failed; %s", msg.buf);
strbuf_release(&msg);
}
return err;
}
static int probe_rpc(struct rpc_state *rpc, struct slot_results *results)
{
struct active_request_slot *slot;
struct curl_slist *headers = http_copy_default_headers();
struct strbuf buf = STRBUF_INIT;
int err;
slot = get_active_slot();
headers = curl_slist_append(headers, rpc->hdr_content_type);
headers = curl_slist_append(headers, rpc->hdr_accept);
curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
curl_easy_setopt(slot->curl, CURLOPT_URL, rpc->service_url);
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, NULL);
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, "0000");
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, 4);
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, fwrite_buffer);
curl_easy_setopt(slot->curl, CURLOPT_FILE, &buf);
err = run_slot(slot, results);
curl_slist_free_all(headers);
strbuf_release(&buf);
return err;
}
remote-curl.c: xcurl_off_t is not portable (on 32 bit platfoms) When setting DEVELOPER = 1 DEVOPTS = extra-all "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]" It turns out that the function xcurl_off_t() has 2 flavours: - It gives a warning 32 bit systems, like Linux - It takes the signed ssize_t as a paramter, but the only caller is using a size_t (which is typically unsigned these days) The original motivation of this function is to make sure that sizes > 2GiB are handled correctly. The curl documentation says: "For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit wide signed integral data type" On a 32 bit system "size_t" can be promoted into a 64 bit signed value without loss of data, and therefore we may see the "comparison is always false" warning. On a 64 bit system it may happen, at least in theory, that size_t is > 2^63, and then the promotion from an unsigned "size_t" into a signed "curl_off_t" may be a problem. One solution to suppress a possible compiler warning could be to remove the function xcurl_off_t(). However, to be on the very safe side, we keep it and improve it: - The len parameter is changed from ssize_t to size_t - A temporally variable "size" is used, promoted int uintmax_t and the compared with "maximum_signed_value_of_type(curl_off_t)". Thanks to Junio C Hamano for this hint. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-10 01:41:10 +08:00
static curl_off_t xcurl_off_t(size_t len) {
uintmax_t size = len;
if (size > maximum_signed_value_of_type(curl_off_t))
die("cannot handle pushes this big");
remote-curl.c: xcurl_off_t is not portable (on 32 bit platfoms) When setting DEVELOPER = 1 DEVOPTS = extra-all "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]" It turns out that the function xcurl_off_t() has 2 flavours: - It gives a warning 32 bit systems, like Linux - It takes the signed ssize_t as a paramter, but the only caller is using a size_t (which is typically unsigned these days) The original motivation of this function is to make sure that sizes > 2GiB are handled correctly. The curl documentation says: "For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit wide signed integral data type" On a 32 bit system "size_t" can be promoted into a 64 bit signed value without loss of data, and therefore we may see the "comparison is always false" warning. On a 64 bit system it may happen, at least in theory, that size_t is > 2^63, and then the promotion from an unsigned "size_t" into a signed "curl_off_t" may be a problem. One solution to suppress a possible compiler warning could be to remove the function xcurl_off_t(). However, to be on the very safe side, we keep it and improve it: - The len parameter is changed from ssize_t to size_t - A temporally variable "size" is used, promoted int uintmax_t and the compared with "maximum_signed_value_of_type(curl_off_t)". Thanks to Junio C Hamano for this hint. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-10 01:41:10 +08:00
return (curl_off_t)size;
}
static int post_rpc(struct rpc_state *rpc)
{
struct active_request_slot *slot;
struct curl_slist *headers = http_copy_default_headers();
int use_gzip = rpc->gzip_request;
char *gzip_body = NULL;
size_t gzip_size = 0;
int err, large_request = 0;
int needs_100_continue = 0;
/* Try to load the entire request, if we can fit it into the
* allocated buffer space we can use HTTP/1.0 and avoid the
* chunked encoding mess.
*/
while (1) {
size_t left = rpc->alloc - rpc->len;
char *buf = rpc->buf + rpc->len;
int n;
if (left < LARGE_PACKET_MAX) {
large_request = 1;
use_gzip = 0;
break;
}
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
n = packet_read(rpc->out, NULL, NULL, buf, left, 0);
if (!n)
break;
rpc->len += n;
}
if (large_request) {
struct slot_results results;
http: prompt for credentials on failed POST All of the smart-http GET requests go through the http_get_* functions, which will prompt for credentials and retry if we see an HTTP 401. POST requests, however, do not go through any central point. Moreover, it is difficult to retry in the general case; we cannot assume the request body fits in memory or is even seekable, and we don't know how much of it was consumed during the attempt. Most of the time, this is not a big deal; for both fetching and pushing, we make a GET request before doing any POSTs, so typically we figure out the credentials during the first request, then reuse them during the POST. However, some servers may allow a client to get the list of refs from receive-pack without authentication, and then require authentication when the client actually tries to POST the pack. This is not ideal, as the client may do a non-trivial amount of work to generate the pack (e.g., delta-compressing objects). However, for a long time it has been the recommended example configuration in git-http-backend(1) for setting up a repository with anonymous fetch and authenticated push. This setup has always been broken without putting a username into the URL. Prior to commit 986bbc0, it did work with a username in the URL, because git would prompt for credentials before making any requests at all. However, post-986bbc0, it is totally broken. Since it has been advertised in the manpage for some time, we should make sure it works. Unfortunately, it is not as easy as simply calling post_rpc again when it fails, due to the input issue mentioned above. However, we can still make this specific case work by retrying in two specific instances: 1. If the request is large (bigger than LARGE_PACKET_MAX), we will first send a probe request with a single flush packet. Since this request is static, we can freely retry it. 2. If the request is small and we are not using gzip, then we have the whole thing in-core, and we can freely retry. That means we will not retry in some instances, including: 1. If we are using gzip. However, we only do so when calling git-upload-pack, so it does not apply to pushes. 2. If we have a large request, the probe succeeds, but then the real POST wants authentication. This is an extremely unlikely configuration and not worth worrying about. While it might be nice to cover those instances, doing so would be significantly more complex for very little real-world gain. In the long run, we will be much better off when curl learns to internally handle authentication as a callback, and we can cleanly handle all cases that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 21:27:15 +08:00
do {
err = probe_rpc(rpc, &results);
http: hoist credential request out of handle_curl_result When we are handling a curl response code in http_request or in the remote-curl RPC code, we use the handle_curl_result helper to translate curl's response into an easy-to-use code. When we see an HTTP 401, we do one of two things: 1. If we already had a filled-in credential, we mark it as rejected, and then return HTTP_NOAUTH to indicate to the caller that we failed. 2. If we didn't, then we ask for a new credential and tell the caller HTTP_REAUTH to indicate that they may want to try again. Rejecting in the first case makes sense; it is the natural result of the request we just made. However, prompting for more credentials in the second step does not always make sense. We do not know for sure that the caller is going to make a second request, and nor are we sure that it will be to the same URL. Logically, the prompt belongs not to the request we just finished, but to the request we are (maybe) about to make. In practice, it is very hard to trigger any bad behavior. Currently, if we make a second request, it will always be to the same URL (even in the face of redirects, because curl handles the redirects internally). And we almost always retry on HTTP_REAUTH these days. The one exception is if we are streaming a large RPC request to the server (e.g., a pushed packfile), in which case we cannot restart. It's extremely unlikely to see a 401 response at this stage, though, as we would typically have seen it when we sent a probe request, before streaming the data. This patch drops the automatic prompt out of case 2, and instead requires the caller to do it. This is a few extra lines of code, and the bug it fixes is unlikely to come up in practice. But it is conceptually cleaner, and paves the way for better handling of credentials across redirects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:31:45 +08:00
if (err == HTTP_REAUTH)
credential_fill(&http_auth);
http: prompt for credentials on failed POST All of the smart-http GET requests go through the http_get_* functions, which will prompt for credentials and retry if we see an HTTP 401. POST requests, however, do not go through any central point. Moreover, it is difficult to retry in the general case; we cannot assume the request body fits in memory or is even seekable, and we don't know how much of it was consumed during the attempt. Most of the time, this is not a big deal; for both fetching and pushing, we make a GET request before doing any POSTs, so typically we figure out the credentials during the first request, then reuse them during the POST. However, some servers may allow a client to get the list of refs from receive-pack without authentication, and then require authentication when the client actually tries to POST the pack. This is not ideal, as the client may do a non-trivial amount of work to generate the pack (e.g., delta-compressing objects). However, for a long time it has been the recommended example configuration in git-http-backend(1) for setting up a repository with anonymous fetch and authenticated push. This setup has always been broken without putting a username into the URL. Prior to commit 986bbc0, it did work with a username in the URL, because git would prompt for credentials before making any requests at all. However, post-986bbc0, it is totally broken. Since it has been advertised in the manpage for some time, we should make sure it works. Unfortunately, it is not as easy as simply calling post_rpc again when it fails, due to the input issue mentioned above. However, we can still make this specific case work by retrying in two specific instances: 1. If the request is large (bigger than LARGE_PACKET_MAX), we will first send a probe request with a single flush packet. Since this request is static, we can freely retry it. 2. If the request is small and we are not using gzip, then we have the whole thing in-core, and we can freely retry. That means we will not retry in some instances, including: 1. If we are using gzip. However, we only do so when calling git-upload-pack, so it does not apply to pushes. 2. If we have a large request, the probe succeeds, but then the real POST wants authentication. This is an extremely unlikely configuration and not worth worrying about. While it might be nice to cover those instances, doing so would be significantly more complex for very little real-world gain. In the long run, we will be much better off when curl learns to internally handle authentication as a callback, and we can cleanly handle all cases that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 21:27:15 +08:00
} while (err == HTTP_REAUTH);
if (err != HTTP_OK)
return -1;
if (results.auth_avail & CURLAUTH_GSSNEGOTIATE)
needs_100_continue = 1;
}
headers = curl_slist_append(headers, rpc->hdr_content_type);
headers = curl_slist_append(headers, rpc->hdr_accept);
headers = curl_slist_append(headers, needs_100_continue ?
"Expect: 100-continue" : "Expect:");
/* Add the extra Git-Protocol header */
if (rpc->protocol_header)
headers = curl_slist_append(headers, rpc->protocol_header);
retry:
slot = get_active_slot();
curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
curl_easy_setopt(slot->curl, CURLOPT_URL, rpc->service_url);
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "");
if (large_request) {
/* The request body is large and the size cannot be predicted.
* We must use chunked encoding to send it.
*/
headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
rpc->initial_buffer = 1;
curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);
#ifndef NO_CURL_IOCTL
curl_easy_setopt(slot->curl, CURLOPT_IOCTLFUNCTION, rpc_ioctl);
curl_easy_setopt(slot->curl, CURLOPT_IOCTLDATA, rpc);
#endif
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (chunked)\n", rpc->service_name);
fflush(stderr);
}
} else if (gzip_body) {
/*
* If we are looping to retry authentication, then the previous
* run will have set up the headers and gzip buffer already,
* and we just need to send it.
*/
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, xcurl_off_t(gzip_size));
} else if (use_gzip && 1024 < rpc->len) {
/* The client backend isn't giving us compressed data so
* we can try to deflate it ourselves, this may save on
* the transfer time.
*/
2011-06-11 02:52:15 +08:00
git_zstream stream;
int ret;
git_deflate_init_gzip(&stream, Z_BEST_COMPRESSION);
gzip_size = git_deflate_bound(&stream, rpc->len);
gzip_body = xmalloc(gzip_size);
stream.next_in = (unsigned char *)rpc->buf;
stream.avail_in = rpc->len;
stream.next_out = (unsigned char *)gzip_body;
stream.avail_out = gzip_size;
ret = git_deflate(&stream, Z_FINISH);
if (ret != Z_STREAM_END)
die("cannot deflate request; zlib deflate error %d", ret);
ret = git_deflate_end_gently(&stream);
if (ret != Z_OK)
die("cannot deflate request; zlib end error %d", ret);
gzip_size = stream.total_out;
headers = curl_slist_append(headers, "Content-Encoding: gzip");
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, gzip_body);
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, xcurl_off_t(gzip_size));
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (gzip %lu to %lu bytes)\n",
rpc->service_name,
(unsigned long)rpc->len, (unsigned long)gzip_size);
fflush(stderr);
}
} else {
/* We know the complete request size in advance, use the
* more normal Content-Length approach.
*/
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, rpc->buf);
curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE_LARGE, xcurl_off_t(rpc->len));
if (options.verbosity > 1) {
fprintf(stderr, "POST %s (%lu bytes)\n",
rpc->service_name, (unsigned long)rpc->len);
fflush(stderr);
}
}
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, rpc_in);
curl_easy_setopt(slot->curl, CURLOPT_FILE, rpc);
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-19 04:30:49 +08:00
rpc->any_written = 0;
err = run_slot(slot, NULL);
http: hoist credential request out of handle_curl_result When we are handling a curl response code in http_request or in the remote-curl RPC code, we use the handle_curl_result helper to translate curl's response into an easy-to-use code. When we see an HTTP 401, we do one of two things: 1. If we already had a filled-in credential, we mark it as rejected, and then return HTTP_NOAUTH to indicate to the caller that we failed. 2. If we didn't, then we ask for a new credential and tell the caller HTTP_REAUTH to indicate that they may want to try again. Rejecting in the first case makes sense; it is the natural result of the request we just made. However, prompting for more credentials in the second step does not always make sense. We do not know for sure that the caller is going to make a second request, and nor are we sure that it will be to the same URL. Logically, the prompt belongs not to the request we just finished, but to the request we are (maybe) about to make. In practice, it is very hard to trigger any bad behavior. Currently, if we make a second request, it will always be to the same URL (even in the face of redirects, because curl handles the redirects internally). And we almost always retry on HTTP_REAUTH these days. The one exception is if we are streaming a large RPC request to the server (e.g., a pushed packfile), in which case we cannot restart. It's extremely unlikely to see a 401 response at this stage, though, as we would typically have seen it when we sent a probe request, before streaming the data. This patch drops the automatic prompt out of case 2, and instead requires the caller to do it. This is a few extra lines of code, and the bug it fixes is unlikely to come up in practice. But it is conceptually cleaner, and paves the way for better handling of credentials across redirects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:31:45 +08:00
if (err == HTTP_REAUTH && !large_request) {
credential_fill(&http_auth);
goto retry;
http: hoist credential request out of handle_curl_result When we are handling a curl response code in http_request or in the remote-curl RPC code, we use the handle_curl_result helper to translate curl's response into an easy-to-use code. When we see an HTTP 401, we do one of two things: 1. If we already had a filled-in credential, we mark it as rejected, and then return HTTP_NOAUTH to indicate to the caller that we failed. 2. If we didn't, then we ask for a new credential and tell the caller HTTP_REAUTH to indicate that they may want to try again. Rejecting in the first case makes sense; it is the natural result of the request we just made. However, prompting for more credentials in the second step does not always make sense. We do not know for sure that the caller is going to make a second request, and nor are we sure that it will be to the same URL. Logically, the prompt belongs not to the request we just finished, but to the request we are (maybe) about to make. In practice, it is very hard to trigger any bad behavior. Currently, if we make a second request, it will always be to the same URL (even in the face of redirects, because curl handles the redirects internally). And we almost always retry on HTTP_REAUTH these days. The one exception is if we are streaming a large RPC request to the server (e.g., a pushed packfile), in which case we cannot restart. It's extremely unlikely to see a 401 response at this stage, though, as we would typically have seen it when we sent a probe request, before streaming the data. This patch drops the automatic prompt out of case 2, and instead requires the caller to do it. This is a few extra lines of code, and the bug it fixes is unlikely to come up in practice. But it is conceptually cleaner, and paves the way for better handling of credentials across redirects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2013-09-28 16:31:45 +08:00
}
http: prompt for credentials on failed POST All of the smart-http GET requests go through the http_get_* functions, which will prompt for credentials and retry if we see an HTTP 401. POST requests, however, do not go through any central point. Moreover, it is difficult to retry in the general case; we cannot assume the request body fits in memory or is even seekable, and we don't know how much of it was consumed during the attempt. Most of the time, this is not a big deal; for both fetching and pushing, we make a GET request before doing any POSTs, so typically we figure out the credentials during the first request, then reuse them during the POST. However, some servers may allow a client to get the list of refs from receive-pack without authentication, and then require authentication when the client actually tries to POST the pack. This is not ideal, as the client may do a non-trivial amount of work to generate the pack (e.g., delta-compressing objects). However, for a long time it has been the recommended example configuration in git-http-backend(1) for setting up a repository with anonymous fetch and authenticated push. This setup has always been broken without putting a username into the URL. Prior to commit 986bbc0, it did work with a username in the URL, because git would prompt for credentials before making any requests at all. However, post-986bbc0, it is totally broken. Since it has been advertised in the manpage for some time, we should make sure it works. Unfortunately, it is not as easy as simply calling post_rpc again when it fails, due to the input issue mentioned above. However, we can still make this specific case work by retrying in two specific instances: 1. If the request is large (bigger than LARGE_PACKET_MAX), we will first send a probe request with a single flush packet. Since this request is static, we can freely retry it. 2. If the request is small and we are not using gzip, then we have the whole thing in-core, and we can freely retry. That means we will not retry in some instances, including: 1. If we are using gzip. However, we only do so when calling git-upload-pack, so it does not apply to pushes. 2. If we have a large request, the probe succeeds, but then the real POST wants authentication. This is an extremely unlikely configuration and not worth worrying about. While it might be nice to cover those instances, doing so would be significantly more complex for very little real-world gain. In the long run, we will be much better off when curl learns to internally handle authentication as a callback, and we can cleanly handle all cases that way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-27 21:27:15 +08:00
if (err != HTTP_OK)
err = -1;
remote-curl: don't hang when a server dies before any output In the event that a HTTP server closes the connection after giving a 200 but before giving any packets, we don't want to hang forever waiting for a response that will never come. Instead, we should die immediately. One case where this happens is when attempting to fetch a dangling object by its object name. In this case, the server dies before sending any data. Prior to this patch, fetch-pack would wait for data from the server, and remote-curl would wait for fetch-pack, causing a deadlock. Despite this patch, there is other possible malformed input that could cause the same deadlock (e.g. a half-finished pktline, or a pktline but no trailing flush). There are a few possible solutions to this: 1. Allowing remote-curl to tell fetch-pack about the EOF (so that fetch-pack could know that no more data is coming until it says something else). This is tricky because an out-of-band signal would be required, or the http response would have to be re-framed inside another layer of pkt-line or something. 2. Make remote-curl understand some of the protocol. It turns out that in addition to understanding pkt-line, it would need to watch for ack/nak. This is somewhat fragile, as information about the protocol would end up in two places. Also, pkt-lines which are already at the length limit would need special handling. Both of these solutions would require a fair amount of work, whereas this hack is easy and solves at least some of the problem. Still to do: it would be good to give a better error message than "fatal: The remote end hung up unexpectedly". Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-19 04:30:49 +08:00
if (!rpc->any_written)
err = -1;
curl_slist_free_all(headers);
free(gzip_body);
return err;
}
static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
{
const char *svc = rpc->service_name;
struct strbuf buf = STRBUF_INIT;
struct strbuf *preamble = rpc->stdin_preamble;
struct child_process client = CHILD_PROCESS_INIT;
int err = 0;
client.in = -1;
client.out = -1;
client.git_cmd = 1;
client.argv = rpc->argv;
if (start_command(&client))
exit(1);
if (preamble)
write_or_die(client.in, preamble->buf, preamble->len);
if (heads)
write_or_die(client.in, heads->buf, heads->len);
rpc->alloc = http_post_buffer;
rpc->buf = xmalloc(rpc->alloc);
rpc->in = client.in;
rpc->out = client.out;
strbuf_init(&rpc->result, 0);
strbuf_addf(&buf, "%s%s", url.buf, svc);
rpc->service_url = strbuf_detach(&buf, NULL);
strbuf_addf(&buf, "Content-Type: application/x-%s-request", svc);
rpc->hdr_content_type = strbuf_detach(&buf, NULL);
strbuf_addf(&buf, "Accept: application/x-%s-result", svc);
rpc->hdr_accept = strbuf_detach(&buf, NULL);
if (get_protocol_http_header(heads->version, &buf))
rpc->protocol_header = strbuf_detach(&buf, NULL);
else
rpc->protocol_header = NULL;
while (!err) {
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
int n = packet_read(rpc->out, NULL, NULL, rpc->buf, rpc->alloc, 0);
if (!n)
break;
rpc->pos = 0;
rpc->len = n;
err |= post_rpc(rpc);
}
close(client.in);
client.in = -1;
if (!err) {
strbuf_read(&rpc->result, client.out, 0);
} else {
char buf[4096];
for (;;)
if (xread(client.out, buf, sizeof(buf)) <= 0)
break;
}
close(client.out);
client.out = -1;
err |= finish_command(&client);
free(rpc->service_url);
free(rpc->hdr_content_type);
free(rpc->hdr_accept);
free(rpc->protocol_header);
free(rpc->buf);
strbuf_release(&buf);
return err;
}
static int fetch_dumb(int nr_heads, struct ref **to_fetch)
{
struct walker *walker;
char **targets;
int ret, i;
ALLOC_ARRAY(targets, nr_heads);
if (options.depth || options.deepen_since)
die("dumb http transport does not support shallow capabilities");
for (i = 0; i < nr_heads; i++)
targets[i] = xstrdup(oid_to_hex(&to_fetch[i]->old_oid));
walker = get_http_walker(url.buf);
walker->get_verbosely = options.verbosity >= 3;
walker->get_recover = 0;
ret = walker_fetch(walker, nr_heads, targets, NULL, NULL);
walker_free(walker);
for (i = 0; i < nr_heads; i++)
free(targets[i]);
free(targets);
return ret ? error("fetch failed.") : 0;
}
static int fetch_git(struct discovery *heads,
int nr_heads, struct ref **to_fetch)
{
struct rpc_state rpc;
struct strbuf preamble = STRBUF_INIT;
int i, err;
struct argv_array args = ARGV_ARRAY_INIT;
argv_array_pushl(&args, "fetch-pack", "--stateless-rpc",
"--stdin", "--lock-pack", NULL);
if (options.followtags)
argv_array_push(&args, "--include-tag");
if (options.thin)
argv_array_push(&args, "--thin");
if (options.verbosity >= 3)
argv_array_pushl(&args, "-v", "-v", NULL);
if (options.check_self_contained_and_connected)
argv_array_push(&args, "--check-self-contained-and-connected");
if (options.cloning)
argv_array_push(&args, "--cloning");
if (options.update_shallow)
argv_array_push(&args, "--update-shallow");
if (!options.progress)
argv_array_push(&args, "--no-progress");
if (options.depth)
argv_array_pushf(&args, "--depth=%lu", options.depth);
if (options.deepen_since)
argv_array_pushf(&args, "--shallow-since=%s", options.deepen_since);
for (i = 0; i < options.deepen_not.nr; i++)
argv_array_pushf(&args, "--shallow-exclude=%s",
options.deepen_not.items[i].string);
fetch, upload-pack: --deepen=N extends shallow boundary by N commits In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 18:54:09 +08:00
if (options.deepen_relative && options.depth)
argv_array_push(&args, "--deepen-relative");
if (options.from_promisor)
argv_array_push(&args, "--from-promisor");
if (options.no_dependents)
argv_array_push(&args, "--no-dependents");
if (options.filter)
argv_array_pushf(&args, "--filter=%s", options.filter);
argv_array_push(&args, url.buf);
for (i = 0; i < nr_heads; i++) {
struct ref *ref = to_fetch[i];
if (!*ref->name)
die("cannot fetch by sha1 over smart http");
packet_buf_write(&preamble, "%s %s\n",
oid_to_hex(&ref->old_oid), ref->name);
}
packet_buf_flush(&preamble);
memset(&rpc, 0, sizeof(rpc));
rpc.service_name = "git-upload-pack",
rpc.argv = args.argv;
rpc.stdin_preamble = &preamble;
rpc.gzip_request = 1;
err = rpc_service(&rpc, heads);
if (rpc.result.len)
write_or_die(1, rpc.result.buf, rpc.result.len);
strbuf_release(&rpc.result);
strbuf_release(&preamble);
argv_array_clear(&args);
return err;
}
static int fetch(int nr_heads, struct ref **to_fetch)
{
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
struct discovery *d = discover_refs("git-upload-pack", 0);
if (d->proto_git)
return fetch_git(d, nr_heads, to_fetch);
else
return fetch_dumb(nr_heads, to_fetch);
}
static void parse_fetch(struct strbuf *buf)
{
struct ref **to_fetch = NULL;
struct ref *list_head = NULL;
struct ref **list = &list_head;
int alloc_heads = 0, nr_heads = 0;
do {
const char *p;
if (skip_prefix(buf->buf, "fetch ", &p)) {
const char *name;
struct ref *ref;
struct object_id old_oid;
if (get_oid_hex(p, &old_oid))
die("protocol error: expected sha/ref, got %s'", p);
if (p[GIT_SHA1_HEXSZ] == ' ')
name = p + GIT_SHA1_HEXSZ + 1;
else if (!p[GIT_SHA1_HEXSZ])
name = "";
else
die("protocol error: expected sha/ref, got %s'", p);
ref = alloc_ref(name);
oidcpy(&ref->old_oid, &old_oid);
*list = ref;
list = &ref->next;
ALLOC_GROW(to_fetch, nr_heads + 1, alloc_heads);
to_fetch[nr_heads++] = ref;
}
else
die("http transport does not support %s", buf->buf);
strbuf_reset(buf);
if (strbuf_getline_lf(buf, stdin) == EOF)
return;
if (!*buf->buf)
break;
} while (1);
if (fetch(nr_heads, to_fetch))
exit(128); /* error already reported */
free_refs(list_head);
free(to_fetch);
printf("\n");
fflush(stdout);
strbuf_reset(buf);
}
static int push_dav(int nr_spec, char **specs)
{
struct child_process child = CHILD_PROCESS_INIT;
size_t i;
child.git_cmd = 1;
argv_array_push(&child.args, "http-push");
argv_array_push(&child.args, "--helper-status");
if (options.dry_run)
argv_array_push(&child.args, "--dry-run");
if (options.verbosity > 1)
argv_array_push(&child.args, "--verbose");
argv_array_push(&child.args, url.buf);
for (i = 0; i < nr_spec; i++)
argv_array_push(&child.args, specs[i]);
if (run_command(&child))
die("git-http-push failed");
return 0;
}
static int push_git(struct discovery *heads, int nr_spec, char **specs)
{
struct rpc_state rpc;
int i, err;
struct argv_array args;
struct string_list_item *cas_option;
struct strbuf preamble = STRBUF_INIT;
argv_array_init(&args);
argv_array_pushl(&args, "send-pack", "--stateless-rpc", "--helper-status",
NULL);
if (options.thin)
argv_array_push(&args, "--thin");
if (options.dry_run)
argv_array_push(&args, "--dry-run");
if (options.push_cert == SEND_PACK_PUSH_CERT_ALWAYS)
argv_array_push(&args, "--signed=yes");
else if (options.push_cert == SEND_PACK_PUSH_CERT_IF_ASKED)
argv_array_push(&args, "--signed=if-asked");
if (options.verbosity == 0)
argv_array_push(&args, "--quiet");
else if (options.verbosity > 1)
argv_array_push(&args, "--verbose");
for (i = 0; i < options.push_options.nr; i++)
argv_array_pushf(&args, "--push-option=%s",
options.push_options.items[i].string);
argv_array_push(&args, options.progress ? "--progress" : "--no-progress");
for_each_string_list_item(cas_option, &cas_options)
argv_array_push(&args, cas_option->string);
argv_array_push(&args, url.buf);
argv_array_push(&args, "--stdin");
for (i = 0; i < nr_spec; i++)
packet_buf_write(&preamble, "%s\n", specs[i]);
packet_buf_flush(&preamble);
memset(&rpc, 0, sizeof(rpc));
rpc.service_name = "git-receive-pack",
rpc.argv = args.argv;
rpc.stdin_preamble = &preamble;
err = rpc_service(&rpc, heads);
if (rpc.result.len)
write_or_die(1, rpc.result.buf, rpc.result.len);
strbuf_release(&rpc.result);
strbuf_release(&preamble);
argv_array_clear(&args);
return err;
}
static int push(int nr_spec, char **specs)
{
remote-curl: always parse incoming refs When remote-curl receives a list of refs from a server, it keeps the whole buffer intact. When we get a "list" command, we feed the result to get_remote_heads, and when we get a "fetch" or "push" command, we feed it to fetch-pack or send-pack, respectively. If the HTTP response from the server is truncated for any reason, we will get an incomplete ref advertisement. If we then feed this incomplete list to fetch-pack, one of a few things may happen: 1. If the truncation is in a packet header, fetch-pack will notice the bogus line and complain. 2. If the truncation is inside a packet, fetch-pack will keep waiting for us to send the rest of the packet, which we never will. 3. If the truncation is at a packet boundary, fetch-pack will keep waiting for us to send the next packet, which we never will. As a result, fetch-pack hangs, waiting for input. However, remote-curl believes it has sent all of the advertisement, and therefore waits for fetch-pack to speak. The two processes end up in a deadlock. We do notice the broken ref list if we feed it to get_remote_heads. So if git asks the helper to do a "list" followed by a "fetch", we are safe; we'll abort during the list operation, which parses the refs. This patch teaches remote-curl to always parse and save the incoming ref list when we read the ref advertisement from a server. That means that we will always verify and abort before even running fetch-pack (or send-pack) when reading a corrupted list, even if we do not run the "list" command explicitly. Since we save the result, in the common case of running "list" then "fetch", we do not do any extra parsing at all. In the case of just a "fetch", we do an extra round of parsing, but only once. Note also that the "fetch" case will now also initialize server_capabilities from the remote (in remote-curl; we already would do so inside fetch-pack). Doing "list+fetch" already does this. It doesn't actually matter now, but the new behavior is arguably more correct, should remote-curl ever start caring about the server's capability list. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:07:19 +08:00
struct discovery *heads = discover_refs("git-receive-pack", 1);
int ret;
if (heads->proto_git)
ret = push_git(heads, nr_spec, specs);
else
ret = push_dav(nr_spec, specs);
free_discovery(heads);
return ret;
}
static void parse_push(struct strbuf *buf)
{
char **specs = NULL;
int alloc_spec = 0, nr_spec = 0, i, ret;
do {
if (starts_with(buf->buf, "push ")) {
ALLOC_GROW(specs, nr_spec + 1, alloc_spec);
specs[nr_spec++] = xstrdup(buf->buf + 5);
}
else
die("http transport does not support %s", buf->buf);
strbuf_reset(buf);
if (strbuf_getline_lf(buf, stdin) == EOF)
goto free_specs;
if (!*buf->buf)
break;
} while (1);
ret = push(nr_spec, specs);
printf("\n");
fflush(stdout);
if (ret)
exit(128); /* error already reported */
free_specs:
for (i = 0; i < nr_spec; i++)
free(specs[i]);
free(specs);
}
/*
* Used to represent the state of a connection to an HTTP server when
* communicating using git's wire-protocol version 2.
*/
struct proxy_state {
char *service_name;
char *service_url;
struct curl_slist *headers;
struct strbuf request_buffer;
int in;
int out;
struct packet_reader reader;
size_t pos;
int seen_flush;
};
static void proxy_state_init(struct proxy_state *p, const char *service_name,
enum protocol_version version)
{
struct strbuf buf = STRBUF_INIT;
memset(p, 0, sizeof(*p));
p->service_name = xstrdup(service_name);
p->in = 0;
p->out = 1;
strbuf_init(&p->request_buffer, 0);
strbuf_addf(&buf, "%s%s", url.buf, p->service_name);
p->service_url = strbuf_detach(&buf, NULL);
p->headers = http_copy_default_headers();
strbuf_addf(&buf, "Content-Type: application/x-%s-request", p->service_name);
p->headers = curl_slist_append(p->headers, buf.buf);
strbuf_reset(&buf);
strbuf_addf(&buf, "Accept: application/x-%s-result", p->service_name);
p->headers = curl_slist_append(p->headers, buf.buf);
strbuf_reset(&buf);
p->headers = curl_slist_append(p->headers, "Transfer-Encoding: chunked");
/* Add the Git-Protocol header */
if (get_protocol_http_header(version, &buf))
p->headers = curl_slist_append(p->headers, buf.buf);
packet_reader_init(&p->reader, p->in, NULL, 0,
PACKET_READ_GENTLE_ON_EOF);
strbuf_release(&buf);
}
static void proxy_state_clear(struct proxy_state *p)
{
free(p->service_name);
free(p->service_url);
curl_slist_free_all(p->headers);
strbuf_release(&p->request_buffer);
}
/*
* CURLOPT_READFUNCTION callback function.
* Attempts to copy over a single packet-line at a time into the
* curl provided buffer.
*/
static size_t proxy_in(char *buffer, size_t eltsize,
size_t nmemb, void *userdata)
{
size_t max;
struct proxy_state *p = userdata;
size_t avail = p->request_buffer.len - p->pos;
if (eltsize != 1)
BUG("curl read callback called with size = %"PRIuMAX" != 1",
(uintmax_t)eltsize);
max = nmemb;
if (!avail) {
if (p->seen_flush) {
p->seen_flush = 0;
return 0;
}
strbuf_reset(&p->request_buffer);
switch (packet_reader_read(&p->reader)) {
case PACKET_READ_EOF:
die("unexpected EOF when reading from parent process");
case PACKET_READ_NORMAL:
packet_buf_write_len(&p->request_buffer, p->reader.line,
p->reader.pktlen);
break;
case PACKET_READ_DELIM:
packet_buf_delim(&p->request_buffer);
break;
case PACKET_READ_FLUSH:
packet_buf_flush(&p->request_buffer);
p->seen_flush = 1;
break;
}
p->pos = 0;
avail = p->request_buffer.len;
}
if (max < avail)
avail = max;
memcpy(buffer, p->request_buffer.buf + p->pos, avail);
p->pos += avail;
return avail;
}
static size_t proxy_out(char *buffer, size_t eltsize,
size_t nmemb, void *userdata)
{
size_t size;
struct proxy_state *p = userdata;
if (eltsize != 1)
BUG("curl read callback called with size = %"PRIuMAX" != 1",
(uintmax_t)eltsize);
size = nmemb;
write_or_die(p->out, buffer, size);
return size;
}
/* Issues a request to the HTTP server configured in `p` */
static int proxy_request(struct proxy_state *p)
{
struct active_request_slot *slot;
slot = get_active_slot();
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "");
curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
/* Setup function to read request from client */
curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
/* Setup function to write server response to client */
curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
if (run_slot(slot, NULL) != HTTP_OK)
return -1;
return 0;
}
static int stateless_connect(const char *service_name)
{
struct discovery *discover;
struct proxy_state p;
/*
* Run the info/refs request and see if the server supports protocol
* v2. If and only if the server supports v2 can we successfully
* establish a stateless connection, otherwise we need to tell the
* client to fallback to using other transport helper functions to
* complete their request.
*/
discover = discover_refs(service_name, 0);
if (discover->version != protocol_v2) {
printf("fallback\n");
fflush(stdout);
return -1;
} else {
/* Stateless Connection established */
printf("\n");
fflush(stdout);
}
proxy_state_init(&p, service_name, discover->version);
/*
* Dump the capability listing that we got from the server earlier
* during the info/refs request.
*/
write_or_die(p.out, discover->buf, discover->len);
/* Peek the next packet line. Until we see EOF keep sending POSTs */
while (packet_reader_peek(&p.reader) != PACKET_READ_EOF) {
if (proxy_request(&p)) {
/* We would have an err here */
break;
}
}
proxy_state_clear(&p);
return 0;
}
add an extra level of indirection to main() There are certain startup tasks that we expect every git process to do. In some cases this is just to improve the quality of the program (e.g., setting up gettext()). In others it is a requirement for using certain functions in libgit.a (e.g., system_path() expects that you have called git_extract_argv0_path()). Most commands are builtins and are covered by the git.c version of main(). However, there are still a few external commands that use their own main(). Each of these has to remember to include the correct startup sequence, and we are not always consistent. Rather than just fix the inconsistencies, let's make this harder to get wrong by providing a common main() that can run this standard startup. We basically have two options to do this: - the compat/mingw.h file already does something like this by adding a #define that replaces the definition of main with a wrapper that calls mingw_startup(). The upside is that the code in each program doesn't need to be changed at all; it's rewritten on the fly by the preprocessor. The downside is that it may make debugging of the startup sequence a bit more confusing, as the preprocessor is quietly inserting new code. - the builtin functions are all of the form cmd_foo(), and git.c's main() calls them. This is much more explicit, which may make things more obvious to somebody reading the code. It's also more flexible (because of course we have to figure out _which_ cmd_foo() to call). The downside is that each of the builtins must define cmd_foo(), instead of just main(). This patch chooses the latter option, preferring the more explicit approach, even though it is more invasive. We introduce a new file common-main.c, with the "real" main. It expects to call cmd_main() from whatever other objects it is linked against. We link common-main.o against anything that links against libgit.a, since we know that such programs will need to do this setup. Note that common-main.o can't actually go inside libgit.a, as the linker would not pick up its main() function automatically (it has no callers). The rest of the patch is just adjusting all of the various external programs (mostly in t/helper) to use cmd_main(). I've provided a global declaration for cmd_main(), which means that all of the programs also need to match its signature. In particular, many functions need to switch to "const char **" instead of "char **" for argv. This effect ripples out to a few other variables and functions, as well. This makes the patch even more invasive, but the end result is much better. We should be treating argv strings as const anyway, and now all programs conform to the same signature (which also matches the way builtins are defined). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-01 13:58:58 +08:00
int cmd_main(int argc, const char **argv)
{
struct strbuf buf = STRBUF_INIT;
int nongit;
setup_git_directory_gently(&nongit);
if (argc < 2) {
error("remote-curl: usage: git remote-curl <remote> [<url>]");
return 1;
}
options.verbosity = 1;
options.progress = !!isatty(2);
options.thin = 1;
string_list_init(&options.deepen_not, 1);
string_list_init(&options.push_options, 1);
remote = remote_get(argv[1]);
if (argc > 2) {
end_url_with_slash(&url, argv[2]);
} else {
end_url_with_slash(&url, remote->url[0]);
}
http_init(remote, url.buf, 0);
do {
const char *arg;
if (strbuf_getline_lf(&buf, stdin) == EOF) {
if (ferror(stdin))
error("remote-curl: error reading command stream from git");
return 1;
}
if (buf.len == 0)
break;
if (starts_with(buf.buf, "fetch ")) {
if (nongit)
die("remote-curl: fetch attempted without a local repo");
parse_fetch(&buf);
} else if (!strcmp(buf.buf, "list") || starts_with(buf.buf, "list ")) {
int for_push = !!strstr(buf.buf + 4, "for-push");
output_refs(get_refs(for_push));
} else if (starts_with(buf.buf, "push ")) {
parse_push(&buf);
} else if (skip_prefix(buf.buf, "option ", &arg)) {
char *value = strchr(arg, ' ');
int result;
if (value)
*value++ = '\0';
else
value = "true";
result = set_option(arg, value);
if (!result)
printf("ok\n");
else if (result < 0)
printf("error invalid value\n");
else
printf("unsupported\n");
fflush(stdout);
} else if (!strcmp(buf.buf, "capabilities")) {
printf("stateless-connect\n");
printf("fetch\n");
printf("option\n");
printf("push\n");
printf("check-connectivity\n");
printf("\n");
fflush(stdout);
} else if (skip_prefix(buf.buf, "stateless-connect ", &arg)) {
if (!stateless_connect(arg))
break;
} else {
error("remote-curl: unknown command '%s' from git", buf.buf);
return 1;
}
strbuf_reset(&buf);
} while (1);
http_cleanup();
return 0;
}