git/pkt-line.c

675 lines
16 KiB
C
Raw Normal View History

#include "cache.h"
#include "pkt-line.h"
pkt-line: show packets in async processes as "sideband" If you run "GIT_TRACE_PACKET=1 git push", you may get confusing output like (line prefixes omitted for clarity): packet: push< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: push< 0000 Why do we see the data twice, once apparently wrapped inside another pkt-line, and once unwrapped? Why do we get two flush packets? The answer is that we start an async process to demux the sideband data. The first entry comes from the sideband process reading the data, and the second from push itself. Likewise, the first flush is inside the demuxed packet, and the second is an actual sideband flush. We can make this a bit more clear by marking the sideband demuxer explicitly as "sideband" rather than "push". The most elegant way to do this would be to simply call packet_trace_identity() inside the sideband demuxer. But we can't do that reliably, because it relies on a global variable, which might be shared if pthreads are in use. What we really need is thread-local storage for packet_trace_identity. But the async code does not provide an interface for that, and it would be messy to add it here (we'd have to care about pthreads, initializing our pthread_key_t ahead of time, etc). So instead, let us just assume that any async process is handling sideband data. That's always true now, and is likely to remain so in the future. The output looks like: packet: sideband< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: sideband< 0000 Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-02 04:24:13 +08:00
#include "run-command.h"
pkt-line: provide a LARGE_PACKET_MAX static buffer Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:02:57 +08:00
char packet_buffer[LARGE_PACKET_MAX];
static const char *packet_trace_prefix = "git";
static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
static struct trace_key trace_pack = TRACE_KEY_INIT(PACKFILE);
void packet_trace_identity(const char *prog)
{
packet_trace_prefix = xstrdup(prog);
}
pkt-line: show packets in async processes as "sideband" If you run "GIT_TRACE_PACKET=1 git push", you may get confusing output like (line prefixes omitted for clarity): packet: push< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: push< 0000 Why do we see the data twice, once apparently wrapped inside another pkt-line, and once unwrapped? Why do we get two flush packets? The answer is that we start an async process to demux the sideband data. The first entry comes from the sideband process reading the data, and the second from push itself. Likewise, the first flush is inside the demuxed packet, and the second is an actual sideband flush. We can make this a bit more clear by marking the sideband demuxer explicitly as "sideband" rather than "push". The most elegant way to do this would be to simply call packet_trace_identity() inside the sideband demuxer. But we can't do that reliably, because it relies on a global variable, which might be shared if pthreads are in use. What we really need is thread-local storage for packet_trace_identity. But the async code does not provide an interface for that, and it would be messy to add it here (we'd have to care about pthreads, initializing our pthread_key_t ahead of time, etc). So instead, let us just assume that any async process is handling sideband data. That's always true now, and is likely to remain so in the future. The output looks like: packet: sideband< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: sideband< 0000 Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-02 04:24:13 +08:00
static const char *get_trace_prefix(void)
{
return in_async() ? "sideband" : packet_trace_prefix;
}
static int packet_trace_pack(const char *buf, unsigned int len, int sideband)
{
if (!sideband) {
trace_verbatim(&trace_pack, buf, len);
return 1;
} else if (len && *buf == '\1') {
trace_verbatim(&trace_pack, buf + 1, len - 1);
return 1;
} else {
/* it's another non-pack sideband */
return 0;
}
}
static void packet_trace(const char *buf, unsigned int len, int write)
{
int i;
struct strbuf out;
static int in_pack, sideband;
if (!trace_want(&trace_packet) && !trace_want(&trace_pack))
return;
if (in_pack) {
if (packet_trace_pack(buf, len, sideband))
return;
} else if (starts_with(buf, "PACK") || starts_with(buf, "\1PACK")) {
in_pack = 1;
sideband = *buf == '\1';
packet_trace_pack(buf, len, sideband);
/*
* Make a note in the human-readable trace that the pack data
* started.
*/
buf = "PACK ...";
len = strlen(buf);
}
if (!trace_want(&trace_packet))
return;
/* +32 is just a guess for header + quoting */
strbuf_init(&out, len+32);
strbuf_addf(&out, "packet: %12s%c ",
pkt-line: show packets in async processes as "sideband" If you run "GIT_TRACE_PACKET=1 git push", you may get confusing output like (line prefixes omitted for clarity): packet: push< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: push< 0000 Why do we see the data twice, once apparently wrapped inside another pkt-line, and once unwrapped? Why do we get two flush packets? The answer is that we start an async process to demux the sideband data. The first entry comes from the sideband process reading the data, and the second from push itself. Likewise, the first flush is inside the demuxed packet, and the second is an actual sideband flush. We can make this a bit more clear by marking the sideband demuxer explicitly as "sideband" rather than "push". The most elegant way to do this would be to simply call packet_trace_identity() inside the sideband demuxer. But we can't do that reliably, because it relies on a global variable, which might be shared if pthreads are in use. What we really need is thread-local storage for packet_trace_identity. But the async code does not provide an interface for that, and it would be messy to add it here (we'd have to care about pthreads, initializing our pthread_key_t ahead of time, etc). So instead, let us just assume that any async process is handling sideband data. That's always true now, and is likely to remain so in the future. The output looks like: packet: sideband< \1000eunpack ok0019ok refs/heads/master0000 packet: push< unpack ok packet: push< ok refs/heads/master packet: push< 0000 packet: sideband< 0000 Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-02 04:24:13 +08:00
get_trace_prefix(), write ? '>' : '<');
/* XXX we should really handle printable utf8 */
for (i = 0; i < len; i++) {
/* suppress newlines */
if (buf[i] == '\n')
continue;
if (buf[i] >= 0x20 && buf[i] <= 0x7e)
strbuf_addch(&out, buf[i]);
else
strbuf_addf(&out, "\\%o", buf[i]);
}
strbuf_addch(&out, '\n');
trace_strbuf(&trace_packet, &out);
strbuf_release(&out);
}
/*
* If we buffered things up above (we don't, but we should),
* we'd flush it here
*/
void packet_flush(int fd)
{
packet_trace("0000", 4, 1);
if (write_in_full(fd, "0000", 4) < 0)
die_errno(_("unable to write flush packet"));
}
void packet_delim(int fd)
{
packet_trace("0001", 4, 1);
if (write_in_full(fd, "0001", 4) < 0)
die_errno(_("unable to write delim packet"));
}
void packet_response_end(int fd)
{
packet_trace("0002", 4, 1);
if (write_in_full(fd, "0002", 4) < 0)
die_errno(_("unable to write response end packet"));
}
int packet_flush_gently(int fd)
{
packet_trace("0000", 4, 1);
if (write_in_full(fd, "0000", 4) < 0)
return error(_("flush packet write failed"));
return 0;
}
void packet_buf_flush(struct strbuf *buf)
{
packet_trace("0000", 4, 1);
strbuf_add(buf, "0000", 4);
}
void packet_buf_delim(struct strbuf *buf)
{
packet_trace("0001", 4, 1);
strbuf_add(buf, "0001", 4);
}
void set_packet_header(char *buf, int size)
{
static char hexchar[] = "0123456789abcdef";
#define hex(a) (hexchar[(a) & 15])
buf[0] = hex(size >> 12);
buf[1] = hex(size >> 8);
buf[2] = hex(size >> 4);
buf[3] = hex(size);
#undef hex
}
static void format_packet(struct strbuf *out, const char *prefix,
const char *fmt, va_list args)
{
pkt-line: allow writing of LARGE_PACKET_MAX buffers When we send out pkt-lines with refnames, we use a static 1000-byte buffer. This means that the maximum size of a ref over the git protocol is around 950 bytes (the exact size depends on the protocol line being written, but figure on a sha1 plus some boilerplate). This is enough for any sane workflow, but occasionally odd things happen (e.g., a bug may create a ref "foo/foo/foo/..." accidentally). With the current code, you cannot even use "push" to delete such a ref from a remote. Let's switch to using a strbuf, with a hard-limit of LARGE_PACKET_MAX (which is specified by the protocol). This matches the size of the readers, as of 74543a0 (pkt-line: provide a LARGE_PACKET_MAX static buffer, 2013-02-20). Versions of git older than that will complain about our large packets, but it's really no worse than the current behavior. Right now the sender barfs with "impossibly long line" trying to send the packet, and afterwards the reader will barf with "protocol error: bad line length %d", which is arguably better anyway. Note that we're not really _solving_ the problem here, but just bumping the limits. In theory, the length of a ref is unbounded, and pkt-line can only represent sizes up to 65531 bytes. So we are just bumping the limit, not removing it. But hopefully 64K should be enough for anyone. As a bonus, by using a strbuf for the formatting we can eliminate an unnecessary copy in format_buf_write. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 17:47:02 +08:00
size_t orig_len, n;
pkt-line: allow writing of LARGE_PACKET_MAX buffers When we send out pkt-lines with refnames, we use a static 1000-byte buffer. This means that the maximum size of a ref over the git protocol is around 950 bytes (the exact size depends on the protocol line being written, but figure on a sha1 plus some boilerplate). This is enough for any sane workflow, but occasionally odd things happen (e.g., a bug may create a ref "foo/foo/foo/..." accidentally). With the current code, you cannot even use "push" to delete such a ref from a remote. Let's switch to using a strbuf, with a hard-limit of LARGE_PACKET_MAX (which is specified by the protocol). This matches the size of the readers, as of 74543a0 (pkt-line: provide a LARGE_PACKET_MAX static buffer, 2013-02-20). Versions of git older than that will complain about our large packets, but it's really no worse than the current behavior. Right now the sender barfs with "impossibly long line" trying to send the packet, and afterwards the reader will barf with "protocol error: bad line length %d", which is arguably better anyway. Note that we're not really _solving_ the problem here, but just bumping the limits. In theory, the length of a ref is unbounded, and pkt-line can only represent sizes up to 65531 bytes. So we are just bumping the limit, not removing it. But hopefully 64K should be enough for anyone. As a bonus, by using a strbuf for the formatting we can eliminate an unnecessary copy in format_buf_write. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 17:47:02 +08:00
orig_len = out->len;
strbuf_addstr(out, "0000");
strbuf_addstr(out, prefix);
pkt-line: allow writing of LARGE_PACKET_MAX buffers When we send out pkt-lines with refnames, we use a static 1000-byte buffer. This means that the maximum size of a ref over the git protocol is around 950 bytes (the exact size depends on the protocol line being written, but figure on a sha1 plus some boilerplate). This is enough for any sane workflow, but occasionally odd things happen (e.g., a bug may create a ref "foo/foo/foo/..." accidentally). With the current code, you cannot even use "push" to delete such a ref from a remote. Let's switch to using a strbuf, with a hard-limit of LARGE_PACKET_MAX (which is specified by the protocol). This matches the size of the readers, as of 74543a0 (pkt-line: provide a LARGE_PACKET_MAX static buffer, 2013-02-20). Versions of git older than that will complain about our large packets, but it's really no worse than the current behavior. Right now the sender barfs with "impossibly long line" trying to send the packet, and afterwards the reader will barf with "protocol error: bad line length %d", which is arguably better anyway. Note that we're not really _solving_ the problem here, but just bumping the limits. In theory, the length of a ref is unbounded, and pkt-line can only represent sizes up to 65531 bytes. So we are just bumping the limit, not removing it. But hopefully 64K should be enough for anyone. As a bonus, by using a strbuf for the formatting we can eliminate an unnecessary copy in format_buf_write. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 17:47:02 +08:00
strbuf_vaddf(out, fmt, args);
n = out->len - orig_len;
if (n > LARGE_PACKET_MAX)
die(_("protocol error: impossibly long line"));
pkt-line: allow writing of LARGE_PACKET_MAX buffers When we send out pkt-lines with refnames, we use a static 1000-byte buffer. This means that the maximum size of a ref over the git protocol is around 950 bytes (the exact size depends on the protocol line being written, but figure on a sha1 plus some boilerplate). This is enough for any sane workflow, but occasionally odd things happen (e.g., a bug may create a ref "foo/foo/foo/..." accidentally). With the current code, you cannot even use "push" to delete such a ref from a remote. Let's switch to using a strbuf, with a hard-limit of LARGE_PACKET_MAX (which is specified by the protocol). This matches the size of the readers, as of 74543a0 (pkt-line: provide a LARGE_PACKET_MAX static buffer, 2013-02-20). Versions of git older than that will complain about our large packets, but it's really no worse than the current behavior. Right now the sender barfs with "impossibly long line" trying to send the packet, and afterwards the reader will barf with "protocol error: bad line length %d", which is arguably better anyway. Note that we're not really _solving_ the problem here, but just bumping the limits. In theory, the length of a ref is unbounded, and pkt-line can only represent sizes up to 65531 bytes. So we are just bumping the limit, not removing it. But hopefully 64K should be enough for anyone. As a bonus, by using a strbuf for the formatting we can eliminate an unnecessary copy in format_buf_write. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 17:47:02 +08:00
set_packet_header(&out->buf[orig_len], n);
pkt-line: allow writing of LARGE_PACKET_MAX buffers When we send out pkt-lines with refnames, we use a static 1000-byte buffer. This means that the maximum size of a ref over the git protocol is around 950 bytes (the exact size depends on the protocol line being written, but figure on a sha1 plus some boilerplate). This is enough for any sane workflow, but occasionally odd things happen (e.g., a bug may create a ref "foo/foo/foo/..." accidentally). With the current code, you cannot even use "push" to delete such a ref from a remote. Let's switch to using a strbuf, with a hard-limit of LARGE_PACKET_MAX (which is specified by the protocol). This matches the size of the readers, as of 74543a0 (pkt-line: provide a LARGE_PACKET_MAX static buffer, 2013-02-20). Versions of git older than that will complain about our large packets, but it's really no worse than the current behavior. Right now the sender barfs with "impossibly long line" trying to send the packet, and afterwards the reader will barf with "protocol error: bad line length %d", which is arguably better anyway. Note that we're not really _solving_ the problem here, but just bumping the limits. In theory, the length of a ref is unbounded, and pkt-line can only represent sizes up to 65531 bytes. So we are just bumping the limit, not removing it. But hopefully 64K should be enough for anyone. As a bonus, by using a strbuf for the formatting we can eliminate an unnecessary copy in format_buf_write. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 17:47:02 +08:00
packet_trace(out->buf + orig_len + 4, n - 4, 1);
}
static int packet_write_fmt_1(int fd, int gently, const char *prefix,
const char *fmt, va_list args)
{
static struct strbuf buf = STRBUF_INIT;
strbuf_reset(&buf);
format_packet(&buf, prefix, fmt, args);
if (write_in_full(fd, buf.buf, buf.len) < 0) {
if (!gently) {
check_pipe(errno);
die_errno(_("packet write with format failed"));
}
return error(_("packet write with format failed"));
}
return 0;
}
void packet_write_fmt(int fd, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
packet_write_fmt_1(fd, 0, "", fmt, args);
va_end(args);
}
int packet_write_fmt_gently(int fd, const char *fmt, ...)
{
int status;
va_list args;
va_start(args, fmt);
status = packet_write_fmt_1(fd, 1, "", fmt, args);
va_end(args);
return status;
}
static int do_packet_write(const int fd_out, const char *buf, size_t size,
struct strbuf *err)
{
char header[4];
size_t packet_size;
if (size > LARGE_PACKET_DATA_MAX) {
strbuf_addstr(err, _("packet write failed - data exceeds max packet size"));
return -1;
}
packet_trace(buf, size, 1);
packet_size = size + 4;
set_packet_header(header, packet_size);
/*
* Write the header and the buffer in 2 parts so that we do
* not need to allocate a buffer or rely on a static buffer.
* This also avoids putting a large buffer on the stack which
* might have multi-threading issues.
*/
if (write_in_full(fd_out, header, 4) < 0 ||
write_in_full(fd_out, buf, size) < 0) {
strbuf_addf(err, _("packet write failed: %s"), strerror(errno));
return -1;
}
return 0;
}
static int packet_write_gently(const int fd_out, const char *buf, size_t size)
{
struct strbuf err = STRBUF_INIT;
if (do_packet_write(fd_out, buf, size, &err)) {
error("%s", err.buf);
strbuf_release(&err);
return -1;
}
return 0;
}
void packet_write(int fd_out, const char *buf, size_t size)
{
struct strbuf err = STRBUF_INIT;
if (do_packet_write(fd_out, buf, size, &err))
die("%s", err.buf);
}
void packet_fwrite(FILE *f, const char *buf, size_t size)
{
size_t packet_size;
char header[4];
if (size > LARGE_PACKET_DATA_MAX)
die(_("packet write failed - data exceeds max packet size"));
packet_trace(buf, size, 1);
packet_size = size + 4;
set_packet_header(header, packet_size);
fwrite_or_die(f, header, 4);
fwrite_or_die(f, buf, size);
}
void packet_fwrite_fmt(FILE *fh, const char *fmt, ...)
{
static struct strbuf buf = STRBUF_INIT;
va_list args;
strbuf_reset(&buf);
va_start(args, fmt);
format_packet(&buf, "", fmt, args);
va_end(args);
fwrite_or_die(fh, buf.buf, buf.len);
}
void packet_fflush(FILE *f)
{
packet_trace("0000", 4, 1);
fwrite_or_die(f, "0000", 4);
fflush_or_die(f);
}
void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
format_packet(buf, "", fmt, args);
va_end(args);
}
int write_packetized_from_fd_no_flush(int fd_in, int fd_out)
{
char *buf = xmalloc(LARGE_PACKET_DATA_MAX);
int err = 0;
ssize_t bytes_to_write;
while (!err) {
bytes_to_write = xread(fd_in, buf, LARGE_PACKET_DATA_MAX);
if (bytes_to_write < 0) {
free(buf);
return COPY_READ_ERROR;
}
if (bytes_to_write == 0)
break;
err = packet_write_gently(fd_out, buf, bytes_to_write);
}
free(buf);
return err;
}
int write_packetized_from_buf_no_flush_count(const char *src_in, size_t len,
int fd_out, int *packet_counter)
{
int err = 0;
size_t bytes_written = 0;
size_t bytes_to_write;
while (!err) {
if ((len - bytes_written) > LARGE_PACKET_DATA_MAX)
bytes_to_write = LARGE_PACKET_DATA_MAX;
else
bytes_to_write = len - bytes_written;
if (bytes_to_write == 0)
break;
err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
bytes_written += bytes_to_write;
if (packet_counter)
(*packet_counter)++;
}
return err;
}
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
static int get_packet_data(int fd, char **src_buf, size_t *src_size,
void *dst, unsigned size, int options)
{
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
ssize_t ret;
if (fd >= 0 && src_buf && *src_buf)
BUG("multiple sources given to packet_read");
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
/* Read up to "size" bytes from our source, whatever it is. */
if (src_buf && *src_buf) {
ret = size < *src_size ? size : *src_size;
memcpy(dst, *src_buf, ret);
*src_buf += ret;
*src_size -= ret;
} else {
ret = read_in_full(fd, dst, size);
if (ret < 0) {
if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
return error_errno(_("read error"));
die_errno(_("read error"));
}
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
}
/* And complain if we didn't get enough bytes to satisfy the read. */
if (ret != size) {
if (options & PACKET_READ_GENTLE_ON_EOF)
return -1;
if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
return error(_("the remote end hung up unexpectedly"));
die(_("the remote end hung up unexpectedly"));
}
return ret;
}
int packet_length(const char lenbuf_hex[4])
{
int val = hex2chr(lenbuf_hex);
return (val < 0) ? val : (val << 8) | hex2chr(lenbuf_hex + 2);
}
static char *find_packfile_uri_path(const char *buffer)
{
const char *URI_MARK = "://";
char *path;
int len;
/* First char is sideband mark */
buffer += 1;
len = strspn(buffer, "0123456789abcdefABCDEF");
/* size of SHA1 and SHA256 hash */
if (!(len == 40 || len == 64) || buffer[len] != ' ')
return NULL; /* required "<hash>SP" not seen */
path = strstr(buffer + len + 1, URI_MARK);
if (!path)
return NULL;
path = strchr(path + strlen(URI_MARK), '/');
if (!path || !*(path + 1))
return NULL;
/* position after '/' */
return ++path;
}
enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
size_t *src_len, char *buffer,
unsigned size, int *pktlen,
int options)
{
int len;
char linelen[4];
char *uri_path_start;
if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
*pktlen = -1;
return PACKET_READ_EOF;
}
len = packet_length(linelen);
if (len < 0) {
if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
return error(_("protocol error: bad line length "
"character: %.4s"), linelen);
die(_("protocol error: bad line length character: %.4s"), linelen);
} else if (!len) {
packet_trace("0000", 4, 0);
*pktlen = 0;
return PACKET_READ_FLUSH;
} else if (len == 1) {
packet_trace("0001", 4, 0);
*pktlen = 0;
return PACKET_READ_DELIM;
} else if (len == 2) {
packet_trace("0002", 4, 0);
*pktlen = 0;
return PACKET_READ_RESPONSE_END;
} else if (len < 4) {
if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
return error(_("protocol error: bad line length %d"),
len);
die(_("protocol error: bad line length %d"), len);
}
len -= 4;
if ((unsigned)len >= size) {
if (options & PACKET_READ_GENTLE_ON_READ_ERROR)
return error(_("protocol error: bad line length %d"),
len);
die(_("protocol error: bad line length %d"), len);
}
if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) {
*pktlen = -1;
return PACKET_READ_EOF;
}
pkt-line: teach packet_read_line to chomp newlines The packets sent during ref negotiation are all terminated by newline; even though the code to chomp these newlines is short, we end up doing it in a lot of places. This patch teaches packet_read_line to auto-chomp the trailing newline; this lets us get rid of a lot of inline chomping code. As a result, some call-sites which are not reading line-oriented data (e.g., when reading chunks of packfiles alongside sideband) transition away from packet_read_line to the generic packet_read interface. This patch converts all of the existing callsites. Since the function signature of packet_read_line does not change (but its behavior does), there is a possibility of new callsites being introduced in later commits, silently introducing an incompatibility. However, since a later patch in this series will change the signature, such a commit would have to be merged directly into this commit, not to the tip of the series; we can therefore ignore the issue. This is an internal cleanup and should produce no change of behavior in the normal case. However, there is one corner case to note. Callers of packet_read_line have never been able to tell the difference between a flush packet ("0000") and an empty packet ("0004"), as both cause packet_read_line to return a length of 0. Readers treat them identically, even though Documentation/technical/protocol-common.txt says we must not; it also says that implementations should not send an empty pkt-line. By stripping out the newline before the result gets to the caller, we will now treat the newline-only packet ("0005\n") the same as an empty packet, which in turn gets treated like a flush packet. In practice this doesn't matter, as neither empty nor newline-only packets are part of git's protocols (at least not for the line-oriented bits, and readers who are not expecting line-oriented packets will be calling packet_read directly, anyway). But even if we do decide to care about the distinction later, it is orthogonal to this patch. The right place to tighten would be to stop treating empty packets as flush packets, and this change does not make doing so any harder. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:02:28 +08:00
if ((options & PACKET_READ_CHOMP_NEWLINE) &&
len && buffer[len-1] == '\n')
len--;
buffer[len] = 0;
if (options & PACKET_READ_REDACT_URI_PATH &&
(uri_path_start = find_packfile_uri_path(buffer))) {
const char *redacted = "<redacted>";
struct strbuf tracebuf = STRBUF_INIT;
strbuf_insert(&tracebuf, 0, buffer, len);
strbuf_splice(&tracebuf, uri_path_start - buffer,
strlen(uri_path_start), redacted, strlen(redacted));
packet_trace(tracebuf.buf, tracebuf.len, 0);
strbuf_release(&tracebuf);
} else {
packet_trace(buffer, len, 0);
}
if ((options & PACKET_READ_DIE_ON_ERR_PACKET) &&
starts_with(buffer, "ERR "))
die(_("remote error: %s"), buffer + 4);
*pktlen = len;
return PACKET_READ_NORMAL;
}
int packet_read(int fd, char *buffer, unsigned size, int options)
{
int pktlen = -1;
packet_read_with_status(fd, NULL, NULL, buffer, size, &pktlen,
options);
return pktlen;
}
char *packet_read_line(int fd, int *dst_len)
{
int len = packet_read(fd, packet_buffer, sizeof(packet_buffer),
pkt-line: provide a LARGE_PACKET_MAX static buffer Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-21 04:02:57 +08:00
PACKET_READ_CHOMP_NEWLINE);
pkt-line: share buffer/descriptor reading implementation The packet_read function reads from a descriptor. The packet_get_line function is similar, but reads from an in-memory buffer, and uses a completely separate implementation. This patch teaches the generic packet_read function to accept either source, and we can do away with packet_get_line's implementation. There are two other differences to account for between the old and new functions. The first is that we used to read into a strbuf, but now read into a fixed size buffer. The only two callers are fine with that, and in fact it simplifies their code, since they can use the same static-buffer interface as the rest of the packet_read_line callers (and we provide a similar convenience wrapper for reading from a buffer rather than a descriptor). This is technically an externally-visible behavior change in that we used to accept arbitrary sized packets up to 65532 bytes, and now cap out at LARGE_PACKET_MAX, 65520. In practice this doesn't matter, as we use it only for parsing smart-http headers (of which there is exactly one defined, and it is small and fixed-size). And any extension headers would be breaking the protocol to go over LARGE_PACKET_MAX anyway. The other difference is that packet_get_line would return on error rather than dying. However, both callers of packet_get_line are actually improved by dying. The first caller does its own error checking, but we can drop that; as a result, we'll actually get more specific reporting about protocol breakage when packet_read dies internally. The only downside is that packet_read will not print the smart-http URL that failed, but that's not a big deal; anybody not debugging can already see the remote's URL already, and anybody debugging would want to run with GIT_CURL_VERBOSE anyway to see way more information. The second caller, which is just trying to skip past any extra smart-http headers (of which there are none defined, but which we allow to keep room for future expansion), did not error check at all. As a result, it would treat an error just like a flush packet. The resulting mess would generally cause an error later in get_remote_heads, but now we get error reporting much closer to the source of the problem. Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 06:31:34 +08:00
if (dst_len)
*dst_len = len;
return (len > 0) ? packet_buffer : NULL;
}
int packet_read_line_gently(int fd, int *dst_len, char **dst_line)
{
int len = packet_read(fd, packet_buffer, sizeof(packet_buffer),
PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
if (dst_len)
*dst_len = len;
if (dst_line)
*dst_line = (len > 0) ? packet_buffer : NULL;
return len;
}
ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out, int options)
{
int packet_len;
size_t orig_len = sb_out->len;
size_t orig_alloc = sb_out->alloc;
for (;;) {
strbuf_grow(sb_out, LARGE_PACKET_DATA_MAX);
packet_len = packet_read(fd_in,
/* strbuf_grow() above always allocates one extra byte to
* store a '\0' at the end of the string. packet_read()
* writes a '\0' extra byte at the end, too. Let it know
* that there is already room for the extra byte.
*/
sb_out->buf + sb_out->len, LARGE_PACKET_DATA_MAX+1,
options);
if (packet_len <= 0)
break;
sb_out->len += packet_len;
}
if (packet_len < 0) {
if (orig_alloc == 0)
strbuf_release(sb_out);
else
strbuf_setlen(sb_out, orig_len);
return packet_len;
}
return sb_out->len - orig_len;
}
int recv_sideband(const char *me, int in_stream, int out)
{
char buf[LARGE_PACKET_MAX + 1];
int len;
struct strbuf scratch = STRBUF_INIT;
enum sideband_type sideband_type;
while (1) {
sideband: diagnose more sideband anomalies In demultiplex_sideband(), there are two oddities when we check an incoming packet: - if it has zero length, then we assume it's a flush packet. This means we fail to notice the difference between a real flush and a true zero-length packet that's missing its sideband designator. It's not a huge problem in practice because we'd never send a zero-length data packet (even our keepalives are otherwise-empty sideband-1 packets). But it would be nice to detect and report the error, since it's likely to cause other confusion (we think the other side flushed, but they do not). - we try to detect packets missing their designator by checking for "if (len < 1)". But this will never trigger for "len == 0"; we've already detected that and left the function before then. It _could_ detect a negative "len" parameter. But in that case, the error message is wrong. The issue is not "no sideband" but rather "eof while reading the packet". However, this can't actually be triggered in practice, because neither of the two callers uses pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the remote end hung up unexpectedly" before we even get here. So this truly is dead code. We can improve these cases by passing in a pkt-line status to the demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This gives us two improvements: - we can now reliably detect flush packets, and will report a normal packet missing its sideband designator as an error - we'll report an eof with a more detailed "protocol error: eof while reading sideband packet", rather than the generic "the remote end hung up unexpectedly" - when we see an eof, we'll flush the sideband scratch buffer, which may provide some hints from the remote about why they hung up (though note we already flush on newlines, so it's likely that most such messages already made it through) In some sense this patch goes against fbd76cd450 (sideband: reverse its dependency on pkt-line, 2019-01-16), which caused the sideband code not to depend on the pkt-line code. But that commit was really just trying to deal with the circular header dependency. The two modules are conceptually interlinked, and it was just trying to keep things compiling. And indeed, there's a sticking point in this patch: because pkt-line.h includes sideband.h, we can't add the reverse include we need for the sideband code to have an "enum packet_read_status" parameter. Nor can we forward declare it, because you can't forward declare an enum in C. However, C does guarantee that enums fit in an int, so we can just use that type. One alternative would be for the callers to check themselves that they got something sane from the pkt-line code. But besides duplicating logic, this gets quite tricky. Any error condition requires flushing the sideband #2 scratch buffer, which only demultiplex_sideband() knows how to do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-28 17:33:24 +08:00
int status = packet_read_with_status(in_stream, NULL, NULL,
buf, LARGE_PACKET_MAX,
&len,
PACKET_READ_GENTLE_ON_EOF);
if (!demultiplex_sideband(me, status, buf, len, 0, &scratch,
&sideband_type))
continue;
switch (sideband_type) {
case SIDEBAND_PRIMARY:
write_or_die(out, buf + 1, len - 1);
break;
default: /* errors: message already written */
if (scratch.len > 0)
BUG("unhandled incomplete sideband: '%s'",
scratch.buf);
return sideband_type;
}
}
}
/* Packet Reader Functions */
void packet_reader_init(struct packet_reader *reader, int fd,
char *src_buffer, size_t src_len,
int options)
{
memset(reader, 0, sizeof(*reader));
reader->fd = fd;
reader->src_buffer = src_buffer;
reader->src_len = src_len;
reader->buffer = packet_buffer;
reader->buffer_size = sizeof(packet_buffer);
reader->options = options;
reader->me = "git";
reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
}
enum packet_read_status packet_reader_read(struct packet_reader *reader)
{
struct strbuf scratch = STRBUF_INIT;
if (reader->line_peeked) {
reader->line_peeked = 0;
return reader->status;
}
/*
* Consume all progress packets until a primary payload packet is
* received
*/
while (1) {
enum sideband_type sideband_type;
reader->status = packet_read_with_status(reader->fd,
&reader->src_buffer,
&reader->src_len,
reader->buffer,
reader->buffer_size,
&reader->pktlen,
reader->options);
if (!reader->use_sideband)
break;
sideband: diagnose more sideband anomalies In demultiplex_sideband(), there are two oddities when we check an incoming packet: - if it has zero length, then we assume it's a flush packet. This means we fail to notice the difference between a real flush and a true zero-length packet that's missing its sideband designator. It's not a huge problem in practice because we'd never send a zero-length data packet (even our keepalives are otherwise-empty sideband-1 packets). But it would be nice to detect and report the error, since it's likely to cause other confusion (we think the other side flushed, but they do not). - we try to detect packets missing their designator by checking for "if (len < 1)". But this will never trigger for "len == 0"; we've already detected that and left the function before then. It _could_ detect a negative "len" parameter. But in that case, the error message is wrong. The issue is not "no sideband" but rather "eof while reading the packet". However, this can't actually be triggered in practice, because neither of the two callers uses pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the remote end hung up unexpectedly" before we even get here. So this truly is dead code. We can improve these cases by passing in a pkt-line status to the demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This gives us two improvements: - we can now reliably detect flush packets, and will report a normal packet missing its sideband designator as an error - we'll report an eof with a more detailed "protocol error: eof while reading sideband packet", rather than the generic "the remote end hung up unexpectedly" - when we see an eof, we'll flush the sideband scratch buffer, which may provide some hints from the remote about why they hung up (though note we already flush on newlines, so it's likely that most such messages already made it through) In some sense this patch goes against fbd76cd450 (sideband: reverse its dependency on pkt-line, 2019-01-16), which caused the sideband code not to depend on the pkt-line code. But that commit was really just trying to deal with the circular header dependency. The two modules are conceptually interlinked, and it was just trying to keep things compiling. And indeed, there's a sticking point in this patch: because pkt-line.h includes sideband.h, we can't add the reverse include we need for the sideband code to have an "enum packet_read_status" parameter. Nor can we forward declare it, because you can't forward declare an enum in C. However, C does guarantee that enums fit in an int, so we can just use that type. One alternative would be for the callers to check themselves that they got something sane from the pkt-line code. But besides duplicating logic, this gets quite tricky. Any error condition requires flushing the sideband #2 scratch buffer, which only demultiplex_sideband() knows how to do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-28 17:33:24 +08:00
if (demultiplex_sideband(reader->me, reader->status,
reader->buffer, reader->pktlen, 1,
&scratch, &sideband_type))
break;
}
if (reader->status == PACKET_READ_NORMAL)
/* Skip the sideband designator if sideband is used */
reader->line = reader->use_sideband ?
reader->buffer + 1 : reader->buffer;
else
reader->line = NULL;
return reader->status;
}
enum packet_read_status packet_reader_peek(struct packet_reader *reader)
{
/* Only allow peeking a single line */
if (reader->line_peeked)
return reader->status;
/* Peek a line by reading it and setting peeked flag */
packet_reader_read(reader);
reader->line_peeked = 1;
return reader->status;
}
void packet_writer_init(struct packet_writer *writer, int dest_fd)
{
writer->dest_fd = dest_fd;
writer->use_sideband = 0;
}
void packet_writer_write(struct packet_writer *writer, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
packet_write_fmt_1(writer->dest_fd, 0,
writer->use_sideband ? "\001" : "", fmt, args);
va_end(args);
}
void packet_writer_error(struct packet_writer *writer, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
packet_write_fmt_1(writer->dest_fd, 0,
writer->use_sideband ? "\003" : "ERR ", fmt, args);
va_end(args);
}
void packet_writer_delim(struct packet_writer *writer)
{
packet_delim(writer->dest_fd);
}
void packet_writer_flush(struct packet_writer *writer)
{
packet_flush(writer->dest_fd);
}