linux/net/9p/trans_fd.c

1199 lines
27 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* Fd transport layer. Includes deprecated socket layer.
*
* Copyright (C) 2006 by Russ Cox <rsc@swtch.com>
* Copyright (C) 2004-2005 by Latchesar Ionkov <lucho@ionkov.net>
* Copyright (C) 2004-2008 by Eric Van Hensbergen <ericvh@gmail.com>
* Copyright (C) 1997-2002 by Ron Minnich <rminnich@sarnoff.com>
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/in.h>
#include <linux/module.h>
#include <linux/net.h>
#include <linux/ipv6.h>
#include <linux/kthread.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/un.h>
#include <linux/uaccess.h>
#include <linux/inet.h>
#include <linux/file.h>
#include <linux/parser.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
#include <linux/seq_file.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>
#include <net/9p/transport.h>
#include <linux/syscalls.h> /* killme */
#define P9_PORT 564
#define MAX_SOCK_BUF (1024*1024)
#define MAXPOLLWADDR 2
static struct p9_trans_module p9_tcp_trans;
static struct p9_trans_module p9_fd_trans;
/**
* struct p9_fd_opts - per-transport options
* @rfd: file descriptor for reading (trans=fd)
* @wfd: file descriptor for writing (trans=fd)
* @port: port to connect to (trans=tcp)
net: 9p: Fix kerneldoc warnings of missing parameters etc net/9p/client.c:420: warning: Function parameter or member 'c' not described in 'p9_client_cb' net/9p/client.c:420: warning: Function parameter or member 'req' not described in 'p9_client_cb' net/9p/client.c:420: warning: Function parameter or member 'status' not described in 'p9_client_cb' net/9p/client.c:568: warning: Function parameter or member 'uidata' not described in 'p9_check_zc_errors' net/9p/trans_common.c:23: warning: Function parameter or member 'nr_pages' not described in 'p9_release_pages' net/9p/trans_common.c:23: warning: Function parameter or member 'pages' not described in 'p9_release_pages' net/9p/trans_fd.c:132: warning: Function parameter or member 'rreq' not described in 'p9_conn' net/9p/trans_fd.c:132: warning: Function parameter or member 'wreq' not described in 'p9_conn' net/9p/trans_fd.c:56: warning: Function parameter or member 'privport' not described in 'p9_fd_opts' net/9p/trans_rdma.c:113: warning: Function parameter or member 'cqe' not described in 'p9_rdma_context' net/9p/trans_rdma.c:129: warning: Function parameter or member 'privport' not described in 'p9_rdma_opts' net/9p/trans_virtio.c:215: warning: Function parameter or member 'limit' not described in 'pack_sg_list_p' net/9p/trans_virtio.c:83: warning: Function parameter or member 'chan_list' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'p9_max_pages' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'ring_bufs_avail' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'tag' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'vc_wq' not described in 'virtio_chan' Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Dominique Martinet <asmadeus@codewreck.org> Link: https://lore.kernel.org/r/20201031182655.1082065-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-01 02:26:55 +08:00
* @privport: port is privileged
*/
struct p9_fd_opts {
int rfd;
int wfd;
u16 port;
bool privport;
};
/*
* Option Parsing (code inspired by NFS code)
* - a little lazy - parse all fd-transport options
*/
enum {
/* Options that take integer arguments */
Opt_port, Opt_rfdno, Opt_wfdno, Opt_err,
/* Options that take no arguments */
Opt_privport,
};
static const match_table_t tokens = {
{Opt_port, "port=%u"},
{Opt_rfdno, "rfdno=%u"},
{Opt_wfdno, "wfdno=%u"},
{Opt_privport, "privport"},
{Opt_err, NULL},
};
enum {
Rworksched = 1, /* read work scheduled or running */
Rpending = 2, /* can read */
Wworksched = 4, /* write work scheduled or running */
Wpending = 8, /* can write */
};
struct p9_poll_wait {
struct p9_conn *conn;
wait_queue_entry_t wait;
wait_queue_head_t *wait_addr;
};
/**
* struct p9_conn - fd mux connection state information
* @mux_list: list link for mux to manage multiple connections (?)
* @client: reference to client instance for this connection
* @err: error state
* @req_lock: lock protecting req_list and requests statuses
* @req_list: accounting for requests which have been sent
* @unsent_req_list: accounting for requests that haven't been sent
net: 9p: Fix kerneldoc warnings of missing parameters etc net/9p/client.c:420: warning: Function parameter or member 'c' not described in 'p9_client_cb' net/9p/client.c:420: warning: Function parameter or member 'req' not described in 'p9_client_cb' net/9p/client.c:420: warning: Function parameter or member 'status' not described in 'p9_client_cb' net/9p/client.c:568: warning: Function parameter or member 'uidata' not described in 'p9_check_zc_errors' net/9p/trans_common.c:23: warning: Function parameter or member 'nr_pages' not described in 'p9_release_pages' net/9p/trans_common.c:23: warning: Function parameter or member 'pages' not described in 'p9_release_pages' net/9p/trans_fd.c:132: warning: Function parameter or member 'rreq' not described in 'p9_conn' net/9p/trans_fd.c:132: warning: Function parameter or member 'wreq' not described in 'p9_conn' net/9p/trans_fd.c:56: warning: Function parameter or member 'privport' not described in 'p9_fd_opts' net/9p/trans_rdma.c:113: warning: Function parameter or member 'cqe' not described in 'p9_rdma_context' net/9p/trans_rdma.c:129: warning: Function parameter or member 'privport' not described in 'p9_rdma_opts' net/9p/trans_virtio.c:215: warning: Function parameter or member 'limit' not described in 'pack_sg_list_p' net/9p/trans_virtio.c:83: warning: Function parameter or member 'chan_list' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'p9_max_pages' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'ring_bufs_avail' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'tag' not described in 'virtio_chan' net/9p/trans_virtio.c:83: warning: Function parameter or member 'vc_wq' not described in 'virtio_chan' Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Dominique Martinet <asmadeus@codewreck.org> Link: https://lore.kernel.org/r/20201031182655.1082065-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-01 02:26:55 +08:00
* @rreq: read request
* @wreq: write request
* @req: current request being processed (if any)
* @tmp_buf: temporary buffer to read in header
* @rc: temporary fcall for reading current frame
* @wpos: write position for current frame
* @wsize: amount of data to write for current frame
* @wbuf: current write buffer
* @poll_pending_link: pending links to be polled per conn
* @poll_wait: array of wait_q's for various worker threads
* @pt: poll state
* @rq: current read work
* @wq: current write work
* @wsched: ????
*
*/
struct p9_conn {
struct list_head mux_list;
struct p9_client *client;
int err;
spinlock_t req_lock;
struct list_head req_list;
struct list_head unsent_req_list;
struct p9_req_t *rreq;
struct p9_req_t *wreq;
char tmp_buf[P9_HDRSZ];
struct p9_fcall rc;
int wpos;
int wsize;
char *wbuf;
struct list_head poll_pending_link;
struct p9_poll_wait poll_wait[MAXPOLLWADDR];
poll_table pt;
struct work_struct rq;
struct work_struct wq;
unsigned long wsched;
};
/**
* struct p9_trans_fd - transport state
* @rd: reference to file to read from
* @wr: reference of file to write to
* @conn: connection state reference
*
*/
struct p9_trans_fd {
struct file *rd;
struct file *wr;
struct p9_conn conn;
};
static void p9_poll_workfn(struct work_struct *work);
static DEFINE_SPINLOCK(p9_poll_lock);
static LIST_HEAD(p9_poll_pending_list);
static DECLARE_WORK(p9_poll_work, p9_poll_workfn);
static unsigned int p9_ipport_resv_min = P9_DEF_MIN_RESVPORT;
static unsigned int p9_ipport_resv_max = P9_DEF_MAX_RESVPORT;
static void p9_mux_poll_stop(struct p9_conn *m)
{
unsigned long flags;
int i;
for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) {
struct p9_poll_wait *pwait = &m->poll_wait[i];
if (pwait->wait_addr) {
remove_wait_queue(pwait->wait_addr, &pwait->wait);
pwait->wait_addr = NULL;
}
}
spin_lock_irqsave(&p9_poll_lock, flags);
list_del_init(&m->poll_pending_link);
spin_unlock_irqrestore(&p9_poll_lock, flags);
flush_work(&p9_poll_work);
}
/**
* p9_conn_cancel - cancel all pending requests with error
* @m: mux data
* @err: error code
*
*/
static void p9_conn_cancel(struct p9_conn *m, int err)
{
struct p9_req_t *req, *rtmp;
LIST_HEAD(cancel_list);
p9_debug(P9_DEBUG_ERROR, "mux %p err %d\n", m, err);
spin_lock(&m->req_lock);
if (m->err) {
spin_unlock(&m->req_lock);
return;
}
m->err = err;
list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
list_move(&req->req_list, &cancel_list);
WRITE_ONCE(req->status, REQ_STATUS_ERROR);
}
list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) {
list_move(&req->req_list, &cancel_list);
WRITE_ONCE(req->status, REQ_STATUS_ERROR);
}
spin_unlock(&m->req_lock);
list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
list_del(&req->req_list);
if (!req->t_err)
req->t_err = err;
p9_client_cb(m->client, req, REQ_STATUS_ERROR);
}
}
static __poll_t
p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
{
__poll_t ret;
struct p9_trans_fd *ts = NULL;
if (client && client->status == Connected)
ts = client->trans;
if (!ts) {
if (err)
*err = -EREMOTEIO;
return EPOLLERR;
}
ret = vfs_poll(ts->rd, pt);
if (ts->rd != ts->wr)
ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
return ret;
}
/**
* p9_fd_read- read from a fd
* @client: client instance
* @v: buffer to receive data into
* @len: size of receive buffer
*
*/
static int p9_fd_read(struct p9_client *client, void *v, int len)
{
int ret;
struct p9_trans_fd *ts = NULL;
loff_t pos;
if (client && client->status != Disconnected)
ts = client->trans;
if (!ts)
return -EREMOTEIO;
if (!(ts->rd->f_flags & O_NONBLOCK))
p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
pos = ts->rd->f_pos;
ret = kernel_read(ts->rd, v, len, &pos);
if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
client->status = Disconnected;
return ret;
}
/**
* p9_read_work - called when there is some data to be read from a transport
* @work: container of work to be done
*
*/
static void p9_read_work(struct work_struct *work)
{
__poll_t n;
int err;
struct p9_conn *m;
m = container_of(work, struct p9_conn, rq);
if (m->err < 0)
return;
p9_debug(P9_DEBUG_TRANS, "start mux %p pos %zd\n", m, m->rc.offset);
if (!m->rc.sdata) {
m->rc.sdata = m->tmp_buf;
m->rc.offset = 0;
m->rc.capacity = P9_HDRSZ; /* start by reading header */
}
clear_bit(Rpending, &m->wsched);
p9_debug(P9_DEBUG_TRANS, "read mux %p pos %zd size: %zd = %zd\n",
m, m->rc.offset, m->rc.capacity,
m->rc.capacity - m->rc.offset);
err = p9_fd_read(m->client, m->rc.sdata + m->rc.offset,
m->rc.capacity - m->rc.offset);
p9_debug(P9_DEBUG_TRANS, "mux %p got %d bytes\n", m, err);
if (err == -EAGAIN)
goto end_clear;
if (err <= 0)
goto error;
m->rc.offset += err;
/* header read in */
if ((!m->rreq) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new header\n");
/* Header size */
m->rc.size = P9_HDRSZ;
err = p9_parse_header(&m->rc, &m->rc.size, NULL, NULL, 0);
if (err) {
p9_debug(P9_DEBUG_ERROR,
"error parsing header: %d\n", err);
goto error;
}
p9_debug(P9_DEBUG_TRANS,
"mux %p pkt: size: %d bytes tag: %d\n",
m, m->rc.size, m->rc.tag);
m->rreq = p9_tag_lookup(m->client, m->rc.tag);
if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) {
p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n",
m->rc.tag);
err = -EIO;
goto error;
}
9p/fd: Fix write overflow in p9_read_work This error was reported while fuzzing: BUG: KASAN: slab-out-of-bounds in _copy_to_iter+0xd35/0x1190 Write of size 4043 at addr ffff888008724eb1 by task kworker/1:1/24 CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 6.1.0-rc5-00002-g1adf73218daa-dirty #223 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: events p9_read_work Call Trace: <TASK> dump_stack_lvl+0x4c/0x64 print_report+0x178/0x4b0 kasan_report+0xae/0x130 kasan_check_range+0x179/0x1e0 memcpy+0x38/0x60 _copy_to_iter+0xd35/0x1190 copy_page_to_iter+0x1d5/0xb00 pipe_read+0x3a1/0xd90 __kernel_read+0x2a5/0x760 kernel_read+0x47/0x60 p9_read_work+0x463/0x780 process_one_work+0x91d/0x1300 worker_thread+0x8c/0x1210 kthread+0x280/0x330 ret_from_fork+0x22/0x30 </TASK> Allocated by task 457: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7e/0x90 __kmalloc+0x59/0x140 p9_fcall_init.isra.11+0x5d/0x1c0 p9_tag_alloc+0x251/0x550 p9_client_prepare_req+0x162/0x350 p9_client_rpc+0x18d/0xa90 p9_client_create+0x670/0x14e0 v9fs_session_init+0x1fd/0x14f0 v9fs_mount+0xd7/0xaf0 legacy_get_tree+0xf3/0x1f0 vfs_get_tree+0x86/0x2c0 path_mount+0x885/0x1940 do_mount+0xec/0x100 __x64_sys_mount+0x1a0/0x1e0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This BUG pops up when trying to reproduce https://syzkaller.appspot.com/bug?id=6c7cd46c7bdd0e86f95d26ec3153208ad186f9fa The callstack is different but the issue is valid and re-producable with the same re-producer in the link. The root cause of this issue is that we check the size of the message received against the msize of the client in p9_read_work. However, it turns out that capacity is no longer consistent with msize. Thus, the message size should be checked against sdata capacity. As the msize is non-consistant with the capacity of the tag and as we are now checking message size against capacity directly, there is no point checking message size against msize. So remove it. Link: https://lkml.kernel.org/r/20221117091159.31533-2-guozihua@huawei.com Link: https://lkml.kernel.org/r/20221117091159.31533-3-guozihua@huawei.com Reported-by: syzbot+0f89bd13eaceccc0e126@syzkaller.appspotmail.com Fixes: 60ece0833b6c ("net/9p: allocate appropriate reduced message buffers") Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: squash patches 1 & 2 and fix size including header part] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-11-17 17:11:57 +08:00
if (m->rc.size > m->rreq->rc.capacity) {
p9_debug(P9_DEBUG_ERROR,
"requested packet size too big: %d for tag %d with capacity %zd\n",
m->rc.size, m->rc.tag, m->rreq->rc.capacity);
err = -EIO;
goto error;
}
if (!m->rreq->rc.sdata) {
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->rreq);
p9_req_put(m->client, m->rreq);
m->rreq = NULL;
err = -EIO;
goto error;
}
m->rc.sdata = m->rreq->rc.sdata;
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
m->rc.capacity = m->rc.size;
}
/* packet is read in
* not an else because some packets (like clunk) have no payload
*/
if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new packet\n");
m->rreq->rc.size = m->rc.offset;
spin_lock(&m->req_lock);
if (m->rreq->status == REQ_STATUS_SENT) {
list_del(&m->rreq->req_list);
p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD);
} else if (m->rreq->status == REQ_STATUS_FLSHD) {
/* Ignore replies associated with a cancelled request. */
p9_debug(P9_DEBUG_TRANS,
"Ignore replies associated with a cancelled request\n");
} else {
spin_unlock(&m->req_lock);
p9_debug(P9_DEBUG_ERROR,
"Request tag %d errored out while we were reading the reply\n",
m->rc.tag);
err = -EIO;
goto error;
}
spin_unlock(&m->req_lock);
m->rc.sdata = NULL;
m->rc.offset = 0;
m->rc.capacity = 0;
p9_req_put(m->client, m->rreq);
m->rreq = NULL;
}
end_clear:
clear_bit(Rworksched, &m->wsched);
if (!list_empty(&m->req_list)) {
if (test_and_clear_bit(Rpending, &m->wsched))
n = EPOLLIN;
else
n = p9_fd_poll(m->client, NULL, NULL);
if ((n & EPOLLIN) && !test_and_set_bit(Rworksched, &m->wsched)) {
p9_debug(P9_DEBUG_TRANS, "sched read work %p\n", m);
schedule_work(&m->rq);
}
}
return;
error:
p9_conn_cancel(m, err);
clear_bit(Rworksched, &m->wsched);
}
/**
* p9_fd_write - write to a socket
* @client: client instance
* @v: buffer to send data from
* @len: size of send buffer
*
*/
static int p9_fd_write(struct p9_client *client, void *v, int len)
{
ssize_t ret;
struct p9_trans_fd *ts = NULL;
if (client && client->status != Disconnected)
ts = client->trans;
if (!ts)
return -EREMOTEIO;
if (!(ts->wr->f_flags & O_NONBLOCK))
p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");
ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
client->status = Disconnected;
return ret;
}
/**
* p9_write_work - called when a transport can send some data
* @work: container for work to be done
*
*/
static void p9_write_work(struct work_struct *work)
{
__poll_t n;
int err;
struct p9_conn *m;
struct p9_req_t *req;
m = container_of(work, struct p9_conn, wq);
if (m->err < 0) {
clear_bit(Wworksched, &m->wsched);
return;
}
if (!m->wsize) {
spin_lock(&m->req_lock);
if (list_empty(&m->unsent_req_list)) {
clear_bit(Wworksched, &m->wsched);
spin_unlock(&m->req_lock);
return;
}
req = list_entry(m->unsent_req_list.next, struct p9_req_t,
req_list);
WRITE_ONCE(req->status, REQ_STATUS_SENT);
p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
list_move_tail(&req->req_list, &m->req_list);
m->wbuf = req->tc.sdata;
m->wsize = req->tc.size;
m->wpos = 0;
p9_req_get(req);
m->wreq = req;
spin_unlock(&m->req_lock);
}
p9_debug(P9_DEBUG_TRANS, "mux %p pos %d size %d\n",
m, m->wpos, m->wsize);
clear_bit(Wpending, &m->wsched);
err = p9_fd_write(m->client, m->wbuf + m->wpos, m->wsize - m->wpos);
p9_debug(P9_DEBUG_TRANS, "mux %p sent %d bytes\n", m, err);
if (err == -EAGAIN)
goto end_clear;
if (err < 0)
goto error;
else if (err == 0) {
err = -EREMOTEIO;
goto error;
}
m->wpos += err;
if (m->wpos == m->wsize) {
m->wpos = m->wsize = 0;
p9_req_put(m->client, m->wreq);
m->wreq = NULL;
}
end_clear:
clear_bit(Wworksched, &m->wsched);
if (m->wsize || !list_empty(&m->unsent_req_list)) {
if (test_and_clear_bit(Wpending, &m->wsched))
n = EPOLLOUT;
else
n = p9_fd_poll(m->client, NULL, NULL);
if ((n & EPOLLOUT) &&
!test_and_set_bit(Wworksched, &m->wsched)) {
p9_debug(P9_DEBUG_TRANS, "sched write work %p\n", m);
schedule_work(&m->wq);
}
}
return;
error:
p9_conn_cancel(m, err);
clear_bit(Wworksched, &m->wsched);
}
static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
{
struct p9_poll_wait *pwait =
container_of(wait, struct p9_poll_wait, wait);
struct p9_conn *m = pwait->conn;
unsigned long flags;
spin_lock_irqsave(&p9_poll_lock, flags);
if (list_empty(&m->poll_pending_link))
list_add_tail(&m->poll_pending_link, &p9_poll_pending_list);
spin_unlock_irqrestore(&p9_poll_lock, flags);
schedule_work(&p9_poll_work);
return 1;
}
/**
* p9_pollwait - add poll task to the wait queue
* @filp: file pointer being polled
* @wait_address: wait_q to block on
* @p: poll state
*
* called by files poll operation to add v9fs-poll task to files wait queue
*/
static void
p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p)
{
struct p9_conn *m = container_of(p, struct p9_conn, pt);
struct p9_poll_wait *pwait = NULL;
int i;
for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) {
if (m->poll_wait[i].wait_addr == NULL) {
pwait = &m->poll_wait[i];
break;
}
}
if (!pwait) {
p9_debug(P9_DEBUG_ERROR, "not enough wait_address slots\n");
return;
}
pwait->conn = m;
pwait->wait_addr = wait_address;
init_waitqueue_func_entry(&pwait->wait, p9_pollwake);
add_wait_queue(wait_address, &pwait->wait);
}
/**
* p9_conn_create - initialize the per-session mux data
* @client: client instance
*
* Note: Creates the polling task if this is the first session.
*/
static void p9_conn_create(struct p9_client *client)
{
__poll_t n;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
p9_debug(P9_DEBUG_TRANS, "client %p msize %d\n", client, client->msize);
INIT_LIST_HEAD(&m->mux_list);
m->client = client;
spin_lock_init(&m->req_lock);
INIT_LIST_HEAD(&m->req_list);
INIT_LIST_HEAD(&m->unsent_req_list);
INIT_WORK(&m->rq, p9_read_work);
INIT_WORK(&m->wq, p9_write_work);
INIT_LIST_HEAD(&m->poll_pending_link);
init_poll_funcptr(&m->pt, p9_pollwait);
n = p9_fd_poll(client, &m->pt, NULL);
if (n & EPOLLIN) {
p9_debug(P9_DEBUG_TRANS, "mux %p can read\n", m);
set_bit(Rpending, &m->wsched);
}
if (n & EPOLLOUT) {
p9_debug(P9_DEBUG_TRANS, "mux %p can write\n", m);
set_bit(Wpending, &m->wsched);
}
}
/**
* p9_poll_mux - polls a mux and schedules read or write works if necessary
* @m: connection to poll
*
*/
static void p9_poll_mux(struct p9_conn *m)
{
__poll_t n;
int err = -ECONNRESET;
if (m->err < 0)
return;
n = p9_fd_poll(m->client, NULL, &err);
if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) {
p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n);
p9_conn_cancel(m, err);
}
if (n & EPOLLIN) {
set_bit(Rpending, &m->wsched);
p9_debug(P9_DEBUG_TRANS, "mux %p can read\n", m);
if (!test_and_set_bit(Rworksched, &m->wsched)) {
p9_debug(P9_DEBUG_TRANS, "sched read work %p\n", m);
schedule_work(&m->rq);
}
}
if (n & EPOLLOUT) {
set_bit(Wpending, &m->wsched);
p9_debug(P9_DEBUG_TRANS, "mux %p can write\n", m);
if ((m->wsize || !list_empty(&m->unsent_req_list)) &&
!test_and_set_bit(Wworksched, &m->wsched)) {
p9_debug(P9_DEBUG_TRANS, "sched write work %p\n", m);
schedule_work(&m->wq);
}
}
}
/**
* p9_fd_request - send 9P request
* The function can sleep until the request is scheduled for sending.
* The function can be interrupted. Return from the function is not
* a guarantee that the request is sent successfully.
*
* @client: client instance
* @req: request to be sent
*
*/
static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
{
__poll_t n;
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
m, current, &req->tc, req->tc.id);
if (m->err < 0)
return m->err;
spin_lock(&m->req_lock);
WRITE_ONCE(req->status, REQ_STATUS_UNSENT);
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);
if (test_and_clear_bit(Wpending, &m->wsched))
n = EPOLLOUT;
else
n = p9_fd_poll(m->client, NULL, NULL);
if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
schedule_work(&m->wq);
return 0;
}
static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
{
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
int ret = 1;
p9_debug(P9_DEBUG_TRANS, "client %p req %p\n", client, req);
spin_lock(&m->req_lock);
if (req->status == REQ_STATUS_UNSENT) {
list_del(&req->req_list);
WRITE_ONCE(req->status, REQ_STATUS_FLSHD);
p9_req_put(client, req);
ret = 0;
}
spin_unlock(&m->req_lock);
return ret;
}
static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
{
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;
p9_debug(P9_DEBUG_TRANS, "client %p req %p\n", client, req);
spin_lock(&m->req_lock);
/* Ignore cancelled request if message has been received
* before lock.
*/
if (req->status == REQ_STATUS_RCVD) {
spin_unlock(&m->req_lock);
return 0;
}
/* we haven't received a response for oldreq,
* remove it from the list.
*/
list_del(&req->req_list);
WRITE_ONCE(req->status, REQ_STATUS_FLSHD);
spin_unlock(&m->req_lock);
p9_req_put(client, req);
return 0;
}
static int p9_fd_show_options(struct seq_file *m, struct p9_client *clnt)
{
if (clnt->trans_mod == &p9_tcp_trans) {
if (clnt->trans_opts.tcp.port != P9_PORT)
seq_printf(m, ",port=%u", clnt->trans_opts.tcp.port);
} else if (clnt->trans_mod == &p9_fd_trans) {
if (clnt->trans_opts.fd.rfd != ~0)
seq_printf(m, ",rfd=%u", clnt->trans_opts.fd.rfd);
if (clnt->trans_opts.fd.wfd != ~0)
seq_printf(m, ",wfd=%u", clnt->trans_opts.fd.wfd);
}
return 0;
}
/**
* parse_opts - parse mount options into p9_fd_opts structure
* @params: options string passed from mount
* @opts: fd transport-specific structure to parse options into
*
* Returns 0 upon success, -ERRNO upon failure
*/
static int parse_opts(char *params, struct p9_fd_opts *opts)
{
char *p;
substring_t args[MAX_OPT_ARGS];
int option;
char *options, *tmp_options;
opts->port = P9_PORT;
opts->rfd = ~0;
opts->wfd = ~0;
opts->privport = false;
if (!params)
return 0;
tmp_options = kstrdup(params, GFP_KERNEL);
if (!tmp_options) {
p9_debug(P9_DEBUG_ERROR,
"failed to allocate copy of option string\n");
return -ENOMEM;
}
options = tmp_options;
while ((p = strsep(&options, ",")) != NULL) {
int token;
int r;
if (!*p)
continue;
token = match_token(p, tokens, args);
if ((token != Opt_err) && (token != Opt_privport)) {
r = match_int(&args[0], &option);
if (r < 0) {
p9_debug(P9_DEBUG_ERROR,
"integer field, but no integer?\n");
continue;
}
}
switch (token) {
case Opt_port:
opts->port = option;
break;
case Opt_rfdno:
opts->rfd = option;
break;
case Opt_wfdno:
opts->wfd = option;
break;
case Opt_privport:
opts->privport = true;
break;
default:
continue;
}
}
kfree(tmp_options);
return 0;
}
static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
{
struct p9_trans_fd *ts = kzalloc(sizeof(struct p9_trans_fd),
GFP_KERNEL);
if (!ts)
return -ENOMEM;
ts->rd = fget(rfd);
if (!ts->rd)
goto out_free_ts;
if (!(ts->rd->f_mode & FMODE_READ))
goto out_put_rd;
9p/trans_fd: always use O_NONBLOCK read/write syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is failing to interrupt already started kernel_read() from p9_fd_read() from p9_read_work() and/or kernel_write() from p9_fd_write() from p9_write_work() requests. Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open() does not set O_NONBLOCK flag, but pipe blocks unless signal is pending, p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when the file descriptor refers to a pipe. In other words, pipe file descriptor needs to be handled as if socket file descriptor. We somehow need to interrupt kernel_read()/kernel_write() on pipes. A minimal change, which this patch is doing, is to set O_NONBLOCK flag from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing of regular files. But this approach changes O_NONBLOCK flag on userspace- supplied file descriptors (which might break userspace programs), and O_NONBLOCK flag could be changed by userspace. It would be possible to set O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still remains small race window for clearing O_NONBLOCK flag. If we don't want to manipulate O_NONBLOCK flag, we might be able to surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING) and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are processed by kernel threads which process global system_wq workqueue, signals could not be delivered from remote threads when p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be needed if we count on signals for making kernel_read()/kernel_write() non-blocking. Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1] Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: add comment at Christian's suggestion] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-08-26 23:27:46 +08:00
/* prevent workers from hanging on IO when fd is a pipe */
ts->rd->f_flags |= O_NONBLOCK;
ts->wr = fget(wfd);
if (!ts->wr)
goto out_put_rd;
if (!(ts->wr->f_mode & FMODE_WRITE))
goto out_put_wr;
9p/trans_fd: always use O_NONBLOCK read/write syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is failing to interrupt already started kernel_read() from p9_fd_read() from p9_read_work() and/or kernel_write() from p9_fd_write() from p9_write_work() requests. Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open() does not set O_NONBLOCK flag, but pipe blocks unless signal is pending, p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when the file descriptor refers to a pipe. In other words, pipe file descriptor needs to be handled as if socket file descriptor. We somehow need to interrupt kernel_read()/kernel_write() on pipes. A minimal change, which this patch is doing, is to set O_NONBLOCK flag from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing of regular files. But this approach changes O_NONBLOCK flag on userspace- supplied file descriptors (which might break userspace programs), and O_NONBLOCK flag could be changed by userspace. It would be possible to set O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still remains small race window for clearing O_NONBLOCK flag. If we don't want to manipulate O_NONBLOCK flag, we might be able to surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING) and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are processed by kernel threads which process global system_wq workqueue, signals could not be delivered from remote threads when p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be needed if we count on signals for making kernel_read()/kernel_write() non-blocking. Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1] Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: add comment at Christian's suggestion] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-08-26 23:27:46 +08:00
ts->wr->f_flags |= O_NONBLOCK;
client->trans = ts;
client->status = Connected;
return 0;
out_put_wr:
fput(ts->wr);
out_put_rd:
fput(ts->rd);
out_free_ts:
kfree(ts);
return -EIO;
}
static int p9_socket_open(struct p9_client *client, struct socket *csocket)
{
struct p9_trans_fd *p;
struct file *file;
p = kzalloc(sizeof(struct p9_trans_fd), GFP_KERNEL);
if (!p) {
sock_release(csocket);
return -ENOMEM;
}
csocket->sk->sk_allocation = GFP_NOIO;
Treewide: Stop corrupting socket's task_frag Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the GFP_NOIO flag on sk_allocation which the networking system uses to decide when it is safe to use current->task_frag. The results of this are unexpected corruption in task_frag when SUNRPC is involved in memory reclaim. The corruption can be seen in crashes, but the root cause is often difficult to ascertain as a crashing machine's stack trace will have no evidence of being near NFS or SUNRPC code. I believe this problem to be much more pervasive than reports to the community may indicate. Fix this by having kernel users of sockets that may corrupt task_frag due to reclaim set sk_use_task_frag = false. Preemptively correcting this situation for users that still set sk_allocation allows them to convert to memalloc_nofs_save/restore without the same unexpected corruptions that are sure to follow, unlikely to show up in testing, and difficult to bisect. CC: Philipp Reisner <philipp.reisner@linbit.com> CC: Lars Ellenberg <lars.ellenberg@linbit.com> CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com> CC: Jens Axboe <axboe@kernel.dk> CC: Josef Bacik <josef@toxicpanda.com> CC: Keith Busch <kbusch@kernel.org> CC: Christoph Hellwig <hch@lst.de> CC: Sagi Grimberg <sagi@grimberg.me> CC: Lee Duncan <lduncan@suse.com> CC: Chris Leech <cleech@redhat.com> CC: Mike Christie <michael.christie@oracle.com> CC: "James E.J. Bottomley" <jejb@linux.ibm.com> CC: "Martin K. Petersen" <martin.petersen@oracle.com> CC: Valentina Manea <valentina.manea.m@gmail.com> CC: Shuah Khan <shuah@kernel.org> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: David Howells <dhowells@redhat.com> CC: Marc Dionne <marc.dionne@auristor.com> CC: Steve French <sfrench@samba.org> CC: Christine Caulfield <ccaulfie@redhat.com> CC: David Teigland <teigland@redhat.com> CC: Mark Fasheh <mark@fasheh.com> CC: Joel Becker <jlbec@evilplan.org> CC: Joseph Qi <joseph.qi@linux.alibaba.com> CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Latchesar Ionkov <lucho@ionkov.net> CC: Dominique Martinet <asmadeus@codewreck.org> CC: Ilya Dryomov <idryomov@gmail.com> CC: Xiubo Li <xiubli@redhat.com> CC: Chuck Lever <chuck.lever@oracle.com> CC: Jeff Layton <jlayton@kernel.org> CC: Trond Myklebust <trond.myklebust@hammerspace.com> CC: Anna Schumaker <anna@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16 20:45:27 +08:00
csocket->sk->sk_use_task_frag = false;
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
2012-10-03 11:25:04 +08:00
file = sock_alloc_file(csocket, 0, NULL);
if (IS_ERR(file)) {
pr_err("%s (%d): failed to map fd\n",
__func__, task_pid_nr(current));
kfree(p);
return PTR_ERR(file);
}
get_file(file);
p->wr = p->rd = file;
client->trans = p;
client->status = Connected;
p->rd->f_flags |= O_NONBLOCK;
p9_conn_create(client);
return 0;
}
/**
* p9_conn_destroy - cancels all pending requests of mux
* @m: mux to destroy
*
*/
static void p9_conn_destroy(struct p9_conn *m)
{
p9_debug(P9_DEBUG_TRANS, "mux %p prev %p next %p\n",
m, m->mux_list.prev, m->mux_list.next);
p9_mux_poll_stop(m);
cancel_work_sync(&m->rq);
if (m->rreq) {
p9_req_put(m->client, m->rreq);
m->rreq = NULL;
}
cancel_work_sync(&m->wq);
if (m->wreq) {
p9_req_put(m->client, m->wreq);
m->wreq = NULL;
}
p9_conn_cancel(m, -ECONNRESET);
m->client = NULL;
}
/**
* p9_fd_close - shutdown file descriptor transport
* @client: client instance
*
*/
static void p9_fd_close(struct p9_client *client)
{
struct p9_trans_fd *ts;
if (!client)
return;
ts = client->trans;
if (!ts)
return;
client->status = Disconnected;
p9_conn_destroy(&ts->conn);
if (ts->rd)
fput(ts->rd);
if (ts->wr)
fput(ts->wr);
kfree(ts);
}
/*
* stolen from NFS - maybe should be made a generic function?
*/
static inline int valid_ipaddr4(const char *buf)
{
int rc, count, in[4];
rc = sscanf(buf, "%d.%d.%d.%d", &in[0], &in[1], &in[2], &in[3]);
if (rc != 4)
return -EINVAL;
for (count = 0; count < 4; count++) {
if (in[count] > 255)
return -EINVAL;
}
return 0;
}
static int p9_bind_privport(struct socket *sock)
{
struct sockaddr_in cl;
int port, err = -EINVAL;
memset(&cl, 0, sizeof(cl));
cl.sin_family = AF_INET;
cl.sin_addr.s_addr = htonl(INADDR_ANY);
for (port = p9_ipport_resv_max; port >= p9_ipport_resv_min; port--) {
cl.sin_port = htons((ushort)port);
err = kernel_bind(sock, (struct sockaddr *)&cl, sizeof(cl));
if (err != -EADDRINUSE)
break;
}
return err;
}
static int
p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
{
int err;
struct socket *csocket;
struct sockaddr_in sin_server;
struct p9_fd_opts opts;
err = parse_opts(args, &opts);
if (err < 0)
return err;
if (addr == NULL || valid_ipaddr4(addr) < 0)
return -EINVAL;
csocket = NULL;
client->trans_opts.tcp.port = opts.port;
client->trans_opts.tcp.privport = opts.privport;
sin_server.sin_family = AF_INET;
sin_server.sin_addr.s_addr = in_aton(addr);
sin_server.sin_port = htons(opts.port);
err = __sock_create(current->nsproxy->net_ns, PF_INET,
SOCK_STREAM, IPPROTO_TCP, &csocket, 1);
if (err) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
return err;
}
if (opts.privport) {
err = p9_bind_privport(csocket);
if (err < 0) {
pr_err("%s (%d): problem binding to privport\n",
__func__, task_pid_nr(current));
sock_release(csocket);
return err;
}
}
net: annotate data-races around sock->ops IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 21:58:09 +08:00
err = READ_ONCE(csocket->ops)->connect(csocket,
(struct sockaddr *)&sin_server,
sizeof(struct sockaddr_in), 0);
if (err < 0) {
pr_err("%s (%d): problem connecting socket to %s\n",
__func__, task_pid_nr(current), addr);
sock_release(csocket);
return err;
}
return p9_socket_open(client, csocket);
}
static int
p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
{
int err;
struct socket *csocket;
struct sockaddr_un sun_server;
csocket = NULL;
if (!addr || !strlen(addr))
return -EINVAL;
if (strlen(addr) >= UNIX_PATH_MAX) {
pr_err("%s (%d): address too long: %s\n",
__func__, task_pid_nr(current), addr);
return -ENAMETOOLONG;
}
sun_server.sun_family = PF_UNIX;
strcpy(sun_server.sun_path, addr);
err = __sock_create(current->nsproxy->net_ns, PF_UNIX,
SOCK_STREAM, 0, &csocket, 1);
if (err < 0) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
return err;
}
net: annotate data-races around sock->ops IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 21:58:09 +08:00
err = READ_ONCE(csocket->ops)->connect(csocket, (struct sockaddr *)&sun_server,
sizeof(struct sockaddr_un) - 1, 0);
if (err < 0) {
pr_err("%s (%d): problem connecting socket: %s: %d\n",
__func__, task_pid_nr(current), addr, err);
sock_release(csocket);
return err;
}
return p9_socket_open(client, csocket);
}
static int
p9_fd_create(struct p9_client *client, const char *addr, char *args)
{
int err;
struct p9_fd_opts opts;
err = parse_opts(args, &opts);
if (err < 0)
return err;
client->trans_opts.fd.rfd = opts.rfd;
client->trans_opts.fd.wfd = opts.wfd;
if (opts.rfd == ~0 || opts.wfd == ~0) {
pr_err("Insufficient options for proto=fd\n");
return -ENOPROTOOPT;
}
err = p9_fd_open(client, opts.rfd, opts.wfd);
if (err < 0)
return err;
p9_conn_create(client);
return 0;
}
static struct p9_trans_module p9_tcp_trans = {
.name = "tcp",
.maxsize = MAX_SOCK_BUF,
.pooled_rbuffers = false,
.def = 0,
.create = p9_fd_create_tcp,
.close = p9_fd_close,
.request = p9_fd_request,
.cancel = p9_fd_cancel,
.cancelled = p9_fd_cancelled,
.show_options = p9_fd_show_options,
.owner = THIS_MODULE,
};
MODULE_ALIAS_9P("tcp");
static struct p9_trans_module p9_unix_trans = {
.name = "unix",
.maxsize = MAX_SOCK_BUF,
.def = 0,
.create = p9_fd_create_unix,
.close = p9_fd_close,
.request = p9_fd_request,
.cancel = p9_fd_cancel,
.cancelled = p9_fd_cancelled,
.show_options = p9_fd_show_options,
.owner = THIS_MODULE,
};
MODULE_ALIAS_9P("unix");
static struct p9_trans_module p9_fd_trans = {
.name = "fd",
.maxsize = MAX_SOCK_BUF,
.def = 0,
.create = p9_fd_create,
.close = p9_fd_close,
.request = p9_fd_request,
.cancel = p9_fd_cancel,
.cancelled = p9_fd_cancelled,
.show_options = p9_fd_show_options,
.owner = THIS_MODULE,
};
MODULE_ALIAS_9P("fd");
/**
* p9_poll_workfn - poll worker thread
* @work: work queue
*
* polls all v9fs transports for new events and queues the appropriate
* work to the work queue
*
*/
static void p9_poll_workfn(struct work_struct *work)
{
unsigned long flags;
p9_debug(P9_DEBUG_TRANS, "start %p\n", current);
spin_lock_irqsave(&p9_poll_lock, flags);
while (!list_empty(&p9_poll_pending_list)) {
struct p9_conn *conn = list_first_entry(&p9_poll_pending_list,
struct p9_conn,
poll_pending_link);
list_del_init(&conn->poll_pending_link);
spin_unlock_irqrestore(&p9_poll_lock, flags);
p9_poll_mux(conn);
spin_lock_irqsave(&p9_poll_lock, flags);
}
spin_unlock_irqrestore(&p9_poll_lock, flags);
p9_debug(P9_DEBUG_TRANS, "finish\n");
}
static int __init p9_trans_fd_init(void)
{
v9fs_register_trans(&p9_tcp_trans);
v9fs_register_trans(&p9_unix_trans);
v9fs_register_trans(&p9_fd_trans);
return 0;
}
static void __exit p9_trans_fd_exit(void)
{
workqueue: deprecate flush[_delayed]_work_sync() flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>
2012-08-21 05:51:24 +08:00
flush_work(&p9_poll_work);
v9fs_unregister_trans(&p9_tcp_trans);
v9fs_unregister_trans(&p9_unix_trans);
v9fs_unregister_trans(&p9_fd_trans);
}
module_init(p9_trans_fd_init);
module_exit(p9_trans_fd_exit);
MODULE_AUTHOR("Eric Van Hensbergen <ericvh@gmail.com>");
MODULE_DESCRIPTION("Filedescriptor Transport for 9P");
MODULE_LICENSE("GPL");