If I/O size is larger than rdma_readwrite_threshold, use RDMA write for
SMB read by specifying channel SMB2_CHANNEL_RDMA_V1 or
SMB2_CHANNEL_RDMA_V1_INVALIDATE in the SMB packet, depending on SMB dialect
used. Append a smbd_buffer_descriptor_v1 to the end of the SMB packet and fill
in other values to indicate this SMB read uses RDMA write.
There is no need to read from the transport for incoming payload. At the time
SMB read response comes back, the data is already transferred and placed in the
pages by RDMA hardware.
When SMB read is finished, deregister the memory regions if RDMA write is used
for this SMB read. smbd_deregister_mr may need to do local invalidation and
sleep, if server remote invalidation is not used.
There are situations where the MID may not be created on I/O failure, under
which memory region is deregistered when read data context is released.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
This patch is for preparing upper layer doing SMB read via RDMA write.
When RDMA write is used for SMB read, the returned data length is in
DataRemaining in the response packet. Reading it properly by adding a
parameter to specifiy where the returned data length is.
Add the defition for memory registration to wdata and return the correct
length based on if RDMA write is used.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When sending I/O, if size is larger than rdma_readwrite_threshold we prepare
to send SMB write packet for a RDMA read via memory registration. The actual
I/O is done by remote peer through local RDMA hardware. Modify the relevant
fields in the packet accordingly, and append a smbd_buffer_descriptor_v1 to
the end of the SMB write packet.
On write I/O finish, deregister the memory region if this was for a RDMA read.
If remote invalidation is not used, the call to smbd_deregister_mr will do
local invalidation and possibly wait. Memory region is normally deregistered
in MID callback as soon as it's used. There are situations where the MID may
not be created on I/O failure, under which memory region is deregistered when
write data context is released.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Memory registration is used for transferring payload via RDMA read or write.
After I/O is done, memory registrations are recovered and reused. This
process can be time consuming and is done in a work queue.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
With SMB Direct connected, use it for sending data via RDMA send.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
The transport doesn't maintain send buffers or send queue for transferring
payload via RDMA send. There is no data copy in the transport on send.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
With SMB Direct connected, use it for receiving data via RDMA receive.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
On the receive path, the transport maintains receive buffers and a reassembly
queue for transferring payload via RDMA recv. There is data copy in the
transport on recv when it copies the payload to upper layer.
The transport recognizes the RFC1002 header length use in the SMB
upper layer payloads in CIFS. Because this length is mainly used for TCP and
not applicable to RDMA, it is handled as a out-of-band information and is
never sent over the wire, and the trasnport behaves like TCP to upper layer
by processing and exposing the length correctly on data payloads.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When connecting over SMB Direct, the transport negotiates its maximum I/O sizes
with the server and determines how to choose to do RDMA send/recv vs
read/write. Expose these maximum I/O sizes to upper layer so we will get the
correct sized payloads.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When upper layer wants to umount, make it call shutdown on transport when
SMB Direct is used.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Add function to tear down a SMB Direct connection. This is used by upper layer
to free all SMB Direct connection and transport resources.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Do a reconnect on SMB Direct when it is used as the connection. Reconnect can
happen for many reasons and it's mostly the decision of SMB2 upper layer.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Add function to implement a reconnect to SMB Direct. This involves tearing down
the current connection and establishing/negotiating a new connection.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When "rdma" is specified in the mount option, make CIFS connect to
SMB Direct.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Prevent build errors when CIFS=y and INFINIBAND=m.
fs/cifs/smbdirect.o: In function `smbd_qp_async_error_upcall':
smbdirect.c:(.text+0x28c): undefined reference to `ib_event_msg'
fs/cifs/smbdirect.o: In function `smbd_destroy_rdma_work':
smbdirect.c:(.text+0xfde): undefined reference to `ib_drain_qp'
smbdirect.c:(.text+0xfea): undefined reference to `rdma_destroy_qp'
smbdirect.c:(.text+0x12a0): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x12ac): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x12b8): undefined reference to `ib_dealloc_pd'
smbdirect.c:(.text+0x12c4): undefined reference to `rdma_destroy_id'
fs/cifs/smbdirect.o: In function `_smbd_get_connection':
smbdirect.c:(.text+0x168c): undefined reference to `rdma_create_id'
smbdirect.c:(.text+0x1713): undefined reference to `rdma_resolve_addr'
smbdirect.c:(.text+0x1780): undefined reference to `rdma_resolve_route'
smbdirect.c:(.text+0x17e3): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x183d): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x199d): undefined reference to `ib_alloc_cq'
smbdirect.c:(.text+0x19d9): undefined reference to `ib_alloc_cq'
smbdirect.c:(.text+0x1a89): undefined reference to `rdma_create_qp'
smbdirect.c:(.text+0x1b3c): undefined reference to `rdma_connect'
smbdirect.c:(.text+0x2538): undefined reference to `rdma_destroy_qp'
smbdirect.c:(.text+0x2549): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x255a): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x2563): undefined reference to `ib_dealloc_pd'
smbdirect.c:(.text+0x256c): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x25f0): undefined reference to `__ib_alloc_pd'
smbdirect.c:(.text+0x26bb): undefined reference to `rdma_disconnect'
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x62): undefined reference to `rdma_disconnect'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org (moderated for non-subscribers)
Signed-off-by: Steve French <smfrench@gmail.com>
If cifs_zap_mapping() returned an error, we would return without putting
the xid that we got earlier. Restructure cifs_file_strict_mmap() and
cifs_file_mmap() to be more similar to each other and have a single
point of return that always puts the xid.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
For use-configurable SMB Direct protocol values, export them to /proc/fs/cifs.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
The upper layer calls this function to connect to peer through SMB Direct.
Each SMB Direct connection is based on a RDMA RC Queue Pair.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Add code to implement the core functions to establish a SMB Direct connection.
1. Establish an RDMA connection to SMB server.
2. Negotiate and setup SMB Direct protocol.
3. Implement idle connection timer and credit management.
SMB Direct is enabled by setting CONFIG_CIFS_SMB_DIRECT.
Add to Makefile to enable building SMB Direct.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
To prepare for protocol implementation, add constants and user-configurable
values for the SMB Direct protocol.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Add "rdma" to CIFS mount options to connect to SMB Direct.
Add checks to validate this is used on SMB 3.X dialects.
To connect to SMBDirect, use "mount.cifs -o rdma,vers=3.x".
At the time of this patch, 3.x can be 3.0, 3.02 or 3.1.1.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
Build SMB Direct code when this option is set.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
This patch is for preparing upper layer for doing SMB read via RDMA write.
When we assemble the SMB read packet header, we need to know the I/O layout
if this request is to use a RDMA write. rdata has all the information we need
for memory registration. Add rdata to smb2_new_read_req.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
In both functions, use an array of 8 (arbitrary but should be big enough
for all current uses) iov and avoid having to kmalloc the array
for the common case.
If 8 is too small, then fall back to the original behaviour and use
kmalloc/kfree.
This should not change any behaviour but should save us a tiny amount of
cpu cycles.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
This function is similar to SendReceive2 except it does not expect
a 4 byte rfc1002 length header in the first io vector.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull networking fixes from David Miller:
1) Avoid negative netdev refcount in error flow of xfrm state add, from
Aviad Yehezkel.
2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
From Yossi Kuperman.
3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
failing to allocate an appropriate amount of headroom. From
Guillaume Nault.
4) Fix memory leak in vmxnet3 driver, from Neil Horman.
5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
from Wolfgang Bumiller.
6) Restrict what kinds of sockets can be bound to the KCM multiplexer
and also disallow when another layer has attached to the socket and
made use of sk_user_data. From Tom Herbert.
7) Fix use before init of IOTLB in vhost code, from Jason Wang.
8) Correct STACR register write bit definition in IBM emac driver, from
Ivan Mikhaylov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/ibm/emac: wrong bit is used for STA control register write
net/ibm/emac: add 8192 rx/tx fifo size
vhost: do not try to access device IOTLB when not initialized
vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
i40e: flower: check if TC offload is enabled on a netdev
qed: Free reserved MR tid
qed: Remove reserveration of dpi for kernel
kcm: Check if sk_user_data already set in kcm_attach
kcm: Only allow TCP sockets to be attached to a KCM mux
net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
net: sched: em_nbyte: don't add the data offset twice
mlxsw: spectrum_router: Don't log an error on missing neighbor
vmxnet3: repair memory leak
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
pppoe: take ->needed_headroom of lower device into account on xmit
xfrm: fix boolean assignment in xfrm_get_type_offload
xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
xfrm: fix error flow in case of add state fails
xfrm: Add SA to hardware at the end of xfrm_state_construct()
STA control register has areas of mode and opcodes for opeations. 18 bit is
using for mode selection, where 0 is old MIO/MDIO access method and 1 is
indirect access mode. 19-20 bits are using for setting up read/write
operation(STA opcodes). In current state 'read' is set into old MIO/MDIO mode
with 19 bit and write operation is set into 18 bit which is mode selection,
not a write operation. To correlate write with read we set it into 20 bit.
All those bit operations are MSB 0 based.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
emac4syn chips has availability to use 8192 rx/tx fifo buffer sizes,
in current state if we set it up in dts 8192 as example, we will get
only 2048 which may impact on network speed.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code will try to access dev->iotlb when processing
VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead
to NULL pointer dereference. Fixes this by check dev->iotlb before.
Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to
hold mutexes of all virtqueues. This may confuse lockdep to report a
possible deadlock because of trying to hold locks belong to same
class. Switch to use mutex_lock_nested() to avoid false positive.
Fixes: 6b1e6cc785 ("vhost: new device IOTLB API")
Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>